What Are Pruning and Knowledge Distillation in LLMs?

Welcome back to another story where we try to understand some of the concepts in the LLM industry that go over our heads 😂

When following AI updates, the glossary and jargon can be overwhelming. Sometimes we get lost trying to stay afloat in this whirlpool of a tech revolution.

Okay, let’s come back to the main topic.

Recently I was going through an article that mentioned how we can break a large language model trained on large datasets down into smaller models to increase efficiency and improve performance.

The term they used for this is pruning.

Pruning is the process of making the model smaller and leaner, either by dropping layers (depth pruning) or by dropping neurons, attention heads, and embedding channels (width pruning).

Now this is a new term for us. In simple terms, it is the process of reducing the size of the model either by removing layers or by disconnecting neurons from the network.

In this way, we can break down the model into smaller chunks.
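To make the two flavors of pruning concrete, here is a minimal NumPy sketch. The "model" is just a toy stack of weight matrices, and the neuron-importance score (weight-row norm) is one common heuristic, not the only one:

```python
import numpy as np

# A toy "model": a stack of 6 weight matrices, one per layer, each 8x8.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(8, 8)) for _ in range(6)]

# Depth pruning: drop whole layers (here we simply keep every other one).
depth_pruned = layers[::2]  # 6 layers -> 3 layers

# Width pruning: drop the output neurons (rows) with the smallest weight norms.
def width_prune(weight, keep):
    norms = np.linalg.norm(weight, axis=1)   # importance score per output neuron
    keep_idx = np.argsort(norms)[-keep:]     # indices of the strongest neurons
    return weight[np.sort(keep_idx)]         # keep those rows, preserve order

width_pruned = [width_prune(w, keep=4) for w in layers]

print(len(depth_pruned))      # 3 layers remain
print(width_pruned[0].shape)  # (4, 8): half the neurons per layer remain
```

In a real LLM, the dropped layers or heads are chosen by measuring how much each one contributes to the model's output, and the pruned model is usually retrained briefly to recover accuracy.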

Now, reducing the model size and converting it into smaller models does not solve the problem on its own. We need to make those smaller models usable and effective.

For this, there is a process called model distillation.

Model distillation is a technique used to transfer knowledge from a large, complex model to a smaller, simpler model. The goal is to create a more efficient model that retains much of the predictive power of the original larger model while being faster and less resource-intensive to run.
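A minimal sketch of the core idea behind distillation: the small "student" model is trained to match the large "teacher" model's softened output distribution, typically by minimizing the KL divergence between the two. The logit values below are made up for illustration:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T                 # temperature T > 1 softens the distribution
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits from teacher and student for one input.
teacher_logits = np.array([4.0, 1.0, 0.2])
student_logits = np.array([2.5, 1.5, 0.5])

T = 2.0
p_teacher = softmax(teacher_logits, T)  # the teacher's "soft targets"
p_student = softmax(student_logits, T)

# Distillation loss: KL divergence from student to teacher distribution.
kl = np.sum(p_teacher * np.log(p_teacher / p_student))
print(round(kl, 4))  # a small positive number; 0 means a perfect match
```

During training, this loss is backpropagated through the student so its predictions drift toward the teacher's, often combined with a standard loss on the true labels.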

In simple words, we have to make these smaller models nearly as capable as the large parent model. There are a couple of techniques used in this distillation process.

1. Classic Data Fine-Tuning

As the name suggests, this uses the original training data on which the parent model was trained to fine-tune the smaller models and make them effective.

2. Synthetic Data Generation Fine-Tuning

Synthetic data is artificial data generated by the LLM from the classic data, producing more examples for training additional models. The smaller models are then fine-tuned on both the classic data and the synthetic data generated this way.
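The pipeline can be sketched as follows. The `teacher_answer` function and the example prompts are hypothetical stand-ins; in practice the teacher would be an API call to the large parent model:

```python
# Stand-in for the large parent model generating an answer for a prompt.
def teacher_answer(prompt):
    # In a real pipeline this would be a call to the large LLM.
    return prompt.upper()  # toy "generation" for illustration

# A small set of original (classic) training pairs.
classic_data = [("what is pruning?", "making a model smaller")]

# New prompts with no labels yet.
unlabeled_prompts = ["explain distillation", "define fine-tuning"]

# The teacher labels the new prompts, producing synthetic training pairs.
synthetic_data = [(p, teacher_answer(p)) for p in unlabeled_prompts]

# The smaller model is fine-tuned on classic + synthetic data together.
training_set = classic_data + synthetic_data
print(len(training_set))  # 3 training pairs in total
```

The key design choice is that the teacher's outputs, not just the original dataset, become training signal for the student, which is how the smaller model inherits behavior the raw data alone might not capture.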

The benefits of doing this are reduced costs, increased efficiency, and smaller models tailored to serve specific use cases.

There are also other domain-specific technical factors, such as benchmark performance, token utilization, and other performance metrics, that come into consideration when deciding to use these smaller models.

There is a lot more to uncover, including the full process of converting one large LLM into a smaller one, and other technical terms that are essential along the way.

Thanks for reading, and I will meet you in the next article.

What is Pruning and knowledge Distillation in LLM? was originally published in Artificial Intelligence in Plain English on Medium, where people are continuing the conversation by highlighting and responding to this story.

