What is PEFT?

Also known as: Parameter-Efficient Fine-Tuning

PEFT (Parameter-Efficient Fine-Tuning) is a family of techniques that adapt large pre-trained models to new tasks by updating or adding only a tiny fraction of parameters instead of retraining the entire model.

Traditional fine-tuning updates every weight in a model, which becomes prohibitively expensive in memory and compute as models grow to billions of parameters. PEFT methods freeze the original weights and introduce small trainable components or selectively update a subset of parameters.

Common approaches include low-rank adapters (LoRA), bottleneck adapters, prefix tuning, and prompt tuning. These methods add small trainable modules or learned prompt vectors to the network and train only those while the base model stays fixed, dramatically cutting the number of optimized parameters, often to less than 1% of the total.
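To make the low-rank adapter idea concrete, here is a minimal sketch of a LoRA-style linear layer in plain Python. The class name `LoRALinear`, the layer size, the rank `r`, and the scaling factor `alpha` are illustrative assumptions, not values from any particular library or model. The frozen weight is random here and only stands in for a real checkpoint.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, d_in, d_out, r, alpha, seed=0):
        rng = random.Random(seed)
        # Frozen "pre-trained" weight (random placeholder, never updated).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A starts random and B starts at zero, so the
        # adapted layer initially behaves exactly like the base layer.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_fraction(self):
        frozen = len(self.W) * len(self.W[0])
        trainable = len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
        return trainable / (frozen + trainable)


# Hypothetical sizes chosen only for illustration.
layer = LoRALinear(d_in=512, d_out=512, r=2, alpha=4)
x = [1.0] * 512
y = layer.forward(x)
fraction = layer.trainable_fraction()
print(f"trainable share of parameters: {fraction:.2%}")
```

In this toy setting only `A` and `B` would receive gradient updates during training, and their combined size stays below 1% of the layer's parameters.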

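After training, the small set of adapter weights can be swapped in and out at inference time, enabling multiple task-specific versions to share the same large backbone. For methods such as LoRA, the adapter can also be merged into the base model's weights so that inference needs no extra modules.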
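Merging and swapping can be sketched for a LoRA-style adapter as follows. The merge folds the low-rank update into the frozen matrix as W' = W + scale * (B @ A). The swap keeps W untouched and selects a different (A, B) pair per task. The adapter names, matrix values, and `scale` below are made up for illustration.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def matmul(P, Q):
    """Multiply matrix P by matrix Q (both lists of rows)."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def merge(W, A, B, scale):
    """Return W + scale * (B @ A): a plain weight matrix with the adapter folded in."""
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]


# Shared frozen backbone weight (2 x 3), hypothetical values.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]

# Two hypothetical rank-1 adapters that share the same backbone.
adapters = {
    "support_chat": {"A": [[1.0, 2.0, 0.0]], "B": [[0.5], [-1.0]]},
    "summarize": {"A": [[0.0, 1.0, 1.0]], "B": [[2.0], [0.0]]},
}
scale = 2.0  # alpha / r, assuming alpha = 2 and r = 1


def adapted_forward(x, name):
    """Run the frozen layer plus the named adapter, without merging."""
    ad = adapters[name]
    low_rank = matvec(ad["B"], matvec(ad["A"], x))
    return [b + scale * u for b, u in zip(matvec(W, x), low_rank)]


x = [1.0, 1.0, 1.0]
unmerged = adapted_forward(x, "support_chat")
chat = adapters["support_chat"]
merged = matvec(merge(W, chat["A"], chat["B"], scale), x)
swapped = adapted_forward(x, "summarize")
```

Here `merged` and `unmerged` hold the same output, while `swapped` shows a second task reusing the same `W` with a different adapter.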

Example

A developer fine-tunes a 7-billion-parameter LLM for customer-support chat using LoRA. Only about 8 million new parameters (roughly 0.1% of the model) are trained on a single GPU, and the adapter is then merged into the base weights to produce a specialized model without retraining the full network.
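One configuration that would yield a figure of about 8 million can be worked out with simple arithmetic. The dimensions below (hidden size 4096, 32 layers, rank 16, two adapted square projection matrices per layer) are assumptions typical of a 7-billion-parameter transformer, not details given in the example. A rank-r adapter on a d_in x d_out matrix adds r * (d_in + d_out) parameters.

```python
def lora_param_count(hidden_size, num_layers, rank, adapted_per_layer):
    """Count trainable LoRA parameters when adapting square projection matrices."""
    # Each adapted matrix gets A (rank x hidden) and B (hidden x rank).
    per_matrix = rank * (hidden_size + hidden_size)
    return per_matrix * adapted_per_layer * num_layers


# Assumed configuration, chosen only to illustrate the arithmetic.
trainable = lora_param_count(hidden_size=4096, num_layers=32,
                             rank=16, adapted_per_layer=2)
base = 7_000_000_000  # nominal size of the base model
percent = 100 * trainable / base

print(f"{trainable:,} trainable parameters ({percent:.2f}% of the base model)")
```

Under these assumptions the count is 8,388,608, about 0.12% of the base model. Changing the rank or the number of adapted matrices scales the total linearly.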

Why it matters

PEFT makes it practical for individuals and small teams to customize massive foundation models on modest hardware, lowering cost barriers and enabling widespread, task-specific AI deployment.

Frequently asked questions

How does PEFT differ from regular fine-tuning?

Regular fine-tuning updates all model weights. PEFT updates or adds only a small number of parameters while keeping most of the model frozen.