Skip to content
Sign in

What is LoRA?

Also known as: Low-Rank Adaptation

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that lets users adapt large pre-trained AI models to new tasks by updating only a tiny fraction of parameters instead of the full model.

LoRA works by freezing the original model weights and injecting trainable low-rank decomposition matrices into selected layers. These matrices (typically called A and B) approximate the weight updates needed for the new task while keeping the total number of trainable parameters very small.

During training only the low-rank matrices are optimized; at inference time their effect can be merged back into the original weights so there is no extra latency. This approach dramatically cuts memory and compute requirements compared with full fine-tuning.

The rank hyper-parameter controls the size of the low-rank matrices and therefore the trade-off between adaptation capacity and efficiency.

Example

A user fine-tunes a 7-billion-parameter language model on a custom customer-support dataset. Instead of updating all 7 B parameters, LoRA trains only about 8 million parameters (rank 8 adapters), allowing the process to run on a single consumer GPU in a few hours.

Why it matters

LoRA makes it practical for individuals and small teams to customize powerful foundation models without massive cloud bills, accelerating research and enabling widespread personalized AI applications.

Frequently asked questions

Normal fine-tuning updates every model weight; LoRA freezes the original weights and only trains small low-rank matrices, using far less memory and storage.