Advanced parameter-efficient fine-tuning

Scaling fine-tuning with Ray and DeepSpeed ZeRO


One of the biggest challenges in LLM fine-tuning is fitting the model in memory. Even with parameter-efficient fine-tuning (PEFT), the full model must reside in GPU memory during training. Only the most advanced, and most expensive, GPUs have enough memory to hold the largest models, and that assumes you can buy one or provision one from your cloud provider.

That is where ZeRO (Zero Redundancy Optimizer) comes into play. ZeRO partitions model states across multiple smaller-memory GPUs, making it possible to fine-tune a large LLM in parallel. ZeRO is implemented in the DeepSpeed library, and when DeepSpeed is integrated with a Ray cluster, fine-tuning can scale further across multiple nodes.
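To make the idea concrete, here is a minimal sketch of a ZeRO stage 3 configuration, expressed as the Python dict that DeepSpeed accepts (for example via `deepspeed.initialize`, or passed to a Ray Train training loop). The field names follow DeepSpeed's documented config schema; the batch-size and offload settings are illustrative, not a recommendation:

```python
# Illustrative DeepSpeed ZeRO stage 3 configuration. Stage 3 partitions
# optimizer states, gradients, AND model parameters across the
# data-parallel workers, so no single GPU has to hold the full model.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 4,  # illustrative value
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},  # half precision halves activation/param memory
    "zero_optimization": {
        "stage": 3,  # 1 = optimizer states, 2 = + gradients, 3 = + parameters
        # Optionally spill optimizer state and parameters to CPU RAM,
        # trading GPU memory for host<->device transfer time:
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}
```

Lower ZeRO stages shard less and communicate less; stage 3 with CPU offload is the most memory-frugal setting and is what makes multi-billion-parameter models trainable on commodity GPUs.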

Key takeaways:

  • Learn how ZeRO shards model states so you can fine-tune models across multiple GPUs
  • Use DeepSpeed and Ray to fine-tune the GPT-J-6B model
  • Load the fine-tuned model for inference onto a GPU or a CPU
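The last takeaway can be sketched as follows, assuming a fine-tuned checkpoint saved in Hugging Face format (the `./gptj-finetuned` path and the helper functions are hypothetical; the `transformers` loading calls are the standard ones):

```python
def pick_device(cuda_available: bool) -> str:
    """Choose the inference device: GPU if one is present, otherwise CPU."""
    return "cuda" if cuda_available else "cpu"

def pick_dtype(device: str) -> str:
    """fp16 halves memory on GPU; CPUs generally run fp32."""
    return "float16" if device == "cuda" else "float32"

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = pick_device(torch.cuda.is_available())
    dtype = torch.float16 if device == "cuda" else torch.float32

    # "./gptj-finetuned" is a placeholder for your fine-tuned checkpoint dir.
    tokenizer = AutoTokenizer.from_pretrained("./gptj-finetuned")
    model = AutoModelForCausalLM.from_pretrained(
        "./gptj-finetuned", torch_dtype=dtype
    ).to(device)
    model.eval()  # disable dropout etc. for inference
```

The same checkpoint loads on either device; only the dtype and placement change, which is why serving on a CPU is a viable fallback when no GPU is available.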