Scaling GenAI Series
Advanced Parameter Efficient Fine-tuning
Scaling Fine-Tuning with Ray and DeepSpeed ZeRO
Save Your Spot!
Virtual Event: December 12th
9AM PT | 12PM ET | 5PM BST
One of the biggest challenges in LLM fine-tuning is fitting the model in memory. Even with PEFT, the full model must reside in GPU memory during training, and only the largest, most expensive GPUs have enough memory to hold today's largest models. That is assuming you can buy one, or provision one from your cloud provider, at all.
That is where ZeRO (Zero Redundancy Optimizer) comes into play. ZeRO partitions training state across workers, letting you fine-tune a large LLM on multiple smaller-memory GPUs in parallel. ZeRO is implemented in the DeepSpeed library, and when DeepSpeed runs on a Ray cluster, fine-tuning can scale out further across multiple nodes.
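As a rough sketch of the idea, DeepSpeed is driven by a configuration that selects a ZeRO stage: stage 1 partitions optimizer states, stage 2 additionally partitions gradients, and stage 3 additionally partitions the model parameters themselves. The values below are illustrative placeholders, not the settings used in the event:

```python
# Illustrative DeepSpeed config enabling ZeRO stage 3, which shards
# optimizer states, gradients, and parameters across data-parallel
# workers. All numeric values here are example placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},  # halve activation/parameter memory
    "zero_optimization": {
        "stage": 3,  # 1: optimizer states, 2: + gradients, 3: + parameters
        "offload_optimizer": {"device": "cpu"},  # spill optimizer state to CPU RAM
        "overlap_comm": True,  # overlap communication with computation
    },
}
```

With a config like this, each GPU holds only its shard of the training state, which is what makes fine-tuning a model too large for any single GPU feasible.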
- Learn what ZeRO is and how it helps you fine-tune models across multiple GPUs
- Use DeepSpeed and Ray to fine-tune the GPT-J-6B model
- Load the fine-tuned model for inference on a GPU or a CPU
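The last step above, loading a fine-tuned checkpoint onto whatever hardware is available, might look roughly like the following sketch. The checkpoint path is hypothetical, and the Hugging Face Transformers API is assumed here since it is the usual way to load GPT-J checkpoints:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def pick_device() -> str:
    """Choose a GPU when one is visible, otherwise fall back to CPU."""
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_for_inference(checkpoint_dir: str):
    """Load a fine-tuned causal-LM checkpoint onto the chosen device.

    `checkpoint_dir` is a hypothetical local path produced by the
    fine-tuning run. Half precision is used only on GPU; CPUs run
    in float32.
    """
    device = pick_device()
    dtype = torch.float16 if device == "cuda" else torch.float32
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForCausalLM.from_pretrained(checkpoint_dir, torch_dtype=dtype)
    return tokenizer, model.to(device).eval()
```

The same code path serves both targets: the only differences are the device string and the dtype, so a model fine-tuned on a GPU cluster can still be served from a CPU-only machine, just more slowly.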