Fine-Tuning Large Language Models

Optimizing with Quantization and LoRA


Check out the first installment of Scaling GenAI: the Builder's Toolkit. As you move from generative AI proofs of concept (PoCs) to production, you must adapt LLMs to your company's use cases.

Fine-tuning has emerged as the standard way to infuse domain knowledge into pre-trained models, yet it can be demanding on infrastructure, slow, and expensive. This on-demand webinar will help you overcome these challenges.

The session recording explores two optimization techniques: quantization and Low-Rank Adaptation (LoRA).

Key Takeaways:

  • Review the motivation and theory behind parameter-efficient fine-tuning (PEFT) techniques
  • Discover the power of quantization with the Hugging Face Trainer on Domino using Falcon-40b
  • Investigate LoRA with the Falcon-7b LLM using PyTorch Lightning
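To give a flavor of the two techniques above, here is a minimal, self-contained sketch in plain Python. It is illustrative only and is not the webinar's code (no Hugging Face or PyTorch Lightning involved): it shows symmetric per-tensor int8 quantization of a weight tensor, and a LoRA-style forward pass where a frozen weight matrix W gets a trainable low-rank correction B·A. All function names and values here are hypothetical.

```python
# Illustrative sketch only; not the actual webinar code.

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [scale * v for v in q]

def matvec(M, x):
    """Plain matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x):
    """LoRA-style forward pass: y = W x + B (A x).
    W stays frozen; only the low-rank factors A (r x d_in) and
    B (d_out x r) are trained, with rank r much smaller than d_in, d_out."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)]

# Quantization shrinks storage from 32-bit floats to 8-bit ints.
q, s = quantize_int8([0.5, -1.27, 0.0])
print(dequantize(q, s))  # values close to the originals

# LoRA with rank r=1: in a real LLM layer the trainable update B @ A
# costs r*(d_in + d_out) parameters instead of d_in*d_out for full
# fine-tuning (the 2x2 toy below is too small to show the savings).
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pre-trained weights
A = [[0.1, 0.2]]               # trainable, rank 1
B = [[0.5], [-0.5]]            # trainable
print(lora_forward(W, A, B, [1.0, 1.0]))
```

In practice the webinar pairs these ideas with real tooling: quantization keeps a 40B-parameter model within GPU memory, and LoRA keeps the number of trainable parameters small enough to fine-tune on modest hardware.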