ftllmweb #1632

@@ -0,0 +1,58 @@
---
title: LLM Fine-Tuning for Web Applications

minutes_to_complete: 60

who_is_this_for: This Learning Path is an introduction for developers and data scientists who are new to fine-tuning large language models (LLMs) and want to build a fine-tuned LLM for web applications. Fine-tuning adapts a pre-trained LLM to specific tasks or domains by training it on domain-specific data and optimizing its responses for accuracy and relevance. For web applications, fine-tuning enables personalized interactions, enhanced query handling, and improved contextual understanding, making AI-driven features more effective. This Learning Path covers key concepts, techniques, tools, and best practices, providing a structured approach to building a fine-tuned LLM that meets real-world web application requirements.

learning_objectives:
- Learn the basics of large language models (LLMs) and how fine-tuning enhances model performance for specific use cases.
- Understand full fine-tuning, parameter-efficient fine-tuning (e.g., LoRA, QLoRA, PEFT), and instruction-tuning.
- Learn when to use different fine-tuning approaches based on model size, task complexity, and computational constraints.
- Learn how to curate, clean, and preprocess domain-specific datasets for optimal fine-tuning.
- Understand dataset formats, tokenization, and annotation techniques for improving model learning.
- Implement fine-tuning with popular frameworks such as Hugging Face Transformers and PyTorch.

prerequisites:
- Basic understanding of machine learning and deep learning, including familiarity with concepts such as supervised learning, neural networks, and transfer learning, and an understanding of model training, validation, and overfitting.
- Familiarity with deep learning frameworks, including experience with PyTorch for building and training neural networks and knowledge of Hugging Face Transformers for working with pre-trained LLMs.

author: Parichay Das

### Tags
skilllevels: Introductory
subjects: GenAI
armips:
- Neoverse

tools_software_languages:
- LLM
- GenAI
- Python
- PyTorch
- ExecuTorch
operatingsystems:
- Linux
- Windows



further_reading:
- resource:
title: Hugging Face Documentation
link: https://huggingface.co/docs
type: documentation
- resource:
title: PyTorch Documentation
link: https://pytorch.org/docs/stable/index.html
type: documentation




### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
---
title: Overview
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## What is Fine-Tuning?
Fine-tuning in the context of large language models (LLMs) refers to the process of further training a pre-trained LLM on domain-specific or task-specific data to enhance its performance for a particular application. LLMs, such as GPT, BERT, and LLaMA, are initially trained on massive corpora containing billions of tokens, enabling them to develop a broad linguistic understanding. Fine-tuning refines this knowledge by exposing the model to specialized datasets, allowing it to generate more contextually relevant and accurate responses. Rather than training an LLM from scratch, fine-tuning leverages the pre-existing knowledge embedded in the model, optimizing it for specific use cases such as customer support, content generation, legal document analysis, or medical text processing. This approach significantly reduces computational requirements and data needs while improving adaptability and efficiency in real-world applications.

## Advantages of Fine-Tuning
Fine-tuning is essential for optimizing large language models (LLMs) to meet specific application requirements, enhance performance, and reduce computational costs. While pre-trained LLMs have broad linguistic capabilities, they may not always produce domain-specific, contextually accurate, or application-tailored responses. Fine-tuning provides:
- Customization for Specific Domains
- Improved Response Quality and Accuracy
- Task-Specific Adaptation
- Reduction in Computational and Data Requirements
- Enhanced Efficiency in Real-World Applications
- Alignment with Ethical, Regulatory, and Organizational Guidelines

## Fine-Tuning Methods
Fine-tuning an LLM can be done with different techniques, chosen according to the use case, computational constraints, and efficiency requirements. The key fine-tuning methods are:

### Full Fine-Tuning (Supervised Learning Approach)
Full fine-tuning updates all parameters of the LLM using task-specific data. It requires significant computational power and large labeled datasets, but provides the highest level of customization.

### Instruction Fine-Tuning
Instruction fine-tuning is a supervised learning method in which a pre-trained large language model (LLM) is further trained on instruction-response pairs to improve its ability to follow human instructions accurately. Its key features: it uses labeled instruction-response pairs, enhances model alignment with human intent, is commonly used in chatbots and AI assistants, and prepares models for zero-shot and few-shot learning.
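
For illustration, a single labeled training pair (the field names follow the customer-support dataset used later in this Learning Path) might look like this:
```python
# An illustrative instruction-response pair for supervised instruction tuning
example = {
    "instruction": "I want to cancel my order. How do I do that?",
    "response": "You can cancel your order from the Orders section of your "
                "account. Open the order you want to cancel and select Cancel.",
}
```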

### Parameter-Efficient Fine-Tuning (PEFT)
PEFT comprises optimized approaches that reduce the number of trainable parameters while maintaining high performance:

- ###### LoRA (Low-Rank Adaptation)
  - Introduces small trainable weight matrices (a rank decomposition) while freezing the main model weights; a minimal PyTorch sketch follows this list.
  - Significantly reduces GPU memory usage and training time.

- ###### QLoRA (Quantized LoRA)
  - Applies quantization (for example, 4-bit or 8-bit precision) to reduce the memory footprint while performing LoRA fine-tuning.
  - Ideal for fine-tuning large models on limited hardware.

- ###### Adapter Layers
  - Inserts small trainable layers between the existing layers of the model while keeping most parameters frozen, reducing computational overhead.
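
To make the LoRA idea concrete, here is a minimal, self-contained PyTorch sketch of a LoRA-augmented linear layer. It is an illustration of the rank-decomposition trick only, not Unsloth's or Hugging Face PEFT's actual implementation:
```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA layer: y = W x + (alpha / r) * B A x, with W frozen."""
    def __init__(self, in_features, out_features, r=16, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():   # freeze the pre-trained weights
            p.requires_grad_(False)
        self.lora_A = nn.Linear(in_features, r, bias=False)   # down-projection
        self.lora_B = nn.Linear(r, out_features, bias=False)  # up-projection
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))
```
With this shape, the number of trained parameters per layer drops from in_features × out_features to r × (in_features + out_features), which is where the memory and speed savings come from.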

### Other Fine-Tuning Approaches
The following approaches are not parameter-efficient methods as such, but are often combined with them:

- ###### Reinforcement Learning from Human Feedback (RLHF)
  - Fine-tunes models based on human preferences using reinforcement learning.

- ###### Domain-Specific Fine-Tuning
  - Fine-tunes the LLM on domain-specific datasets, improving accuracy and relevance in specialized applications.

- ###### Multi-Task Learning (MTL) Fine-Tuning
  - Trains the model on multiple tasks simultaneously, enabling generalization across different applications.



## Fine-Tuning Implementation
The following steps are required to implement fine-tuning:


![example image alt-text#center](1.png "Figure 1. Fine-Tuning Implementation")

- Base Model Selection: Choose a pre-trained model based on your use case. You can find pre-trained models at [Hugging Face](https://huggingface.co/); a minimal loading sketch follows this list.
- Fine-Tuning Method Finalization: Select the most appropriate fine-tuning method (e.g., supervised, instruction-based, PEFT) based on your use case and dataset. You can typically find various datasets on [Hugging Face](https://huggingface.co/datasets) and [Kaggle](https://www.kaggle.com/datasets).
- Dataset Preparation: Organize your data for use case-specific training, ensuring it aligns with the model's required format.
- Training: Utilize frameworks such as TensorFlow and PyTorch to fine-tune the model.
- Evaluate: Evaluate the model, refine it as needed, and retrain to enhance performance.
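
As a minimal sketch of the first step, you can load a pre-trained base model and its tokenizer with the Hugging Face transformers library. The model name below is just an example; the rest of this Learning Path uses Unsloth's optimized loader instead:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "unsloth/Llama-3.2-1B-Instruct"  # example base model; pick one for your use case
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```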
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
---
title: Fine Tuning Large Language Model - Setup Environment
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Fine Tuning Large Language Model - Setup Environment

#### Set Up Required Libraries
The following commands install the necessary libraries for the task, including Hugging Face Transformers, Datasets, and fine-tuning utilities. These libraries facilitate model loading, training, and fine-tuning.

###### The transformers library (by Hugging Face) provides pre-trained LLMs
```python
!pip install transformers

```
###### This installs transformers along with PyTorch, ensuring that models are trained and fine-tuned using the Torch backend.
```python
!pip install transformers[torch]
```
###### The datasets library (by Hugging Face) provides access to a vast collection of pre-built datasets

```python
!pip install datasets
```
###### The evaluate library provides metrics for model performance assessment

```python
!pip install evaluate
```
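As a quick sanity check that the library works, you can load and compute a simple metric (accuracy here is just an example):
```python
import evaluate

# Compare toy predictions against references; two of three match
accuracy = evaluate.load("accuracy")
print(accuracy.compute(predictions=[0, 1, 1], references=[0, 1, 0]))
# -> {'accuracy': 0.666...}
```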
###### Speed up fine-tuning of Large Language Models (LLMs)
[Unsloth](https://huggingface.co/unsloth) is a library designed to speed up fine-tuning of Large Language Models (LLMs) while reducing computational costs. It optimizes training efficiency, particularly for LoRA (Low-Rank Adaptation) fine-tuning.
```python
%%capture
# %%capture is a Jupyter Notebook magic command that suppresses the output of a cell.

!pip install unsloth
```
###### Uninstalls the existing Unsloth installation and installs the latest version directly from the GitHub repository

```python
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
```
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
---
title: Fine Tuning Large Language Model - Load Pre-trained Model & Tokenizer
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Fine Tuning Large Language Model - Load Pre-trained Model & Tokenizer

#### Load Pre-trained Model & Tokenizer
The following commands load the pre-trained model and tokenizer, ensuring compatibility with the fine-tuning task and optimizing memory usage.

###### Import Required Modules
- FastLanguageModel: A highly optimized loader for LLaMA models in Unsloth, making it faster and memory-efficient.
- torch: Required for handling tensors and computations.
```python
from unsloth import FastLanguageModel
import torch

```
###### Define Model Configuration
- max_seq_length = 2048 → Defines the maximum number of tokens the model can process at once.
- dtype = None → Auto-selects the data type (for example, Float16 on older GPUs such as Tesla T4 and V100).
- load_in_4bit = True → Enables 4-bit quantization to reduce memory usage.
```python
max_seq_length = 2048
dtype = None
load_in_4bit = True
```
###### Load the Pre-trained Model
- Loads a 1B parameter fine-tuned LLaMA model
- Loads the optimized LLaMA model with reduced VRAM usage and faster processing
- Loads the corresponding tokenizer for tokenizing inputs properly

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
```
###### Parameter-Efficient Fine-Tuning (PEFT) using LoRA (Low-Rank Adaptation) for the pre-trained model
- LoRA Rank (r): Defines the rank of the low-rank matrices used in LoRA.
- Target Modules: Specifies which layers are fine-tuned with LoRA; includes the attention layers (q_proj, k_proj, v_proj, o_proj) and feedforward layers (gate_proj, up_proj, down_proj).
- LoRA Alpha (lora_alpha): Scaling factor for the LoRA weights; a higher value makes the LoRA layers contribute more to the model's output.
- LoRA Dropout (lora_dropout): Dropout randomly disables connections to prevent overfitting; set to 0 here.
- Bias (bias): No additional bias parameters are trained (optimized for efficiency).
- Gradient Checkpointing (use_gradient_checkpointing): Unsloth's optimized memory-saving method.
- Random Seed (random_state): Ensures reproducibility across training runs.
- Rank-Stabilized LoRA (use_rslora): Rank stabilization is not used.
- LoFTQ Quantization (loftq_config): No LoFTQ (low-bit quantization) is applied.
```python
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
```
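As a quick check that LoRA was applied as intended, you can count trainable versus total parameters with plain PyTorch; with the settings above, only a small fraction of the weights should be trainable (the exact numbers depend on the model):
```python
# Count trainable vs. total parameters of the PEFT-wrapped model
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```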
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
---
title: Fine Tuning Large Language Model - Prepare Dataset
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Fine Tuning Large Language Model - Prepare Dataset
This step prepares the dataset for fine-tuning by formatting it to match the LLaMA-3.1 chat template.

###### Import Chat Template for Tokenizer
This imports the chat template functionality from Unsloth. It allows you to structure the dataset in the format that LLaMA-3.1 expects.
```python
from unsloth.chat_templates import get_chat_template
```

###### Apply the Chat Template to Tokenizer
- Wraps the tokenizer with the LLaMA-3.1 chat template.
- Ensures prompt formatting is consistent when training the model.
```python
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)
```
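To see what the template produces, you can format a toy conversation (the messages below are made up for illustration):
```python
# Render a short conversation as a LLaMA-3.1 prompt string without tokenizing
messages = [
    {"role": "user", "content": "Where is my order?"},
    {"role": "assistant", "content": "Let me look that up for you."},
]
print(tokenizer.apply_chat_template(messages, tokenize = False))
```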
###### Format Dataset Prompts
- Extracts the instruction column from the dataset.
- Applies the chat template formatting to each instruction.
- Returns a new dictionary with the formatted text.
```python
def formatting_prompts_func(examples):
    convos = examples["instruction"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text": texts, }
```
###### Load the Dataset
- Loads a [customer support chatbot training dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset) from Hugging Face.
- The dataset contains example conversations with instructions for fine-tuning.
- The split = "train" argument selects the training split of the dataset.

```python
from datasets import load_dataset
dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset", split = "train")

```
![example image alt-text#center](2.png)

###### Import Standardization Function
- Imports standardize_sharegpt, a function that helps in structuring dataset inputs in a ShareGPT-like format (a commonly used format for LLM fine-tuning).
- Ensures that data follows a standardized format required for effective instruction tuning.
```python
from unsloth.chat_templates import standardize_sharegpt
```
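When a dataset is already in a ShareGPT-style conversations format, the typical call (per the Unsloth documentation) is a single transformation. The customer-support dataset loaded above uses plain instruction and response columns instead, so the next steps format it manually:
```python
# Only applicable to ShareGPT-style datasets with a "conversations" column
dataset = standardize_sharegpt(dataset)
```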
###### Define a Function to Format Dataset
- Extracts the instruction (input text) and response (output text) from the dataset.
- Stores them as "instruction_text" and "response_text".
```python
def formatting_prompts_func(examples):
    return { "instruction_text": examples["instruction"], "response_text": examples["response"] }

```

###### Apply Formatting to Dataset
- Applies formatting_prompts_func to every record in the dataset.
- Uses batch processing (batched=True) for efficiency.
```python
# Add the formatted columns to every record in one batched pass
dataset = dataset.map(formatting_prompts_func, batched = True)
```
![example image alt-text#center](3.png)
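
You can then spot-check one record to confirm the new columns were added:
```python
# Inspect the first formatted record
print(dataset[0]["instruction_text"])
print(dataset[0]["response_text"])
```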