# Model Fine-tuning Guide

## Introduction

This guide covers the process of fine-tuning AI models using Local-AI-Cyber-Lab's infrastructure. Learn how to prepare data, train models, and evaluate results effectively.

## Prerequisites

- Basic understanding of machine learning
- Prepared dataset
- Sufficient computational resources:
    - GPU with 16GB+ VRAM recommended
    - 32GB+ RAM
    - 100GB+ free disk space
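A quick pre-flight check can save a failed run hours in. The sketch below (not part of the lab's tooling) verifies free disk space with the standard library and reports GPU VRAM only if PyTorch with CUDA happens to be installed:

```python
import shutil

def check_resources(min_disk_gb: int = 100) -> dict:
    """Report whether this machine meets the disk/GPU guidance above."""
    free_gb = shutil.disk_usage(".").free / 1e9
    report = {
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_disk_gb,
        "vram_gb": None,  # stays None when PyTorch/CUDA is unavailable
    }
    try:
        import torch
        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            report["vram_gb"] = round(props.total_memory / 1e9, 1)
    except ImportError:
        pass
    return report
```

RAM checks are left out here because there is no portable standard-library call for total memory; `psutil.virtual_memory()` covers that if `psutil` is available.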
## Fine-tuning Infrastructure

### 1. Components

- MLflow for experiment tracking
- Ollama for model execution
- MinIO for artifact storage
- Qdrant for vector storage
- Jupyter for development

### 2. Directory Structure

```
finetune/
├── config/   # Training configurations
├── data/     # Training datasets
├── models/   # Fine-tuned models
└── scripts/  # Training scripts
```
## Data Preparation

### 1. Dataset Format

```json
{
  "conversations": [
    {
      "input": "User query or prompt",
      "output": "Desired response",
      "metadata": {
        "category": "topic",
        "source": "origin",
        "timestamp": "ISO-8601"
      }
    }
  ]
}
```
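Before converting a file in this format, it is worth validating it; a malformed entry surfaces much more cryptically once training starts. A minimal validator sketch (the required-key set mirrors the schema above; metadata is treated as optional):

```python
REQUIRED_KEYS = {"input", "output"}

def validate_conversations(data: dict) -> list:
    """Return (index, error) pairs for malformed conversation entries."""
    errors = []
    for i, conv in enumerate(data.get("conversations", [])):
        missing = REQUIRED_KEYS - conv.keys()
        if missing:
            errors.append((i, f"missing keys: {sorted(missing)}"))
        elif not conv["input"].strip() or not conv["output"].strip():
            errors.append((i, "empty input or output"))
    return errors
```

Running this over the training JSON and fixing or dropping flagged entries keeps the downstream `prepare_dataset` step from silently ingesting bad rows.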
### 2. Data Processing

```python
from datasets import Dataset
import json

def prepare_dataset(file_path: str):
    # Load data
    with open(file_path, 'r') as f:
        data = json.load(f)

    # Convert to HuggingFace dataset
    dataset = Dataset.from_dict({
        'input': [x['input'] for x in data['conversations']],
        'output': [x['output'] for x in data['conversations']]
    })
    return dataset.train_test_split(test_size=0.1)
```
## Training Configuration

### 1. Basic Configuration

```yaml
# config/training_config.yaml
model:
  base_model: mistral
  architecture: llama
  tokenizer: sentencepiece

training:
  batch_size: 4
  learning_rate: 2e-5
  epochs: 3
  warmup_steps: 100
  gradient_accumulation: 4

evaluation:
  metrics:
    - accuracy
    - perplexity
    - rouge
```
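The training code later in this guide reads config values as attributes (`config.epochs`, `config.batch_size`), while a YAML parser such as `yaml.safe_load` returns plain dicts. One lightweight way to bridge the two, sketched here with a hard-coded dict standing in for the parsed file:

```python
from types import SimpleNamespace

def to_namespace(obj):
    """Recursively convert parsed config dicts to attribute-style access."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    return obj

# In practice `parsed` would come from yaml.safe_load("config/training_config.yaml")
parsed = {"training": {"batch_size": 4, "learning_rate": 2e-5, "epochs": 3}}
config = to_namespace(parsed)
```

A dataclass with explicit fields gives stronger validation; `SimpleNamespace` is simply the smallest thing that matches the attribute access used below.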
### 2. Advanced Settings

```yaml
optimization:
  quantization:
    bits: 4
    scheme: nf4
  pruning:
    method: magnitude
    target_sparsity: 0.3
  lora:
    r: 8
    alpha: 32
    dropout: 0.1
```
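The `lora` block trains two small low-rank factors per adapted weight matrix instead of the full matrix, with the update scaled by `alpha / r` (here 32 / 8 = 4). A quick back-of-the-envelope check of the savings, using a hypothetical 4096x4096 projection layer (the layer size is illustrative, not from the config):

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds two factors per adapted weight: A (d_in x r) and B (r x d_out)."""
    return d_in * r + r * d_out

# Hypothetical 4096x4096 attention projection, r=8 as in the config above
full_params = 4096 * 4096
lora_params = lora_param_count(4096, 4096, r=8)
ratio = lora_params / full_params  # well under 1% of the full layer
```

This is why LoRA fits within the 16GB VRAM guidance above: optimizer state is only kept for the adapter parameters, not the frozen base weights.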
## Training Process

### 1. Basic Training

```bash
# Start training
./finetune/scripts/train.sh \
  --model mistral \
  --dataset ./data/training.json \
  --config ./config/training_config.yaml \
  --output ./models/custom
```
### 2. Advanced Training

```python
from transformers import Trainer, TrainingArguments
import mlflow

def train_model(model, dataset, config):
    with mlflow.start_run():
        # Set training arguments
        training_args = TrainingArguments(
            output_dir="./results",
            num_train_epochs=config.epochs,
            per_device_train_batch_size=config.batch_size,
            learning_rate=config.learning_rate,
            warmup_steps=config.warmup_steps,
            gradient_accumulation_steps=config.gradient_accumulation
        )

        # Initialize trainer
        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=dataset["train"],
            eval_dataset=dataset["test"]
        )

        # Train model
        trainer.train()

        # Log metrics
        metrics = trainer.evaluate()
        mlflow.log_metrics(metrics)
```
## Model Evaluation

### 1. Basic Metrics

```python
import time
import numpy as np

def evaluate_model(model, test_dataset):
    # calculate_accuracy and calculate_perplexity are project-specific helpers
    results = {
        "accuracy": [],
        "perplexity": [],
        "latency": []
    }

    for example in test_dataset:
        start_time = time.time()
        output = model.generate(example["input"])
        latency = time.time() - start_time

        results["accuracy"].append(
            calculate_accuracy(output, example["output"])
        )
        results["perplexity"].append(
            calculate_perplexity(output)
        )
        results["latency"].append(latency)

    return {k: np.mean(v) for k, v in results.items()}
```
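The `calculate_perplexity` helper above is left to the project, but the underlying definition is standard: perplexity is the exponential of the average negative log-likelihood per token. A self-contained sketch working from per-token log-probabilities (how you obtain those log-probabilities depends on your model API):

```python
import math

def perplexity_from_logprobs(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)
```

As a sanity check, a model that assigns every token probability 0.25 has perplexity exactly 4: it is "as confused as" a uniform choice among four tokens.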
### 2. Advanced Evaluation

```python
import mlflow

def comprehensive_evaluation(model, test_dataset):
    # ModelEvaluator is assumed to come from the lab's evaluation tooling
    evaluator = ModelEvaluator(
        metrics=[
            "accuracy",
            "perplexity",
            "rouge",
            "bleu",
            "bertscore"
        ],
        tests=[
            "robustness",
            "bias",
            "toxicity"
        ]
    )

    results = evaluator.evaluate(
        model=model,
        dataset=test_dataset
    )

    # Log to MLflow
    with mlflow.start_run():
        mlflow.log_metrics(results)

    return results
```
## Model Deployment

### 1. Export Model

```python
import json

def export_model(model, config):
    # Save model artifacts
    model.save_pretrained("./export")

    # Create model card
    model_card = {
        "name": config.model_name,
        "version": config.version,
        "architecture": config.architecture,
        "training_data": config.dataset_info,
        "metrics": config.evaluation_results,
        "parameters": config.model_parameters
    }

    with open("./export/model_card.json", "w") as f:
        json.dump(model_card, f)
```
### 2. Deploy to Ollama

```bash
# Build and register the model from a Modelfile
ollama create custom-model -f ./export/Modelfile

# Test deployment
ollama run custom-model "Test prompt"
```
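Ollama assembles custom models from a Modelfile that points at the exported weights. A minimal example, with illustrative paths and parameters (adjust `FROM` to match your exported artifact, e.g. a GGUF conversion of the fine-tuned model):

```
# ./export/Modelfile
FROM ./model.gguf
PARAMETER temperature 0.7
SYSTEM "You are an assistant fine-tuned on domain-specific data."
```

The `SYSTEM` line bakes a default system prompt into the deployed model, which is convenient when the fine-tuned behavior depends on a fixed instruction.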
## Monitoring and Optimization

### 1. Training Monitoring

```python
from langfuse import Langfuse

langfuse = Langfuse()

def monitor_training(run_id: str, loss: float, accuracy: float):
    # get_gpu_usage and get_memory_usage are project-specific helpers
    langfuse.log_metrics(
        run_id=run_id,
        metrics={
            "loss": loss,
            "accuracy": accuracy,
            "gpu_utilization": get_gpu_usage(),
            "memory_usage": get_memory_usage()
        }
    )
```
### 2. Performance Optimization

```python
def optimize_model(model, config):
    # Quantization
    if config.quantization.enabled:
        model = quantize_model(
            model,
            bits=config.quantization.bits
        )

    # Pruning
    if config.pruning.enabled:
        model = prune_model(
            model,
            target_sparsity=config.pruning.target_sparsity
        )

    return model
```
## Best Practices

### 1. Data Quality
- Clean and validate data
- Balance dataset
- Remove duplicates
- Handle missing values
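The deduplication and missing-value points above can be folded into one cleaning pass over the conversations list from the dataset format section. A sketch (the drop-row policy for missing values is one reasonable choice, not the only one):

```python
def clean_conversations(conversations):
    """Drop entries with missing/empty fields and exact duplicates."""
    seen = set()
    cleaned = []
    for conv in conversations:
        text_in = (conv.get("input") or "").strip()
        text_out = (conv.get("output") or "").strip()
        if not text_in or not text_out:
            continue  # handle missing values by dropping the row
        key = (text_in, text_out)
        if key in seen:
            continue  # remove exact duplicates
        seen.add(key)
        cleaned.append({"input": text_in, "output": text_out})
    return cleaned
```

Near-duplicate detection (e.g. via embeddings in Qdrant) catches paraphrased repeats that exact matching misses, at the cost of an extra indexing step.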
### 2. Training Process
- Start with small datasets
- Monitor resource usage
- Use checkpointing
- Implement early stopping
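With the Trainer shown earlier, early stopping is available via `transformers.EarlyStoppingCallback`; the underlying patience logic is simple enough to sketch standalone, which also works for custom training loops:

```python
def should_stop(eval_losses, patience=3, min_delta=0.0):
    """Stop when eval loss hasn't improved by min_delta for `patience` evals."""
    if len(eval_losses) <= patience:
        return False
    best_before = min(eval_losses[:-patience])
    recent_best = min(eval_losses[-patience:])
    return recent_best > best_before - min_delta
```

Calling this after each evaluation and breaking out of the loop when it returns True pairs naturally with checkpointing: the last checkpoint before the plateau is the one to keep.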
### 3. Model Evaluation
- Use multiple metrics
- Test edge cases
- Validate outputs
- Monitor performance
## Support
For fine-tuning assistance:
- Email: support@cyber-ai-agents.com
- Documentation: Fine-tuning API
- Examples: Training Examples