Model Fine-tuning Guide

Introduction

This guide covers the process of fine-tuning AI models using Local-AI-Cyber-Lab's infrastructure. Learn how to prepare data, train models, and evaluate results effectively.

Prerequisites

  • Basic understanding of machine learning
  • Prepared dataset
  • Sufficient computational resources:
      • GPU with 16GB+ VRAM recommended
      • 32GB+ RAM
      • 100GB+ free disk space

Fine-tuning Infrastructure

1. Components

  • MLflow for experiment tracking
  • Ollama for model execution
  • MinIO for artifact storage
  • Qdrant for vector storage
  • Jupyter for development
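
In a typical setup, MLflow stores run artifacts in MinIO through its S3-compatible API. A minimal sketch of that wiring (endpoints, bucket names, and credentials are placeholders for your deployment, not values this guide prescribes):

import os
import mlflow

# Point MLflow at the tracking server and at MinIO's S3-compatible endpoint.
# Hostnames, ports, and credentials below are illustrative placeholders.
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://minio:9000"
os.environ["AWS_ACCESS_KEY_ID"] = "minio-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minio-secret-key"

mlflow.set_tracking_uri("http://mlflow:5000")
mlflow.set_experiment("finetune-mistral")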

2. Directory Structure

finetune/
├── config/           # Training configurations
├── data/            # Training datasets
├── models/          # Fine-tuned models
└── scripts/         # Training scripts

Data Preparation

1. Dataset Format

{
    "conversations": [
        {
            "input": "User query or prompt",
            "output": "Desired response",
            "metadata": {
                "category": "topic",
                "source": "origin",
                "timestamp": "ISO-8601"
            }
        }
    ]
}
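
Before processing, it can be worth checking that every record in the file carries the fields this format requires. A minimal sketch based on the schema above (the function name is illustrative):

import json

REQUIRED_KEYS = {"input", "output"}

def validate_conversations(file_path: str) -> None:
    # Fail early if a record is missing fields the training scripts expect
    with open(file_path, "r") as f:
        data = json.load(f)

    for i, record in enumerate(data["conversations"]):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"Record {i} is missing required keys: {missing}")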

2. Data Processing

from datasets import Dataset
import json

def prepare_dataset(file_path: str):
    # Load data
    with open(file_path, 'r') as f:
        data = json.load(f)

    # Convert to HuggingFace dataset
    dataset = Dataset.from_dict({
        'input': [x['input'] for x in data['conversations']],
        'output': [x['output'] for x in data['conversations']]
    })

    return dataset.train_test_split(test_size=0.1)
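
For example (the path is illustrative):

splits = prepare_dataset("./data/training.json")
print(splits["train"].num_rows, splits["test"].num_rows)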

Training Configuration

1. Basic Configuration

# config/training_config.yaml
model:
  base_model: mistral
  architecture: llama
  tokenizer: sentencepiece

training:
  batch_size: 4
  learning_rate: 2e-5
  epochs: 3
  warmup_steps: 100
  gradient_accumulation: 4

evaluation:
  metrics:
    - accuracy
    - perplexity
    - rouge
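
The training code later in this guide reads these values as attributes (for example config.epochs), so one way to load the file is a small namespace wrapper. A minimal sketch, assuming PyYAML is installed (the function name is illustrative):

import yaml
from types import SimpleNamespace

def load_training_config(path: str) -> SimpleNamespace:
    # Expose the training section as attributes: config.epochs, config.batch_size, ...
    with open(path, "r") as f:
        raw = yaml.safe_load(f)
    return SimpleNamespace(**raw["training"])

config = load_training_config("./config/training_config.yaml")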

2. Advanced Settings

optimization:
  quantization:
    enabled: true
    bits: 4
    scheme: nf4

  pruning:
    enabled: true
    method: magnitude
    target_sparsity: 0.3

  lora:
    r: 8
    alpha: 32
    dropout: 0.1
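
If these LoRA settings are applied with Hugging Face's peft library, the mapping looks roughly as follows. This is a sketch, not the guide's own training script: base_model stands for an already-loaded model, and target_modules depend on the architecture (shown here as a common choice for Llama-style models):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                   # lora.r
    lora_alpha=32,                         # lora.alpha
    lora_dropout=0.1,                      # lora.dropout
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)

# Wrap the loaded base model so that only the LoRA adapters are trained
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()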

Training Process

1. Basic Training

# Start training
./finetune/scripts/train.sh \
  --model mistral \
  --dataset ./data/training.json \
  --config ./config/training_config.yaml \
  --output ./models/custom

2. Advanced Training

from transformers import Trainer, TrainingArguments
import mlflow

def train_model(model, dataset, config):
    with mlflow.start_run():
        # Set training arguments
        training_args = TrainingArguments(
            output_dir="./results",
            num_train_epochs=config.epochs,
            per_device_train_batch_size=config.batch_size,
            learning_rate=config.learning_rate,
            warmup_steps=config.warmup_steps,
            gradient_accumulation_steps=config.gradient_accumulation
        )

        # Initialize trainer (train/eval splits are assumed to be tokenized already)
        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=dataset["train"],
            eval_dataset=dataset["test"]
        )

        # Train model
        trainer.train()

        # Log metrics
        metrics = trainer.evaluate()
        mlflow.log_metrics(metrics)

Model Evaluation

1. Basic Metrics

import time
import numpy as np

def evaluate_model(model, test_dataset):
    results = {
        "accuracy": [],
        "perplexity": [],
        "latency": []
    }

    for example in test_dataset:
        start_time = time.time()
        output = model.generate(example["input"])
        latency = time.time() - start_time

        results["accuracy"].append(
            calculate_accuracy(output, example["output"])
        )
        results["perplexity"].append(
            calculate_perplexity(output)
        )
        results["latency"].append(latency)

    return {k: np.mean(v) for k, v in results.items()}
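
The calculate_accuracy and calculate_perplexity helpers above are project-specific. As a reference point, perplexity is often computed from the model's own loss on a piece of text; a minimal sketch for Hugging Face causal language models (note the different signature: it needs the model and tokenizer in scope):

import math
import torch

def calculate_perplexity_from_loss(model, tokenizer, text: str) -> float:
    # Perplexity is exp(mean cross-entropy loss) of the model on the given text
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return math.exp(loss.item())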

2. Advanced Evaluation

import mlflow

def comprehensive_evaluation(model, test_dataset):
    # ModelEvaluator is assumed to be a project-provided evaluation helper
    evaluator = ModelEvaluator(
        metrics=[
            "accuracy",
            "perplexity",
            "rouge",
            "bleu",
            "bertscore"
        ],
        tests=[
            "robustness",
            "bias",
            "toxicity"
        ]
    )

    results = evaluator.evaluate(
        model=model,
        dataset=test_dataset
    )

    # Log to MLflow
    with mlflow.start_run():
        mlflow.log_metrics(results)

    return results

Model Deployment

1. Export Model

import json

def export_model(model, config):
    # Save model artifacts
    model.save_pretrained("./export")

    # Create model card
    model_card = {
        "name": config.model_name,
        "version": config.version,
        "architecture": config.architecture,
        "training_data": config.dataset_info,
        "metrics": config.evaluation_results,
        "parameters": config.model_parameters
    }

    with open("./export/model_card.json", "w") as f:
        json.dump(model_card, f)

2. Deploy to Ollama

# Create an Ollama model from the exported weights using a Modelfile
ollama create custom-model -f ./export/Modelfile

# Test deployment
ollama run custom-model "Test prompt"
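
The Modelfile referenced above might look like the following, assuming the exported weights have been converted to GGUF (for example with llama.cpp's conversion tooling); the path and parameters are illustrative:

# ./export/Modelfile
FROM ./model.gguf
PARAMETER temperature 0.7
SYSTEM """You are a helpful assistant fine-tuned on the lab's dataset."""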

Monitoring and Optimization

1. Training Monitoring

from langfuse import Langfuse

langfuse = Langfuse()

def monitor_training(trace_id: str, current_loss: float, current_accuracy: float):
    # Record each training metric as a numeric score attached to a Langfuse trace
    metrics = {
        "loss": current_loss,
        "accuracy": current_accuracy,
        "gpu_utilization": get_gpu_usage(),
        "memory_usage": get_memory_usage()
    }
    for name, value in metrics.items():
        langfuse.score(trace_id=trace_id, name=name, value=value)

2. Performance Optimization

def optimize_model(model, config):
    # Quantization
    if config.quantization.enabled:
        model = quantize_model(
            model,
            bits=config.quantization.bits
        )

    # Pruning
    if config.pruning.enabled:
        model = prune_model(
            model,
            target_sparsity=config.pruning.target_sparsity
        )

    return model
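
The quantize_model and prune_model helpers above are project-specific. For the 4-bit nf4 settings in the advanced configuration, one common approach is to load the model quantized with bitsandbytes through transformers; a sketch (the model name is a placeholder):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization matching the optimization.quantization section above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # placeholder base model
    quantization_config=bnb_config,
)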

Best Practices

1. Data Quality

  • Clean and validate data
  • Balance dataset
  • Remove duplicates
  • Handle missing values

2. Training Process

  • Start with small datasets
  • Monitor resource usage
  • Use checkpointing
  • Implement early stopping (both are sketched below)
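
A minimal sketch of checkpointing and early stopping with the Hugging Face Trainer used earlier (model and dataset are as in the training section; values are illustrative):

from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    save_strategy="epoch",              # write a checkpoint every epoch
    evaluation_strategy="epoch",        # "eval_strategy" in newer transformers releases
    load_best_model_at_end=True,        # required for early stopping
    metric_for_best_model="eval_loss",
    save_total_limit=2,                 # keep only the latest checkpoints
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)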

3. Model Evaluation

  • Use multiple metrics
  • Test edge cases
  • Validate outputs
  • Monitor performance

Support

For fine-tuning assistance: