Common Issues and Solutions

This guide covers frequently encountered issues in the Local AI Cyber Lab environment and their solutions.

AI Service Issues

Model Loading Failures

Symptoms

  • Models fail to load
  • Slow model initialization
  • Out of memory errors

Solutions

  1. Check GPU Memory Usage:

    nvidia-smi

     • Ensure sufficient GPU memory is available
     • Consider using smaller models or enabling model offloading

  2. Verify Model Files:

    ls -l models/
    sha256sum models/your-model.bin

     • Confirm model files are complete and uncorrupted
     • Compare checksums with original sources

  3. Check File Permissions:

    chmod 644 models/*
    chown -R user:group models/
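
The checksum comparison under Verify Model Files can be scripted. The following is a minimal sketch, assuming sha256sum (GNU coreutils) is on the PATH. The function name verify_checksum is illustrative and not part of the lab's tooling, and the expected digest has to come from the model's original source.

```shell
#!/usr/bin/env bash
# Compare a file's SHA-256 digest against an expected value.
# Usage: verify_checksum <file> <expected-sha256>
# (verify_checksum is a hypothetical helper, not shipped with the lab.)
verify_checksum() {
  local file="${1:-}" expected actual
  # Normalize the expected digest to lowercase, as sha256sum prints lowercase hex.
  expected="$(printf '%s' "${2:-}" | tr 'A-F' 'a-f')"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 2
  fi
  # sha256sum prints "<digest>  <path>"; keep only the digest.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

The function returns non-zero on a mismatch or a missing file, so it can gate a startup script before a corrupted model is loaded.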
    

API Connection Issues

Symptoms

  • API timeouts
  • Authentication failures
  • Connection refused errors

Solutions

  1. Verify Service Status:

    docker-compose ps
    docker logs ai-guardian
    

  2. Check API Keys:

     • Verify key format and expiration
     • Ensure proper environment variable setup
     • Check rate limits

  3. Network Connectivity:

    curl -v http://localhost:8000/health
    docker network ls
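
A connection refused error shortly after startup can simply mean the service is not ready yet. One way to tell a slow start from a real failure is to retry the health check a few times. This is a sketch only: the retry helper is hypothetical, the endpoint is the one from the curl command above, and the attempt count and delay are arbitrary choices.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# (retry is a hypothetical helper, not part of the lab's scripts.)
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: -f makes curl fail on HTTP error codes, --max-time caps each try.
# retry 5 2 curl -fsS --max-time 5 http://localhost:8000/health
```

If every attempt fails, the service logs are the next place to look.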
    

Security Component Issues

AI Guardian Service

Symptoms

  • Failed security checks
  • Blocked legitimate requests
  • High latency in security validation

Solutions

  1. Review Security Logs:

    tail -f logs/ai-guardian.log
    

  2. Adjust Security Rules:

     • Review and update validation rules
     • Check for false positives
     • Tune rate limiting settings

  3. Monitor Resource Usage:

    docker stats ai-guardian
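
To gauge how often requests are being rejected, which helps when checking for false positives, the log can be searched for rejection keywords. This is a rough sketch: the log path comes from the tail command above, but the keywords (blocked, denied) are assumptions about the log wording, not a documented AI Guardian format, and should be adjusted to match real entries.

```shell
#!/usr/bin/env bash
# Count log lines matching a case-insensitive extended regex.
# Usage: count_log_matches <pattern> <logfile>
# (count_log_matches is a hypothetical helper.)
count_log_matches() {
  local pattern="$1" logfile="$2"
  if [ ! -r "$logfile" ]; then
    echo "cannot read: $logfile" >&2
    return 2
  fi
  # grep -c prints the count but exits 1 when it is zero; treat that as success.
  grep -ciE -- "$pattern" "$logfile" || true
}

# Example (the pattern is a guess at the log wording):
# count_log_matches 'blocked|denied' logs/ai-guardian.log
```

Comparing the count before and after a rule change gives a quick, if coarse, signal of whether the change reduced rejections.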
    

Database Connection Issues

Symptoms

  • Failed database operations
  • Connection timeouts
  • Data consistency errors

Solutions

  1. Check Database Status:

    docker-compose ps supabase-db
    docker logs supabase-db
    

  2. Verify Connection Settings:

     • Check database URL and credentials
     • Verify network connectivity
     • Review connection pool settings
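
Before digging into credentials or pool settings, it can help to ask Postgres itself whether it is accepting connections. A minimal sketch, assuming the supabase-db service runs a Postgres image that includes pg_isready and that the database user is postgres. Both defaults, the function name, and the COMPOSE override variable are assumptions.

```shell
#!/usr/bin/env bash
# Ask Postgres inside the container whether it accepts connections.
# Usage: check_db_ready [service] [db-user]
# COMPOSE can be set to "docker compose" on installations using Compose v2.
# (check_db_ready and COMPOSE are hypothetical names.)
check_db_ready() {
  local compose="${COMPOSE:-docker-compose}"
  local service="${1:-supabase-db}" user="${2:-postgres}"
  # -T disables TTY allocation so the call also works from scripts and cron.
  # $compose is intentionally unquoted so "docker compose" splits into two words.
  $compose exec -T "$service" pg_isready -U "$user"
}
```

pg_isready exits with 0 when the server accepts connections, so the function's exit status can be used directly in an if statement.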

Performance Issues

Slow Response Times

Symptoms

  • High latency in API responses
  • Slow model inference
  • System resource exhaustion

Solutions

  1. Monitor System Resources:

    htop
    nvidia-smi -l 1
    

  2. Optimize Configuration:

     • Adjust worker counts
     • Enable caching
     • Configure model optimization settings

  3. Check Logging Levels:

     • Reduce debug logging in production
     • Configure log rotation
     • Monitor log file sizes
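
Monitoring log file sizes can be as simple as listing the files that exceed a threshold. The sketch below assumes logs live in a logs/ directory with a .log extension, as the ai-guardian log path elsewhere in this guide suggests. The function name and the 100 MiB default are arbitrary.

```shell
#!/usr/bin/env bash
# List .log files above a size threshold, largest disk usage first.
# Usage: large_logs [dir] [min-size-in-MiB]
# (large_logs is a hypothetical helper.)
large_logs() {
  local dir="${1:-logs}" min_mib="${2:-100}"
  # find rounds sizes up to whole MiB units before comparing with the threshold.
  # du -k prints usage in KiB; sort -rn puts the largest file first.
  find "$dir" -type f -name '*.log' -size +"${min_mib}"M -exec du -k {} + | sort -rn
}
```

An empty result means no log file is over the threshold. Files that keep reappearing in the output are candidates for log rotation or a lower log level.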

Memory Management

Symptoms

  • Out of memory errors
  • System slowdown
  • Container restarts

Solutions

  1. Monitor Memory Usage:

    docker stats
    free -h
    

  2. Adjust Resource Limits:

     • Update container memory limits
     • Configure swap space
     • Implement memory optimization strategies
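
When a container restarts unexpectedly, it is worth confirming whether the kernel killed it for exceeding its memory limit before changing any limits. A small sketch using docker inspect follows. The function name and the DOCKER override variable are assumptions, and the container name is whatever docker ps reports.

```shell
#!/usr/bin/env bash
# Report whether a container was OOM-killed and what its memory limit is.
# A limit_bytes value of 0 means no memory limit is set.
# Usage: container_memory_state <container-name>
# (container_memory_state and DOCKER are hypothetical names.)
container_memory_state() {
  local docker_cmd="${DOCKER:-docker}" name="${1:-}"
  if [ -z "$name" ]; then
    echo "usage: container_memory_state <container-name>" >&2
    return 2
  fi
  $docker_cmd inspect \
    --format 'oom_killed={{.State.OOMKilled}} limit_bytes={{.HostConfig.Memory}}' \
    "$name"
}
```

If oom_killed is true, raising the limit or reducing the workload is the likely fix. If it is false, the restart has another cause and the container logs are the better lead.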

Integration Issues

Service Communication

Symptoms

  • Inter-service timeouts
  • Failed service discovery
  • Network connectivity issues

Solutions

  1. Check Docker Network:

    docker network inspect local-ai-cyber-lab_default
    

  2. Verify Service Discovery:

     • Check DNS resolution
     • Verify service names and ports
     • Review network policies

  3. Test Connectivity:

    docker-compose exec service-name ping other-service
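
Many minimal images do not ship ping, so a failed ping does not always indicate a network problem. As an alternative sketch, DNS resolution can be checked with getent, assuming the source container's image provides it (glibc-based images generally do). service-name and other-service are the same placeholders used above. The function name and the COMPOSE override are assumptions.

```shell
#!/usr/bin/env bash
# Resolve one service's name from inside another service's container.
# Usage: resolve_from <from-service> <target-service>
# COMPOSE can be set to "docker compose" on installations using Compose v2.
# (resolve_from and COMPOSE are hypothetical names.)
resolve_from() {
  local compose="${COMPOSE:-docker-compose}" from="${1:-}" target="${2:-}"
  if [ -z "$from" ] || [ -z "$target" ]; then
    echo "usage: resolve_from <from-service> <target-service>" >&2
    return 2
  fi
  # getent prints "<ip> <name>" and exits non-zero if the name does not resolve.
  $compose exec -T "$from" getent hosts "$target"
}
```

A non-zero exit points at service discovery (a wrong service name or a container on a different network) rather than at the target service itself.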
    

Recovery Procedures

System Recovery

  1. Backup Current State:

    ./scripts/backup.sh
    

  2. Stop Services:

    docker-compose down
    

  3. Clear Problematic State (removes stopped containers, unused networks, dangling images, and unused build cache):

    docker system prune
    

  4. Restore from Backup:

    ./scripts/restore.sh backup_file
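
The four recovery steps can be chained so that a failure at any step stops the sequence, which avoids pruning state when the backup did not succeed. The sketch below keeps the order given above and adds a dry-run mode for reviewing the commands first. The run and recover function names and the DRY_RUN variable are assumptions, not part of the lab's scripts.

```shell
#!/usr/bin/env bash
# Print each command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run the recovery steps in order, stopping at the first failure.
# Usage: recover <backup_file>
# (run, recover, and DRY_RUN are hypothetical names.)
recover() {
  local backup_file="${1:-}"
  if [ -z "$backup_file" ]; then
    echo "usage: recover <backup_file>" >&2
    return 2
  fi
  run ./scripts/backup.sh &&
    run docker-compose down &&
    run docker system prune &&
    run ./scripts/restore.sh "$backup_file"
}
```

Calling it with DRY_RUN=1 set prints the four commands without executing anything, which is a cheap check before running it for real.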
    

Emergency Procedures

  1. Quick Service Restart:

    docker-compose restart service-name
    

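  2. Force Clean Restart (warning: the -v flag also deletes the project's volumes, so any data stored in them, which may include database contents, is lost; back up first):

    docker-compose down -v
    docker-compose up -d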
    

  3. Reset to Known Good State:

    git checkout main
    docker-compose pull
    docker-compose up -d
    

Getting Help

If you continue to experience issues:

  1. Check the GitHub Issues
  2. Join our Discord Community
  3. Review the Documentation
  4. Contact Support