1. Choose Your Inference Engine (GGUF vs. MLX)
On Device AI supports the two major local inference engines on Apple platforms, so you can choose the one best suited to your hardware:
- GGUF (via llama.cpp): Offers broad model compatibility and runs on modern iOS, iPadOS, macOS, and visionOS devices. A good default for general open-weight models.
- MLX (Apple Silicon native): Apple's machine learning framework, designed specifically for Apple Silicon. MLX takes advantage of unified memory, which lets the CPU and GPU share model data without copying, for efficient memory use and fast inference on Apple Silicon Macs.
2. Choose a Model Based on Device Memory
Because local inference is limited by physical RAM (Unified Memory on Apple Silicon), On Device AI helps you choose models that are compatible with your hardware configuration:
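- For iPhones/iPads (6GB - 8GB RAM): Select compact, quantized models such as the DeepSeek-R1 1.5B distill, Qwen 2.5 1.5B/3B, or Gemma 2 2B. These are small enough to stay within mobile memory budgets, so the OS is unlikely to terminate the app under memory pressure.
- For iPads/Macs (8GB - 16GB RAM): Run stronger reasoning models such as Llama 3 8B or Qwen 2.5 7B. Phi-4 14B is better suited to the 16GB end of this range, since its weights alone take about 7GB at 4-bit quantization.
- For Pro Macs (24GB - 128GB Unified Memory): Run large reasoning models of 32B parameters, or up to 70B on higher-memory configurations, entirely offline. A 70B model quantized to 4 bits needs roughly 35GB or more for its weights alone, so it requires a machine well above the 24GB tier.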
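The memory tiers above follow from simple arithmetic: a model's weights take roughly (parameter count x bits per weight / 8) bytes, plus some working memory. The sketch below illustrates that estimate. It is not how On Device AI makes its recommendations; the 4.5 bits per weight, the 1GB overhead allowance, and the 60% usable-memory share are all assumed values chosen for illustration.

```python
# Rough memory estimate for a quantized model.
# All default constants below are illustrative assumptions, not measured values.


def estimate_model_gb(params_billion: float,
                      bits_per_weight: float = 4.5,
                      overhead_gb: float = 1.0) -> float:
    """Approximate memory needed: quantized weights plus a flat allowance
    for the KV cache and runtime buffers (assumed, not measured)."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb


def fits(params_billion: float,
         device_ram_gb: float,
         usable_fraction: float = 0.6,
         bits_per_weight: float = 4.5) -> bool:
    """True if the estimate stays within the share of RAM that a single
    app is assumed to be able to use (usable_fraction is a guess)."""
    budget_gb = device_ram_gb * usable_fraction
    return estimate_model_gb(params_billion, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    cases = [("1.5B on 6GB", 1.5, 6), ("8B on 16GB", 8, 16), ("70B on 24GB", 70, 24)]
    for label, size_b, ram_gb in cases:
        verdict = "fits" if fits(size_b, ram_gb) else "does not fit"
        print(f"{label}: ~{estimate_model_gb(size_b):.1f} GB, {verdict}")
```

Real limits vary with context length, quantization scheme, and what else is running, so treat the result as a first-pass filter and not a guarantee.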
3. Custom Hugging Face GGUF Imports
Want a model that isn't in the built-in library? On Device AI includes a custom downloader: copy a direct download link to a GGUF file from a repository such as Hugging Face, paste it into the app's Import section, and the app downloads it. Once the download completes, your custom model is available in chat and subagent workflows.
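A "direct" link here means a URL that returns the model file itself, not the web page that describes it. On Hugging Face, file pages use a /blob/ path while the file itself is served from the matching /resolve/ path, and every GGUF file begins with the four bytes "GGUF". The sketch below shows one way to check both. The function names are hypothetical, the repository in the usage note is a made-up placeholder, and this is not the app's actual import logic.

```python
from urllib.parse import urlparse

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def to_direct_gguf_url(url: str) -> str:
    """Return a direct-download URL for a GGUF file, or raise ValueError.

    Hypothetical helper: rewrites a Hugging Face /blob/ page link to the
    /resolve/ link that serves the file itself.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.path.lower().endswith(".gguf"):
        raise ValueError("expected an https link ending in .gguf")
    if parsed.netloc == "huggingface.co" and "/blob/" in parsed.path:
        # /blob/ is an HTML viewer page; /resolve/ returns the raw file.
        parsed = parsed._replace(path=parsed.path.replace("/blob/", "/resolve/", 1))
    return parsed.geturl()


def looks_like_gguf(first_bytes: bytes) -> bool:
    """Check the magic bytes at the start of a downloaded file."""
    return first_bytes[:4] == GGUF_MAGIC
```

For example, passing the placeholder link https://huggingface.co/example-org/example-model-GGUF/blob/main/model.gguf returns the same URL with /blob/ replaced by /resolve/. Checking the first four bytes after download catches the common mistake of saving an HTML page in place of the model.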
4. Private, Performant, and Pure Native
On Device AI avoids Electron and web-view wrappers: it is written natively in SwiftUI and uses the platform's native hardware acceleration. Because models execute locally on your device's GPU and Neural Engine, no text, conversations, or files ever leave your device.