Local AI Processing

Run a Local LLM on iPhone, iPad and Mac

On Device AI turns Apple hardware into an advanced local AI workspace with 190+ model options, dual GGUF and MLX engines, and custom model imports.

Local LLM settings and GGUF parameters on macOS

1. Choose Your Inference Engine (GGUF vs. MLX)

On Device AI supports the two major local inference engines on Apple platforms, so you can pick the one that best suits your hardware:

  • GGUF (via llama.cpp): Offers broad model compatibility and runs on modern iOS, iPadOS, macOS, and visionOS devices. A good default for general open-weight models.
  • MLX (Apple Silicon native): Apple's machine learning framework, built specifically for Apple Silicon. MLX is designed around unified memory, so the CPU and GPU work on the same model data without copying it, which helps both memory use and inference speed on Apple Silicon Macs.
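The choice between the two engines can be sketched as a simple rule of thumb. The Python below is an illustration only, not On Device AI's actual selection logic: the `Device` type, the `pick_engine` function, and the `mlx_build_available` flag are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical description of the target device."""
    os: str              # e.g. "iOS", "iPadOS", "macOS", "visionOS"
    apple_silicon: bool  # True for M-series Macs


def pick_engine(device: Device, mlx_build_available: bool) -> str:
    """Suggest an engine following the guidance in the list above.

    Assumption: MLX is preferred only on Apple Silicon Macs, and only
    when the chosen model is actually published in MLX format.
    Everything else falls back to GGUF, the broadly compatible option.
    """
    if device.os == "macOS" and device.apple_silicon and mlx_build_available:
        return "MLX"
    return "GGUF"


if __name__ == "__main__":
    print(pick_engine(Device("macOS", True), mlx_build_available=True))
    print(pick_engine(Device("iOS", False), mlx_build_available=True))
```

Falling back to GGUF whenever any condition fails mirrors the text: GGUF is the option that works across all supported platforms.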

2. Choose a Model Based on Device Memory

Because local inference is limited by physical RAM (Unified Memory on Apple Silicon), On Device AI helps you choose models that fit your device:

  • For iPhones/iPads (6GB - 8GB RAM): Select compact models such as DeepSeek-R1 1.5B, Qwen 2.5 1.5B/3B, or Gemma 2 2B. These fit within mobile memory budgets without hitting the OS's per-app memory limits.
  • For iPads/Macs (8GB - 16GB RAM): Run stronger reasoning models such as Qwen 2.5 7B, Llama 3 8B, or Phi-4 14B. The 14B model realistically needs the 16GB end of this range, even when quantized.
  • For Pro Macs (24GB - 128GB Unified Memory): Run large reasoning models of up to 32B or 70B parameters locally and entirely offline. A 70B model needs the upper part of this range, even when quantized.
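The arithmetic behind these tiers can be approximated with a back-of-the-envelope estimate: weight memory is roughly parameter count times bits per weight, plus some working memory. The sketch below is a rough guide under stated assumptions, not the app's own compatibility check. The 4.5 bits per weight (a typical 4-bit quantization including scale data), the 1 GB overhead for context and runtime, and the 60% usable-RAM fraction are all assumed values, and the function names are hypothetical.

```python
def estimated_model_gb(params_billion: float,
                       bits_per_weight: float = 4.5,
                       overhead_gb: float = 1.0) -> float:
    """Rough memory estimate in GB for a quantized model.

    1 billion parameters at 8 bits per weight is about 1 GB, so the
    weights take params_billion * bits_per_weight / 8 GB. overhead_gb
    is an assumed allowance for the KV cache and runtime buffers.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion: float,
         device_ram_gb: float,
         usable_fraction: float = 0.6) -> bool:
    """Return True if the estimate fits in the assumed usable share of RAM.

    usable_fraction is an assumption: the OS and other apps need memory
    too, so only part of physical RAM is available to one app.
    """
    return estimated_model_gb(params_billion) <= device_ram_gb * usable_fraction


if __name__ == "__main__":
    for size_b, ram_gb in [(1.5, 6), (8, 16), (14, 16), (70, 24), (70, 128)]:
        print(f"{size_b}B model on {ram_gb} GB RAM: fits={fits(size_b, ram_gb)}")
```

With these assumed defaults an 8B model comes out at about 5.5 GB. Real usage varies with the quantization level and context length, so treat the result as a first filter only.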

3. Custom Hugging Face GGUF Imports

Want a model that is not in the built-in library? On Device AI includes a custom downloader: copy a direct GGUF download link from a repository such as Hugging Face, paste it into the app's Import section, and download the file inside the app. Once the download completes, your custom model is available in chat and subagent workflows.
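Before pasting a link, it helps to confirm that it points at the model file itself and not at a web page. The sketch below shows one way to check this, assuming Hugging Face's URL convention, where direct file links contain `/resolve/` and `/blob/` links open the file viewer page. It also reads the format version from a GGUF header, which starts with the four ASCII bytes `GGUF` followed by a 32-bit version number (assumed little-endian here). The function names and the example URL are hypothetical, and this is not the app's own validation code.

```python
import struct
from urllib.parse import urlparse

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def looks_like_direct_gguf_link(url: str) -> bool:
    """Heuristic check that a URL is a direct link to a .gguf file."""
    parts = urlparse(url)
    if parts.scheme != "https" or not parts.path.lower().endswith(".gguf"):
        return False
    host = parts.hostname or ""
    if host == "huggingface.co" or host.endswith(".huggingface.co"):
        # On Hugging Face, only /resolve/ links serve the raw file.
        return "/resolve/" in parts.path
    return True


def read_gguf_version(header: bytes):
    """Return the GGUF version from the first 8 bytes, or None if not GGUF."""
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version


if __name__ == "__main__":
    # Hypothetical repository and file name, for illustration only.
    url = ("https://huggingface.co/example-org/example-model-GGUF/"
           "resolve/main/example-model-Q4_K_M.gguf")
    print(looks_like_direct_gguf_link(url))
    print(looks_like_direct_gguf_link(url.replace("/resolve/", "/blob/")))
```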

4. Private, Performant, and Pure Native

On Device AI is written entirely in SwiftUI, with no Electron or web view wrapper, so it runs as a native app with direct access to hardware acceleration. Because models execute locally on your device's GPU and Neural Engine, no text, conversations, or files ever leave your device.
