Cloud Providers
While On Device AI is designed for 100% offline operation, you can optionally connect to cloud AI providers when you need access to larger models or additional capabilities. Cloud is always opt-in and off by default.
Supported Providers
On Device AI supports the following cloud and local server providers:
- OpenAI: GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo, and more
- Anthropic: Claude 3 Opus, Sonnet, Haiku, and Claude 4 models
- Google Gemini: Gemini Pro, Gemini Flash, and other models
- Mistral: Access Mistral models via OpenAI-compatible chat and models endpoints
- Groq: Ultra-fast inference for Llama, Mixtral, and Gemma models
- xAI: Access to the powerful Grok family of models
- OpenRouter: Access to hundreds of models through a single API
- Nvidia: Access Nvidia's NIM microservices and high-performance models
- AWS Bedrock: Access Amazon Bedrock foundation models via SigV4-authenticated requests
- Z.ai (Zhipu GLM): Access GLM-4, GLM-5 and other Zhipu AI models
- Opencode Zen: Access multiple frontier models (GPT, Claude, Gemini) through a unified API
- Qwen Portal: Access Alibaba's Qwen models via API key or OAuth
- Kimi (Moonshot AI): Access Moonshot's Kimi long-context models
- LM Studio: Connect to locally-hosted models running on your Mac or PC
- Ollama: Connect to locally-hosted Ollama server
Setting Up a Provider
- Open Settings → Cloud Providers
Navigate to the Cloud Providers section in app settings.
- Select a provider
Choose the provider you want to connect to.
- Enter your credentials
For most providers, paste your API key from the provider's dashboard. For AWS Bedrock, enter your Access Key ID and Secret Access Key. For Qwen Portal, you can also use an OAuth refresh token. All credentials are stored securely in your device's Keychain — never in plain text.
- Select a model
Browse available models from the provider and select the one you want to use. For providers without automatic model listing (Bedrock, Kimi), you can enter a model ID manually.
When using cloud providers, your conversation data is transmitted to the provider's servers. The app does not control how providers handle your data. Review each provider's privacy policy before use.
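For most hosted providers, the API key you paste is used as a standard Bearer token on an OpenAI-compatible chat completions endpoint. As a hedged illustration of what such a request looks like under the hood (the base URL, key, and model name below are placeholders, and this is a sketch, not the app's actual implementation):

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build an OpenAI-compatible chat completions request (constructed only, not sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The app reads the key from the Keychain; "sk-..." is a placeholder.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("https://api.openai.com/v1", "sk-...", "gpt-4o", "Hello")
```

The same shape works for any of the OpenAI-compatible providers above (Mistral, Groq, OpenRouter, and others); only the base URL and model name change.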
AWS Bedrock
AWS Bedrock requires AWS credentials rather than a simple API key:
- Set your AWS Region
Enter the AWS region where Bedrock is enabled (e.g. us-east-1, us-west-2). This is saved in your app configuration, not in the Keychain.
- Enter AWS Credentials
Tap Enter Credentials and provide your AWS Access Key ID and Secret Access Key. An optional session token is supported for temporary credentials (AWS STS).
- Enter a Bedrock Model ID
Use the manual model entry field to type the Bedrock model ID, e.g. anthropic.claude-3-sonnet-20240229-v1:0 or amazon.titan-text-premier-v1:0.
Requests to Bedrock are authenticated using AWS Signature Version 4 (SigV4) — signed directly on your device. No credentials are ever transmitted to any proxy server.
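The core of SigV4 is a chained HMAC key derivation over your secret key, the date, the region, and the service, followed by a final HMAC over a canonical "string to sign". A simplified sketch of that derivation step (the full protocol also canonicalizes the request per the AWS specification; this is illustrative, not the app's code):

```python
import hashlib
import hmac

def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key via the HMAC chain: date -> region -> service."""
    def _hmac(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")

def sigv4_signature(signing_key: bytes, string_to_sign: str) -> str:
    """Final hex signature that goes into the request's Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Because the derivation uses only your secret key and public request metadata, it can run entirely on-device, which is why no proxy is needed.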
Qwen Portal
Qwen Portal supports two authentication modes:
- API Key: Use a standard API key sent in the Authorization: Bearer header.
- OAuth Refresh Token: Provide an OAuth refresh token. The app automatically exchanges it for a short-lived access token before each request, without requiring you to manually refresh.
Select your preferred authentication mode in Settings → Cloud Providers → Qwen Portal before entering your credentials.
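The refresh-token mode follows the standard OAuth 2.0 refresh_token grant: the stored refresh token is posted to a token endpoint and a short-lived access token comes back. A hedged sketch of that exchange request (the token URL below is a placeholder, not Qwen Portal's documented endpoint):

```python
import urllib.parse
import urllib.request

def build_refresh_request(token_url: str, refresh_token: str) -> urllib.request.Request:
    """Build a standard OAuth 2.0 refresh_token grant request (constructed only, not sent)."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    }).encode("utf-8")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

# Placeholder endpoint and token for illustration only.
req = build_refresh_request("https://example.com/oauth/token", "rt-123")
```

The access token returned by the real endpoint is then used as the Bearer token on subsequent chat requests.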
Switching Between Local & Cloud
You can switch between local and cloud models at any time, even within the same conversation:
- Open the model picker
- Cloud models appear alongside local models, clearly labeled with the provider name
- Select any model to switch — the conversation context is maintained
When you switch from a local model to a cloud model mid-conversation, your conversation history is sent to the cloud provider. Consider starting a new conversation if you have sensitive content.
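Because the context is maintained, switching models effectively replays the same message history against the newly selected backend. A simplified sketch of why the warning above matters (model names are illustrative):

```python
def payload_for(model: str, history: list) -> dict:
    """Same conversation history, different target model/provider."""
    return {"model": model, "messages": history}

history = [
    {"role": "user", "content": "Summarize this note."},
    {"role": "assistant", "content": "Here is a summary."},
]

local = payload_for("llama-3.2-3b", history)  # stays on device
cloud = payload_for("gpt-4o", history)        # entire history leaves the device
```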
Privacy Considerations
On Device AI is designed with privacy first:
- Cloud is always opt-in: No data is ever sent to any server unless you explicitly configure and select a cloud provider
- API keys stored in Keychain: Your credentials are stored using Apple's secure Keychain, not in UserDefaults or plain text
- Direct connection: Data goes directly from your device to the provider — we don't proxy or store anything
- Clear indicators: The UI clearly shows when you're using a cloud model vs. a local one
Local Servers (Ollama & LM Studio)
Ollama and LM Studio are special cases — they run AI models on your own hardware (Mac, PC, or server) rather than in the cloud. This gives you the power of larger models while maintaining privacy:
- Ollama: Set the server URL (default: http://localhost:11434) in Settings
- LM Studio: Set the server URL (default: http://localhost:1234) in Settings
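As a concrete illustration, a chat request to a local Ollama server goes to its /api/chat endpoint on port 11434; the app handles this for you, and the model name below is illustrative:

```python
import json
import urllib.request

def ollama_chat_request(prompt: str, model: str = "llama3.2",
                        base_url: str = "http://localhost:11434"):
    """Build a request against Ollama's /api/chat endpoint (constructed only, not sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response instead of a stream
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = ollama_chat_request("Hello")
```

LM Studio instead exposes an OpenAI-compatible endpoint on port 1234, so requests to it look like the Bearer-token example shown earlier, just pointed at http://localhost:1234/v1.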
Your data stays on your local network when using these providers. This is a great option for running larger models on a powerful Mac while chatting from your iPhone.
On Device AI can also serve as a remote inference server itself (macOS). Connect your iPhone to your Mac running On Device AI for the best of both worlds.