What text-to-speech engines are available in On Device AI?

On Device AI offers three on-device TTS engines: Apple Voices (lightweight, many languages), Kokoro TTS (highly natural neural speech, ~80 MB download), and PocketTTS (streaming neural speech for English with ~80 ms latency, Pro required).

Can I adjust the speed and tone of the speech?

Yes, you can adjust the speech speed from 0.5x to 2.0x for all engines. With PocketTTS, you can also adjust the Temperature (0.0–1.0) to control expressiveness and toggle De-essing to reduce harsh sibilant sounds.

Text-to-Speech

On Device AI can read text aloud using three on-device engines: Apple's built-in voices, Kokoro neural TTS, and PocketTTS for low-latency streaming speech. Pick the one that fits your needs in Settings → Voice.

On this page

TTS Engines
PocketTTS settings
Using TTS
Generate & Save
Save Text to a Knowledge Library
Speech Speed
Auto-Play Responses

TTS Engines

Three text-to-speech engines are available:

Apple Voices: The system's built-in speech synthesis. Dozens of voices, many languages, minimal resource use. No download needed.
Kokoro TTS: Neural TTS that produces natural, human-like speech in multiple languages. Runs on-device. Requires a one-time model download (~80 MB).
PocketTTS: Streaming neural TTS, English only. Audio begins playing within roughly 80 ms, before the full sentence finishes generating. Requires a Pro subscription and a one-time model download.

Which engine to pick

Apple voices are lightweight and cover the most languages. Kokoro sounds more natural and still supports multiple languages. PocketTTS has the lowest playback latency but only works with English text.
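
The trade-offs above can be read as a small decision rule. The following Python sketch is illustrative only and does not describe how the app chooses an engine internally. The names `Engine`, `pick_engine`, and `kokoro_languages` are hypothetical, and the set of languages Kokoro supports is passed in by the caller because this page does not list them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    english_only: bool
    requires_pro: bool
    needs_download: bool


# Properties taken from the engine descriptions on this page.
APPLE = Engine("Apple Voices", english_only=False, requires_pro=False, needs_download=False)
KOKORO = Engine("Kokoro TTS", english_only=False, requires_pro=False, needs_download=True)
POCKET = Engine("PocketTTS", english_only=True, requires_pro=True, needs_download=True)


def pick_engine(language, has_pro, want_low_latency, kokoro_languages):
    """Suggest an engine for a language tag such as "en" or "en-US".

    kokoro_languages is a caller-supplied set of primary language codes
    (an assumption: the page only says "multiple languages").
    """
    primary = language.lower().split("-")[0]
    # PocketTTS: lowest latency, but English only and Pro only.
    if want_low_latency and primary == "en" and has_pro:
        return POCKET
    # Kokoro: more natural speech when it covers the language.
    if primary in kokoro_languages:
        return KOKORO
    # Apple Voices: lightweight and covers the most languages.
    return APPLE
```

For example, `pick_engine("en-US", has_pro=True, want_low_latency=True, kokoro_languages={"en"})` suggests PocketTTS, while the same call without a Pro subscription falls through to Kokoro.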

PocketTTS settings

When PocketTTS is the active engine, two extra controls appear in Settings → Voice:

Temperature (0.0–1.0, default 0.7): Lower values produce more consistent speech. Higher values add more expressiveness and variation.
De-essing (on by default): Reduces harsh sibilant ("s" and "sh") sounds in the output.

PocketTTS supports English only. Non-English text will still synthesize on a best-effort basis, but the results are unpredictable. For other languages, switch to Kokoro or Apple.
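
As a rough model of these two controls, the sketch below stores them with the defaults stated above and keeps Temperature inside the 0.0–1.0 range. The class name `PocketTTSSettings` is hypothetical, and clamping out-of-range values is an assumption made for illustration; the app's slider may simply prevent such values from being entered.

```python
from dataclasses import dataclass

TEMPERATURE_MIN = 0.0
TEMPERATURE_MAX = 1.0


@dataclass
class PocketTTSSettings:
    temperature: float = 0.7  # default from this page
    de_essing: bool = True    # on by default

    def __post_init__(self):
        # Assumption: out-of-range values are clamped rather than rejected.
        value = float(self.temperature)
        self.temperature = min(max(value, TEMPERATURE_MIN), TEMPERATURE_MAX)


consistent = PocketTTSSettings(temperature=0.2)   # steadier, less varied delivery
expressive = PocketTTSSettings(temperature=0.95)  # more variation between readings
```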

Using TTS

There are several ways to use text-to-speech:

TTS Tab: Navigate to the Text to Speech tab, paste or type text, and tap the play button
AI Chat responses: Tap the speaker icon on any AI response to have it read aloud
Message Actions: Select "Send to TTS" from a message's context menu (iOS) or the "More Actions" button (macOS) to send it directly to the TTS tab
Auto-play: Enable auto-play to have every AI response spoken automatically

Generate & Save

From the Text to Speech tab, you can export speech to a WAV file. Tap "Generate & Save" to write the audio to your device's recordings folder with a metadata sidecar (text, engine, voice, speed). Cancelling mid-export removes any partial files automatically.
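
To make the export behavior concrete, here is a minimal Python sketch of the same pattern: write a WAV file, write a metadata sidecar next to it, and delete both if the export is cancelled partway through. It is not the app's implementation. The function `generate_and_save`, the `ExportCancelled` exception, the JSON sidecar format, the 24 kHz mono 16-bit audio format, and the file naming are all assumptions; only the four metadata fields (text, engine, voice, speed) come from this page.

```python
import json
import struct
import wave
from pathlib import Path


class ExportCancelled(Exception):
    """Raised when the caller cancels an export in progress (hypothetical)."""


def generate_and_save(samples, out_dir, stem, metadata,
                      sample_rate=24000, should_cancel=lambda: False):
    """Write float samples in [-1, 1] to <stem>.wav plus a <stem>.json sidecar."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    wav_path = out_dir / f"{stem}.wav"
    meta_path = out_dir / f"{stem}.json"
    try:
        with wave.open(str(wav_path), "wb") as wav:
            wav.setnchannels(1)   # mono (assumed)
            wav.setsampwidth(2)   # 16-bit PCM (assumed)
            wav.setframerate(sample_rate)
            for sample in samples:
                if should_cancel():
                    raise ExportCancelled()
                clipped = max(-1.0, min(1.0, sample))
                wav.writeframesraw(struct.pack("<h", int(clipped * 32767)))
        meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    except BaseException:
        # Remove partial output so a cancelled export leaves nothing behind.
        wav_path.unlink(missing_ok=True)
        meta_path.unlink(missing_ok=True)
        raise
    return wav_path, meta_path
```

A caller would pass something like `{"text": "...", "engine": "Kokoro TTS", "voice": "...", "speed": 1.0}` as `metadata` and a `should_cancel` callback wired to a Cancel button.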

Save Text to a Knowledge Library

If generated or pasted text should become reusable project context, import it into a Knowledge Library from TTS history or from the library's Add History flow. The text becomes an editable document in the destination library. Exported audio stays in Text-to-Speech history; only the text is copied into the library.

Speech Speed

Adjust the speech speed in Settings → Voice:

Range: 0.5x (slow) to 2.0x (fast)
Default: 1.0x (normal speed)
Test button: Preview the current speed setting before applying
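
As a quick way to reason about the range, the sketch below estimates how long a clip takes to play at a given setting. It assumes the speed value acts as a plain rate multiplier (2.0x halves the playing time), which this page does not state explicitly. The function name `playback_seconds` is hypothetical.

```python
MIN_SPEED = 0.5
MAX_SPEED = 2.0
DEFAULT_SPEED = 1.0


def playback_seconds(seconds_at_normal_speed, speed=DEFAULT_SPEED):
    """Estimate playing time, assuming speed is a simple rate multiplier."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED}x and {MAX_SPEED}x")
    return seconds_at_normal_speed / speed
```

Under that assumption, a response that takes 60 seconds at 1.0x would take about 120 seconds at 0.5x and about 30 seconds at 2.0x.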

Auto-Play Responses

When enabled, AI responses in chat are automatically converted to speech and played back. This creates a hands-free conversational experience, especially useful when combined with voice input.

Toggle auto-play in Settings → Voice → "Auto-play voice responses".