What is speaker diarization in On Device AI?

Speaker diarization is a Pro Voice Notes feature that separates a transcript by speaker and labels sections as Speaker 1, Speaker 2, and so on. It helps meeting, interview, and lecture transcripts show who said what.

Does speaker labeling run on device?

Yes. Voice Notes transcription and speaker labeling are designed for on-device processing in On Device AI. Recordings and transcript text stay on the user's Apple device during local processing.

Can users tune speaker diarization behavior?

Yes. On Device AI includes advanced speaker diarization settings for users who want more control, including speaker count limits, clustering threshold, overlap tolerance, and unknown-speaker smoothing.

Which transcript export formats are available?

Voice Note transcripts can be exported as plain text (.txt), SubRip subtitles (.srt), or Markdown (.md), depending on how the user wants to use the transcript after recording.

Is speaker diarization a Pro feature?

Yes. Speaker diarization is a Pro feature. The speaker diarization model must also be downloaded and ready before speaker labeling can run.

Speaker Diarization & Meeting Transcription

Meeting notes fail when every voice looks the same

Most meeting transcripts solve only half the problem. They capture the words but flatten the room. A decision from the client and a concern from engineering land in the same wall of text, with no way to tell them apart.

That creates real work later. According to a 2024 Otter.ai workplace survey, professionals spend roughly 4.4 hours every week on post-meeting follow-up. Much of that time goes to rereading transcripts and guessing who owned each point. The job people actually want done is simple: know what was said, know who said it, and move on to what happens next.

What speaker diarization adds to Voice Notes

Speaker diarization in On Device AI does not try to guess real names. It separates the recording into speaker-labeled sections such as Speaker 1 and Speaker 2, then shows those labels in the transcript when speaker display is enabled.
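As a rough illustration of the labeling step, here is a minimal sketch that turns per-segment speaker assignments into readable sections. It assumes a diarizer has already given every transcript segment an internal cluster ID. The `Segment` type, its fields, and `label_sections` are hypothetical names chosen for this example, not the app's code. Speakers are numbered in order of first appearance, and consecutive segments from the same speaker are merged into one section.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float     # seconds from the start of the recording
    end: float       # seconds from the start of the recording
    text: str        # transcribed words for this segment
    cluster_id: int  # hypothetical internal speaker cluster from a diarizer


def label_sections(segments: list[Segment]) -> list[tuple[str, str]]:
    """Merge consecutive same-speaker segments and name speakers
    "Speaker 1", "Speaker 2", ... in order of first appearance."""
    names: dict[int, str] = {}
    sections: list[tuple[str, str]] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.cluster_id not in names:
            names[seg.cluster_id] = f"Speaker {len(names) + 1}"
        label = names[seg.cluster_id]
        if sections and sections[-1][0] == label:
            # Same speaker kept talking: extend the current section.
            sections[-1] = (label, sections[-1][1] + " " + seg.text)
        else:
            sections.append((label, seg.text))
    return sections


segs = [
    Segment(0.0, 2.1, "Let's start.", 7),
    Segment(2.1, 4.0, "Agenda first.", 7),
    Segment(4.0, 6.5, "I have a question.", 3),
]
for label, text in label_sections(segs):
    print(f"{label}: {text}")
```

The internal cluster IDs (7 and 3 here) carry no meaning for the reader, which is why the sketch renumbers them as generic labels rather than trying to name anyone.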

The feature works inside Voice Notes. Record a meeting or import audio, run transcription, and use speaker labeling to make the transcript easier to scan. Existing recordings with saved diarization data reopen with those labels available, so you do not have to process the same note again just to see who was speaking.

For Pro users with the diarization model downloaded and ready, speaker labeling can be enabled in the creation and re-transcription flows. If the model is missing, the app asks you to download the speaker diarization models first.

The controls are practical, not decorative

Meeting audio is rarely clean. People overlap. Someone speaks from across the room. A short "yeah" can be hard to assign to the right person. On Device AI exposes advanced speaker diarization settings for users who need finer control over those edge cases.

Available settings include speaker count limits, clustering threshold, overlap tolerance, and smoothing for short runs of unrecognized speaker segments. A reset option returns every setting to its default. Most people will not need to touch these. Teams that record panel interviews, client workshops, or noisy conference rooms will probably appreciate having them.
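To make these settings less abstract, the sketch below shows how a clustering threshold, a speaker count limit, and short-run smoothing can interact in a generic diarization pipeline. It is an illustration under stated assumptions, not a description of how On Device AI implements them: it assumes every speech segment already has a speaker embedding vector, uses simple greedy cosine-similarity clustering, and marks segments it cannot place as unknown. The function names, the `UNKNOWN` marker, and the default values are hypothetical, and overlap tolerance is not modeled.

```python
import math

UNKNOWN = -1  # hypothetical marker for segments no speaker cluster accepted


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_segments(embeddings, threshold=0.7, max_speakers=4):
    """Greedy clustering (illustrative only): join the most similar existing
    speaker if similarity clears `threshold`; otherwise open a new speaker
    until `max_speakers` is reached, after which the segment is UNKNOWN."""
    centroids, counts, labels = [], [], []
    for emb in embeddings:
        best, best_sim = None, -1.0
        for i, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            # Update the running mean embedding for the matched speaker.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] += 1
            labels.append(best)
        elif len(centroids) < max_speakers:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            labels.append(UNKNOWN)
    return labels


def smooth_unknown(labels, max_run=2):
    """Relabel a run of UNKNOWN segments no longer than `max_run`
    when the speakers on both sides of the run agree."""
    out = list(labels)
    i = 0
    while i < len(out):
        if out[i] != UNKNOWN:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] == UNKNOWN:
            j += 1
        # out[i:j] is a maximal run of UNKNOWN segments.
        left = out[i - 1] if i > 0 else None
        right = out[j] if j < len(out) else None
        if j - i <= max_run and left is not None and left == right:
            out[i:j] = [left] * (j - i)
        i = j
    return out
```

Raising `threshold` makes the clusterer stricter, so similar voices are more likely to be split into separate speakers; lowering it merges them. The smoothing step only fills a short unknown run when both neighbors agree, which avoids guessing across a change of speaker.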

Speaker context carries into the rest of the workflow

A transcript is usually not the final destination. After a meeting, you might want a summary, action items, grammar cleanup, or a custom prompt in AI Chat.

On Device AI keeps speaker context available for Voice Note conversation actions when speaker labeling is enabled and diarization data exists. That matters because "Speaker 2 raised a concern about the timeline" is more useful to a summarizer than "someone raised a concern." The model gets better context, and you spend less time editing the output afterward.
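A minimal sketch of why the labels help downstream: when the transcript handed to a summarizer keeps a speaker prefix on every line, the model can attribute points instead of seeing anonymous text. The function name `build_summary_prompt`, the `(label, text)` input shape, and the default task wording are hypothetical and are not the prompt On Device AI actually sends.

```python
def build_summary_prompt(
    sections,
    task="Summarize the meeting and list action items with their owners.",
):
    """Build a prompt from (speaker_label, text) pairs, keeping the
    speaker label at the start of every transcript line."""
    lines = [f"{label}: {text.strip()}" for label, text in sections]
    transcript = "\n".join(lines)
    return f"{task}\n\nTranscript (speaker labels are generic, not real names):\n{transcript}"


prompt = build_summary_prompt([
    ("Speaker 1", "We ship Friday."),
    ("Speaker 2", "I'm worried about the timeline."),
])
print(prompt)
```

Noting in the prompt that the labels are generic discourages the model from inventing real names, which matches how speaker diarization in Voice Notes labels people.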

Export when the transcript needs to leave the app

Some meeting notes stay in On Device AI. Others need to move into a project folder, a video editor, a client recap, or a shared document. Voice Notes supports export formats for those different jobs.

Plain text (.txt) for simple notes and searchable archives.
SubRip subtitles (.srt) for timed captions.
Markdown (.md) for structured notes that can move into docs, wikis, or developer workflows.
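For readers who want to see what those files look like, here is a short sketch that writes speaker-labeled cues as SubRip and Markdown. The SubRip layout (a sequence number, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the caption text, then a blank line) follows the common .srt convention. The cue tuple shape, the "Speaker N:" prefix inside each caption, the Markdown layout, and all function names are assumptions for illustration; the app's own exporter may format things differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """cues: list of (start_sec, end_sec, speaker_label, text).
    Each cue becomes a numbered block; blocks are separated by a blank line."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{speaker}: {text}\n"
        )
    return "\n".join(blocks)


def to_markdown(cues, title="Meeting transcript"):
    """Render cues as a Markdown note with a bold speaker label per paragraph."""
    lines = [f"# {title}", ""]
    for _start, _end, speaker, text in cues:
        lines.append(f"**{speaker}:** {text}")
        lines.append("")
    return "\n".join(lines)


cues = [
    (0.0, 2.5, "Speaker 1", "Let's start."),
    (2.5, 5.25, "Speaker 2", "Sounds good."),
]
print(to_srt(cues))
print(to_markdown(cues))
```

Keeping the speaker label inside the caption text means the .srt file still shows who is talking in video editors and players that only understand plain SubRip cues.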

These export options keep the feature useful after the recording is finished. You can keep the transcript private while processing it, then choose the file format that fits the next step.

Why local voice processing matters

Meetings contain details most teams would not paste into a public chatbot: roadmap decisions, client names, pricing discussions, hiring notes. Apple's own App Privacy details treat audio data as a category of user content that apps must disclose. The usual tradeoff is convenience against control.

On Device AI takes the local route. Voice Notes is designed around on-device transcription and speaker labeling on Apple hardware, with cloud providers remaining optional elsewhere in the app. For users who record sensitive conversations, that distinction is often what makes the workflow usable at all.

A better default for professional recordings

A useful meeting transcript is one you actually open the next morning. Speaker labels cut the time it takes to skim, pull out action items, and hand off notes to someone who was not in the room. Voice Notes is already the most-used feature among On Device AI Pro subscribers, and speaker diarization makes it more useful for the recordings that matter most.

If you already use On Device AI as a private workspace, speaker diarization fills the gap between recording a conversation and doing something with it, without uploading audio to a third-party server.
