A menu-bar dictation app for macOS. Hold a hotkey, speak, release — and the text lands wherever your cursor is. Transcription and language models run entirely on-device.
Watch the repo for the first release →
Early preview. Dictator is in active development: things will be rough, and some behaviour may change between versions. Feedback (bugs, papercuts, "this should do X") is genuinely welcome at hello@robgough.net.
From audio to written text — and the cleanup that makes it readable — all on your Mac.
Most dictation apps either stop at the raw transcript or send it through a paid cloud LLM to clean it up. Dictator runs the whole pipeline locally, and every language-model pass after the transcription has a deterministic safety net: if the model drifts beyond a measurable threshold, that pass reverts.
WhisperKit or Parakeet (FluidAudio on the Apple Neural Engine) turns your audio into a raw transcript. Both engines run locally; models download once and stay on disk. Pick the one that matches the trade-off you want between accuracy and latency.
A small local LLM punctuates, capitalises, and resolves spoken cues — "new line", "new paragraph", "smiley face", named emoji. Skipped entirely for question-shaped input, where the transcript is already well-punctuated.
Guard: at least 60% of the input's anchor words must survive, and the word count can't grow by more than 15% + 3 words. Otherwise the raw transcript is used.
Fixes contractions, subject-verb agreement, duplicate words: the small things you'd tidy by hand. Off by default; toggle from Settings.
Guard: word-level edit distance from the input. Reverts if drift exceeds your threshold (15% by default).
Paragraph breaks and bullet lists for longer dictations, useful when you're talking out an email or a memo, not a one-liner.
Guard: strict word-sequence equality after lowercasing. Reverts on any word change: bullets and breaks only, never rewording.
A deterministic vocabulary substitution step runs between Format and Grammar: case-insensitive whole-word replacements for names, jargon, and the words your accent always trips up. No model, just your list.
The cleanup passes (Format, Grammar, Structure) are the kind of feature most dictation apps charge a subscription for. Here they're included, local, and inspectable.
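The three guards and the dictionary step described above can be sketched in a few lines of Python. This is one illustrative reading of the rules, not Dictator's actual code: the tokeniser, the definition of an "anchor word" (here, any word of four or more characters), the way drift is normalised, and every function name are assumptions.

```python
import re
from typing import Callable

WORD = re.compile(r"[\w']+")

def words(text: str) -> list[str]:
    """Lowercased word tokens; punctuation and bullet glyphs are dropped."""
    return WORD.findall(text.lower())

def format_guard(before: str, after: str) -> bool:
    """Format pass: 60% of anchor words survive, growth capped at 15% + 3."""
    src, dst = words(before), words(after)
    # Assumption: an "anchor word" is any input word of 4+ characters.
    anchors = [w for w in src if len(w) >= 4]
    present = set(dst)
    survived = sum(w in present for w in anchors)
    enough_anchors = not anchors or survived / len(anchors) >= 0.60
    bounded_growth = len(dst) <= len(src) * 1.15 + 3
    return enough_anchors and bounded_growth

def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]

def grammar_guard(before: str, after: str, threshold: float = 0.15) -> bool:
    """Grammar pass: accept only if drift stays within the threshold."""
    src = words(before)
    # Assumption: drift = edit distance divided by the input word count.
    drift = edit_distance(src, words(after)) / max(len(src), 1)
    return drift <= threshold

def structure_guard(before: str, after: str) -> bool:
    """Structure pass: the lowercased word sequence must be identical."""
    return words(before) == words(after)

def run_pass(text: str, model_pass: Callable[[str], str],
             guard: Callable[[str, str], bool]) -> str:
    """Run one model pass and revert to the input if its guard fails."""
    candidate = model_pass(text)
    return candidate if guard(text, candidate) else text

def apply_dictionary(text: str, entries: dict[str, str]) -> str:
    """Case-insensitive whole-word replacement; no model involved."""
    for spoken, written in entries.items():
        pattern = re.compile(r"(?<!\w)" + re.escape(spoken) + r"(?!\w)", re.IGNORECASE)
        # A callable replacement keeps backslashes in `written` literal.
        text = pattern.sub(lambda _m, w=written: w, text)
    return text
```

Under these assumptions a short input makes the default 15% threshold strict: one changed word in a six-word sentence is a drift of about 17%, so the Grammar pass would revert.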
A second hotkey for editing text by speaking to it.
Hold the assistant hotkey, speak an instruction, release. If you had text selected when you triggered it, Dictator sees both — the selection and the instruction — and the local LLM decides what kind of response you want.
"Tighten this." "Translate to French." "Fix the comma splice." The selection is replaced in place — and Dictator re-selects the new text so you can iterate ("now make it shorter") without reaching for the mouse.
"Reply to this with a polite no." "Summarise these notes." "Draft an email about X." The result lands on the clipboard, or in a small floating window you can read and copy from when there's nowhere obvious to paste.
Conversations are multi-turn: follow-ups extend the same thread, with automatic compaction once the model's context window starts filling. Recent conversations are one click away in the menu-bar dropdown.
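The page does not say how compaction works, so the following is only a guess at the general shape: once the estimated token count passes some fraction of the context window, older turns are folded into a single summary and the most recent turns are kept verbatim. The word-count token estimate, the 75% fill ratio, the `keep_recent` parameter, and the `summarise` callback (which in practice would presumably call the local LLM) are all hypothetical.

```python
from typing import Callable

def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokeniser: one token per whitespace word."""
    return len(text.split())

def compact(turns: list[str], context_limit: int,
            summarise: Callable[[list[str]], str],
            fill_ratio: float = 0.75, keep_recent: int = 2) -> list[str]:
    """Fold older turns into one summary once the window starts filling."""
    used = sum(estimate_tokens(t) for t in turns)
    if used <= context_limit * fill_ratio or len(turns) <= keep_recent:
        return turns  # still comfortably inside the window
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarise(old)] + recent

# Example with a placeholder summariser.
history = ["a b c d", "e f g h", "i j", "k l"]
compacted = compact(history, context_limit=10,
                    summarise=lambda ts: f"[summary of {len(ts)} turns]")
```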
Two engines for speech-to-text, four small instruction-tuned LLMs for the cleanup passes — or no LLM at all.
All weights are downloaded on first use and stored in ~/Library/Application Support/Dictator/Models/. You can mix and match: smaller transcription model with a bigger LLM, or vice versa.

| Model | Disk | RAM | Notes |
|---|---|---|---|
| Whisper Tiny (English) | 75 MB | ~150 MB | Fastest, lowest accuracy. |
| Whisper Base (English) | 140 MB | ~250 MB | Good balance for short utterances. |
| Whisper Small (English) | 470 MB | ~700 MB | Solid accuracy. Default. |
| Whisper Large v3 Turbo | 1.5 GB | ~2 GB | Best quality, multilingual. |
| Parakeet TDT v3 (multilingual) | 475 MB | ~700 MB | Runs on the Apple Neural Engine. ~60–70× realtime. 25 European languages. |
| Parakeet TDT v2 (English) | 475 MB | ~700 MB | Slightly better English accuracy than v3. |

| Model | Disk | RAM | Notes |
|---|---|---|---|
| Llama 3.2 1B (4-bit) | 760 MB | ~1.5 GB | Snappy. Decent formatting; weaker on grammar and Assistant tasks. |
| Llama 3.2 3B (4-bit) | 1.9 GB | ~2.5 GB | Recommended default. Good balance of quality and speed. |
| Qwen 2.5 3B (4-bit) | 1.8 GB | ~2.5 GB | Alternative 3B with a different prose style. |
| Qwen 2.5 7B (4-bit) | 4.4 GB | ~5–6 GB | Higher quality output, noticeably slower. |
| None | — | — | Disables every LLM pass. Raw transcript ships through (dictionary substitution still applies). |

All language models are 4-bit quantised via MLX. RAM figures are approximate steady-state values at modest context lengths; memory use grows during long Assistant-mode conversations as the KV cache fills. Total resident memory with the defaults (Whisper Small + Llama 3.2 3B, both pre-loaded) is around 3 GB.
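As a rough check on the "around 3 GB" figure, the table rows can simply be added together. The sketch below copies a few rows from the tables above; treating 1 GB as 1000 MB and the "None" row as zero are assumptions, and the `footprint` helper is hypothetical.

```python
# Approximate figures copied from the tables above, in MB.
SPEECH_MODELS = {
    "Whisper Small (English)": {"disk": 470, "ram": 700},
    "Whisper Large v3 Turbo": {"disk": 1500, "ram": 2000},
    "Parakeet TDT v3 (multilingual)": {"disk": 475, "ram": 700},
}
LANGUAGE_MODELS = {
    "Llama 3.2 1B (4-bit)": {"disk": 760, "ram": 1500},
    "Llama 3.2 3B (4-bit)": {"disk": 1900, "ram": 2500},
    "None": {"disk": 0, "ram": 0},  # assumption: no LLM means no LLM cost
}

def footprint(speech: str, llm: str) -> dict[str, float]:
    """Combined disk and steady-state RAM for one speech model plus one LLM, in GB."""
    s, m = SPEECH_MODELS[speech], LANGUAGE_MODELS[llm]
    return {
        "disk_gb": (s["disk"] + m["disk"]) / 1000,
        "ram_gb": (s["ram"] + m["ram"]) / 1000,
    }

defaults = footprint("Whisper Small (English)", "Llama 3.2 3B (4-bit)")
print(defaults)  # about 2.4 GB on disk and 3.2 GB of RAM
```

The same helper shows the cost of other pairings, for example Parakeet TDT v3 with no LLM comes to roughly 0.5 GB on disk and 0.7 GB of RAM.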
Running everything on your Mac is a real trade-off. Worth being clear about both sides.
Apple Silicon Mac
macOS 15 or newer
Roughly 3 GB for models (one Whisper, one LLM)