Coming soon

Talk to your Mac.
Nothing leaves it.

A menu-bar dictation app for macOS. Hold a hotkey, speak, release — and the text lands wherever your cursor is. Transcription and language models run entirely on-device.

Watch the repo for the first release →
Apple Silicon · macOS 15+

Early preview. Dictator is in active development — things will be rough, and some behaviour may change between versions. Feedback (bugs, papercuts, "this should do X") is genuinely welcome at hello@robgough.net.

The pipeline

From audio to written text — and the cleanup that makes it readable — all on your Mac.

Most dictation apps either stop at the raw transcript or send it through a paid cloud LLM to clean it up. Dictator runs the whole pipeline locally, and every language-model pass after the transcription has a deterministic safety net: if the model drifts beyond a measurable threshold, that pass reverts.

  1. Transcribe

    WhisperKit or Parakeet (FluidAudio on the Apple Neural Engine) turns your audio into a raw transcript. Both engines run locally; models download once and stay on disk. Pick the one that matches the trade-off you want between accuracy and latency.

  2. Format

    A small local LLM punctuates, capitalises, and resolves spoken cues — "new line", "new paragraph", "smiley face", named emoji. Skipped entirely for question-shaped input, where the transcript is already well-punctuated.

    Guard: at least 60% of the input's anchor words must survive, and the word count can't grow by more than 15% plus 3 words. Otherwise the raw transcript is used.

  3. Grammar (optional)

    Fixes contractions, subject-verb agreement, duplicate words — the small things you'd tidy by hand. Off by default; toggle from Settings.

    Guard: word-level edit distance from the input. Reverts if drift exceeds your threshold (15% by default).

  4. Structure (optional)

    Paragraph breaks and bullet lists for longer dictations — useful when you're talking out an email or a memo, not a one-liner.

    Guard: strict word-sequence equality after lowercasing. Reverts on any word change — bullets and breaks only, never rewording.
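The three guards above are simple enough to sketch in a few lines. This is an illustration rather than Dictator's actual code, and it rests on some assumptions not stated on this page: "anchor words" are taken to be the distinct input words of four or more characters, drift is the word-level edit distance divided by the input word count, and the function names are hypothetical.

```python
import re


def words(text: str) -> list[str]:
    """Lowercased word tokens; punctuation, bullets, and line breaks are dropped."""
    return re.findall(r"\w+(?:['’]\w+)*", text.lower())


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance computed over whole words rather than characters."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete a word
                           cur[j - 1] + 1,             # insert a word
                           prev[j - 1] + (wa != wb)))  # substitute a word
        prev = cur
    return prev[-1]


def format_guard_ok(raw: str, formatted: str) -> bool:
    """Format guard: 60% of anchors survive, growth capped at 15% plus 3 words."""
    src, out = words(raw), words(formatted)
    # Assumption: anchors are the distinct input words of 4+ characters.
    anchors = {w for w in src if len(w) >= 4}
    survived = len(anchors & set(out)) / len(anchors) if anchors else 1.0
    return survived >= 0.60 and len(out) <= len(src) * 1.15 + 3


def grammar_guard_ok(before: str, after: str, threshold: float = 0.15) -> bool:
    """Grammar guard: reject the pass when word-level drift exceeds the threshold."""
    src, out = words(before), words(after)
    # Assumption: drift is normalised by the input word count.
    return edit_distance(src, out) / max(len(src), 1) <= threshold


def structure_guard_ok(before: str, after: str) -> bool:
    """Structure guard: the lowercased word sequence must be identical."""
    return words(before) == words(after)
```

A pass whose guard returns False would be discarded and its input carried forward unchanged. Under these assumptions, a single corrected word in a 13-word sentence passes the Grammar guard, while a Structure pass that changes even one word is rejected.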

A deterministic vocabulary substitution step runs between Format and Grammar: case-insensitive whole-word replacements for names, jargon, and the words your accent always trips up. No model, just your list.

The cleanup passes (Format, Grammar, Structure) are the kind of feature most dictation apps charge a subscription for. Here they're included, local, and inspectable.
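The vocabulary substitution step mentioned above is easy to picture as a single regular-expression pass. The sketch below is one plausible way to do it, not necessarily how Dictator does; the function name `apply_vocabulary` and the sample entries are hypothetical.

```python
import re


def apply_vocabulary(text: str, vocabulary: dict[str, str]) -> str:
    """Case-insensitive, whole-word replacement driven by a user-supplied list."""
    if not vocabulary:
        return text
    # Longest entries first, so multi-word phrases win over shorter overlaps.
    keys = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in vocabulary.items()}
    # A function replacement avoids backslash handling in the replacement text.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


# Hypothetical entries: what the transcript tends to say -> what you meant.
vocab = {"whisper kit": "WhisperKit", "kube": "Kubernetes"}
print(apply_vocabulary("Whisper Kit runs next to KUBE, not kubectl.", vocab))
```

The lookarounds enforce the whole-word rule: "kube" is replaced, but "kubectl" is left alone because the match would end in the middle of a word.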

Assistant mode

A second hotkey for editing text by speaking to it.

Hold the assistant hotkey, speak an instruction, release. If you had text selected when you triggered it, Dictator sees both — the selection and the instruction — and the local LLM decides what kind of response you want.

Replace

"Tighten this." "Translate to French." "Fix the comma splice." The selection is replaced in place — and Dictator re-selects the new text so you can iterate ("now make it shorter") without reaching for the mouse.

Draft

"Reply to this with a polite no." "Summarise these notes." "Draft an email about X." The result lands on the clipboard, or in a small floating window you can read and copy from when there's nowhere obvious to paste.

Conversations are multi-turn: follow-ups extend the same thread, with automatic compaction once the model's context window starts filling. Recent conversations are one click away in the menu-bar dropdown.

Choose your models

Two engines for speech-to-text, four small instruction-tuned LLMs for the cleanup passes — or no LLM at all.

All weights are downloaded on first use and stored in ~/Library/Application Support/Dictator/Models/. You can mix and match: smaller transcription model with a bigger LLM, or vice versa.

Speech-to-text

Model                           Disk     RAM       Notes
Whisper Tiny (English)          75 MB    ~150 MB   Fastest, lowest accuracy.
Whisper Base (English)          140 MB   ~250 MB   Good balance for short utterances.
Whisper Small (English)         470 MB   ~700 MB   Solid accuracy. Default.
Whisper Large v3 Turbo          1.5 GB   ~2 GB     Best quality, multilingual.
Parakeet TDT v3 (multilingual)  475 MB   ~700 MB   Runs on the Apple Neural Engine. ~60–70× realtime. 25 European languages.
Parakeet TDT v2 (English)       475 MB   ~700 MB   Slightly better English accuracy than v3.

Language model — used for the cleanup passes and Assistant mode

Model                 Disk     RAM       Notes
Llama 3.2 1B (4-bit)  760 MB   ~1.5 GB   Snappy. Decent formatting; weaker on grammar and Assistant tasks.
Llama 3.2 3B (4-bit)  1.9 GB   ~2.5 GB   Recommended default. Good balance of quality and speed.
Qwen 2.5 3B (4-bit)   1.8 GB   ~2.5 GB   Alternative 3B with a different prose style.
Qwen 2.5 7B (4-bit)   4.4 GB   ~5–6 GB   Higher quality output, noticeably slower.
None                  -        -         Disables every LLM pass. Raw transcript ships through (vocabulary substitution still applies).

All language models are 4-bit quantised via MLX. RAM figures are approximate steady-state values at modest context lengths; memory use grows during long Assistant-mode conversations as the KV cache fills. Total resident memory with the defaults (Whisper Small + Llama 3.2 3B, both pre-loaded) is around 3 GB.
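To see where the "around 3 GB" figure comes from, or to budget for a different pairing, the table values can simply be added up. The snippet below is a back-of-envelope sketch: the numbers are copied from the tables above and converted to MB, the 6000 MB entry for Qwen 2.5 7B takes the upper end of its quoted range, and the `footprint` helper is hypothetical.

```python
# (disk MB, approximate RAM MB), taken from the tables above.
STT = {
    "Whisper Tiny (English)": (75, 150),
    "Whisper Base (English)": (140, 250),
    "Whisper Small (English)": (470, 700),
    "Whisper Large v3 Turbo": (1500, 2000),
    "Parakeet TDT v3 (multilingual)": (475, 700),
    "Parakeet TDT v2 (English)": (475, 700),
}
LLM = {
    "Llama 3.2 1B (4-bit)": (760, 1500),
    "Llama 3.2 3B (4-bit)": (1900, 2500),
    "Qwen 2.5 3B (4-bit)": (1800, 2500),
    "Qwen 2.5 7B (4-bit)": (4400, 6000),  # upper end of the ~5-6 GB range
    "None": (0, 0),
}


def footprint(stt: str, llm: str) -> tuple[int, int]:
    """Return (download size, steady-state RAM) in MB for one model pairing."""
    stt_disk, stt_ram = STT[stt]
    llm_disk, llm_ram = LLM[llm]
    return stt_disk + llm_disk, stt_ram + llm_ram


disk_mb, ram_mb = footprint("Whisper Small (English)", "Llama 3.2 3B (4-bit)")
print(f"defaults: {disk_mb / 1000:.1f} GB on disk, ~{ram_mb / 1000:.1f} GB resident")
```

These totals cover the models only; a growing KV cache and the rest of the app sit on top of them.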

Local-first, honestly

Running everything on your Mac is a real trade-off. Worth being clear about both sides.

What you get

  • Privacy. Audio, transcripts, prompts, and conversation history never leave the device. No telemetry, no analytics, no account.
  • Cost. Free after the model downloads. No subscription, no API key, no per-token billing, no surprise bill at the end of the month.
  • Offline. Works on a plane, in a hotel, on a train through a tunnel. Once the models are on disk, the network is irrelevant.
  • Predictable. A vendor can't deprecate the model out from under you or quietly change its behaviour.
  • Inspectable. Open source, so you can see what runs, when, and on what data.

What you give up

  • RAM. 8 GB is the floor, 16 GB is comfortable, more is better if you want the larger LLM. The bigger models assume you have memory to spare.
  • Speed. A small local model takes a couple of seconds per pass. Frontier cloud models like Claude can usually return the same cleanup faster, because they have orders of magnitude more compute behind them.
  • Ceiling. A 3B local model is not Claude. For dictation cleanup it's plenty; for genuinely hard text massaging the frontier still wins.
  • First run. A few GB of weights to download — typically 2.5 GB if you take the defaults.
  • Heat. Sustained inference uses the GPU and Apple Neural Engine. Your Mac will get warm under repeated use.

Why I built this

Rob Gough

I'm Rob Gough. I work as a tech advisor and fractional CTO — drawing on a long career of senior engineering and tech-leadership roles. Alongside the advisory work I'm building Stay Upfront, a unified support and incident management tool for B2B SaaS companies.

Dictator was built to solve a personal need. Every macOS dictation app I tried wanted a subscription for a stack of models that are themselves free and open: Whisper for the speech-to-text, a small Llama or Qwen for the cleanup. Nothing in that pipeline costs the developer per use: there's no expensive cloud in the loop. The subscription is mostly a tax on not knowing what's inside. I figured I'd build the version that doesn't have it.

These days I dictate most of my long-form writing — emails, notes, half this page. Voice is faster than I'd given it credit for, and the cleanup passes mean the output is something I'd send to a colleague rather than re-edit by hand. My fingers are grateful.

Voice-as-input is going to matter — not as a replacement for typing, but as a second modality you reach for when it fits. Dictator is a small argument that you shouldn't need a cloud account or a credit card to start. You can start now, on the Mac you already own.

If you'd like a senior pair of eyes on your tech strategy, your roadmap, or what to build next — fractional CTO, advisor, sounding board — that's the work I do. More at robgough.net.

Requirements

  • Apple Silicon Mac
  • macOS 15 or newer
  • Roughly 3 GB for models (one Whisper, one LLM)