The homework

The real numbers

Marketing pages usually round up. This one is the raw material: what we measured, what we interpolated, and what still runs too hot. Sources and the full research live in the repo — every claim below links to something you can check.

Decode speed, measured

Tokens per second, ~1–1.5B model, 4-bit, stock llama.cpp on CPU (anchors from arXiv 2506.19884). For comparison, our own pipeline measured ~157–177 tok/s for a 0.5B model on an M-series Mac with Metal.

Chip                 | 1B       | 3–4B   | 7–8B        | Status
A14 (iPhone 12)      | 15 tok/s | ~7–9   | not advised | measured
A16 (iPhone 15)      | 20 tok/s | ~10–13 | tight       | measured
A17 Pro / A18 Pro    | ~24–28   | ~11–13 | ~8          | interpolated
Snapdragon 8 Gen 2/3 | ~10–20   | ~6–12  | ~4–8        | measured anchors
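
To make the table concrete, here is a rough sketch of what those speeds mean for waiting time. It takes the two measured 1B figures from the table and divides an assumed reply length by them. The 200-token reply and the function name are our illustration, not something the app exposes, and the estimate covers decode only (no prompt processing, no throttling).

```python
def seconds_for_reply(reply_tokens: int, tok_per_s: float) -> float:
    """Seconds to decode reply_tokens at a steady tok_per_s (decode only)."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return reply_tokens / tok_per_s


# Measured 1B decode speeds copied from the table above.
MEASURED_1B = {"A14 (iPhone 12)": 15.0, "A16 (iPhone 15)": 20.0}

# Assumed reply length, chosen only for illustration.
REPLY_TOKENS = 200

for chip, speed in MEASURED_1B.items():
    print(f"{chip}: {seconds_for_reply(REPLY_TOKENS, speed):.1f} s")
```

Under those assumptions, a 200-token answer takes roughly 13 seconds on an A14 and 10 seconds on an A16.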

What we’ll say out loud that a landing page usually won’t

  • Small models get things wrong. A 1B model is a pocket assistant, not an oracle. The app grades every model’s quality honestly — our lightest build ships labeled Low.
  • Phones throttle. Sustained generation heats a phone until the chip slows itself — published studies show 10–44% drops on long runs. We design around it (thermal-aware thread planning) rather than pretend it away.
  • Some numbers are interpolated. Where hard data is missing (newest chips, GPU decode), our tables say so, in italics, instead of extrapolating quietly.
  • Cloud models are smarter. If you need frontier reasoning and have a connection, use one. Quenderin is for the words you don’t want to hand over and the places the connection doesn’t reach.
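
To put the throttling point in numbers, the sketch below applies the published 10–44% range to a cool-phone decode speed. It is plain arithmetic, not the app's thermal planner: the function name and defaults are hypothetical, and real drops depend on the device, the case, and the room.

```python
def throttled_range(tok_per_s: float,
                    min_drop: float = 0.10,
                    max_drop: float = 0.44) -> tuple[float, float]:
    """Return (worst, best) sustained tok/s after a fractional throttle drop.

    min_drop and max_drop default to the 10-44% range quoted above;
    they are assumptions for illustration, not measurements of any one phone.
    """
    if not 0 <= min_drop <= max_drop < 1:
        raise ValueError("drops must satisfy 0 <= min_drop <= max_drop < 1")
    best = round(tok_per_s * (1 - min_drop), 1)
    worst = round(tok_per_s * (1 - max_drop), 1)
    return worst, best


# Example: a phone that decodes at 20 tok/s while cool.
worst, best = throttled_range(20.0)
print(f"sustained: {worst}-{best} tok/s")
```

For a phone that starts at 20 tok/s, that range works out to somewhere between about 11 and 18 tok/s on a long run.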

Check the homework

The research and the bug history are public — not as a stunt, but because that’s what open source means to us:

  • REALITY.md — the honest “can phones actually do this” write-up that seeds the app’s calibration code.
  • On-device LLM research — 28 sources, adversarially verified; the refuted claims are listed too.
  • Similar projects — who else does this well, and what we learned from each of them.
  • The bug journal — every bug we’ve fixed, what caused it, and the lesson. Yes, really.

Judge it by the code, not this page.