Model catalog
Bring any model. Right-sized for your hardware.
Quenderin runs on llama.cpp, so it runs any GGUF model — Llama, Qwen, DeepSeek, Mistral, Gemma, Phi, and thousands more from Hugging Face. Below is the curated shortlist it recommends from, sized for everything from a Raspberry Pi to an M-series Mac.
| Model | Good for | Download | Min RAM | Quant |
|---|---|---|---|---|
| Qwen3 4B (Recommended) | General-purpose, Apache 2.0; the current go-to | 2.4 GB | 4 GB | Q4_K_M |
| DeepSeek-R1 7B | Step-by-step reasoning & math | 4.7 GB | 8 GB | Q4_K_M |
| Qwen2.5 Coder 7B | Code generation & tool use | 4.7 GB | 8 GB | Q4_K_M |
| Gemma 3 4B | Multilingual, 140+ languages | 2.5 GB | 4 GB | Q4_K_M |
| Phi-4 mini 3.8B | Efficient, runs well on CPU | 2.3 GB | 4 GB | Q4_K_M |
| Mistral 7B | Fast, capable all-rounder | 4.1 GB | 6 GB | Q4_K_M |
| Llama 3.2 1B | Ultra-light — runs on a Pi | 0.8 GB | 1.5 GB | Q4_K_M |
| Qwen3 14B | Best quality for a strong device | 9.0 GB | 12 GB | Q4_K_M |
+ thousands more from Hugging Face — any GGUF works.
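Before loading a file you downloaded yourself, it can help to confirm that it really is a GGUF file. The sketch below reads the fixed-size start of the header; it is not Quenderin's own loader. The function name `read_gguf_header` is hypothetical, and the layout assumed is little-endian GGUF version 2 or later: the 4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata counts.

```python
import struct

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count), or None if the file is not GGUF.

    Assumes the little-endian GGUF v2+ layout; older or big-endian files
    would need different handling.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        return None
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

A `None` result means the file is something else (for example a safetensors checkpoint) and would need converting to GGUF first.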
How recommendation works
You don't have to choose.
Quenderin detects your device's memory and chip and automatically selects the best fit, leaving headroom for the operating system so the app stays responsive. It steers you away from models that would swap to disk or fail to load.
Prefer to drive? Override the recommendation and load any GGUF you like, including ones you've downloaded yourself.
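The selection idea can be sketched in a few lines. This is an illustration only, not Quenderin's actual algorithm: it assumes the Min RAM column in the catalog table already includes the operating-system headroom, looks only at memory (not the chip), and picks the most demanding model that still fits, breaking ties by table order. The names `CatalogEntry` and `recommend` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    download_gb: float
    min_ram_gb: float


# Values copied from the catalog table, in table order.
CATALOG = [
    CatalogEntry("Qwen3 4B", 2.4, 4.0),
    CatalogEntry("DeepSeek-R1 7B", 4.7, 8.0),
    CatalogEntry("Qwen2.5 Coder 7B", 4.7, 8.0),
    CatalogEntry("Gemma 3 4B", 2.5, 4.0),
    CatalogEntry("Phi-4 mini 3.8B", 2.3, 4.0),
    CatalogEntry("Mistral 7B", 4.1, 6.0),
    CatalogEntry("Llama 3.2 1B", 0.8, 1.5),
    CatalogEntry("Qwen3 14B", 9.0, 12.0),
]


def recommend(total_ram_gb, catalog=CATALOG):
    """Return the entry with the highest Min RAM that fits, or None if nothing fits.

    Assumption: Min RAM already leaves headroom for the operating system.
    """
    candidates = [m for m in catalog if m.min_ram_gb <= total_ram_gb]
    if not candidates:
        return None
    # max() returns the first maximal item, so ties fall back to table order.
    return max(candidates, key=lambda m: m.min_ram_gb)


for ram in (2, 4, 6, 16):
    pick = recommend(ram)
    print(ram, "GB ->", pick.name if pick else "nothing fits")
```

A real recommender would also weigh the task (coding, reasoning, multilingual) and the chip, which this sketch ignores.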
Quantization
Why Q4_K_M.
The catalog defaults to Q4_K_M quantization — a 4-bit format that cuts a model's size to roughly a quarter of full precision while keeping almost all of its quality. That's what makes a 7-billion-parameter model fit, and run, on a phone.
Larger quants (more quality, more memory) and smaller ones (lighter, faster) are available for any model you bring.
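The size arithmetic behind these figures is simple: file size is roughly parameters times bits per weight, divided by eight. The sketch below applies it to a 7B-class model. The numbers are assumptions for illustration: about 7.6 billion parameters for the model, and an average of about 4.85 bits per weight for Q4_K_M (it mixes block formats, so the real average varies by model).

```python
def estimated_size_gb(params_billions, bits_per_weight):
    """Approximate model file size in GB (10^9 bytes): parameters x bits / 8."""
    return params_billions * bits_per_weight / 8


Q4_K_M_BPW = 4.85  # assumed average bits per weight; varies by model
FP16_BPW = 16.0    # 16-bit weights as the unquantized baseline

PARAMS_B = 7.6     # assumed parameter count for a "7B" model, in billions

q4 = estimated_size_gb(PARAMS_B, Q4_K_M_BPW)
fp16 = estimated_size_gb(PARAMS_B, FP16_BPW)

print(f"Q4_K_M: ~{q4:.1f} GB")
print(f"16-bit: ~{fp16:.1f} GB")
print(f"ratio:  ~{q4 / fp16:.2f}")
```

Under these assumptions the estimate lands close to the 4.7 GB listed for the 7B models in the catalog, and the ratio to 16-bit comes out between a quarter and a third.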
Plain English
The jargon, decoded.
- On-device: The model runs on your own phone or laptop, not on a company's server.
- GGUF: The model file format Quenderin loads. Thousands of open models ship in it.
- Quantization (Q4_K_M): A way of shrinking a model to roughly a quarter of its size while keeping almost all of its quality. It is what makes a model fit on a phone.
- llama.cpp: The open-source engine that runs these models fast and efficiently on everyday hardware.
Run one on your device.
It's open source and runs from GitHub today. Star the repo to follow along.