Models — Quenderin

Model catalog

Bring any model. Right-sized for your hardware.

Quenderin runs on llama.cpp, so it runs any GGUF model — Llama, Qwen, DeepSeek, Mistral, Gemma, Phi, and thousands more from Hugging Face. Below is the curated shortlist it recommends from, sized for everything from a Raspberry Pi to an M-series Mac.

Model	Good for	Download	Min RAM	Quant
Qwen3 4B (Recommended)	General-purpose, Apache 2.0; the current go-to	2.4 GB	4 GB	Q4_K_M
DeepSeek-R1 7B	Step-by-step reasoning & math	4.7 GB	8 GB	Q4_K_M
Qwen2.5 Coder 7B	Code generation & tool use	4.7 GB	8 GB	Q4_K_M
Gemma 3 4B	Multilingual, 140+ languages	2.5 GB	4 GB	Q4_K_M
Phi-4 mini 3.8B	Efficient, runs well on CPU	2.3 GB	4 GB	Q4_K_M
Mistral 7B	Fast, capable all-rounder	4.1 GB	6 GB	Q4_K_M
Llama 3.2 1B	Ultra-light — runs on a Pi	0.8 GB	1.5 GB	Q4_K_M
Qwen3 14B	Best quality for a strong device	9.0 GB	12 GB	Q4_K_M

+ thousands more from Hugging Face — any GGUF works.

How recommendation works

You don't have to choose.

Choosing a model is optional. Quenderin detects your device's memory and chip and automatically selects the best fit — leaving headroom for the operating system so the app stays responsive. It steers you away from a model that would swap to disk or fail to load.
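As a rough illustration of that selection step, here is a minimal sketch in Python. It is not Quenderin's actual code. It assumes the Min RAM column in the catalog is the total device memory a model needs with operating-system headroom already included, and that ties go to whichever entry is listed first in the table. The names `CatalogEntry`, `CATALOG`, and `recommend` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    download_gb: float
    min_ram_gb: float  # assumed to already include headroom for the OS


# Rows copied from the catalog table, in the order they are listed.
CATALOG = [
    CatalogEntry("Qwen3 4B", 2.4, 4.0),
    CatalogEntry("DeepSeek-R1 7B", 4.7, 8.0),
    CatalogEntry("Qwen2.5 Coder 7B", 4.7, 8.0),
    CatalogEntry("Gemma 3 4B", 2.5, 4.0),
    CatalogEntry("Phi-4 mini 3.8B", 2.3, 4.0),
    CatalogEntry("Mistral 7B", 4.1, 6.0),
    CatalogEntry("Llama 3.2 1B", 0.8, 1.5),
    CatalogEntry("Qwen3 14B", 9.0, 12.0),
]


def recommend(total_ram_gb: float, catalog=CATALOG) -> Optional[CatalogEntry]:
    """Return the most demanding entry that still fits, or None if nothing fits."""
    fitting = [m for m in catalog if m.min_ram_gb <= total_ram_gb]
    if not fitting:
        return None
    # max() returns the first maximal item, so ties go to the earlier table row.
    return max(fitting, key=lambda m: m.min_ram_gb)
```

Under these assumptions, `recommend(4.0)` picks Qwen3 4B and `recommend(1.0)` returns `None`, meaning no catalog model fits. The real probe also looks at the chip, which this sketch leaves out.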

Prefer to drive? Override the recommendation and load any GGUF you like, including ones you've downloaded yourself.
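If you bring your own file, a quick sanity check before loading it can save a confusing error. GGUF files begin with the four ASCII bytes `GGUF`, followed by a 32-bit version number. The sketch below reads just that header. It assumes a little-endian file, which is the common case, and the helper name `read_gguf_version` is made up for this example rather than part of Quenderin or llama.cpp.

```python
import struct


def read_gguf_version(path):
    """Return the GGUF version number, or None if the file lacks the GGUF magic."""
    with open(path, "rb") as f:
        header = f.read(8)
    # A valid header is at least 8 bytes: 4 magic bytes plus a uint32 version.
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", header[4:8])  # assumes little-endian
    return version
```

A return value of `None` usually means the download is incomplete or the file is in another format, such as safetensors. Passing this check does not guarantee the rest of the file is intact.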

See how the probe works →

Quantization

Why Q4_K_M.

The catalog defaults to Q4_K_M quantization, a 4-bit format that cuts a model's size to roughly a quarter of its 16-bit original while keeping almost all of its quality. That's what makes a 7-billion-parameter model fit, and run, on a phone.

Larger quants (more quality, more memory) and smaller ones (lighter, faster) are available for any model you bring.
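The size arithmetic is simple enough to sketch: file size is roughly parameter count times bits per weight, divided by eight. The snippet below is a back-of-the-envelope estimate, not a measurement. The 4.85 bits-per-weight figure for Q4_K_M is an approximate average assumed for illustration (the format mixes block types, so the real value varies a little by model), and the function name is hypothetical.

```python
FP16_BITS = 16.0
Q4_K_M_BITS = 4.85  # assumption: approximate average bits per weight


def estimated_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate file size in GB (10^9 bytes), ignoring metadata overhead."""
    # billions of weights * bits per weight / 8 bits per byte = billions of bytes
    return params_billions * bits_per_weight / 8


fp16_size = estimated_size_gb(7.0, FP16_BITS)
q4_size = estimated_size_gb(7.0, Q4_K_M_BITS)
print(f"7B at 16-bit: about {fp16_size:.1f} GB")
print(f"7B at Q4_K_M: about {q4_size:.1f} GB")
```

A nominal 4 bits per weight would give exactly a quarter of the 16-bit size. With the assumed 4.85 average, the estimate lands closer to 30 percent. Models sold as "7B" often have slightly more than 7 billion parameters, which is why the catalog's download sizes can run a little higher than this estimate.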

Plain English

The jargon, decoded.

On-device: The model runs on your own phone or laptop — not on a company’s server.
GGUF: The model file format Quenderin loads. Thousands of open models ship in it.
Quantization (Q4_K_M): A way of shrinking a model to roughly a quarter of its size while keeping almost all of its quality — what makes it fit on a phone.
llama.cpp: The open-source engine that runs these models fast and efficiently on everyday hardware.

Run one on your device.

It's open source and runs from GitHub today. Star the repo to follow along.
