Model catalog

Bring any model. Right-sized for your hardware.

Quenderin runs on llama.cpp, so it runs any GGUF model — Llama, Qwen, DeepSeek, Mistral, Gemma, Phi, and thousands more from Hugging Face. Below is the curated shortlist it recommends from, sized for everything from a Raspberry Pi to an M-series Mac.

Model             Good for                       Download  Min RAM  Quant
DeepSeek-R1 7B    Step-by-step reasoning & math  4.7 GB    8 GB     Q4_K_M
Qwen2.5 Coder 7B  Code generation & tool use     4.7 GB    8 GB     Q4_K_M
Gemma 3 4B        Multilingual, 140+ languages   2.5 GB    4 GB     Q4_K_M
Phi-4 mini 3.8B   Efficient, runs well on CPU    2.3 GB    4 GB     Q4_K_M
Mistral 7B        Fast, capable all-rounder      4.1 GB    6 GB     Q4_K_M
Llama 3.2 1B      Ultra-light, runs on a Pi      0.8 GB    1.5 GB   Q4_K_M
Qwen3 14B         Best quality for a strong device  9.0 GB  12 GB   Q4_K_M

Plus thousands more from Hugging Face: any GGUF works.

How recommendation works

You don't have to choose.

Choosing a model is optional. Quenderin detects your device's memory and chip and automatically selects the best fit — leaving headroom for the operating system so the app stays responsive. It steers you away from a model that would swap to disk or fail to load.
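The selection idea can be sketched in a few lines. This is only an illustration, not Quenderin's actual logic: the `Model` type, the `recommend` function, and the "largest download that fits" rule are all assumptions, the Min RAM column is treated as already including operating-system headroom, and chip detection is left out entirely. The real probe may well choose differently.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    download_gb: float
    min_ram_gb: float


# Figures copied from the catalog table above.
CATALOG = [
    Model("DeepSeek-R1 7B", 4.7, 8),
    Model("Qwen2.5 Coder 7B", 4.7, 8),
    Model("Gemma 3 4B", 2.5, 4),
    Model("Phi-4 mini 3.8B", 2.3, 4),
    Model("Mistral 7B", 4.1, 6),
    Model("Llama 3.2 1B", 0.8, 1.5),
    Model("Qwen3 14B", 9.0, 12),
]


def recommend(total_ram_gb: float, catalog=CATALOG) -> Optional[Model]:
    """Hypothetical rule: the largest model whose Min RAM fits the device.

    Returns None when nothing in the catalog fits. Ties go to the
    earlier catalog entry, because max() keeps the first maximum.
    """
    candidates = [m for m in catalog if m.min_ram_gb <= total_ram_gb]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.download_gb)


if __name__ == "__main__":
    for ram in (1, 4, 8, 16):
        pick = recommend(ram)
        print(ram, "GB ->", pick.name if pick else "nothing fits")
```

Filtering on Min RAM first is what keeps a model that would swap to disk or fail to load out of the running; the ranking step afterwards is where a real implementation would weigh chip, speed, and task.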

Prefer to drive? Override the recommendation and load any GGUF you like, including ones you've downloaded yourself.
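If you bring a file you downloaded yourself, a quick sanity check is to confirm it really is GGUF before loading it. GGUF files begin with the four ASCII bytes `GGUF` followed by a little-endian 32-bit version number. The helper name `read_gguf_version` below is hypothetical and is not part of Quenderin or llama.cpp; it is a minimal sketch that inspects only the first eight bytes.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_version(path):
    """Return the GGUF version number, or None if the file is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)  # 4-byte magic + 4-byte version
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32
    return version
```

A return value of `None` usually means a truncated download or a model in another format (for example safetensors) that would need converting first.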

See how the probe works →

Quantization

Why Q4_K_M.

The catalog defaults to Q4_K_M quantization, a 4-bit format that cuts a model's size to roughly a quarter of its 16-bit original while keeping almost all of its quality. That's what makes a 7-billion-parameter model fit, and run, on a phone.

Larger quants (more quality, more memory) and smaller ones (lighter, faster) are available for any model you bring.
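The "roughly a quarter" figure follows from simple arithmetic: file size scales with bits per weight. The sketch below uses a hypothetical helper, `estimated_size_gb`, and a nominal 4 bits per weight. Real Q4_K_M files come out somewhat larger than this estimate, because the format keeps some tensors at higher precision and stores scaling data and metadata alongside the weights.

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-storage estimate in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n = 7e9  # a 7-billion-parameter model
    fp16 = estimated_size_gb(n, 16)  # 16-bit weights
    q8 = estimated_size_gb(n, 8)     # a larger quant: more quality, more memory
    q4 = estimated_size_gb(n, 4)     # nominal 4-bit
    print(f"16-bit: {fp16:.1f} GB, 8-bit: {q8:.1f} GB, 4-bit: {q4:.1f} GB")
    print(f"4-bit / 16-bit = {q4 / fp16:.2f}")
```

The same function shows the trade-off for other quants: doubling the bits per weight doubles the memory the weights need.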

Plain English

The jargon, decoded.

On-device
The model runs on your own phone or laptop — not on a company’s server.
GGUF
The model file format Quenderin loads. Thousands of open models ship in it.
Quantization (Q4_K_M)
A way of shrinking a model to roughly a quarter of its size while keeping almost all of its quality — what makes it fit on a phone.
llama.cpp
The open-source engine that runs these models fast and efficiently on everyday hardware.

Find your model

What runs on your device?

What Quenderin recommends depends on your device's memory. For example, with 8 GB it recommends Qwen3 4B (2.4 GB, Q4_K_M), which takes up about 4 GB of the 8 GB.

Run one on your device.

It's open source and runs from GitHub today. Star the repo to follow along.