Local AI Model Guide

Every model we install, benchmarked on Apple Silicon. Filter by use case, RAM, or model family.

18 models

Llama 3.3 70B

Llama · 70B · 48GB min
Powerful

Meta's flagship instruction-tuned model — the current standard for high-quality local AI on consumer hardware.

Qwen 2.5 72B

Qwen · 72B · 48GB min
Powerful

Alibaba's 72B model with exceptional instruction following and multilingual capability — frequently outperforms Llama 3.3 70B on structured tasks.

Llama 4 Scout 109B MoE

Llama · 109B MoE · 64GB min
Powerful

Meta's next-generation mixture-of-experts model — frontier-class capability at MoE efficiency, requiring 64GB+ RAM.

DeepSeek R1 70B

DeepSeek · 70B · 48GB min
Powerful

A 70B distillation of DeepSeek's R1 reasoning model, which was trained with reinforcement learning; exceptional at math, logic, and structured thinking.

Qwen 3 32B

Qwen · 32B · 24GB min
Balanced

The sweet spot for 24GB RAM configurations — excellent quality without the memory demands of 70B models.

Qwen 2.5-Coder 32B

Qwen · 32B · 24GB min
Balanced

Alibaba's dedicated coding model at 32B parameters; tops most coding benchmarks for its size class across 92+ programming languages.

DeepSeek R1 32B

DeepSeek · 32B · 24GB min
Balanced

The smaller variant of DeepSeek's reasoning model — excellent for math, logic, and structured analysis on 24GB hardware.

Gemma 3 27B

Gemma · 27B · 24GB min
Balanced

Google's open-weight 27B model with multimodal capability: vision plus text in a single 24GB-compatible package.

Mistral Small 24B

Mistral · 24B · 16GB min
Balanced

Mistral AI's efficient 24B model — strong instruction following and business task performance with a low memory footprint.

Qwen3-Coder-Next 80B MoE

Qwen · 80B MoE · 48GB min
Powerful

An 80B mixture-of-experts coding model designed for large codebase analysis, complex refactoring, and architectural reasoning.

Phi-4 14B

Phi · 14B · 16GB min
Fast

Microsoft's research model optimized for STEM and reasoning — punches well above its 14B weight on math and logic.

Qwen 3 14B

Qwen · 14B · 16GB min
Fast

Qwen's efficient 14B model — solid general-purpose chat and instruction following, good balance of speed and capability for 16GB configs.

Gemma 3 12B

Gemma · 12B · 16GB min
Fast

Google's efficient 12B model with good multilingual support and capable everyday performance.

Mistral NeMo 12B

Mistral · 12B · 16GB min
Fast

Co-developed by Mistral and NVIDIA — efficient 12B model with solid chat and basic coding capability.

Qwen 3 8B

Qwen · 8B · 8GB min
Fast

Qwen's fast 8B model — excellent for quick queries and automation tasks where speed matters more than maximum quality.

Llama 3.1 8B

Llama · 8B · 8GB min
Fast

Meta's entry-level instruction model — reliable for basic tasks, fast on any Apple Silicon Mac with 8GB+ RAM.

Gemma 3 4B

Gemma · 4B · 8GB min
Fast

Google's compact 4B model — ultra-fast responses on any Apple Silicon, best for automation triggers and speed-critical tasks.

Llama 3.2 3B

Llama · 3B · 8GB min
Fast

The smallest Llama in this lineup; runs on any Apple Silicon Mac with 8GB+ RAM, responds with very low latency, and is used primarily for automation and routing.
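
For readers who want to shortlist models offline, the RAM, family, and tier filters described at the top of this guide can be approximated in a few lines of Python. This is a minimal sketch, not the site's actual implementation: the `Model` class, the `filter_models` function, and the `CATALOG` list are hypothetical names, and the catalog holds only a subset of the entries above, with values copied from their badges.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    """One catalog entry (hypothetical structure, values from the badges above)."""
    name: str
    family: str
    min_ram_gb: int
    tier: str  # "Powerful", "Balanced", or "Fast"


# Illustrative subset of the guide's listing, not the full catalog.
CATALOG: List[Model] = [
    Model("Llama 3.3 70B", "Llama", 48, "Powerful"),
    Model("Llama 4 Scout 109B MoE", "Llama", 64, "Powerful"),
    Model("Qwen 3 32B", "Qwen", 24, "Balanced"),
    Model("Gemma 3 27B", "Gemma", 24, "Balanced"),
    Model("Mistral Small 24B", "Mistral", 16, "Balanced"),
    Model("Phi-4 14B", "Phi", 16, "Fast"),
    Model("Qwen 3 8B", "Qwen", 8, "Fast"),
    Model("Llama 3.2 3B", "Llama", 8, "Fast"),
]


def filter_models(
    catalog: List[Model],
    ram_gb: Optional[int] = None,
    family: Optional[str] = None,
    tier: Optional[str] = None,
) -> List[Model]:
    """Return entries whose minimum RAM fits ram_gb and that match family/tier.

    A filter left as None is ignored.
    """
    return [
        m
        for m in catalog
        if (ram_gb is None or m.min_ram_gb <= ram_gb)
        and (family is None or m.family == family)
        and (tier is None or m.tier == tier)
    ]


if __name__ == "__main__":
    # Models from the subset that fit a 16GB machine.
    for m in filter_models(CATALOG, ram_gb=16):
        print(f"{m.name} ({m.tier}, {m.min_ram_gb}GB min)")
```

Treating the listed minimum as a hard cutoff is a simplification; a model that only just fits leaves little memory for other applications or long contexts.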

Not sure which models to run?

We select the right set for your hardware and workflow. Book a consultation.
