Local AI Model Guide
Every model we install, benchmarked on Apple Silicon. Filter by use case, RAM, or model family.
| Model | Family | Params | Min RAM | Best for | Speed |
|---|---|---|---|---|---|
| Llama 3.3 70B | Llama | 70B | 48GB | Chat, Reasoning | Powerful |
| Qwen 2.5 72B | Qwen | 72B | 48GB | Chat, Multilingual | Powerful |
| Llama 4 Scout 109B MoE | Llama | 109B MoE | 64GB | Frontier reasoning, Complex chat | Powerful |
| DeepSeek R1 70B | DeepSeek | 70B | 48GB | Reasoning, Math | Powerful |
| Qwen 3 32B | Qwen | 32B | 24GB | Chat, Reasoning | Balanced |
| Qwen 2.5-Coder 32B | Qwen | 32B | 24GB | Coding, Code review | Balanced |
| DeepSeek R1 32B | DeepSeek | 32B | 24GB | Reasoning, Math | Balanced |
| Gemma 3 27B | Gemma | 27B | 24GB | Chat, Document analysis | Balanced |
| Mistral Small 24B | Mistral | 24B | 16GB | Chat, Summarization | Balanced |
| Qwen3-Coder-Next 80B MoE | Qwen | 80B MoE | 48GB | Advanced coding, Large codebases | Powerful |
| Phi-4 14B | Phi | 14B | 16GB | Reasoning, STEM | Fast |
| Qwen 3 14B | Qwen | 14B | 16GB | Chat, General use | Fast |
| Gemma 3 12B | Gemma | 12B | 16GB | Chat, Multilingual | Fast |
| Mistral NeMo 12B | Mistral | 12B | 16GB | Chat, Basic coding | Fast |
| Qwen 3 8B | Qwen | 8B | 8GB | Fast chat, Automation | Fast |
| Llama 3.2 8B | Llama | 8B | 8GB | Fast chat, Entry-level | Fast |
| Gemma 3 4B | Gemma | 4B | 8GB | Ultra-fast, Automation | Fast |
| Llama 3.2 3B | Llama | 3B | 8GB | Fastest local model, Automation | Fast |
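The filtering by use case, RAM, or model family can also be reproduced offline. Below is a minimal Python sketch, assuming a handful of rows copied from the table. The `Model` class, its field names, and the `filter_models` helper are hypothetical names chosen for illustration and are not part of any installer or site API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    # Field names are illustrative assumptions, not an official schema.
    name: str
    family: str
    min_ram_gb: int
    best_for: tuple
    speed: str


# A subset of rows copied from the table; extend with the remaining rows as needed.
MODELS = [
    Model("Llama 3.3 70B", "Llama", 48, ("Chat", "Reasoning"), "Powerful"),
    Model("Llama 4 Scout 109B MoE", "Llama", 64, ("Frontier reasoning", "Complex chat"), "Powerful"),
    Model("Qwen 3 32B", "Qwen", 24, ("Chat", "Reasoning"), "Balanced"),
    Model("Qwen 2.5-Coder 32B", "Qwen", 24, ("Coding", "Code review"), "Balanced"),
    Model("Mistral Small 24B", "Mistral", 16, ("Chat", "Summarization"), "Balanced"),
    Model("Phi-4 14B", "Phi", 16, ("Reasoning", "STEM"), "Fast"),
    Model("Qwen 3 8B", "Qwen", 8, ("Fast chat", "Automation"), "Fast"),
    Model("Llama 3.2 3B", "Llama", 8, ("Fastest local model", "Automation"), "Fast"),
]


def filter_models(models, ram_gb=None, family=None, use_case=None):
    """Return models that fit in ram_gb and match the optional family and use case."""
    result = []
    for m in models:
        # Skip models whose minimum RAM exceeds what the machine has.
        if ram_gb is not None and m.min_ram_gb > ram_gb:
            continue
        if family is not None and m.family != family:
            continue
        if use_case is not None and use_case not in m.best_for:
            continue
        result.append(m)
    return result


if __name__ == "__main__":
    # Example: reasoning-capable models that fit on a 24GB machine.
    for m in filter_models(MODELS, ram_gb=24, use_case="Reasoning"):
        print(f"{m.name} (min {m.min_ram_gb}GB, {m.speed})")
```

Treating Min RAM as a hard upper bound is a simplification. Leaving headroom for the operating system and other apps may call for a stricter cutoff.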
Llama 3.3 70B
Meta's flagship instruction-tuned model — the current standard for high-quality local AI on consumer hardware.
Qwen 2.5 72B
Alibaba's 72B model with exceptional instruction following and multilingual capability — frequently outperforms Llama 3.3 70B on structured tasks.
Llama 4 Scout 109B MoE
Meta's next-generation mixture-of-experts model — frontier-class capability at MoE efficiency, requiring 64GB+ RAM.
DeepSeek R1 70B
DeepSeek's reasoning-focused model trained with reinforcement learning — exceptional at math, logic, and structured thinking.
Qwen 3 32B
The sweet spot for 24GB RAM configurations — excellent quality without the memory demands of 70B models.
Qwen 2.5-Coder 32B
Alibaba's dedicated coding model at 32B parameters, topping most coding benchmarks for its size class across 92+ programming languages.
DeepSeek R1 32B
The smaller variant of DeepSeek's reasoning model — excellent for math, logic, and structured analysis on 24GB hardware.
Gemma 3 27B
Google's open-weight 27B model with multimodal capability: vision plus text in a single 24GB-compatible package.
Mistral Small 24B
Mistral AI's efficient 24B model — strong instruction following and business task performance with a low memory footprint.
Qwen3-Coder-Next 80B MoE
An 80B mixture-of-experts coding model designed for large codebase analysis, complex refactoring, and architectural reasoning.
Phi-4 14B
Microsoft's research model optimized for STEM and reasoning — punches well above its 14B weight on math and logic.
Qwen 3 14B
Qwen's efficient 14B model — solid general-purpose chat and instruction following, good balance of speed and capability for 16GB configs.
Gemma 3 12B
Google's efficient 12B model with good multilingual support and capable everyday performance.
Mistral NeMo 12B
Co-developed by Mistral and NVIDIA — efficient 12B model with solid chat and basic coding capability.
Qwen 3 8B
Qwen's fast 8B model — excellent for quick queries and automation tasks where speed matters more than maximum quality.
Llama 3.2 8B
Meta's entry-level instruction model — reliable for basic tasks, fast on any Apple Silicon Mac with 8GB+ RAM.
Gemma 3 4B
Google's compact 4B model — ultra-fast responses on any Apple Silicon, best for automation triggers and speed-critical tasks.
Llama 3.2 3B
Meta's smallest capable model: runs on any Apple Silicon Mac with 8GB+ RAM and responds near-instantly, used primarily for automation and routing.
Not sure which models to run?
We select the right set for your hardware and workflow. Book a consultation.