Skip to main content

Agent Framework

Ollama API

Build your own AI-powered applications

Ollama exposes a local REST API that mirrors OpenAI's API format. If you have an application that currently calls OpenAI's API, you can point it at your local Ollama instance instead — often with just a URL change. Build custom integrations, run AI in your own scripts, or connect any tool that supports an OpenAI-compatible endpoint.

Book a consultation to set up the Ollama API →

What it is

OpenAI-compatible. Local hardware.

Ollama runs a REST API on your Mac that speaks the same language as OpenAI's API. Applications built to call api.openai.com can often be redirected to your local instance by changing one line — the base URL.

This makes local AI accessible to developers and technical users who already have tools that call OpenAI, as well as anyone building new applications who wants AI inference that never touches the cloud.

The Ollama API is installed in every setup. This configuration tier focuses on custom integration work: helping you connect your specific application, script, or tool to your local AI instance.

How it works

Local endpoint, familiar interface

Ollama listens on localhost:11434 by default and exposes endpoints for chat completions, completions, and embeddings — the same endpoints OpenAI uses. Your application sends a request; Ollama routes it to the appropriate model and returns the response.

External access is handled via Tailscale, allowing your applications on other devices to reach the API securely without opening a public port. We configure authentication to prevent unauthorized access.

Who it's for

Developers and technical users building on top of AI

  • Developers building applications that use OpenAI's API and want to self-host
  • Technical users running AI in their own scripts and automation
  • Teams with existing OpenAI integrations they want to move to local hardware
  • Anyone who wants programmatic AI access without per-token costs
  • Developers prototyping AI features without sending data to the cloud

Note

The Ollama API is installed and available in every setup. If you're choosing this tier specifically, it means you want help with the integration work — connecting your application or scripts to your local instance and configuring it for your use case.

Full stack

What gets installed

LayerComponentPurpose
AI EngineOllama (MLX backend)Runs models on Apple Silicon with REST API
API InterfaceOpenAI-compatible endpointDrop-in replacement for OpenAI API calls
Chat UIOpen WebUIBrowser chat, always available
NetworkingTailscaleSecure API access from anywhere
SecurityHardened configLoopback binding, optional API key auth
IntegrationCustom configurationConfigured for your specific integration target

Security

Loopback by default, Tailscale for remote

Ollama binds to loopback by default — the API is not accessible from the network without explicit configuration. Remote access is handled through Tailscale's encrypted tunnel. We configure API key authentication for any external access and limit the API to specific model versions to prevent unauthorized model loading.

Recommended models

Models that pair well

Llama 3.3 70B

GPT-4 equivalent — use as a drop-in replacement for OpenAI's most capable models

Qwen 3 32B

Versatile and strong — works well across a wide range of integration use cases

Qwen 2.5-Coder 32B

Code-focused — ideal if your integration is a coding tool or developer workflow

Ready to build with local AI?

Book a consultation. We'll configure the Ollama API on your Mac and help you connect your application or integration.

Book a consultation to set up the Ollama API →