Native Apple Silicon | Zero Cloud | Maximum Performance

Your Mac as an
AI Server

Squeeze every drop of AI out of your Mac.
Local LLM inference, background agents, REST API, and CLI tools. Ships as a daemon.

Terminal -- zsh
$ brew install cider
Downloading cider v0.1.0 for arm64-apple-macosx...
Installing daemon via SMAppService...
Done. Cider daemon is running.
$ cider status
Chip: Apple M4 Pro
RAM: 48GB (31GB available for AI)
Model: Llama 3.1 70B Q4 (loaded, 38.2GB)
API: http://localhost:8080
Uptime: 4d 12h 33m
$ cider ask "Explain the CAP theorem"
The CAP theorem states that a distributed system
can only guarantee two of three properties...
$ curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"hello"}]}'
{"choices":[{"message":{"content":"Hello! How can I help?"}}]}

Three ways in. One daemon.


CLI

$ cider ask "Refactor this function"
$ cider agent run summarize-inbox
$ cider models list
$ cider models pull qwen2.5-72b

Pipe stdin, chain with unix tools, script everything.
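For example (a usage sketch: it assumes `cider ask` reads piped stdin as additional context, which the commands above suggest but don't show):

```shell
# Hypothetical usage -- assumes `cider ask` appends stdin to the prompt.
git diff | cider ask "Write a commit message for this diff"

# Chain with standard macOS tools: proofread whatever is on the clipboard.
pbpaste | cider ask "Proofread this text"
```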

UI

Menu Bar

Option+Space  -> Quick chat
Click icon    -> Full window
Agent panel   -> Task monitor
Models tab    -> Download/swap

SwiftUI native. When you want a GUI instead.

Built for developers.
Runs like infrastructure.

Always-On Daemon

launchd-managed via SMAppService. Survives sleep, logout, restarts. Model stays loaded in unified memory. Sub-100ms cold start.

Apple Silicon Native

Apple's own ML framework. Direct Metal GPU access. 5-10x faster than llama.cpp on the same hardware. No Python, no GGUF conversion.

Background Agents

Define agent tasks in YAML. Schedule with cron. ReAct loop with shell, file, and web tools. Queue overnight batch jobs.
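A task definition might look like the sketch below; the exact schema (field names like `schedule`, `tools`, `prompt`) is an assumption, not documented above, though the tool names match the agent API example on this page:

```yaml
# summarize-inbox.yaml -- hypothetical schema; field names are assumptions
name: summarize-inbox
schedule: "0 6 * * *"          # cron syntax: every day at 06:00
model: auto
tools: [file_read, file_write]
prompt: |
  Read new messages in ~/Mail/inbox and write a one-paragraph
  summary to ~/Documents/inbox-summary.md.
```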

OpenAI-Compatible API

Same endpoints, same JSON schema. Point your existing code at localhost:8080. Works with LangChain, LlamaIndex, Vercel AI SDK.

Model Management

Pull from HuggingFace. Auto-detect hardware tier. Hot-swap models without restart. Run multiple models concurrently on high-RAM machines.

MCP Server

Model Context Protocol built in. Connect Cider to Claude, Cursor, Windsurf, or any MCP client. Your local models, their interface.
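For Claude Desktop, that hookup would look roughly like this entry in `claude_desktop_config.json` (the `mcp serve` subcommand is an assumption; check `cider --help` for the real invocation):

```json
{
  "mcpServers": {
    "cider": {
      "command": "cider",
      "args": ["mcp", "serve"]
    }
  }
}
```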

OpenAI-compatible. Zero config.

Python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # local, no auth
)

response = client.chat.completions.create(
    model="auto",  # picks best loaded model
    messages=[
        {"role": "user", "content": "Explain transformers"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:  # final chunk's delta is empty
        print(chunk.choices[0].delta.content, end="", flush=True)
curl
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Explain transformers"}
    ],
    "stream": true
  }'
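With `"stream": true`, the response arrives as server-sent events, one `data: <json>` line per token chunk, terminated by `data: [DONE]`. A minimal stdlib-only sketch of assembling the text (the sample lines are illustrative, not captured from a real Cider response):

```python
import json

# Illustrative SSE lines in the OpenAI-compatible streaming format.
sample_stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]

def collect_content(lines):
    """Concatenate the delta.content fields from SSE data lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank separators
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

print(collect_content(sample_stream))  # -> Hello!
```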
Agent Task
curl http://localhost:8080/v1/agents/run \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Summarize all PDFs in ~/Documents",
    "tools": ["file_read", "file_write"],
    "model": "auto",
    "notify": true
  }'

What can your Mac run?

Hardware         Best Model           Speed       Use Case
M1/M2 8GB        Phi-4 Mini (3.8B)    ~30 tok/s   Quick chats, code completion
M2/M3 16GB       Llama 3.1 8B         ~45 tok/s   General assistant, writing
M3/M4 Pro 24GB   Qwen 2.5 14B         ~35 tok/s   Complex reasoning, agents
M4 Pro 48GB      Llama 3.1 70B Q4     ~20 tok/s   Near-GPT-4 quality, local
M2 Ultra 192GB   Llama 3.1 405B Q4    ~8 tok/s    Frontier-class, fully local
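The table reduces to a RAM-threshold lookup, which is the kind of mapping the "hardware auto-detection" feature implies. A sketch (thresholds and model names taken from the table; the function itself is hypothetical, not Cider's actual logic):

```python
# Hypothetical tier picker mirroring the hardware table above.
TIERS = [  # (min unified memory in GB, suggested model)
    (192, "Llama 3.1 405B Q4"),
    (48,  "Llama 3.1 70B Q4"),
    (24,  "Qwen 2.5 14B"),
    (16,  "Llama 3.1 8B"),
    (8,   "Phi-4 Mini (3.8B)"),
]

def suggest_model(ram_gb: int) -> str:
    """Return the largest model tier that fits the given unified memory."""
    for min_ram, model in TIERS:
        if ram_gb >= min_ram:
            return model
    return "Phi-4 Mini (3.8B)"  # smallest tier as the fallback

print(suggest_model(48))  # -> Llama 3.1 70B Q4
```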

Pricing

Pay for cloud features. Local is always free.

Open

$0/forever
  • Unlimited local inference
  • CLI + API + Menu Bar
  • All supported models
  • OpenAI-compatible endpoints
  • Hardware auto-detection
Get Started