Native Apple Silicon | Zero Cloud | Maximum Performance

Your Mac as an
AI Server

Squeeze every drop of AI out of your Mac.
Local LLM inference, background agents, REST API, and CLI tools. Ships as a daemon.

Terminal -- zsh
$ brew install cider
Downloading cider v0.1.0 for arm64-apple-macosx...
Installing daemon via SMAppService...
Done. Cider daemon is running.
$ cider status
Chip: Apple M4 Pro
RAM: 48GB (31GB available for AI)
Model: Llama 3.1 70B Q4 (loaded, 38.2GB)
API: http://localhost:8080
Uptime: 4d 12h 33m
$ cider ask "Explain the CAP theorem"
The CAP theorem states that a distributed system
can only guarantee two of three properties...
$ curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"hello"}]}'
{"choices":[{"message":{"content":"Hello! How can I help?"}}]}

Three ways in. One daemon.


CLI

$ cider ask "Refactor this function"
$ cider agent run summarize-inbox
$ cider models list
$ cider models pull qwen2.5-72b

Pipe stdin, chain with unix tools, script everything.
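For example (a usage sketch: it assumes `cider ask` reads piped stdin as additional context, which the commands above suggest but don't show):

```shell
# Hypothetical usage -- assumes `cider ask` appends stdin to the prompt.
git diff | cider ask "Write a commit message for this diff"

# Chain with standard macOS tools: proofread whatever is on the clipboard.
pbpaste | cider ask "Proofread this text"
```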

UI

Menu Bar

Option+Space  -> Quick chat
Click icon    -> Full window
Agent panel   -> Task monitor
Models tab    -> Download/swap

SwiftUI native. When you want a GUI instead.

Built for developers.
Runs like infrastructure.

Always-On Daemon

launchd-managed via SMAppService. Survives sleep, logout, restarts. Model stays loaded in unified memory. Sub-100ms cold start.

Apple Silicon Native

Apple's own ML framework. Direct Metal GPU access. 5-10x faster than llama.cpp on the same hardware. No Python, no GGUF conversion.

Background Agents

Define agent tasks in YAML. Schedule with cron. ReAct loop with shell, file, and web tools. Queue overnight batch jobs.
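A task definition might look like the sketch below; the exact schema (field names like `schedule`, `tools`, `prompt`) is an assumption, not documented above, though the tool names match the agent API example on this page:

```yaml
# summarize-inbox.yaml -- hypothetical schema; field names are assumptions
name: summarize-inbox
schedule: "0 6 * * *"          # cron syntax: every day at 06:00
model: auto
tools: [file_read, file_write]
prompt: |
  Read new messages in ~/Mail/inbox and write a one-paragraph
  summary to ~/Documents/inbox-summary.md.
```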

OpenAI-Compatible API

Same endpoints, same JSON schema. Point your existing code at localhost:8080. Works with LangChain, LlamaIndex, Vercel AI SDK.

Model Management

Pull from HuggingFace. Auto-detect hardware tier. Hot-swap models without restart. Run multiple models concurrently on high-RAM machines.

MCP Server

Model Context Protocol built in. Connect Cider to Claude, Cursor, Windsurf, or any MCP client. Your local models, their interface.
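For Claude Desktop, that hookup would look roughly like this entry in `claude_desktop_config.json` (the `mcp serve` subcommand is an assumption; check `cider --help` for the real invocation):

```json
{
  "mcpServers": {
    "cider": {
      "command": "cider",
      "args": ["mcp", "serve"]
    }
  }
}
```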

OpenAI-compatible. Zero config.

Python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # local, no auth
)

response = client.chat.completions.create(
    model="auto",  # picks best loaded model
    messages=[
        {"role": "user", "content": "Explain transformers"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:  # final chunk's delta is empty
        print(chunk.choices[0].delta.content, end="", flush=True)
curl
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Explain transformers"}
    ],
    "stream": true
  }'
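With `"stream": true`, the response arrives as server-sent events, one `data: <json>` line per token chunk, terminated by `data: [DONE]`. A minimal stdlib-only sketch of assembling the text (the sample lines are illustrative, not captured from a real Cider response):

```python
import json

# Illustrative SSE lines in the OpenAI-compatible streaming format.
sample_stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]

def collect_content(lines):
    """Concatenate the delta.content fields from SSE data lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank separators
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

print(collect_content(sample_stream))  # -> Hello!
```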
Agent Task
curl http://localhost:8080/v1/agents/run \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Summarize all PDFs in ~/Documents",
    "tools": ["file_read", "file_write"],
    "model": "auto",
    "notify": true
  }'

What can your Mac run?

Hardware         Best Model           Speed       Use Case
M1/M2 8GB        Phi-4 Mini (3.8B)    ~30 tok/s   Quick chats, code completion
M2/M3 16GB       Llama 3.1 8B         ~45 tok/s   General assistant, writing
M3/M4 Pro 24GB   Qwen 2.5 14B         ~35 tok/s   Complex reasoning, agents
M4 Pro 48GB      Llama 3.1 70B Q4     ~20 tok/s   Near-GPT-4 quality, local
M2 Ultra 192GB   Llama 3.1 405B Q4    ~8 tok/s    Frontier-class, fully local
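The table reduces to a RAM-threshold lookup, which is the kind of mapping the "hardware auto-detection" feature implies. A sketch (thresholds and model names taken from the table; the function itself is hypothetical, not Cider's actual logic):

```python
# Hypothetical tier picker mirroring the hardware table above.
TIERS = [  # (min unified memory in GB, suggested model)
    (192, "Llama 3.1 405B Q4"),
    (48,  "Llama 3.1 70B Q4"),
    (24,  "Qwen 2.5 14B"),
    (16,  "Llama 3.1 8B"),
    (8,   "Phi-4 Mini (3.8B)"),
]

def suggest_model(ram_gb: int) -> str:
    """Return the largest model tier that fits the given unified memory."""
    for min_ram, model in TIERS:
        if ram_gb >= min_ram:
            return model
    return "Phi-4 Mini (3.8B)"  # smallest tier as the fallback

print(suggest_model(48))  # -> Llama 3.1 70B Q4
```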

Pricing

Pay for cloud features. Local is always free.

Open

$0/forever
  • Unlimited local inference
  • CLI + API + Menu Bar
  • All supported models
  • OpenAI-compatible endpoints
  • Hardware auto-detection
Get Started