Squeeze every drop of AI out of your Mac.
Local LLM inference, background agents, REST API, and CLI tools. Ships as a daemon.
$ cider ask "Refactor this function"
$ cider agent run summarize-inbox
$ cider models list
$ cider models pull qwen2.5-72b
Pipe stdin, chain with unix tools, script everything.
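For example — a sketch assuming `cider ask` reads piped input as context (illustrative invocations, not confirmed syntax):

```shell
# Review a diff with a local model and save the result
git diff | cider ask "Review this diff for bugs" > review.md

# Chain with standard unix tools: filter first, then summarize
grep ERROR server.log | cider ask "Summarize these errors"
```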
POST /v1/chat/completions
POST /v1/embeddings
POST /v1/agents/run
GET /v1/models
GET /v1/status
OpenAI-compatible. Drop-in replacement for any SDK.
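The embeddings endpoint, for instance, should accept the standard OpenAI request shape. A minimal sketch using only the Python standard library (payload fields assumed from the OpenAI spec):

```python
import json
import urllib.request

# OpenAI-style embeddings payload (field names assumed from the OpenAI API)
payload = {"model": "auto", "input": ["local inference on Apple silicon"]}

req = urllib.request.Request(
    "http://localhost:8080/v1/embeddings",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# With the daemon running, send it and pull out the vector:
# vec = json.load(urllib.request.urlopen(req))["data"][0]["embedding"]
```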
Option+Space -> Quick chat
Click icon -> Full window
Agent panel -> Task monitor
Models tab -> Download/swap
SwiftUI native. When you want a GUI instead.
launchd-managed via SMAppService. Survives sleep, logout, restarts. Model stays loaded in unified memory. Sub-100ms cold start.
Apple's own ML framework. Direct Metal GPU access. 5-10x faster than llama.cpp on the same hardware. No Python, no GGUF conversion.
Define agent tasks in YAML. Schedule with cron. ReAct loop with shell, file, and web tools. Queue overnight batch jobs.
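A task definition might look like this — a sketch only; the file path and schema are illustrative, not confirmed:

```yaml
# ~/.cider/agents/summarize-inbox.yaml (hypothetical path and schema)
name: summarize-inbox
model: auto
schedule: "0 7 * * *"   # cron: every day at 07:00
tools: [shell, file_read, file_write]
task: |
  List unread mail subjects via the shell tool and write a
  one-paragraph digest to ~/Desktop/inbox-digest.md
```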
Same endpoints, same JSON schema. Point your existing code at localhost:8080. Works with LangChain, LlamaIndex, Vercel AI SDK.
Pull from HuggingFace. Auto-detect hardware tier. Hot-swap models without restart. Run multiple models concurrently on high-RAM machines.
Model Context Protocol built in. Connect Cider to Claude, Cursor, Windsurf, or any MCP client. Your local models, their interface.
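For an MCP client that takes a JSON server config (Claude Desktop's `mcpServers` format shown; the `cider mcp serve` command and its args are assumptions):

```json
{
  "mcpServers": {
    "cider": {
      "command": "cider",
      "args": ["mcp", "serve"]
    }
  }
}
```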
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",  # local, no auth
)

response = client.chat.completions.create(
    model="auto",  # picks best loaded model
    messages=[
        {"role": "user", "content": "Explain transformers"}
    ],
    stream=True,
)

for chunk in response:
    # delta.content is None on some stream chunks; guard before printing
    print(chunk.choices[0].delta.content or "", end="")
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Explain transformers"}
    ],
    "stream": true
  }'
curl http://localhost:8080/v1/agents/run \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Summarize all PDFs in ~/Documents",
    "tools": ["file_read", "file_write"],
    "model": "auto",
    "notify": true
  }'
| Hardware | Best Model | Speed | Use Case |
|---|---|---|---|
| M1/M2 8GB | Phi-4 Mini (3.8B) | ~30 tok/s | Quick chats, code completion |
| M2/M3 16GB | Llama 3.1 8B | ~45 tok/s | General assistant, writing |
| M3/M4 Pro 24GB | Qwen 2.5 14B | ~35 tok/s | Complex reasoning, agents |
| M4 Pro 48GB | Llama 3.1 70B Q4 | ~20 tok/s | Near-GPT-4 quality, local |
| M4 Ultra 192GB | Llama 3.1 405B Q4 | ~8 tok/s | Frontier-class, fully local |
Pay for cloud features. Local is always free.
Get notified when Cider launches. No spam.
Free tier is permanent. No credit card required.