OpenClaw Slow Inference? Why 3.5s/token Is Normal (And How to Fix It)
OpenClaw generating at 3.5 seconds per token? That's painfully slow. Learn why Mac RAM bandwidth kills inference speed, and see the config that gets you to 100 t/s.
TL;DR: The Fix
OpenClaw feels slow because your hardware can't move data fast enough. A MacBook M2 gets ~3 t/s. An RTX 4090 gets ~80 t/s. A cloud A100 gets ~100-120 t/s.
Quick Fix #1 (Reduce Context):
{ "num_ctx": 2048, "n_gpu_layers": 35 }Quick Fix #2 (Use Cloud):
Stop debugging physics. Deploy on Vultr (H100/A100 Ready) (High Availability & Limited Time Promotion for new accounts): rent an A100 by the hour (~$1.50/hr) and get 100+ t/s.
The Log: What "Slow" Actually Looks Like
I ran OpenClaw with DeepSeek R1 8B on a MacBook Air M2 (16GB RAM). Here's the actual log:
[2026-02-04 09:23:11] INFO: Model loaded: deepseek-r1:8b (Q4_K_M)
[2026-02-04 09:23:11] INFO: Starting inference...
[2026-02-04 09:23:12] INFO: Token 1 generated
[2026-02-04 09:23:15] INFO: Token 2 generated
[2026-02-04 09:23:18] INFO: Token 3 generated
...
[2026-02-04 09:25:47] INFO: Token 50 generated
[STATS]
eval time = 3450.22 ms / token
tokens per second = 0.29
load time = 12.3 seconds
Read that again: 0.29 tokens per second. A 100-token response took almost 6 minutes. This is not "working" — this is broken.
The Physics: Why It's Slow
Why MacBooks Are Terrible for Inference
Your MacBook's unified memory sounds great on paper. "16GB shared between CPU and GPU!" But for inference, it's a bottleneck:
| Hardware | Memory Bandwidth | Real-World Speed |
|---|---|---|
| MacBook Air M2 (16GB) | ~100 GB/s | 0.3 - 3 t/s |
| MacBook Pro M2 Max (32GB) | ~400 GB/s | 8 - 15 t/s |
| RTX 3090 (24GB VRAM) | ~936 GB/s | 45 - 60 t/s |
| RTX 4090 (24GB VRAM) | ~1,008 GB/s | 70 - 90 t/s |
| A100 (40GB VRAM) | ~1,555 GB/s | 100 - 120 t/s |
The math: Inference speed is limited by memory bandwidth. The model weights need to be read for every token generated. If your RAM can only push 100 GB/s, you're going to be slow.
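To put rough numbers on it: decode speed is bounded above by bandwidth divided by model size, because the weights stream through memory once per token. Here's a back-of-envelope sketch (the ~5 GB figure for an 8B Q4_K_M model is an approximation, and real-world throughput lands well below this ceiling once compute, KV-cache reads, and memory pressure are factored in):
# Rough upper bound: every generated token streams the full set of model
# weights through memory, so tokens/sec <= bandwidth / model size.
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 5.0  # assumed size of an 8B model at Q4_K_M
for name, bw in [("MacBook Air M2", 100), ("RTX 4090", 1008), ("A100 40GB", 1555)]:
    print(f"{name}: ceiling ~{max_tokens_per_second(bw, MODEL_GB):.0f} t/s")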
Why GPU VRAM Is Different
Dedicated GPU memory (GDDR6X, HBM2e) has 10-20x the bandwidth of system RAM. That's why:
- An RTX 4090 with 24GB VRAM is roughly 5-10x faster in practice than a MacBook M2 Max with 32GB unified memory (see the table above)
- Bandwidth matters more than capacity for inference
The Fix: Config Tweaks (For Local Hardware)
If you're stuck with local hardware, squeeze every drop of performance:
Fix #1: Reduce Context Window
{
"num_ctx": 2048
}
Why it works: Smaller context = less memory to read per token = faster inference.
Trade-off: You lose conversation history. OpenClaw will "forget" earlier messages.
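Where you set num_ctx depends on your stack. If OpenClaw sits on top of a local Ollama server (as the ollama run command under Fix #3 suggests), you can pass it per request through Ollama's options field. A minimal sketch, assuming the default endpoint on localhost:11434:
import requests

# Generation request to a local Ollama server with a reduced context window.
# The "options" keys are the same ones used in the JSON config above.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:8b",
        "prompt": "Summarize the tradeoffs of a 2048-token context in one sentence.",
        "stream": False,
        "options": {"num_ctx": 2048},
    },
    timeout=600,
)
print(resp.json()["response"])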
Fix #2: Increase GPU Layer Offloading
{
"n_gpu_layers": 35
}
Why it works: every layer offloaded to the GPU reads its weights from fast VRAM; any layers left behind run from system RAM on the CPU and become the bottleneck.
Trade-off: If you don't have enough VRAM, this will crash with OOM errors. See our CUDA OOM Fix Guide.
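If you're unsure what value fits your card, a rough heuristic is to divide your free VRAM (minus some headroom for the KV cache) by the per-layer size of the quantized model. The layer count and model size below are illustrative assumptions; substitute your own model's figures:
# Heuristic: layers that fit ~= (free VRAM - headroom) / (model size / layer count).
def estimate_gpu_layers(free_vram_gb: float, model_size_gb: float,
                        total_layers: int, headroom_gb: float = 1.5) -> int:
    per_layer_gb = model_size_gb / total_layers  # average weight size per layer
    usable_gb = max(free_vram_gb - headroom_gb, 0)  # leave room for KV cache
    return min(total_layers, int(usable_gb / per_layer_gb))

# Example: ~5 GB quantized model with 32 layers on a GPU with 6 GB free.
print(estimate_gpu_layers(free_vram_gb=6, model_size_gb=5.0, total_layers=32))  # -> 28
Start at the estimate, then nudge it up or down until you stop hitting OOM errors.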
Fix #3: Use Quantized Models
# Use Q4_K_M instead of Q5 or Q6
ollama run deepseek-r1:8b-q4_K_M
Why it works: Smaller model = less memory bandwidth needed.
Trade-off: Output quality drops. The model makes more mistakes.
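Because decoding is bandwidth-bound, the expected speedup from a lower quant is roughly the ratio of the model sizes, which you can estimate from bits per weight alone. A sketch with approximate bits-per-weight figures:
# Approximate model size in GB = parameters (billions) * bits per weight / 8.
def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8

q6 = approx_size_gb(8, 6.6)  # Q6_K is roughly 6.6 bits/weight
q4 = approx_size_gb(8, 4.8)  # Q4_K_M is roughly 4.8 bits/weight
print(f"Q6_K ~{q6:.1f} GB, Q4_K_M ~{q4:.1f} GB, expected speedup ~{q6 / q4:.2f}x")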
Complete Config Example
{
"num_ctx": 2048,
"n_gpu_layers": 35,
"num_batch": 512,
"num_thread": 8
}
Expected results:
- MacBook M2: ~3-5 t/s (still slow)
- RTX 3090: ~50-70 t/s (usable)
- RTX 4090: ~80-100 t/s (fast)
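To see where your machine actually lands, time a fixed prompt before and after applying the config. A minimal sketch against a local Ollama server (same assumption as under Fix #1; eval_count and eval_duration are fields Ollama returns in non-streaming responses):
import requests

# Measure decode speed from Ollama's own timing fields: eval_count is tokens
# generated, eval_duration is decode time in nanoseconds.
def benchmark(model: str, num_ctx: int = 2048) -> float:
    data = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": "Explain memory bandwidth in 200 words.",
            "stream": False,
            "options": {"num_ctx": num_ctx},
        },
        timeout=600,
    ).json()
    return data["eval_count"] / (data["eval_duration"] / 1e9)

print(f"{benchmark('deepseek-r1:8b-q4_K_M'):.1f} t/s")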
When Local Optimization Isn't Enough
The Hard Truth
I spent 2 weeks "optimizing" my OpenClaw setup on a MacBook Air M2. I:
- Tried every llama.cpp flag
- Switched between Q4, Q5, Q6 quantizations
- Closed every app to free RAM
- Overclocked my CPU (killed battery life)
Final result: 3.2 tokens/second. Still unusable.
The "Instant" Fix: Cloud H100
Local hardware has limits. If you need 100+ tokens/sec for production:
👉 Deploy on Vultr (H100/A100 Ready) (High Availability & Limited Time Promotion for new accounts)
| Cloud GPU | Hourly Cost | Tokens/sec | Break-Even vs Your Time |
|---|---|---|---|
| RTX 4090 | ~$0.80/hr | 80 t/s | Worth it for any serious work |
| A100 40GB | ~$1.50/hr | 100-120 t/s | Cheaper than your hourly rate debugging |
| H100 80GB | ~$3.00/hr | 150+ t/s | Overkill for most, but fun |
My recommendation: Start with an RTX 4090 equivalent. It's 20x faster than your MacBook and costs less per hour than a coffee.
Common Failure Modes
"It Works But It's Incredibly Slow"
Diagnosis: You're on CPU-only inference (Mac or low-VRAM GPU).
Check:
# Check if GPU is being used
nvidia-smi # Linux/Windows
# macOS: check Activity Monitor → Window → GPU History, or trust the benchmarks above
Fix: Move to a GPU server or accept that it will be slow.
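If nvidia-smi is available, you can confirm the GPU is actually doing the work by sampling utilization while a generation runs. A small sketch using nvidia-smi's query flags; run it in a second terminal mid-generation:
import subprocess

# Sample GPU utilization and VRAM via nvidia-smi's CSV query output.
# Near-0% utilization while tokens are being generated means CPU fallback.
out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=utilization.gpu,memory.used,memory.total",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout.strip()
for i, line in enumerate(out.splitlines()):
    util, used, total = [part.strip() for part in line.split(",")]
    print(f"GPU {i}: {util}% busy, {used}/{total} MiB VRAM in use")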
"Sometimes It's Fast, Sometimes It's Slow"
Diagnosis: You're hitting thermal throttling (laptop) or memory pressure (too many apps open).
Fix:
- Close Chrome, Slack, and other RAM-hungry apps (see the memory-pressure check below)
- Use a cooling pad (laptops)
- Reduce context window
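Before blaming the model, it's worth checking whether the weights even fit in available RAM, since swapping is what makes throughput collapse intermittently. A quick sketch using psutil (the 5 GB model size is an assumed figure for an 8B Q4_K_M model):
import psutil

MODEL_SIZE_GB = 5.0  # assumed size of an 8B Q4_K_M model

# If the weights don't fit in *available* RAM, the OS swaps pages to disk
# mid-generation and token speed collapses intermittently.
available_gb = psutil.virtual_memory().available / 1e9
if available_gb < MODEL_SIZE_GB * 1.2:  # ~20% headroom for KV cache and OS
    print(f"Only {available_gb:.1f} GB free: close other apps before running OpenClaw.")
else:
    print(f"{available_gb:.1f} GB free: memory pressure probably isn't your problem.")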
Complete Working Example
Here's a complete OpenClaw config optimized for speed:
# openclaw_config.py
from openclaw import Client
# Use a smaller, faster model
client = Client(model="deepseek-r1:8b-q4_K_M")
# Aggressive speed settings
client.context_window = 2048
client.num_gpu_layers = 35
client.num_batch = 512
# For production: use a cloud GPU
# client = Client(
# model="deepseek-r1:32b",
# base_url="https://api.openclaw-cloud.com/v1", # Example
# api_key="your-key-here"
# )
response = client.generate("Your prompt here")
print(f"Generated {len(response.tokens)} tokens in {response.duration}s")
print(f"Speed: {response.tokens_per_second} t/s")
FAQ
Q: Why is my OpenClaw so slow on Mac?
A: Macs use unified memory with ~100-400 GB/s bandwidth. Inference needs to read model weights for every token. GPU VRAM has ~1,000-2,000 GB/s bandwidth. The math doesn't lie: your Mac is 10-20x slower than a dedicated GPU.
Q: Will more RAM help OpenClaw speed?
A: No. RAM capacity (16GB vs 64GB) doesn't matter for speed. Bandwidth does. A 24GB RTX 4090 is faster than a 128GB MacBook Pro because its memory bandwidth is far higher (~1,000 GB/s vs ~100-400 GB/s).
Q: Is 3 tokens per second normal for OpenClaw?
A: It's "normal" for a MacBook or CPU-only inference. But it's not usable for real work. You need at least 20+ t/s for interactive chat, 50+ t/s for agent loops. Rent a GPU if you need speed.
Q: Can I get 100+ t/s locally?
A: Only with an RTX 4090 or better. And even then, only with smaller models (8B). For 32B or larger models, you need cloud GPUs (A100/H100). Local optimization has limits.
Related Fixes
- How to Fix OpenClaw OOM Errors - VRAM optimization tips
- How to Fix OpenClaw JSON Parsing Errors - DeepSeek thinking tags break JSON mode
- Running OpenClaw with DeepSeek R1: The Complete Guide - Setup and configuration
Need 100+ t/s? Stop debugging Mac bandwidth limits. Deploy on Vultr (H100/A100 Ready) (High Availability & Limited Time Promotion for new accounts): rent an A100 by the hour and get 100-120 t/s.
Still Stuck? Check Your Hardware
Sometimes the code is fine, but the GPU is simply refusing to cooperate. Before you waste another hour debugging, compare your specs against the Hardware Reality Table to see if you are fighting impossible physics.