Troubleshooting • 2026-02-03

Fix OpenClaw CUDA OOM: The $0.50 Solution vs. The 4-Hour Debug

Stop fighting VRAM physics. Copy my config to fix OOM on RTX 3090, or see why renting an H100 is cheaper than your hourly rate.

By: LazyDev
#CUDA #OOM #DeepSeek #OpenClaw #VRAM #Troubleshooting

Fix OpenClaw CUDA Out of Memory Errors

Error Confirmation

Error: CUDA out of memory. Tried to allocate 2.5GiB
  (GPU 0: NVIDIA GeForce RTX 3080; 10GiB total capacity;
   8.2GiB already allocated; 1.5GiB free; 9.7GiB reserved)

Stack trace:
  at /pytorch/aten/src/ATen/cuda/CUDAGraphs.cuh:287
  at openclaw/runtime/gpu_allocator.py:142
  at Model.load_weights (/lib/model_loader.py:89)

Or the raw PyTorch error:

torch.cuda.OutOfMemoryError: CUDA out of memory.
Tried to allocate 2.50 GiB. GPU 0 has a total capacity of 10.00 GiB
of which 14.20 MiB is free. Process included: PID 2812 (python3)
- using 9.98 GiB.

Scope: OpenClaw crashes when the GPU runs out of VRAM (Video RAM). This is not a software bug — it's a hardware constraint. The model requires more memory than your GPU has available.

Error Code: CUDA out of memory — GPU VRAM is a fixed physical resource. When the model + KV cache exceeds available VRAM, CUDA cannot allocate more and the process crashes.
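
You can confirm these numbers from inside Python instead of waiting for the crash. A minimal sketch, assuming only that PyTorch with CUDA support is installed; it prints the same total/allocated/reserved figures the error message quotes:

# vram_report.py - print the VRAM figures the OOM error reports
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA device visible - this is not a CUDA OOM problem")

props = torch.cuda.get_device_properties(0)
gib = 1024 ** 3
print(f"{props.name}: {props.total_memory / gib:.1f} GiB total, "
      f"{torch.cuda.memory_allocated(0) / gib:.1f} GiB allocated, "
      f"{torch.cuda.memory_reserved(0) / gib:.1f} GiB reserved")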

Verified Environment

| Component | Version | Last Verified |
|---|---|---|
| OpenClaw | Latest stable | 2026-02-06 |
| CUDA | 11.8+, 12.x | 2026-02-06 |
| PyTorch | 2.0+ | 2026-02-06 |
| NVIDIA Driver | 525+ | 2026-02-06 |
| Models | DeepSeek R1 8B, 32B, 70B | 2026-02-06 |

VRAM Requirements (FP16):

| Model | VRAM Required (FP16) | VRAM Required (Q4) |
|---|---|---|
| DeepSeek R1 8B | ~16GB | ~6GB |
| DeepSeek R1 32B | ~64GB | ~20GB |
| DeepSeek R1 70B | ~140GB | ~42GB |

Note: These are minimums. KV cache, conversation history, and other processes add to VRAM usage.
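
These figures follow from simple arithmetic: weight memory is roughly parameter count × bytes per parameter (2 bytes at FP16, around 0.6 bytes at Q4_K_M), before KV cache and runtime overhead are added. A back-of-the-envelope sketch, not a measurement from the actual loader:

# Rough weight-memory estimate; the table above adds KV cache and allocator overhead.
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param  # 1B params at 1 byte/param ~= 1 GB

for name, params in [("8B", 8), ("32B", 32), ("70B", 70)]:
    print(f"DeepSeek R1 {name}: ~{weights_gb(params, 2.0):.0f} GB FP16, "
          f"~{weights_gb(params, 0.6):.0f} GB Q4 (weights only)")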


3-Minute Sanity Check

Run these commands to confirm VRAM capacity and usage:

# 1. Check your GPU VRAM
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader,nounits
# Expected: e.g., "RTX 3080,10240,2048" (name,total MB,free MB)

# 2. Check current VRAM usage while OpenClaw runs
nvidia-smi
# Look at "Memory-Usage" column - if near 100%, you're OOM

# 3. Check what model you're trying to run
ollama list | grep deepseek
# Expected: Shows installed DeepSeek R1 variants

# 4. Calculate required VRAM
# For DeepSeek R1 32B Q4: ~20GB minimum
# For DeepSeek R1 8B Q4: ~6GB minimum
python3 -c "print(f'Required: {int(20 * 1024)} MB for 32B Q4')"

If step 1 shows < 6GB free: You cannot run DeepSeek R1 8B even at Q4 without offloading layers to the CPU.

If step 1 shows < 20GB free: You cannot run DeepSeek R1 32B even at Q4 without extreme measures.

If your GPU is integrated (Intel/AMD): This is not CUDA OOM. You're running on the CPU or another non-CUDA backend, which is much slower. Check whether you're actually using an NVIDIA GPU.
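
To turn steps 1 and 4 into a single pass/fail answer, you can parse the nvidia-smi output and compare it against the Q4 minimums above. A sketch; the thresholds are the approximate minimums from the VRAM table, and only the first GPU is checked:

# fits_check.py - does the Q4 model fit in the VRAM that is currently free?
import subprocess

REQUIRED_MB = {"DeepSeek R1 8B (Q4)": 6 * 1024, "DeepSeek R1 32B (Q4)": 20 * 1024}

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)
free_mb = int(out.stdout.strip().splitlines()[0])  # first GPU only

for model, need_mb in REQUIRED_MB.items():
    verdict = "fits" if free_mb >= need_mb else "will OOM"
    print(f"{model}: needs ~{need_mb} MB, {free_mb} MB free -> {verdict}")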


Decision Gate

Stop fighting VRAM physics.

Should you keep debugging CUDA OOM locally?

Continue local debugging only if:

  • Your GPU has the minimum VRAM for your target model (6GB for 8B Q4, 20GB for 32B Q4)
  • You have not tried quantization (Q4_K_M, Q5_K_M) yet
  • You have not tried reducing the context window (--num_ctx 4096 or lower)
  • You are willing to accept reduced model quality for local execution

Stop here if any apply:

  • Your GPU VRAM is mathematically insufficient for the model (e.g., 8GB VRAM for a 32B model)
  • You already quantized (Q4) and reduced context, and still hit OOM
  • You have spent more than ~30 minutes on VRAM optimization
  • You need production reliability (consumer GPUs are not designed for 24/7 AI workloads)

Past this point, you are fighting physics, not software. VRAM is a fixed resource. No config change can make 8GB physically hold 20GB.


Primary Exit Path: Local Optimization

Use when: Your GPU has sufficient VRAM for the model, or you're willing to use a smaller model.

Why this works:

  • Quantization reduces VRAM usage by 60-75%
  • Context reduction limits KV cache growth
  • Model selection matches VRAM to requirements

Time investment: 10-15 minutes

Solution 1: Use a Quantized Model (Recommended)

Quantized models use fewer bits per parameter, dramatically reducing VRAM requirements.

# Check available quantized versions
ollama list | grep deepseek

# Run Q4 quantized version (uses ~60% less VRAM)
ollama run deepseek-r1:32b-q4_K_M

# Or use the 8B model instead (much lower VRAM)
ollama run deepseek-r1:8b-q4_K_M

# Configure OpenClaw to use quantized model
export OPENCLAW_MODEL="deepseek-r1:32b-q4_K_M"
openclaw serve

Solution 2: Reduce Context Window

Limiting the context window reduces KV cache size and prevents Agent Loop OOM.

# Default is usually 16384, cut to 4096 or 2048
ollama run deepseek-r1:32b-q4_K_M \
  --num_ctx 4096 \
  --num-gpu 99 \
  --repeat-penalty 1.1   # penalty values below 1 encourage repetition
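
To put numbers on it: the KV cache grows roughly as 2 (K and V) × layers × context tokens × KV heads × head dimension × bytes per element, so it scales linearly with num_ctx. A sketch with illustrative architecture numbers; the layer and head counts below are assumptions for demonstration, not the exact DeepSeek R1 32B configuration:

# KV cache scales linearly with num_ctx, so halving the context halves the cache.
def kv_cache_gb(num_ctx: int, layers: int = 64, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    return 2 * layers * num_ctx * kv_heads * head_dim * bytes_per_elem / 1024**3

for ctx in (16384, 4096, 2048):
    print(f"num_ctx={ctx:>5}: ~{kv_cache_gb(ctx):.1f} GB of KV cache")

With these assumed numbers, dropping from 16384 to 4096 tokens frees roughly 3 GB of VRAM on its own.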

Solution 3: Partial GPU Offload

Let GPU handle some layers, CPU handles the rest. Slower, but uses less VRAM.

# Only load 35 layers on GPU, rest on CPU
ollama run deepseek-r1:32b-q4_K_M \
  --num-gpu-layers 35 \
  --num_ctx 2048

# Warning: CPU inference is 5-10x slower
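
To pick a sensible layer count instead of guessing, divide the VRAM you can spare for weights by the approximate per-layer size (quantized model size divided by its layer count). A rough sketch; the 20 GB and 64-layer figures are the Q4 estimate from the table plus an assumed layer count, not measured values:

# Estimate how many layers fit on the GPU, keeping headroom for the KV cache.
def layers_that_fit(free_vram_gb: float, model_weights_gb: float = 20.0,
                    total_layers: int = 64, reserve_gb: float = 2.0) -> int:
    per_layer_gb = model_weights_gb / total_layers   # ~0.3 GB per layer at Q4
    usable_gb = max(free_vram_gb - reserve_gb, 0.0)
    return min(total_layers, int(usable_gb / per_layer_gb))

print(layers_that_fit(12.0))   # e.g. ~32 layers fit on a 12 GB card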

Solution 4: Enable Conversation Pruning

OpenClaw can automatically drop old messages to prevent KV cache overflow.

# openclaw_config.py
from openclaw import Client

client = Client(model="deepseek-r1:8b-q4_K_M")

# Auto-prune conversation history
client.max_history_messages = 10
client.context_window = 4096

# OpenClaw drops old messages to prevent OOM
response = client.generate("Your prompt here")
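
If your OpenClaw version does not expose these settings, the same idea can be implemented by hand: keep the system prompt plus only the most recent messages before each request. A generic sketch that assumes the common role/content message format, independent of any particular client API:

# Keep the system prompt plus the N most recent turns before each request.
def prune_history(messages: list[dict], max_messages: int = 10) -> list[dict]:
    system = [m for m in messages if m.get("role") == "system"]
    recent = [m for m in messages if m.get("role") != "system"][-max_messages:]
    return system + recent

history = [{"role": "system", "content": "You are a coding agent."}]
# ...user/assistant turns accumulate here as the agent loop runs...
history = prune_history(history, max_messages=10)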

Verification:

# Monitor VRAM while running
watch -n 1 nvidia-smi

# In another terminal, run OpenClaw
# VRAM should stabilize below 100%

Secondary Exit Path (Conditional)

Use when: Primary Exit Path fails — your GPU physically cannot hold the model.

This is the only remaining technically valid option when:

  • Your GPU VRAM < 6GB (cannot run 8B even with quantization)
  • You need DeepSeek R1 32B or 70B for your use case
  • You require production reliability (consumer GPUs are not designed for 24/7 workloads)
  • You've quantized, reduced context, and still hit OOM

Solution: Cloud GPU with Dedicated VRAM

Cloud GPUs provide the VRAM needed for larger models. This is not a "workaround" — it's the correct solution when hardware constraints are absolute.

Why this is valid:

  • VRAM is a physical resource. You cannot add VRAM to a consumer GPU.
  • Cloud GPUs (H100, A100) offer 40GB-80GB of VRAM per card: enough for any DeepSeek R1 variant at Q4, and for the 8B and 32B models at full precision.
  • Hourly billing lets you test and scale.
  • Production isolation prevents AI workloads from affecting your development machine.

Steps:

  1. Choose a GPU provider with NVIDIA H100 or A100 availability
  2. Select GPU VRAM based on your model:
    • DeepSeek R1 8B: 16GB+ VRAM (enough for FP16)
    • DeepSeek R1 32B: 24GB+ VRAM for Q4, 80GB for FP16
    • DeepSeek R1 70B: 48GB+ VRAM for Q4, multiple GPUs for FP16
  3. Install Ollama on the VPS
  4. Run DeepSeek R1 at full precision where the VRAM allows it, without the quantization and context compromises required locally

Time investment: 20-30 minutes (including VPS setup)

Note: This is not "giving up" on local debugging. It's recognizing that VRAM is a fixed resource and selecting the appropriate hardware for your workload.


Approved Next Action

When local VRAM is exhausted, the remaining option is a cloud GPU with dedicated VRAM. Start with a clean Linux environment and NVIDIA GPU (H100, A100, or L40S are recommended for DeepSeek R1 32B/70B).

View Cloud GPU Options →


Why NOT Other Options

| Option | Rejection Reason |
|---|---|
| Add more system RAM | CUDA errors are about GPU VRAM, not system RAM. Adding 64GB of system RAM won't help if your GPU has 8GB VRAM. |
| Extreme quantization (Q2/Q3) | Model quality degrades significantly. The output becomes incoherent for OpenClaw agent loops. Not technically viable. |
| Swap to SSD | Impossibly slow. GPU memory access is ~1000x faster than SSD. Swapping GPU memory to disk would make inference unusable. |
| Run multiple GPUs in parallel | Doesn't increase per-GPU VRAM. Multi-GPU splits the model across GPUs, but each GPU still needs sufficient VRAM for its shard. |
| Wait for model optimization | DeepSeek R1 is already optimized. The VRAM requirements are fundamental to the model architecture. |
| Upgrade consumer GPU | RTX 4090 (24GB) costs $1600+. For that price, you can run a cloud GPU for months. And the 4090 still can't run R1 70B at full precision. |

Summary

| Check | Command | Pass Criteria |
|---|---|---|
| GPU VRAM available | nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | Shows free VRAM in MB |
| Model VRAM requirement | Calculate based on model size | Free VRAM ≥ Required VRAM |
| Quantization tested | ollama run deepseek-r1:32b-q4_K_M | Model loads without OOM |
| Context reduction tested | ollama run --num_ctx 4096 | Stable after 10+ messages |

Decision:

  • VRAM ≥ Required VRAM: Use Primary Exit Path (local optimization). Quantize and reduce context.
  • VRAM < Required VRAM after optimization: Use Secondary Exit Path (cloud GPU). This is physics, not failure.
  • Production reliability required: Use cloud GPU regardless. Consumer GPUs are not designed for 24/7 AI workloads.

Hardware Reality Check:

| GPU | VRAM | Can Run 8B Q4? | Can Run 32B Q4? |
|---|---|---|---|
| RTX 3060 | 12GB | Yes (tight) | No |
| RTX 3080 | 10GB | No | No |
| RTX 3090 | 24GB | Yes | Yes (tight) |
| RTX 4090 | 24GB | Yes | Yes (tight) |
| Cloud H100 | 80GB | Yes | Yes (full precision) |

Last resort: If you have spent more than 30 minutes on VRAM optimization and still hit OOM, your GPU is not sufficient for your target model. Use a cloud GPU or switch to a smaller model. This is not a failure — it's matching hardware to workload requirements.



FAQ

Q: Will adding more system RAM fix CUDA OOM?

A: No. CUDA errors are about GPU VRAM, not system RAM. Adding 64GB of system RAM won't help if your GPU has 8GB VRAM. The model must fit in GPU memory to run. You can offload some layers to CPU, but performance drops significantly.

Q: Can I run DeepSeek R1 32B on an RTX 3060 (12GB)?

A: Not practically. The Q4 quantized version requires ~20GB VRAM. You could try extreme quantization (Q2), but output quality degrades significantly. Better option: use DeepSeek R1 8B, or use a cloud GPU with 24GB+ VRAM.

Q: Why does it work for 10 messages then crash?

A: That's Agent Loop OOM. OpenClaw accumulates conversation history in the KV cache, which grows with each message. After 10-15 messages, the cache fills your VRAM. Fix: reduce context window (--num_ctx 4096) or enable history pruning (client.max_history_messages = 10).

Q: Is cloud GPU worth it for OpenClaw?

A: If your local GPU is insufficient, yes. Cloud GPUs provide VRAM that no consumer GPU has (80GB on H100). You pay for what you use, and you get production reliability. For occasional testing, local is fine. For production or heavy usage, cloud GPU is the correct technical choice.


Still Stuck? Check Your Hardware

Sometimes the code is fine, but the GPU is simply refusing to cooperate. Before you waste another hour debugging, compare your specs against the Hardware Reality Table to see if you are fighting impossible physics.

Bookmark this site

New fixes are added as soon as they appear on GitHub Issues.

Browse Error Index →