OpenClaw Peekaboo Skill: See What Your Agent Sees in Real Time

Q: Can Peekaboo capture a specific window rather than the full screen?

Yes. Set capture_mode: window in the skill config and specify the target_window_title pattern. Peekaboo uses partial string matching, so 'Chrome' matches any window whose title contains 'Chrome', giving you precise targeting without exact title strings.

Key Takeaways

Peekaboo is built into OpenClaw — enable it with three config lines and a screen permission grant
Supports full-screen capture, single-window targeting, and region-based cropping
Visual output can be routed directly to any vision-capable model (GPT-4o, Claude 3.5, etc.)
macOS requires explicit Screen Recording permission — forgetting this is the #1 setup failure
Set jpeg_quality between 60–80 for fast, low-overhead captures; use PNG only when OCR accuracy matters

Agents that can only read text are half an agent. Peekaboo changes that. With one config block, your OpenClaw instance gains the ability to take screenshots, describe what is on screen, and trigger downstream actions based on visual state. We have consistently seen users unlock new automation categories the moment this skill is running — from interface monitoring to visual QA workflows.

What the Peekaboo Skill Does

Peekaboo is a screen-capture skill that gives your agent access to visual data from your machine's display. It is not a remote desktop tool and it does not stream video. It captures still frames — either on demand when your agent calls for them, or automatically on a configured interval.

Those captures are then available to your agent as base64-encoded image data, which can be passed directly to a vision-capable model for analysis. The model describes the screen, extracts text via OCR, identifies UI state, or makes decisions based on what it sees.

Three things Peekaboo is actively used for in production:

Interface monitoring — watch a dashboard or app for specific visual changes and trigger alerts
Visual QA — capture the state of a web or desktop application at key steps and verify it looks correct
Context injection — let a conversational agent see what you are looking at when you ask it a question

We'll get to the automation workflows in depth — but first, the permissions step that blocks most new setups.

Platform Permissions — Do This First

Peekaboo requires OS-level permission to capture screen content. Grant this before trying to configure the skill or you will waste time debugging a permission error disguised as a config error.

macOS

System Settings › Privacy & Security › Screen Recording

Find OpenClaw in the list and toggle it on. If OpenClaw is not listed, run it once, deny the permission prompt, then come back here — it will appear after the first attempt. You must restart OpenClaw after granting the permission.

Windows

No Special Permission Required

Windows does not require an explicit screen-capture permission for desktop applications. The Peekaboo skill works out of the box on Windows 10 and 11 once configured. If you are running OpenClaw in a sandboxed or constrained environment (WSL, container), screen capture will fail.

Linux

X11 vs Wayland

Peekaboo works on X11 without additional setup. Wayland requires the xdg-desktop-portal backend to be running. Check your compositor's documentation. Most mainstream desktop environments (GNOME 45+, KDE Plasma 6) handle this automatically.

⚠️

Permission Change Requires Restart

On macOS, granting Screen Recording permission to a running process does not take effect until that process restarts. Always restart OpenClaw after changing the permission. If you test before restarting, you will get a permission denied error and assume the config is wrong — it is not.

Enabling the Peekaboo Skill

Add the following block to your OpenClaw config file under the skills: section:

skills:
  peekaboo:
    enabled: true
    capture_mode: fullscreen          # fullscreen | window | region
    image_format: jpeg                # jpeg | png
    jpeg_quality: 75                  # 1–100, only for jpeg format
    capture_interval: 0               # seconds; 0 = on-demand only
    save_captures: false              # persist to disk
    capture_dir: "~/.openclaw/caps"  # used only if save_captures: true
    max_capture_age: 60               # discard captures older than N seconds
    target_window_title: ""           # partial match; used in window mode

After saving, restart OpenClaw. Confirm the skill loaded:

[2025-01-21 10:44:01] INFO  Skill loaded: peekaboo (v1.7.0)
[2025-01-21 10:44:01] INFO  Peekaboo: display detected, capture ready

Full Config Reference

Option	Default	Notes
`capture_mode`	fullscreen	fullscreen, window, or region
`image_format`	jpeg	jpeg for speed; png for OCR accuracy
`jpeg_quality`	80	60–75 is adequate for most agent tasks
`capture_interval`	0	0 = on-demand only; set seconds for continuous capture
`target_window_title`	""	Partial string match against window titles
`region`	null	Object with x, y, width, height in pixels

Capture Modes Explained

The three capture modes serve different use cases and have meaningfully different performance profiles.

Fullscreen Mode

Captures the entire primary display. Use this when you want your agent to have full situational awareness of your screen. It generates the largest images and the highest token count when sent to a vision model, so set jpeg_quality appropriately.

Window Mode

Captures a single named window. This is the most common production configuration. Set target_window_title to a partial string that matches the window you want to monitor. The skill matches case-insensitively, so "chrome" captures any Chrome window currently in focus or visible.

skills:
  peekaboo:
    enabled: true
    capture_mode: window
    target_window_title: "Dashboard"   # captures any window titled "Dashboard"
    jpeg_quality: 70

Region Mode

Captures a pixel-defined rectangle of the screen. This is the most efficient mode — smallest file, lowest token cost, highest precision. Use it when you know exactly where the relevant content appears on screen.

skills:
  peekaboo:
    enabled: true
    capture_mode: region
    region:
      x: 0
      y: 0
      width: 800
      height: 400

💡

Region Mode for Cost Control

If you are routing Peekaboo captures to a paid vision API, region mode dramatically reduces token costs. A 800×400 region at JPEG 70 quality typically costs 60–70% fewer tokens than a 1440×900 full-screen capture of the same content area.

Automation Workflows

Here are three patterns that consistently produce high value in real deployments.

Visual Change Detection

Run Peekaboo on a 10-second interval. After each capture, pass the image to a vision model with a prompt like "Does this dashboard show any red status indicators or error alerts? Respond with YES or NO only." When the model returns YES, trigger a notification. This turns a passive screen into an active monitoring system without writing custom scraping code.

Ask Your Agent What You're Looking At

Configure a hotkey trigger that fires a Peekaboo capture on demand, then routes the image to your agent with the message "Describe what is on my screen and answer this question: [user input]." This makes your agent context-aware in seconds. The agent sees exactly what you see when you ask it something.

Automated UI Testing

Pair Peekaboo with OpenClaw's step-trigger system to capture a screenshot at each stage of a workflow, pass it to a vision model, and assert that the expected UI state is present. This is a lightweight visual regression test that runs inside your existing agent infrastructure.

Performance Tuning

Continuous capture at high frequency adds up. Here is what we have measured:

Fullscreen JPEG 80, every 5 seconds: ~3% CPU, ~15MB/hour disk if saving
Window mode JPEG 70, every 10 seconds: ~1.5% CPU, ~4MB/hour disk if saving
Region mode JPEG 65, every 5 seconds: under 1% CPU, ~1MB/hour disk if saving

PNG is 3–5x larger than JPEG at equivalent quality. Only use PNG when you are running OCR on text-heavy screens where JPEG compression artifacts might cause character misreads.

Common Mistakes

Here's where most people stop when Peekaboo does not work as expected.

Permission not granted before config — the skill initializes but every capture attempt fails silently. Always grant screen permission first, then configure
Window mode with an exact title — target_window_title does partial matching. If your window title contains dynamic content (e.g., "Dashboard — March 28"), use a static substring like "Dashboard"
capture_interval set too low — running at 1-second intervals while routing to a vision API will hit rate limits and generate enormous API costs within minutes. Start at 30 seconds or higher and reduce only if needed
Forgetting max_capture_age — if your agent calls peekaboo.get_latest and no recent capture exists, it returns null. Set capture_interval or call peekaboo.capture_now explicitly before reading
Using fullscreen mode in headless environments — if OpenClaw runs on a server without a display, fullscreen capture fails. Use a virtual display (Xvfb on Linux) or switch to window mode targeting a specific headless app window

Frequently Asked Questions

What is the OpenClaw Peekaboo skill?

Peekaboo is a built-in OpenClaw skill that captures screenshots or window content on demand or on a schedule, then makes that visual data available to your agent for analysis, description, or automated decision-making based on visual state.

Does Peekaboo require special screen permissions on macOS?

Yes. macOS requires explicit screen recording permission for any process capturing screen content. Grant it under System Settings > Privacy & Security > Screen Recording. OpenClaw must be in the approved list before Peekaboo will function — and you must restart OpenClaw after granting it.

Can Peekaboo capture a specific window rather than the full screen?

Yes. Set capture_mode: window in the skill config and specify the target_window_title pattern. Peekaboo uses partial string matching, so "Chrome" matches any window whose title contains "Chrome", giving you precise targeting without exact title strings.

How much does frequent Peekaboo capture affect system performance?

At 5-second intervals, Peekaboo uses roughly 2–4% CPU on a modern machine. Compress captures using jpeg_quality: 70 and limit capture_interval to the minimum your use case needs. Full-resolution PNG every second will noticeably impact performance.

Can I use Peekaboo to monitor a remote desktop session?

Peekaboo captures the local display only — it does not have network reach. For remote monitoring, run OpenClaw with Peekaboo on the remote machine and relay results back via a channel integration like Telegram or Slack.

Does Peekaboo store screenshots to disk?

By default, Peekaboo keeps captures in memory only and discards them after analysis. Set save_captures: true and specify a capture_dir path to persist screenshots. Be aware that this accumulates storage quickly on frequent capture schedules.

What vision models work best with Peekaboo output?

As of early 2025, GPT-4o and Claude 3.5 Sonnet return the most accurate screen content descriptions. Configure your agent to route Peekaboo outputs through a vision-capable model using the model_override option in your agent config.

You now have everything needed to give your OpenClaw agent real visual awareness. Grant the screen permission, pick the right capture mode for your use case, and start with on-demand capture before committing to a scheduled interval. The change-detection workflow is the fastest path from setup to visible value.

Enable Peekaboo, point it at one window you care about, and ask your agent what it sees. Everything else follows from there.

M. Kim

Skills & Integrations Specialist · aiagentsguides.com

M. Kim specializes in OpenClaw's visual and multimedia skills. She has built production monitoring systems using Peekaboo for clients in SaaS operations and DevOps, and documented the skill's behavior across macOS, Windows, and Linux environments.