
OpenClaw BlogWatcher: Monitor Any Blog on Full Autopilot

Point BlogWatcher at any RSS feed or web page and your agent starts delivering AI-written summaries as soon as new posts appear: keyword-filtered, pushed to your channel, and running 24/7 without you lifting a finger.

A. Larsen
Content Automation Specialist
Jan 12, 2025 · 15 min read
Updated Jan 25, 2025
Key Takeaways
  • BlogWatcher monitors RSS feeds and HTML pages on a configurable schedule and summarises new posts using your LLM.
  • Keyword filtering means you only get notified about content that actually matters to you — not every post on a busy site.
  • Summaries can be pushed to any configured channel: Telegram, Discord, Slack, or email.
  • Content hashing prevents duplicate summaries even when feeds republish or update old entries.
  • As of early 2025, BlogWatcher handles up to 50 monitored feeds per instance without performance issues.

Many people skim a dozen tabs a day trying to stay current. BlogWatcher replaces that routine. Point it at the sources you care about, tell it which topics matter, and your agent handles the rest: fetching, reading, summarising, and alerting you only when something worth your attention appears.

How BlogWatcher Works

BlogWatcher runs as a persistent background process within your OpenClaw instance. It wakes on a configurable schedule, fetches configured sources, extracts new content, and passes it to your LLM for summarisation. The summary is stored locally and — if configured — pushed to your notification channel.

The flow is simple:

  1. BlogWatcher polls the configured URL on schedule
  2. New entries are identified by URL hash and publish timestamp
  3. Content is extracted (RSS body or HTML scrape)
  4. Keyword filter runs — entries without a match are discarded
  5. LLM generates a 3–5 sentence summary targeting your stated preferences
  6. Summary is stored in BlogWatcher's local index and optionally pushed to your channel
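To make the six steps concrete, here is a minimal Python sketch of one poll cycle. It illustrates the flow under stated assumptions and is not BlogWatcher's source: the Entry type, the run_cycle function, SHA-256 as the hash, plain substring keyword matching and the summarise callback (a stand-in for your LLM call) are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Entry:
    url: str
    published: str  # publish timestamp as given by the feed (hypothetical field)
    title: str
    body: str


def entry_id(entry: Entry) -> str:
    """Step 2: derive a stable ID from the URL and publish timestamp."""
    raw = f"{entry.url}|{entry.published}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def run_cycle(
    entries: list[Entry],
    seen: set[str],
    keywords: list[str],
    summarise: Callable[[Entry], str],
) -> list[tuple[Entry, str]]:
    """One poll cycle: skip known entries, filter, summarise (hypothetical sketch)."""
    results = []
    for entry in entries:
        eid = entry_id(entry)
        if eid in seen:  # already handled in an earlier cycle
            continue
        seen.add(eid)  # mark before filtering so rejected posts aren't re-checked
        # Step 4: title plus first 500 body characters, case-insensitive
        text = (entry.title + " " + entry.body[:500]).lower()
        if keywords and not any(k.lower() in text for k in keywords):
            continue  # an empty keyword list lets everything through
        results.append((entry, summarise(entry)))  # Step 5: LLM call goes here
    return results  # Step 6: the caller stores these and optionally notifies
```

Marking an entry as seen before the keyword check means a filtered-out post isn't re-evaluated on every poll; whether BlogWatcher makes the same choice isn't documented here.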

Your agent can query the BlogWatcher index at any time: "What's new from Paul Graham this week?" or "Show me any AI posts from Hacker News today." No real-time fetch required — it reads from the local summary store.

📌
Storage Location

BlogWatcher stores its index at ~/.openclaw/blogwatcher/index.db — a SQLite file. You can query it directly for debugging. The index grows at roughly 2–5 KB per summarised post.
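Because the index is an ordinary SQLite file, Python's built-in sqlite3 module is enough to look inside. This sketch deliberately avoids guessing BlogWatcher's table layout: describe_index (a hypothetical helper) reads only SQLite's own sqlite_master catalogue and counts rows per table, opening the file read-only so a debugging session can't modify it.

```python
import sqlite3


def describe_index(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every table in a SQLite file.

    Makes no assumptions about BlogWatcher's schema; it only reads
    SQLite's built-in sqlite_master catalogue.
    """
    # mode=ro opens read-only and fails instead of creating a missing file
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for t in tables:
            quoted = '"' + t.replace('"', '""') + '"'  # quote the identifier safely
            counts[t] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, call describe_index(os.path.expanduser("~/.openclaw/blogwatcher/index.db")). Because of the read-only URI, a wrong path raises sqlite3.OperationalError rather than silently creating an empty database.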

Prerequisites

BlogWatcher ships with OpenClaw v1.5.0 and later. You need:

  • OpenClaw v1.5.0 or later
  • An LLM provider configured in OpenClaw (used for summarisation)
  • Outbound HTTPS access from your OpenClaw host to the sites you want to monitor

No additional dependencies. BlogWatcher uses OpenClaw's built-in HTTP client and your already-configured model provider.

Basic Configuration

Add a blogwatcher block to your ~/.openclaw/config.yaml. Here's a minimal working config monitoring three sources:

# ~/.openclaw/config.yaml
skills:
  blogwatcher:
    enabled: true
    summary_length: medium        # short | medium | long
    dedup_window_days: 30         # ignore posts seen in last 30 days
    feeds:
      - name: "Paul Graham Essays"
        url: "https://www.paulgraham.com/rss.html"
        mode: rss
        poll_interval_minutes: 360   # check every 6 hours
        filter_keywords: []          # empty = all posts
      - name: "OpenAI Blog"
        url: "https://openai.com/blog/rss.xml"
        mode: rss
        poll_interval_minutes: 60
        filter_keywords: ["GPT", "agents", "safety", "API"]
      - name: "Hacker News Front Page"
        url: "https://news.ycombinator.com"
        mode: scrape                 # HN also has an RSS feed (https://news.ycombinator.com/rss); scrape shown for illustration
        content_selector: ".titleline a"
        poll_interval_minutes: 30
        filter_keywords: ["AI", "LLM", "agent", "Claude", "OpenClaw"]

Save and restart OpenClaw. BlogWatcher starts its polling loop immediately and logs the first fetch results within the configured interval.
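Before restarting, it can help to sanity-check the config against the rules in this guide. The sketch below works on the parsed config as a Python dict (with PyYAML installed, yaml.safe_load on your config file produces one). check_feeds is a hypothetical helper that encodes this article's guidance, not BlogWatcher's own validation: summary_length must be short, medium or long, every feed needs mode rss or scrape, scrape feeds need a content_selector, and scrape intervals under 30 minutes get flagged. A missing poll_interval_minutes is treated as the documented default of 60.

```python
VALID_LENGTHS = {"short", "medium", "long"}
VALID_MODES = {"rss", "scrape"}


def check_feeds(cfg: dict) -> list[str]:
    """Return readable problems found in a parsed config (empty list = looks fine)."""
    problems: list[str] = []
    bw = cfg.get("skills", {}).get("blogwatcher", {})
    length = bw.get("summary_length")
    if length is not None and length not in VALID_LENGTHS:
        problems.append("summary_length must be short, medium or long")
    for feed in bw.get("feeds", []):
        name = feed.get("name") or feed.get("url") or "<unnamed feed>"
        mode = feed.get("mode")
        interval = feed.get("poll_interval_minutes", 60)  # documented default
        if mode not in VALID_MODES:
            problems.append(f"{name}: mode must be 'rss' or 'scrape'")
        if mode == "scrape":
            if not feed.get("content_selector"):
                problems.append(f"{name}: scrape mode needs a content_selector")
            if interval < 30:
                problems.append(f"{name}: scrape interval under 30 minutes risks an IP block")
    return problems
```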

RSS vs Scrape Mode

RSS mode is always preferred when available. It's faster, more reliable, and gives BlogWatcher clean structured data — publish date, title, author, full body text. The summary quality is higher because the LLM gets complete context.
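For a sense of what that structured data looks like, here is a minimal RSS 2.0 reader using only Python's standard library. It's a sketch, not BlogWatcher's parser: parse_rss is a hypothetical name, it ignores Atom feeds (which use different, namespaced element names), and it reads only the description element, whereas many feeds carry full text in a namespaced content:encoded element.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str) -> list[dict[str, str]]:
    """Extract title, link, author, publish date and body from RSS 2.0 <item>s."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "url": item.findtext("link", default="").strip(),
            "author": item.findtext("author", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
            "body": item.findtext("description", default="").strip(),
        })
    return items
```

For feeds from sources you don't trust, the Python documentation recommends defusedxml, since xml.etree.ElementTree isn't hardened against maliciously constructed XML.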

Scrape mode is the fallback for sites without RSS. Here's where most people get tripped up.

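You need to provide a content_selector, a CSS selector that targets article links or content on the page. Finding the right one takes about two minutes with your browser's DevTools: right-click an article title, choose Inspect, then right-click the highlighted element and use Copy > Copy selector (Copy > CSS Selector in Firefox). The copied selector usually pins down that single element, so generalise it to one that matches every article on the page.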

# Scrape mode example — targeting article titles
- name: "Company Engineering Blog"
  url: "https://example-company.com/blog"
  mode: scrape
  content_selector: "article h2 a"    # CSS selector for article links
  follow_links: true                   # fetch full article content at each link
  poll_interval_minutes: 120

With follow_links: true, BlogWatcher fetches each article's full text before summarising. This produces much better summaries but uses more LLM tokens per post. For high-volume sites, consider leaving it false and summarising from titles and excerpts only.
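As a rough picture of what a selector like article h2 a picks out, here is a standard-library sketch that collects links inside an h2 inside an article element and resolves relative hrefs against the page URL. It is a hand-written stand-in for that one selector, not a general CSS selector engine and not how BlogWatcher implements scrape mode; ArticleLinkParser and extract_links are hypothetical names. With follow_links: true, each URL it returns is the kind of link that would be fetched for full text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleLinkParser(HTMLParser):
    """Collect hrefs of <a> tags nested in <h2> nested in <article>.

    Hand-rolled equivalent of the selector "article h2 a" only.
    """

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.article_depth = 0  # how many <article> elements we are inside
        self.h2_depth = 0       # how many <h2> elements we are inside
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag == "h2":
            self.h2_depth += 1
        elif tag == "a" and self.article_depth and self.h2_depth:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links like /blog/post-1 against the page URL
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag == "h2" and self.h2_depth:
            self.h2_depth -= 1


def extract_links(html: str, base_url: str) -> list[str]:
    parser = ArticleLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```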

⚠️
Scrape Rate Limits

Setting poll_interval_minutes below 30 for scrape-mode sources risks getting your OpenClaw host IP blocked by the target site. Respect robots.txt and use sensible intervals. RSS sources can be polled more frequently since they're designed for that use case.
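Python's standard library can check robots.txt for you. The sketch below combines urllib.robotparser with the 30-minute floor suggested above; scrape_allowed is a hypothetical helper, and it parses robots.txt lines you pass in so it runs offline (in real use you would call set_url and read instead). crawl_delay returns the site's requested delay in seconds, or None if it doesn't set one.

```python
from urllib.robotparser import RobotFileParser


def scrape_allowed(robots_lines: list[str], url: str, agent: str,
                   interval_minutes: int) -> tuple[bool, str]:
    """Decide whether a scrape-mode poll is polite (hypothetical helper)."""
    rp = RobotFileParser()
    rp.parse(robots_lines)  # in real use: rp.set_url(".../robots.txt"); rp.read()
    if not rp.can_fetch(agent, url):
        return False, "disallowed by robots.txt"
    delay = rp.crawl_delay(agent)  # seconds, or None if not specified
    if delay is not None and interval_minutes * 60 < delay:
        return False, f"robots.txt asks for at least {delay}s between requests"
    if interval_minutes < 30:
        return False, "interval below the 30-minute guideline for scrape mode"
    return True, "ok"
```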

Keyword Filtering

This is the feature that makes BlogWatcher genuinely useful rather than overwhelming. Without filtering, a site like TechCrunch would generate 50+ summaries per day. With the right keywords, it delivers only the 3–4 articles that match your focus.

Filtering runs against the post title and first 500 characters of body content. It's case-insensitive. A post matches if any keyword in your list appears.
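Whether BlogWatcher matches whole words or raw substrings isn't stated, and it matters: as a plain case-insensitive substring, AI also matches "said" and "maintain". The sketch below assumes whole-word matching with an optional trailing s (so agent still matches agents). matches is a hypothetical function, not BlogWatcher's API, but it applies the rules described above: case-insensitive, title plus the first 500 body characters, any single keyword is enough, and an empty list lets everything through.

```python
import re


def matches(title: str, body: str, keywords: list[str]) -> bool:
    """Case-insensitive whole-word keyword check (assumed semantics, see above)."""
    if not keywords:  # empty list = every post passes
        return True
    text = f"{title}\n{body[:500]}"  # title plus the first 500 body characters
    # \b on both sides stops "AI" firing inside "said"; "s?" allows simple plurals
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")s?\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```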

Here's what we've seen work well for common monitoring goals:

  • AI news from general tech sites: ["AI", "LLM", "artificial intelligence", "machine learning", "GPT", "Claude", "agents"]
  • Competitor monitoring: ["[CompanyName]", "[ProductName]", "funding", "launch", "announcement"]
  • Developer blogs for security updates: ["vulnerability", "CVE", "patch", "security", "breach", "update"]
  • Market news: ["earnings", "revenue", "acquisition", "IPO", "valuation"]

Start broad and narrow down. Your first week of BlogWatcher usage tells you what's actually appearing — then you can tighten filters based on real signal.

Push Notifications to Your Channel

Every new summarised post can trigger a message to any configured OpenClaw channel. Add notify_channel to your feed config:

feeds:
  - name: "OpenAI Blog"
    url: "https://openai.com/blog/rss.xml"
    mode: rss
    poll_interval_minutes: 60
    filter_keywords: ["GPT", "agents", "API"]
    notify_channel: "telegram"     # must match a key in your gateways block
    notify_format: |
      📰 *New post: {title}*
      {summary}
      🔗 {url}

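The notify_format field accepts a template string with {title}, {summary}, {url}, {author}, and {published} placeholders. Format it for the channel you're using: Telegram supports MarkdownV2, which requires reserved characters such as periods, hyphens and exclamation marks to be escaped in ordinary text, while Discord uses standard Markdown, where a single pair of asterisks means italic rather than bold.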
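Dynamic text is where Telegram formatting usually breaks. The Telegram Bot API requires the characters _ * [ ] ( ) ~ ` > # + - = | { } . ! to be backslash-escaped in MarkdownV2 text, and rejects the message with a parse error otherwise, so a summary containing "GPT-5." fails unless escaped. Here is a sketch of filling the template while escaping only the substituted values; render and escape_mdv2 are hypothetical helpers, and whether BlogWatcher escapes values for you isn't covered here.

```python
import re

# Characters Telegram's MarkdownV2 requires escaping in ordinary text (plus "\" itself)
_MDV2_SPECIAL = re.compile(r"([_*\[\]()~`>#+\-=|{}.!\\])")


def escape_mdv2(text: str) -> str:
    return _MDV2_SPECIAL.sub(r"\\\1", text)


def render(template: str, post: dict[str, str], escape=escape_mdv2) -> str:
    """Fill {title}, {summary}, {url}, {author}, {published}, escaping only the values.

    The template's own markup (e.g. *bold*) is left untouched; missing
    fields become empty strings.
    """
    fields = ("title", "summary", "url", "author", "published")
    return template.format_map({f: escape(post.get(f, "")) for f in fields})
```

For Discord, pass a different escape function (or escape=str to leave values untouched).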

Note that the channel must already be set up in your gateways block. BlogWatcher references existing channels; it doesn't create them.

💡
Digest Mode

Rather than per-post alerts, set notify_mode: digest and digest_schedule: "0 8 * * *" to receive one daily summary of everything new. Useful for lower-priority sources where you don't need immediate alerts.
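A digest run boils down to two pieces: working out when "0 8 * * *" (minute 0, hour 8, every day) next fires, and collapsing the queued summaries into one message. This sketch handles only that fixed daily form in naive local time, not general cron syntax; next_daily_run and build_digest are hypothetical names, and the post fields (title, feed, summary, url) are assumptions.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 8, minute: int = 0) -> datetime:
    """Next firing time of a daily cron like "0 8 * * *" (naive local time)."""
    run = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if run <= now:  # today's slot has passed (or is right now): use tomorrow's
        run += timedelta(days=1)
    return run


def build_digest(posts: list[dict[str, str]]) -> str:
    """Collapse queued summaries into one message; "" means nothing to send."""
    if not posts:
        return ""
    lines = [f"Daily digest: {len(posts)} new post(s)"]
    for p in posts:
        lines.append(f"\n• {p['title']} ({p['feed']})\n{p['summary']}\n{p['url']}")
    return "\n".join(lines)
```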

Common Mistakes

Three configuration mistakes cause almost every BlogWatcher setup problem:

1. Using scrape mode for a site that has RSS. Most blogs have an RSS feed — it's just not always linked prominently. Check /feed, /rss, /rss.xml, or /atom.xml before defaulting to scrape. RSS mode gives cleaner summaries and avoids selector maintenance when the site redesigns.

2. Setting filter_keywords too narrowly from the start. If your keyword list is very specific, you might miss relevant posts that use different terminology. Start with 5–8 broad terms and refine after seeing what actually appears. You can always add keywords without restarting — BlogWatcher reloads its config on each poll cycle.

3. Setting dedup_window_days too low. If a feed republishes old posts (common with "best of" newsletters) and the deduplication window doesn't reach back far enough, you get re-summaries of content you've already seen. The default 30-day window catches most republishing patterns, so only shorten it deliberately.

Here's where most people stop. They hit one of these issues, assume BlogWatcher is broken, and abandon it. Fix the config and it runs cleanly for months.
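Checking for a feed before reaching for scrape mode (the first mistake above) is easy to script. Besides the conventional paths, many sites advertise their feed in the page head with a link rel="alternate" tag of type application/rss+xml or application/atom+xml. This sketch lists advertised feeds first, then the common paths; feed_candidates is a hypothetical helper that does no network I/O, so you still need to fetch the page and try each URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
COMMON_PATHS = ["/feed", "/rss", "/rss.xml", "/atom.xml"]


class FeedLinkFinder(HTMLParser):
    """Find <link rel="alternate" type="application/rss+xml" href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        ftype = (a.get("type") or "").lower()
        if "alternate" in rels and ftype in FEED_TYPES and a.get("href"):
            self.hrefs.append(a["href"])


def feed_candidates(page_url: str, html: str) -> list[str]:
    """Advertised feeds first, then the conventional paths worth trying."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    found = [urljoin(page_url, h) for h in finder.hrefs]
    guesses = [urljoin(page_url, p) for p in COMMON_PATHS]
    return found + [g for g in guesses if g not in found]
```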

Frequently Asked Questions

What is the OpenClaw BlogWatcher skill?

BlogWatcher is a built-in OpenClaw skill that polls RSS feeds and web pages on schedule, summarises new content using your LLM, and delivers structured digests to your agent. It handles both RSS feeds and HTML scraping for sites without syndication feeds.

Does BlogWatcher work on sites without an RSS feed?

Yes. BlogWatcher includes scrape mode that extracts article content using configurable CSS selectors. It's less reliable than RSS but covers modern blogs that omit feeds. Set mode: scrape and provide a content_selector pointing to article titles or links on the page.

How often does BlogWatcher check for new posts?

The check interval is configurable per feed via poll_interval_minutes (default 60). You can set high-priority sources to 15 minutes and low-priority ones to daily. OpenClaw staggers requests automatically to avoid hammering servers during simultaneous polls.
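One common way to stagger polls is to give each feed a fixed, pseudo-random slot within its interval, derived from a hash of its name. The sketch below shows that approach; it is an assumption about how staggering could work, not OpenClaw's documented scheduler, and stagger_offset_seconds and next_poll are hypothetical names working in Unix-epoch seconds.

```python
import hashlib


def stagger_offset_seconds(feed_name: str, interval_minutes: int) -> int:
    """Deterministic per-feed offset within one polling interval.

    Hashing the feed name spreads feeds that share an interval across the
    window instead of firing them all at the same moment.
    """
    digest = hashlib.sha256(feed_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (interval_minutes * 60)


def next_poll(last_poll_epoch: float, feed_name: str, interval_minutes: int) -> float:
    """Next poll time strictly after the last one, aligned to the feed's slot."""
    period = interval_minutes * 60
    slot = stagger_offset_seconds(feed_name, interval_minutes)
    window_start = last_poll_epoch - (last_poll_epoch % period)
    candidate = window_start + slot
    if candidate <= last_poll_epoch:  # this window's slot already passed
        candidate += period
    return candidate
```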

Can BlogWatcher notify me through a messaging channel?

Yes. Pair BlogWatcher with any configured OpenClaw channel — Telegram, Discord, Slack, or email — and new post digests are pushed automatically. Set notify_channel in your feed config to the channel name from your gateways block.

What happens if a blog post is behind a paywall?

BlogWatcher fetches only publicly accessible content. Paywalled articles return partial content or a subscription prompt. The skill extracts whatever is available and marks the summary as truncated. Full-text extraction requires a logged-in session, which BlogWatcher doesn't support.

How does BlogWatcher avoid summarising duplicate content?

Every fetched post gets a content hash stored in BlogWatcher's local SQLite index. Before summarising, it checks the hash against previously processed entries. The dedup_window_days setting controls how far back the dedup check looks — default is 30 days.
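Here is an in-memory sketch of content-hash dedup with a look-back window, standing in for BlogWatcher's SQLite index. What exactly goes into the hash isn't specified above; this version assumes title plus body with whitespace collapsed and case folded, and DedupIndex is a hypothetical class. A post counts as new if its hash hasn't been seen, or was last accepted more than window_days ago.

```python
import hashlib
from datetime import datetime, timedelta


class DedupIndex:
    """Content-hash dedup with a look-back window (in-memory stand-in)."""

    def __init__(self, window_days: int = 30):
        self.window = timedelta(days=window_days)
        self.seen: dict[str, datetime] = {}  # content hash -> when last accepted as new

    @staticmethod
    def content_hash(title: str, body: str) -> str:
        # Normalise whitespace and case so trivial reformatting doesn't defeat the check
        text = " ".join(f"{title} {body}".split()).lower()
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def is_new(self, title: str, body: str, now: datetime) -> bool:
        """True (and record it) if this content wasn't accepted within the window."""
        h = self.content_hash(title, body)
        last = self.seen.get(h)
        if last is not None and now - last <= self.window:
            return False
        self.seen[h] = now
        return True
```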

Can I filter posts by keyword before summarising?

Yes. The filter_keywords option accepts a list of strings. BlogWatcher only summarises and delivers posts where the title or first 500 characters contain at least one match. This is essential when you only care about a few topics on a high-volume site like TechCrunch.

A. Larsen
Content Automation Specialist · aiagentsguides.com

A. Larsen builds and maintains automated content monitoring pipelines for research teams. She currently runs BlogWatcher across 38 feeds covering AI research, developer tooling, and competitive intelligence — all delivered to a single Telegram channel.
