- Nano PDF extracts text from any PDF and feeds it directly to your agent's LLM — no custom parser code needed
- Works with text-based PDFs natively; scanned PDFs use a configurable OCR pipeline (Tesseract by default)
- Attach a PDF in Telegram, WhatsApp, or Discord and the skill fires automatically — no special command required
- Chain it with Summarize, Web Search, or any custom skill for multi-step document processing pipelines
- As of early 2025, supports PDFs up to 50MB with chunked processing for large documents
PDF processing used to mean writing a parser, managing file uploads, handling encoding issues, and then figuring out how to get the text into your LLM context. The Nano PDF Skill removes every one of those steps. Attach a PDF, ask a question, get an answer. That's the whole workflow.
How the Nano PDF Skill Works
The skill sits between your channel adapter and the LLM. When OpenClaw detects a PDF attachment in an incoming message, it routes the file to the Nano PDF Skill's extraction pipeline before the message reaches the language model. The skill extracts the text content, cleans it, chunks it if necessary, and injects it into the LLM context alongside the user's original message.
From the LLM's perspective, it receives a message that says "Here is the content of [filename.pdf]: [extracted text]. The user asks: [original question]." The model has full context. It can answer questions, summarize sections, extract specific data points, or compare content across sections — all based on the actual document content.
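The injection step described above can be sketched as simple string assembly. This is an illustrative sketch, not the skill's actual internals; the function and argument names are assumptions:

```python
def build_prompt(filename: str, extracted_text: str, user_message: str) -> str:
    """Assemble the message the LLM sees after PDF extraction (illustrative)."""
    return (
        f"Here is the content of [{filename}]: {extracted_text}\n\n"
        f"The user asks: {user_message}"
    )

# Example: a contract attachment plus the user's question
prompt = build_prompt(
    "contract.pdf",
    "Section 4: Payment is due within 30 days of invoice date...",
    "What are the payment terms?",
)
```

The point is that the model receives one ordinary message; there is no tool call or retrieval step between extraction and generation.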
This is different from a retrieval-augmented generation setup. There's no vector database, no embedding step, no retrieval query. The entire document text goes into context. For most documents under 50 pages, this is faster, simpler, and more accurate than RAG — because the model sees everything at once rather than only the retrieved chunks.
Use Nano PDF's full-context approach for documents under 80 pages or 100,000 tokens. For longer documents — 200+ page manuals, large legal contracts — use the chunked mode or switch to a RAG-based approach. Full context is more accurate but has model context window limits. Chunked mode processes each segment and merges the results.
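The decision rule above reduces to a size check. A minimal sketch, using the page and token thresholds stated in the text (the function name is illustrative):

```python
def choose_mode(page_count: int, token_estimate: int,
                max_pages: int = 80, max_tokens: int = 100_000) -> str:
    """Pick full-context vs chunked processing per the guideline above."""
    if page_count <= max_pages and token_estimate <= max_tokens:
        return "full_context"  # whole document fits in one context window
    return "chunked"           # process segments and merge the results
```

A 40-page report at roughly 60,000 tokens stays in full-context mode; a 250-page manual falls through to chunked mode.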
Installation and Initial Setup
Nano PDF ships with OpenClaw's built-in skill library in most distributions. If it's not active in your setup, add it from the ClaWHub marketplace or by dropping the SKILL.md into your skills directory.
- Verify the skill is present in your skills directory: `ls ~/.openclaw/skills/ | grep nano-pdf`
- If absent, install via the CLI: `openclaw skills install nano-pdf`
- Restart OpenClaw to load the skill: `openclaw restart`
- Verify it's active: `openclaw skills list` — you should see `nano-pdf` with status `active`
For scanned PDF support, you'll also need an OCR engine. Tesseract is the default and runs locally:
# Install Tesseract (Ubuntu/Debian)
sudo apt-get install tesseract-ocr
# Install Tesseract (macOS)
brew install tesseract
# Verify installation
tesseract --version
Once Tesseract is installed, the Nano PDF Skill detects it automatically. You don't need to configure anything — the skill checks for Tesseract at startup and enables OCR mode if found.
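The startup check described above amounts to a PATH lookup plus a health check. A sketch of how such detection might work (this is an assumption about the behavior, not the skill's actual code):

```python
import shutil
import subprocess

def detect_tesseract() -> bool:
    """Enable OCR mode only if a working tesseract binary is on PATH."""
    path = shutil.which("tesseract")
    if path is None:
        return False
    # Confirm the binary actually runs, not just that a file exists.
    result = subprocess.run([path, "--version"], capture_output=True)
    return result.returncode == 0

ocr_enabled = detect_tesseract()
```

If detection fails, the skill still processes text-based PDFs; only scanned documents need the OCR path.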
Configuration Options
The Nano PDF Skill's SKILL.md frontmatter exposes several configuration parameters. Open the file at ~/.openclaw/skills/nano-pdf/SKILL.md to adjust them.
---
name: Nano PDF
trigger: [pdf_attachment]
description: Extracts and processes text from PDF attachments
llm: true
max_size_mb: 50
ocr_enabled: true
ocr_engine: tesseract
chunk_size_tokens: 4000
chunk_overlap_tokens: 200
output_format: plain
---
The key parameters to understand:
- max_size_mb — maximum PDF file size. Increase this for large document workflows. Set to 0 to remove the limit entirely (use with caution on limited hardware).
- ocr_engine — set to `tesseract` for local processing, or to the name of a cloud OCR provider you've configured in your gateway.
- chunk_size_tokens — for documents larger than your model's context window, this controls the size of each chunk. Smaller chunks mean more LLM calls; larger chunks risk hitting context limits.
- output_format — `plain` strips all formatting; `markdown` attempts to preserve headers, tables, and lists from the source PDF structure.
Every page of extracted PDF text goes into the LLM context and counts toward your token usage. A 50-page PDF can easily be 40,000–80,000 tokens depending on content density. Before enabling this skill for a team, calculate expected token costs at your LLM provider's rates. For GPT-4 class models, a heavily used PDF workflow can be expensive.
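A back-of-envelope cost check like the one above is easy to automate. The per-page token density and the rate below are illustrative placeholders; substitute your provider's actual pricing:

```python
def estimate_pdf_cost(pages: int, tokens_per_page: int = 1200,
                      usd_per_million_input_tokens: float = 10.0) -> float:
    """Rough input-token cost of feeding an entire PDF into context.

    Both defaults are assumptions for illustration: ~1,200 tokens/page
    for dense text, and a $10 per million input tokens rate.
    """
    tokens = pages * tokens_per_page
    return tokens / 1_000_000 * usd_per_million_input_tokens

# A 50-page PDF at ~1,200 tokens/page is ~60,000 input tokens per query,
# so repeated questions against the same document multiply the cost.
cost = estimate_pdf_cost(50)
```

Note that every follow-up question re-sends the full document unless your gateway caches context, so multiply by expected queries per document.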
Practical Workflows
Contract Review and Extraction
Send a contract PDF to your agent channel with a message like "Extract all payment terms, deadlines, and penalty clauses from this contract." The agent reads the full document and returns a structured summary of exactly those elements. Legal teams using this workflow report cutting first-pass review time from 45 minutes to under 5 minutes per contract.
Research Paper Q&A
Drop a research paper PDF and ask specific questions: "What methodology did the authors use?" or "What were the statistically significant findings?" The agent answers from the actual paper content, not from training data. This is particularly valuable for recent papers published after your model's training cutoff — the skill forces the model to reason from the document itself.
Invoice Data Extraction
Chain Nano PDF with a custom extraction skill to pull structured data from invoices. The PDF skill extracts the text; the downstream skill formats it into JSON with fields for vendor, amount, date, and line items. Feed that JSON to your accounting system via the REST API.
# Example chained skill config in agent SKILL.md
skills:
- nano-pdf # Step 1: extract PDF text
- invoice-extract # Step 2: parse into structured JSON
- post-to-webhook # Step 3: send to accounting system
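The intermediate JSON from the invoice-extract step might look like the following. This is a hypothetical shape with made-up values; only the field names (vendor, amount, date, line items) come from the description above:

```python
import json

# Hypothetical output of the invoice-extract step (illustrative values).
invoice = {
    "vendor": "Acme Supplies Ltd.",
    "amount": 1840.52,  # total of the line items below
    "date": "2025-01-15",
    "line_items": [
        {"description": "Toner cartridges", "qty": 12, "unit_price": 45.00},
        {"description": "Copy paper (case)", "qty": 26, "unit_price": 50.02},
    ],
}

# Serialized payload for the post-to-webhook step
payload = json.dumps(invoice)
```

Keeping the schema flat and explicitly typed makes the downstream webhook easy to validate before the data reaches your accounting system.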
Report Summarization at Scale
Connect a folder watch or email attachment handler to automatically trigger PDF processing when new reports arrive. Analysts on three teams I've worked with use this to process weekly report drops — ten to twenty PDFs arrive Monday morning, all are summarized automatically, and a digest lands in their Slack channel before 9am. No manual work.
Common Mistakes
- Sending password-protected PDFs without the password — the skill fails silently by default on locked PDFs. Add the password in the trigger message or configure a default decryption key in the frontmatter for internal document workflows.
- Expecting perfect OCR on low-quality scans — Tesseract handles clean scans well but struggles with skewed pages, poor contrast, or handwritten text. For high-accuracy OCR on complex documents, configure a cloud OCR provider.
- Not setting chunk_size_tokens for large documents — without chunking, a 200-page PDF will exceed most model context windows and either fail or get silently truncated. Always set an explicit chunk size for documents over 50 pages.
- Using output_format: markdown on poorly structured PDFs — PDFs without semantic structure produce garbled markdown. Use plain mode for scanned documents and invoices; reserve markdown mode for well-formatted reports with clear heading hierarchies.
- Processing the same PDF repeatedly in the same session — if an agent channel receives the same PDF multiple times, it processes and bills for extraction each time. Cache extraction results in shared memory if you expect repeated references to the same document.
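The caching advice in the last point can be sketched as a content-hash lookup. A minimal in-process sketch; OpenClaw's actual shared-memory API may look different:

```python
import hashlib

_extraction_cache: dict[str, str] = {}

def extract_with_cache(pdf_bytes: bytes, extractor) -> str:
    """Cache extraction results by content hash so re-sending the same
    PDF in a session doesn't re-run (and re-bill) extraction."""
    key = hashlib.sha256(pdf_bytes).hexdigest()
    if key not in _extraction_cache:
        _extraction_cache[key] = extractor(pdf_bytes)
    return _extraction_cache[key]
```

Hashing the bytes rather than the filename means a renamed copy of the same document still hits the cache, while a revised document with the same name does not.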
Frequently Asked Questions
What file types does the OpenClaw Nano PDF Skill support?
The Nano PDF Skill handles standard PDFs including text-based, scanned (via OCR), and mixed-content documents. As of early 2025, it supports PDF 1.0 through 2.0 specifications. Password-protected PDFs require the password in the trigger message. EPUB and DOCX are not supported natively — convert them first.
How large a PDF can the Nano PDF Skill process?
The skill processes PDFs up to 50MB by default, configurable via the max_size_mb parameter. For large files, chunked processing mode splits the document into segments and processes each sequentially, combining results at the end — enabling reliable handling of book-length documents.
Does the Nano PDF Skill require an external API?
No external service is needed for text-based PDFs — extraction runs locally using OpenClaw's built-in parser. Scanned PDF OCR optionally uses Tesseract locally or a cloud OCR provider. For most standard text PDFs, the skill works entirely offline beyond your configured LLM endpoint.
Can I send a PDF to an OpenClaw agent via Telegram?
Yes. Attach a PDF to any message in Telegram, WhatsApp, or Discord and OpenClaw's channel adapter detects it, passing it to the Nano PDF Skill automatically. The agent receives both the extracted text and the original filename as context for its response.
How does Nano PDF handle scanned PDFs?
For scanned PDFs, the skill activates its OCR pipeline. Configure your preferred engine in the skill frontmatter — Tesseract runs locally for free, or connect a cloud OCR provider for higher accuracy on complex layouts. OCR adds processing time but works reliably on standard document scans.
Can I chain Nano PDF with other OpenClaw skills?
Chaining is one of the strongest use cases. After Nano PDF extracts text, pass the output to the Summarize Skill, the Web Search Skill for fact-checking, or a custom analysis skill. Define the chain in your agent's SKILL.md by listing skills in execution order under the skills key.
M. Kim specializes in building document intelligence pipelines on OpenClaw for legal, finance, and research teams. Kim has deployed PDF processing workflows across contracts, compliance reports, and scientific literature databases, and leads product evaluation for AI agent tooling at a Series B SaaS company.