---
name: pdf
metadata:
author: Z.AI
version: "1.0"
description: Professional PDF toolkit with four production lines: (1) Report - structured documents via ReportLab (reports, proposals, contracts, white papers) (2) Creative - visual design via JSON Blueprint โ design_engine.py โ Playwright snapshot (posters, infographics, invitations, dashboards). The LLM acts as Art Director outputting ONLY JSON spatial blueprints; convert.blueprint compiles to pixel-perfect PDF. (3) Academic - scholarly work via LaTeX/Tectonic (papers, theses, math-heavy documents) (4) Process - manipulate existing PDFs (extract, merge, split, fill forms, convert) Auto-routes based on document type. Includes ATS/creative/academic resume sub-paths.
license: Proprietary. LICENSE.txt has complete terms
---
# PDF - Document Production Workbench
## Quick Setup
```bash
bash "$PDF_SKILL_DIR/scripts/setup.sh" # Interactive environment check + install
python3 "$PDF_SKILL_DIR/scripts/pdf.py" env.check # Detailed dependency status (JSON: add -j)
python3 "$PDF_SKILL_DIR/scripts/pdf.py" env.fix # Auto-install missing Python packages
```
## Triage
Determine task weight to control how much context to load:
| Weight | Triggers | What to Load |
|--------|----------|--------------|
| **Light** | Format conversion, form fill, text extract, merge/split, simple certificate | SKILL.md + `briefs/process.md` only |
| **Standard** | Multi-page report, poster, academic paper, resume, reformat - any document with design decisions | SKILL.md + matched brief + typesetting assets on demand |
Light tasks skip typesetting files entirely. Standard tasks load them on demand per the brief's instructions.
### โ ๏ธ Pre-Routing Checks (run BEFORE matching brief)
1. **Emoji Check** - Scan user content for intentional emoji (decorative ๐๐ฏ๐ฅ, not OS-level emoji input). If found โ **force Creative brief** regardless of document type. ReportLab renders emoji as โก squares; LaTeX drops them entirely.
2. **CJK Check** - Chinese/Japanese/Korean content needs font coverage. Report brief must use `UniSong`/`UniHei` registered fonts; Creative brief must load Google Fonts Noto Sans SC with `font-display: swap`; Academic brief must use `\usepackage{ctex}`.
3. **Size Check** - Non-standard page sizes (not A4/Letter/A3) โ prefer Creative brief (Playwright handles any dimension). ReportLab can do custom sizes but pagination is manual.
4. **Character Safety Check** - Before writing any content string, scan for Japanese kana (ใฎใใใใฏ etc.), unusual Unicode symbols, or non-CJK characters that may corrupt during encoding transit ( Especially when code is written via heredoc/base64/LLM output). Replace with plain Chinese equivalents: `ใฎ`โ`ไน/็/็ผ`, `ใ `โomit or write full character. **If content must preserve Japanese, use only standard CJK Unified Ideographs (U+4E00-U+9FFF) and common kana; avoid rare/private-use codepoints.**
---
## Briefing
Match the user's intent to a production brief. Each brief contains the full workflow, tech stack specifics, and references to shared typesetting assets.
```
User Request
โ
โโ Work with existing PDF? โโโโโโโโโโโโโโฌโ Extract/merge/split/fill/convert โ briefs/process.md
โ โโ Reformat/redesign โ briefs/process.md (extract) โ delegate to report or creative brief
โ โโ User provides a PDF template/reference to match style
โ โ briefs/process.md "Template-Guided Reformat" โ delegate to matched brief
โ
โโ Report / proposal / white paper / contract / analysis?
โ โโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ briefs/report.md (ReportLab)
โ
โโ Poster / invitation / infographic / dashboard / creative layout?
โ โโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ briefs/creative.md (Playwright)
โ
โโ Academic paper / thesis / math / IEEE / ACM / LaTeX?
โ โโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ briefs/academic.md (Tectonic)
โ
โโ Math-heavy doc / TikZ diagram / algorithm pseudocode / Beamer slides?
โ โโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ briefs/academic.md (Tectonic, Scenarios A-D)
โ
โโ Document needs complex embedded diagrams (flowcharts, architecture, neural nets)?
โ โโ Route by target brief:
โ โโ Report โ Playwright+CSS โ PNG โ ReportLab Image() flowable
โ โโ Creative โ directly in HTML (CSS flexbox/grid + connectors)
โ โโ Academic โ complexity-based:
โ โโ Simple (โค6 nodes, linear/tree) โ TikZ native (vector)
โ โโ Complex (>6 nodes, branches, annotations) โ Playwright+CSS โ PNG โ \includegraphics
โ
โโ Resume / CV?
โโ ATS-safe / corporate โโโโโโโโโโโ โ briefs/report.md (resume sub-section)
โโ Creative / design industry โโโโโโ โ briefs/creative.md (resume sub-section)
โโ Academic CV / publications โโโโโโ โ briefs/academic.md (resume sub-section)
```
### Detection Keywords
| Brief | Keywords |
|-------|----------|
| Report | ๆฅๅ, report, ๅๆ, analysis, ็ฝ็ฎไนฆ, white paper, ๆๆก, proposal, ๅๅ, contract, ๆนๆก, ่งๅ, ๅ็ฅจ, invoice, ๆถๆฎ, receipt, ่ฏๅท, exam, quiz, test paper, ็ปไน , exercise, worksheet, ่่ฏ, ๆต้ช |
| Creative | ๆตทๆฅ, poster, ้่ฏทๅฝ, invitation, ไฟกๆฏๅพ, infographic, ไปช่กจ็, dashboard, ไผ ๅ, flyer, ่ฏไนฆ, certificate, ่ๅ, menu, ๅ็, business card, ๅฅ็ถ, award, ๆ ็ญพ, label, ไฟกๅฐ, envelope, ่ดบๅก, greeting card |
| Creative (Poster) | ๆตทๆฅ, poster, ไผ ๅ, flyer, ๅฎฃไผ ้กต, ๅฎฃไผ ๅ โ additionally load `briefs/poster.md` scene layer rules |
| Academic | ่ฎบๆ, paper, ๅญฆๆฏ, academic, LaTeX, ๆฐๅญฆ, math, IEEE, ACM, ๆฏไธ, thesis, ็ ็ฉถ, research, Beamer, slides, ๅผ้ขๆฅๅ, ๅญฆไฝ, dissertation, proposal |
| Process | ๆๅ, extract, ๅๅนถ, merge, ๆๅ, split, ๅกซๅ, fill, ่ฝฌๆข, convert, OCR, ้ๆ, reformat, ้ๆฐๆ็, redesign, ๆจกๆฟ, template, ๅ็ ง, ็ ง็่ฟไธชๅ, match this style, ๅ็ผฉ, compress, ๆฐดๅฐ, watermark, ๅ ๅฏ, encrypt, ็ญพๅ, sign |
### Complete Scenario Routing Matrix
Below is an exhaustive map of every known PDF request type to its handling strategy. If a scenario is not listed, route to the closest match or ask the user.
#### ๐ Creation (Generate PDF from scratch)
| Scenario | Route | Notes |
|----------|-------|-------|
| Report / white paper / analysis | report.md | ReportLab structured document |
| Report with emoji | **creative.md** | ๐จ Emoji rule override |
| Business proposal | report.md | Structured + data tables |
| Contract / legal document | report.md | Add signature placeholders (dotted line + label) |
| Invoice / receipt | report.md | Table-heavy, precision alignment |
| Exam / quiz / test paper / worksheet | report.md | Indented options, answer space reservation, structured numbering (see Exam Paper Rules in report.md) |
| Math exam / math worksheet (with formulas/equations) | academic.md | LaTeX for proper math typesetting. See ยงExam Paper Rules in academic.md |
| Poster / flyer | creative.md + **poster.md** | Visual design + poster density/sizing rules |
| Invitation / greeting card | creative.md | Non-standard size, decorative |
| Certificate / award | creative.md | Single page, centered layout, decorative border |
| Business card | creative.md | Tiny size (90ร54mm), Playwright native support |
| Envelope / label | creative.md | Non-standard size, simple layout |
| Menu / price list | creative.md | Visual layout + may contain emoji |
| Resume (ATS) | report.md | Plain text structure |
| Resume (creative) | creative.md | Visual design |
| Resume (academic CV) | academic.md | Publication list + BibTeX |
| Academic paper | academic.md | LaTeX/Tectonic |
| Math-heavy document | academic.md | LaTeX typesetting |
| Presentation / PPT-style | creative.md | Landscape (1280ร720), one topic per page |
| Book / long document | report.md | Add TOC + chapter numbering, validate with toc_validate.py |
| CJK vertical text | creative.md | HTML `writing-mode: vertical-rl` + `text-orientation: upright` + `white-space: nowrap` + Playwright |
| RTL document (Arabic/Hebrew) | creative.md | HTML `dir="rtl"` + Playwright |
| Batch generation (mail merge) | report.md | Python loop + template variable substitution |
| Infographic | creative.md | Data visualization + design |
| Calendar / schedule | creative.md | Grid layout + custom dimensions |
#### ๐ง Processing (Manipulate existing PDF)
| Scenario | Route | Command / Method |
|----------|-------|------------------|
| Merge multiple PDFs | process.md | `pages.merge a.pdf b.pdf -o out.pdf` |
| Split PDF | process.md | `pages.split input.pdf -o ./output/` |
| Extract text | process.md | `extract.text input.pdf` |
| Extract tables | process.md | `extract.table input.pdf` |
| Extract images | process.md | `extract.image input.pdf` |
| Fill forms | process.md | `form.fill input.pdf` |
| Office โ PDF | process.md | `convert.office input.docx` |
| HTML โ PDF (documents) | process.md | `convert.html input.html` or `node html2pdf-next.js` |
| HTML โ PDF (posters) | poster.md | `node html2poster.js poster.html` |
| Image โ PDF | process.md | pikepdf: one image per page, embed as XObject |
| PDF โ image | process-advanced.md | pypdfium2 render each page to PNG |
| Encrypt / decrypt | process-advanced.md | pikepdf encryption |
| Add watermark | process.md | pikepdf overlay: create watermark page โ merge onto each page |
| Compress PDF | process.md | Ghostscript: `gs -sDEVICE=pdfwrite -dPDFSETTINGS=/screen` |
| OCR scanned PDF | process-advanced.md | ocrmypdf or Tesseract |
| Rotate pages | process.md | `pages.rotate input.pdf 90 -o out.pdf` |
| Crop pages | process.md | `pages.crop input.pdf l,b,r,t -o out.pdf` |
| Remove blank pages | process.md | `pages.clean input.pdf` |
| Reformat by template | process.md โ delegate | Extract content โ regenerate via report/creative |
| PDF diff / compare | process.md | `diff-pdf` CLI or Python per-page text comparison |
| Digital signature | process.md | `pyhanko` library (requires extra install) |
| Edit metadata | process.md | `meta.set input.pdf -o out.pdf -d '{...}'` |
### Special Routing Rules
**๐จ Emoji rule (CRITICAL - check FIRST)**: Content with intentional emoji (๐๐ฏ๐ฅ๐ก etc.) โ force **briefs/creative.md** regardless of document type. ReportLab renders emoji as โก squares; LaTeX silently drops them. This rule overrides all other routing. Even if the user says "report" - if the content has emoji, use Creative pipeline.
**Non-standard page size rule**: Dimensions other than A4/Letter/A3 โ strongly prefer **briefs/creative.md**. Playwright handles any arbitrary page size natively. ReportLab requires manual pagination math.
**Academic auto-detect**: Papers, theses, or heavy math โ **briefs/academic.md** even without explicit "LaTeX" mention.
**Template-guided rule**: When the user uploads a PDF and says "match this template" / "follow this style" / "reformat like this" โ **briefs/process.md** Template-Guided Reformat section. This is a Standard triage (not Light), because it involves design decisions.
**Resume routing**: Default to Report brief (ATS-safe). Creative industry โ Creative brief. Academic CV with publications โ Academic brief.
---
## Shared Assets
These are referenced by multiple briefs. **Do not load upfront** - each brief tells you when and what to load.
| Asset | Path | Used By | Purpose |
|-------|------|---------|---------|
| Palette & Typography | `typesetting/palette.md` | Report, Creative | Color system, font rules, anti-patterns, spacing |
| Cover Layout System V2.1 | `typesetting/cover.md` | **Report + Creative + Academic** | 7 industrial-grade templates with absolute anchor grid, Z-index layers, typography weight system, mandatory Summary Block, code-level safety (5 checks), base unit `U = W*0.05`. **Unified HTML/Playwright cover system for all routes.** |
| Chart Styling & Anti-Stacking | `typesetting/charts.md` | Report, Creative, Academic | Chart defaults, collision prevention, axis/grid/legend rules |
| Overflow Prevention | `typesetting/overflow.md` | Report, Creative, Academic | Bounding box system, text/image/table overflow prevention, fallback strategies |
| **Fill Engine (Anti-Void)** | `typesetting/fill-engine.md` | **Report, Creative, Academic** | **Anti-Void Engine V2.0: font floor enforcement, fill ratio calculation, paragraph inflation, component elevation, Y-axis golden-ratio anchoring** |
| Pagination & Flow Control | `typesetting/pagination.md` | Report, Creative | Cross-page integrity, orphan/widow control, CJK punctuation rules |
| Typography System | `typesetting/typography.md` | Report, Creative | Font size scale, line-height, spacing hierarchy |
| Geometric Anchors | `typesetting/geometry.md` | Creative + Report | Decorative geometric elements, anchor placement rules |
| Cover Backgrounds | `typesetting/cover-backgrounds.md` | **Report + Creative + Academic** | Cover background rendering, transparency constraints |
| Visual Framework | `configs/visual_framework.md` | Creative | Palette mode, color harmony, SVG background params |
| Components Library | `configs/components.md` | Creative | Non-grid composition components (floating cards, oversized text, etc.) |
| Font Stacks | `configs/fonts.md` | All pipelines | Font families per pipeline (Google Fonts, ReportLab, LaTeX) |
---
## Content Rules
- **Language**: Match user's query language. Chinese query โ Chinese PDF.
- **Page/word count**: Respect explicit constraints (ยฑ20%). Unspecified โ completeness over brevity.
- **Outline**: User-provided outlines are sacred. No reordering without asking.
- **Citations**: No fabrication. Chinese โ GB/T 7714, English โ APA. Search to verify.
- **Multi-part requests**: Generate ALL parts - never silently drop a component.
### HTML Image Source Path Rules
When embedding images in HTML documents (Creative pipeline, Playwright-rendered diagrams, or any HTMLโPDF flow):
| Image location | `` value | Example |
|---|---|---|
| **Local file** | **Relative path** from the HTML file's directory | `` or `` |
| **Remote URL** | Full URL (no change needed) | `` |
**Iron rules:**
1. **NEVER use absolute paths** for local files in HTML ``, ``, CSS `url()`, or any other asset reference (e.g. `/Users/alice/project/img.png`). Absolute paths break portability across machines and environments.
2. **Always use relative paths** anchored to the HTML file's own directory. If the image lives in a subdirectory, use `images/foo.png` or `./images/foo.png`.
3. **Remote URLs (`http://` / `https://`) are fine as-is** โ do not convert them to local paths.
4. When generating HTML from a script or blueprint, ensure all referenced assets are either (a) in the same directory as the output HTML, or (b) in a clearly named subdirectory (e.g. `assets/`, `images/`), and referenced with relative paths.
5. If a build script needs to resolve paths programmatically, compute relative paths at generation time (e.g. `os.path.relpath(image_path, html_dir)`) rather than embedding absolute filesystem paths.
---
## Figure & Diagram Embedding (All Briefs)
### Iron Rule: Figures Are Block-Level
Figures, diagrams, and charts MUST be independent block elements occupying full width. **Never** float/wrap figures alongside body text - this causes the text-diagram overlap badcase.
| Brief | Correct embedding | Forbidden |
|-------|-------------------|-----------|
| Report (ReportLab) | `story.append(Image(...))` as standalone Flowable | Placing images inside Paragraph text, simulating float |
| Creative (Playwright) | `` | `float:right`, `display:flex` with text, `wrapfigure`-style CSS |
| Academic (LaTeX) | `\begin{figure}[t] ... \end{figure}` | Bare `\includegraphics` in text body (no figure env), bare `tikzpicture` in multi-column |
### Complex Diagram Strategy
When a diagram has **>12 nodes, >3 subgroups, or intricate connections**, do NOT try to render it as one giant figure. Instead:
1. **Table for details** - structured data (phases, components, specs) goes into a proper table
2. **Simplified overview diagram** - a stripped-down flowchart/Mermaid showing only the top-level flow (โค8 nodes)
3. **Cross-reference** - table caption + diagram caption reference each other
This "table + simple diagram" pattern prevents:
- Diagrams overflowing page boundaries
- Text becoming unreadably small to fit everything
- Layout engines mishandling oversized graphics
### Diagram Content Quality Rules (Cross-reference: charts)
The rules above handle **how** to embed diagrams in PDF. For **what the diagram itself looks like** (node layout, connector routing, color, readability), follow the `charts` skill rules:
**Before generating ANY flowchart/diagram for PDF embedding, check these:**
1. **Connectors must not pass through nodes** - If 3+ layers exist, connect adjacent layers only (topโmid, midโbottom). Never draw topโbottom lines through middle nodes. Use detour paths if cross-layer links are needed.
2. **Multiple arrows into one node must not pile up** - Distribute entry points evenly along target edge, or use merge-then-enter pattern (sources converge to a vertical merge line, then single arrow to target).
3. **Low-saturation fills only** - Node backgrounds must be pale (`#EFF6FF`, `#F0FDF4`). High-saturation colors (`#3B82F6`, `#10B981`) only for borders or small accents. No children's-art color schemes.
4. **Phase titles vs sub-steps must be visually distinct** - Different background color, font size, and font weight. Never same-style boxes for both.
5. **Font sizes must be readable at final output size** - Sizes depend on the embedding context:
| Output context | Node title min | Description min | Label min |
|---------------|----------------|-----------------|-----------|
| Standalone PNG (web/presentation, โฅ1200px wide) | 14px | 12px | 11px |
| Embedded in A4 PDF (ReportLab/LaTeX, ~450pt content width) | 10pt | 8pt | 7pt |
| Embedded in slide deck (landscape, ~720pt wide) | 12pt | 10pt | 9pt |
**Principle**: After embedding, the smallest text in the diagram must still be legible when the document is viewed at 100% zoom. If the diagram is scaled down to fit page width, recalculate: `effective_size = original_size ร (display_width / canvas_width)`. If effective size drops below the minimum, either increase original font size or reduce diagram complexity.
6. **Legend/annotations must not overlap content** - Separate container, โฅ 40px gap from last node, fully within canvas bounds.
**For Playwright-rendered diagrams**: Use low-saturation fills (`#EFF6FF`, `#F0FDF4`), CSS flexbox/grid for node layout, SVG ``/`` for connectors, and verify no overlap at final render size.
**For ReportLab-drawn diagrams**: Same principles apply - use `Drawing()` with explicit coordinates, check node bounding boxes for overlap before finalizing.
### Diagram Generation Strategy (Per-Brief)
Diagram rendering depends on the target brief - **NOT** a one-size-fits-all TikZ pipeline.
| Target Brief | Diagram Method | Rationale |
|---|---|---|
| **Report** (ReportLab) | Playwright+CSS โ PNG โ `Image()` | No LaTeX compiler in this route; HTML/CSS handles any layout natively |
| **Creative** (Playwright) | Directly in HTML (CSS flexbox/grid + JS connectors) | Already in browser context |
| **Academic** (Tectonic) - simple (โค6 nodes) | TikZ native `tikzpicture` | Vector output, font consistency, LaTeX-native |
| **Academic** (Tectonic) - complex (>6 nodes) | Playwright+CSS โ PNG @2ร โ `\includegraphics` | TikZ branch logic is error-prone for models; 300dpi PNG is publication-ready |
**Playwright+CSS diagram pipeline (Report & Academic-complex):**
```bash
# 1. Write diagram HTML (CSS grid/flexbox + connectors)
cat > diagram.html << 'EOF'
EOF
# 2. Screenshot at 2ร for print quality (300dpi equivalent)
python3 "$PDF_SKILL_DIR/scripts/pdf.py" convert.blueprint diagram.html --device-scale-factor 2 --output diagram.png
# Or via Playwright directly:
# page.screenshot(path='diagram.png', scale='device', device_scale_factor=2)
# 3a. Embed in ReportLab (Report brief)
from reportlab.platypus import Image
img = Image('diagram.png', width=450) # auto height via aspect ratio
story.append(img)
# 3b. Embed in LaTeX (Academic brief, complex diagrams only)
# \includegraphics[width=\columnwidth]{diagram.png}
```
**๐ซ FORBIDDEN for Report/Creative briefs:** Do NOT use TikZ standalone โ compile โ pdftoppm โ PNG pipeline. This route has no LaTeX compiler and the extra compilation steps are error-prone.
**TikZ remains valid ONLY for:**
- Academic brief with simple diagrams (โค6 nodes, linear/hierarchical)
- Direct `tikzpicture` embedding in LaTeX documents
- Math-annotated diagrams where LaTeX math rendering matters
See `briefs/academic.md` Scenario B for TikZ templates (simple diagrams only).
---
## Vector Rendering Iron Rule
**The final PDF MUST be generated via `page.pdf()` (Playwright) or ReportLab/LaTeX native output - NEVER via screenshot-to-PDF.**
| Scenario | Correct Method | Forbidden |
|----------|---------------|-----------|
| Creative pipeline (single/multi-page) | `page.pdf()` via `convert.blueprint` or `html2pdf-next.js` | `page.screenshot()` โ image โ wrap as PDF |
| Report cover (HTML/Playwright) | `page.pdf()` โ merge via pypdf | Screenshot cover โ embed as image |
| Academic cover | `page.pdf()` โ merge via pypdf | Screenshot โ `\includegraphics` for cover |
| Full-page posters/infographics | `html2poster.js` (auto overflow:hidden + height measurement + `page.pdf()`) | Any raster pipeline for the final output |
**Why:** `page.pdf()` produces vector text + vector shapes. Text remains selectable, sharp at any zoom, and file size is smaller. Screenshot-based PDFs are raster images - blurry when zoomed, unsearchable, and 3-5ร larger.
**The ONLY place screenshot/PNG embedding is acceptable:**
- **Diagrams** embedded as sub-elements inside a larger document (e.g., flowcharts in a Report). These use `page.screenshot()` at 2ร device scale factor for 300dpi print quality, then embed via `Image()` (ReportLab) or `\includegraphics` (LaTeX).
- **Chart images** generated by matplotlib/plotly saved as PNG, then embedded.
These are sub-elements, not the document itself. The document-level PDF output must always be vector.
**Quick test:** Open the generated PDF, zoom to 400%. If text is blurry, you used a screenshot pipeline. Fix it.
### HTMLโPDF Engine Selection Rules
There are **two dedicated scripts** for HTMLโPDF. Choose based on document type:
| Document type | Script | Reason |
|---------------|--------|--------|
| **Posters, infographics, long-image single-page designs** | `html2poster.js` | Auto overflow:hidden, auto height measurement, zero margin, single-page output |
| **Cover pages (Report/Academic route)** | `html2poster.js` | Covers are single-page fixed layouts with absolute positioning โ same nature as posters. `html2pdf-next.js` would convert absoluteโstatic and destroy the layout |
| **Multi-page documents, reports, academic papers, resumes** | `html2pdf-next.js` | A4/custom pagination, 20mm margin fallback, cover adaptation, pdf-lib metadata |
| **Creative pipeline (Blueprint โ HTML โ PDF)** | `html2pdf-next.js` via `convert.blueprint` | Called internally by design_engine pipeline |
#### Poster / Single-Page Long-Image โ `html2poster.js`
```bash
node "$PDF_SKILL_DIR/scripts/html2poster.js" poster.html --output poster.pdf --width 720px
```
`html2poster.js` automatically:
- Forces `overflow: hidden` on `.poster` / `.page` containers (clips decorative overflow)
- Injects `@page { margin: 0 }` (zero margins always)
- Syncs `html/body` background with poster background color
- Measures `.poster` scrollHeight and uses it as PDF height
- Generates a single-page vector PDF with exact content dimensions
**Use this for ANY fixed-width, dynamic-height, single-page design.**
#### Documents / Multi-Page โ `html2pdf-next.js`
```bash
node "$PDF_SKILL_DIR/scripts/html2pdf-next.js" input.html --output output.pdf --width 210mm --height 297mm
# Or via pdf.py wrapper:
python3 "$PDF_SKILL_DIR/scripts/pdf.py" convert.html input.html --output output.pdf
```
Pre-render hooks auto-handle @page injection, overflow detection, cover adaptation, font loading, and pdf-lib metadata.
#### โ ๏ธ Iron Rule: No Hand-Written Playwright Scripts
Common issues with hand-written Python `page.pdf()` (the dedicated scripts handle these automatically):
1. **Missing `@page` rule** โ browser default margin causes content overflow to second page or white edges
2. **Oversized elements not fixed** โ large elements with `break-inside: avoid` block pagination, content gets truncated
3. **Rendering before fonts are loaded** โ Chinese text displays as squares or falls back to wrong font
4. **No overflow detection** โ content exceeds page boundary without awareness
5. **No metadata** โ PDF title, author, and other info missing
**Iron rule: Posters and cover pages use `html2poster.js`, multi-page documents use `html2pdf-next.js`. Do not write hand-written Python Playwright scripts.**
> **โ ๏ธ Cover page gotcha:** Cover HTML uses `position: absolute` for layout. `html2pdf-next.js` pre-render hooks convert absolute-positioned elements to `static` flow (to prevent multi-page overlap), which **destroys** cover layouts. Always use `html2poster.js` for cover pages.
### No overflow:hidden on Fixed-Size Pages (html2pdf-next.js only)
When using `html2pdf-next.js` for documents, **NEVER set `overflow: hidden` on `html`, `body`, or the main page container**.
> **Note:** This rule does NOT apply to posters rendered via `html2poster.js` โ that script automatically adds `overflow: hidden` to `.poster`/`.page` containers to clip decorative overflow. You don't need to add or remove it manually.
| Problem | Cause | Fix |
|---------|-------|-----|
| Browser preview cuts off bottom content, can't scroll | `overflow: hidden` on container + viewport < design height | Remove `overflow: hidden` |
| html2pdf-next.js "Fixed vertical overflow" warning, layout may break | Pre-render detects `scrollHeight > clientHeight` + hidden overflow, force-expands container | Remove `overflow: hidden` |
**Always pair fixed-size pages with `@media screen` auto-scale** so the full page is visible in any browser window without scrolling. See `briefs/creative.md` ยง 0.5 for the CSS pattern.
### Full-Bleed Rule (No White Margins)
When generating HTML for Playwright `page.pdf()`, the content **MUST fill the entire page** with zero margins. White side margins = broken layout.
**Mandatory CSS for any HTML โ PDF:**
```css
@page {
size: ; /* e.g., 720px 960px, or A4 */
margin: 0;
}
html, body {
margin: 0;
padding: 0;
}
```
**Common causes of white margins:**
1. Missing `@page { margin: 0 }` - browser default margins kick in (~1cm each side)
2. Content width doesn't match page width - e.g., canvas is 720px but page is A4 (794px)
3. Missing `@page { size }` declaration in the HTML
4. Content has explicit `max-width` that's narrower than the page
**For blueprint pipeline:** `design_engine.py` now injects `@page { size: var(--canvas-w) var(--canvas-h); margin: 0; }` automatically.
**For raw HTML:** YOU must include the `@page` rule. No exceptions.
**For direct Playwright:** Pass `margin: { top: 0, right: 0, bottom: 0, left: 0 }` to `page.pdf()`.
### Background Color Consistency (No Color Mismatch)
**`html` / `body` background color must match the content canvas background color.**
Playwright `page.pdf({ printBackground: true })` renders the body background color. If body is white while the content area is gray/colored, color-inconsistent borders/gaps will appear in the PDF.
#### Single-color documents (all pages same background)
```css
/* MANDATORY: body background = content background */
html, body {
margin: 0;
padding: 0;
background: var(--c-bg); /* Same color as content canvas */
}
```
#### Multi-page documents with mixed backgrounds (e.g. dark cover + white body pages)
**Root cause:** Playwright resolves `.page { width: 210mm }` and `@page { size: 210mm }` to slightly different sub-pixel values (e.g. 793.688px vs 793.701px). This creates a <1px gap at the right/bottom edge of each `.page` div where `body`'s background shows through. On dark pages, a white `body` background makes this gap visible as a white edge.
**Fix โ set `body` background to the document's dominant dark color:**
```css
:root {
--primary: #0f172a; /* darkest page background */
}
html, body {
margin: 0;
padding: 0;
width: 210mm; /* match @page size */
background: var(--primary); /* fallback for sub-pixel gaps */
}
```
**Why this works and doesn't break white pages:**
- Dark pages: sub-pixel gap reveals dark `body` โ gap invisible.
- White pages: `.page-white { background: #ffffff }` fully covers `body` โ dark body never visible.
- The gap is <1px โ even on white pages, the dark body at the extreme pixel edge is imperceptible after anti-aliasing.
**Rule: when generating multi-page HTML with mixed backgrounds, always set `html, body { background }` to the darkest page's background color.** If all pages are light/white, use the lightest content background (e.g. `#f8fafc`). Never leave `body` background unset (browser default = white = guaranteed white edges on dark pages).
```
### Content Centering (No Left/Right Drift)
**After HTML-to-PDF conversion, content must be centered, no left or right drift allowed.**
Common drift causes:
1. `@page { margin }` not 0 โ browser default margin causes drift
2. `.safe-zone` or content container `inset` / `padding` left-right asymmetric
3. Content container has `max-width` but no `margin: 0 auto`
4. Grid components only occupy partial column width (e.g. `1/1 โ X/7` only uses left half)
5. **Decorative elements overflow page boundary** โ elements with `width > 100%` or negative offsets (e.g. glow circles, gradient overlays) inflate `scrollWidth` beyond page width. Playwright shrinks all content to fit, causing left-shift. **Fix: add `overflow: hidden` to `.page` containers.** See `typesetting/overflow.md` ยง3.5 for horizontal flex overflow rules.
### Anti-Void Edges (No Large Blank Margins)
**Content should not have large meaningless whitespace at page edges, top, or bottom.**
- Content should make full use of page area; do not cram all content in the top half while leaving the bottom blank
- For multi-page documents, each page's fill rate should be โฅ 60% (see `pagination.md` last page โฅ 40% rule)
- For single-page posters/infographics, fill rate should be โฅ 70%
---
## Preflight (Quality Assurance)
Every PDF must pass preflight checks before delivery. Each brief specifies the exact commands.
### HTML Pre-Render Validation (MANDATORY for ALL HTMLโPDF paths)
**Before** calling `html2pdf-next.js`, `html2poster.js`, `convert.blueprint`, or any Playwright `page.pdf()`, run:
```bash
python3 "$PDF_SKILL_DIR/scripts/poster_validate.py" check-html .html
```
| Result | Action |
|--------|--------|
| **PASS** (no errors) | Proceed to PDF generation |
| **ERROR** items | Must fix before generating PDF. Use `--fix --output .html` for auto-repair |
| **WARNING** items | Review; non-blocking but should be addressed |
**Key checks:**
- `OVERFLOW_HIDDEN_CONTAINER` (error): `overflow:hidden` on html/body/.page clips content in browser preview and triggers html2pdf-next.js auto-fix that may break layout
- `FIXED_SIZE_NO_SCREEN_ADAPT` (warning): fixed-size page without `@media screen` auto-scale โ browser preview requires scrolling
- `SCREEN_ADAPT_NO_SCALE` (warning): `@media screen` exists but lacks scale/transform/zoom
- `FONT_NO_FALLBACK` (error): font-family without generic fallback
- `COLOR_CONTRAST` (warning): text/background contrast ratio < 3:1
- Plus: remote images, absolute paths, missing margin reset, tiny fonts, background mismatch, etc.
This applies to **all three HTML routes**: Creative blueprint pipeline, Report HTML covers, and bypass/custom HTML.
### Overflow Prevention System
**โ Full spec: `typesetting/overflow.md`** - read it for any document with tables, images, or multi-column layouts.
Core principles:
1. **Measure first, draw second** - never render content without pre-calculating its dimensions
2. **Bounding Box constraint** - every element's width โค its parent container's `Max_Width`
3. **Text: use font metrics**, not character count, for width calculation
4. **Images: proportional scaling** - never insert at original size
5. **Tables: weight-based column width** + `Paragraph()` wrapping (never plain strings)
6. **Fallback ladder**: wrap โ shrink font (max -3pt) โ reduce padding โ split element โ log warning
7. **Vertical: KeepTogether** for heading+body, chart+caption; `repeatRows=1` for long tables
### Table Overflow Prevention (ReportLab)
**Most common layout bug: table columns exceed page margins.**
Before building any ReportLab Table:
1. Calculate `available_width = page_width - left_margin - right_margin`
2. Use proportional colWidths (`[0.25, 0.40, 0.20, 0.15]` ร available_width) or fixed+flex pattern
3. `sum(colWidths)` must be โค `available_width` - **verify this in code**
4. Long text columns must use `Paragraph()` wrapping, not plain strings (plain strings don't wrap)
5. CJK text is wider: budget ~12pt per character at 10pt font size
See `briefs/report.md` ยง "Table Width Management" for code patterns.
### Table Overflow Prevention (LaTeX/Academic)
**Most common bug in dual-column papers: wide tables overflow single-column width.**
Before writing any LaTeX table:
1. Count data columns - โค 4 fits single column; 5-6 needs `\small`; 7-8 needs `\resizebox`; โฅ 9 use `table*` (full width)
2. Use `tabular*{\columnwidth}` or `tabularx{\columnwidth}` instead of plain `tabular` for 5+ columns
3. Never use plain `tabular` with 8+ columns in twocolumn layout - guaranteed overflow
4. `\resizebox{\columnwidth}{!}` as last resort - verify smallest text โฅ 6pt after scaling
See `briefs/academic.md` ยง "Table width management" for LaTeX patterns.
### Playwright PDF CSS Blacklist
These CSS properties **silently break** in Playwright's PDF renderer:
- `backdrop-filter` / `-webkit-backdrop-filter` - **drops entire element content**. Use solid `rgba()` backgrounds.
- `overflow: hidden` on content containers - clips content. Only safe on small decorative elements (< 200px).
After generating any Playwright PDF, **verify every page has content** (pypdf text extraction, check non-empty).
### PDF Metadata (all briefs)
ALL PDFs must have: Title, Author (default "Z.ai"), Creator, Subject.
### Delivery Summary (all briefs)
Report to user: file path, size, page count. Academic adds word/image count. Creative adds per-page verification.
**HTMLโPDF route deliverables (MANDATORY โ applies to ALL briefs that use Playwright/HTML to generate PDF):**
Whenever the HTMLโPDF pipeline is used (Creative route, Report cover bypass, Direct HTML Flow posters, or any Playwright `page.pdf()` path), you MUST deliver **both files** to the user:
1. **HTML** โ the source HTML file, so the user can edit and reuse the design
2. **PDF** โ the final vector PDF (`page.pdf()` output)
Optionally also provide:
3. **Image** โ a full-page screenshot/preview image (PNG or JPG) for quick sharing on chat/social media
All file paths must be reported to the user. **Never deliver only the PDF without the HTML source.**
---
## Tooling Reference
### CLI: `python3 "$PDF_SKILL_DIR/scripts/pdf.py" `
```bash
# Environment
env.check # Check deps
env.fix # Auto-install missing
# Quality
code.sanitize