--- name: pdf metadata: author: Z.AI version: "1.0" description: Professional PDF toolkit with four production lines: (1) Report - structured documents via ReportLab (reports, proposals, contracts, white papers) (2) Creative - visual design via JSON Blueprint โ†’ design_engine.py โ†’ Playwright snapshot (posters, infographics, invitations, dashboards). The LLM acts as Art Director outputting ONLY JSON spatial blueprints; convert.blueprint compiles to pixel-perfect PDF. (3) Academic - scholarly work via LaTeX/Tectonic (papers, theses, math-heavy documents) (4) Process - manipulate existing PDFs (extract, merge, split, fill forms, convert) Auto-routes based on document type. Includes ATS/creative/academic resume sub-paths. license: Proprietary. LICENSE.txt has complete terms --- # PDF - Document Production Workbench ## Quick Setup ```bash bash "$PDF_SKILL_DIR/scripts/setup.sh" # Interactive environment check + install python3 "$PDF_SKILL_DIR/scripts/pdf.py" env.check # Detailed dependency status (JSON: add -j) python3 "$PDF_SKILL_DIR/scripts/pdf.py" env.fix # Auto-install missing Python packages ``` ## Triage Determine task weight to control how much context to load: | Weight | Triggers | What to Load | |--------|----------|--------------| | **Light** | Format conversion, form fill, text extract, merge/split, simple certificate | SKILL.md + `briefs/process.md` only | | **Standard** | Multi-page report, poster, academic paper, resume, reformat - any document with design decisions | SKILL.md + matched brief + typesetting assets on demand | Light tasks skip typesetting files entirely. Standard tasks load them on demand per the brief's instructions. ### โš ๏ธ Pre-Routing Checks (run BEFORE matching brief) 1. **Emoji Check** - Scan user content for intentional emoji (decorative ๐Ÿ“Š๐ŸŽฏ๐Ÿ”ฅ, not OS-level emoji input). If found โ†’ **force Creative brief** regardless of document type. ReportLab renders emoji as โ–ก squares; LaTeX drops them entirely. 2. **CJK Check** - Chinese/Japanese/Korean content needs font coverage. Report brief must use `UniSong`/`UniHei` registered fonts; Creative brief must load Google Fonts Noto Sans SC with `font-display: swap`; Academic brief must use `\usepackage{ctex}`. 3. **Size Check** - Non-standard page sizes (not A4/Letter/A3) โ†’ prefer Creative brief (Playwright handles any dimension). ReportLab can do custom sizes but pagination is manual. 4. **Character Safety Check** - Before writing any content string, scan for Japanese kana (ใฎใ€ใŒใ€ใฏ etc.), unusual Unicode symbols, or non-CJK characters that may corrupt during encoding transit ( Especially when code is written via heredoc/base64/LLM output). Replace with plain Chinese equivalents: `ใฎ`โ†’`ไน‹/็š„/็ผ”`, `ใ€…`โ†’omit or write full character. **If content must preserve Japanese, use only standard CJK Unified Ideographs (U+4E00-U+9FFF) and common kana; avoid rare/private-use codepoints.** --- ## Briefing Match the user's intent to a production brief. Each brief contains the full workflow, tech stack specifics, and references to shared typesetting assets. ``` User Request โ”‚ โ”œโ”€ Work with existing PDF? โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€ Extract/merge/split/fill/convert โ†’ briefs/process.md โ”‚ โ”œโ”€ Reformat/redesign โ†’ briefs/process.md (extract) โ†’ delegate to report or creative brief โ”‚ โ””โ”€ User provides a PDF template/reference to match style โ”‚ โ†’ briefs/process.md "Template-Guided Reformat" โ†’ delegate to matched brief โ”‚ โ”œโ”€ Report / proposal / white paper / contract / analysis? โ”‚ โ””โ”€ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ†’ briefs/report.md (ReportLab) โ”‚ โ”œโ”€ Poster / invitation / infographic / dashboard / creative layout? โ”‚ โ””โ”€ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ†’ briefs/creative.md (Playwright) โ”‚ โ”œโ”€ Academic paper / thesis / math / IEEE / ACM / LaTeX? โ”‚ โ””โ”€ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ†’ briefs/academic.md (Tectonic) โ”‚ โ”œโ”€ Math-heavy doc / TikZ diagram / algorithm pseudocode / Beamer slides? โ”‚ โ””โ”€ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ†’ briefs/academic.md (Tectonic, Scenarios A-D) โ”‚ โ”œโ”€ Document needs complex embedded diagrams (flowcharts, architecture, neural nets)? โ”‚ โ””โ”€ Route by target brief: โ”‚ โ”œโ”€ Report โ†’ Playwright+CSS โ†’ PNG โ†’ ReportLab Image() flowable โ”‚ โ”œโ”€ Creative โ†’ directly in HTML (CSS flexbox/grid + connectors) โ”‚ โ””โ”€ Academic โ†’ complexity-based: โ”‚ โ”œโ”€ Simple (โ‰ค6 nodes, linear/tree) โ†’ TikZ native (vector) โ”‚ โ””โ”€ Complex (>6 nodes, branches, annotations) โ†’ Playwright+CSS โ†’ PNG โ†’ \includegraphics โ”‚ โ””โ”€ Resume / CV? โ”œโ”€ ATS-safe / corporate โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ†’ briefs/report.md (resume sub-section) โ”œโ”€ Creative / design industry โ”€โ”€โ”€โ”€โ”€โ”€ โ†’ briefs/creative.md (resume sub-section) โ””โ”€ Academic CV / publications โ”€โ”€โ”€โ”€โ”€โ”€ โ†’ briefs/academic.md (resume sub-section) ``` ### Detection Keywords | Brief | Keywords | |-------|----------| | Report | ๆŠฅๅ‘Š, report, ๅˆ†ๆž, analysis, ็™ฝ็šฎไนฆ, white paper, ๆๆกˆ, proposal, ๅˆๅŒ, contract, ๆ–นๆกˆ, ่ง„ๅˆ’, ๅ‘็ฅจ, invoice, ๆ”ถๆฎ, receipt, ่ฏ•ๅท, exam, quiz, test paper, ็ปƒไน , exercise, worksheet, ่€ƒ่ฏ•, ๆต‹้ชŒ | | Creative | ๆตทๆŠฅ, poster, ้‚€่ฏทๅ‡ฝ, invitation, ไฟกๆฏๅ›พ, infographic, ไปช่กจ็›˜, dashboard, ไผ ๅ•, flyer, ่ฏไนฆ, certificate, ่œๅ•, menu, ๅ็‰‡, business card, ๅฅ–็Šถ, award, ๆ ‡็ญพ, label, ไฟกๅฐ, envelope, ่ดบๅก, greeting card | | Creative (Poster) | ๆตทๆŠฅ, poster, ไผ ๅ•, flyer, ๅฎฃไผ ้กต, ๅฎฃไผ ๅ• โ†’ additionally load `briefs/poster.md` scene layer rules | | Academic | ่ฎบๆ–‡, paper, ๅญฆๆœฏ, academic, LaTeX, ๆ•ฐๅญฆ, math, IEEE, ACM, ๆฏ•ไธš, thesis, ็ ”็ฉถ, research, Beamer, slides, ๅผ€้ข˜ๆŠฅๅ‘Š, ๅญฆไฝ, dissertation, proposal | | Process | ๆๅ–, extract, ๅˆๅนถ, merge, ๆ‹†ๅˆ†, split, ๅกซๅ†™, fill, ่ฝฌๆข, convert, OCR, ้‡ๆŽ’, reformat, ้‡ๆ–ฐๆŽ’็‰ˆ, redesign, ๆจกๆฟ, template, ๅ‚็…ง, ็…ง็€่ฟ™ไธชๅš, match this style, ๅŽ‹็ผฉ, compress, ๆฐดๅฐ, watermark, ๅŠ ๅฏ†, encrypt, ็ญพๅ, sign | ### Complete Scenario Routing Matrix Below is an exhaustive map of every known PDF request type to its handling strategy. If a scenario is not listed, route to the closest match or ask the user. #### ๐Ÿ“„ Creation (Generate PDF from scratch) | Scenario | Route | Notes | |----------|-------|-------| | Report / white paper / analysis | report.md | ReportLab structured document | | Report with emoji | **creative.md** | ๐Ÿšจ Emoji rule override | | Business proposal | report.md | Structured + data tables | | Contract / legal document | report.md | Add signature placeholders (dotted line + label) | | Invoice / receipt | report.md | Table-heavy, precision alignment | | Exam / quiz / test paper / worksheet | report.md | Indented options, answer space reservation, structured numbering (see Exam Paper Rules in report.md) | | Math exam / math worksheet (with formulas/equations) | academic.md | LaTeX for proper math typesetting. See ยงExam Paper Rules in academic.md | | Poster / flyer | creative.md + **poster.md** | Visual design + poster density/sizing rules | | Invitation / greeting card | creative.md | Non-standard size, decorative | | Certificate / award | creative.md | Single page, centered layout, decorative border | | Business card | creative.md | Tiny size (90ร—54mm), Playwright native support | | Envelope / label | creative.md | Non-standard size, simple layout | | Menu / price list | creative.md | Visual layout + may contain emoji | | Resume (ATS) | report.md | Plain text structure | | Resume (creative) | creative.md | Visual design | | Resume (academic CV) | academic.md | Publication list + BibTeX | | Academic paper | academic.md | LaTeX/Tectonic | | Math-heavy document | academic.md | LaTeX typesetting | | Presentation / PPT-style | creative.md | Landscape (1280ร—720), one topic per page | | Book / long document | report.md | Add TOC + chapter numbering, validate with toc_validate.py | | CJK vertical text | creative.md | HTML `writing-mode: vertical-rl` + `text-orientation: upright` + `white-space: nowrap` + Playwright | | RTL document (Arabic/Hebrew) | creative.md | HTML `dir="rtl"` + Playwright | | Batch generation (mail merge) | report.md | Python loop + template variable substitution | | Infographic | creative.md | Data visualization + design | | Calendar / schedule | creative.md | Grid layout + custom dimensions | #### ๐Ÿ”ง Processing (Manipulate existing PDF) | Scenario | Route | Command / Method | |----------|-------|------------------| | Merge multiple PDFs | process.md | `pages.merge a.pdf b.pdf -o out.pdf` | | Split PDF | process.md | `pages.split input.pdf -o ./output/` | | Extract text | process.md | `extract.text input.pdf` | | Extract tables | process.md | `extract.table input.pdf` | | Extract images | process.md | `extract.image input.pdf` | | Fill forms | process.md | `form.fill input.pdf` | | Office โ†’ PDF | process.md | `convert.office input.docx` | | HTML โ†’ PDF (documents) | process.md | `convert.html input.html` or `node html2pdf-next.js` | | HTML โ†’ PDF (posters) | poster.md | `node html2poster.js poster.html` | | Image โ†’ PDF | process.md | pikepdf: one image per page, embed as XObject | | PDF โ†’ image | process-advanced.md | pypdfium2 render each page to PNG | | Encrypt / decrypt | process-advanced.md | pikepdf encryption | | Add watermark | process.md | pikepdf overlay: create watermark page โ†’ merge onto each page | | Compress PDF | process.md | Ghostscript: `gs -sDEVICE=pdfwrite -dPDFSETTINGS=/screen` | | OCR scanned PDF | process-advanced.md | ocrmypdf or Tesseract | | Rotate pages | process.md | `pages.rotate input.pdf 90 -o out.pdf` | | Crop pages | process.md | `pages.crop input.pdf l,b,r,t -o out.pdf` | | Remove blank pages | process.md | `pages.clean input.pdf` | | Reformat by template | process.md โ†’ delegate | Extract content โ†’ regenerate via report/creative | | PDF diff / compare | process.md | `diff-pdf` CLI or Python per-page text comparison | | Digital signature | process.md | `pyhanko` library (requires extra install) | | Edit metadata | process.md | `meta.set input.pdf -o out.pdf -d '{...}'` | ### Special Routing Rules **๐Ÿšจ Emoji rule (CRITICAL - check FIRST)**: Content with intentional emoji (๐Ÿ“Š๐ŸŽฏ๐Ÿ”ฅ๐Ÿ’ก etc.) โ†’ force **briefs/creative.md** regardless of document type. ReportLab renders emoji as โ–ก squares; LaTeX silently drops them. This rule overrides all other routing. Even if the user says "report" - if the content has emoji, use Creative pipeline. **Non-standard page size rule**: Dimensions other than A4/Letter/A3 โ†’ strongly prefer **briefs/creative.md**. Playwright handles any arbitrary page size natively. ReportLab requires manual pagination math. **Academic auto-detect**: Papers, theses, or heavy math โ†’ **briefs/academic.md** even without explicit "LaTeX" mention. **Template-guided rule**: When the user uploads a PDF and says "match this template" / "follow this style" / "reformat like this" โ†’ **briefs/process.md** Template-Guided Reformat section. This is a Standard triage (not Light), because it involves design decisions. **Resume routing**: Default to Report brief (ATS-safe). Creative industry โ†’ Creative brief. Academic CV with publications โ†’ Academic brief. --- ## Shared Assets These are referenced by multiple briefs. **Do not load upfront** - each brief tells you when and what to load. | Asset | Path | Used By | Purpose | |-------|------|---------|---------| | Palette & Typography | `typesetting/palette.md` | Report, Creative | Color system, font rules, anti-patterns, spacing | | Cover Layout System V2.1 | `typesetting/cover.md` | **Report + Creative + Academic** | 7 industrial-grade templates with absolute anchor grid, Z-index layers, typography weight system, mandatory Summary Block, code-level safety (5 checks), base unit `U = W*0.05`. **Unified HTML/Playwright cover system for all routes.** | | Chart Styling & Anti-Stacking | `typesetting/charts.md` | Report, Creative, Academic | Chart defaults, collision prevention, axis/grid/legend rules | | Overflow Prevention | `typesetting/overflow.md` | Report, Creative, Academic | Bounding box system, text/image/table overflow prevention, fallback strategies | | **Fill Engine (Anti-Void)** | `typesetting/fill-engine.md` | **Report, Creative, Academic** | **Anti-Void Engine V2.0: font floor enforcement, fill ratio calculation, paragraph inflation, component elevation, Y-axis golden-ratio anchoring** | | Pagination & Flow Control | `typesetting/pagination.md` | Report, Creative | Cross-page integrity, orphan/widow control, CJK punctuation rules | | Typography System | `typesetting/typography.md` | Report, Creative | Font size scale, line-height, spacing hierarchy | | Geometric Anchors | `typesetting/geometry.md` | Creative + Report | Decorative geometric elements, anchor placement rules | | Cover Backgrounds | `typesetting/cover-backgrounds.md` | **Report + Creative + Academic** | Cover background rendering, transparency constraints | | Visual Framework | `configs/visual_framework.md` | Creative | Palette mode, color harmony, SVG background params | | Components Library | `configs/components.md` | Creative | Non-grid composition components (floating cards, oversized text, etc.) | | Font Stacks | `configs/fonts.md` | All pipelines | Font families per pipeline (Google Fonts, ReportLab, LaTeX) | --- ## Content Rules - **Language**: Match user's query language. Chinese query โ†’ Chinese PDF. - **Page/word count**: Respect explicit constraints (ยฑ20%). Unspecified โ†’ completeness over brevity. - **Outline**: User-provided outlines are sacred. No reordering without asking. - **Citations**: No fabrication. Chinese โ†’ GB/T 7714, English โ†’ APA. Search to verify. - **Multi-part requests**: Generate ALL parts - never silently drop a component. ### HTML Image Source Path Rules When embedding images in HTML documents (Creative pipeline, Playwright-rendered diagrams, or any HTMLโ†’PDF flow): | Image location | `` value | Example | |---|---|---| | **Local file** | **Relative path** from the HTML file's directory | `` or `` | | **Remote URL** | Full URL (no change needed) | `` | **Iron rules:** 1. **NEVER use absolute paths** for local files in HTML ``, ``, CSS `url()`, or any other asset reference (e.g. `/Users/alice/project/img.png`). Absolute paths break portability across machines and environments. 2. **Always use relative paths** anchored to the HTML file's own directory. If the image lives in a subdirectory, use `images/foo.png` or `./images/foo.png`. 3. **Remote URLs (`http://` / `https://`) are fine as-is** โ€” do not convert them to local paths. 4. When generating HTML from a script or blueprint, ensure all referenced assets are either (a) in the same directory as the output HTML, or (b) in a clearly named subdirectory (e.g. `assets/`, `images/`), and referenced with relative paths. 5. If a build script needs to resolve paths programmatically, compute relative paths at generation time (e.g. `os.path.relpath(image_path, html_dir)`) rather than embedding absolute filesystem paths. --- ## Figure & Diagram Embedding (All Briefs) ### Iron Rule: Figures Are Block-Level Figures, diagrams, and charts MUST be independent block elements occupying full width. **Never** float/wrap figures alongside body text - this causes the text-diagram overlap badcase. | Brief | Correct embedding | Forbidden | |-------|-------------------|-----------| | Report (ReportLab) | `story.append(Image(...))` as standalone Flowable | Placing images inside Paragraph text, simulating float | | Creative (Playwright) | `
` | `float:right`, `display:flex` with text, `wrapfigure`-style CSS | | Academic (LaTeX) | `\begin{figure}[t] ... \end{figure}` | Bare `\includegraphics` in text body (no figure env), bare `tikzpicture` in multi-column | ### Complex Diagram Strategy When a diagram has **>12 nodes, >3 subgroups, or intricate connections**, do NOT try to render it as one giant figure. Instead: 1. **Table for details** - structured data (phases, components, specs) goes into a proper table 2. **Simplified overview diagram** - a stripped-down flowchart/Mermaid showing only the top-level flow (โ‰ค8 nodes) 3. **Cross-reference** - table caption + diagram caption reference each other This "table + simple diagram" pattern prevents: - Diagrams overflowing page boundaries - Text becoming unreadably small to fit everything - Layout engines mishandling oversized graphics ### Diagram Content Quality Rules (Cross-reference: charts) The rules above handle **how** to embed diagrams in PDF. For **what the diagram itself looks like** (node layout, connector routing, color, readability), follow the `charts` skill rules: **Before generating ANY flowchart/diagram for PDF embedding, check these:** 1. **Connectors must not pass through nodes** - If 3+ layers exist, connect adjacent layers only (topโ†’mid, midโ†’bottom). Never draw topโ†’bottom lines through middle nodes. Use detour paths if cross-layer links are needed. 2. **Multiple arrows into one node must not pile up** - Distribute entry points evenly along target edge, or use merge-then-enter pattern (sources converge to a vertical merge line, then single arrow to target). 3. **Low-saturation fills only** - Node backgrounds must be pale (`#EFF6FF`, `#F0FDF4`). High-saturation colors (`#3B82F6`, `#10B981`) only for borders or small accents. No children's-art color schemes. 4. **Phase titles vs sub-steps must be visually distinct** - Different background color, font size, and font weight. Never same-style boxes for both. 5. **Font sizes must be readable at final output size** - Sizes depend on the embedding context: | Output context | Node title min | Description min | Label min | |---------------|----------------|-----------------|-----------| | Standalone PNG (web/presentation, โ‰ฅ1200px wide) | 14px | 12px | 11px | | Embedded in A4 PDF (ReportLab/LaTeX, ~450pt content width) | 10pt | 8pt | 7pt | | Embedded in slide deck (landscape, ~720pt wide) | 12pt | 10pt | 9pt | **Principle**: After embedding, the smallest text in the diagram must still be legible when the document is viewed at 100% zoom. If the diagram is scaled down to fit page width, recalculate: `effective_size = original_size ร— (display_width / canvas_width)`. If effective size drops below the minimum, either increase original font size or reduce diagram complexity. 6. **Legend/annotations must not overlap content** - Separate container, โ‰ฅ 40px gap from last node, fully within canvas bounds. **For Playwright-rendered diagrams**: Use low-saturation fills (`#EFF6FF`, `#F0FDF4`), CSS flexbox/grid for node layout, SVG ``/`` for connectors, and verify no overlap at final render size. **For ReportLab-drawn diagrams**: Same principles apply - use `Drawing()` with explicit coordinates, check node bounding boxes for overlap before finalizing. ### Diagram Generation Strategy (Per-Brief) Diagram rendering depends on the target brief - **NOT** a one-size-fits-all TikZ pipeline. | Target Brief | Diagram Method | Rationale | |---|---|---| | **Report** (ReportLab) | Playwright+CSS โ†’ PNG โ†’ `Image()` | No LaTeX compiler in this route; HTML/CSS handles any layout natively | | **Creative** (Playwright) | Directly in HTML (CSS flexbox/grid + JS connectors) | Already in browser context | | **Academic** (Tectonic) - simple (โ‰ค6 nodes) | TikZ native `tikzpicture` | Vector output, font consistency, LaTeX-native | | **Academic** (Tectonic) - complex (>6 nodes) | Playwright+CSS โ†’ PNG @2ร— โ†’ `\includegraphics` | TikZ branch logic is error-prone for models; 300dpi PNG is publication-ready | **Playwright+CSS diagram pipeline (Report & Academic-complex):** ```bash # 1. Write diagram HTML (CSS grid/flexbox + connectors) cat > diagram.html << 'EOF' EOF # 2. Screenshot at 2ร— for print quality (300dpi equivalent) python3 "$PDF_SKILL_DIR/scripts/pdf.py" convert.blueprint diagram.html --device-scale-factor 2 --output diagram.png # Or via Playwright directly: # page.screenshot(path='diagram.png', scale='device', device_scale_factor=2) # 3a. Embed in ReportLab (Report brief) from reportlab.platypus import Image img = Image('diagram.png', width=450) # auto height via aspect ratio story.append(img) # 3b. Embed in LaTeX (Academic brief, complex diagrams only) # \includegraphics[width=\columnwidth]{diagram.png} ``` **๐Ÿšซ FORBIDDEN for Report/Creative briefs:** Do NOT use TikZ standalone โ†’ compile โ†’ pdftoppm โ†’ PNG pipeline. This route has no LaTeX compiler and the extra compilation steps are error-prone. **TikZ remains valid ONLY for:** - Academic brief with simple diagrams (โ‰ค6 nodes, linear/hierarchical) - Direct `tikzpicture` embedding in LaTeX documents - Math-annotated diagrams where LaTeX math rendering matters See `briefs/academic.md` Scenario B for TikZ templates (simple diagrams only). --- ## Vector Rendering Iron Rule **The final PDF MUST be generated via `page.pdf()` (Playwright) or ReportLab/LaTeX native output - NEVER via screenshot-to-PDF.** | Scenario | Correct Method | Forbidden | |----------|---------------|-----------| | Creative pipeline (single/multi-page) | `page.pdf()` via `convert.blueprint` or `html2pdf-next.js` | `page.screenshot()` โ†’ image โ†’ wrap as PDF | | Report cover (HTML/Playwright) | `page.pdf()` โ†’ merge via pypdf | Screenshot cover โ†’ embed as image | | Academic cover | `page.pdf()` โ†’ merge via pypdf | Screenshot โ†’ `\includegraphics` for cover | | Full-page posters/infographics | `html2poster.js` (auto overflow:hidden + height measurement + `page.pdf()`) | Any raster pipeline for the final output | **Why:** `page.pdf()` produces vector text + vector shapes. Text remains selectable, sharp at any zoom, and file size is smaller. Screenshot-based PDFs are raster images - blurry when zoomed, unsearchable, and 3-5ร— larger. **The ONLY place screenshot/PNG embedding is acceptable:** - **Diagrams** embedded as sub-elements inside a larger document (e.g., flowcharts in a Report). These use `page.screenshot()` at 2ร— device scale factor for 300dpi print quality, then embed via `Image()` (ReportLab) or `\includegraphics` (LaTeX). - **Chart images** generated by matplotlib/plotly saved as PNG, then embedded. These are sub-elements, not the document itself. The document-level PDF output must always be vector. **Quick test:** Open the generated PDF, zoom to 400%. If text is blurry, you used a screenshot pipeline. Fix it. ### HTMLโ†’PDF Engine Selection Rules There are **two dedicated scripts** for HTMLโ†’PDF. Choose based on document type: | Document type | Script | Reason | |---------------|--------|--------| | **Posters, infographics, long-image single-page designs** | `html2poster.js` | Auto overflow:hidden, auto height measurement, zero margin, single-page output | | **Cover pages (Report/Academic route)** | `html2poster.js` | Covers are single-page fixed layouts with absolute positioning โ€” same nature as posters. `html2pdf-next.js` would convert absoluteโ†’static and destroy the layout | | **Multi-page documents, reports, academic papers, resumes** | `html2pdf-next.js` | A4/custom pagination, 20mm margin fallback, cover adaptation, pdf-lib metadata | | **Creative pipeline (Blueprint โ†’ HTML โ†’ PDF)** | `html2pdf-next.js` via `convert.blueprint` | Called internally by design_engine pipeline | #### Poster / Single-Page Long-Image โ†’ `html2poster.js` ```bash node "$PDF_SKILL_DIR/scripts/html2poster.js" poster.html --output poster.pdf --width 720px ``` `html2poster.js` automatically: - Forces `overflow: hidden` on `.poster` / `.page` containers (clips decorative overflow) - Injects `@page { margin: 0 }` (zero margins always) - Syncs `html/body` background with poster background color - Measures `.poster` scrollHeight and uses it as PDF height - Generates a single-page vector PDF with exact content dimensions **Use this for ANY fixed-width, dynamic-height, single-page design.** #### Documents / Multi-Page โ†’ `html2pdf-next.js` ```bash node "$PDF_SKILL_DIR/scripts/html2pdf-next.js" input.html --output output.pdf --width 210mm --height 297mm # Or via pdf.py wrapper: python3 "$PDF_SKILL_DIR/scripts/pdf.py" convert.html input.html --output output.pdf ``` Pre-render hooks auto-handle @page injection, overflow detection, cover adaptation, font loading, and pdf-lib metadata. #### โš ๏ธ Iron Rule: No Hand-Written Playwright Scripts Common issues with hand-written Python `page.pdf()` (the dedicated scripts handle these automatically): 1. **Missing `@page` rule** โ†’ browser default margin causes content overflow to second page or white edges 2. **Oversized elements not fixed** โ†’ large elements with `break-inside: avoid` block pagination, content gets truncated 3. **Rendering before fonts are loaded** โ†’ Chinese text displays as squares or falls back to wrong font 4. **No overflow detection** โ†’ content exceeds page boundary without awareness 5. **No metadata** โ†’ PDF title, author, and other info missing **Iron rule: Posters and cover pages use `html2poster.js`, multi-page documents use `html2pdf-next.js`. Do not write hand-written Python Playwright scripts.** > **โš ๏ธ Cover page gotcha:** Cover HTML uses `position: absolute` for layout. `html2pdf-next.js` pre-render hooks convert absolute-positioned elements to `static` flow (to prevent multi-page overlap), which **destroys** cover layouts. Always use `html2poster.js` for cover pages. ### No overflow:hidden on Fixed-Size Pages (html2pdf-next.js only) When using `html2pdf-next.js` for documents, **NEVER set `overflow: hidden` on `html`, `body`, or the main page container**. > **Note:** This rule does NOT apply to posters rendered via `html2poster.js` โ€” that script automatically adds `overflow: hidden` to `.poster`/`.page` containers to clip decorative overflow. You don't need to add or remove it manually. | Problem | Cause | Fix | |---------|-------|-----| | Browser preview cuts off bottom content, can't scroll | `overflow: hidden` on container + viewport < design height | Remove `overflow: hidden` | | html2pdf-next.js "Fixed vertical overflow" warning, layout may break | Pre-render detects `scrollHeight > clientHeight` + hidden overflow, force-expands container | Remove `overflow: hidden` | **Always pair fixed-size pages with `@media screen` auto-scale** so the full page is visible in any browser window without scrolling. See `briefs/creative.md` ยง 0.5 for the CSS pattern. ### Full-Bleed Rule (No White Margins) When generating HTML for Playwright `page.pdf()`, the content **MUST fill the entire page** with zero margins. White side margins = broken layout. **Mandatory CSS for any HTML โ†’ PDF:** ```css @page { size: ; /* e.g., 720px 960px, or A4 */ margin: 0; } html, body { margin: 0; padding: 0; } ``` **Common causes of white margins:** 1. Missing `@page { margin: 0 }` - browser default margins kick in (~1cm each side) 2. Content width doesn't match page width - e.g., canvas is 720px but page is A4 (794px) 3. Missing `@page { size }` declaration in the HTML 4. Content has explicit `max-width` that's narrower than the page **For blueprint pipeline:** `design_engine.py` now injects `@page { size: var(--canvas-w) var(--canvas-h); margin: 0; }` automatically. **For raw HTML:** YOU must include the `@page` rule. No exceptions. **For direct Playwright:** Pass `margin: { top: 0, right: 0, bottom: 0, left: 0 }` to `page.pdf()`. ### Background Color Consistency (No Color Mismatch) **`html` / `body` background color must match the content canvas background color.** Playwright `page.pdf({ printBackground: true })` renders the body background color. If body is white while the content area is gray/colored, color-inconsistent borders/gaps will appear in the PDF. #### Single-color documents (all pages same background) ```css /* MANDATORY: body background = content background */ html, body { margin: 0; padding: 0; background: var(--c-bg); /* Same color as content canvas */ } ``` #### Multi-page documents with mixed backgrounds (e.g. dark cover + white body pages) **Root cause:** Playwright resolves `.page { width: 210mm }` and `@page { size: 210mm }` to slightly different sub-pixel values (e.g. 793.688px vs 793.701px). This creates a <1px gap at the right/bottom edge of each `.page` div where `body`'s background shows through. On dark pages, a white `body` background makes this gap visible as a white edge. **Fix โ€” set `body` background to the document's dominant dark color:** ```css :root { --primary: #0f172a; /* darkest page background */ } html, body { margin: 0; padding: 0; width: 210mm; /* match @page size */ background: var(--primary); /* fallback for sub-pixel gaps */ } ``` **Why this works and doesn't break white pages:** - Dark pages: sub-pixel gap reveals dark `body` โ†’ gap invisible. - White pages: `.page-white { background: #ffffff }` fully covers `body` โ†’ dark body never visible. - The gap is <1px โ€” even on white pages, the dark body at the extreme pixel edge is imperceptible after anti-aliasing. **Rule: when generating multi-page HTML with mixed backgrounds, always set `html, body { background }` to the darkest page's background color.** If all pages are light/white, use the lightest content background (e.g. `#f8fafc`). Never leave `body` background unset (browser default = white = guaranteed white edges on dark pages). ``` ### Content Centering (No Left/Right Drift) **After HTML-to-PDF conversion, content must be centered, no left or right drift allowed.** Common drift causes: 1. `@page { margin }` not 0 โ€” browser default margin causes drift 2. `.safe-zone` or content container `inset` / `padding` left-right asymmetric 3. Content container has `max-width` but no `margin: 0 auto` 4. Grid components only occupy partial column width (e.g. `1/1 โ†’ X/7` only uses left half) 5. **Decorative elements overflow page boundary** โ€” elements with `width > 100%` or negative offsets (e.g. glow circles, gradient overlays) inflate `scrollWidth` beyond page width. Playwright shrinks all content to fit, causing left-shift. **Fix: add `overflow: hidden` to `.page` containers.** See `typesetting/overflow.md` ยง3.5 for horizontal flex overflow rules. ### Anti-Void Edges (No Large Blank Margins) **Content should not have large meaningless whitespace at page edges, top, or bottom.** - Content should make full use of page area; do not cram all content in the top half while leaving the bottom blank - For multi-page documents, each page's fill rate should be โ‰ฅ 60% (see `pagination.md` last page โ‰ฅ 40% rule) - For single-page posters/infographics, fill rate should be โ‰ฅ 70% --- ## Preflight (Quality Assurance) Every PDF must pass preflight checks before delivery. Each brief specifies the exact commands. ### HTML Pre-Render Validation (MANDATORY for ALL HTMLโ†’PDF paths) **Before** calling `html2pdf-next.js`, `html2poster.js`, `convert.blueprint`, or any Playwright `page.pdf()`, run: ```bash python3 "$PDF_SKILL_DIR/scripts/poster_validate.py" check-html .html ``` | Result | Action | |--------|--------| | **PASS** (no errors) | Proceed to PDF generation | | **ERROR** items | Must fix before generating PDF. Use `--fix --output .html` for auto-repair | | **WARNING** items | Review; non-blocking but should be addressed | **Key checks:** - `OVERFLOW_HIDDEN_CONTAINER` (error): `overflow:hidden` on html/body/.page clips content in browser preview and triggers html2pdf-next.js auto-fix that may break layout - `FIXED_SIZE_NO_SCREEN_ADAPT` (warning): fixed-size page without `@media screen` auto-scale โ€” browser preview requires scrolling - `SCREEN_ADAPT_NO_SCALE` (warning): `@media screen` exists but lacks scale/transform/zoom - `FONT_NO_FALLBACK` (error): font-family without generic fallback - `COLOR_CONTRAST` (warning): text/background contrast ratio < 3:1 - Plus: remote images, absolute paths, missing margin reset, tiny fonts, background mismatch, etc. This applies to **all three HTML routes**: Creative blueprint pipeline, Report HTML covers, and bypass/custom HTML. ### Overflow Prevention System **โ†’ Full spec: `typesetting/overflow.md`** - read it for any document with tables, images, or multi-column layouts. Core principles: 1. **Measure first, draw second** - never render content without pre-calculating its dimensions 2. **Bounding Box constraint** - every element's width โ‰ค its parent container's `Max_Width` 3. **Text: use font metrics**, not character count, for width calculation 4. **Images: proportional scaling** - never insert at original size 5. **Tables: weight-based column width** + `Paragraph()` wrapping (never plain strings) 6. **Fallback ladder**: wrap โ†’ shrink font (max -3pt) โ†’ reduce padding โ†’ split element โ†’ log warning 7. **Vertical: KeepTogether** for heading+body, chart+caption; `repeatRows=1` for long tables ### Table Overflow Prevention (ReportLab) **Most common layout bug: table columns exceed page margins.** Before building any ReportLab Table: 1. Calculate `available_width = page_width - left_margin - right_margin` 2. Use proportional colWidths (`[0.25, 0.40, 0.20, 0.15]` ร— available_width) or fixed+flex pattern 3. `sum(colWidths)` must be โ‰ค `available_width` - **verify this in code** 4. Long text columns must use `Paragraph()` wrapping, not plain strings (plain strings don't wrap) 5. CJK text is wider: budget ~12pt per character at 10pt font size See `briefs/report.md` ยง "Table Width Management" for code patterns. ### Table Overflow Prevention (LaTeX/Academic) **Most common bug in dual-column papers: wide tables overflow single-column width.** Before writing any LaTeX table: 1. Count data columns - โ‰ค 4 fits single column; 5-6 needs `\small`; 7-8 needs `\resizebox`; โ‰ฅ 9 use `table*` (full width) 2. Use `tabular*{\columnwidth}` or `tabularx{\columnwidth}` instead of plain `tabular` for 5+ columns 3. Never use plain `tabular` with 8+ columns in twocolumn layout - guaranteed overflow 4. `\resizebox{\columnwidth}{!}` as last resort - verify smallest text โ‰ฅ 6pt after scaling See `briefs/academic.md` ยง "Table width management" for LaTeX patterns. ### Playwright PDF CSS Blacklist These CSS properties **silently break** in Playwright's PDF renderer: - `backdrop-filter` / `-webkit-backdrop-filter` - **drops entire element content**. Use solid `rgba()` backgrounds. - `overflow: hidden` on content containers - clips content. Only safe on small decorative elements (< 200px). After generating any Playwright PDF, **verify every page has content** (pypdf text extraction, check non-empty). ### PDF Metadata (all briefs) ALL PDFs must have: Title, Author (default "Z.ai"), Creator, Subject. ### Delivery Summary (all briefs) Report to user: file path, size, page count. Academic adds word/image count. Creative adds per-page verification. **HTMLโ†’PDF route deliverables (MANDATORY โ€” applies to ALL briefs that use Playwright/HTML to generate PDF):** Whenever the HTMLโ†’PDF pipeline is used (Creative route, Report cover bypass, Direct HTML Flow posters, or any Playwright `page.pdf()` path), you MUST deliver **both files** to the user: 1. **HTML** โ€” the source HTML file, so the user can edit and reuse the design 2. **PDF** โ€” the final vector PDF (`page.pdf()` output) Optionally also provide: 3. **Image** โ€” a full-page screenshot/preview image (PNG or JPG) for quick sharing on chat/social media All file paths must be reported to the user. **Never deliver only the PDF without the HTML source.** --- ## Tooling Reference ### CLI: `python3 "$PDF_SKILL_DIR/scripts/pdf.py" ` ```bash # Environment env.check # Check deps env.fix # Auto-install missing # Quality code.sanitize