# Brief: Report Production Structured documents via ReportLab: reports, proposals, contracts, white papers, financial analysis, data tables. Also covers ATS-friendly resumes. --- > ### 🚨 EMOJI CHECK - STOP HERE IF CONTENT HAS EMOJI > > ReportLab renders emoji (📊🎯🔥💡 etc.) as **□ tofu squares**. This is unfixable. > > **If the user's content contains intentional emoji** → **STOP. Do NOT use this brief.** > Route to `briefs/creative.md` instead (Playwright renders emoji natively). > > This applies even if the document is a "report" - emoji + ReportLab = broken output. --- ## Production Workflow ``` 1. BRIEF → Confirm document type, audience, page count, outline 2. DESIGN → Run `pdf.py palette.generate --title "..."` → copy-paste output into script 3. EDIT → Content transformation (extract data, define typographic roles) 4. COVER → Read typesetting/cover.md → render HTML cover via Playwright (default ON for reports ≥ 3 pages; skip only for resumes/letters/memos/forms) 5. BUILD → Write ReportLab code (fonts → styles → content → tables → charts) 6. PREFLIGHT → code.sanitize → execute → meta.brand → font.check → toc.check → pages.clean → pdf_qa.py 7. DELIVER → Merge cover + body → final PDF ``` **Typesetting assets** (load when you reach that step): - `typesetting/palette.md` - color system, typography rules, anti-patterns - `typesetting/cover.md` - 7 cover layouts with variants, typography scale, bounding box rules - `typesetting/charts.md` - chart styling, anti-stacking rules, axis/grid/legend treatment **Cover is DEFAULT ON for reports.** Skip cover only for: resumes, letters, memos, forms, checklists, or documents ≤ 2 pages. --- ## Step 2: DESIGN - Palette & Font Plan **Before writing any ReportLab code, you MUST generate the color palette via the `palette.generate` command.** This is not optional. Do NOT hardcode hex colors. Do NOT pick colors by feel. ### 2a. Generate Palette (MANDATORY) **Run this command FIRST. Copy-paste its output directly into your Python script:** ```bash python3 "$PDF_SKILL_DIR/scripts/pdf.py" palette.generate --title "" --mode minimal ``` The command auto-derives the design intent from the document title, computes a mathematically harmonious palette, and outputs ready-to-paste ReportLab Python code: ```python # ━━ Color Palette (auto-generated by pdf.py palette.generate) ━━ # Intent: neutral | Mode: minimal | Harmony: split_complementary # Contrast: text:bg=14.12 | accent:bg=3.06 from reportlab.lib import colors ACCENT = colors.HexColor('#2f97b9') TEXT_PRIMARY = colors.HexColor('#252422') TEXT_MUTED = colors.HexColor('#8d8981') BG_SURFACE = colors.HexColor('#dfdad2') BG_PAGE = colors.HexColor('#f5f4f3') SURFACE_RGBA = 'rgba(0,0,0,0.03)' # ReportLab table colors TABLE_HEADER_COLOR = ACCENT TABLE_HEADER_TEXT = colors.white TABLE_ROW_EVEN = colors.white TABLE_ROW_ODD = BG_SURFACE ``` **Options:** - `--mode minimal` (default, recommended for 50%+ of documents) | `dark` | `pastel` | `jewel` | `light` - `--harmony auto` (default, recommended) | `complementary` | `split_complementary` | `analogous` | `triadic` | `monochrome` - `--format python` (default) | `json` | `css` - `--seed ` - for reproducible palettes across regenerations **⚠️ FORBIDDEN:** - ❌ Writing `colors.HexColor('#xxxxxx')` with any hex value you chose yourself - ❌ Using `colors.red`, `colors.blue`, or any ReportLab named color for design elements - ❌ Skipping this step and picking colors "that look good" - ❌ Using different palettes for different sections of the same document **✅ The ONLY acceptable way to get colors:** Run `palette.generate`, copy the output. ### 2b. Color Application Rules | Element | Color Source | Notes | |---------|-------------|-------| | Table headers | ACCENT (bg) + white (text) | High contrast required | | Table odd rows | BG_SURFACE at 50% opacity | Subtle striping | | Section titles | ACCENT or TEXT_PRIMARY | Match heading hierarchy | | Body text | TEXT_PRIMARY | Never use pure #000000 | | Muted/meta text | TEXT_MUTED | Dates, captions, footnotes | | Horizontal rules | ACCENT at 30% opacity | Thin, unobtrusive | | Chart colors | Derive from ACCENT (see charts.md) | Consistent with palette | ### 2c. Forbidden - ❌ Hardcoding hex colors like `#1F4E79`, `#555555`, `#888888` in code - ❌ Using `colors.red`, `colors.blue` or any ReportLab named color for design elements - ❌ Choosing colors by "feel" without running palette first - ❌ Using different accent colors across tables/sections in the same document > **Exception**: Semantic colors for data visualization (green for positive, red for negative) are acceptable but should be muted variants derived from the palette when possible. --- ## Step 3: EDIT - Content Transformation **Before writing any code, restructure raw text into visual roles.** This is the most impactful quality step - skipping it produces "Word document with a PDF extension". ### 3a. Typographic Role Extraction Parse raw text and categorize content into visual roles: | Role | ReportLab Mapping | Description | |------|------------------|-------------| | **Hero / Display** | Cover title (36-42pt) or section opener | 1-5 words, the emotional hook | | **Kicker / Eyebrow** | Subtitle or section intro (10-12pt, muted) | Tiny context line above main heading | | **Data Sculpture** | CalloutBox or bold stat block | Extract impactful numbers (97%, $4.2M, -45ms) | | **Pull Quote** | BlockQuote (italic + left indent 24pt + accent border) | Most provocative sentence, standalone | | **Body** | Standard paragraph | Everything else | **Rule**: Before writing any body paragraph, scan for numbers with units. Every metric like "revenue grew by 12%" or "latency dropped to 45ms" MUST be extracted into a CalloutBox or metrics table - never buried in paragraph prose. See §Data-to-Ink Ratio Rules below. ### 3b. Section Pacing - If any H1 section has **>400 words** of continuous body text without visual breaks, split into H2 subsections or insert a visual break (table, chart, callout) - Aim for at least 1 visual element (table/chart/callout) per 2-3 pages of body text - Dense text walls are the #1 sign of poor report design --- --- ## Character Safety Rule (MANDATORY) Three rules for safe character handling in ReportLab PDFs: **a) Superscripts and subscripts**: Use ``/`` tags, never raw Unicode superscript/subscript characters (e.g., `\u00b2`, `\u2082`). **b) Emoji**: ReportLab cannot render emoji. If content contains emoji, use Creative brief (HTML + Playwright). **c) Font fallback for mixed CJK/Latin text**: After registering fonts, call `install_font_fallback()` once. This automatically wraps missing-glyph characters in `` tags inside every `Paragraph()`. No manual `` wrapping needed for mixed Chinese-English text. Mathematical/relational operators (×, ÷, ±, ≤, √, ∑, ≅, ∫, π, ∠, Δ, etc.) are safe to use as literal characters in `Paragraph()` - both SimHei and Times New Roman have these glyphs. | Need | Correct Method | Correct Example | |------|---------------|---------| | Superscript | `` tag in `Paragraph()` | `Paragraph('102 × 103 = 105', style)` | | Subscript | `` tag in `Paragraph()` | `Paragraph('H2O', style)` | | Bold | `` tag in `Paragraph()` | `Paragraph('Title', style)` | | Math operators | Literal char in `Paragraph()` | `Paragraph('AB ⊥ AC, ∠A = 90°, and ΔABC ≅ ΔDCF', style)` | | Scientific notation | Combined tags in `Paragraph()` | `Paragraph('1.2 × 108 kg/m3', style)` | ```python from reportlab.platypus import Paragraph from reportlab.lib.styles import ParagraphStyle from reportlab.lib.enums import TA_LEFT, TA_CENTER, TA_JUSTIFY body_style = ParagraphStyle( name="ENBodyStyle", fontName="Times New Roman", fontSize=10.5, leading=18, alignment=TA_JUSTIFY, ) # Superscript: area unit Paragraph('Total area: 500 m2', body_style) # Subscript: chemical formula Paragraph('The reaction produces CO2 and H2O', body_style) # Scientific notation Paragraph('Speed of light: 3.0 × 108 m/s', body_style) # Combined Paragraph('Ek = mv2/2', body_style) # Bold heading Paragraph('Chapter 1: Introduction', header_style) # Math symbols Paragraph('When ∠ A = 90°, AB ⊥ AC and ΔABC ≅ ΔDEF', body_style) ``` **Pre-generation check - before writing ANY string, ask:** > "Does this string contain a character outside basic CJK or Mathematical/relational operators?" > If YES → it MUST be inside a `Paragraph()` with the appropriate tag. > If it is a superscript/subscript digit in raw unicode escape form → REPLACE with ``/`` tag. **NEVER rely on post-generation scanning. Prevent at the point of writing.** **Encoding safety - before writing ANY content text:** > "Does this string contain Japanese kana (の, が, は etc.) or rare Unicode symbols?" > If YES → REPLACE with safe plain Chinese equivalents. Japanese kana (Hiragana U+3040-U+309F, Katakana U+30A0-U+30FF) frequently corrupt to U+FFFD (�) when code passes through LLM output, heredoc, or terminal encoding layers. > Common safe replacements: `活の真鲷`→`活缔真鲷`, `盐烤の鲭鱼`→`盐烤鲭鱼`, `烤の鸡串`→`炭烤鸡串`. > If the character is genuinely needed, verify it survives a full write→read round-trip with `open(file, encoding='utf-8')`. --- ## Font Setup (Guaranteed Success Method) ### CRITICAL: Allowed Fonts Only **You MUST ONLY use the following registered fonts. Using ANY other font is STRICTLY FORBIDDEN.** | Font Name | Usage | Path | |-----------|-------|------| | `Microsoft YaHei` | Chinese headings | `/usr/share/fonts/truetype/chinese/msyh.ttf` | | `SimHei` | Chinese body text | `/usr/share/fonts/truetype/chinese/SimHei.ttf` | | `SarasaMonoSC` | Chinese code blocks | `/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf` | | `Times New Roman` | English text, numbers, tables | `/usr/share/fonts/truetype/english/Times-New-Roman.ttf` | | `Calibri` | English alternative | `/usr/share/fonts/truetype/english/calibri-regular.ttf` | | `DejaVuSans` | Formulas, symbols, code | `/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf` | **FORBIDDEN fonts (DO NOT USE):** - ❌ Arial, Arial-Bold, Arial-Italic - ❌ Helvetica, Helvetica-Bold, Helvetica-Oblique - ❌ Courier, Courier-Bold - ❌ Any font not listed in the table above ### Font Registration Template ```python from reportlab.pdfbase import pdfmetrics from reportlab.pdfbase.ttfonts import TTFont from reportlab.pdfbase.pdfmetrics import registerFontFamily # Chinese fonts pdfmetrics.registerFont(TTFont('Microsoft YaHei', '/usr/share/fonts/truetype/chinese/msyh.ttf')) pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf')) pdfmetrics.registerFont(TTFont("SarasaMonoSC", '/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf')) # English fonts pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf')) pdfmetrics.registerFont(TTFont('Calibri', '/usr/share/fonts/truetype/english/calibri-regular.ttf')) # Symbol/Formula font pdfmetrics.registerFont(TTFont("DejaVuSans", '/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf')) # CRITICAL: Register font families to enable , , tags registerFontFamily('Microsoft YaHei', normal='Microsoft YaHei', bold='Microsoft YaHei') registerFontFamily('SimHei', normal='SimHei', bold='SimHei') registerFontFamily('Times New Roman', normal='Times New Roman', bold='Times New Roman') registerFontFamily('Calibri', normal='Calibri', bold='Calibri') registerFontFamily('DejaVuSans', normal='DejaVuSans', bold='DejaVuSans') ``` ### Font Configuration by Document Type **For Chinese PDFs:** - Body text: `SimHei` or `Microsoft YaHei` - Headings: `Microsoft YaHei` (MUST use for Chinese headings) - Code blocks: `SarasaMonoSC` - Formulas/symbols: `DejaVuSans` - **In tables: ALL Chinese content and numbers MUST use `SimHei`** **For English PDFs:** - Body text: `Times New Roman` - Headings: `Times New Roman` (MUST use for English headings) - Code blocks: `DejaVuSans` - **In tables: ALL English content and numbers MUST use `Times New Roman`** **For Mixed Chinese-English PDFs:** - Call `install_font_fallback()` once after registering fonts - it automatically wraps characters in `` tags so you don't need to do it manually. - If you still want manual control, you can use `` tags, but the automatic fallback handles most cases. ```python import sys, os sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'scripts')) from pdf import install_font_fallback # 1. Register fonts (as above) # 2. Install fallback - one line, handles all mixed-language text install_font_fallback() # 3. Just write naturally - no manual wrapping needed! en_style = ParagraphStyle(name="EN", fontName="Times New Roman", fontSize=10.5, leading=18) cn_style = ParagraphStyle(name="CN", fontName="SimHei", fontSize=10.5, leading=18) Paragraph('MySQL、PostgreSQL、Redis', en_style) # ✅ CJK comma auto-fallback to SimHei Paragraph('Beijing (北京) has 21M people', en_style) # ✅ CJK chars auto-fallback Paragraph('价格:¥2,200~5,800/月', en_style) # ✅ all CJK chars handled Paragraph('《巴黎协定》签署于2015年', en_style) # ✅ CJK book title marks handled ``` **How it works:** `install_font_fallback()` monkey-patches `Paragraph.__init__` to scan each character against the font's `charToGlyph` table. Characters missing from the base font are automatically wrapped in ``. The fallback chain is: English fonts → SimHei, Chinese fonts → Times New Roman. For aesthetic optimization, Cyrillic text in SimHei is automatically routed to Times New Roman (serif looks better for Cyrillic). ### Chinese Plot PNG Method ```python import matplotlib.pyplot as plt plt.rcParams['font.sans-serif'] = ['SimHei'] plt.rcParams['axes.unicode_minus'] = False ``` ### Available Font Paths Run `fc-list` to get more fonts. Font files are typically located under: - `/usr/share/fonts/truetype/chinese/` - `/usr/share/fonts/truetype/english/` - `/usr/share/fonts/` --- ## Layout & Spacing Control ### Page Breaks Follow the document type strategy defined in SOUL.md Rule 1. **Structural breaks (always - MANDATORY):** - Between cover page and TOC - **cover must NEVER share a page with anything else** - Between TOC and main content - Between main content and back cover **Content breaks (by document type):** - **Default (all document types)**: ❗ **Absolutely NEVER force page breaks before H1/H2** — content flows naturally, do not insert `PageBreak()` before section headings. This is an iron rule. - Resume / contract / letter → No content page breaks - Short article → No content page breaks - **Exception 1**: Academic papers / textbooks / teaching plans → `PageBreak()` before each H1 is acceptable (only if user explicitly requests academic formatting) - **Exception 2**: User explicitly requests page breaks between chapters **Anti-tear (mandatory — but with height limit):** ```python from reportlab.platypus import KeepTogether from reportlab.lib.pagesizes import A4 from reportlab.lib.units import inch # Maximum height for KeepTogether blocks: 40% of page height # Blocks taller than this should NOT be kept together — they cause # the preceding page to be mostly empty when the block gets pushed down. MAX_KEEP_HEIGHT = A4[1] * 0.4 # ~336pt ≈ 12cm def safe_keep_together(elements): """Wrap elements in KeepTogether only if their total height is reasonable. If too tall, keep only the FIRST TWO elements together (e.g. title + table/image header) to prevent title-content separation, while letting the rest paginate naturally. """ total_h = 0 for el in elements: w, h = el.wrap(A4[0] - 2*inch, A4[1]) total_h += h if total_h <= MAX_KEEP_HEIGHT: return [KeepTogether(elements)] elif len(elements) >= 2: # Keep at least title + first element together to prevent title orphaning return [KeepTogether(elements[:2])] + list(elements[2:]) else: return list(elements) # Heading stays with first paragraph story.extend(safe_keep_together([ heading_paragraph, first_body_paragraph, ])) # Image + caption: keep together ONLY if image is small story.extend(safe_keep_together([image, caption_paragraph])) # Table title + table: keep together ONLY if table is short if len(table_data) <= 15: story.extend(safe_keep_together([table_title, table])) else: # Long table: just keep title with the table start story.extend(safe_keep_together([table_title])) story.append(table) ``` **⚠️ KEY INSIGHT:** `KeepTogether` is the #1 cause of “empty page with chart on next page”. When a `KeepTogether` block is taller than the remaining space on the current page, ReportLab pushes the **entire block** to the next page, leaving the current page mostly empty. **Always use `safe_keep_together()` to cap the maximum block height.** **⚠️ Note:** Do NOT use `KeepWithNext` - it is unreliable in ReportLab 4.x. ### Vertical Spacing Standards * Before tables: `Spacer(1, 18)` after preceding text * After tables: `Spacer(1, 6)` before table caption * After table captions: `Spacer(1, 18)` before next content * Between paragraphs: `Spacer(1, 12)` (~1 line) * Between H3 subsections: `Spacer(1, 12)` * Between H2 sections: `Spacer(1, 18)` (~1.5 lines) * Between H1 sections: `Spacer(1, 24)` (~2 lines) * NEVER use `Spacer(1, X)` where X > 24, except for H1 major breaks or cover page elements ### Cover Page: HTML/Playwright Unified Cover System **⚠️ Report route covers are ALWAYS generated via HTML/Playwright**, using the same 7-template system defined in `typesetting/cover.md`. This ensures visual consistency across all routes (Report, Creative, Academic) and avoids the limitations of ReportLab for complex cover layouts. **Pipeline (Report route):** 1. Generate **body PDF** via ReportLab (start with TOC or content - **no cover in story[]**) 2. Generate **cover HTML** following `typesetting/cover.md` 7-template system 3. **Run `poster_validate.py check-html` on cover HTML** — fix any ERRORs before rendering (overflow:hidden, font fallback, etc.) 4. **Run `cover_validate.js` on cover HTML** — detects text-vs-decorative-line overlaps. Non-zero exit = must fix before proceeding. ```bash node "$PDF_SKILL_DIR/scripts/cover_validate.js" cover.html ``` 5. Render cover HTML → single-page PDF via Playwright (`html2poster.js`) — **NOT `html2pdf-next.js`** (which converts absolute→static and destroys cover layout) 6. **Merge: insert cover as page 0** of body PDF using pypdf → output single final PDF > **Why not ReportLab covers?** ReportLab is excellent for structured content (tables, paragraphs, flowables) but painful for visual design (geometric accents, precise absolute positioning, web fonts). HTML/CSS handles these natively. One cover system, one visual standard, zero inconsistency. **⚠️ Cover page is DEFAULT ON for all Report-route documents ≥ 3 pages.** The Report route handles formal, structured documents. Readers expect a professional cover. **Always generate a cover unless the document type is explicitly excluded below.** **ALWAYS add a cover page (default behavior):** - Research reports, experiment reports, lab reports, analysis reports - White papers, industry analysis, market research - Business proposals, project plans, feasibility studies - Technical documentation, product manuals, user guides - Annual reports, financial summaries, quarterly reviews - Any formal document ≥ 3 pages that will be shared externally or submitted **NEVER add a cover page (explicit exclusions only):** - **Resumes / CVs** - recruiters want content immediately - **Letters / memos** - single-page or short-form - **Forms / checklists / invoices** - functional documents - **Internal notes / meeting minutes** - brevity over presentation - **Documents ≤ 2 pages** - cover would be disproportionate - **User explicitly said "no cover"** **Rule of thumb:** If it's a report, analysis, proposal, or any formal document ≥ 3 pages → **add a cover**. When in doubt, add the cover. It's easier to remove a cover the user didn't want than to miss one they expected. #### Step 1: Generate Body PDF (ReportLab, no cover) Build the ReportLab document starting directly with TOC or first section. **Do NOT include any cover page in the `story[]` list.** ```python # Body PDF - starts with TOC or first section, NO cover story = [] # story.append(toc) # if applicable # story.append(PageBreak()) # story.append(first_section_content) # ... doc.build(story) # Output: body.pdf (no cover) ``` #### Step 2: Generate Cover PDF via html2poster.js ```bash # ALWAYS use html2poster.js for cover rendering (NOT html2pdf-next.js) # Cover pages use position:absolute layout — html2pdf-next.js pre-render hooks # convert absolute→static and destroy the layout. html2poster.js preserves it. node "$PDF_SKILL_DIR/scripts/html2poster.js" cover.html --output cover.pdf --width 794px ``` Or from Python: ```python import subprocess, os def render_cover(html_path, pdf_path): """Render HTML cover to PDF via html2poster.js. ⚠️ ALWAYS use html2poster.js for covers (NOT html2pdf-next.js). Cover HTML uses position:absolute for layout. html2pdf-next.js pre-render hooks convert absolute→static to prevent multi-page overlap, which destroys cover layouts. html2poster.js preserves absolute positioning. """ scripts_dir = os.path.expanduser('~/.openclaw/workspace/skills/pdf/scripts') subprocess.run([ 'node', os.path.join(scripts_dir, 'html2poster.js'), html_path, '--output', pdf_path, '--width', '794px', ], check=True) ``` **Insertion script (single output PDF):** ```python from pypdf import PdfReader, PdfWriter, Transformation A4_W, A4_H = 595.28, 841.89 # A4 in points def normalize_page_to_a4(page): """Scale a page to A4 if its dimensions don't match.""" box = page.mediabox w, h = float(box.width), float(box.height) if abs(w - A4_W) > 2 or abs(h - A4_H) > 2: sx, sy = A4_W / w, A4_H / h page.add_transformation(Transformation().scale(sx=sx, sy=sy)) page.mediabox.lower_left = (0, 0) page.mediabox.upper_right = (A4_W, A4_H) return page def insert_cover(cover_pdf, body_pdf, output_pdf): """Insert cover as first page of body PDF → single output file.""" writer = PdfWriter() # Cover as page 1 cover_page = PdfReader(cover_pdf).pages[0] writer.add_page(normalize_page_to_a4(cover_page)) # Body pages follow for page in PdfReader(body_pdf).pages: writer.add_page(normalize_page_to_a4(page)) writer.add_metadata({'/Title': 'Report Title', '/Author': 'Z.ai', '/Creator': 'Z.ai'}) with open(output_pdf, 'wb') as f: writer.write(f) ``` > **⚠️ Size pitfall:** Playwright `page.pdf(width='794px', height='1123px')` output PDF dimensions may differ from A4 (595.28×841.89pt) by 1-2pt. **Do not use PIL Image.save('x.pdf')** for PNG→PDF conversion — DPI mapping causes severe size errors. You must call `normalize_page_to_a4()` to unify dimensions before insertion. **Cover HTML template reference**: See `typesetting/cover.md` for the complete 7-Layout System with variants, typography scale, and bounding box rules. **→ Full cover spec: `typesetting/cover.md`** - read it before designing any cover. **Cover design rules (summary - see cover.md for details):** - Page size: `width: 794px; height: 1123px` (A4 at 96dpi) - `body { margin: 0; padding: 0; overflow: hidden; }` - REQUIRED to avoid white borders - Load Google Fonts via `` in `` (NOT `@import url(...)` in CSS — `@import` must be first rule or is silently ignored) — Playwright fetches them at render time - **Background must be white or very light** (`#fff` / `#fafafa` / `#f5f8fb`), no dark solid backgrounds or gradient backgrounds - Primary color used only for text, lines, geometric accents - never as large-area background - Sophistication = whitespace + typography + restrained geometric accents **Cover Constitution (7-Layout System):** - **Pick a layout (1-7)** from `typesetting/cover.md` that matches the document tone. No global default - every selection must be a deliberate design decision. Layout 7 for government/bidding/proposal documents. - **Maximum 4 components** on any cover. Typical recipe: Title + subtitle + 1 geometric accent + metadata. - **Typography Scale**: Title ≈ 45pt, Subtitle ≈ 25pt, Meta ≥ 18pt (never below 14pt). Tiny text = FAIL. - **Background layer (optional)**: See `typesetting/cover-backgrounds.md` for 3 recipes (A=极简弧线, B=工程十字轴+立柱, C=锐角切割+出血文字). Background renders BELOW all foreground content at 2-5% opacity. - **Mandatory `
` chunking**: Title MUST break every 2-4 words (CJK) or 3-5 words (English). Single-line title = FAIL. - **Bounding Box spatial dispersion**: Group elements into 2-3 bounding boxes at opposite regions (e.g., top-left + bottom-right). Never cluster everything into middle 40%. - **Safe Zone**: 12% top/bottom, 14% left/right padding on cover pages. - **No body text on cover**: Cover = title + subtitle + date + optional geometric decoration. All content goes to page 2+. ### ⚠️ Cover Page Isolation Rule (MANDATORY) **The cover page must ALWAYS be on its own page.** Cover content and subsequent content (TOC, body text, etc.) must NEVER appear on the same page. This is a hard rule. - **Report route (HTML/Playwright cover)**: The cover is a separate rendered page inserted as page 0 via pypdf - isolation is inherent in the merge pipeline - **Creative route**: Cover is part of the same HTML document but on its own `
` with page-break - isolation is handled by CSS If a generated PDF shows cover + TOC on the same page, it is a **critical bug** - regenerate immediately. ### Alignment and Typography - **CJK body**: Use `TA_LEFT` + 2-char indent. Headings: no indent. - **Font sizes**: Body 11pt, subheadings 14pt, headings 18-20pt - **Line height**: 1.5-1.6 (leading at **1.4x font size minimum**, recommended 1.5x for CJK) - CJK characters are taller than Latin - 1.2x causes cramped lines - Quick reference: 10pt→14pt min/18pt rec; 14pt→20pt min/22pt rec; 36pt→50pt min/54pt rec - **⛔ Prohibited: fixed `rowHeights` in Table()**. Use `TOPPADDING` / `BOTTOMPADDING` to control row spacing. Fixed rowHeights cause content overflow clipping - the text renders but gets invisibly cut off. - **CRITICAL: Alignment Selection Rule**: - Use `TA_JUSTIFY` only when ALL of: * Text is predominantly English (≥ 90%) * Column width is sufficiently wide (A4 single-column body) * Font: Western fonts (Times New Roman / Calibri) * Chinese content: None or negligible - Otherwise, always default to `TA_LEFT` - For Chinese text, always add `wordWrap='CJK'` to ParagraphStyle ### Style Configuration * Normal paragraph: `spaceBefore=0`, `spaceAfter=6-12` * Headings: `spaceBefore=12-18`, `spaceAfter=6-12` * **Headings must be bold**: Use `` tags in Paragraph * Table captions: `spaceBefore=3`, `spaceAfter=6`, `alignment=TA_CENTER` * **CRITICAL**: For Chinese text, always add `wordWrap='CJK'` to ParagraphStyle --- ## Table Formatting ### Standard Table Color Scheme (MUST USE for ALL tables) > **Colors MUST come from the palette generated in Step 2.** The values below are examples - replace them with your actual palette output. ```python # Colors from design_engine.py palette (Step 2) - NEVER hardcode these TABLE_HEADER_COLOR = ACCENT # From palette --c-accent TABLE_HEADER_TEXT = colors.white # White text for header (fixed) TABLE_ROW_EVEN = colors.white # White for even rows TABLE_ROW_ODD = BG_SURFACE # From palette --c-mid (light stripe) ``` ### Table Rules - Caption must be centered, added immediately after the table - Entire table must be centered on the page - **Header Row**: Dark background (from palette `header_fill`), white bold text - **Cell Margins**: Left/Right at least 120-200 twips - **Alignment**: Each body element within the same table must be aligned the same method - **Color consistency**: All tables in one PDF must use the same color scheme - **Spacing**: `Spacer(1, 18)` → Table → `Spacer(1, 6)` → Caption → `Spacer(1, 18)` ### Table Centering Enforcement (MANDATORY) **Every table MUST be horizontally centered on the page. No exceptions.** ```python # ReportLab: ALWAYS set hAlign='CENTER' table = Table(data, colWidths=col_widths, hAlign='CENTER') # Width rule: table total width should be 85-100% of available content width available_width = doc.width # or page_width - left_margin - right_margin table_width = sum(col_widths) if table_width < available_width * 0.85: # Scale up columns proportionally to fill space scale = (available_width * 0.90) / table_width col_widths = [w * scale for w in col_widths] # ❌ WRONG — no hAlign (defaults to LEFT, table drifts left) table = Table(data, colWidths=[100, 200, 150]) # ✅ RIGHT — explicit centering table = Table(data, colWidths=[100, 200, 150], hAlign='CENTER') ``` **LaTeX equivalent:** ```latex \begin{table}[htbp] \centering % MANDATORY \begin{tabular}{...} ... \end{tabular} \end{table} ``` ### Table Cell Content Rule (MANDATORY - NON-NEGOTIABLE) **ALL text content in table cells MUST be wrapped in `Paragraph()`. NO EXCEPTIONS.** ❌ **PROHIBITED** - Plain strings: ```python data = [ ['Header', 'Value'], # Bold won't render! ['Pressure', '1.01 × 105'], # Superscript won't work! ] ``` ✅ **REQUIRED** - All text in Paragraph: ```python data = [ [Paragraph('Header', header_style), Paragraph('Value', header_style)], [Paragraph('Pressure', cell_style), Paragraph('1.01 × 105', cell_style)], ] ``` **The ONLY exception**: `Image()` objects can be placed directly in table cells. ### Units with Exponents (CRITICAL) - PROHIBITED: `W/m2`, `kg/m3`, `m/s2` (plain text exponents) - RIGHT: `Paragraph('W/m2', style)`, `Paragraph('kg/m3', style)` ### Numeric Values in Tables (CRITICAL) - Large numbers MUST use scientific notation: `Paragraph('-1.246 × 108', style)` not `-124600000` - Small decimals MUST use scientific notation: `Paragraph('2.5 × 10-3', style)` not `0.0025` - Threshold: Use scientific notation when |value| ≥ 10000 or |value| ≤ 0.001 ### Complete Table Example ```python from reportlab.platypus import Table, TableStyle, Paragraph, Image from reportlab.lib.styles import ParagraphStyle from reportlab.lib import colors from reportlab.lib.enums import TA_CENTER, TA_LEFT, TA_RIGHT, TA_JUSTIFY header_style = ParagraphStyle( name='TableHeader', fontName='Times New Roman', fontSize=11, textColor=colors.white, alignment=TA_CENTER ) cell_style = ParagraphStyle( name='TableCell', fontName='Times New Roman', fontSize=10, textColor=colors.black, alignment=TA_CENTER ) data = [ [Paragraph('Parameter', header_style), Paragraph('Unit', header_style), Paragraph('Value', header_style)], [Paragraph('Temperature', cell_style), Paragraph('°C', cell_style), Paragraph('25.5', cell_style)], [Paragraph('Pressure', cell_style), Paragraph('Pa', cell_style), Paragraph('1.01 × 105', cell_style)], [Paragraph('Density', cell_style), Paragraph('kg/m3', cell_style), Paragraph('1.225', cell_style)], ] table = Table(data, colWidths=[120, 80, 100]) # ⚠️ Above uses hardcoded widths - OK for this 3-col example (120+80+100=300 < available ~460pt). # For real tables, ALWAYS calculate from available_width. See "Table Width Management" below. table.setStyle(TableStyle([ ('BACKGROUND', (0, 0), (-1, 0), ACCENT), # Header: palette accent ('TEXTCOLOR', (0, 0), (-1, 0), colors.white), ('BACKGROUND', (0, 1), (-1, 1), colors.white), ('BACKGROUND', (0, 2), (-1, 2), BG_SURFACE), # Odd row: palette surface ('BACKGROUND', (0, 3), (-1, 3), colors.white), ('GRID', (0, 0), (-1, -1), 0.5, TEXT_MUTED), # Grid: palette muted ('VALIGN', (0, 0), (-1, -1), 'MIDDLE'), ('LEFTPADDING', (0, 0), (-1, -1), 8), ('RIGHTPADDING', (0, 0), (-1, -1), 8), ('TOPPADDING', (0, 0), (-1, -1), 6), ('BOTTOMPADDING', (0, 0), (-1, -1), 6), ])) ``` ### Table Width Management (⚠️ Prevent Table Overflow) **Most common table bug: columns overflow the right page margin because colWidths are hardcoded without checking available space.** **Iron Rule: The sum of all colWidths MUST NOT exceed `available_width`.** ```python from reportlab.lib.pagesizes import A4 from reportlab.lib.units import inch page_width = A4[0] # 595.28pt left_margin = 1.0 * inch # 72pt (adjust to your doc's actual margins) right_margin = 1.0 * inch # 72pt available_width = page_width - left_margin - right_margin # ≈ 451pt # ─── Method 1: Proportional widths (RECOMMENDED) ─── # Define column ratios, auto-scale to fit available_width col_ratios = [0.25, 0.40, 0.20, 0.15] # must sum to 1.0 col_widths = [r * available_width for r in col_ratios] table = Table(data, colWidths=col_widths) # ─── Method 2: Fixed + flex columns ─── # Some columns have fixed width (numbers, dates), rest fills remaining space fixed_cols = {0: 80, 3: 60} # col 0 = 80pt, col 3 = 60pt fixed_total = sum(fixed_cols.values()) flex_count = 4 - len(fixed_cols) # total columns minus fixed flex_width = (available_width - fixed_total) / flex_count col_widths = [] for i in range(4): col_widths.append(fixed_cols.get(i, flex_width)) table = Table(data, colWidths=col_widths) # ─── Method 3: Let ReportLab auto-calculate (simplest, less control) ─── # Pass colWidths=None - ReportLab auto-sizes based on content # ⚠️ Risk: may still overflow if content is too wide table = Table(data) # no colWidths ``` **Checklist before adding any table:** 1. Calculate `available_width` from your actual page size and margins 2. Sum all colWidths - must be ≤ `available_width` 3. For CJK text: characters are wider than Latin - budget ~12pt per CJK char at 10pt font 4. If a column has long text (descriptions, policies), use Paragraph() wrapping inside cells (not plain strings) - plain strings don't wrap and will overflow 5. Test with the longest expected content in each column **Common mistake - plain strings don't wrap:** ```python # ❌ Plain string: will overflow if text is long data = [['Policy Name', 'Very long description that keeps going and going']] # ✅ Paragraph: wraps within column width from reportlab.platypus import Paragraph from reportlab.lib.styles import getSampleStyleSheet styles = getSampleStyleSheet() cell_style = styles['Normal'] data = [[Paragraph('Policy Name', cell_style), Paragraph('Very long description that keeps going and going', cell_style)]] ``` ``` Do you need auto-TOC? ├─ YES → Use TocDocTemplate + doc.multiBuild(story) └─ NO → Use SimpleDocTemplate + doc.build(story) ``` | Requirement | DocTemplate | Build Method | |-------------|-------------|--------------| | Multi-page with TOC | `TocDocTemplate` | `multiBuild()` | | Single-page or no TOC | `SimpleDocTemplate` | `build()` | | With Cross-References (no TOC) | `SimpleDocTemplate` | `build()` | | Both TOC + Cross-References | `TocDocTemplate` | `multiBuild()` | **⚠️ CRITICAL**: - `multiBuild()` is ONLY needed when using `TableOfContents` - Using `build()` with `TocDocTemplate` = TOC won't work - Using `multiBuild()` without `TocDocTemplate` = unnecessary overhead --- ## Rich Text Formatting ### Prerequisites To use ``, ``, `` tags, you **must**: 1. Register fonts via `registerFont()` 2. Call `registerFontFamily()` to link normal/bold/italic variants 3. Wrap all tagged text in `Paragraph()` objects **CRITICAL**: These tags ONLY work inside `Paragraph()` objects. Plain strings will NOT render them. ### Preventing Unwanted Line Breaks ```python # Use non-breaking space to prevent breaking text = Paragraph("Professors (K.G.\u00A0Palepu) proposed...", style) # Use wordWrap='CJK' for proper Chinese typography styles.add(ParagraphStyle(name='BodyStyle', fontName='SimHei', fontSize=10.5, leading=18, alignment=TA_LEFT, wordWrap='CJK')) # Use
for intentional line breaks (NOT \n) text = Paragraph("Line 1
Line 2
Line 3", style) ``` --- ## Auto-Generated Table of Contents ### ❌ FORBIDDEN: Manual Table of Contents **NEVER manually create TOC with hardcoded page numbers.** ### ⚠️ MANDATORY: TOC requires a cover page **Unless the user explicitly requests no cover, if a document has a Table of Contents, it MUST have a cover page.** Structure: Cover (page 1) → TOC (page 2) → Content (page 3+). Do not generate a TOC without a preceding cover page. ### ✅ ALWAYS use auto-generated TOC: ```python from reportlab.platypus import SimpleDocTemplate, Paragraph, PageBreak, Spacer from reportlab.platypus.tableofcontents import TableOfContents from reportlab.lib.utils import simpleSplit import hashlib class TocDocTemplate(SimpleDocTemplate): def afterFlowable(self, flowable): if hasattr(flowable, 'bookmark_name'): level = getattr(flowable, 'bookmark_level', 0) text = getattr(flowable, 'bookmark_text', '') key = getattr(flowable, 'bookmark_key', '') # MUST pass key as 4th element - without it, TOC entries won't have clickable links self.notify('TOCEntry', (level, text, self.page, key)) doc = TocDocTemplate("document.pdf", pagesize=letter) story = [] toc = TableOfContents() toc.levelStyles = [ ParagraphStyle(name='TOCHeading1', fontSize=14, leftIndent=20, fontName='Times New Roman'), ParagraphStyle(name='TOCHeading2', fontSize=12, leftIndent=40, fontName='Times New Roman'), ] story.append(Paragraph("Table of Contents", styles['Title'])) story.append(toc) story.append(PageBreak()) def add_heading(text, style, level=0): # Generate a unique bookmark key for this heading key = 'h_%s' % hashlib.md5(text.encode()).hexdigest()[:8] p = Paragraph('%s' % (key, text), style) p.bookmark_name = text p.bookmark_level = level p.bookmark_text = text p.bookmark_key = key # This key enables clickable TOC links return p # ⚠️ Orphan Heading Prevention (NOT a page break rule) # If an H1 heading would appear in the bottom 15% of the page with no room # for at least a few lines of content, push it to the next page. # This is NOT "every H1 starts a new page" — it only triggers when the heading # would be orphaned at the very bottom. available_height = A4[1] - top_margin - bottom_margin H1_ORPHAN_THRESHOLD = available_height * 0.15 # ~100pt ≈ 3.5cm — enough for heading + 2 lines def add_major_section(text, style): """Add an H1-level heading with orphan prevention (NOT forced page break).""" return [ CondPageBreak(H1_ORPHAN_THRESHOLD), # Only break if heading would be orphaned at page bottom add_heading(text, style, level=0), ] story.extend(add_major_section("Chapter 1: Introduction", styles['Heading1'])) # ... content ... doc.multiBuild(story) # MUST use multiBuild for TOC ``` **⚠️ CRITICAL TOC LINK RULES:** 1. `afterFlowable` MUST pass `key` as the 4th element in the notify tuple - without it, TOC entries have no clickable links 2. `add_heading` MUST set `bookmark_key` AND embed `` in the Paragraph text - this creates the link destination 3. The key must be unique per heading - use a hash of the heading text ### TOC with Leader Dots (Copy-Paste Ready) **⚠️ WARNING**: This manual approach creates a visual-only TOC. It has **NO clickable links** and **NO auto-updated page numbers**. Use the auto-generated TOC above (TocDocTemplate + multiBuild) whenever possible. Only use this leader-dots approach if you need very specific visual styling that the auto TOC cannot provide - and even then, prefer the auto approach. For a professional TOC with leader dots connecting titles to page numbers: ```python from reportlab.lib.pagesizes import A4 from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, PageBreak from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle from reportlab.lib import colors from reportlab.lib.units import inch # Setup doc = SimpleDocTemplate("report.pdf", pagesize=A4, leftMargin=0.75*inch, rightMargin=0.75*inch) styles = getSampleStyleSheet() styles['Heading1'].fontName = 'Times New Roman' styles['Heading1'].textColor = colors.black story = [] # Calculate dimensions page_width = A4[0] available_width = page_width - 1.5*inch page_num_width = 50 # Fixed width for page numbers dots_column_width = available_width - 200 - page_num_width optimal_dot_count = int(dots_column_width / 4.5) # ~4.5pt per dot at 7pt font # Define styles toc_style = ParagraphStyle('TOCEntry', parent=styles['Normal'], fontName='Times New Roman', fontSize=11, leading=16) dots_style = ParagraphStyle('LeaderDots', parent=styles['Normal'], fontName='Times New Roman', fontSize=7, leading=16) # Build TOC toc_data = [ [Paragraph('Table of Contents', styles['Heading1']), '', ''], ['', '', ''], ] entries = [('Section 1', '5'), ('Section 2', '10')] for title, page in entries: toc_data.append([ Paragraph(title, toc_style), Paragraph('.' * optimal_dot_count, dots_style), Paragraph(page, toc_style) ]) toc_table = Table(toc_data, colWidths=[None, dots_column_width, page_num_width]) toc_table.setStyle(TableStyle([ ('GRID', (0, 0), (-1, -1), 0, colors.white), ('LINEBELOW', (0, 0), (0, 0), 1.5, colors.black), ('ALIGN', (0, 0), (0, -1), 'LEFT'), ('ALIGN', (1, 0), (1, -1), 'LEFT'), ('ALIGN', (2, 0), (2, -1), 'RIGHT'), ('VALIGN', (0, 0), (-1, -1), 'TOP'), ('LEFTPADDING', (0, 0), (-1, -1), 0), ('RIGHTPADDING', (0, 0), (-1, -1), 0), ('TOPPADDING', (0, 2), (-1, -1), 3), ('BOTTOMPADDING', (0, 2), (-1, -1), 3), ('TEXTCOLOR', (1, 2), (1, -1), TEXT_MUTED), # From palette --c-muted ])) story.append(toc_table) story.append(PageBreak()) doc.build(story) ``` **MANDATORY Leader Dots Rules:** | Rule | Requirement | |------|-------------| | Column widths | ✅ MUST use fixed values. Percentage-based widths are **STRICTLY FORBIDDEN**. | | Dot count | ✅ MUST calculate dynamically: `int(dots_column_width / 4.5)`. Hard-coded counts are **STRICTLY FORBIDDEN**. | | Page number column | ✅ MUST be at least 40pt wide. | | Dot font size | ✅ MUST NOT exceed 8pt. | | Dot alignment | ✅ MUST be LEFT-aligned (visual flow from title). | | Padding | ✅ MUST be exactly 0 between columns. | | Bold text | ✅ MUST use `Paragraph('Text', style)`. Plain strings like `'Text'` are **STRICTLY FORBIDDEN**. | | Indentation | Use leading spaces for hierarchy (e.g., `" 1.1 Subsection"`). | ### TOC Post-Generation Validation (MANDATORY) After generating any PDF with a Table of Contents, run: ```bash python3 "$PDF_SKILL_DIR/scripts/toc_validate.py" check-pdf output.pdf ``` If any errors are returned, fix the code and regenerate. Common issues: - `TOC_ALL_SAME_PAGE` → You used `build()` instead of `multiBuild()`, so all page numbers are stuck at 1. - `TOC_NO_ENTRIES` → Headings are missing `bookmark_name`/`bookmark_level` attributes. - `TOC_PAGES_INVALID` → A TOC entry references a page beyond the document's total page count. --- ## Cross-References (Figures, Tables, Bibliography) Pre-register all figures, tables, and references BEFORE using them in text. ```python class CrossReferenceDocument: def __init__(self): self.figures = {} self.tables = {} self.refs = {} self.figure_counter = 0 self.table_counter = 0 self.ref_counter = 0 def add_figure(self, name): if name not in self.figures: self.figure_counter += 1 self.figures[name] = self.figure_counter return self.figures[name] def add_table(self, name): if name not in self.tables: self.table_counter += 1 self.tables[name] = self.table_counter return self.tables[name] def add_reference(self, name): if name not in self.refs: self.ref_counter += 1 self.refs[name] = self.ref_counter return self.refs[name] ``` --- ## Image Handling ### Diagram Embedding Rules (MANDATORY) **Figures are block-level Flowables.** Every image/diagram in ReportLab MUST be appended to `story` as a standalone `Image()` or wrapped in `KeepTogether([image, caption])`. Never simulate floating layouts by placing images inside Paragraph text or multi-column Table cells alongside body text. **Complex diagrams (>12 nodes)**: Decompose into a ReportLab Table (details) + a simplified diagram image (overview). The diagram gives the big picture; the table carries precision. See SKILL.md "Complex Diagram Strategy". **Diagram quality**: When generating flowcharts/diagrams for PDF embedding (via Playwright screenshot or ReportLab Drawing), follow the diagram content quality rules in SKILL.md § "Diagram Content Quality Rules". Key points: connectors must not pass through nodes, no high-saturation large fills, multi-arrow convergence must use merge pattern, font sizes must be legible at final embedded size (see context-specific minimums in SKILL.md). **Cross-brief diagram pipeline**: For flowcharts/architecture diagrams, generate via **Playwright+CSS** (HTML nodes + CSS layout + SVG connectors), screenshot at 2× device scale factor for 300dpi print quality, then embed with `Image()`. See SKILL.md § "Diagram Generation Strategy" for the full pipeline. **🚫 FORBIDDEN: TikZ standalone → PNG pipeline in Report brief.** Report uses ReportLab, not LaTeX. Going through TikZ requires a LaTeX compiler, adds compilation steps, and models frequently produce broken TikZ code. Use Playwright+CSS instead - it's native to the existing cover rendering engine. ```python # Playwright+CSS diagram → PNG → ReportLab embedding # 1. LLM generates diagram.html (CSS grid/flexbox nodes + SVG arrows) # 2. Playwright screenshots at 2× scale (300dpi equivalent) # 3. Embed: from reportlab.platypus import Image img = Image('diagram.png', width=450) # auto height via aspect ratio story.append(img) # block-level Flowable ``` **⚠️ CRITICAL: Preserve aspect ratio - NEVER hardcode both width and height without reading actual image dimensions.** **⚠️ CRITICAL: Always constrain to available space - NEVER insert at original size if it exceeds container width/height.** Pie charts become ellipses, radar charts become diamonds, photos get stretched if you set arbitrary width/height. **→ Full overflow prevention spec: `typesetting/overflow.md`** - read for the complete bounding box system. ```python from PIL import Image as PILImage from reportlab.platypus import Image from reportlab.lib.pagesizes import A4 from reportlab.lib.units import inch def embed_image(path: str, max_width: float = None, max_height: float = None) -> Image: """Embed image with preserved aspect ratio, constrained to container. ALWAYS use this pattern. Never hardcode both width and height. If max_width is None, defaults to available_width (page - margins). """ if max_width is None: max_width = A4[0] - 2.0 * inch # Default: A4 width minus 1" margins each side if max_height is None: max_height = A4[1] * 0.35 # ~294pt ≈ 10cm — leaves room for caption + surrounding text on same page pil_img = PILImage.open(path) orig_w, orig_h = pil_img.size # Scale to fit within BOTH max_width and max_height ratio_w = max_width / orig_w if orig_w > max_width else 1.0 ratio_h = max_height / orig_h if orig_h > max_height else 1.0 ratio = min(ratio_w, ratio_h) return Image(path, width=orig_w * ratio, height=orig_h * ratio) # Usage: available_width = A4[0] - left_margin - right_margin img = embed_image('chart.png', max_width=available_width, max_height=300) story.append(img) ``` ```python # ❌ NEVER do this - distorts the image: img = Image('chart.png', width=400, height=250) # arbitrary height breaks aspect ratio ``` --- ## PDF Metadata ```python doc = SimpleDocTemplate( pdf_filename, pagesize=letter, title=os.path.splitext(pdf_filename)[0], author='Z.ai', creator='Z.ai', subject='Document purpose description' ) ``` ### ⚠️ MANDATORY: Post-Generation Metadata After `doc.build(story)` completes: ```bash python3 "$PDF_SKILL_DIR/scripts/pdf.py" meta.brand output.pdf ``` ### ⚠️ MANDATORY: Post-Generation Blank Page Cleanup After metadata branding, remove any accidental blank pages: ```bash python3 "$PDF_SKILL_DIR/scripts/pdf.py" pages.clean output.pdf -o output_clean.pdf ``` If blank pages were found, rename `output_clean.pdf` → `output.pdf`. --- ## MANDATORY: Post-Generation Code Sanitization **After writing PDF generation code and BEFORE executing it**, sanitize using: ```bash # Step 0: Set skill root path (see SKILL.md § Script Path Setup) PDF_SKILL_DIR="" # Step 1: Write code to .py file cat > generate_pdf.py << 'PYEOF' # ... your PDF generation code ... PYEOF # Step 2: Sanitize (MUST run before execution) python3 "$PDF_SKILL_DIR/scripts/pdf.py" code.sanitize generate_pdf.py # Step 3: Execute python3 generate_pdf.py # Step 4: Add metadata (MUST run after PDF creation) python3 "$PDF_SKILL_DIR/scripts/pdf.py" meta.brand output.pdf # Step 5: Check for missing glyphs (RECOMMENDED) python3 "$PDF_SKILL_DIR/scripts/pdf.py" font.check output.pdf # Step 6: Check TOC quality (if document has TOC) python3 "$PDF_SKILL_DIR/scripts/pdf.py" toc.check output.pdf # Step 7: PDF quality assurance scan (MUST run after all other checks) python3 "$PDF_SKILL_DIR/scripts/pdf_qa.py" output.pdf ``` **FORBIDDEN patterns:** ```bash # ❌ PROHIBITED: python -c with inline code python -c "from reportlab... doc.build(story)" # ❌ PROHIBITED: heredoc without saving to file first python3 << 'EOF' from reportlab... EOF # ❌ PROHIBITED: executing .py without sanitizing python generate_pdf.py # Missing sanitization! ``` --- ## Debug Tips for Layout Issues **→ Full overflow prevention spec: `typesetting/overflow.md`** - read it for the complete bounding box system, two-pass rendering, fallback strategies, and all three route implementations. ```python from reportlab.platypus import HRFlowable from reportlab.lib.colors import red # Visualize spacing during development - insert between elements story.append(table) story.append(HRFlowable(width="100%", color=red, thickness=0.5, spaceBefore=0, spaceAfter=0)) story.append(Spacer(1, 6)) story.append(HRFlowable(width="100%", color=red, thickness=0.5, spaceBefore=0, spaceAfter=0)) story.append(caption) # Red lines create visual markers to see actual spacing - remove before final build ``` --- ## Resume / CV Template (ATS-Friendly) Single-column, clean layout optimised for Applicant Tracking Systems (ATS). No graphics, no sidebars, no colour blocks - just well-structured text that parses perfectly by HR software. **When to use this template:** - Applying to corporate jobs through online portals - Any scenario where the PDF will be machine-parsed before a human reads it - When the recruiter explicitly asks for "a standard resume" **When NOT to use (use Academic brief instead):** - Creative/design industry positions → Academic brief (AltaCV-style) - Academic CV with publications → Academic brief (Academic CV) ### Resume Design Rules - **Target 1 page** unless user specifies otherwise - **Margins**: `left=1.5cm, right=1.5cm, top=1.5cm, bottom=1.5cm` - **No cover page** - content starts immediately - **No TOC** - too short for table of contents - **Font**: Times New Roman (English) or SimHei (Chinese) - **Body font size**: 10-10.5pt; Name: 22-26pt; Section titles: 13-14pt - **⚠️ Minimum font size: 12px (9pt) - HARD FLOOR.** No text in the entire resume may render smaller than 12px. This includes contact info, meta text, footnotes, and captions. Anything below 12px is unreadable in print and fails accessibility checks. - **Section separator**: thin horizontal rule (`HRFlowable`) - **Bullet style**: `•` with tight spacing ### Resume Line-Break Rules (Language-Aware) - **English**: Prefer breaking at word boundaries (spaces, hyphens). If a long word must be split to avoid excessive whitespace, break at a valid syllable boundary and insert a hyphen (`-`) - this is standard typographic practice (e.g., `experi-\nence`, `develop-\nment`). ReportLab supports `wordWrap='CJK'` only for CJK content; for English use default paragraph wrapping with `allowWidows=0, allowOrphans=0`. - **Chinese/CJK**: Break allowed between any two CJK characters. Never break between a CJK character and its adjacent punctuation (、。,)》 etc. must stay with the preceding character). - **Mixed content** (e.g., "Python 开发工程师"): Break at CJK boundaries or English word boundaries. Never split an English word in a CJK paragraph unless hyphenated. - **Contact line**: Email, phone, location separated by `|` or `·`. Each segment must stay on one line - if too long, move to next line at the separator, not mid-segment. - **Dates and ranges**: "Jan 2022 - Present" must stay as one unit. Never break a date range across lines. ### Resume Page-Fill Rules (Anti-Blank-Space) - **Goal: Fill ≥85% of the page height.** Content should reach at least the bottom quarter of the page. A resume that stops at 60% height with blank space below = FAIL. - **Adaptive spacing strategy** (apply in order until page is ≥85% filled): 1. Increase `spaceBefore` / `spaceAfter` on section headers (from 10pt up to 18pt) 2. Increase `leading` (line height) on body text (from 14pt up to 18pt) 3. Increase body `fontSize` by 0.5-1pt (from 10pt up to 11.5pt max) 4. Add a "Professional Summary" or "Key Achievements" section if none exists 5. Increase margins slightly (from 1.5cm up to 2cm) to reduce line width and push content downward - **Never leave a visible blank area > 3cm at the bottom of the page.** - **If content overflows to page 2**: do the reverse - reduce spacing, tighten leading (min 12pt), reduce fontSize (min 9pt / 12px), before removing content. ### Complete Resume Template ```python from reportlab.lib.pagesizes import A4 from reportlab.lib.units import cm, mm from reportlab.lib.styles import ParagraphStyle from reportlab.lib.enums import TA_LEFT, TA_CENTER from reportlab.lib import colors from reportlab.platypus import ( SimpleDocTemplate, Paragraph, Spacer, HRFlowable, Table, TableStyle ) from reportlab.pdfbase import pdfmetrics from reportlab.pdfbase.ttfonts import TTFont from reportlab.pdfbase.pdfmetrics import registerFontFamily # ── Font Registration ── pdfmetrics.registerFont(TTFont('TimesNewRoman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf')) registerFontFamily('TimesNewRoman', normal='TimesNewRoman', bold='TimesNewRoman') # For Chinese resumes, also register SimHei: # pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf')) # registerFontFamily('SimHei', normal='SimHei', bold='SimHei') # ── Styles ── # ACCENT must come from palette.generate (Step 2) # Run: python3 "$PDF_SKILL_DIR/scripts/pdf.py" palette.generate --title "Resume" --mode minimal ACCENT = colors.HexColor('') # Replace with palette output name_style = ParagraphStyle( 'ResumeName', fontName='TimesNewRoman', fontSize=24, leading=28, alignment=TA_CENTER, spaceAfter=2 ) contact_style = ParagraphStyle( 'ResumeContact', fontName='TimesNewRoman', fontSize=10, leading=14, alignment=TA_CENTER, textColor=TEXT_MUTED, # From palette --c-muted spaceAfter=8 ) section_title_style = ParagraphStyle( 'ResumeSectionTitle', fontName='TimesNewRoman', fontSize=13, leading=16, spaceBefore=10, spaceAfter=4, textColor=ACCENT ) job_title_style = ParagraphStyle( 'ResumeJobTitle', fontName='TimesNewRoman', fontSize=11, leading=14, spaceAfter=1 ) job_meta_style = ParagraphStyle( 'ResumeJobMeta', fontName='TimesNewRoman', fontSize=10, leading=13, textColor=TEXT_MUTED, spaceAfter=4 # From palette --c-muted ) bullet_style = ParagraphStyle( 'ResumeBullet', fontName='TimesNewRoman', fontSize=10, leading=14, leftIndent=14, bulletIndent=0, spaceBefore=1, spaceAfter=1 ) body_style = ParagraphStyle( 'ResumeBody', fontName='TimesNewRoman', fontSize=10, leading=14, spaceAfter=2 ) # ── Helpers ── def section_header(title): """Section title + thin rule separator.""" return [ Paragraph(f'{title}', section_title_style), HRFlowable(width='100%', thickness=0.8, color=ACCENT, spaceBefore=0, spaceAfter=6), ] def experience_entry(title, company, dates, location, bullets): """One work experience block.""" elements = [ Paragraph(f'{title}', job_title_style), Paragraph(f'{company} | {dates} | {location}', job_meta_style), ] for b in bullets: elements.append(Paragraph(f'• {b}', bullet_style)) elements.append(Spacer(1, 4)) return elements def education_entry(degree, school, dates, details=None): """One education block.""" elements = [ Paragraph(f'{degree}', job_title_style), Paragraph(f'{school} | {dates}', job_meta_style), ] if details: elements.append(Paragraph(details, body_style)) elements.append(Spacer(1, 4)) return elements def skills_row(categories): """ Skills as compact label: value pairs. categories = [('Programming', 'Python, Java, C++'), ('Tools', 'Git, Docker')] """ elements = [] for cat, vals in categories: elements.append(Paragraph(f'{cat}: {vals}', body_style)) return elements # ── Build Document ── doc = SimpleDocTemplate( 'resume.pdf', pagesize=A4, leftMargin=1.5*cm, rightMargin=1.5*cm, topMargin=1.5*cm, bottomMargin=1.5*cm, title='Resume - Your Name', author='Z.ai', creator='Z.ai' ) story = [] # Header story.append(Paragraph('YOUR NAME', name_style)) story.append(Paragraph( 'email@example.com | +86 138-0000-0000 | Shanghai, China | github.com/yourname', contact_style )) # Summary story.extend(section_header('PROFESSIONAL SUMMARY')) story.append(Paragraph( 'Results-driven software engineer with 5+ years of experience in backend systems, ' 'distributed computing, and cloud-native architectures. Led a team of 8 engineers ' 'delivering a real-time data pipeline processing 2M+ events/sec.', body_style )) # Experience story.extend(section_header('WORK EXPERIENCE')) story.extend(experience_entry( 'Senior Software Engineer', 'Tech Company Inc.', 'Jan 2022 - Present', 'Shanghai', [ 'Designed and deployed a microservices architecture serving 10M daily active users', 'Reduced API latency by 40% through query optimisation and caching strategies', 'Mentored 3 junior engineers; established code review standards adopted team-wide', ] )) story.extend(experience_entry( 'Software Engineer', 'Startup Co.', 'Jul 2019 - Dec 2021', 'Beijing', [ 'Built real-time recommendation engine using collaborative filtering (CTR +25%)', 'Implemented CI/CD pipeline reducing deployment time from 2 hours to 15 minutes', ] )) # Education story.extend(section_header('EDUCATION')) story.extend(education_entry( 'M.Sc. Computer Science', 'Tsinghua University', '2017 - 2019', 'GPA: 3.8/4.0 | Thesis: Distributed Graph Processing on Heterogeneous Clusters' )) story.extend(education_entry( 'B.Eng. Software Engineering', 'Zhejiang University', '2013 - 2017' )) # Skills story.extend(section_header('SKILLS')) story.extend(skills_row([ ('Languages', 'Python, Java, Go, SQL, TypeScript'), ('Frameworks', 'Spring Boot, FastAPI, React, Kubernetes, Kafka'), ('Tools', 'Git, Docker, Terraform, AWS (EC2/S3/Lambda), PostgreSQL, Redis'), ])) doc.build(story) ``` ### Resume Checklist - [ ] **1 page** (unless user says otherwise) - [ ] **No cover page, no TOC** - [ ] Tight margins (1.5cm all sides) - [ ] Name prominent at top (22-26pt) - [ ] Contact info single line, centered - [ ] Section headers with consistent separator style - [ ] Bullets concise - start with action verbs - [ ] Quantified achievements (%, $, count) - [ ] No photos, no icons, no colour blocks (ATS-safe) - [ ] Font: only registered fonts (Times New Roman / SimHei) - [ ] **⚠️ Minimum font size 12px (9pt)** - no text smaller than this anywhere - [ ] **Line breaks are language-aware** - no mid-word English breaks, no CJK punctuation orphans, no date range splits - [ ] **Page fill ≥85%** - no large blank area at bottom. If sparse, increase spacing/leading/font size adaptively --- ## Component Reference ### Block types: | Type | Description | Notes | |------|-------------|-------| | h1 | 22pt + accent underline rule | KeepTogether with rule | | h2 | 15pt, dark | No rule, no accent | | h3 | 11.5pt bold | No accent | | body | 10.5pt justified, 17pt leading | Supports `` `` `` | | bullet | Body size with `•` prefix | Unordered list | | numbered | Body size with N. prefix | Counter auto-resets | | callout | Accent left border 4px + light tint bg | Max one per section | | table | Accent header + alternating rows + outer box only | Supports fractional col_widths | | code | Courier 8.5pt + accent left border | Optional language label | | divider | Accent 1.2pt rule | Use sparingly | | caption | 8.5pt muted, centered | Below images/tables | | chart | matplotlib figure saved as PNG → `Image()` flowable | Generate chart in-script, `fig.savefig()` → embed. Set `plt.rcParams['font.sans-serif']=['SimHei']` for CJK. Always `tight_layout()`. **Must follow `typesetting/charts.md` rules**: delete top/right spines, dashed grid 20% opacity, donut default for pie, anti-stacking labels. | | quote | Body italic + left indent 24pt + muted accent left border 2px | For blockquotes / testimonials | | bibliography | Hanging indent (firstLineIndent=-24pt, leftIndent=24pt) | GB/T 7714 or APA format per language | | math | Rendered via `` `` tags in Paragraph | For inline math; complex equations → use academic brief instead | ### Header / footer: - Header: document title (left, 7.5pt, muted) + accent rule (1.5pt, full width) - Footer: author (left, 7.5pt, muted) + page number (right, 7.5pt, muted) + light rule above ### Custom flowables (preferred over Table hacks): - **CalloutBox**: accent 4px left border + light tint background - cleaner than Table simulation - **BibliographyItem**: hanging indent reference entry --- ## Quick Reference | Task | Best Tool | Command/Code | |------|-----------|--------------| | Create PDF (ReportLab) | reportlab | Canvas or Platypus | | Fill PDF forms | Process brief | `python3 "$PDF_SKILL_DIR/scripts/pdf.py" form.fill` or annotation workflow | | Merge PDFs | Process brief | `python3 "$PDF_SKILL_DIR/scripts/pdf.py" pages.merge` | | Extract text | Process brief | `python3 "$PDF_SKILL_DIR/scripts/pdf.py" extract.text` | | Extract tables | Process brief | `python3 "$PDF_SKILL_DIR/scripts/pdf.py" extract.table` | --- ## Document-Type Quick Reference ### 📋 Invoice / Receipt Key elements: - Company logo/name at top - Invoice number + date (right-aligned) - Seller/buyer info block (2-column layout) - Line items table: Item, Quantity, Unit Price, Amount - Total row with bold font - Tax info + payment terms at bottom - Stamp/seal placeholder (empty circle or rect with "Seal Here" text) ```python # Invoice table pattern data = [['Item', 'Qty', 'Unit Price', 'Amount']] data += [[item['name'], item['qty'], item['price'], item['total']] for item in items] data.append(['', '', 'Total:', f'¥{total}']) ``` ### 📝 Contract / Legal Document Key elements: - Title centered, bold, 18pt - Numbered clauses with auto-increment - Signature block at bottom: Party A / Party B with date line - Page numbers mandatory ```python # Signature block pattern sig_data = [ ['Party A (Seal): ________________', 'Party B (Seal): ________________'], ['Date: ____/____/________', 'Date: ____/____/________'], ] sig_table = Table(sig_data, colWidths=[250, 250]) ``` ### 📊 Book / Long Document For documents with 10+ pages: 1. Generate TOC with `TableOfContents()` from reportlab.platypus 2. Use `CondPageBreak(H1_ORPHAN_THRESHOLD)` before H1 headings (NOT `PageBreak()` — never force page breaks between chapters) 3. Add running headers/footers with chapter title + page number 4. Run `python3 "$PDF_SKILL_DIR/scripts/toc_validate.py" output.pdf` to verify TOC links 5. Consider using `bookmarks=True` for PDF outline navigation --- ### 📝 Exam / Quiz / Test Paper / Worksheet **Exam papers have unique layout requirements. These rules are MANDATORY when generating any test/quiz/exam document.** #### Numbering & Structure ``` 一、选择题(每小题 3 分,共 30 分) ← Section header (宋体/黑体 14pt Bold) 1. 以下哪个不是 Python 内置数据类型? ← Question stem (12pt) A. int ← Options: indented 24pt (2em) B. float ← Each option on new line or 2×2 grid C. array D. str 2. 以下表达式的值为? ← Next question A. True B. False ← Short options: inline 2×2 grid OK C. None D. Error ``` #### Option Indentation (MANDATORY) ```python # ReportLab styles for exam papers question_style = ParagraphStyle( 'Question', fontSize=12, leading=16, leftIndent=0, firstLineIndent=0, spaceBefore=12, spaceAfter=4, ) option_style = ParagraphStyle( 'Option', fontSize=12, leading=16, leftIndent=24, # ← MANDATORY: 24pt indent from question stem firstLineIndent=0, spaceBefore=2, spaceAfter=2, ) ``` #### Option Layout Decision ```python def format_options(options): """ Decide option layout based on content length. Short options (≤4 chars each, ≤4 options) → 2×2 grid table Long options or >4 options → vertical list, one per line """ max_len = max(len(opt) for opt in options) if len(options) <= 4 and max_len <= 6: # CJK: 4 chars; Latin: 6 chars # 2×2 grid using Table grid = Table( [[options[0], options[1]], [options[2], options[3]]], colWidths=[200, 200], hAlign='LEFT' ) grid.setStyle(TableStyle([ ('LEFTPADDING', (0,0), (-1,-1), 24), # Match option indent ('VALIGN', (0,0), (-1,-1), 'TOP'), ])) return grid else: # Vertical list return [Paragraph(f'{opt}', option_style) for opt in options] ``` #### Answer Space Reservation (MANDATORY) **Every question MUST have adequate answer space. No exceptions.** | Question Type | Minimum Space | Implementation | |--------------|---------------|----------------| | Multiple choice | 0 extra (options are the answer) | Just question + options | | Fill-in-the-blank | Inline underline, min 80pt width | `        ` or ReportLab `HRFlowable` | | Short answer | 40-60pt (2-3 lines) | `Spacer(1, 50)` | | Calculation / math work | 120-200pt (6-10 lines) | `Spacer(1, 160)` with light guide lines | | Essay / long answer | 200-400pt (10-20 lines) | `Spacer(1, 300)` with light guide lines | | Proof / derivation | 160-250pt (8-12 lines) | `Spacer(1, 200)` | ```python # Answer line helper — light dashed lines for handwriting def add_answer_lines(story, num_lines=8, line_spacing=20): """Add light guide lines for handwritten answers.""" for i in range(num_lines): story.append(Spacer(1, line_spacing)) story.append(HRFlowable( width='90%', thickness=0.3, color=colors.Color(0.8, 0.8, 0.8), # Light gray dash=[4, 4], # Dashed spaceAfter=0, spaceBefore=0 )) ``` #### Page Density - `spaceBefore=12pt` between questions minimum - `spaceBefore=24pt` before section headers (一、二、三) - Score indicator after question number: `1. (分值: 5分)` or `1. [5 pts]` - Page header: exam title + time limit + total score - No cover page unless explicitly requested --- ## Data-to-Ink Ratio Rules (MANDATORY for Reports) **CRITICAL: Do NOT write long paragraphs to describe data trends.** You MUST extract metrics and trends into structured visual elements. ### Pattern Detection Table Before writing ANY body paragraph, scan the text. If you find any of these patterns, extract them: | Raw text pattern | ❌ FORBIDDEN | ✅ REQUIRED visual form | |-----------------|-------------|----------------------| | "revenue grew 12% to $4.2M" | Bury in paragraph prose | CalloutBox with bold `+12%` + `$4.2M` | | "latency dropped from 120ms to 75ms" | Long explanatory sentence | CalloutBox or metrics table row | | "Q1→Q2→Q3: 10%→25%→40%" | Inline numbers in text | Chart (matplotlib PNG → `Image()`) | | "first...second...third..." steps | Paragraph with ordinal words | Numbered Table or process list | | "compared to last year / vs Q2" | Nested comparisons in prose | Side-by-side comparison table | | "accounted for 60% of total" | Percentage in paragraph | Pie chart or stacked bar chart | ### ReportLab CalloutBox Template Use this pattern for extracted metrics - it's visually clean and takes only 5 lines: ```python stat_style = ParagraphStyle( name='StatBig', fontName='Times New Roman', fontSize=22, leading=26, textColor=ACCENT, alignment=TA_CENTER # From palette ) label_style = ParagraphStyle( name='StatLabel', fontName='Times New Roman', fontSize=9, leading=12, textColor=TEXT_MUTED, alignment=TA_CENTER # From palette ) callout = Table( [[Paragraph('+12%', stat_style)], [Paragraph('Revenue Growth vs Q2', label_style)]], colWidths=[160] ) callout.setStyle(TableStyle([ ('BACKGROUND', (0,0), (-1,-1), BG_SURFACE), # From palette --c-mid ('BOX', (0,0), (-1,-1), 1, ACCENT), # From palette --c-accent ('TOPPADDING', (0,0), (-1,-1), 10), ('BOTTOMPADDING', (0,0), (-1,-1), 10), ('VALIGN', (0,0), (-1,-1), 'MIDDLE'), ])) story.append(KeepTogether(callout)) ``` ### Cover Pages (HTML/Playwright) When the cover routes through Playwright: - Use `Delta_Widget` components for KPIs and metrics. - Use `Process_List` components for workflows and timelines. - Use `Sidenote_Block` components (in `tufte_report` archetype) for citations and supplementary data. - For data-heavy pages, add `data_points` arrays to any component - the engine renders them as content-aware background curves. --- ## Critical Reminders Checklist Before submitting code, verify: - [ ] **Font restriction**: Only 6 registered fonts used - [ ] **Font family registered**: `registerFontFamily()` called for all used fonts - [ ] **Mixed language**: `install_font_fallback()` called after font registration (auto handles `` wrapping) - [ ] **Rich text tags**: Only inside `Paragraph()` objects - [ ] **Table cells**: ALL text wrapped in `Paragraph()` - [ ] **Scientific notation**: Large/small numbers use `` tags - [ ] **Chinese text**: `wordWrap='CJK'` in ParagraphStyle - [ ] **Line breaks**: Using `
` not `\n` - [ ] **Headings**: Bold with `` tags - [ ] **Table headers**: White bold text on dark blue (#1F4E79) - [ ] **Metadata**: Title matches filename, Author/Creator = "Z.ai" - [ ] **Sanitization**: Code sanitized before execution - [ ] **Post-build metadata**: `pdf.py meta.brand` called after build - [ ] **Post-build blank page cleanup**: `pdf.py pages.clean` called after build - [ ] **Glyph check**: `pdf.py font.check` run to verify no missing glyphs - [ ] **TOC check**: `pdf.py toc.check` run if document has TOC (entries, pages, links) - [ ] **PDF QA scan**: `pdf_qa.py` run to verify page consistency, CJK punctuation, overflow, margins, table centering, font embedding, metadata