Files
2026-06-06 05:21:10 +00:00

1660 lines
72 KiB
Markdown
Executable File
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Brief: Report Production
Structured documents via ReportLab: reports, proposals, contracts, white papers, financial analysis, data tables. Also covers ATS-friendly resumes.
---
> ### 🚨 EMOJI CHECK - STOP HERE IF CONTENT HAS EMOJI
>
> ReportLab renders emoji (📊🎯🔥💡 etc.) as **□ tofu squares**. This is unfixable.
>
> **If the user's content contains intentional emoji** → **STOP. Do NOT use this brief.**
> Route to `briefs/creative.md` instead (Playwright renders emoji natively).
>
> This applies even if the document is a "report" - emoji + ReportLab = broken output.
---
## Production Workflow
```
1. BRIEF → Confirm document type, audience, page count, outline
2. DESIGN → Run `pdf.py palette.generate --title "..."` → copy-paste output into script
3. EDIT → Content transformation (extract data, define typographic roles)
4. COVER → Read typesetting/cover.md → render HTML cover via Playwright (default ON for reports ≥ 3 pages; skip only for resumes/letters/memos/forms)
5. BUILD → Write ReportLab code (fonts → styles → content → tables → charts)
6. PREFLIGHT → code.sanitize → execute → meta.brand → font.check → toc.check → pages.clean → pdf_qa.py
7. DELIVER → Merge cover + body → final PDF
```
**Typesetting assets** (load when you reach that step):
- `typesetting/palette.md` - color system, typography rules, anti-patterns
- `typesetting/cover.md` - 7 cover layouts with variants, typography scale, bounding box rules
- `typesetting/charts.md` - chart styling, anti-stacking rules, axis/grid/legend treatment
**Cover is DEFAULT ON for reports.** Skip cover only for: resumes, letters, memos, forms, checklists, or documents ≤ 2 pages.
---
## Step 2: DESIGN - Palette & Font Plan
**Before writing any ReportLab code, you MUST generate the color palette via the `palette.generate` command.** This is not optional. Do NOT hardcode hex colors. Do NOT pick colors by feel.
### 2a. Generate Palette (MANDATORY)
**Run this command FIRST. Copy-paste its output directly into your Python script:**
```bash
python3 "$PDF_SKILL_DIR/scripts/pdf.py" palette.generate --title "<document title>" --mode minimal
```
The command auto-derives the design intent from the document title, computes a mathematically harmonious palette, and outputs ready-to-paste ReportLab Python code:
```python
# ━━ Color Palette (auto-generated by pdf.py palette.generate) ━━
# Intent: neutral | Mode: minimal | Harmony: split_complementary
# Contrast: text:bg=14.12 | accent:bg=3.06
from reportlab.lib import colors
ACCENT = colors.HexColor('#2f97b9')
TEXT_PRIMARY = colors.HexColor('#252422')
TEXT_MUTED = colors.HexColor('#8d8981')
BG_SURFACE = colors.HexColor('#dfdad2')
BG_PAGE = colors.HexColor('#f5f4f3')
SURFACE_RGBA = 'rgba(0,0,0,0.03)'
# ReportLab table colors
TABLE_HEADER_COLOR = ACCENT
TABLE_HEADER_TEXT = colors.white
TABLE_ROW_EVEN = colors.white
TABLE_ROW_ODD = BG_SURFACE
```
**Options:**
- `--mode minimal` (default, recommended for 50%+ of documents) | `dark` | `pastel` | `jewel` | `light`
- `--harmony auto` (default, recommended) | `complementary` | `split_complementary` | `analogous` | `triadic` | `monochrome`
- `--format python` (default) | `json` | `css`
- `--seed <int>` - for reproducible palettes across regenerations
**⚠️ FORBIDDEN:**
- ❌ Writing `colors.HexColor('#xxxxxx')` with any hex value you chose yourself
- ❌ Using `colors.red`, `colors.blue`, or any ReportLab named color for design elements
- ❌ Skipping this step and picking colors "that look good"
- ❌ Using different palettes for different sections of the same document
**✅ The ONLY acceptable way to get colors:** Run `palette.generate`, copy the output.
### 2b. Color Application Rules
| Element | Color Source | Notes |
|---------|-------------|-------|
| Table headers | ACCENT (bg) + white (text) | High contrast required |
| Table odd rows | BG_SURFACE at 50% opacity | Subtle striping |
| Section titles | ACCENT or TEXT_PRIMARY | Match heading hierarchy |
| Body text | TEXT_PRIMARY | Never use pure #000000 |
| Muted/meta text | TEXT_MUTED | Dates, captions, footnotes |
| Horizontal rules | ACCENT at 30% opacity | Thin, unobtrusive |
| Chart colors | Derive from ACCENT (see charts.md) | Consistent with palette |
### 2c. Forbidden
- ❌ Hardcoding hex colors like `#1F4E79`, `#555555`, `#888888` in code
- ❌ Using `colors.red`, `colors.blue` or any ReportLab named color for design elements
- ❌ Choosing colors by "feel" without running palette first
- ❌ Using different accent colors across tables/sections in the same document
> **Exception**: Semantic colors for data visualization (green for positive, red for negative) are acceptable but should be muted variants derived from the palette when possible.
---
## Step 3: EDIT - Content Transformation
**Before writing any code, restructure raw text into visual roles.** This is the most impactful quality step - skipping it produces "Word document with a PDF extension".
### 3a. Typographic Role Extraction
Parse raw text and categorize content into visual roles:
| Role | ReportLab Mapping | Description |
|------|------------------|-------------|
| **Hero / Display** | Cover title (36-42pt) or section opener | 1-5 words, the emotional hook |
| **Kicker / Eyebrow** | Subtitle or section intro (10-12pt, muted) | Tiny context line above main heading |
| **Data Sculpture** | CalloutBox or bold stat block | Extract impactful numbers (97%, $4.2M, -45ms) |
| **Pull Quote** | BlockQuote (italic + left indent 24pt + accent border) | Most provocative sentence, standalone |
| **Body** | Standard paragraph | Everything else |
**Rule**: Before writing any body paragraph, scan for numbers with units. Every metric like "revenue grew by 12%" or "latency dropped to 45ms" MUST be extracted into a CalloutBox or metrics table - never buried in paragraph prose. See §Data-to-Ink Ratio Rules below.
### 3b. Section Pacing
- If any H1 section has **>400 words** of continuous body text without visual breaks, split into H2 subsections or insert a visual break (table, chart, callout)
- Aim for at least 1 visual element (table/chart/callout) per 2-3 pages of body text
- Dense text walls are the #1 sign of poor report design
---
---
## Character Safety Rule (MANDATORY)
Three rules for safe character handling in ReportLab PDFs:
**a) Superscripts and subscripts**: Use `<super>`/`<sub>` tags, never raw Unicode superscript/subscript characters (e.g., `\u00b2`, `\u2082`).
**b) Emoji**: ReportLab cannot render emoji. If content contains emoji, use Creative brief (HTML + Playwright).
**c) Font fallback for mixed CJK/Latin text**: After registering fonts, call `install_font_fallback()` once. This automatically wraps missing-glyph characters in `<font>` tags inside every `Paragraph()`. No manual `<font name="...">` wrapping needed for mixed Chinese-English text.
Mathematical/relational operators (×, ÷, ±, ≤, √, ∑, ≅, ∫, π, ∠, Δ, etc.) are safe to use as literal characters in `Paragraph()` - both SimHei and Times New Roman have these glyphs.
| Need | Correct Method | Correct Example |
|------|---------------|---------|
| Superscript | `<super>` tag in `Paragraph()` | `Paragraph('10<super>2</super> × 10<super>3</super> = 10<super>5</super>', style)` |
| Subscript | `<sub>` tag in `Paragraph()` | `Paragraph('H<sub>2</sub>O', style)` |
| Bold | `<b>` tag in `Paragraph()` | `Paragraph('<b>Title</b>', style)` |
| Math operators | Literal char in `Paragraph()` | `Paragraph('AB ⊥ AC, ∠A = 90°, and ΔABC ≅ ΔDCF', style)` |
| Scientific notation | Combined tags in `Paragraph()` | `Paragraph('1.2 × 10<super>8</super> kg/m<super>3</super>', style)` |
```python
from reportlab.platypus import Paragraph
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib.enums import TA_LEFT, TA_CENTER, TA_JUSTIFY
body_style = ParagraphStyle(
name="ENBodyStyle",
fontName="Times New Roman",
fontSize=10.5,
leading=18,
alignment=TA_JUSTIFY,
)
# Superscript: area unit
Paragraph('Total area: 500 m<super>2</super>', body_style)
# Subscript: chemical formula
Paragraph('The reaction produces CO<sub>2</sub> and H<sub>2</sub>O', body_style)
# Scientific notation
Paragraph('Speed of light: 3.0 × 10<super>8</super> m/s', body_style)
# Combined
Paragraph('E<sub>k</sub> = mv<super>2</super>/2', body_style)
# Bold heading
Paragraph('<b>Chapter 1: Introduction</b>', header_style)
# Math symbols
Paragraph('When ∠ A = 90°, AB ⊥ AC and ΔABC ≅ ΔDEF', body_style)
```
**Pre-generation check - before writing ANY string, ask:**
> "Does this string contain a character outside basic CJK or Mathematical/relational operators?"
> If YES → it MUST be inside a `Paragraph()` with the appropriate tag.
> If it is a superscript/subscript digit in raw unicode escape form → REPLACE with `<super>`/`<sub>` tag.
**NEVER rely on post-generation scanning. Prevent at the point of writing.**
**Encoding safety - before writing ANY content text:**
> "Does this string contain Japanese kana (の, が, は etc.) or rare Unicode symbols?"
> If YES → REPLACE with safe plain Chinese equivalents. Japanese kana (Hiragana U+3040-U+309F, Katakana U+30A0-U+30FF) frequently corrupt to U+FFFD (<28>) when code passes through LLM output, heredoc, or terminal encoding layers.
> Common safe replacements: `活の真鲷`→`活缔真鲷`, `盐烤の鲭鱼`→`盐烤鲭鱼`, `烤の鸡串`→`炭烤鸡串`.
> If the character is genuinely needed, verify it survives a full write→read round-trip with `open(file, encoding='utf-8')`.
---
## Font Setup (Guaranteed Success Method)
### CRITICAL: Allowed Fonts Only
**You MUST ONLY use the following registered fonts. Using ANY other font is STRICTLY FORBIDDEN.**
| Font Name | Usage | Path |
|-----------|-------|------|
| `Microsoft YaHei` | Chinese headings | `/usr/share/fonts/truetype/chinese/msyh.ttf` |
| `SimHei` | Chinese body text | `/usr/share/fonts/truetype/chinese/SimHei.ttf` |
| `SarasaMonoSC` | Chinese code blocks | `/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf` |
| `Times New Roman` | English text, numbers, tables | `/usr/share/fonts/truetype/english/Times-New-Roman.ttf` |
| `Calibri` | English alternative | `/usr/share/fonts/truetype/english/calibri-regular.ttf` |
| `DejaVuSans` | Formulas, symbols, code | `/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf` |
**FORBIDDEN fonts (DO NOT USE):**
- ❌ Arial, Arial-Bold, Arial-Italic
- ❌ Helvetica, Helvetica-Bold, Helvetica-Oblique
- ❌ Courier, Courier-Bold
- ❌ Any font not listed in the table above
### Font Registration Template
```python
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase.pdfmetrics import registerFontFamily
# Chinese fonts
pdfmetrics.registerFont(TTFont('Microsoft YaHei', '/usr/share/fonts/truetype/chinese/msyh.ttf'))
pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf'))
pdfmetrics.registerFont(TTFont("SarasaMonoSC", '/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf'))
# English fonts
pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf'))
pdfmetrics.registerFont(TTFont('Calibri', '/usr/share/fonts/truetype/english/calibri-regular.ttf'))
# Symbol/Formula font
pdfmetrics.registerFont(TTFont("DejaVuSans", '/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf'))
# CRITICAL: Register font families to enable <b>, <super>, <sub> tags
registerFontFamily('Microsoft YaHei', normal='Microsoft YaHei', bold='Microsoft YaHei')
registerFontFamily('SimHei', normal='SimHei', bold='SimHei')
registerFontFamily('Times New Roman', normal='Times New Roman', bold='Times New Roman')
registerFontFamily('Calibri', normal='Calibri', bold='Calibri')
registerFontFamily('DejaVuSans', normal='DejaVuSans', bold='DejaVuSans')
```
### Font Configuration by Document Type
**For Chinese PDFs:**
- Body text: `SimHei` or `Microsoft YaHei`
- Headings: `Microsoft YaHei` (MUST use for Chinese headings)
- Code blocks: `SarasaMonoSC`
- Formulas/symbols: `DejaVuSans`
- **In tables: ALL Chinese content and numbers MUST use `SimHei`**
**For English PDFs:**
- Body text: `Times New Roman`
- Headings: `Times New Roman` (MUST use for English headings)
- Code blocks: `DejaVuSans`
- **In tables: ALL English content and numbers MUST use `Times New Roman`**
**For Mixed Chinese-English PDFs:**
- Call `install_font_fallback()` once after registering fonts - it automatically wraps characters in `<font>` tags so you don't need to do it manually.
- If you still want manual control, you can use `<font name='...'>` tags, but the automatic fallback handles most cases.
```python
import sys, os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'scripts'))
from pdf import install_font_fallback
# 1. Register fonts (as above)
# 2. Install fallback - one line, handles all mixed-language text
install_font_fallback()
# 3. Just write naturally - no manual <font> wrapping needed!
en_style = ParagraphStyle(name="EN", fontName="Times New Roman", fontSize=10.5, leading=18)
cn_style = ParagraphStyle(name="CN", fontName="SimHei", fontSize=10.5, leading=18)
Paragraph('MySQL、PostgreSQL、Redis', en_style) # ✅ CJK comma auto-fallback to SimHei
Paragraph('Beijing (北京) has 21M people', en_style) # ✅ CJK chars auto-fallback
Paragraph('价格:¥2,200~5,800/月', en_style) # ✅ all CJK chars handled
Paragraph('《巴黎协定》签署于2015年', en_style) # ✅ CJK book title marks handled
```
**How it works:** `install_font_fallback()` monkey-patches `Paragraph.__init__` to scan each character against the font's `charToGlyph` table. Characters missing from the base font are automatically wrapped in `<font name="FallbackFont">`. The fallback chain is: English fonts → SimHei, Chinese fonts → Times New Roman. For aesthetic optimization, Cyrillic text in SimHei is automatically routed to Times New Roman (serif looks better for Cyrillic).
### Chinese Plot PNG Method
```python
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
```
### Available Font Paths
Run `fc-list` to get more fonts. Font files are typically located under:
- `/usr/share/fonts/truetype/chinese/`
- `/usr/share/fonts/truetype/english/`
- `/usr/share/fonts/`
---
## Layout & Spacing Control
### Page Breaks
Follow the document type strategy defined in SOUL.md Rule 1.
**Structural breaks (always - MANDATORY):**
- Between cover page and TOC - **cover must NEVER share a page with anything else**
- Between TOC and main content
- Between main content and back cover
**Content breaks (by document type):**
- **Default (all document types)**: ❗ **Absolutely NEVER force page breaks before H1/H2** — content flows naturally, do not insert `PageBreak()` before section headings. This is an iron rule.
- Resume / contract / letter → No content page breaks
- Short article → No content page breaks
- **Exception 1**: Academic papers / textbooks / teaching plans → `PageBreak()` before each H1 is acceptable (only if user explicitly requests academic formatting)
- **Exception 2**: User explicitly requests page breaks between chapters
**Anti-tear (mandatory — but with height limit):**
```python
from reportlab.platypus import KeepTogether
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import inch
# Maximum height for KeepTogether blocks: 40% of page height
# Blocks taller than this should NOT be kept together — they cause
# the preceding page to be mostly empty when the block gets pushed down.
MAX_KEEP_HEIGHT = A4[1] * 0.4 # ~336pt ≈ 12cm
def safe_keep_together(elements):
"""Wrap elements in KeepTogether only if their total height is reasonable.
If too tall, keep only the FIRST TWO elements together (e.g. title + table/image header)
to prevent title-content separation, while letting the rest paginate naturally.
"""
total_h = 0
for el in elements:
w, h = el.wrap(A4[0] - 2*inch, A4[1])
total_h += h
if total_h <= MAX_KEEP_HEIGHT:
return [KeepTogether(elements)]
elif len(elements) >= 2:
# Keep at least title + first element together to prevent title orphaning
return [KeepTogether(elements[:2])] + list(elements[2:])
else:
return list(elements)
# Heading stays with first paragraph
story.extend(safe_keep_together([
heading_paragraph,
first_body_paragraph,
]))
# Image + caption: keep together ONLY if image is small
story.extend(safe_keep_together([image, caption_paragraph]))
# Table title + table: keep together ONLY if table is short
if len(table_data) <= 15:
story.extend(safe_keep_together([table_title, table]))
else:
# Long table: just keep title with the table start
story.extend(safe_keep_together([table_title]))
story.append(table)
```
**⚠️ KEY INSIGHT:** `KeepTogether` is the #1 cause of “empty page with chart on next page”.
When a `KeepTogether` block is taller than the remaining space on the current page,
ReportLab pushes the **entire block** to the next page, leaving the current page mostly empty.
**Always use `safe_keep_together()` to cap the maximum block height.**
**⚠️ Note:** Do NOT use `KeepWithNext` - it is unreliable in ReportLab 4.x.
### Vertical Spacing Standards
* Before tables: `Spacer(1, 18)` after preceding text
* After tables: `Spacer(1, 6)` before table caption
* After table captions: `Spacer(1, 18)` before next content
* Between paragraphs: `Spacer(1, 12)` (~1 line)
* Between H3 subsections: `Spacer(1, 12)`
* Between H2 sections: `Spacer(1, 18)` (~1.5 lines)
* Between H1 sections: `Spacer(1, 24)` (~2 lines)
* NEVER use `Spacer(1, X)` where X > 24, except for H1 major breaks or cover page elements
### Cover Page: HTML/Playwright Unified Cover System
**⚠️ Report route covers are ALWAYS generated via HTML/Playwright**, using the same 7-template system defined in `typesetting/cover.md`. This ensures visual consistency across all routes (Report, Creative, Academic) and avoids the limitations of ReportLab for complex cover layouts.
**Pipeline (Report route):**
1. Generate **body PDF** via ReportLab (start with TOC or content - **no cover in story[]**)
2. Generate **cover HTML** following `typesetting/cover.md` 7-template system
3. **Run `poster_validate.py check-html` on cover HTML** — fix any ERRORs before rendering (overflow:hidden, font fallback, etc.)
4. **Run `cover_validate.js` on cover HTML** — detects text-vs-decorative-line overlaps. Non-zero exit = must fix before proceeding.
```bash
node "$PDF_SKILL_DIR/scripts/cover_validate.js" cover.html
```
5. Render cover HTML → single-page PDF via Playwright (`html2poster.js`) — **NOT `html2pdf-next.js`** (which converts absolute→static and destroys cover layout)
6. **Merge: insert cover as page 0** of body PDF using pypdf → output single final PDF
> **Why not ReportLab covers?** ReportLab is excellent for structured content (tables, paragraphs, flowables) but painful for visual design (geometric accents, precise absolute positioning, web fonts). HTML/CSS handles these natively. One cover system, one visual standard, zero inconsistency.
**⚠️ Cover page is DEFAULT ON for all Report-route documents ≥ 3 pages.**
The Report route handles formal, structured documents. Readers expect a professional cover. **Always generate a cover unless the document type is explicitly excluded below.**
**ALWAYS add a cover page (default behavior):**
- Research reports, experiment reports, lab reports, analysis reports
- White papers, industry analysis, market research
- Business proposals, project plans, feasibility studies
- Technical documentation, product manuals, user guides
- Annual reports, financial summaries, quarterly reviews
- Any formal document ≥ 3 pages that will be shared externally or submitted
**NEVER add a cover page (explicit exclusions only):**
- **Resumes / CVs** - recruiters want content immediately
- **Letters / memos** - single-page or short-form
- **Forms / checklists / invoices** - functional documents
- **Internal notes / meeting minutes** - brevity over presentation
- **Documents ≤ 2 pages** - cover would be disproportionate
- **User explicitly said "no cover"**
**Rule of thumb:** If it's a report, analysis, proposal, or any formal document ≥ 3 pages → **add a cover**. When in doubt, add the cover. It's easier to remove a cover the user didn't want than to miss one they expected.
#### Step 1: Generate Body PDF (ReportLab, no cover)
Build the ReportLab document starting directly with TOC or first section. **Do NOT include any cover page in the `story[]` list.**
```python
# Body PDF - starts with TOC or first section, NO cover
story = []
# story.append(toc) # if applicable
# story.append(PageBreak())
# story.append(first_section_content)
# ...
doc.build(story)
# Output: body.pdf (no cover)
```
#### Step 2: Generate Cover PDF via html2poster.js
```bash
# ALWAYS use html2poster.js for cover rendering (NOT html2pdf-next.js)
# Cover pages use position:absolute layout — html2pdf-next.js pre-render hooks
# convert absolute→static and destroy the layout. html2poster.js preserves it.
node "$PDF_SKILL_DIR/scripts/html2poster.js" cover.html --output cover.pdf --width 794px
```
Or from Python:
```python
import subprocess, os
def render_cover(html_path, pdf_path):
"""Render HTML cover to PDF via html2poster.js.
⚠️ ALWAYS use html2poster.js for covers (NOT html2pdf-next.js).
Cover HTML uses position:absolute for layout. html2pdf-next.js pre-render
hooks convert absolute→static to prevent multi-page overlap, which
destroys cover layouts. html2poster.js preserves absolute positioning.
"""
scripts_dir = os.path.expanduser('~/.openclaw/workspace/skills/pdf/scripts')
subprocess.run([
'node', os.path.join(scripts_dir, 'html2poster.js'),
html_path, '--output', pdf_path,
'--width', '794px',
], check=True)
```
**Insertion script (single output PDF):**
```python
from pypdf import PdfReader, PdfWriter, Transformation
A4_W, A4_H = 595.28, 841.89 # A4 in points
def normalize_page_to_a4(page):
"""Scale a page to A4 if its dimensions don't match."""
box = page.mediabox
w, h = float(box.width), float(box.height)
if abs(w - A4_W) > 2 or abs(h - A4_H) > 2:
sx, sy = A4_W / w, A4_H / h
page.add_transformation(Transformation().scale(sx=sx, sy=sy))
page.mediabox.lower_left = (0, 0)
page.mediabox.upper_right = (A4_W, A4_H)
return page
def insert_cover(cover_pdf, body_pdf, output_pdf):
"""Insert cover as first page of body PDF → single output file."""
writer = PdfWriter()
# Cover as page 1
cover_page = PdfReader(cover_pdf).pages[0]
writer.add_page(normalize_page_to_a4(cover_page))
# Body pages follow
for page in PdfReader(body_pdf).pages:
writer.add_page(normalize_page_to_a4(page))
writer.add_metadata({'/Title': 'Report Title', '/Author': 'Z.ai', '/Creator': 'Z.ai'})
with open(output_pdf, 'wb') as f:
writer.write(f)
```
> **⚠️ Size pitfall:** Playwright `page.pdf(width='794px', height='1123px')` output PDF dimensions may differ from A4 (595.28×841.89pt) by 1-2pt. **Do not use PIL Image.save('x.pdf')** for PNG→PDF conversion — DPI mapping causes severe size errors. You must call `normalize_page_to_a4()` to unify dimensions before insertion.
**Cover HTML template reference**: See `typesetting/cover.md` for the complete 7-Layout System with variants, typography scale, and bounding box rules.
**→ Full cover spec: `typesetting/cover.md`** - read it before designing any cover.
**Cover design rules (summary - see cover.md for details):**
- Page size: `width: 794px; height: 1123px` (A4 at 96dpi)
- `body { margin: 0; padding: 0; overflow: hidden; }` - REQUIRED to avoid white borders
- Load Google Fonts via `<link href="..." rel="stylesheet">` in `<head>` (NOT `@import url(...)` in CSS — `@import` must be first rule or is silently ignored) — Playwright fetches them at render time
- **Background must be white or very light** (`#fff` / `#fafafa` / `#f5f8fb`), no dark solid backgrounds or gradient backgrounds
- Primary color used only for text, lines, geometric accents - never as large-area background
- Sophistication = whitespace + typography + restrained geometric accents
**Cover Constitution (7-Layout System):**
- **Pick a layout (1-7)** from `typesetting/cover.md` that matches the document tone. No global default - every selection must be a deliberate design decision. Layout 7 for government/bidding/proposal documents.
- **Maximum 4 components** on any cover. Typical recipe: Title + subtitle + 1 geometric accent + metadata.
- **Typography Scale**: Title ≈ 45pt, Subtitle ≈ 25pt, Meta ≥ 18pt (never below 14pt). Tiny text = FAIL.
- **Background layer (optional)**: See `typesetting/cover-backgrounds.md` for 3 recipes (A=极简弧线, B=工程十字轴+立柱, C=锐角切割+出血文字). Background renders BELOW all foreground content at 2-5% opacity.
- **Mandatory `<br>` chunking**: Title MUST break every 2-4 words (CJK) or 3-5 words (English). Single-line title = FAIL.
- **Bounding Box spatial dispersion**: Group elements into 2-3 bounding boxes at opposite regions (e.g., top-left + bottom-right). Never cluster everything into middle 40%.
- **Safe Zone**: 12% top/bottom, 14% left/right padding on cover pages.
- **No body text on cover**: Cover = title + subtitle + date + optional geometric decoration. All content goes to page 2+.
### ⚠️ Cover Page Isolation Rule (MANDATORY)
**The cover page must ALWAYS be on its own page.** Cover content and subsequent content (TOC, body text, etc.) must NEVER appear on the same page. This is a hard rule.
- **Report route (HTML/Playwright cover)**: The cover is a separate rendered page inserted as page 0 via pypdf - isolation is inherent in the merge pipeline
- **Creative route**: Cover is part of the same HTML document but on its own `<section>` with page-break - isolation is handled by CSS
If a generated PDF shows cover + TOC on the same page, it is a **critical bug** - regenerate immediately.
### Alignment and Typography
- **CJK body**: Use `TA_LEFT` + 2-char indent. Headings: no indent.
- **Font sizes**: Body 11pt, subheadings 14pt, headings 18-20pt
- **Line height**: 1.5-1.6 (leading at **1.4x font size minimum**, recommended 1.5x for CJK)
- CJK characters are taller than Latin - 1.2x causes cramped lines
- Quick reference: 10pt→14pt min/18pt rec; 14pt→20pt min/22pt rec; 36pt→50pt min/54pt rec
- **⛔ Prohibited: fixed `rowHeights` in Table()**. Use `TOPPADDING` / `BOTTOMPADDING` to control row spacing. Fixed rowHeights cause content overflow clipping - the text renders but gets invisibly cut off.
- **CRITICAL: Alignment Selection Rule**:
- Use `TA_JUSTIFY` only when ALL of:
* Text is predominantly English (≥ 90%)
* Column width is sufficiently wide (A4 single-column body)
* Font: Western fonts (Times New Roman / Calibri)
* Chinese content: None or negligible
- Otherwise, always default to `TA_LEFT`
- For Chinese text, always add `wordWrap='CJK'` to ParagraphStyle
### Style Configuration
* Normal paragraph: `spaceBefore=0`, `spaceAfter=6-12`
* Headings: `spaceBefore=12-18`, `spaceAfter=6-12`
* **Headings must be bold**: Use `<b></b>` tags in Paragraph
* Table captions: `spaceBefore=3`, `spaceAfter=6`, `alignment=TA_CENTER`
* **CRITICAL**: For Chinese text, always add `wordWrap='CJK'` to ParagraphStyle
---
## Table Formatting
### Standard Table Color Scheme (MUST USE for ALL tables)
> **Colors MUST come from the palette generated in Step 2.** The values below are examples - replace them with your actual palette output.
```python
# Colors from design_engine.py palette (Step 2) - NEVER hardcode these
TABLE_HEADER_COLOR = ACCENT # From palette --c-accent
TABLE_HEADER_TEXT = colors.white # White text for header (fixed)
TABLE_ROW_EVEN = colors.white # White for even rows
TABLE_ROW_ODD = BG_SURFACE # From palette --c-mid (light stripe)
```
### Table Rules
- Caption must be centered, added immediately after the table
- Entire table must be centered on the page
- **Header Row**: Dark background (from palette `header_fill`), white bold text
- **Cell Margins**: Left/Right at least 120-200 twips
- **Alignment**: Each body element within the same table must be aligned the same method
- **Color consistency**: All tables in one PDF must use the same color scheme
- **Spacing**: `Spacer(1, 18)` → Table → `Spacer(1, 6)` → Caption → `Spacer(1, 18)`
### Table Centering Enforcement (MANDATORY)
**Every table MUST be horizontally centered on the page. No exceptions.**
```python
# ReportLab: ALWAYS set hAlign='CENTER'
table = Table(data, colWidths=col_widths, hAlign='CENTER')
# Width rule: table total width should be 85-100% of available content width
available_width = doc.width # or page_width - left_margin - right_margin
table_width = sum(col_widths)
if table_width < available_width * 0.85:
# Scale up columns proportionally to fill space
scale = (available_width * 0.90) / table_width
col_widths = [w * scale for w in col_widths]
# ❌ WRONG — no hAlign (defaults to LEFT, table drifts left)
table = Table(data, colWidths=[100, 200, 150])
# ✅ RIGHT — explicit centering
table = Table(data, colWidths=[100, 200, 150], hAlign='CENTER')
```
**LaTeX equivalent:**
```latex
\begin{table}[htbp]
\centering % MANDATORY
\begin{tabular}{...}
...
\end{tabular}
\end{table}
```
### Table Cell Content Rule (MANDATORY - NON-NEGOTIABLE)
**ALL text content in table cells MUST be wrapped in `Paragraph()`. NO EXCEPTIONS.**
❌ **PROHIBITED** - Plain strings:
```python
data = [
['<b>Header</b>', 'Value'], # Bold won't render!
['Pressure', '1.01 × 10<super>5</super>'], # Superscript won't work!
]
```
✅ **REQUIRED** - All text in Paragraph:
```python
data = [
[Paragraph('<b>Header</b>', header_style), Paragraph('Value', header_style)],
[Paragraph('Pressure', cell_style), Paragraph('1.01 × 10<super>5</super>', cell_style)],
]
```
**The ONLY exception**: `Image()` objects can be placed directly in table cells.
### Units with Exponents (CRITICAL)
- PROHIBITED: `W/m2`, `kg/m3`, `m/s2` (plain text exponents)
- RIGHT: `Paragraph('W/m<super>2</super>', style)`, `Paragraph('kg/m<super>3</super>', style)`
### Numeric Values in Tables (CRITICAL)
- Large numbers MUST use scientific notation: `Paragraph('-1.246 × 10<super>8</super>', style)` not `-124600000`
- Small decimals MUST use scientific notation: `Paragraph('2.5 × 10<super>-3</super>', style)` not `0.0025`
- Threshold: Use scientific notation when |value| ≥ 10000 or |value| ≤ 0.001
### Complete Table Example
```python
from reportlab.platypus import Table, TableStyle, Paragraph, Image
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib import colors
from reportlab.lib.enums import TA_CENTER, TA_LEFT, TA_RIGHT, TA_JUSTIFY
header_style = ParagraphStyle(
name='TableHeader', fontName='Times New Roman', fontSize=11,
textColor=colors.white, alignment=TA_CENTER
)
cell_style = ParagraphStyle(
name='TableCell', fontName='Times New Roman', fontSize=10,
textColor=colors.black, alignment=TA_CENTER
)
data = [
[Paragraph('<b>Parameter</b>', header_style),
Paragraph('<b>Unit</b>', header_style),
Paragraph('<b>Value</b>', header_style)],
[Paragraph('Temperature', cell_style),
Paragraph('°C', cell_style),
Paragraph('25.5', cell_style)],
[Paragraph('Pressure', cell_style),
Paragraph('Pa', cell_style),
Paragraph('1.01 × 10<super>5</super>', cell_style)],
[Paragraph('Density', cell_style),
Paragraph('kg/m<super>3</super>', cell_style),
Paragraph('1.225', cell_style)],
]
table = Table(data, colWidths=[120, 80, 100])
# ⚠️ Above uses hardcoded widths - OK for this 3-col example (120+80+100=300 < available ~460pt).
# For real tables, ALWAYS calculate from available_width. See "Table Width Management" below.
table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (-1, 0), ACCENT), # Header: palette accent
('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
('BACKGROUND', (0, 1), (-1, 1), colors.white),
('BACKGROUND', (0, 2), (-1, 2), BG_SURFACE), # Odd row: palette surface
('BACKGROUND', (0, 3), (-1, 3), colors.white),
('GRID', (0, 0), (-1, -1), 0.5, TEXT_MUTED), # Grid: palette muted
('VALIGN', (0, 0), (-1, -1), 'MIDDLE'),
('LEFTPADDING', (0, 0), (-1, -1), 8),
('RIGHTPADDING', (0, 0), (-1, -1), 8),
('TOPPADDING', (0, 0), (-1, -1), 6),
('BOTTOMPADDING', (0, 0), (-1, -1), 6),
]))
```
### Table Width Management (⚠️ Prevent Table Overflow)
**Most common table bug: columns overflow the right page margin because colWidths are hardcoded without checking available space.**
**Iron Rule: The sum of all colWidths MUST NOT exceed `available_width`.**
```python
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import inch
page_width = A4[0] # 595.28pt
left_margin = 1.0 * inch # 72pt (adjust to your doc's actual margins)
right_margin = 1.0 * inch # 72pt
available_width = page_width - left_margin - right_margin # ≈ 451pt
# ─── Method 1: Proportional widths (RECOMMENDED) ───
# Define column ratios, auto-scale to fit available_width
col_ratios = [0.25, 0.40, 0.20, 0.15] # must sum to 1.0
col_widths = [r * available_width for r in col_ratios]
table = Table(data, colWidths=col_widths)
# ─── Method 2: Fixed + flex columns ───
# Some columns have fixed width (numbers, dates), rest fills remaining space
fixed_cols = {0: 80, 3: 60} # col 0 = 80pt, col 3 = 60pt
fixed_total = sum(fixed_cols.values())
flex_count = 4 - len(fixed_cols) # total columns minus fixed
flex_width = (available_width - fixed_total) / flex_count
col_widths = []
for i in range(4):
col_widths.append(fixed_cols.get(i, flex_width))
table = Table(data, colWidths=col_widths)
# ─── Method 3: Let ReportLab auto-calculate (simplest, less control) ───
# Pass colWidths=None - ReportLab auto-sizes based on content
# ⚠️ Risk: may still overflow if content is too wide
table = Table(data) # no colWidths
```
**Checklist before adding any table:**
1. Calculate `available_width` from your actual page size and margins
2. Sum all colWidths - must be ≤ `available_width`
3. For CJK text: characters are wider than Latin - budget ~12pt per CJK char at 10pt font
4. If a column has long text (descriptions, policies), use Paragraph() wrapping inside cells (not plain strings) - plain strings don't wrap and will overflow
5. Test with the longest expected content in each column
**Common mistake - plain strings don't wrap:**
```python
# ❌ Plain string: will overflow if text is long
data = [['Policy Name', 'Very long description that keeps going and going']]
# ✅ Paragraph: wraps within column width
from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet
styles = getSampleStyleSheet()
cell_style = styles['Normal']
data = [[Paragraph('Policy Name', cell_style),
Paragraph('Very long description that keeps going and going', cell_style)]]
```
```
Do you need auto-TOC?
├─ YES → Use TocDocTemplate + doc.multiBuild(story)
└─ NO → Use SimpleDocTemplate + doc.build(story)
```
| Requirement | DocTemplate | Build Method |
|-------------|-------------|--------------|
| Multi-page with TOC | `TocDocTemplate` | `multiBuild()` |
| Single-page or no TOC | `SimpleDocTemplate` | `build()` |
| With Cross-References (no TOC) | `SimpleDocTemplate` | `build()` |
| Both TOC + Cross-References | `TocDocTemplate` | `multiBuild()` |
**⚠️ CRITICAL**:
- `multiBuild()` is ONLY needed when using `TableOfContents`
- Using `build()` with `TocDocTemplate` = TOC won't work
- Using `multiBuild()` without `TocDocTemplate` = unnecessary overhead
---
## Rich Text Formatting
### Prerequisites
To use `<b>`, `<super>`, `<sub>` tags, you **must**:
1. Register fonts via `registerFont()`
2. Call `registerFontFamily()` to link normal/bold/italic variants
3. Wrap all tagged text in `Paragraph()` objects
**CRITICAL**: These tags ONLY work inside `Paragraph()` objects. Plain strings will NOT render them.
### Preventing Unwanted Line Breaks
```python
# Use non-breaking space to prevent breaking
text = Paragraph("Professors (K.G.\u00A0Palepu) proposed...", style)
# Use wordWrap='CJK' for proper Chinese typography
styles.add(ParagraphStyle(name='BodyStyle', fontName='SimHei', fontSize=10.5,
leading=18, alignment=TA_LEFT, wordWrap='CJK'))
# Use <br/> for intentional line breaks (NOT \n)
text = Paragraph("Line 1<br/>Line 2<br/>Line 3", style)
```
---
## Auto-Generated Table of Contents
### ❌ FORBIDDEN: Manual Table of Contents
**NEVER manually create TOC with hardcoded page numbers.**
### ⚠️ MANDATORY: TOC requires a cover page
**Unless the user explicitly requests no cover, if a document has a Table of Contents, it MUST have a cover page.** Structure: Cover (page 1) → TOC (page 2) → Content (page 3+). Do not generate a TOC without a preceding cover page.
### ✅ ALWAYS use auto-generated TOC:
```python
from reportlab.platypus import SimpleDocTemplate, Paragraph, PageBreak, Spacer
from reportlab.platypus.tableofcontents import TableOfContents
from reportlab.lib.utils import simpleSplit
import hashlib
class TocDocTemplate(SimpleDocTemplate):
def afterFlowable(self, flowable):
if hasattr(flowable, 'bookmark_name'):
level = getattr(flowable, 'bookmark_level', 0)
text = getattr(flowable, 'bookmark_text', '')
key = getattr(flowable, 'bookmark_key', '')
# MUST pass key as 4th element - without it, TOC entries won't have clickable links
self.notify('TOCEntry', (level, text, self.page, key))
doc = TocDocTemplate("document.pdf", pagesize=letter)
story = []
toc = TableOfContents()
toc.levelStyles = [
ParagraphStyle(name='TOCHeading1', fontSize=14, leftIndent=20, fontName='Times New Roman'),
ParagraphStyle(name='TOCHeading2', fontSize=12, leftIndent=40, fontName='Times New Roman'),
]
story.append(Paragraph("<b>Table of Contents</b>", styles['Title']))
story.append(toc)
story.append(PageBreak())
def add_heading(text, style, level=0):
# Generate a unique bookmark key for this heading
key = 'h_%s' % hashlib.md5(text.encode()).hexdigest()[:8]
p = Paragraph('<a name="%s"/>%s' % (key, text), style)
p.bookmark_name = text
p.bookmark_level = level
p.bookmark_text = text
p.bookmark_key = key # This key enables clickable TOC links
return p
# ⚠️ Orphan Heading Prevention (NOT a page break rule)
# If an H1 heading would appear in the bottom 15% of the page with no room
# for at least a few lines of content, push it to the next page.
# This is NOT "every H1 starts a new page" — it only triggers when the heading
# would be orphaned at the very bottom.
available_height = A4[1] - top_margin - bottom_margin
H1_ORPHAN_THRESHOLD = available_height * 0.15 # ~100pt ≈ 3.5cm — enough for heading + 2 lines
def add_major_section(text, style):
"""Add an H1-level heading with orphan prevention (NOT forced page break)."""
return [
CondPageBreak(H1_ORPHAN_THRESHOLD), # Only break if heading would be orphaned at page bottom
add_heading(text, style, level=0),
]
story.extend(add_major_section("Chapter 1: Introduction", styles['Heading1']))
# ... content ...
doc.multiBuild(story) # MUST use multiBuild for TOC
```
**⚠️ CRITICAL TOC LINK RULES:**
1. `afterFlowable` MUST pass `key` as the 4th element in the notify tuple - without it, TOC entries have no clickable links
2. `add_heading` MUST set `bookmark_key` AND embed `<a name="key"/>` in the Paragraph text - this creates the link destination
3. The key must be unique per heading - use a hash of the heading text
### TOC with Leader Dots (Copy-Paste Ready)
**⚠️ WARNING**: This manual approach creates a visual-only TOC. It has **NO clickable links** and **NO auto-updated page numbers**. Use the auto-generated TOC above (TocDocTemplate + multiBuild) whenever possible. Only use this leader-dots approach if you need very specific visual styling that the auto TOC cannot provide - and even then, prefer the auto approach.
For a professional TOC with leader dots connecting titles to page numbers:
```python
from reportlab.lib.pagesizes import A4
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, PageBreak
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib import colors
from reportlab.lib.units import inch
# Setup
doc = SimpleDocTemplate("report.pdf", pagesize=A4,
leftMargin=0.75*inch, rightMargin=0.75*inch)
styles = getSampleStyleSheet()
styles['Heading1'].fontName = 'Times New Roman'
styles['Heading1'].textColor = colors.black
story = []
# Calculate dimensions
page_width = A4[0]
available_width = page_width - 1.5*inch
page_num_width = 50 # Fixed width for page numbers
dots_column_width = available_width - 200 - page_num_width
optimal_dot_count = int(dots_column_width / 4.5) # ~4.5pt per dot at 7pt font
# Define styles
toc_style = ParagraphStyle('TOCEntry', parent=styles['Normal'],
fontName='Times New Roman', fontSize=11, leading=16)
dots_style = ParagraphStyle('LeaderDots', parent=styles['Normal'],
fontName='Times New Roman', fontSize=7, leading=16)
# Build TOC
toc_data = [
[Paragraph('<b>Table of Contents</b>', styles['Heading1']), '', ''],
['', '', ''],
]
entries = [('Section 1', '5'), ('Section 2', '10')]
for title, page in entries:
toc_data.append([
Paragraph(title, toc_style),
Paragraph('.' * optimal_dot_count, dots_style),
Paragraph(page, toc_style)
])
toc_table = Table(toc_data, colWidths=[None, dots_column_width, page_num_width])
toc_table.setStyle(TableStyle([
('GRID', (0, 0), (-1, -1), 0, colors.white),
('LINEBELOW', (0, 0), (0, 0), 1.5, colors.black),
('ALIGN', (0, 0), (0, -1), 'LEFT'),
('ALIGN', (1, 0), (1, -1), 'LEFT'),
('ALIGN', (2, 0), (2, -1), 'RIGHT'),
('VALIGN', (0, 0), (-1, -1), 'TOP'),
('LEFTPADDING', (0, 0), (-1, -1), 0),
('RIGHTPADDING', (0, 0), (-1, -1), 0),
('TOPPADDING', (0, 2), (-1, -1), 3),
('BOTTOMPADDING', (0, 2), (-1, -1), 3),
('TEXTCOLOR', (1, 2), (1, -1), TEXT_MUTED), # From palette --c-muted
]))
story.append(toc_table)
story.append(PageBreak())
doc.build(story)
```
**MANDATORY Leader Dots Rules:**
| Rule | Requirement |
|------|-------------|
| Column widths | ✅ MUST use fixed values. Percentage-based widths are **STRICTLY FORBIDDEN**. |
| Dot count | ✅ MUST calculate dynamically: `int(dots_column_width / 4.5)`. Hard-coded counts are **STRICTLY FORBIDDEN**. |
| Page number column | ✅ MUST be at least 40pt wide. |
| Dot font size | ✅ MUST NOT exceed 8pt. |
| Dot alignment | ✅ MUST be LEFT-aligned (visual flow from title). |
| Padding | ✅ MUST be exactly 0 between columns. |
| Bold text | ✅ MUST use `Paragraph('<b>Text</b>', style)`. Plain strings like `'<b>Text</b>'` are **STRICTLY FORBIDDEN**. |
| Indentation | Use leading spaces for hierarchy (e.g., `" 1.1 Subsection"`). |
### TOC Post-Generation Validation (MANDATORY)
After generating any PDF with a Table of Contents, run:
```bash
python3 "$PDF_SKILL_DIR/scripts/toc_validate.py" check-pdf output.pdf
```
If any errors are returned, fix the code and regenerate. Common issues:
- `TOC_ALL_SAME_PAGE` → You used `build()` instead of `multiBuild()`, so all page numbers are stuck at 1.
- `TOC_NO_ENTRIES` → Headings are missing `bookmark_name`/`bookmark_level` attributes.
- `TOC_PAGES_INVALID` → A TOC entry references a page beyond the document's total page count.
---
## Cross-References (Figures, Tables, Bibliography)
Pre-register all figures, tables, and references BEFORE using them in text.
```python
class CrossReferenceDocument:
def __init__(self):
self.figures = {}
self.tables = {}
self.refs = {}
self.figure_counter = 0
self.table_counter = 0
self.ref_counter = 0
def add_figure(self, name):
if name not in self.figures:
self.figure_counter += 1
self.figures[name] = self.figure_counter
return self.figures[name]
def add_table(self, name):
if name not in self.tables:
self.table_counter += 1
self.tables[name] = self.table_counter
return self.tables[name]
def add_reference(self, name):
if name not in self.refs:
self.ref_counter += 1
self.refs[name] = self.ref_counter
return self.refs[name]
```
---
## Image Handling
### Diagram Embedding Rules (MANDATORY)
**Figures are block-level Flowables.** Every image/diagram in ReportLab MUST be appended to `story` as a standalone `Image()` or wrapped in `KeepTogether([image, caption])`. Never simulate floating layouts by placing images inside Paragraph text or multi-column Table cells alongside body text.
**Complex diagrams (>12 nodes)**: Decompose into a ReportLab Table (details) + a simplified diagram image (overview). The diagram gives the big picture; the table carries precision. See SKILL.md "Complex Diagram Strategy".
**Diagram quality**: When generating flowcharts/diagrams for PDF embedding (via Playwright screenshot or ReportLab Drawing), follow the diagram content quality rules in SKILL.md § "Diagram Content Quality Rules". Key points: connectors must not pass through nodes, no high-saturation large fills, multi-arrow convergence must use merge pattern, font sizes must be legible at final embedded size (see context-specific minimums in SKILL.md).
**Cross-brief diagram pipeline**: For flowcharts/architecture diagrams, generate via **Playwright+CSS** (HTML nodes + CSS layout + SVG connectors), screenshot at 2× device scale factor for 300dpi print quality, then embed with `Image()`. See SKILL.md § "Diagram Generation Strategy" for the full pipeline.
**🚫 FORBIDDEN: TikZ standalone → PNG pipeline in Report brief.** Report uses ReportLab, not LaTeX. Going through TikZ requires a LaTeX compiler, adds compilation steps, and models frequently produce broken TikZ code. Use Playwright+CSS instead - it's native to the existing cover rendering engine.
```python
# Playwright+CSS diagram → PNG → ReportLab embedding
# 1. LLM generates diagram.html (CSS grid/flexbox nodes + SVG arrows)
# 2. Playwright screenshots at 2× scale (300dpi equivalent)
# 3. Embed:
from reportlab.platypus import Image
img = Image('diagram.png', width=450) # auto height via aspect ratio
story.append(img) # block-level Flowable
```
**⚠️ CRITICAL: Preserve aspect ratio - NEVER hardcode both width and height without reading actual image dimensions.**
**⚠️ CRITICAL: Always constrain to available space - NEVER insert at original size if it exceeds container width/height.**
Pie charts become ellipses, radar charts become diamonds, photos get stretched if you set arbitrary width/height.
**→ Full overflow prevention spec: `typesetting/overflow.md`** - read for the complete bounding box system.
```python
from PIL import Image as PILImage
from reportlab.platypus import Image
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import inch
def embed_image(path: str, max_width: float = None, max_height: float = None) -> Image:
"""Embed image with preserved aspect ratio, constrained to container.
ALWAYS use this pattern. Never hardcode both width and height.
If max_width is None, defaults to available_width (page - margins).
"""
if max_width is None:
max_width = A4[0] - 2.0 * inch # Default: A4 width minus 1" margins each side
if max_height is None:
max_height = A4[1] * 0.35 # ~294pt ≈ 10cm — leaves room for caption + surrounding text on same page
pil_img = PILImage.open(path)
orig_w, orig_h = pil_img.size
# Scale to fit within BOTH max_width and max_height
ratio_w = max_width / orig_w if orig_w > max_width else 1.0
ratio_h = max_height / orig_h if orig_h > max_height else 1.0
ratio = min(ratio_w, ratio_h)
return Image(path, width=orig_w * ratio, height=orig_h * ratio)
# Usage:
available_width = A4[0] - left_margin - right_margin
img = embed_image('chart.png', max_width=available_width, max_height=300)
story.append(img)
```
```python
# ❌ NEVER do this - distorts the image:
img = Image('chart.png', width=400, height=250) # arbitrary height breaks aspect ratio
```
---
## PDF Metadata
```python
doc = SimpleDocTemplate(
pdf_filename,
pagesize=letter,
title=os.path.splitext(pdf_filename)[0],
author='Z.ai',
creator='Z.ai',
subject='Document purpose description'
)
```
### ⚠️ MANDATORY: Post-Generation Metadata
After `doc.build(story)` completes:
```bash
python3 "$PDF_SKILL_DIR/scripts/pdf.py" meta.brand output.pdf
```
### ⚠️ MANDATORY: Post-Generation Blank Page Cleanup
After metadata branding, remove any accidental blank pages:
```bash
python3 "$PDF_SKILL_DIR/scripts/pdf.py" pages.clean output.pdf -o output_clean.pdf
```
If blank pages were found, rename `output_clean.pdf` → `output.pdf`.
---
## MANDATORY: Post-Generation Code Sanitization
**After writing PDF generation code and BEFORE executing it**, sanitize using:
```bash
# Step 0: Set skill root path (see SKILL.md § Script Path Setup)
PDF_SKILL_DIR="<skill_directory>"
# Step 1: Write code to .py file
cat > generate_pdf.py << 'PYEOF'
# ... your PDF generation code ...
PYEOF
# Step 2: Sanitize (MUST run before execution)
python3 "$PDF_SKILL_DIR/scripts/pdf.py" code.sanitize generate_pdf.py
# Step 3: Execute
python3 generate_pdf.py
# Step 4: Add metadata (MUST run after PDF creation)
python3 "$PDF_SKILL_DIR/scripts/pdf.py" meta.brand output.pdf
# Step 5: Check for missing glyphs (RECOMMENDED)
python3 "$PDF_SKILL_DIR/scripts/pdf.py" font.check output.pdf
# Step 6: Check TOC quality (if document has TOC)
python3 "$PDF_SKILL_DIR/scripts/pdf.py" toc.check output.pdf
# Step 7: PDF quality assurance scan (MUST run after all other checks)
python3 "$PDF_SKILL_DIR/scripts/pdf_qa.py" output.pdf
```
**FORBIDDEN patterns:**
```bash
# ❌ PROHIBITED: python -c with inline code
python -c "from reportlab... doc.build(story)"
# ❌ PROHIBITED: heredoc without saving to file first
python3 << 'EOF'
from reportlab...
EOF
# ❌ PROHIBITED: executing .py without sanitizing
python generate_pdf.py # Missing sanitization!
```
---
## Debug Tips for Layout Issues
**→ Full overflow prevention spec: `typesetting/overflow.md`** - read it for the complete bounding box system, two-pass rendering, fallback strategies, and all three route implementations.
```python
from reportlab.platypus import HRFlowable
from reportlab.lib.colors import red
# Visualize spacing during development - insert between elements
story.append(table)
story.append(HRFlowable(width="100%", color=red, thickness=0.5, spaceBefore=0, spaceAfter=0))
story.append(Spacer(1, 6))
story.append(HRFlowable(width="100%", color=red, thickness=0.5, spaceBefore=0, spaceAfter=0))
story.append(caption)
# Red lines create visual markers to see actual spacing - remove before final build
```
---
## Resume / CV Template (ATS-Friendly)
Single-column, clean layout optimised for Applicant Tracking Systems (ATS). No graphics, no sidebars, no colour blocks - just well-structured text that parses perfectly by HR software.
**When to use this template:**
- Applying to corporate jobs through online portals
- Any scenario where the PDF will be machine-parsed before a human reads it
- When the recruiter explicitly asks for "a standard resume"
**When NOT to use (use Academic brief instead):**
- Creative/design industry positions → Academic brief (AltaCV-style)
- Academic CV with publications → Academic brief (Academic CV)
### Resume Design Rules
- **Target 1 page** unless user specifies otherwise
- **Margins**: `left=1.5cm, right=1.5cm, top=1.5cm, bottom=1.5cm`
- **No cover page** - content starts immediately
- **No TOC** - too short for table of contents
- **Font**: Times New Roman (English) or SimHei (Chinese)
- **Body font size**: 10-10.5pt; Name: 22-26pt; Section titles: 13-14pt
- **⚠️ Minimum font size: 12px (9pt) - HARD FLOOR.** No text in the entire resume may render smaller than 12px. This includes contact info, meta text, footnotes, and captions. Anything below 12px is unreadable in print and fails accessibility checks.
- **Section separator**: thin horizontal rule (`HRFlowable`)
- **Bullet style**: `` with tight spacing
### Resume Line-Break Rules (Language-Aware)
- **English**: Prefer breaking at word boundaries (spaces, hyphens). If a long word must be split to avoid excessive whitespace, break at a valid syllable boundary and insert a hyphen (`-`) - this is standard typographic practice (e.g., `experi-\nence`, `develop-\nment`). ReportLab supports `wordWrap='CJK'` only for CJK content; for English use default paragraph wrapping with `allowWidows=0, allowOrphans=0`.
- **Chinese/CJK**: Break allowed between any two CJK characters. Never break between a CJK character and its adjacent punctuation (、。,)》 etc. must stay with the preceding character).
- **Mixed content** (e.g., "Python 开发工程师"): Break at CJK boundaries or English word boundaries. Never split an English word in a CJK paragraph unless hyphenated.
- **Contact line**: Email, phone, location separated by `|` or `·`. Each segment must stay on one line - if too long, move to next line at the separator, not mid-segment.
- **Dates and ranges**: "Jan 2022 - Present" must stay as one unit. Never break a date range across lines.
### Resume Page-Fill Rules (Anti-Blank-Space)
- **Goal: Fill ≥85% of the page height.** Content should reach at least the bottom quarter of the page. A resume that stops at 60% height with blank space below = FAIL.
- **Adaptive spacing strategy** (apply in order until page is ≥85% filled):
1. Increase `spaceBefore` / `spaceAfter` on section headers (from 10pt up to 18pt)
2. Increase `leading` (line height) on body text (from 14pt up to 18pt)
3. Increase body `fontSize` by 0.5-1pt (from 10pt up to 11.5pt max)
4. Add a "Professional Summary" or "Key Achievements" section if none exists
5. Increase margins slightly (from 1.5cm up to 2cm) to reduce line width and push content downward
- **Never leave a visible blank area > 3cm at the bottom of the page.**
- **If content overflows to page 2**: do the reverse - reduce spacing, tighten leading (min 12pt), reduce fontSize (min 9pt / 12px), before removing content.
### Complete Resume Template
```python
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import cm, mm
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib.enums import TA_LEFT, TA_CENTER
from reportlab.lib import colors
from reportlab.platypus import (
SimpleDocTemplate, Paragraph, Spacer, HRFlowable, Table, TableStyle
)
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase.pdfmetrics import registerFontFamily
# ── Font Registration ──
pdfmetrics.registerFont(TTFont('TimesNewRoman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf'))
registerFontFamily('TimesNewRoman', normal='TimesNewRoman', bold='TimesNewRoman')
# For Chinese resumes, also register SimHei:
# pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf'))
# registerFontFamily('SimHei', normal='SimHei', bold='SimHei')
# ── Styles ──
# ACCENT must come from palette.generate (Step 2)
# Run: python3 "$PDF_SKILL_DIR/scripts/pdf.py" palette.generate --title "Resume" --mode minimal
ACCENT = colors.HexColor('<accent from palette>') # Replace with palette output
name_style = ParagraphStyle(
'ResumeName', fontName='TimesNewRoman', fontSize=24,
leading=28, alignment=TA_CENTER, spaceAfter=2
)
contact_style = ParagraphStyle(
'ResumeContact', fontName='TimesNewRoman', fontSize=10,
leading=14, alignment=TA_CENTER, textColor=TEXT_MUTED, # From palette --c-muted
spaceAfter=8
)
section_title_style = ParagraphStyle(
'ResumeSectionTitle', fontName='TimesNewRoman', fontSize=13,
leading=16, spaceBefore=10, spaceAfter=4,
textColor=ACCENT
)
job_title_style = ParagraphStyle(
'ResumeJobTitle', fontName='TimesNewRoman', fontSize=11,
leading=14, spaceAfter=1
)
job_meta_style = ParagraphStyle(
'ResumeJobMeta', fontName='TimesNewRoman', fontSize=10,
leading=13, textColor=TEXT_MUTED, spaceAfter=4 # From palette --c-muted
)
bullet_style = ParagraphStyle(
'ResumeBullet', fontName='TimesNewRoman', fontSize=10,
leading=14, leftIndent=14, bulletIndent=0,
spaceBefore=1, spaceAfter=1
)
body_style = ParagraphStyle(
'ResumeBody', fontName='TimesNewRoman', fontSize=10,
leading=14, spaceAfter=2
)
# ── Helpers ──
def section_header(title):
"""Section title + thin rule separator."""
return [
Paragraph(f'<b>{title}</b>', section_title_style),
HRFlowable(width='100%', thickness=0.8, color=ACCENT,
spaceBefore=0, spaceAfter=6),
]
def experience_entry(title, company, dates, location, bullets):
"""One work experience block."""
elements = [
Paragraph(f'<b>{title}</b>', job_title_style),
Paragraph(f'{company} | {dates} | {location}', job_meta_style),
]
for b in bullets:
elements.append(Paragraph(f'• {b}', bullet_style))
elements.append(Spacer(1, 4))
return elements
def education_entry(degree, school, dates, details=None):
"""One education block."""
elements = [
Paragraph(f'<b>{degree}</b>', job_title_style),
Paragraph(f'{school} | {dates}', job_meta_style),
]
if details:
elements.append(Paragraph(details, body_style))
elements.append(Spacer(1, 4))
return elements
def skills_row(categories):
"""
Skills as compact label: value pairs.
categories = [('Programming', 'Python, Java, C++'), ('Tools', 'Git, Docker')]
"""
elements = []
for cat, vals in categories:
elements.append(Paragraph(f'<b>{cat}:</b> {vals}', body_style))
return elements
# ── Build Document ──
doc = SimpleDocTemplate(
'resume.pdf', pagesize=A4,
leftMargin=1.5*cm, rightMargin=1.5*cm,
topMargin=1.5*cm, bottomMargin=1.5*cm,
title='Resume - Your Name',
author='Z.ai', creator='Z.ai'
)
story = []
# Header
story.append(Paragraph('<b>YOUR NAME</b>', name_style))
story.append(Paragraph(
'email@example.com | +86 138-0000-0000 | Shanghai, China | github.com/yourname',
contact_style
))
# Summary
story.extend(section_header('PROFESSIONAL SUMMARY'))
story.append(Paragraph(
'Results-driven software engineer with 5+ years of experience in backend systems, '
'distributed computing, and cloud-native architectures. Led a team of 8 engineers '
'delivering a real-time data pipeline processing 2M+ events/sec.',
body_style
))
# Experience
story.extend(section_header('WORK EXPERIENCE'))
story.extend(experience_entry(
'Senior Software Engineer', 'Tech Company Inc.', 'Jan 2022 - Present', 'Shanghai',
[
'Designed and deployed a microservices architecture serving 10M daily active users',
'Reduced API latency by 40% through query optimisation and caching strategies',
'Mentored 3 junior engineers; established code review standards adopted team-wide',
]
))
story.extend(experience_entry(
'Software Engineer', 'Startup Co.', 'Jul 2019 - Dec 2021', 'Beijing',
[
'Built real-time recommendation engine using collaborative filtering (CTR +25%)',
'Implemented CI/CD pipeline reducing deployment time from 2 hours to 15 minutes',
]
))
# Education
story.extend(section_header('EDUCATION'))
story.extend(education_entry(
'M.Sc. Computer Science', 'Tsinghua University', '2017 - 2019',
'GPA: 3.8/4.0 | Thesis: Distributed Graph Processing on Heterogeneous Clusters'
))
story.extend(education_entry(
'B.Eng. Software Engineering', 'Zhejiang University', '2013 - 2017'
))
# Skills
story.extend(section_header('SKILLS'))
story.extend(skills_row([
('Languages', 'Python, Java, Go, SQL, TypeScript'),
('Frameworks', 'Spring Boot, FastAPI, React, Kubernetes, Kafka'),
('Tools', 'Git, Docker, Terraform, AWS (EC2/S3/Lambda), PostgreSQL, Redis'),
]))
doc.build(story)
```
### Resume Checklist
- [ ] **1 page** (unless user says otherwise)
- [ ] **No cover page, no TOC**
- [ ] Tight margins (1.5cm all sides)
- [ ] Name prominent at top (22-26pt)
- [ ] Contact info single line, centered
- [ ] Section headers with consistent separator style
- [ ] Bullets concise - start with action verbs
- [ ] Quantified achievements (%, $, count)
- [ ] No photos, no icons, no colour blocks (ATS-safe)
- [ ] Font: only registered fonts (Times New Roman / SimHei)
- [ ] **⚠️ Minimum font size 12px (9pt)** - no text smaller than this anywhere
- [ ] **Line breaks are language-aware** - no mid-word English breaks, no CJK punctuation orphans, no date range splits
- [ ] **Page fill ≥85%** - no large blank area at bottom. If sparse, increase spacing/leading/font size adaptively
---
## Component Reference
### Block types:
| Type | Description | Notes |
|------|-------------|-------|
| h1 | 22pt + accent underline rule | KeepTogether with rule |
| h2 | 15pt, dark | No rule, no accent |
| h3 | 11.5pt bold | No accent |
| body | 10.5pt justified, 17pt leading | Supports `<b>` `<i>` `<font>` |
| bullet | Body size with `` prefix | Unordered list |
| numbered | Body size with N. prefix | Counter auto-resets |
| callout | Accent left border 4px + light tint bg | Max one per section |
| table | Accent header + alternating rows + outer box only | Supports fractional col_widths |
| code | Courier 8.5pt + accent left border | Optional language label |
| divider | Accent 1.2pt rule | Use sparingly |
| caption | 8.5pt muted, centered | Below images/tables |
| chart | matplotlib figure saved as PNG → `Image()` flowable | Generate chart in-script, `fig.savefig()` → embed. Set `plt.rcParams['font.sans-serif']=['SimHei']` for CJK. Always `tight_layout()`. **Must follow `typesetting/charts.md` rules**: delete top/right spines, dashed grid 20% opacity, donut default for pie, anti-stacking labels. |
| quote | Body italic + left indent 24pt + muted accent left border 2px | For blockquotes / testimonials |
| bibliography | Hanging indent (firstLineIndent=-24pt, leftIndent=24pt) | GB/T 7714 or APA format per language |
| math | Rendered via `<super>` `<sub>` tags in Paragraph | For inline math; complex equations → use academic brief instead |
### Header / footer:
- Header: document title (left, 7.5pt, muted) + accent rule (1.5pt, full width)
- Footer: author (left, 7.5pt, muted) + page number (right, 7.5pt, muted) + light rule above
### Custom flowables (preferred over Table hacks):
- **CalloutBox**: accent 4px left border + light tint background - cleaner than Table simulation
- **BibliographyItem**: hanging indent reference entry
---
## Quick Reference
| Task | Best Tool | Command/Code |
|------|-----------|--------------|
| Create PDF (ReportLab) | reportlab | Canvas or Platypus |
| Fill PDF forms | Process brief | `python3 "$PDF_SKILL_DIR/scripts/pdf.py" form.fill` or annotation workflow |
| Merge PDFs | Process brief | `python3 "$PDF_SKILL_DIR/scripts/pdf.py" pages.merge` |
| Extract text | Process brief | `python3 "$PDF_SKILL_DIR/scripts/pdf.py" extract.text` |
| Extract tables | Process brief | `python3 "$PDF_SKILL_DIR/scripts/pdf.py" extract.table` |
---
## Document-Type Quick Reference
### 📋 Invoice / Receipt
Key elements:
- Company logo/name at top
- Invoice number + date (right-aligned)
- Seller/buyer info block (2-column layout)
- Line items table: Item, Quantity, Unit Price, Amount
- Total row with bold font
- Tax info + payment terms at bottom
- Stamp/seal placeholder (empty circle or rect with "Seal Here" text)
```python
# Invoice table pattern
data = [['Item', 'Qty', 'Unit Price', 'Amount']]
data += [[item['name'], item['qty'], item['price'], item['total']] for item in items]
data.append(['', '', 'Total:', f'¥{total}'])
```
### 📝 Contract / Legal Document
Key elements:
- Title centered, bold, 18pt
- Numbered clauses with auto-increment
- Signature block at bottom: Party A / Party B with date line
- Page numbers mandatory
```python
# Signature block pattern
sig_data = [
['Party A (Seal): ________________', 'Party B (Seal): ________________'],
['Date: ____/____/________', 'Date: ____/____/________'],
]
sig_table = Table(sig_data, colWidths=[250, 250])
```
### 📊 Book / Long Document
For documents with 10+ pages:
1. Generate TOC with `TableOfContents()` from reportlab.platypus
2. Use `CondPageBreak(H1_ORPHAN_THRESHOLD)` before H1 headings (NOT `PageBreak()` — never force page breaks between chapters)
3. Add running headers/footers with chapter title + page number
4. Run `python3 "$PDF_SKILL_DIR/scripts/toc_validate.py" output.pdf` to verify TOC links
5. Consider using `bookmarks=True` for PDF outline navigation
---
### 📝 Exam / Quiz / Test Paper / Worksheet
**Exam papers have unique layout requirements. These rules are MANDATORY when generating any test/quiz/exam document.**
#### Numbering & Structure
```
一、选择题(每小题 3 分,共 30 分) ← Section header (宋体/黑体 14pt Bold)
1. 以下哪个不是 Python 内置数据类型? ← Question stem (12pt)
A. int ← Options: indented 24pt (2em)
B. float ← Each option on new line or 2×2 grid
C. array
D. str
2. 以下表达式的值为? ← Next question
A. True B. False ← Short options: inline 2×2 grid OK
C. None D. Error
```
#### Option Indentation (MANDATORY)
```python
# ReportLab styles for exam papers
question_style = ParagraphStyle(
'Question', fontSize=12, leading=16,
leftIndent=0, firstLineIndent=0,
spaceBefore=12, spaceAfter=4,
)
option_style = ParagraphStyle(
'Option', fontSize=12, leading=16,
leftIndent=24, # ← MANDATORY: 24pt indent from question stem
firstLineIndent=0,
spaceBefore=2, spaceAfter=2,
)
```
#### Option Layout Decision
```python
def format_options(options):
"""
Decide option layout based on content length.
Short options (≤4 chars each, ≤4 options) → 2×2 grid table
Long options or >4 options → vertical list, one per line
"""
max_len = max(len(opt) for opt in options)
if len(options) <= 4 and max_len <= 6: # CJK: 4 chars; Latin: 6 chars
# 2×2 grid using Table
grid = Table(
[[options[0], options[1]], [options[2], options[3]]],
colWidths=[200, 200], hAlign='LEFT'
)
grid.setStyle(TableStyle([
('LEFTPADDING', (0,0), (-1,-1), 24), # Match option indent
('VALIGN', (0,0), (-1,-1), 'TOP'),
]))
return grid
else:
# Vertical list
return [Paragraph(f'{opt}', option_style) for opt in options]
```
#### Answer Space Reservation (MANDATORY)
**Every question MUST have adequate answer space. No exceptions.**
| Question Type | Minimum Space | Implementation |
|--------------|---------------|----------------|
| Multiple choice | 0 extra (options are the answer) | Just question + options |
| Fill-in-the-blank | Inline underline, min 80pt width | `<u>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</u>` or ReportLab `HRFlowable` |
| Short answer | 40-60pt (2-3 lines) | `Spacer(1, 50)` |
| Calculation / math work | 120-200pt (6-10 lines) | `Spacer(1, 160)` with light guide lines |
| Essay / long answer | 200-400pt (10-20 lines) | `Spacer(1, 300)` with light guide lines |
| Proof / derivation | 160-250pt (8-12 lines) | `Spacer(1, 200)` |
```python
# Answer line helper — light dashed lines for handwriting
def add_answer_lines(story, num_lines=8, line_spacing=20):
"""Add light guide lines for handwritten answers."""
for i in range(num_lines):
story.append(Spacer(1, line_spacing))
story.append(HRFlowable(
width='90%', thickness=0.3,
color=colors.Color(0.8, 0.8, 0.8), # Light gray
dash=[4, 4], # Dashed
spaceAfter=0, spaceBefore=0
))
```
#### Page Density
- `spaceBefore=12pt` between questions minimum
- `spaceBefore=24pt` before section headers (一、二、三)
- Score indicator after question number: `1. (分值: 5分)` or `1. [5 pts]`
- Page header: exam title + time limit + total score
- No cover page unless explicitly requested
---
## Data-to-Ink Ratio Rules (MANDATORY for Reports)
**CRITICAL: Do NOT write long paragraphs to describe data trends.** You MUST extract metrics and trends into structured visual elements.
### Pattern Detection Table
Before writing ANY body paragraph, scan the text. If you find any of these patterns, extract them:
| Raw text pattern | ❌ FORBIDDEN | ✅ REQUIRED visual form |
|-----------------|-------------|----------------------|
| "revenue grew 12% to $4.2M" | Bury in paragraph prose | CalloutBox with bold `+12%` + `$4.2M` |
| "latency dropped from 120ms to 75ms" | Long explanatory sentence | CalloutBox or metrics table row |
| "Q1→Q2→Q3: 10%→25%→40%" | Inline numbers in text | Chart (matplotlib PNG → `Image()`) |
| "first...second...third..." steps | Paragraph with ordinal words | Numbered Table or process list |
| "compared to last year / vs Q2" | Nested comparisons in prose | Side-by-side comparison table |
| "accounted for 60% of total" | Percentage in paragraph | Pie chart or stacked bar chart |
### ReportLab CalloutBox Template
Use this pattern for extracted metrics - it's visually clean and takes only 5 lines:
```python
stat_style = ParagraphStyle(
name='StatBig', fontName='Times New Roman', fontSize=22,
leading=26, textColor=ACCENT, alignment=TA_CENTER # From palette
)
label_style = ParagraphStyle(
name='StatLabel', fontName='Times New Roman', fontSize=9,
leading=12, textColor=TEXT_MUTED, alignment=TA_CENTER # From palette
)
callout = Table(
[[Paragraph('<b>+12%</b>', stat_style)],
[Paragraph('Revenue Growth vs Q2', label_style)]],
colWidths=[160]
)
callout.setStyle(TableStyle([
('BACKGROUND', (0,0), (-1,-1), BG_SURFACE), # From palette --c-mid
('BOX', (0,0), (-1,-1), 1, ACCENT), # From palette --c-accent
('TOPPADDING', (0,0), (-1,-1), 10),
('BOTTOMPADDING', (0,0), (-1,-1), 10),
('VALIGN', (0,0), (-1,-1), 'MIDDLE'),
]))
story.append(KeepTogether(callout))
```
### Cover Pages (HTML/Playwright)
When the cover routes through Playwright:
- Use `Delta_Widget` components for KPIs and metrics.
- Use `Process_List` components for workflows and timelines.
- Use `Sidenote_Block` components (in `tufte_report` archetype) for citations and supplementary data.
- For data-heavy pages, add `data_points` arrays to any component - the engine renders them as content-aware background curves.
---
## Critical Reminders Checklist
Before submitting code, verify:
- [ ] **Font restriction**: Only 6 registered fonts used
- [ ] **Font family registered**: `registerFontFamily()` called for all used fonts
- [ ] **Mixed language**: `install_font_fallback()` called after font registration (auto handles `<font>` wrapping)
- [ ] **Rich text tags**: Only inside `Paragraph()` objects
- [ ] **Table cells**: ALL text wrapped in `Paragraph()`
- [ ] **Scientific notation**: Large/small numbers use `<super>` tags
- [ ] **Chinese text**: `wordWrap='CJK'` in ParagraphStyle
- [ ] **Line breaks**: Using `<br/>` not `\n`
- [ ] **Headings**: Bold with `<b>` tags
- [ ] **Table headers**: White bold text on dark blue (#1F4E79)
- [ ] **Metadata**: Title matches filename, Author/Creator = "Z.ai"
- [ ] **Sanitization**: Code sanitized before execution
- [ ] **Post-build metadata**: `pdf.py meta.brand` called after build
- [ ] **Post-build blank page cleanup**: `pdf.py pages.clean` called after build
- [ ] **Glyph check**: `pdf.py font.check` run to verify no missing glyphs
- [ ] **TOC check**: `pdf.py toc.check` run if document has TOC (entries, pages, links)
- [ ] **PDF QA scan**: `pdf_qa.py` run to verify page consistency, CJK punctuation, overflow, margins, table centering, font embedding, metadata