Brief: Report Production
Structured documents via ReportLab: reports, proposals, contracts, white papers, financial analysis, data tables. Also covers ATS-friendly resumes.
🚨 EMOJI CHECK - STOP HERE IF CONTENT HAS EMOJI
ReportLab renders emoji (📊🎯🔥💡 etc.) as □ tofu squares. This is unfixable.
If the user's content contains intentional emoji → STOP. Do NOT use this brief. Route to briefs/creative.md instead (Playwright renders emoji natively).
This applies even if the document is a "report" - emoji + ReportLab = broken output.
Production Workflow
1. BRIEF → Confirm document type, audience, page count, outline
2. DESIGN → Run `pdf.py palette.generate --title "..."` → copy-paste output into script
3. EDIT → Content transformation (extract data, define typographic roles)
4. COVER → Read typesetting/cover.md → render HTML cover via Playwright (default ON for reports ≥ 3 pages; skip only for resumes/letters/memos/forms)
5. BUILD → Write ReportLab code (fonts → styles → content → tables → charts)
6. PREFLIGHT → code.sanitize → execute → meta.brand → font.check → toc.check → pages.clean → pdf_qa.py
7. DELIVER → Merge cover + body → final PDF
Typesetting assets (load when you reach that step):
- typesetting/palette.md - color system, typography rules, anti-patterns
- typesetting/cover.md - 7 cover layouts with variants, typography scale, bounding box rules
- typesetting/charts.md - chart styling, anti-stacking rules, axis/grid/legend treatment
Cover is DEFAULT ON for reports. Skip cover only for: resumes, letters, memos, forms, checklists, or documents ≤ 2 pages.
Step 2: DESIGN - Palette & Font Plan
Before writing any ReportLab code, you MUST generate the color palette via the palette.generate command. This is not optional. Do NOT hardcode hex colors. Do NOT pick colors by feel.
2a. Generate Palette (MANDATORY)
Run this command FIRST. Copy-paste its output directly into your Python script:
python3 "$PDF_SKILL_DIR/scripts/pdf.py" palette.generate --title "<document title>" --mode minimal
The command auto-derives the design intent from the document title, computes a mathematically harmonious palette, and outputs ready-to-paste ReportLab Python code:
# ━━ Color Palette (auto-generated by pdf.py palette.generate) ━━
# Intent: neutral | Mode: minimal | Harmony: split_complementary
# Contrast: text:bg=14.12 | accent:bg=3.06
from reportlab.lib import colors
ACCENT = colors.HexColor('#2f97b9')
TEXT_PRIMARY = colors.HexColor('#252422')
TEXT_MUTED = colors.HexColor('#8d8981')
BG_SURFACE = colors.HexColor('#dfdad2')
BG_PAGE = colors.HexColor('#f5f4f3')
SURFACE_RGBA = 'rgba(0,0,0,0.03)'
# ReportLab table colors
TABLE_HEADER_COLOR = ACCENT
TABLE_HEADER_TEXT = colors.white
TABLE_ROW_EVEN = colors.white
TABLE_ROW_ODD = BG_SURFACE
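The `Contrast:` figures in the generated header read like WCAG 2.x contrast ratios measured against BG_PAGE. That interpretation is an assumption; this brief does not document the formula pdf.py uses. A minimal sketch of the WCAG calculation, fed only with the sample hex values above, for sanity-checking a pasted palette:

```python
def _linear(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color."""
    h = hex_color.lstrip('#')
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg_hex, bg_hex):
    """WCAG contrast ratio; always >= 1 regardless of argument order."""
    lighter, darker = sorted(
        (relative_luminance(fg_hex), relative_luminance(bg_hex)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Values copied from the sample palette output above (not hand-picked colors)
text_vs_page = contrast_ratio('#252422', '#f5f4f3')
accent_vs_page = contrast_ratio('#2f97b9', '#f5f4f3')
print(f'text:bg={text_vs_page:.2f} | accent:bg={accent_vs_page:.2f}')
```

With the sample palette both ratios land close to the 14.12 and 3.06 printed in the header comment, which is what suggests the WCAG reading. Treat a large mismatch as a sign the palette was edited by hand after generation.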
Options:
- --mode minimal (default, recommended for 50%+ of documents) | dark | pastel | jewel | light
- --harmony auto (default, recommended) | complementary | split_complementary | analogous | triadic | monochrome
- --format python (default) | json | css
- --seed <int> - for reproducible palettes across regenerations
⚠️ FORBIDDEN:
- ❌ Writing colors.HexColor('#xxxxxx') with any hex value you chose yourself
- ❌ Using colors.red, colors.blue, or any ReportLab named color for design elements
- ❌ Skipping this step and picking colors "that look good"
- ❌ Using different palettes for different sections of the same document
✅ The ONLY acceptable way to get colors: Run palette.generate, copy the output.
2b. Color Application Rules
| Element | Color Source | Notes |
|---|---|---|
| Table headers | ACCENT (bg) + white (text) | High contrast required |
| Table odd rows | BG_SURFACE at 50% opacity | Subtle striping |
| Section titles | ACCENT or TEXT_PRIMARY | Match heading hierarchy |
| Body text | TEXT_PRIMARY | Never use pure #000000 |
| Muted/meta text | TEXT_MUTED | Dates, captions, footnotes |
| Horizontal rules | ACCENT at 30% opacity | Thin, unobtrusive |
| Chart colors | Derive from ACCENT (see charts.md) | Consistent with palette |
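The "at 50% opacity" and "at 30% opacity" entries can be realized without PDF transparency by pre-blending the palette color over the surface it sits on. The sketch below is one possible approach, not something palette.generate emits; the white page surface and the helper names are assumptions.

```python
def hex_to_rgb01(hex_color):
    """'#rrggbb' -> (r, g, b) floats in 0..1."""
    h = hex_color.lstrip('#')
    return tuple(int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))


def blend_over(fg_hex, bg_hex, alpha):
    """Opaque color equal to fg drawn at `alpha` opacity over bg."""
    fg, bg = hex_to_rgb01(fg_hex), hex_to_rgb01(bg_hex)
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(fg, bg))


# Palette values as emitted by palette.generate in the sample output
ACCENT_HEX, BG_SURFACE_HEX = '#2f97b9', '#dfdad2'
PAGE_HEX = '#ffffff'  # assumption: rules and stripes sit on a white surface

rule_rgb = blend_over(ACCENT_HEX, PAGE_HEX, 0.30)        # horizontal rules
stripe_rgb = blend_over(BG_SURFACE_HEX, PAGE_HEX, 0.50)  # odd table rows
print(rule_rgb, stripe_rgb)
```

In a ReportLab script the resulting tuples could be passed to `colors.Color(*rule_rgb)`, so every derived shade still traces back to the generated palette.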
2c. Forbidden
- ❌ Hardcoding hex colors like #1F4E79, #555555, #888888 in code
- ❌ Using colors.red, colors.blue or any ReportLab named color for design elements
- ❌ Choosing colors by "feel" without running palette first
- ❌ Using different accent colors across tables/sections in the same document
Exception: Semantic colors for data visualization (green for positive, red for negative) are acceptable but should be muted variants derived from the palette when possible.
Step 3: EDIT - Content Transformation
Before writing any code, restructure raw text into visual roles. This is the most impactful quality step - skipping it produces "Word document with a PDF extension".
3a. Typographic Role Extraction
Parse raw text and categorize content into visual roles:
| Role | ReportLab Mapping | Description |
|---|---|---|
| Hero / Display | Cover title (36-42pt) or section opener | 1-5 words, the emotional hook |
| Kicker / Eyebrow | Subtitle or section intro (10-12pt, muted) | Tiny context line above main heading |
| Data Sculpture | CalloutBox or bold stat block | Extract impactful numbers (97%, $4.2M, -45ms) |
| Pull Quote | BlockQuote (italic + left indent 24pt + accent border) | Most provocative sentence, standalone |
| Body | Standard paragraph | Everything else |
Rule: Before writing any body paragraph, scan for numbers with units. Every metric like "revenue grew by 12%" or "latency dropped to 45ms" MUST be extracted into a CalloutBox or metrics table - never buried in paragraph prose. See §Data-to-Ink Ratio Rules below.
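The scan for numbers with units can be roughed out with a regular expression. This is a sketch only: the pattern, the unit list, and the `find_metrics` name are illustrative assumptions and will need extending for each document's vocabulary.

```python
import re

NUM = r'\d[\d,]*(?:\.\d+)?'          # integer or decimal, thousands separators allowed
UNIT = r'(?:%|ms|[KMB]\b|GB|MB)'     # illustrative unit / magnitude suffixes

# A metric is either a currency amount (unit optional) or a number with a unit.
METRIC_RE = re.compile(
    rf'(?<![\w.])[-+]?(?:[$€¥]{NUM}(?:\s?{UNIT})?|{NUM}\s?{UNIT})')


def find_metrics(text):
    """Return metric-like substrings in the order they appear."""
    return [m.group() for m in METRIC_RE.finditer(text)]


sample = 'Revenue grew by 12% to $4.2M while latency dropped to 45ms.'
print(find_metrics(sample))
```

Each hit is a candidate for a CalloutBox or a metrics-table row. A human pass is still needed to decide which numbers are actually impactful.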
3b. Section Pacing
- If any H1 section has >400 words of continuous body text without visual breaks, split into H2 subsections or insert a visual break (table, chart, callout)
- Aim for at least 1 visual element (table/chart/callout) per 2-3 pages of body text
- Dense text walls are the #1 sign of poor report design
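The 400-word pacing limit can be checked mechanically before any layout code is written. A minimal sketch, assuming sections are available as hypothetical (title, body_text) pairs and that each CJK character counts as one word:

```python
import re

WORD_LIMIT = 400  # threshold taken from the pacing rule


def count_words(text):
    """Rough count: Latin/numeric tokens plus individual CJK characters."""
    latin = re.findall(r"[A-Za-z0-9']+", text)
    cjk = re.findall(r'[\u4e00-\u9fff]', text)
    return len(latin) + len(cjk)


def overlong_sections(sections, limit=WORD_LIMIT):
    """Titles of sections whose unbroken body text exceeds `limit` words."""
    return [title for title, body in sections if count_words(body) > limit]


sections = [('Market Overview', 'word ' * 401), ('Summary', 'short text')]
print(overlong_sections(sections))
```

Any title returned is a prompt to split the section into H2 subsections or insert a table, chart, or callout.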
Character Safety Rule (MANDATORY)
Three rules for safe character handling in ReportLab PDFs:
a) Superscripts and subscripts: Use <super>/<sub> tags, never raw Unicode superscript/subscript characters (e.g., \u00b2, \u2082).
b) Emoji: ReportLab cannot render emoji. If content contains emoji, use Creative brief (HTML + Playwright).
c) Font fallback for mixed CJK/Latin text: After registering fonts, call install_font_fallback() once. This automatically wraps missing-glyph characters in <font> tags inside every Paragraph(). No manual <font name="..."> wrapping needed for mixed Chinese-English text.
Mathematical/relational operators (×, ÷, ±, ≤, √, ∑, ≅, ∫, π, ∠, Δ, etc.) are safe to use as literal characters in Paragraph() - both SimHei and Times New Roman have these glyphs.
| Need | Correct Method | Correct Example |
|---|---|---|
| Superscript | <super> tag in Paragraph() | Paragraph('10<super>2</super> × 10<super>3</super> = 10<super>5</super>', style) |
| Subscript | <sub> tag in Paragraph() | Paragraph('H<sub>2</sub>O', style) |
| Bold | <b> tag in Paragraph() | Paragraph('<b>Title</b>', style) |
| Math operators | Literal char in Paragraph() | Paragraph('AB ⊥ AC, ∠A = 90°, and ΔABC ≅ ΔDCF', style) |
| Scientific notation | Combined tags in Paragraph() | Paragraph('1.2 × 10<super>8</super> kg/m<super>3</super>', style) |
from reportlab.platypus import Paragraph
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib.enums import TA_LEFT, TA_CENTER, TA_JUSTIFY
body_style = ParagraphStyle(
name="ENBodyStyle",
fontName="Times New Roman",
fontSize=10.5,
leading=18,
alignment=TA_JUSTIFY,
)
# Superscript: area unit
Paragraph('Total area: 500 m<super>2</super>', body_style)
# Subscript: chemical formula
Paragraph('The reaction produces CO<sub>2</sub> and H<sub>2</sub>O', body_style)
# Scientific notation
Paragraph('Speed of light: 3.0 × 10<super>8</super> m/s', body_style)
# Combined
Paragraph('E<sub>k</sub> = mv<super>2</super>/2', body_style)
# Bold heading (header_style is assumed to be a heading ParagraphStyle defined elsewhere)
Paragraph('<b>Chapter 1: Introduction</b>', header_style)
# Math symbols
Paragraph('When ∠ A = 90°, AB ⊥ AC and ΔABC ≅ ΔDEF', body_style)
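Rule (a) can be backed by a small guard applied at the point where each string is written. The helper below is a sketch; `tagify_scripts` is a hypothetical name, not part of pdf.py. It rewrites raw Unicode superscript/subscript characters into the `<super>`/`<sub>` markup shown above.

```python
import re

# Raw Unicode superscripts/subscripts mapped to their plain ASCII equivalents
SUPER_MAP = {0x2070: '0', 0x00B9: '1', 0x00B2: '2', 0x00B3: '3',
             0x207A: '+', 0x207B: '-'}
SUPER_MAP.update({0x2074 + i: str(4 + i) for i in range(6)})   # U+2074..U+2079
SUB_MAP = {0x2080 + i: str(i) for i in range(10)}              # U+2080..U+2089
SUB_MAP.update({0x208A: '+', 0x208B: '-'})


def _run_pattern(table):
    """Regex matching one or more consecutive characters from `table`."""
    return re.compile('[' + ''.join(chr(cp) for cp in table) + ']+')


SUPER_RUN, SUB_RUN = _run_pattern(SUPER_MAP), _run_pattern(SUB_MAP)


def tagify_scripts(text):
    """Replace raw super/subscript runs with <super>/<sub> markup."""
    text = SUPER_RUN.sub(
        lambda m: f'<super>{m.group().translate(SUPER_MAP)}</super>', text)
    text = SUB_RUN.sub(
        lambda m: f'<sub>{m.group().translate(SUB_MAP)}</sub>', text)
    return text


print(tagify_scripts('Total area: 500 m\u00b2, CO\u2082 output 10\u207b\u00b3'))
```

The result is only meaningful inside `Paragraph()`, since plain strings do not interpret the tags.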
Pre-generation check - before writing ANY string, ask:
"Does this string contain a character outside basic CJK or Mathematical/relational operators?" If YES → it MUST be inside a Paragraph() with the appropriate tag. If it is a superscript/subscript digit in raw Unicode escape form → REPLACE with <super>/<sub> tag.
NEVER rely on post-generation scanning. Prevent at the point of writing.
Encoding safety - before writing ANY content text:
"Does this string contain Japanese kana (の, が, は etc.) or rare Unicode symbols?" If YES → REPLACE with safe plain Chinese equivalents. Japanese kana (Hiragana U+3040-U+309F, Katakana U+30A0-U+30FF) frequently corrupt to U+FFFD (the replacement character) when code passes through LLM output, heredoc, or terminal encoding layers. Common safe replacements:
活の真鲷 → 活缔真鲷, 盐烤の鲭鱼 → 盐烤鲭鱼, 烤の鸡串 → 炭烤鸡串. If the character is genuinely needed, verify it survives a full write→read round-trip with open(file, encoding='utf-8').
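Both encoding checks fit in a few lines of standard-library Python. The function names are hypothetical; the sketch flags kana in the two Unicode blocks named above and runs the write→read round-trip on a temporary file.

```python
import os
import tempfile


def find_kana(text):
    """Return Hiragana/Katakana characters (U+3040-U+30FF) found in text."""
    return [ch for ch in text if '\u3040' <= ch <= '\u30ff']


def survives_round_trip(text):
    """Write text as UTF-8, read it back, and confirm nothing was corrupted."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w', encoding='utf-8', newline='') as f:
            f.write(text)
        with open(path, encoding='utf-8', newline='') as f:
            restored = f.read()
        return restored == text and '\ufffd' not in restored
    finally:
        os.remove(path)


print(find_kana('活の真鲷'), find_kana('活缔真鲷'))
print(survives_round_trip('盐烤鲭鱼'))
```

A non-empty `find_kana` result means the string should be rewritten with a plain Chinese equivalent, or kept only after `survives_round_trip` passes in the real execution environment.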
Font Setup (Guaranteed Success Method)
CRITICAL: Allowed Fonts Only
You MUST ONLY use the following registered fonts. Using ANY other font is STRICTLY FORBIDDEN.
| Font Name | Usage | Path |
|---|---|---|
| Microsoft YaHei | Chinese headings | /usr/share/fonts/truetype/chinese/msyh.ttf |
| SimHei | Chinese body text | /usr/share/fonts/truetype/chinese/SimHei.ttf |
| SarasaMonoSC | Chinese code blocks | /usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf |
| Times New Roman | English text, numbers, tables | /usr/share/fonts/truetype/english/Times-New-Roman.ttf |
| Calibri | English alternative | /usr/share/fonts/truetype/english/calibri-regular.ttf |
| DejaVuSans | Formulas, symbols, code | /usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf |
FORBIDDEN fonts (DO NOT USE):
- ❌ Arial, Arial-Bold, Arial-Italic
- ❌ Helvetica, Helvetica-Bold, Helvetica-Oblique
- ❌ Courier, Courier-Bold
- ❌ Any font not listed in the table above
Font Registration Template
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase.pdfmetrics import registerFontFamily
# Chinese fonts
pdfmetrics.registerFont(TTFont('Microsoft YaHei', '/usr/share/fonts/truetype/chinese/msyh.ttf'))
pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf'))
pdfmetrics.registerFont(TTFont("SarasaMonoSC", '/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf'))
# English fonts
pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf'))
pdfmetrics.registerFont(TTFont('Calibri', '/usr/share/fonts/truetype/english/calibri-regular.ttf'))
# Symbol/Formula font
pdfmetrics.registerFont(TTFont("DejaVuSans", '/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf'))
# CRITICAL: Register font families to enable <b>, <super>, <sub> tags
registerFontFamily('Microsoft YaHei', normal='Microsoft YaHei', bold='Microsoft YaHei')
registerFontFamily('SimHei', normal='SimHei', bold='SimHei')
registerFontFamily('Times New Roman', normal='Times New Roman', bold='Times New Roman')
registerFontFamily('Calibri', normal='Calibri', bold='Calibri')
registerFontFamily('DejaVuSans', normal='DejaVuSans', bold='DejaVuSans')
Font Configuration by Document Type
For Chinese PDFs:
- Body text: SimHei or Microsoft YaHei
- Headings: Microsoft YaHei (MUST use for Chinese headings)
- Code blocks: SarasaMonoSC
- Formulas/symbols: DejaVuSans
- In tables: ALL Chinese content and numbers MUST use SimHei
For English PDFs:
- Body text: Times New Roman
- Headings: Times New Roman (MUST use for English headings)
- Code blocks: DejaVuSans
- In tables: ALL English content and numbers MUST use Times New Roman
For Mixed Chinese-English PDFs:
- Call install_font_fallback() once after registering fonts - it automatically wraps characters in <font> tags so you don't need to do it manually.
- If you still want manual control, you can use <font name='...'> tags, but the automatic fallback handles most cases.
import sys, os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'scripts'))
from pdf import install_font_fallback
# 1. Register fonts (as above)
# 2. Install fallback - one line, handles all mixed-language text
install_font_fallback()
# 3. Just write naturally - no manual <font> wrapping needed!
en_style = ParagraphStyle(name="EN", fontName="Times New Roman", fontSize=10.5, leading=18)
cn_style = ParagraphStyle(name="CN", fontName="SimHei", fontSize=10.5, leading=18)
Paragraph('MySQL、PostgreSQL、Redis', en_style) # ✅ CJK comma auto-fallback to SimHei
Paragraph('Beijing (北京) has 21M people', en_style) # ✅ CJK chars auto-fallback
Paragraph('价格:¥2,200~5,800/月', en_style) # ✅ all CJK chars handled
Paragraph('《巴黎协定》签署于2015年', en_style) # ✅ CJK book title marks handled
How it works: install_font_fallback() monkey-patches Paragraph.__init__ to scan each character against the font's charToGlyph table. Characters missing from the base font are automatically wrapped in <font name="FallbackFont">. The fallback chain is: English fonts → SimHei, Chinese fonts → Times New Roman. For aesthetic optimization, Cyrillic text in SimHei is automatically routed to Times New Roman (serif looks better for Cyrillic).
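The wrapping step described above can be illustrated in isolation. This is not the implementation inside pdf.py, only a sketch of the idea: `wrap_missing` and its `has_glyph` predicate are hypothetical, and the ASCII-only predicate stands in for a lookup against the base font's charToGlyph table.

```python
from itertools import groupby


def wrap_missing(text, has_glyph, fallback_font):
    """Wrap runs of characters the base font lacks in <font name="..."> tags.

    `has_glyph` is a hypothetical predicate (char -> bool). This sketch
    ignores any markup already present in `text`.
    """
    parts = []
    for supported, run in groupby(text, key=has_glyph):
        chunk = ''.join(run)
        if supported:
            parts.append(chunk)
        else:
            parts.append(f'<font name="{fallback_font}">{chunk}</font>')
    return ''.join(parts)


def ascii_only(ch):
    """Stand-in predicate: pretend the base font only covers ASCII."""
    return ord(ch) < 128


print(wrap_missing('Beijing (北京) has 21M people', ascii_only, 'SimHei'))
```

Grouping consecutive characters keeps the markup compact: one `<font>` tag per run rather than one per character.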
Chinese Plot PNG Method
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
Available Font Paths
Run fc-list to get more fonts. Font files are typically located under:
- /usr/share/fonts/truetype/chinese/
- /usr/share/fonts/truetype/english/
- /usr/share/fonts/
Layout & Spacing Control
Page Breaks
Follow the document type strategy defined in SOUL.md Rule 1.
Structural breaks (always - MANDATORY):
- Between cover page and TOC - cover must NEVER share a page with anything else
- Between TOC and main content
- Between main content and back cover
Content breaks (by document type):
- Default (all document types): ❗ Absolutely NEVER force page breaks before H1/H2 - content flows naturally, do not insert PageBreak() before section headings. This is an iron rule.
- Resume / contract / letter → No content page breaks
- Short article → No content page breaks
- Exception 1: Academic papers / textbooks / teaching plans → PageBreak() before each H1 is acceptable (only if user explicitly requests academic formatting)
- Exception 2: User explicitly requests page breaks between chapters
Anti-tear (mandatory — but with height limit):
from reportlab.platypus import KeepTogether
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import inch
# Maximum height for KeepTogether blocks: 40% of page height
# Blocks taller than this should NOT be kept together — they cause
# the preceding page to be mostly empty when the block gets pushed down.
MAX_KEEP_HEIGHT = A4[1] * 0.4 # ~336pt ≈ 12cm
def safe_keep_together(elements):
    """Wrap elements in KeepTogether only if their total height is reasonable.
    If too tall, keep only the FIRST TWO elements together (e.g. title + table/image header)
    to prevent title-content separation, while letting the rest paginate naturally.
    """
    total_h = 0
    for el in elements:
        w, h = el.wrap(A4[0] - 2*inch, A4[1])
        total_h += h
    if total_h <= MAX_KEEP_HEIGHT:
        return [KeepTogether(elements)]
    elif len(elements) >= 2:
        # Keep at least title + first element together to prevent title orphaning
        return [KeepTogether(elements[:2])] + list(elements[2:])
    else:
        return list(elements)
# Heading stays with first paragraph
story.extend(safe_keep_together([
    heading_paragraph,
    first_body_paragraph,
]))
# Image + caption: keep together ONLY if image is small
story.extend(safe_keep_together([image, caption_paragraph]))
# Table title + table: keep together ONLY if table is short
if len(table_data) <= 15:
    story.extend(safe_keep_together([table_title, table]))
else:
    # Long table: just keep title with the table start
    story.extend(safe_keep_together([table_title]))
    story.append(table)
⚠️ KEY INSIGHT: KeepTogether is the #1 cause of “empty page with chart on next page”.
When a KeepTogether block is taller than the remaining space on the current page,
ReportLab pushes the entire block to the next page, leaving the current page mostly empty.
Always use safe_keep_together() to cap the maximum block height.
⚠️ Note: Do NOT use KeepWithNext - it is unreliable in ReportLab 4.x.
Vertical Spacing Standards
- Before tables: Spacer(1, 18) after preceding text
- After tables: Spacer(1, 6) before table caption
- After table captions: Spacer(1, 18) before next content
- Between paragraphs: Spacer(1, 12) (~1 line)
- Between H3 subsections: Spacer(1, 12)
- Between H2 sections: Spacer(1, 18) (~1.5 lines)
- Between H1 sections: Spacer(1, 24) (~2 lines)
- NEVER use Spacer(1, X) where X > 24, except for H1 major breaks or cover page elements
Cover Page: HTML/Playwright Unified Cover System
⚠️ Report route covers are ALWAYS generated via HTML/Playwright, using the same 7-template system defined in typesetting/cover.md. This ensures visual consistency across all routes (Report, Creative, Academic) and avoids the limitations of ReportLab for complex cover layouts.
Pipeline (Report route):
- Generate body PDF via ReportLab (start with TOC or content - no cover in story[])
- Generate cover HTML following typesetting/cover.md 7-template system
- Run poster_validate.py check-html on cover HTML - fix any ERRORs before rendering (overflow:hidden, font fallback, etc.)
- Run cover_validate.js on cover HTML - detects text-vs-decorative-line overlaps. Non-zero exit = must fix before proceeding.
  node "$PDF_SKILL_DIR/scripts/cover_validate.js" cover.html
- Render cover HTML → single-page PDF via Playwright (html2poster.js) - NOT html2pdf-next.js (which converts absolute→static and destroys cover layout)
- Merge: insert cover as page 0 of body PDF using pypdf → output single final PDF
Why not ReportLab covers? ReportLab is excellent for structured content (tables, paragraphs, flowables) but painful for visual design (geometric accents, precise absolute positioning, web fonts). HTML/CSS handles these natively. One cover system, one visual standard, zero inconsistency.
⚠️ Cover page is DEFAULT ON for all Report-route documents ≥ 3 pages.
The Report route handles formal, structured documents. Readers expect a professional cover. Always generate a cover unless the document type is explicitly excluded below.
ALWAYS add a cover page (default behavior):
- Research reports, experiment reports, lab reports, analysis reports
- White papers, industry analysis, market research
- Business proposals, project plans, feasibility studies
- Technical documentation, product manuals, user guides
- Annual reports, financial summaries, quarterly reviews
- Any formal document ≥ 3 pages that will be shared externally or submitted
NEVER add a cover page (explicit exclusions only):
- Resumes / CVs - recruiters want content immediately
- Letters / memos - single-page or short-form
- Forms / checklists / invoices - functional documents
- Internal notes / meeting minutes - brevity over presentation
- Documents ≤ 2 pages - cover would be disproportionate
- User explicitly said "no cover"
Rule of thumb: If it's a report, analysis, proposal, or any formal document ≥ 3 pages → add a cover. When in doubt, add the cover. It's easier to remove a cover the user didn't want than to miss one they expected.
Step 1: Generate Body PDF (ReportLab, no cover)
Build the ReportLab document starting directly with TOC or first section. Do NOT include any cover page in the story[] list.
# Body PDF - starts with TOC or first section, NO cover
story = []
# story.append(toc) # if applicable
# story.append(PageBreak())
# story.append(first_section_content)
# ...
doc.build(story)
# Output: body.pdf (no cover)
Step 2: Generate Cover PDF via html2poster.js
# ALWAYS use html2poster.js for cover rendering (NOT html2pdf-next.js)
# Cover pages use position:absolute layout — html2pdf-next.js pre-render hooks
# convert absolute→static and destroy the layout. html2poster.js preserves it.
node "$PDF_SKILL_DIR/scripts/html2poster.js" cover.html --output cover.pdf --width 794px
Or from Python:
import subprocess, os
def render_cover(html_path, pdf_path):
    """Render HTML cover to PDF via html2poster.js.
    ⚠️ ALWAYS use html2poster.js for covers (NOT html2pdf-next.js).
    Cover HTML uses position:absolute for layout. html2pdf-next.js pre-render
    hooks convert absolute→static to prevent multi-page overlap, which
    destroys cover layouts. html2poster.js preserves absolute positioning.
    """
    scripts_dir = os.path.expanduser('~/.openclaw/workspace/skills/pdf/scripts')
    subprocess.run([
        'node', os.path.join(scripts_dir, 'html2poster.js'),
        html_path, '--output', pdf_path,
        '--width', '794px',
    ], check=True)
Insertion script (single output PDF):
from pypdf import PdfReader, PdfWriter, Transformation
A4_W, A4_H = 595.28, 841.89 # A4 in points
def normalize_page_to_a4(page):
    """Scale a page to A4 if its dimensions don't match."""
    box = page.mediabox
    w, h = float(box.width), float(box.height)
    if abs(w - A4_W) > 2 or abs(h - A4_H) > 2:
        sx, sy = A4_W / w, A4_H / h
        page.add_transformation(Transformation().scale(sx=sx, sy=sy))
    page.mediabox.lower_left = (0, 0)
    page.mediabox.upper_right = (A4_W, A4_H)
    return page
def insert_cover(cover_pdf, body_pdf, output_pdf):
    """Insert cover as first page of body PDF → single output file."""
    writer = PdfWriter()
    # Cover as page 1
    cover_page = PdfReader(cover_pdf).pages[0]
    writer.add_page(normalize_page_to_a4(cover_page))
    # Body pages follow
    for page in PdfReader(body_pdf).pages:
        writer.add_page(normalize_page_to_a4(page))
    writer.add_metadata({'/Title': 'Report Title', '/Author': 'Z.ai', '/Creator': 'Z.ai'})
    with open(output_pdf, 'wb') as f:
        writer.write(f)
⚠️ Size pitfall: Playwright page.pdf(width='794px', height='1123px') output PDF dimensions may differ from A4 (595.28×841.89pt) by 1-2pt. Do not use PIL Image.save('x.pdf') for PNG→PDF conversion - DPI mapping causes severe size errors. You must call normalize_page_to_a4() to unify dimensions before insertion.
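The mismatch starts with plain unit conversion: CSS pixels are 1/96 inch and PDF points are 1/72 inch, so 794×1123px does not map exactly onto A4. A short worked calculation follows (the helper name is illustrative). Any further drift from renderer rounding is an assumption, not something measured here.

```python
CSS_DPI = 96      # CSS pixels per inch
PT_PER_INCH = 72  # PDF points per inch
A4_PT = (595.28, 841.89)


def px_to_pt(px):
    """Convert CSS pixels to PDF points."""
    return px * PT_PER_INCH / CSS_DPI


cover_pt = (px_to_pt(794), px_to_pt(1123))
drift = tuple(round(c - a, 2) for c, a in zip(cover_pt, A4_PT))
print(cover_pt, drift)  # (595.5, 842.25) (0.22, 0.36)
```

Even this sub-point difference is enough to produce mixed page sizes in the merged file, which is why every page passes through normalize_page_to_a4().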
Cover HTML template reference: See typesetting/cover.md for the complete 7-Layout System with variants, typography scale, and bounding box rules.
→ Full cover spec: typesetting/cover.md - read it before designing any cover.
Cover design rules (summary - see cover.md for details):
- Page size: width: 794px; height: 1123px (A4 at 96dpi)
- body { margin: 0; padding: 0; overflow: hidden; } - REQUIRED to avoid white borders
- Load Google Fonts via <link href="..." rel="stylesheet"> in <head> (NOT @import url(...) in CSS - @import must be first rule or is silently ignored) - Playwright fetches them at render time
- Background must be white or very light (#fff / #fafafa / #f5f8fb), no dark solid backgrounds or gradient backgrounds
- Primary color used only for text, lines, geometric accents - never as large-area background
- Sophistication = whitespace + typography + restrained geometric accents
Cover Constitution (7-Layout System):
- Pick a layout (1-7) from typesetting/cover.md that matches the document tone. No global default - every selection must be a deliberate design decision. Layout 7 for government/bidding/proposal documents.
- Maximum 4 components on any cover. Typical recipe: Title + subtitle + 1 geometric accent + metadata.
- Typography Scale: Title ≈ 45pt, Subtitle ≈ 25pt, Meta ≥ 18pt (never below 14pt). Tiny text = FAIL.
- Background layer (optional): See typesetting/cover-backgrounds.md for 3 recipes (A=极简弧线 (minimal arcs), B=工程十字轴+立柱 (engineering cross-axis + pillars), C=锐角切割+出血文字 (sharp-angle cut + bleed text)). Background renders BELOW all foreground content at 2-5% opacity.
- Mandatory <br> chunking: Title MUST break every 2-4 words (CJK) or 3-5 words (English). Single-line title = FAIL.
- Bounding Box spatial dispersion: Group elements into 2-3 bounding boxes at opposite regions (e.g., top-left + bottom-right). Never cluster everything into middle 40%.
- Safe Zone: 12% top/bottom, 14% left/right padding on cover pages.
- No body text on cover: Cover = title + subtitle + date + optional geometric decoration. All content goes to page 2+.
⚠️ Cover Page Isolation Rule (MANDATORY)
The cover page must ALWAYS be on its own page. Cover content and subsequent content (TOC, body text, etc.) must NEVER appear on the same page. This is a hard rule.
- Report route (HTML/Playwright cover): The cover is a separate rendered page inserted as page 0 via pypdf - isolation is inherent in the merge pipeline
- Creative route: Cover is part of the same HTML document but on its own <section> with page-break - isolation is handled by CSS
If a generated PDF shows cover + TOC on the same page, it is a critical bug - regenerate immediately.
Alignment and Typography
- CJK body: Use TA_LEFT + 2-char indent. Headings: no indent.
- Font sizes: Body 11pt, subheadings 14pt, headings 18-20pt
- Line height: 1.5-1.6 (leading at 1.4x font size minimum, recommended 1.5x for CJK)
- CJK characters are taller than Latin - 1.2x causes cramped lines
- Quick reference: 10pt→14pt min/18pt rec; 14pt→20pt min/22pt rec; 36pt→50pt min/54pt rec
- ⛔ Prohibited: fixed rowHeights in Table(). Use TOPPADDING/BOTTOMPADDING to control row spacing. Fixed rowHeights cause content overflow clipping - the text renders but gets invisibly cut off.
- CRITICAL: Alignment Selection Rule:
  - Use TA_JUSTIFY only when ALL of:
    - Text is predominantly English (≥ 90%)
    - Column width is sufficiently wide (A4 single-column body)
    - Font: Western fonts (Times New Roman / Calibri)
    - Chinese content: None or negligible
  - Otherwise, always default to TA_LEFT
  - For Chinese text, always add wordWrap='CJK' to ParagraphStyle
Style Configuration
- Normal paragraph: spaceBefore=0, spaceAfter=6-12
- Headings: spaceBefore=12-18, spaceAfter=6-12
- Headings must be bold: Use <b></b> tags in Paragraph
- Table captions: spaceBefore=3, spaceAfter=6, alignment=TA_CENTER
- CRITICAL: For Chinese text, always add wordWrap='CJK' to ParagraphStyle
Table Formatting
Standard Table Color Scheme (MUST USE for ALL tables)
Colors MUST come from the palette generated in Step 2. The values below are examples - replace them with your actual palette output.
# Colors from design_engine.py palette (Step 2) - NEVER hardcode these
TABLE_HEADER_COLOR = ACCENT # From palette --c-accent
TABLE_HEADER_TEXT = colors.white # White text for header (fixed)
TABLE_ROW_EVEN = colors.white # White for even rows
TABLE_ROW_ODD = BG_SURFACE # From palette --c-mid (light stripe)
Table Rules
- Caption must be centered, added immediately after the table
- Entire table must be centered on the page
- Header Row: Dark background (from palette header_fill), white bold text
- Cell Margins: Left/Right at least 120-200 twips (6-10pt; ReportLab padding is specified in points)
- Alignment: Each body element within the same table must be aligned the same method
- Color consistency: All tables in one PDF must use the same color scheme
- Spacing: Spacer(1, 18) → Table → Spacer(1, 6) → Caption → Spacer(1, 18)
Table Centering Enforcement (MANDATORY)
Every table MUST be horizontally centered on the page. No exceptions.
# ReportLab: ALWAYS set hAlign='CENTER'
table = Table(data, colWidths=col_widths, hAlign='CENTER')
# Width rule: table total width should be 85-100% of available content width
available_width = doc.width # or page_width - left_margin - right_margin
table_width = sum(col_widths)
if table_width < available_width * 0.85:
    # Scale up columns proportionally to fill space
    scale = (available_width * 0.90) / table_width
    col_widths = [w * scale for w in col_widths]
# ❌ WRONG — no hAlign (defaults to LEFT, table drifts left)
table = Table(data, colWidths=[100, 200, 150])
# ✅ RIGHT — explicit centering
table = Table(data, colWidths=[100, 200, 150], hAlign='CENTER')
LaTeX equivalent:
\begin{table}[htbp]
\centering % MANDATORY
\begin{tabular}{...}
...
\end{tabular}
\end{table}
Table Cell Content Rule (MANDATORY - NON-NEGOTIABLE)
ALL text content in table cells MUST be wrapped in Paragraph(). NO EXCEPTIONS.
❌ PROHIBITED - Plain strings:
data = [
['<b>Header</b>', 'Value'], # Bold won't render!
['Pressure', '1.01 × 10<super>5</super>'], # Superscript won't work!
]
✅ REQUIRED - All text in Paragraph:
data = [
[Paragraph('<b>Header</b>', header_style), Paragraph('Value', header_style)],
[Paragraph('Pressure', cell_style), Paragraph('1.01 × 10<super>5</super>', cell_style)],
]
The ONLY exception: Image() objects can be placed directly in table cells.
Units with Exponents (CRITICAL)
- PROHIBITED: W/m2, kg/m3, m/s2 (plain text exponents)
- RIGHT: Paragraph('W/m<super>2</super>', style), Paragraph('kg/m<super>3</super>', style)
Numeric Values in Tables (CRITICAL)
- Large numbers MUST use scientific notation: Paragraph('-1.246 × 10<super>8</super>', style) not -124600000
- Small decimals MUST use scientific notation: Paragraph('2.5 × 10<super>-3</super>', style) not 0.0025
- Threshold: Use scientific notation when |value| ≥ 10000 or |value| ≤ 0.001
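Producing this markup by hand is error-prone, so a small formatter helps. The sketch below applies the thresholds stated above; the function names and the default of three decimals in the mantissa are assumptions, not part of pdf.py.

```python
SCI_UPPER = 10000   # |value| >= this -> scientific notation
SCI_LOWER = 0.001   # 0 < |value| <= this -> scientific notation


def needs_sci(value):
    """Apply the table threshold rule to a number."""
    magnitude = abs(value)
    return magnitude >= SCI_UPPER or 0 < magnitude <= SCI_LOWER


def sci_markup(value, digits=3):
    """Format value as 'm × 10<super>e</super>' with up to `digits` decimals."""
    mantissa, exponent = f'{value:.{digits}e}'.split('e')
    if '.' in mantissa:
        mantissa = mantissa.rstrip('0').rstrip('.')
    return f'{mantissa} × 10<super>{int(exponent)}</super>'


def cell_text(value):
    """Markup string for Paragraph(); plain formatting inside the thresholds."""
    return sci_markup(value) if needs_sci(value) else f'{value:g}'


print(cell_text(-124600000))  # -1.246 × 10<super>8</super>
print(cell_text(25.5))        # 25.5
```

The returned string still has to be wrapped in `Paragraph(..., cell_style)` for the `<super>` tag to render.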
Complete Table Example
from reportlab.platypus import Table, TableStyle, Paragraph, Image
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib import colors
from reportlab.lib.enums import TA_CENTER, TA_LEFT, TA_RIGHT, TA_JUSTIFY
header_style = ParagraphStyle(
name='TableHeader', fontName='Times New Roman', fontSize=11,
textColor=colors.white, alignment=TA_CENTER
)
cell_style = ParagraphStyle(
name='TableCell', fontName='Times New Roman', fontSize=10,
textColor=TEXT_PRIMARY, alignment=TA_CENTER  # palette text color, not pure black
)
data = [
[Paragraph('<b>Parameter</b>', header_style),
Paragraph('<b>Unit</b>', header_style),
Paragraph('<b>Value</b>', header_style)],
[Paragraph('Temperature', cell_style),
Paragraph('°C', cell_style),
Paragraph('25.5', cell_style)],
[Paragraph('Pressure', cell_style),
Paragraph('Pa', cell_style),
Paragraph('1.01 × 10<super>5</super>', cell_style)],
[Paragraph('Density', cell_style),
Paragraph('kg/m<super>3</super>', cell_style),
Paragraph('1.225', cell_style)],
]
table = Table(data, colWidths=[120, 80, 100], hAlign='CENTER')
# ⚠️ Above uses hardcoded widths - OK for this 3-col example (120+80+100=300 < available ~451pt with 1-inch margins).
# For real tables, ALWAYS calculate from available_width. See "Table Width Management" below.
table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (-1, 0), ACCENT), # Header: palette accent
('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
('BACKGROUND', (0, 1), (-1, 1), colors.white),
('BACKGROUND', (0, 2), (-1, 2), BG_SURFACE), # Odd row: palette surface
('BACKGROUND', (0, 3), (-1, 3), colors.white),
('GRID', (0, 0), (-1, -1), 0.5, TEXT_MUTED), # Grid: palette muted
('VALIGN', (0, 0), (-1, -1), 'MIDDLE'),
('LEFTPADDING', (0, 0), (-1, -1), 8),
('RIGHTPADDING', (0, 0), (-1, -1), 8),
('TOPPADDING', (0, 0), (-1, -1), 6),
('BOTTOMPADDING', (0, 0), (-1, -1), 6),
]))
Table Width Management (⚠️ Prevent Table Overflow)
Most common table bug: columns overflow the right page margin because colWidths are hardcoded without checking available space.
Iron Rule: The sum of all colWidths MUST NOT exceed available_width.
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import inch
page_width = A4[0] # 595.28pt
left_margin = 1.0 * inch # 72pt (adjust to your doc's actual margins)
right_margin = 1.0 * inch # 72pt
available_width = page_width - left_margin - right_margin # ≈ 451pt
# ─── Method 1: Proportional widths (RECOMMENDED) ───
# Define column ratios, auto-scale to fit available_width
col_ratios = [0.25, 0.40, 0.20, 0.15] # must sum to 1.0
col_widths = [r * available_width for r in col_ratios]
table = Table(data, colWidths=col_widths)
# ─── Method 2: Fixed + flex columns ───
# Some columns have fixed width (numbers, dates), rest fills remaining space
fixed_cols = {0: 80, 3: 60} # col 0 = 80pt, col 3 = 60pt
fixed_total = sum(fixed_cols.values())
flex_count = 4 - len(fixed_cols) # total columns minus fixed
flex_width = (available_width - fixed_total) / flex_count
col_widths = []
for i in range(4):
    col_widths.append(fixed_cols.get(i, flex_width))
table = Table(data, colWidths=col_widths)
# ─── Method 3: Let ReportLab auto-calculate (simplest, less control) ───
# Pass colWidths=None - ReportLab auto-sizes based on content
# ⚠️ Risk: may still overflow if content is too wide
table = Table(data) # no colWidths
Checklist before adding any table:
- Calculate available_width from your actual page size and margins
- Sum all colWidths - must be ≤ available_width
- For CJK text: characters are wider than Latin - budget ~12pt per CJK char at 10pt font
- If a column has long text (descriptions, policies), use Paragraph() wrapping inside cells (not plain strings) - plain strings don't wrap and will overflow
- Test with the longest expected content in each column
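As a guard for the iron rule, an over-wide column list can be scaled back before it reaches `Table`. This is a minimal sketch: `fit_col_widths` is a hypothetical helper name (not a ReportLab API), and the A4 width with 1-inch margins is assumed from the example above.

```python
def fit_col_widths(col_widths, available_width):
    """Return widths whose sum fits available_width, scaling proportionally if needed."""
    total = sum(col_widths)
    if total <= available_width:
        return list(col_widths)  # already fits, leave untouched
    scale = available_width / total
    return [w * scale for w in col_widths]

A4_WIDTH = 595.28                     # points
available_width = A4_WIDTH - 2 * 72   # assumed 1-inch margins on both sides
widths = fit_col_widths([200, 200, 120], available_width)
```

The result can be passed straight to `colWidths`; columns keep their relative proportions, so long text still needs Paragraph wrapping.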
Common mistake - plain strings don't wrap:
# ❌ Plain string: will overflow if text is long
data = [['Policy Name', 'Very long description that keeps going and going']]
# ✅ Paragraph: wraps within column width
from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet
styles = getSampleStyleSheet()
cell_style = styles['Normal']
data = [[Paragraph('Policy Name', cell_style),
Paragraph('Very long description that keeps going and going', cell_style)]]
Do you need auto-TOC?
├─ YES → Use TocDocTemplate + doc.multiBuild(story)
└─ NO → Use SimpleDocTemplate + doc.build(story)
| Requirement | DocTemplate | Build Method |
|---|---|---|
| Multi-page with TOC | TocDocTemplate | multiBuild() |
| Single-page or no TOC | SimpleDocTemplate | build() |
| With Cross-References (no TOC) | SimpleDocTemplate | build() |
| Both TOC + Cross-References | TocDocTemplate | multiBuild() |
⚠️ CRITICAL:
- `multiBuild()` is ONLY needed when using `TableOfContents`
- Using `build()` with `TocDocTemplate` = TOC won't work
- Using `multiBuild()` without `TocDocTemplate` = unnecessary overhead
Rich Text Formatting
Prerequisites
To use <b>, <super>, <sub> tags, you must:
- Register fonts via `registerFont()`
- Call `registerFontFamily()` to link normal/bold/italic variants
- Wrap all tagged text in `Paragraph()` objects
CRITICAL: These tags ONLY work inside Paragraph() objects. Plain strings will NOT render them.
Preventing Unwanted Line Breaks
# Use non-breaking space to prevent breaking
text = Paragraph("Professors (K.G.\u00A0Palepu) proposed...", style)
# Use wordWrap='CJK' for proper Chinese typography
styles.add(ParagraphStyle(name='BodyStyle', fontName='SimHei', fontSize=10.5,
leading=18, alignment=TA_LEFT, wordWrap='CJK'))
# Use <br/> for intentional line breaks (NOT \n)
text = Paragraph("Line 1<br/>Line 2<br/>Line 3", style)
Auto-Generated Table of Contents
❌ FORBIDDEN: Manual Table of Contents
NEVER manually create TOC with hardcoded page numbers.
⚠️ MANDATORY: TOC requires a cover page
Unless the user explicitly requests no cover, if a document has a Table of Contents, it MUST have a cover page. Structure: Cover (page 1) → TOC (page 2) → Content (page 3+). Do not generate a TOC without a preceding cover page.
✅ ALWAYS use auto-generated TOC:
from reportlab.platypus import SimpleDocTemplate, Paragraph, PageBreak, Spacer, CondPageBreak
from reportlab.platypus.tableofcontents import TableOfContents
from reportlab.lib.pagesizes import A4, letter
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib.utils import simpleSplit
import hashlib
class TocDocTemplate(SimpleDocTemplate):
    def afterFlowable(self, flowable):
        if hasattr(flowable, 'bookmark_name'):
            level = getattr(flowable, 'bookmark_level', 0)
            text = getattr(flowable, 'bookmark_text', '')
            key = getattr(flowable, 'bookmark_key', '')
            # MUST pass key as 4th element - without it, TOC entries won't have clickable links
            self.notify('TOCEntry', (level, text, self.page, key))
doc = TocDocTemplate("document.pdf", pagesize=letter)
story = []
toc = TableOfContents()
toc.levelStyles = [
ParagraphStyle(name='TOCHeading1', fontSize=14, leftIndent=20, fontName='Times New Roman'),
ParagraphStyle(name='TOCHeading2', fontSize=12, leftIndent=40, fontName='Times New Roman'),
]
story.append(Paragraph("<b>Table of Contents</b>", styles['Title']))
story.append(toc)
story.append(PageBreak())
def add_heading(text, style, level=0):
    # Generate a unique bookmark key for this heading
    key = 'h_%s' % hashlib.md5(text.encode()).hexdigest()[:8]
    p = Paragraph('<a name="%s"/>%s' % (key, text), style)
    p.bookmark_name = text
    p.bookmark_level = level
    p.bookmark_text = text
    p.bookmark_key = key # This key enables clickable TOC links
    return p
# ⚠️ Orphan Heading Prevention (NOT a page break rule)
# If an H1 heading would appear in the bottom 15% of the page with no room
# for at least a few lines of content, push it to the next page.
# This is NOT "every H1 starts a new page" — it only triggers when the heading
# would be orphaned at the very bottom.
available_height = A4[1] - top_margin - bottom_margin
H1_ORPHAN_THRESHOLD = available_height * 0.15 # ~100pt ≈ 3.5cm — enough for heading + 2 lines
def add_major_section(text, style):
    """Add an H1-level heading with orphan prevention (NOT forced page break)."""
    return [
        CondPageBreak(H1_ORPHAN_THRESHOLD), # Only break if heading would be orphaned at page bottom
        add_heading(text, style, level=0),
    ]
story.extend(add_major_section("Chapter 1: Introduction", styles['Heading1']))
# ... content ...
doc.multiBuild(story) # MUST use multiBuild for TOC
⚠️ CRITICAL TOC LINK RULES:
- `afterFlowable` MUST pass `key` as the 4th element in the notify tuple - without it, TOC entries have no clickable links
- `add_heading` MUST set `bookmark_key` AND embed `<a name="key"/>` in the Paragraph text - this creates the link destination
- The key must be unique per heading - use a hash of the heading text
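One caveat on the uniqueness rule: a plain hash of the heading text yields the same key when two headings share a title (for example two "Overview" subsections). A possible way to keep keys unique is sketched below; `make_heading_key` and the `seen` dictionary are hypothetical names for illustration, not part of ReportLab.

```python
import hashlib

def make_heading_key(text, seen):
    """Return a bookmark key derived from the heading text, unique within `seen`."""
    base = 'h_%s' % hashlib.md5(text.encode('utf-8')).hexdigest()[:8]
    count = seen.get(base, 0)
    seen[base] = count + 1
    # First occurrence keeps the plain hash; repeats get a numeric suffix.
    return base if count == 0 else '%s_%d' % (base, count)

seen_keys = {}
keys = [make_heading_key(t, seen_keys) for t in ['Overview', 'Results', 'Overview']]
```

The same `seen` dictionary would need to be shared by every heading in one document build.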
TOC with Leader Dots (Copy-Paste Ready)
⚠️ WARNING: This manual approach creates a visual-only TOC. It has NO clickable links and NO auto-updated page numbers. Use the auto-generated TOC above (TocDocTemplate + multiBuild) whenever possible. Only use this leader-dots approach if you need very specific visual styling that the auto TOC cannot provide - and even then, prefer the auto approach.
For a professional TOC with leader dots connecting titles to page numbers:
from reportlab.lib.pagesizes import A4
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, PageBreak
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib import colors
from reportlab.lib.units import inch
# Setup
doc = SimpleDocTemplate("report.pdf", pagesize=A4,
leftMargin=0.75*inch, rightMargin=0.75*inch)
styles = getSampleStyleSheet()
styles['Heading1'].fontName = 'Times New Roman'
styles['Heading1'].textColor = colors.black
story = []
# Calculate dimensions
page_width = A4[0]
available_width = page_width - 1.5*inch
page_num_width = 50 # Fixed width for page numbers
dots_column_width = available_width - 200 - page_num_width
optimal_dot_count = int(dots_column_width / 4.5) # ~4.5pt per dot at 7pt font
# Define styles
toc_style = ParagraphStyle('TOCEntry', parent=styles['Normal'],
fontName='Times New Roman', fontSize=11, leading=16)
dots_style = ParagraphStyle('LeaderDots', parent=styles['Normal'],
fontName='Times New Roman', fontSize=7, leading=16)
# Build TOC
toc_data = [
[Paragraph('<b>Table of Contents</b>', styles['Heading1']), '', ''],
['', '', ''],
]
entries = [('Section 1', '5'), ('Section 2', '10')]
for title, page in entries:
    toc_data.append([
        Paragraph(title, toc_style),
        Paragraph('.' * optimal_dot_count, dots_style),
        Paragraph(page, toc_style)
    ])
toc_table = Table(toc_data, colWidths=[None, dots_column_width, page_num_width])
toc_table.setStyle(TableStyle([
('GRID', (0, 0), (-1, -1), 0, colors.white),
('LINEBELOW', (0, 0), (0, 0), 1.5, colors.black),
('ALIGN', (0, 0), (0, -1), 'LEFT'),
('ALIGN', (1, 0), (1, -1), 'LEFT'),
('ALIGN', (2, 0), (2, -1), 'RIGHT'),
('VALIGN', (0, 0), (-1, -1), 'TOP'),
('LEFTPADDING', (0, 0), (-1, -1), 0),
('RIGHTPADDING', (0, 0), (-1, -1), 0),
('TOPPADDING', (0, 2), (-1, -1), 3),
('BOTTOMPADDING', (0, 2), (-1, -1), 3),
('TEXTCOLOR', (1, 2), (1, -1), TEXT_MUTED), # From palette --c-muted
]))
story.append(toc_table)
story.append(PageBreak())
doc.build(story)
MANDATORY Leader Dots Rules:
| Rule | Requirement |
|---|---|
| Column widths | ✅ MUST use fixed values. Percentage-based widths are STRICTLY FORBIDDEN. |
| Dot count | ✅ MUST calculate dynamically: int(dots_column_width / 4.5). Hard-coded counts are STRICTLY FORBIDDEN. |
| Page number column | ✅ MUST be at least 40pt wide. |
| Dot font size | ✅ MUST NOT exceed 8pt. |
| Dot alignment | ✅ MUST be LEFT-aligned (visual flow from title). |
| Padding | ✅ MUST be exactly 0 between columns. |
| Bold text | ✅ MUST use Paragraph('<b>Text</b>', style). Plain strings like '<b>Text</b>' are STRICTLY FORBIDDEN. |
| Indentation | Use leading spaces for hierarchy (e.g., " 1.1 Subsection"). |
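The numeric rules in this table can be checked before building the table. The following is a rough sketch only: `leader_dot_count` and `check_leader_layout` are hypothetical helper names, and the 4.5pt dot advance is the approximation used in the example above.

```python
def leader_dot_count(dots_column_width, dot_advance=4.5):
    """Number of dots that fit the dots column (about 4.5pt per dot at 7pt font)."""
    return int(dots_column_width / dot_advance)

def check_leader_layout(col_widths, page_num_width, dot_font_size):
    """Return a list of rule violations; an empty list means the layout passes."""
    problems = []
    if any(isinstance(w, str) for w in col_widths):
        problems.append('percentage-based column width')
    if page_num_width < 40:
        problems.append('page number column narrower than 40pt')
    if dot_font_size > 8:
        problems.append('dot font size above 8pt')
    return problems

dot_count = leader_dot_count(237.28)
ok = check_leader_layout([200, 237.28, 50], page_num_width=50, dot_font_size=7)
bad = check_leader_layout(['40%', 237.28, 30], page_num_width=30, dot_font_size=9)
```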
TOC Post-Generation Validation (MANDATORY)
After generating any PDF with a Table of Contents, run:
python3 "$PDF_SKILL_DIR/scripts/toc_validate.py" check-pdf output.pdf
If any errors are returned, fix the code and regenerate. Common issues:
- `TOC_ALL_SAME_PAGE` → You used `build()` instead of `multiBuild()`, so all page numbers are stuck at 1.
- `TOC_NO_ENTRIES` → Headings are missing `bookmark_name`/`bookmark_level` attributes.
- `TOC_PAGES_INVALID` → A TOC entry references a page beyond the document's total page count.
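To make the three error codes concrete, the conditions they describe can be expressed over an in-memory list of entries. This is an illustration of the conditions only, not the implementation of `toc_validate.py`; `check_toc_entries` and its `(title, page)` input shape are assumptions.

```python
def check_toc_entries(entries, total_pages):
    """entries: list of (title, page) tuples. Returns a list of error codes."""
    if not entries:
        return ['TOC_NO_ENTRIES']
    errors = []
    pages = [page for _, page in entries]
    # Several entries all pointing at one page suggests a single-pass build().
    if len(entries) > 1 and len(set(pages)) == 1:
        errors.append('TOC_ALL_SAME_PAGE')
    if any(p < 1 or p > total_pages for p in pages):
        errors.append('TOC_PAGES_INVALID')
    return errors

good = check_toc_entries([('Intro', 3), ('Methods', 5)], total_pages=12)
stuck = check_toc_entries([('Intro', 1), ('Methods', 1)], total_pages=12)
beyond = check_toc_entries([('Intro', 3), ('Appendix', 40)], total_pages=12)
```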
Cross-References (Figures, Tables, Bibliography)
Pre-register all figures, tables, and references BEFORE using them in text.
class CrossReferenceDocument:
    def __init__(self):
        self.figures = {}
        self.tables = {}
        self.refs = {}
        self.figure_counter = 0
        self.table_counter = 0
        self.ref_counter = 0
    def add_figure(self, name):
        if name not in self.figures:
            self.figure_counter += 1
            self.figures[name] = self.figure_counter
        return self.figures[name]
    def add_table(self, name):
        if name not in self.tables:
            self.table_counter += 1
            self.tables[name] = self.table_counter
        return self.tables[name]
    def add_reference(self, name):
        if name not in self.refs:
            self.ref_counter += 1
            self.refs[name] = self.ref_counter
        return self.refs[name]
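The register-then-cite flow can be sketched as follows. `Registry` is a condensed, hypothetical stand-in for one counter of the class above, included only so the snippet runs on its own; the figure names are made up for illustration.

```python
class Registry:
    """Assigns stable sequence numbers to names in first-registration order."""
    def __init__(self):
        self.numbers = {}

    def add(self, name):
        if name not in self.numbers:
            self.numbers[name] = len(self.numbers) + 1
        return self.numbers[name]

figures = Registry()
# Pre-register in document order, before any body text mentions a figure.
for name in ['sales_trend', 'market_share']:
    figures.add(name)

# A reference written earlier in the text still resolves to the right number.
sentence = 'As shown in Figure %d, share grew steadily.' % figures.add('market_share')
```

Because registration is idempotent, calling `add` again at the citation site returns the existing number instead of allocating a new one.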
Image Handling
Diagram Embedding Rules (MANDATORY)
Figures are block-level Flowables. Every image/diagram in ReportLab MUST be appended to story as a standalone Image() or wrapped in KeepTogether([image, caption]). Never simulate floating layouts by placing images inside Paragraph text or multi-column Table cells alongside body text.
Complex diagrams (>12 nodes): Decompose into a ReportLab Table (details) + a simplified diagram image (overview). The diagram gives the big picture; the table carries precision. See SKILL.md "Complex Diagram Strategy".
Diagram quality: When generating flowcharts/diagrams for PDF embedding (via Playwright screenshot or ReportLab Drawing), follow the diagram content quality rules in SKILL.md § "Diagram Content Quality Rules". Key points: connectors must not pass through nodes, no high-saturation large fills, multi-arrow convergence must use merge pattern, font sizes must be legible at final embedded size (see context-specific minimums in SKILL.md).
Cross-brief diagram pipeline: For flowcharts/architecture diagrams, generate via Playwright+CSS (HTML nodes + CSS layout + SVG connectors), screenshot at 2× device scale factor for 300dpi print quality, then embed with Image(). See SKILL.md § "Diagram Generation Strategy" for the full pipeline.
🚫 FORBIDDEN: TikZ standalone → PNG pipeline in Report brief. Report uses ReportLab, not LaTeX. Going through TikZ requires a LaTeX compiler, adds compilation steps, and models frequently produce broken TikZ code. Use Playwright+CSS instead - it's native to the existing cover rendering engine.
# Playwright+CSS diagram → PNG → ReportLab embedding
# 1. LLM generates diagram.html (CSS grid/flexbox nodes + SVG arrows)
# 2. Playwright screenshots at 2× scale (300dpi equivalent)
# 3. Embed:
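from reportlab.platypus import Image
from reportlab.lib.utils import ImageReader
iw, ih = ImageReader('diagram.png').getSize() # pixel dimensions of the screenshot
img = Image('diagram.png', width=450, height=450 * ih / iw) # width alone does NOT scale height; derive it from the aspect ratio
story.append(img) # block-level Flowable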
⚠️ CRITICAL: Preserve aspect ratio - NEVER hardcode both width and height without reading actual image dimensions.
⚠️ CRITICAL: Always constrain to available space - NEVER insert at original size if it exceeds container width/height.
Pie charts become ellipses, radar charts become diamonds, photos get stretched if you set arbitrary width/height.
→ Full overflow prevention spec: typesetting/overflow.md - read for the complete bounding box system.
from PIL import Image as PILImage
from reportlab.platypus import Image
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import inch
def embed_image(path: str, max_width: float = None, max_height: float = None) -> Image:
    """Embed image with preserved aspect ratio, constrained to container.
    ALWAYS use this pattern. Never hardcode both width and height.
    If max_width is None, defaults to available_width (page - margins).
    """
    if max_width is None:
        max_width = A4[0] - 2.0 * inch # Default: A4 width minus 1" margins each side
    if max_height is None:
        max_height = A4[1] * 0.35 # ~294pt ≈ 10cm - leaves room for caption + surrounding text on same page
    pil_img = PILImage.open(path)
    orig_w, orig_h = pil_img.size
    # Scale to fit within BOTH max_width and max_height
    ratio_w = max_width / orig_w if orig_w > max_width else 1.0
    ratio_h = max_height / orig_h if orig_h > max_height else 1.0
    ratio = min(ratio_w, ratio_h)
    return Image(path, width=orig_w * ratio, height=orig_h * ratio)
# Usage:
available_width = A4[0] - left_margin - right_margin
img = embed_image('chart.png', max_width=available_width, max_height=300)
story.append(img)
# ❌ NEVER do this - distorts the image:
img = Image('chart.png', width=400, height=250) # arbitrary height breaks aspect ratio
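The scaling arithmetic inside `embed_image` can be isolated and checked without any image file. The sketch below is library-free; `fit_within` is a hypothetical helper name that mirrors the "shrink to fit both limits, never upscale" logic above.

```python
def fit_within(orig_w, orig_h, max_w, max_h):
    """Scale (orig_w, orig_h) down to fit inside (max_w, max_h), keeping aspect ratio."""
    ratio = min(max_w / orig_w, max_h / orig_h, 1.0)  # 1.0 cap: never enlarge
    return orig_w * ratio, orig_h * ratio

wide = fit_within(1600, 800, 400, 300)   # width is the binding constraint
tall = fit_within(600, 1200, 400, 300)   # height is the binding constraint
small = fit_within(100, 50, 400, 300)    # already fits, unchanged
```

In each case the width-to-height ratio of the result equals that of the input, which is what keeps pie charts circular.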
PDF Metadata
doc = SimpleDocTemplate(
pdf_filename,
pagesize=letter,
title=os.path.splitext(pdf_filename)[0],
author='Z.ai',
creator='Z.ai',
subject='Document purpose description'
)
⚠️ MANDATORY: Post-Generation Metadata
After doc.build(story) completes:
python3 "$PDF_SKILL_DIR/scripts/pdf.py" meta.brand output.pdf
⚠️ MANDATORY: Post-Generation Blank Page Cleanup
After metadata branding, remove any accidental blank pages:
python3 "$PDF_SKILL_DIR/scripts/pdf.py" pages.clean output.pdf -o output_clean.pdf
If blank pages were found, rename output_clean.pdf → output.pdf.
MANDATORY: Post-Generation Code Sanitization
After writing PDF generation code and BEFORE executing it, sanitize using:
# Step 0: Set skill root path (see SKILL.md § Script Path Setup)
PDF_SKILL_DIR="<skill_directory>"
# Step 1: Write code to .py file
cat > generate_pdf.py << 'PYEOF'
# ... your PDF generation code ...
PYEOF
# Step 2: Sanitize (MUST run before execution)
python3 "$PDF_SKILL_DIR/scripts/pdf.py" code.sanitize generate_pdf.py
# Step 3: Execute
python3 generate_pdf.py
# Step 4: Add metadata (MUST run after PDF creation)
python3 "$PDF_SKILL_DIR/scripts/pdf.py" meta.brand output.pdf
# Step 5: Check for missing glyphs (RECOMMENDED)
python3 "$PDF_SKILL_DIR/scripts/pdf.py" font.check output.pdf
# Step 6: Check TOC quality (if document has TOC)
python3 "$PDF_SKILL_DIR/scripts/pdf.py" toc.check output.pdf
# Step 7: PDF quality assurance scan (MUST run after all other checks)
python3 "$PDF_SKILL_DIR/scripts/pdf_qa.py" output.pdf
FORBIDDEN patterns:
# ❌ PROHIBITED: python -c with inline code
python -c "from reportlab... doc.build(story)"
# ❌ PROHIBITED: heredoc without saving to file first
python3 << 'EOF'
from reportlab...
EOF
# ❌ PROHIBITED: executing .py without sanitizing
python generate_pdf.py # Missing sanitization!
Debug Tips for Layout Issues
→ Full overflow prevention spec: typesetting/overflow.md - read it for the complete bounding box system, two-pass rendering, fallback strategies, and all three route implementations.
from reportlab.platypus import HRFlowable
from reportlab.lib.colors import red
# Visualize spacing during development - insert between elements
story.append(table)
story.append(HRFlowable(width="100%", color=red, thickness=0.5, spaceBefore=0, spaceAfter=0))
story.append(Spacer(1, 6))
story.append(HRFlowable(width="100%", color=red, thickness=0.5, spaceBefore=0, spaceAfter=0))
story.append(caption)
# Red lines create visual markers to see actual spacing - remove before final build
Resume / CV Template (ATS-Friendly)
Single-column, clean layout optimised for Applicant Tracking Systems (ATS). No graphics, no sidebars, no colour blocks - just well-structured text that HR software can parse reliably.
When to use this template:
- Applying to corporate jobs through online portals
- Any scenario where the PDF will be machine-parsed before a human reads it
- When the recruiter explicitly asks for "a standard resume"
When NOT to use (use Academic brief instead):
- Creative/design industry positions → Academic brief (AltaCV-style)
- Academic CV with publications → Academic brief (Academic CV)
Resume Design Rules
- Target 1 page unless user specifies otherwise
- Margins: `left=1.5cm, right=1.5cm, top=1.5cm, bottom=1.5cm`
- No cover page - content starts immediately
- No TOC - too short for table of contents
- Font: Times New Roman (English) or SimHei (Chinese)
- Body font size: 10-10.5pt; Name: 22-26pt; Section titles: 13-14pt
- ⚠️ Minimum font size: 12px (9pt) - HARD FLOOR. No text in the entire resume may render smaller than 12px. This includes contact info, meta text, footnotes, and captions. Anything below 12px is unreadable in print and fails accessibility checks.
- Section separator: thin horizontal rule (`HRFlowable`)
- Bullet style: `•` with tight spacing
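The 12px (9pt) floor follows from the usual CSS convention of 96px per inch against 72pt per inch. A small check along these lines can catch undersized styles before the build; `px_to_pt` and `undersized` are hypothetical helper names, and the style sizes shown are sample values.

```python
MIN_FONT_PT = 9.0  # 12px at the assumed 96px-per-inch convention

def px_to_pt(px):
    """Convert CSS pixels to points (72pt per inch, 96px per inch assumed)."""
    return px * 72.0 / 96.0

def undersized(style_sizes, floor=MIN_FONT_PT):
    """Return the names of styles whose font size falls below the floor."""
    return sorted(name for name, size in style_sizes.items() if size < floor)

sizes = {'ResumeName': 24, 'ResumeBody': 10, 'Footnote': 8.5}
violations = undersized(sizes)
```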
Resume Line-Break Rules (Language-Aware)
- English: Prefer breaking at word boundaries (spaces, hyphens). If a long word must be split to avoid excessive whitespace, break at a valid syllable boundary and insert a hyphen (`-`) - this is standard typographic practice (e.g., `experi-\nence`, `develop-\nment`). ReportLab supports `wordWrap='CJK'` only for CJK content; for English use default paragraph wrapping with `allowWidows=0, allowOrphans=0`.
- Chinese/CJK: Break allowed between any two CJK characters. Never break between a CJK character and its adjacent punctuation (、。,)》 etc. must stay with the preceding character).
- Mixed content (e.g., "Python 开发工程师"): Break at CJK boundaries or English word boundaries. Never split an English word in a CJK paragraph unless hyphenated.
- Contact line: Email, phone, location separated by `|` or `·`. Each segment must stay on one line - if too long, move to next line at the separator, not mid-segment.
- Dates and ranges: "Jan 2022 - Present" must stay as one unit. Never break a date range across lines.
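One way to enforce the contact-line and date-range rules is to swap ordinary spaces for non-breaking spaces (`\u00A0`) before the text reaches `Paragraph()`. The helpers below are a sketch under that assumption; `protect_date_ranges` and `join_contact` are hypothetical names, and the regex only covers the "Mon YYYY - Mon YYYY/Present" shape used in this template.

```python
import re

NBSP = '\u00A0'
# Matches e.g. "Jan 2022 - Present" or "Jul 2019 - Dec 2021"
DATE_RANGE = re.compile(r'\b([A-Z][a-z]{2} \d{4}) - (Present|[A-Z][a-z]{2} \d{4})\b')

def protect_date_ranges(text):
    """Replace spaces inside date ranges with non-breaking spaces."""
    return DATE_RANGE.sub(lambda m: m.group(0).replace(' ', NBSP), text)

def join_contact(segments, sep='|'):
    """Join contact segments so line breaks can only occur at the separators."""
    return (' %s ' % sep).join(s.replace(' ', NBSP) for s in segments)

meta = protect_date_ranges('Tech Company Inc. | Jan 2022 - Present | Shanghai')
contact = join_contact(['+86 138-0000-0000', 'Shanghai, China'])
```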
Resume Page-Fill Rules (Anti-Blank-Space)
- Goal: Fill ≥85% of the page height. Content should reach at least the bottom quarter of the page. A resume that stops at 60% height with blank space below = FAIL.
- Adaptive spacing strategy (apply in order until page is ≥85% filled):
  - Increase `spaceBefore`/`spaceAfter` on section headers (from 10pt up to 18pt)
  - Increase `leading` (line height) on body text (from 14pt up to 18pt)
  - Increase body `fontSize` by 0.5-1pt (from 10pt up to 11.5pt max)
  - Add a "Professional Summary" or "Key Achievements" section if none exists
  - Increase margins slightly (from 1.5cm up to 2cm) to reduce line width and push content downward
- Never leave a visible blank area > 3cm at the bottom of the page.
- If content overflows to page 2: do the reverse - reduce spacing, tighten leading (min 12pt), reduce fontSize (min 9pt / 12px), before removing content.
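The two fill thresholds (≥85% filled, no more than 3cm blank) can be evaluated from a measured content height. The sketch below assumes the fill ratio is taken against the frame height (page minus margins) and that `used_height` has been measured elsewhere; `page_fill` and `needs_expansion` are hypothetical names.

```python
CM = 72.0 / 2.54  # points per centimetre

def page_fill(used_height, page_height=841.89, margin=1.5 * CM):
    """Return (fill_ratio, blank_cm) for content of used_height points on an A4 page."""
    frame_height = page_height - 2 * margin
    fill_ratio = used_height / frame_height
    blank_cm = (frame_height - used_height) / CM
    return fill_ratio, blank_cm

def needs_expansion(used_height, **kwargs):
    """True if the page is under 85% filled or leaves more than 3cm blank."""
    fill_ratio, blank_cm = page_fill(used_height, **kwargs)
    return fill_ratio < 0.85 or blank_cm > 3.0

sparse = needs_expansion(500)      # about 66% filled
borderline = needs_expansion(650)  # about 86% filled, but roughly 3.8cm blank
full = needs_expansion(700)        # about 92% filled, roughly 2cm blank
```

The borderline case shows why both rules matter: a page can pass the 85% test and still leave more than 3cm empty.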
Complete Resume Template
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import cm, mm
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib.enums import TA_LEFT, TA_CENTER
from reportlab.lib import colors
from reportlab.platypus import (
SimpleDocTemplate, Paragraph, Spacer, HRFlowable, Table, TableStyle
)
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase.pdfmetrics import registerFontFamily
# ── Font Registration ──
pdfmetrics.registerFont(TTFont('TimesNewRoman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf'))
registerFontFamily('TimesNewRoman', normal='TimesNewRoman', bold='TimesNewRoman')
# For Chinese resumes, also register SimHei:
# pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf'))
# registerFontFamily('SimHei', normal='SimHei', bold='SimHei')
# ── Styles ──
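# ACCENT and TEXT_MUTED must come from palette.generate (Step 2)
# Run: python3 "$PDF_SKILL_DIR/scripts/pdf.py" palette.generate --title "Resume" --mode minimal
ACCENT = colors.HexColor('<accent from palette>') # Replace with palette output
TEXT_MUTED = colors.HexColor('<muted from palette>') # Replace with palette output (--c-muted); used by contact/meta styles below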
name_style = ParagraphStyle(
'ResumeName', fontName='TimesNewRoman', fontSize=24,
leading=28, alignment=TA_CENTER, spaceAfter=2
)
contact_style = ParagraphStyle(
'ResumeContact', fontName='TimesNewRoman', fontSize=10,
leading=14, alignment=TA_CENTER, textColor=TEXT_MUTED, # From palette --c-muted
spaceAfter=8
)
section_title_style = ParagraphStyle(
'ResumeSectionTitle', fontName='TimesNewRoman', fontSize=13,
leading=16, spaceBefore=10, spaceAfter=4,
textColor=ACCENT
)
job_title_style = ParagraphStyle(
'ResumeJobTitle', fontName='TimesNewRoman', fontSize=11,
leading=14, spaceAfter=1
)
job_meta_style = ParagraphStyle(
'ResumeJobMeta', fontName='TimesNewRoman', fontSize=10,
leading=13, textColor=TEXT_MUTED, spaceAfter=4 # From palette --c-muted
)
bullet_style = ParagraphStyle(
'ResumeBullet', fontName='TimesNewRoman', fontSize=10,
leading=14, leftIndent=14, bulletIndent=0,
spaceBefore=1, spaceAfter=1
)
body_style = ParagraphStyle(
'ResumeBody', fontName='TimesNewRoman', fontSize=10,
leading=14, spaceAfter=2
)
# ── Helpers ──
def section_header(title):
    """Section title + thin rule separator."""
    return [
        Paragraph(f'<b>{title}</b>', section_title_style),
        HRFlowable(width='100%', thickness=0.8, color=ACCENT,
                   spaceBefore=0, spaceAfter=6),
    ]
def experience_entry(title, company, dates, location, bullets):
    """One work experience block."""
    elements = [
        Paragraph(f'<b>{title}</b>', job_title_style),
        Paragraph(f'{company} | {dates} | {location}', job_meta_style),
    ]
    for b in bullets:
        elements.append(Paragraph(f'• {b}', bullet_style))
    elements.append(Spacer(1, 4))
    return elements
def education_entry(degree, school, dates, details=None):
    """One education block."""
    elements = [
        Paragraph(f'<b>{degree}</b>', job_title_style),
        Paragraph(f'{school} | {dates}', job_meta_style),
    ]
    if details:
        elements.append(Paragraph(details, body_style))
    elements.append(Spacer(1, 4))
    return elements
def skills_row(categories):
    """
    Skills as compact label: value pairs.
    categories = [('Programming', 'Python, Java, C++'), ('Tools', 'Git, Docker')]
    """
    elements = []
    for cat, vals in categories:
        elements.append(Paragraph(f'<b>{cat}:</b> {vals}', body_style))
    return elements
# ── Build Document ──
doc = SimpleDocTemplate(
'resume.pdf', pagesize=A4,
leftMargin=1.5*cm, rightMargin=1.5*cm,
topMargin=1.5*cm, bottomMargin=1.5*cm,
title='Resume - Your Name',
author='Z.ai', creator='Z.ai'
)
story = []
# Header
story.append(Paragraph('<b>YOUR NAME</b>', name_style))
story.append(Paragraph(
'email@example.com | +86 138-0000-0000 | Shanghai, China | github.com/yourname',
contact_style
))
# Summary
story.extend(section_header('PROFESSIONAL SUMMARY'))
story.append(Paragraph(
'Results-driven software engineer with 5+ years of experience in backend systems, '
'distributed computing, and cloud-native architectures. Led a team of 8 engineers '
'delivering a real-time data pipeline processing 2M+ events/sec.',
body_style
))
# Experience
story.extend(section_header('WORK EXPERIENCE'))
story.extend(experience_entry(
'Senior Software Engineer', 'Tech Company Inc.', 'Jan 2022 - Present', 'Shanghai',
[
'Designed and deployed a microservices architecture serving 10M daily active users',
'Reduced API latency by 40% through query optimisation and caching strategies',
'Mentored 3 junior engineers; established code review standards adopted team-wide',
]
))
story.extend(experience_entry(
'Software Engineer', 'Startup Co.', 'Jul 2019 - Dec 2021', 'Beijing',
[
'Built real-time recommendation engine using collaborative filtering (CTR +25%)',
'Implemented CI/CD pipeline reducing deployment time from 2 hours to 15 minutes',
]
))
# Education
story.extend(section_header('EDUCATION'))
story.extend(education_entry(
'M.Sc. Computer Science', 'Tsinghua University', '2017 - 2019',
'GPA: 3.8/4.0 | Thesis: Distributed Graph Processing on Heterogeneous Clusters'
))
story.extend(education_entry(
'B.Eng. Software Engineering', 'Zhejiang University', '2013 - 2017'
))
# Skills
story.extend(section_header('SKILLS'))
story.extend(skills_row([
('Languages', 'Python, Java, Go, SQL, TypeScript'),
('Frameworks', 'Spring Boot, FastAPI, React, Kubernetes, Kafka'),
('Tools', 'Git, Docker, Terraform, AWS (EC2/S3/Lambda), PostgreSQL, Redis'),
]))
doc.build(story)
Resume Checklist
- 1 page (unless user says otherwise)
- No cover page, no TOC
- Tight margins (1.5cm all sides)
- Name prominent at top (22-26pt)
- Contact info single line, centered
- Section headers with consistent separator style
- Bullets concise - start with action verbs
- Quantified achievements (%, $, count)
- No photos, no icons, no colour blocks (ATS-safe)
- Font: only registered fonts (Times New Roman / SimHei)
- ⚠️ Minimum font size 12px (9pt) - no text smaller than this anywhere
- Line breaks are language-aware - no mid-word English breaks, no CJK punctuation orphans, no date range splits
- Page fill ≥85% - no large blank area at bottom. If sparse, increase spacing/leading/font size adaptively
Component Reference
Block types:
| Type | Description | Notes |
|---|---|---|
| h1 | 22pt + accent underline rule | KeepTogether with rule |
| h2 | 15pt, dark | No rule, no accent |
| h3 | 11.5pt bold | No accent |
| body | 10.5pt justified, 17pt leading | Supports <b> <i> <font> |
| bullet | Body size with • prefix | Unordered list |
| numbered | Body size with N. prefix | Counter auto-resets |
| callout | Accent left border 4px + light tint bg | Max one per section |
| table | Accent header + alternating rows + outer box only | Supports fractional col_widths |
| code | Courier 8.5pt + accent left border | Optional language label |
| divider | Accent 1.2pt rule | Use sparingly |
| caption | 8.5pt muted, centered | Below images/tables |
| chart | matplotlib figure saved as PNG → Image() flowable | Generate chart in-script, fig.savefig() → embed. Set plt.rcParams['font.sans-serif']=['SimHei'] for CJK. Always tight_layout(). Must follow typesetting/charts.md rules: delete top/right spines, dashed grid 20% opacity, donut default for pie, anti-stacking labels. |
| quote | Body italic + left indent 24pt + muted accent left border 2px | For blockquotes / testimonials |
| bibliography | Hanging indent (firstLineIndent=-24pt, leftIndent=24pt) | GB/T 7714 or APA format per language |
| math | Rendered via <super> <sub> tags in Paragraph | For inline math; complex equations → use academic brief instead |
Header / footer:
- Header: document title (left, 7.5pt, muted) + accent rule (1.5pt, full width)
- Footer: author (left, 7.5pt, muted) + page number (right, 7.5pt, muted) + light rule above
Custom flowables (preferred over Table hacks):
- CalloutBox: accent 4px left border + light tint background - cleaner than Table simulation
- BibliographyItem: hanging indent reference entry
Quick Reference
| Task | Best Tool | Command/Code |
|---|---|---|
| Create PDF (ReportLab) | reportlab | Canvas or Platypus |
| Fill PDF forms | Process brief | python3 "$PDF_SKILL_DIR/scripts/pdf.py" form.fill or annotation workflow |
| Merge PDFs | Process brief | python3 "$PDF_SKILL_DIR/scripts/pdf.py" pages.merge |
| Extract text | Process brief | python3 "$PDF_SKILL_DIR/scripts/pdf.py" extract.text |
| Extract tables | Process brief | python3 "$PDF_SKILL_DIR/scripts/pdf.py" extract.table |
Document-Type Quick Reference
📋 Invoice / Receipt
Key elements:
- Company logo/name at top
- Invoice number + date (right-aligned)
- Seller/buyer info block (2-column layout)
- Line items table: Item, Quantity, Unit Price, Amount
- Total row with bold font
- Tax info + payment terms at bottom
- Stamp/seal placeholder (empty circle or rect with "Seal Here" text)
# Invoice table pattern
data = [['Item', 'Qty', 'Unit Price', 'Amount']]
data += [[item['name'], item['qty'], item['price'], item['total']] for item in items]
data.append(['', '', 'Total:', f'¥{total}'])
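Invoice amounts are best computed with `Decimal` rather than floats so that line totals and the grand total agree to the cent. The sketch below builds the same row layout as the pattern above; `build_invoice_rows` is a hypothetical helper and the sample items are invented for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def build_invoice_rows(items, currency='¥'):
    """items: list of dicts with 'name', 'qty' (int) and 'price' (str or number)."""
    cent = Decimal('0.01')
    rows = [['Item', 'Qty', 'Unit Price', 'Amount']]
    total = Decimal('0')
    for item in items:
        price = Decimal(str(item['price']))
        amount = (price * item['qty']).quantize(cent, rounding=ROUND_HALF_UP)
        total += amount
        rows.append([item['name'], str(item['qty']),
                     f'{currency}{price:.2f}', f'{currency}{amount:.2f}'])
    rows.append(['', '', 'Total:', f'{currency}{total:.2f}'])
    return rows

rows = build_invoice_rows([
    {'name': 'Widget', 'qty': 3, 'price': '19.99'},
    {'name': 'Setup fee', 'qty': 1, 'price': '100'},
])
```

All cells come back as strings, so they can be passed to `Table` directly or wrapped in `Paragraph()` for long item names.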
📝 Contract / Legal Document
Key elements:
- Title centered, bold, 18pt
- Numbered clauses with auto-increment
- Signature block at bottom: Party A / Party B with date line
- Page numbers mandatory
# Signature block pattern
sig_data = [
['Party A (Seal): ________________', 'Party B (Seal): ________________'],
['Date: ____/____/________', 'Date: ____/____/________'],
]
sig_table = Table(sig_data, colWidths=[available_width / 2] * 2) # two equal columns; sum stays within available_width
📊 Book / Long Document
For documents with 10+ pages:
- Generate TOC with `TableOfContents()` from `reportlab.platypus.tableofcontents`
- Use `CondPageBreak(H1_ORPHAN_THRESHOLD)` before H1 headings (NOT `PageBreak()` - never force page breaks between chapters)
- Add running headers/footers with chapter title + page number
- Run `python3 "$PDF_SKILL_DIR/scripts/toc_validate.py" output.pdf` to verify TOC links
- Consider using `bookmarks=True` for PDF outline navigation
📝 Exam / Quiz / Test Paper / Worksheet
Exam papers have unique layout requirements. These rules are MANDATORY when generating any test/quiz/exam document.
Numbering & Structure
一、选择题(每小题 3 分,共 30 分) ← Section header (宋体/黑体 14pt Bold)
1. 以下哪个不是 Python 内置数据类型? ← Question stem (12pt)
A. int ← Options: indented 24pt (2em)
B. float ← Each option on new line or 2×2 grid
C. array
D. str
2. 以下表达式的值为? ← Next question
A. True B. False ← Short options: inline 2×2 grid OK
C. None D. Error
Option Indentation (MANDATORY)
# ReportLab styles for exam papers
question_style = ParagraphStyle(
'Question', fontSize=12, leading=16,
leftIndent=0, firstLineIndent=0,
spaceBefore=12, spaceAfter=4,
)
option_style = ParagraphStyle(
'Option', fontSize=12, leading=16,
leftIndent=24, # ← MANDATORY: 24pt indent from question stem
firstLineIndent=0,
spaceBefore=2, spaceAfter=2,
)
Option Layout Decision
def format_options(options):
    """
    Decide option layout based on content length.
    Exactly 4 short options (≤6 chars each; ~4 for CJK) → 2×2 grid table
    Long options or any other option count → vertical list, one per line
    """
    max_len = max(len(opt) for opt in options)
    if len(options) == 4 and max_len <= 6: # CJK: 4 chars; Latin: 6 chars
        # 2×2 grid using Table (needs exactly 4 options to fill the grid)
        grid = Table(
            [[options[0], options[1]], [options[2], options[3]]],
            colWidths=[200, 200], hAlign='LEFT'
        )
        grid.setStyle(TableStyle([
            ('LEFTPADDING', (0,0), (-1,-1), 24), # Match option indent
            ('VALIGN', (0,0), (-1,-1), 'TOP'),
        ]))
        return grid
    else:
        # Vertical list
        return [Paragraph(f'{opt}', option_style) for opt in options]
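Because `len()` counts a CJK character the same as a Latin letter, the short-option test above needs two different limits. An alternative is to measure an approximate display width instead. The sketch below uses the standard library's `unicodedata.east_asian_width`; counting wide characters as 2 units and the 8-unit limit are assumptions, and `display_width` / `is_short_option` are hypothetical names.

```python
import unicodedata

def display_width(text):
    """Approximate width in Latin-character units; wide/fullwidth characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)

def is_short_option(text, max_units=8):
    """True if the option is narrow enough for a 2×2 grid cell (assumed limit)."""
    return display_width(text) <= max_units

latin = display_width('int')
cjk = display_width('选择')
mixed = display_width('A. 是')
```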
Answer Space Reservation (MANDATORY)
Every question MUST have adequate answer space. No exceptions.
| Question Type | Minimum Space | Implementation |
|---|---|---|
| Multiple choice | 0 extra (options are the answer) | Just question + options |
| Fill-in-the-blank | Inline underline, min 80pt width | <u> </u> or ReportLab HRFlowable |
| Short answer | 40-60pt (2-3 lines) | Spacer(1, 50) |
| Calculation / math work | 120-200pt (6-10 lines) | Spacer(1, 160) with light guide lines |
| Essay / long answer | 200-400pt (10-20 lines) | Spacer(1, 300) with light guide lines |
| Proof / derivation | 160-250pt (8-12 lines) | Spacer(1, 200) |
# Answer line helper — light dashed lines for handwriting
def add_answer_lines(story, num_lines=8, line_spacing=20):
    """Add light guide lines for handwritten answers."""
    for i in range(num_lines):
        story.append(Spacer(1, line_spacing))
        story.append(HRFlowable(
            width='90%', thickness=0.3,
            color=colors.Color(0.8, 0.8, 0.8), # Light gray
            dash=[4, 4], # Dashed
            spaceAfter=0, spaceBefore=0
        ))
Page Density
- `spaceBefore=12pt` between questions minimum
- `spaceBefore=24pt` before section headers (一、二、三)
- Score indicator after question number: `1. (分值: 5分)` or `1. [5 pts]`
- Page header: exam title + time limit + total score
- No cover page unless explicitly requested
Data-to-Ink Ratio Rules (MANDATORY for Reports)
CRITICAL: Do NOT write long paragraphs to describe data trends. You MUST extract metrics and trends into structured visual elements.
Pattern Detection Table
Before writing ANY body paragraph, scan the text. If you find any of these patterns, extract them:
| Raw text pattern | ❌ FORBIDDEN | ✅ REQUIRED visual form |
|---|---|---|
| "revenue grew 12% to $4.2M" | Bury in paragraph prose | CalloutBox with bold +12% + $4.2M |
| "latency dropped from 120ms to 75ms" | Long explanatory sentence | CalloutBox or metrics table row |
| "Q1→Q2→Q3: 10%→25%→40%" | Inline numbers in text | Chart (matplotlib PNG → Image()) |
| "first...second...third..." steps | Paragraph with ordinal words | Numbered Table or process list |
| "compared to last year / vs Q2" | Nested comparisons in prose | Side-by-side comparison table |
| "accounted for 60% of total" | Percentage in paragraph | Pie chart or stacked bar chart |
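The scan described above can be approximated with a handful of regular expressions. This is a rough sketch under stated assumptions: `PATTERN_RULES` and `suggest_visuals` are hypothetical names, and each regex is only a heuristic for one table row, so it will miss phrasings the table's examples do not cover.

```python
import re

# Hypothetical rules: (rule name, heuristic regex, suggested visual form).
PATTERN_RULES = [
    ("sequence",
     re.compile(r"\d+(?:\.\d+)?%\s*(?:→|->)\s*\d"),
     "chart"),
    ("change",
     re.compile(r"\b(?:grew|rose|fell|dropped|increased|decreased)\b.*?\d", re.I),
     "callout or metrics table row"),
    ("share",
     re.compile(r"\baccount(?:ed|s)? for \d+(?:\.\d+)?%", re.I),
     "pie or stacked bar chart"),
    ("comparison",
     re.compile(r"\b(?:compared (?:to|with)|vs\.?)\s", re.I),
     "side-by-side comparison table"),
    ("steps",
     re.compile(r"\bfirst\b.*\bsecond\b", re.I | re.S),
     "numbered table or process list"),
]


def suggest_visuals(text):
    """Return (rule_name, suggested_form) for every rule that matches text."""
    return [(name, form)
            for name, pattern, form in PATTERN_RULES
            if pattern.search(text)]
```

Running `suggest_visuals` over each draft paragraph before it is appended to the story gives a quick list of candidates for extraction; an empty list means the paragraph can stay as prose.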
ReportLab CalloutBox Template
Use this pattern for extracted metrics - it's visually clean and compact:
from reportlab.lib.enums import TA_CENTER
from reportlab.lib.styles import ParagraphStyle
from reportlab.platypus import KeepTogether, Paragraph, Table, TableStyle

# ACCENT, TEXT_MUTED, and BG_SURFACE come from the palette.generate output
stat_style = ParagraphStyle(
    name='StatBig', fontName='Times New Roman', fontSize=22,
    leading=26, textColor=ACCENT, alignment=TA_CENTER  # From palette
)
label_style = ParagraphStyle(
    name='StatLabel', fontName='Times New Roman', fontSize=9,
    leading=12, textColor=TEXT_MUTED, alignment=TA_CENTER  # From palette
)
callout = Table(
    [[Paragraph('<b>+12%</b>', stat_style)],
     [Paragraph('Revenue Growth vs Q2', label_style)]],
    colWidths=[160]
)
callout.setStyle(TableStyle([
    ('BACKGROUND', (0,0), (-1,-1), BG_SURFACE),  # From palette --c-mid
    ('BOX', (0,0), (-1,-1), 1, ACCENT),          # From palette --c-accent
    ('TOPPADDING', (0,0), (-1,-1), 10),
    ('BOTTOMPADDING', (0,0), (-1,-1), 10),
    ('VALIGN', (0,0), (-1,-1), 'MIDDLE'),
]))
story.append(KeepTogether(callout))
Cover Pages (HTML/Playwright)
When the cover routes through Playwright:
- Use `Delta_Widget` components for KPIs and metrics.
- Use `Process_List` components for workflows and timelines.
- Use `Sidenote_Block` components (in `tufte_report` archetype) for citations and supplementary data.
- For data-heavy pages, add `data_points` arrays to any component - the engine renders them as content-aware background curves.
Critical Reminders Checklist
Before submitting code, verify:
- Font restriction: Only 6 registered fonts used
- Font family registered: `registerFontFamily()` called for all used fonts
- Mixed language: `install_font_fallback()` called after font registration (auto handles `<font>` wrapping)
- Rich text tags: Only inside `Paragraph()` objects
- Table cells: ALL text wrapped in `Paragraph()`
- Scientific notation: Large/small numbers use `<super>` tags
- Chinese text: `wordWrap='CJK'` in ParagraphStyle
- Line breaks: Using `<br/>` not `\n`
- Headings: Bold with `<b>` tags
- Table headers: White bold text on dark blue (#1F4E79)
- Metadata: Title matches filename, Author/Creator = "Z.ai"
- Sanitization: Code sanitized before execution
- Post-build metadata: `pdf.py meta.brand` called after build
- Post-build blank page cleanup: `pdf.py pages.clean` called after build
- Glyph check: `pdf.py font.check` run to verify no missing glyphs
- TOC check: `pdf.py toc.check` run if document has TOC (entries, pages, links)
- PDF QA scan: `pdf_qa.py` run to verify page consistency, CJK punctuation, overflow, margins, table centering, font embedding, metadata
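A few of the checklist items can be caught with a plain text scan of the build script before it runs. The sketch below is an assumption-laden illustration, not a replacement for the `pdf.py` checks: `lint_build_script` is a hypothetical function, it only looks at the source text, and it covers just four items (font family registration, font fallback, CJK word wrap, and `\n` inside `Paragraph` strings).

```python
import re

# Matches common CJK ideographs; a rough test for "script contains Chinese text".
CJK = re.compile(r"[\u4e00-\u9fff]")

# Matches a literal backslash-n inside the first string argument of Paragraph(...).
NEWLINE_IN_PARAGRAPH = re.compile(r"Paragraph\(\s*f?['\"][^'\"]*\\n")


def lint_build_script(source):
    """Return a list of checklist warnings for a ReportLab build script."""
    warnings = []
    if "registerFontFamily(" not in source:
        warnings.append("registerFontFamily() is never called")
    if "install_font_fallback(" not in source:
        warnings.append("install_font_fallback() is never called")
    has_cjk_wrap = "wordWrap='CJK'" in source or 'wordWrap="CJK"' in source
    if CJK.search(source) and not has_cjk_wrap:
        warnings.append("CJK text found but no wordWrap='CJK' style")
    if NEWLINE_IN_PARAGRAPH.search(source):
        warnings.append("Paragraph text contains \\n; use <br/>")
    return warnings
```

An empty return value only means these four heuristics found nothing; the remaining checklist items still need the post-build commands listed above.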