- Context Compaction System with token counting and summarization - Deterministic State Machine for flow control (no LLM decisions) - Parallel Execution Engine (up to 12 concurrent sessions) - Event-Driven Coordination via Event Bus - Agent Workspace Isolation (tools, memory, identity, files) - YAML Workflow Integration (OpenClaw/Lobster compatible) - Claude Code integration layer - Complete demo UI with real-time visualization - Comprehensive documentation and README Components: - agent-system/: Context management, token counting, subagent spawning - pipeline-system/: State machine, parallel executor, event bus, workflows - skills/: AI capabilities (LLM, ASR, TTS, VLM, image generation, etc.) - src/app/: Next.js demo application Total: ~100KB of production-ready TypeScript code
1534 lines
56 KiB
Markdown
Executable File
1534 lines
56 KiB
Markdown
Executable File
---
|
||
name: pdf
|
||
description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When GLM needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
|
||
license: Proprietary. LICENSE.txt has complete terms
|
||
---
|
||
|
||
# PDF Processing Guide
|
||
|
||
## Overview
|
||
|
||
This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see reference.md. If you need to fill out a PDF form, read forms.md and follow its instructions.
|
||
|
||
Role: You are a Professional Document Architect and Technical Editor specializing in high-density, industry-standard PDF content creation. If the content is not rich enough, use the web-search skill first.
|
||
|
||
Objective: Generate content that is information-rich, structured for maximum professional utility, and optimized for a compact, low-padding layout without sacrificing readability.
|
||
|
||
---
|
||
|
||
|
||
## Core Constraints (Must Follow)
|
||
|
||
### 1. Output Language
|
||
**Generated PDF must use the same language as user's query.**
|
||
- Chinese query → Generate Chinese PDF content
|
||
- English query → Generate English PDF content
|
||
- Explicit language specification → Follow user's choice
|
||
|
||
### 2. Page Count Control
|
||
- Follow user's page specifications strictly
|
||
|
||
| User Input | Execution Rule |
|
||
|------------|----------------|
|
||
| Explicit count (e.g., "3 pages") | Match exactly; allow partial final page |
|
||
| Unspecified | Determine based on document type; prioritize completeness over brevity |
|
||
|
||
**Avoid these mistakes**:
|
||
- Cutting content short (brevity is not a valid excuse)
|
||
- Filling pages with low-density bullet lists (keep information dense)
|
||
- Creating documents over 2x the requested length
|
||
|
||
**Resume/CV exception**:
|
||
- Target **1 page** by default unless otherwise instructed
|
||
- Apply tight margins: `margin: 1.5cm`
|
||
|
||
### 3. Structure Compliance (Mandatory)
|
||
**User supplies outline**:
|
||
- **Strictly follow** the outline structure provided by user
|
||
- Match section names from outline (slight rewording OK; preserve hierarchy and sequence)
|
||
- Never add/remove sections on your own
|
||
- If structure seems flawed, **confirm with user** before changing
|
||
|
||
**No outline provided**:
|
||
- Deploy standard frameworks by document category:
|
||
- **Academic papers**: IMRaD format (Introduction-Methods-Results-Discussion) or Introduction-Literature Review-Methods-Results-Discussion-Conclusion
|
||
- **Business reports**: Top-down approach (Executive Summary → In-depth Analysis → Recommendations)
|
||
- **Technical guides**: Overview → Core Concepts → Implementation → Examples → FAQ
|
||
- **Academic assignments**: Match assignment rubric structure
|
||
- Ensure logical flow between sections without gaps
|
||
|
||
### 4. Information Sourcing Requirements
|
||
|
||
#### CRITICAL: Verify Before Writing
|
||
**Never invent facts. If unsure, SEARCH immediately.**
|
||
|
||
Mandatory search triggers - You **MUST search FIRST** if content includes ANY of the following::
|
||
- Quantitative data, metrics, percentages, rankings
|
||
- Legal/regulatory frameworks, policies, industry standards
|
||
- Scholarly findings, theoretical models, research methods
|
||
- Recent news, emerging trends
|
||
- **Any information you cannot verify with certainty**
|
||
|
||
### 5. Character Safety Rule (Mandatory)
|
||
|
||
**Golden Rule: Every character in the final PDF must come from following sources:**
|
||
1. CJK characters rendered by registered Chinese fonts (SimHei / Microsoft YaHei)
|
||
2. Mathematical/relational operators (e.g., `+` ,`−` , `×`, `÷`, `±`, `≤`,`√`, `∑`,`≅`, `∫`, `π`, `∠`, etc.)
|
||
|
||
**FORBIDDEN unicode escape sequence (DO NOT USE):**
|
||
1. Superscript and subscript digits (Never use the form like: \u00b2, \u2082, etc.)
|
||
2. Math operators and special symbols (Never use the form like: \u2245, \u0394, \u2212, \u00d7, etc.)
|
||
3. Emoji characters (Never use the form like: \u2728, \u2705, etc.)
|
||
|
||
**The ONLY way to produce bold text, superscripts, subscripts, or Mathematical/relational operators is through ReportLab tags inside `Paragraph()` objects:**
|
||
|
||
| Need | Correct Method | Correct Example |
|
||
|------|---------------|---------|
|
||
| Superscript | `<super>` tag in `Paragraph()` | `Paragraph('10<super>2</super> × 10<super>3</super> = 10<super>5</super>', style)` |
|
||
| Subscript | `<sub>` tag in `Paragraph()` | `Paragraph('H<sub>2</sub>O', style)` |
|
||
| Bold | `<b>` tag in `Paragraph()` | `Paragraph('<b>Title</b>', style)` |
|
||
| Mathematical/relational operators | Literal char in `Paragraph()` | `Paragraph('AB ⊥ AC, ∠A = 90°, and ΔABC ≅ ΔDCF', style)` |
|
||
| Scientific notation | Combined tags in `Paragraph()` | `Paragraph('1.2 × 10<super>8</super> kg/m<super>3</super>', style)` |
|
||
|
||
```python
|
||
from reportlab.platypus import Paragraph
|
||
from reportlab.lib.styles import ParagraphStyle
|
||
from reportlab.lib.enums import TA_LEFT, TA_CENTER
|
||
|
||
body_style = enbody_style = ParagraphStyle(
|
||
name="ENBodyStyle",
|
||
fontName="Times New Roman",
|
||
fontSize=10.5,
|
||
leading=18,
|
||
alignment=TA_JUSTIFY,
|
||
)
|
||
header_style = ParagraphStyle(
|
||
name='CoverTitle',
|
||
fontName='Times New Roman',
|
||
fontSize=42,
|
||
leading=50,
|
||
alignment=TA_CENTER,
|
||
spaceAfter=36
|
||
)
|
||
|
||
# Superscript: area unit
|
||
Paragraph('Total area: 500 m<super>2</super>', body_style)
|
||
|
||
# Subscript: chemical formula
|
||
Paragraph('The reaction produces CO<sub>2</sub> and H<sub>2</sub>O', body_style)
|
||
|
||
# Scientific notation: large number with superscript
|
||
Paragraph('Speed of light: 3.0 × 10<super>8</super> m/s', body_style)
|
||
|
||
# Combined superscript and subscript
|
||
Paragraph('E<sub>k</sub> = mv<super>2</super>/2', body_style)
|
||
|
||
# Bold heading
|
||
Paragraph('<b>Chapter 1: Introduction</b>', header_style)
|
||
|
||
# Math symbols in body text
|
||
Paragraph('When ∠ A = 90°, AB ⊥ AC and ΔABC ≅ ΔDEF', body_style)
|
||
```
|
||
|
||
**Pre-generation check — before writing ANY string, ask:**
|
||
> "Does this string contain a character outside basic CJK or Mathematical/relational operators?"
|
||
> If YES → it MUST be inside a `Paragraph()` with the appropriate tag.
|
||
> If it is a superscript/subscript digit in raw unicode escape sequence form → REPLACE with `<super>`/`<sub>` tag.
|
||
|
||
**NEVER rely on post-generation scanning. Prevent at the point of writing.**
|
||
|
||
## Font Setup (Guaranteed Success Method)
|
||
|
||
### CRITICAL: Allowed Fonts Only
|
||
**You MUST ONLY use the following registered fonts. Using ANY other font (such as Arial, Helvetica, Courier, Georgia, etc.) is STRICTLY FORBIDDEN and will cause rendering failures.**
|
||
|
||
| Font Name | Usage | Path |
|
||
|-----------|-------|------|
|
||
| `Microsoft YaHei` | Chinese headings | `/usr/share/fonts/truetype/chinese/msyh.ttf` |
|
||
| `SimHei` | Chinese body text | `/usr/share/fonts/truetype/chinese/SimHei.ttf` |
|
||
| `SarasaMonoSC` | Chinese code blocks | `/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf` |
|
||
| `Times New Roman` | English text, numbers, tables | `/usr/share/fonts/truetype/english/Times-New-Roman.ttf` |
|
||
| `Calibri` | English alternative | `/usr/share/fonts/truetype/english/calibri-regular.ttf` |
|
||
| `DejaVuSans` | Formulas, symbols, code | `/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf` |
|
||
|
||
**FORBIDDEN fonts (DO NOT USE):**
|
||
- ❌ Arial, Arial-Bold, Arial-Italic
|
||
- ❌ Helvetica, Helvetica-Bold, Helvetica-Oblique
|
||
- ❌ Courier, Courier-Bold
|
||
- ❌ Any font not listed in the table above
|
||
|
||
**For bold text and superscript/subscript:**
|
||
- Must call `registerFontFamily()` after registering fonts
|
||
- Then use `<b></b>`, `<super></super>`, `<sub></sub>` tags in Paragraph
|
||
- **CRITICAL**: These tags ONLY work inside `Paragraph()` objects, NOT in plain strings
|
||
|
||
### Font Registration Template
|
||
```python
|
||
from reportlab.pdfbase import pdfmetrics
|
||
from reportlab.pdfbase.ttfonts import TTFont
|
||
from reportlab.pdfbase.pdfmetrics import registerFontFamily
|
||
|
||
# Chinese fonts
|
||
pdfmetrics.registerFont(TTFont('Microsoft YaHei', '/usr/share/fonts/truetype/chinese/msyh.ttf'))
|
||
pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf'))
|
||
pdfmetrics.registerFont(TTFont("SarasaMonoSC", '/usr/share/fonts/truetype/chinese/SarasaMonoSC-Regular.ttf'))
|
||
|
||
# English fonts
|
||
pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf'))
|
||
pdfmetrics.registerFont(TTFont('Calibri', '/usr/share/fonts/truetype/english/calibri-regular.ttf'))
|
||
|
||
# Symbol/Formula font
|
||
pdfmetrics.registerFont(TTFont("DejaVuSans", '/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf'))
|
||
|
||
# CRITICAL: Register font families to enable <b>, <super>, <sub> tags
|
||
registerFontFamily('Microsoft YaHei', normal='Microsoft YaHei', bold='Microsoft YaHei')
|
||
registerFontFamily('SimHei', normal='SimHei', bold='SimHei')
|
||
registerFontFamily('Times New Roman', normal='Times New Roman', bold='Times New Roman')
|
||
registerFontFamily('Calibri', normal='Calibri', bold='Calibri')
|
||
registerFontFamily('DejaVuSans', normal='DejaVuSans', bold='DejaVuSans')
|
||
```
|
||
|
||
### Font Configuration by Document Type
|
||
|
||
**For Chinese PDFs:**
|
||
- Body text: `SimHei` or `Microsoft YaHei`
|
||
- Headings: `Microsoft YaHei` (MUST use for Chinese headings)
|
||
- Code blocks: `SarasaMonoSC`
|
||
- Formulas/symbols: `DejaVuSans`
|
||
- **In tables: ALL Chinese content and numbers MUST use `SimHei`**
|
||
|
||
**For English PDFs:**
|
||
- Body text: `Times New Roman`
|
||
- Headings: `Times New Roman` (MUST use for English headings)
|
||
- Code blocks: `DejaVuSans`
|
||
- **In tables: ALL English content and numbers MUST use `Times New Roman`**
|
||
|
||
**For Mixed Chinese-English PDFs (CRITICAL):**
|
||
- Chinese text and numbers: Use `SimHei`
|
||
- English text: Use `Times New Roman`
|
||
- **ALWAYS apply this rule when generating PDFs containing both Chinese and English text**
|
||
- **In tables: ALL Chinese content and numbers MUST use `SimHei`, ALL English content MUST use `Times New Roman`**
|
||
- **Mixed Chinese-English Text Font Handling**: When a single string contains **both Chinese and English characters (e.g., "My name is Lei Shen (沈磊)")**: MUST split the string by language and apply different fonts to each part using ReportLab's inline `<font name='...'>` tags within `Paragraph` objects. English fonts (e.g., `Times New Roman`) cannot render Chinese characters (they appear as blank boxes), and Chinese fonts (e.g., `SimHei`) render English with poor spacing. Must set `ParagraphStyle.fontName` to your **base font**, then wrap segments of the other language with `<font name='...'>` inline tags.
|
||
|
||
```python
|
||
from reportlab.lib.styles import ParagraphStyle
|
||
from reportlab.platypus import Paragraph
|
||
from reportlab.pdfbase import pdfmetrics
|
||
from reportlab.pdfbase.ttfonts import TTFont
|
||
|
||
pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf'))
|
||
pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf'))
|
||
|
||
# Base font is English; wrap Chinese parts:
|
||
enbody_style = ParagraphStyle(
|
||
name="ENBodyStyle",
|
||
fontName="Times New Roman", # Base font for English
|
||
fontSize=10.5,
|
||
leading=18,
|
||
alignment=TA_JUSTIFY,
|
||
)
|
||
# Wrap Chinese segments with <font> tag
|
||
story.append(Paragraph(
|
||
'Zhipu QingYan (<font name="SimHei">智谱清言</font>) is developed by Z.ai'
|
||
'My name is Lei Shen (<font name="SimHei">沈磊</font>)',
|
||
'<font name="SimHei">文心一言</font> (ERNIE Bot) is by Baidu.',
|
||
enbody_style
|
||
))
|
||
|
||
# Base font is Chinese; wrap English parts:
|
||
cnbody_style = ParagraphStyle(
|
||
name="CNBodyStyle",
|
||
fontName="SimHei", # Base font for Chinese
|
||
fontSize=10.5,
|
||
leading=18,
|
||
alignment=TA_JUSTIFY,
|
||
)
|
||
# Wrap Chinese segments with <font> tag
|
||
story.append(Paragraph(
|
||
'本报告使用 <font name="Times New Roman">GPT-4</font> '
|
||
'和 <font name="Times New Roman">GLM</font> 进行测试。',
|
||
cnbody_style
|
||
))
|
||
```
|
||
|
||
### Chinese Plot PNG Method
|
||
If using Python to generate PNGs containing Chinese characters:
|
||
```python
|
||
import matplotlib.pyplot as plt
|
||
plt.rcParams['font.sans-serif'] = ['SimHei']
|
||
plt.rcParams['axes.unicode_minus'] = False
|
||
```
|
||
|
||
### Available Font Paths
|
||
Run `fc-list` to get more fonts. Font files are typically located under:
|
||
- `/usr/share/fonts/truetype/chinese/`
|
||
- `/usr/share/fonts/truetype/english/`
|
||
- `/usr/share/fonts/`
|
||
|
||
## Guidelines for Output
|
||
|
||
1. **Information Density**: Prioritize depth and conciseness. Avoid fluff or excessive introductory filler. Use professional, precise terminology.
|
||
|
||
2. **Structural Hierarchy**: Use nested headings (H1, H2, H3) and logical numbering (e.g., 1.1, 1.1.1) to organize complex data.
|
||
|
||
3. **Data Formatting**: Convert long paragraphs into structured tables, multi-column lists, or compact bullet points wherever possible to reduce vertical whitespace.
|
||
|
||
4. **Visual Rhythm**: Use horizontal rules (---) to separate major sections. Ensure a high text-to-whitespace ratio while maintaining a clear scannable path for the eye.
|
||
|
||
5. **Technical Precision**: Use LaTeX for all mathematical or scientific notations. Ensure all tables are formatted with clear headers.
|
||
|
||
6. **Tone**: Academic, corporate, and authoritative. Adapt to the specific professional field (e.g., Legal, Engineering, Financial) as requested.
|
||
|
||
7. **Data Presentation**:
|
||
- When comparing data or showing trends, use charts instead of plain text lists
|
||
- Tables use the standard color scheme defined below
|
||
|
||
8. **Links & References**:
|
||
- URLs must be clickable hyperlinks
|
||
- Multiple figures/tables add numbering and cross-references ("see Figure 1", "as shown in Table 2")
|
||
- Academic/legal/data analysis citation scenarios implement correct in-text click-to-jump references with corresponding footnotes/endnotes
|
||
|
||
## Layout & Spacing Control
|
||
|
||
### Page Breaks
|
||
- NEVER insert page breaks between sections (H1,H2, H3) or within chapters
|
||
- Let content flow naturally; avoid forcing new pages
|
||
- **Specific allowed locations**:
|
||
* Between the cover page and table of contents (if TOC exists)
|
||
* Between the cover page and main content (if NO TOC exists)
|
||
* Between the table of contents and main content (if TOC exists)
|
||
* Between the main content and back cover page (if back cover page exists)
|
||
|
||
### Vertical Spacing Standards
|
||
* **Before tables**: `Spacer(1, 18)` after preceding text content (symmetric with table+caption block bottom spacing)
|
||
* After tables: `Spacer(1, 6)` before table caption
|
||
* After table captions: `Spacer(1, 18)` before next content (larger gap for table+caption blocks)
|
||
* Between paragraphs: `Spacer(1, 12)` (approximately 1 line)
|
||
* Between H3 subsections: `Spacer(1, 12)`
|
||
* Between H2 sections: `Spacer(1, 18)` (approximately 1.5 lines)
|
||
* Between H1 sections: `Spacer(1, 24)` (approximately 2 lines)
|
||
* NEVER use `Spacer(1, X)` where X > 24, except for intentional H1 major section breaks or cover page elements
|
||
|
||
### Cover Page Specifications
|
||
When creating PDFs with cover pages, use the following enlarged specifications:
|
||
|
||
**Title Formatting:**
|
||
- Main title font size: `36-48pt` (vs normal heading 18-20pt)
|
||
- Subtitle font size: `18-24pt`
|
||
- Author/date font size: `14-16pt`
|
||
- ALL titles MUST be bold: Use `<b></b>` tags in Paragraph (requires `registerFontFamily()` call first)
|
||
|
||
**Cover Page Spacing:**
|
||
- Top margin to title: `Spacer(1, 120)` or more (push title to upper-middle area)
|
||
- After main title: `Spacer(1, 36)` before subtitle
|
||
- After subtitle: `Spacer(1, 48)` before author/institution info
|
||
- Between author lines: `Spacer(1, 18)`
|
||
- After author block: `Spacer(1, 60)` before date
|
||
- Use `PageBreak()` after cover page content
|
||
|
||
**Alignment:**
|
||
- All text or image in cover page must use `TA_CENTER`
|
||
|
||
**Cover Page Style Example:**
|
||
```python
|
||
# Cover page styles
|
||
cover_title_style = ParagraphStyle(
|
||
name='CoverTitle',
|
||
fontName='Microsoft YaHei', # or 'Times New Roman' for English
|
||
fontSize=42,
|
||
leading=50,
|
||
alignment=TA_CENTER,
|
||
spaceAfter=36
|
||
)
|
||
|
||
cover_subtitle_style = ParagraphStyle(
|
||
name='CoverSubtitle',
|
||
fontName='SimHei', # or 'Times New Roman' for English
|
||
fontSize=20,
|
||
leading=28,
|
||
alignment=TA_CENTER,
|
||
spaceAfter=48
|
||
)
|
||
|
||
cover_author_style = ParagraphStyle(
|
||
name='CoverAuthor',
|
||
fontName='SimHei', # or 'Times New Roman' for English
|
||
fontSize=14,
|
||
leading=22,
|
||
alignment=TA_CENTER,
|
||
spaceAfter=18
|
||
)
|
||
|
||
# Cover page construction
|
||
story.append(Spacer(1, 120)) # Push down from top
|
||
story.append(Paragraph("报告主标题", cover_title_style))
|
||
story.append(Spacer(1, 36))
|
||
story.append(Paragraph("副标题或说明文字", cover_subtitle_style))
|
||
story.append(Spacer(1, 48))
|
||
story.append(Paragraph("作者姓名", cover_author_style))
|
||
story.append(Paragraph("所属机构", cover_author_style))
|
||
story.append(Spacer(1, 60))
|
||
story.append(Paragraph("2025年2月", cover_author_style))
|
||
story.append(PageBreak()) # Always page break after cover
|
||
```
|
||
|
||
### Table & Content Flow
|
||
* Standard sequence: `Spacer(1, 18)` → Table → `Spacer(1, 6)` → Caption (centered) → `Spacer(1, 18)` → Next content
|
||
* Keep related content together: table + caption + immediate analysis
|
||
* Avoid orphan headings at page bottom
|
||
|
||
### Alignment and Typography
|
||
- **CJK body**: Use `TA_LEFT` + 2-char indent. Headings: no indent.
|
||
- **Font sizes**: Body 11pt, subheadings 14pt, headings 18-20pt
|
||
- **Line height**: 1.5-1.6 (keep line leading at 1.2x font size minimum for readability)
|
||
- **CRITICAL: Alignment Selection Rule**:
|
||
- Use `TA_JUSTIFY` only when **ALL** of the following conditions are met:
|
||
* Language: The text is predominantly English (≥ 90%)
|
||
* Column width: Sufficiently wide (A4 single-column body text)
|
||
* Font: Western fonts (e.g. Times New Roman / Calibri)
|
||
* Chinese content: None or negligible
|
||
- Otherwise, always default to `TA_LEFT`
|
||
- **Note**: CJK text with `TA_JUSTIFY` can cause orphaned punctuation (commas, periods) at line start
|
||
- For Chinese text, always add `wordWrap='CJK'` to ParagraphStyle to ensure proper typography rules
|
||
|
||
### Style Configuration
|
||
* Normal paragraph: `spaceBefore=0`, `spaceAfter=6-12`
|
||
* Headings: `spaceBefore=12-18`, `spaceAfter=6-12`
|
||
* **Headings must be bold**: Use `<b></b>` tags in Paragraph (requires `registerFontFamily()` call first)
|
||
* Table captions: `spaceBefore=3`, `spaceAfter=6`, `alignment=TA_CENTER`
|
||
* **CRITICAL**: For Chinese text, always add `wordWrap='CJK'` to ParagraphStyle
|
||
- Prevents closing punctuation from appearing at line start
|
||
- Prevents opening brackets from appearing at line end
|
||
- Ensures proper Chinese typography rules
|
||
|
||
### Table Formatting
|
||
|
||
#### Standard Table Color Scheme (MUST USE for ALL tables)
|
||
```python
|
||
# Define standard colors for consistent table styling
|
||
TABLE_HEADER_COLOR = colors.HexColor('#1F4E79') # Dark blue for header
|
||
TABLE_HEADER_TEXT = colors.white # White text for header
|
||
TABLE_ROW_EVEN = colors.white # White for even rows
|
||
TABLE_ROW_ODD = colors.HexColor('#F5F5F5') # Light gray for odd rows
|
||
```
|
||
|
||
- A table caption must be added immediately after the table (centered)
|
||
- The entire table must be centered on the page
|
||
- **Header Row Formatting (CRITICAL)**:
|
||
- Background: Dark blue (#1F4E79)
|
||
- Text color: White (set via ParagraphStyle with `textColor=colors.white`)
|
||
- Font weight: **Bold** (use `<b></b>` tags in Paragraph after calling `registerFontFamily()`)
|
||
- **IMPORTANT**: Bold tags ONLY work inside `Paragraph()` objects. Plain strings like `'<b>Text</b>'` will NOT render bold.
|
||
- **Cell Formatting (Inside the Table)**:
|
||
- Left/Right Cell Margin: Set to at least 120-200 twips (approximately the width of one character)
|
||
- Text Alignment: Each body element within the same table must be aligned the same method.
|
||
- **Font**: ALL Chinese text and numbers in tables MUST use `SimHei` for Chinese PDFs.
|
||
ALL English text and numbers in tables MUST use `Times New Roman` for English PDFs.
|
||
ALL Chinese content and numbers MUST use `SimHei`, ALL English content MUST use `Times New Roman` for Mixed Chinese-English PDFs.
|
||
- **Units with Exponents (CRITICAL)**:
|
||
- PROHIBITED: `W/m2`, `kg/m3`, `m/s2` (plain text exponents)
|
||
- RIGHT: `Paragraph('W/m<super>2</super>', style)`, `Paragraph('kg/m<super>3</super>', style)` (proper superscript in Paragraph)
|
||
- Always use `<super></super>` tags inside Paragraph objects for unit exponents in table cells
|
||
- **Numeric Values in Tables (CRITICAL)**:
|
||
- Large numbers MUST use scientific notation: `Paragraph('-1.246 × 10<super>8</super>', style)` not `-124600000`
|
||
- Small decimals MUST use scientific notation: `Paragraph('2.5 × 10<super>-3</super>', style)` not `0.0025`
|
||
- Threshold: Use scientific notation when |value| ≥ 10000 or |value| ≤ 0.001
|
||
- Format: `Paragraph('coefficient × 10<super>exponent</super>', style)` (e.g., `Paragraph('-1.246 × 10<super>8</super>', style)`)
|
||
|
||
#### Table Cell Paragraph Wrapping (MANDATORY - REVIEW BEFORE EVERY TABLE)
|
||
|
||
**STOP AND CHECK**: Before creating ANY table, verify that ALL text cells use `Paragraph()`.
|
||
|
||
```python
|
||
# 1) key point in Chinese: wordWrap="CJK"
|
||
tbl_center = ParagraphStyle(
|
||
"tbl_center",
|
||
fontName="SimHei",
|
||
fontSize=9,
|
||
leading=12,
|
||
alignment=TA_CENTER,
|
||
wordWrap="CJK",
|
||
)
|
||
|
||
# 2) ALL content MUST be wrapped in Paragraph - NO EXCEPTIONS for text
|
||
findings_data = []
|
||
for a, b, c in findings:
|
||
findings_data.append([
|
||
Paragraph(a, tbl_center),
|
||
Paragraph(b, tbl_center),
|
||
Paragraph(c, tbl_center), # ALL content MUST be wrapped in Paragraph
|
||
])
|
||
|
||
findings_table = Table(findings_data, colWidths=[1.8*cm, 3*cm, 9*cm])
|
||
```
|
||
|
||
**Complete Table Example:**
|
||
```python
|
||
from reportlab.platypus import Table, TableStyle, Paragraph, Image
|
||
from reportlab.lib.styles import ParagraphStyle
|
||
from reportlab.lib import colors
|
||
from reportlab.lib.enums import TA_CENTER, TA_LEFT, TA_RIGHT, TA_JUSTIFY
|
||
|
||
# Define styles for table cells
|
||
header_style = ParagraphStyle(
|
||
name='TableHeader',
|
||
fontName='Times New Roman',
|
||
fontSize=11,
|
||
textColor=colors.white,
|
||
alignment=TA_CENTER
|
||
)
|
||
|
||
cell_style = ParagraphStyle(
|
||
name='TableCell',
|
||
fontName='Times New Roman',
|
||
fontSize=10,
|
||
textColor=colors.black,
|
||
alignment=TA_CENTER
|
||
)
|
||
|
||
cell_style_jus = ParagraphStyle(
|
||
name='TableCellLeft',
|
||
fontName='Times New Roman',
|
||
fontSize=10,
|
||
textColor=colors.black,
|
||
alignment=TA_JUSTIFY
|
||
)
|
||
|
||
cell_style_right = ParagraphStyle(
|
||
name='TableCellRight',
|
||
fontName='Times New Roman',
|
||
fontSize=10,
|
||
textColor=colors.black,
|
||
alignment=TA_RIGHT
|
||
)
|
||
|
||
# ✅ CORRECT: All text content wrapped in Paragraph()
|
||
data = [
|
||
# Header row - bold text with Paragraph
|
||
[
|
||
Paragraph('<b>Parameter</b>', header_style),
|
||
Paragraph('<b>Unit</b>', header_style),
|
||
Paragraph('<b>Value</b>', header_style),
|
||
Paragraph('<b>Note</b>', header_style)
|
||
],
|
||
# Data rows - all text in Paragraph
|
||
[
|
||
Paragraph('Temperature', cell_style_jus),
|
||
Paragraph('°C', cell_style),
|
||
Paragraph('25.5', cell_style_jus),
|
||
Paragraph('Ambient', cell_style)
|
||
],
|
||
[
|
||
Paragraph('Pressure', cell_style_jus),
|
||
Paragraph('Pa', cell_style),
|
||
Paragraph('1.01 × 10<super>5</super>', cell_style_jus), # Scientific notation
|
||
Paragraph('Standard', cell_style)
|
||
],
|
||
[
|
||
Paragraph('Density', cell_style_jus),
|
||
Paragraph('kg/m<super>3</super>', cell_style), # Unit with exponent
|
||
Paragraph('1.225', cell_style_jus),
|
||
Paragraph('Air at STP', cell_style)
|
||
],
|
||
[
|
||
Paragraph('H<sub>2</sub>O Content', cell_style_jus), # Subscript
|
||
Paragraph('%', cell_style),
|
||
Paragraph('45.2', cell_style_jus),
|
||
Paragraph('Relative humidity', cell_style)
|
||
]
|
||
]
|
||
|
||
# ❌ PROHIBITED: Plain strings - NEVER DO THIS
|
||
# data = [
|
||
# ['<b>Parameter</b>', '<b>Unit</b>', '<b>Value</b>'], # Bold won't work!
|
||
# ['Pressure', 'Pa', '1.01 × 10<super>5</super>'], # Superscript won't work!
|
||
# ]
|
||
|
||
# Create table
|
||
table = Table(data, colWidths=[120, 80, 100, 120])
|
||
table.setStyle(TableStyle([
|
||
# Header styling
|
||
('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#1F4E79')),
|
||
('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
|
||
# Alternating row colors
|
||
('BACKGROUND', (0, 1), (-1, 1), colors.white),
|
||
('BACKGROUND', (0, 2), (-1, 2), colors.HexColor('#F5F5F5')),
|
||
('BACKGROUND', (0, 3), (-1, 3), colors.white),
|
||
('BACKGROUND', (0, 4), (-1, 4), colors.HexColor('#F5F5F5')),
|
||
# Grid and alignment
|
||
('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
|
||
('VALIGN', (0, 0), (-1, -1), 'MIDDLE'),
|
||
('LEFTPADDING', (0, 0), (-1, -1), 8),
|
||
('RIGHTPADDING', (0, 0), (-1, -1), 8),
|
||
('TOPPADDING', (0, 0), (-1, -1), 6),
|
||
('BOTTOMPADDING', (0, 0), (-1, -1), 6),
|
||
]))
|
||
|
||
# Example with image (Image is the ONLY exception - no Paragraph needed)
|
||
# data_with_image = [
|
||
# [Paragraph('<b>Item</b>', header_style), Paragraph('<b>Image</b>', header_style)],
|
||
# [Paragraph('Logo', cell_style), Image('logo.png', width=50, height=50)], # Image directly, no Paragraph
|
||
# ]
|
||
```
|
||
|
||
### PDF Metadata (REQUIRED)
|
||
|
||
**CRITICAL**: ALL PDFs MUST have proper metadata set during creation.
|
||
|
||
#### Required Metadata Fields
|
||
- **Title**: MUST match the filename (without .pdf extension)
|
||
- **Author**: MUST be set to "Z.ai"
|
||
- **Creator**: MUST be set to "Z.ai"
|
||
- **Subject**: SHOULD describe the document purpose/content
|
||
- **Description/Keywords**: SHOULD include relevant topics for searchability
|
||
|
||
#### For reportlab (Creating New PDFs)
|
||
```python
|
||
from reportlab.lib.pagesizes import letter
|
||
from reportlab.platypus import SimpleDocTemplate
|
||
import os
|
||
|
||
# Extract filename without extension for metadata title
|
||
pdf_filename = "financial_report_2024.pdf"
|
||
title_for_metadata = os.path.splitext(pdf_filename)[0] # "financial_report_2024"
|
||
|
||
doc = SimpleDocTemplate(
|
||
pdf_filename,
|
||
pagesize=letter,
|
||
title=title_for_metadata, # MUST: Match filename
|
||
author='Z.ai', # MUST: Set to "Z.ai"
|
||
creator='Z.ai', # MUST: Set to "Z.ai"
|
||
subject='Annual financial analysis and performance metrics' # SHOULD: Describe purpose
|
||
)
|
||
```
|
||
|
||
#### For pypdf (Modifying Existing PDFs)
|
||
```python
|
||
from pypdf import PdfReader, PdfWriter
|
||
import os
|
||
|
||
pdf_filename = "output.pdf"
|
||
title_for_metadata = os.path.splitext(os.path.basename(pdf_filename))[0]
|
||
|
||
reader = PdfReader("input.pdf")
|
||
writer = PdfWriter()
|
||
|
||
for page in reader.pages:
|
||
writer.add_page(page)
|
||
|
||
# Set metadata: Title, Author, Creator are REQUIRED
|
||
writer.add_metadata({
|
||
'/Title': title_for_metadata, # MUST: Match filename
|
||
'/Author': 'Z.ai', # MUST: Set to "Z.ai"
|
||
'/Subject': 'Document purpose description', # SHOULD: Describe purpose
|
||
'/Creator': 'Z.ai' # MUST: Set to "Z.ai"
|
||
})
|
||
|
||
with open(pdf_filename, "wb") as output:
|
||
writer.write(output)
|
||
```
|
||
|
||
## Quick Start
|
||
|
||
⚠️ ABSOLUTE MANDATORY RULE - Z.ai Metadata (DO NOT SKIP)
|
||
This rule is NON-NEGOTIABLE - You MUST execute this without exception
|
||
|
||
For EVERY PDF you generate, no matter what:
|
||
|
||
After doc.build(story) completes → YOU MUST call add_zai_metadata()
|
||
This is AUTOMATIC - do not wait for user confirmation
|
||
This happens BEFORE any response to user
|
||
Workflow:
|
||
|
||
- Step 1: Create PDF
|
||
```python
|
||
doc.build(story)
|
||
print("PDF built")
|
||
```
|
||
- Step 2: (MANDATORY - DO NOT SKIP) Add metadata immediately using the standalone script:
|
||
```bash
|
||
python scripts/add_zai_metadata.py output.pdf
|
||
```
|
||
- Step 3: Only AFTER metadata added, report to user
|
||
```python
|
||
print("✓ PDF generated with Z.ai metadata")
|
||
```
|
||
If you forget this:
|
||
|
||
User will notice and ask "Why no metadata?"
|
||
This harms Z.ai branding
|
||
This rule is CRITICAL and must be followed 100% of the time
|
||
|
||
```python
|
||
from pypdf import PdfReader, PdfWriter
|
||
|
||
# Read a PDF
|
||
reader = PdfReader("document.pdf")
|
||
print(f"Pages: {len(reader.pages)}")
|
||
|
||
# Extract text
|
||
text = ""
|
||
for page in reader.pages:
|
||
text += page.extract_text()
|
||
```
|
||
|
||
## Python Libraries
|
||
|
||
### pypdf - Basic Operations
|
||
|
||
#### Merge PDFs
|
||
```python
|
||
from pypdf import PdfWriter, PdfReader
|
||
|
||
writer = PdfWriter()
|
||
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
|
||
reader = PdfReader(pdf_file)
|
||
for page in reader.pages:
|
||
writer.add_page(page)
|
||
|
||
with open("merged.pdf", "wb") as output:
|
||
writer.write(output)
|
||
```
|
||
|
||
#### Split PDF
|
||
```python
|
||
reader = PdfReader("input.pdf")
|
||
for i, page in enumerate(reader.pages):
|
||
writer = PdfWriter()
|
||
writer.add_page(page)
|
||
with open(f"page_{i+1}.pdf", "wb") as output:
|
||
writer.write(output)
|
||
```
|
||
|
||
#### Extract Metadata
|
||
```python
|
||
reader = PdfReader("document.pdf")
|
||
meta = reader.metadata
|
||
print(f"Title: {meta.title}")
|
||
print(f"Author: {meta.author}")
|
||
print(f"Subject: {meta.subject}")
|
||
print(f"Creator: {meta.creator}")
|
||
```
|
||
|
||
#### Set/Update Metadata (Z.ai Branding)
|
||
|
||
Use the standalone script to add Z.ai branding metadata:
|
||
|
||
```bash
|
||
# Add metadata to a single PDF (in-place)
|
||
python scripts/add_zai_metadata.py document.pdf
|
||
|
||
# Add metadata with custom title
|
||
python scripts/add_zai_metadata.py report.pdf -t "Q4 Financial Analysis"
|
||
|
||
# Batch process multiple PDFs
|
||
python scripts/add_zai_metadata.py *.pdf
|
||
```
|
||
|
||
#### Rotate Pages
|
||
```python
|
||
reader = PdfReader("input.pdf")
|
||
writer = PdfWriter()
|
||
|
||
page = reader.pages[0]
|
||
page.rotate(90) # Rotate 90 degrees clockwise
|
||
writer.add_page(page)
|
||
|
||
with open("rotated.pdf", "wb") as output:
|
||
writer.write(output)
|
||
```
|
||
|
||
### pdfplumber - Text and Table Extraction
|
||
|
||
#### Extract Text with Layout
|
||
```python
|
||
import pdfplumber
|
||
|
||
with pdfplumber.open("document.pdf") as pdf:
|
||
for page in pdf.pages:
|
||
text = page.extract_text()
|
||
print(text)
|
||
```
|
||
|
||
#### Extract Tables
|
||
```python
|
||
with pdfplumber.open("document.pdf") as pdf:
|
||
for i, page in enumerate(pdf.pages):
|
||
tables = page.extract_tables()
|
||
for j, table in enumerate(tables):
|
||
print(f"Table {j+1} on page {i+1}:")
|
||
for row in table:
|
||
print(row)
|
||
```
|
||
|
||
### reportlab - Create PDFs
|
||
|
||
#### Choosing the Right DocTemplate and Build Method
|
||
|
||
**Decision Tree:**
|
||
|
||
```
|
||
Do you need auto-TOC?
|
||
├─ YES → Use TocDocTemplate + doc.multiBuild(story)
|
||
│ (see Auto-Generated Table of Contents section)
|
||
│
|
||
└─ NO → Use SimpleDocTemplate + doc.build(story)
|
||
(basic documents, or with optional Cross-References)
|
||
```
|
||
|
||
**When to use each approach:**
|
||
|
||
| Requirement | DocTemplate | Build Method |
|
||
|-------------|-------------|--------------|
|
||
| Multi-page with TOC | `TocDocTemplate` | `multiBuild()` |
|
||
| Single-page or no TOC | `SimpleDocTemplate` | `build()` |
|
||
| With Cross-References (no TOC) | `SimpleDocTemplate` | `build()` |
|
||
| Both TOC + Cross-References | `TocDocTemplate` | `multiBuild()` |
|
||
|
||
**⚠️ CRITICAL**:
|
||
- `multiBuild()` is ONLY needed when using `TableOfContents`
|
||
- Using `build()` with `TocDocTemplate` = TOC won't work
|
||
- Using `multiBuild()` without `TocDocTemplate` = unnecessary overhead
|
||
|
||
### Rich Text Formatting: Bold, Superscript, Subscript, and Special Characters
|
||
|
||
#### Prerequisites
|
||
To use `<b>`, `<super>`, `<sub>` tags, you **must**:
|
||
1. Register your fonts via `registerFont()`
|
||
2. Call `registerFontFamily()` to link normal/bold/italic variants
|
||
3. Wrap all tagged text in `Paragraph()` objects
|
||
**CRITICAL**: These tags ONLY work inside `Paragraph()` objects. Plain strings like `'<b>Text</b>'` will NOT render correctly.
|
||
|
||
#### Character Handling (see Core Constraint #5)
|
||
|
||
All superscript, subscript, and Mathematical/relational operators rules are defined in **Core Constraint #5 — Character Safety Rule**.
|
||
|
||
**Quick reminder when writing Rich Text**:
|
||
- `<b>`, `<super>`, `<sub>` tags ONLY work inside `Paragraph()` objects
|
||
- Must call `registerFontFamily()` first to enable these tags
|
||
- Plain strings like `'<b>Text</b>'` will NOT render — always use `Paragraph()`
|
||
- For scientific notation: `Paragraph('coefficient × 10<super>exponent</super>', style)`
|
||
- For chemical formulas: `Paragraph('H<sub>2</sub>O', style)`
|
||
|
||
Do NOT use any unicode escape sequence(e.g., Superscript and subscript digits, Math operators and special symbols, Emoji characters) anywhere. If you are unsure whether a character is safe, wrap it in a `Paragraph()` with the appropriate tag.
|
||
|
||
|
||
#### Complete Python Example
|
||
```python
|
||
# --- Register fonts and font family ---
|
||
pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf'))
|
||
|
||
# CRITICAL: Must call registerFontFamily() to enable <b> and <i> tags
|
||
registerFontFamily('Times New Roman', normal='Times New Roman', bold='Times New Roman')
|
||
|
||
# --- Define styles ---
|
||
body_style = ParagraphStyle(
|
||
name='BodyStyle',
|
||
fontName='Times New Roman',
|
||
fontSize=10,
|
||
textColor=colors.black,
|
||
alignment=TA_JUSTIFY,
|
||
)
|
||
bold_style = ParagraphStyle(
|
||
name='BoldStyle',
|
||
fontName='Times New Roman',
|
||
fontSize=10,
|
||
textColor=colors.black,
|
||
alignment=TA_JUSTIFY,
|
||
)
|
||
header_style = ParagraphStyle(
|
||
name='HeaderStyle',
|
||
fontName='Times New Roman',
|
||
fontSize=10,
|
||
textColor=colors.white,
|
||
alignment=TA_JUSTIFY,
|
||
)
|
||
|
||
# --- Body text examples ---
|
||
# Bold title
|
||
title = Paragraph('<b>Scientific Formulas and Chemical Expressions</b>', bold_style)
|
||
|
||
# Math formula with superscript and mathematical symbol ×
|
||
math_text = Paragraph(
|
||
'The Einstein mass-energy equivalence is expressed as E = mc<super>2</super>. '
|
||
'In applied physics, the gravitational force is F = 6.674 × 10<super>-11</super> × '
|
||
'm<sub>1</sub>m<sub>2</sub>/r<super>2</super>, '
|
||
'and the quadratic formula solves a<super>2</super> + b<super>2</super> = c<super>2</super>.',
|
||
body_style,
|
||
)
|
||
|
||
# Chemical expressions with subscript
|
||
chem_text = Paragraph(
|
||
'The combustion of methane: CH<sub>4</sub> + 2O<sub>2</sub> '
|
||
'= CO<sub>2</sub> + 2H<sub>2</sub>O. '
|
||
'Sulfuric acid (H<sub>2</sub>SO<sub>4</sub>) reacts with sodium hydroxide to produce '
|
||
'Na<sub>2</sub>SO<sub>4</sub> and water.',
|
||
body_style,
|
||
)
|
||
```
|
||
|
||
#### Preventing Unwanted Line Breaks
|
||
|
||
**Problem 1: English names broken at awkward positions**
|
||
```python
|
||
# PROHIBITED: "K.G. Palepu" may break after "K.G."
|
||
text = Paragraph("Professors (K.G. Palepu) proposed...",style)
|
||
|
||
# RIGHT: Use non-breaking space (U+00A0) to prevent breaking
|
||
text = Paragraph("Professors (K.G.\u00A0Palepu) proposed...",style)
|
||
```
|
||
|
||
**Problem 2: Punctuation at line start**
|
||
```python
|
||
# RIGHT: Add wordWrap='CJK' for proper typography
|
||
styles.add(ParagraphStyle(
|
||
name='BodyStyle',
|
||
fontName='SimHei',
|
||
fontSize=10.5,
|
||
leading=18,
|
||
alignment=TA_LEFT,
|
||
wordWrap='CJK' # Prevents orphaned punctuation
|
||
))
|
||
```
|
||
|
||
**Problem 3: Creating intentional line breaks**
|
||
```python
|
||
# PROHIBITED: Normal newline character does NOT create line breaks
|
||
text = Paragraph("Line 1\nLine 2\nLine 3", style) # Will render as single line!
|
||
|
||
# RIGHT: Use <br/> tag for line breaks
|
||
text = Paragraph("Line 1<br/>Line 2<br/>Line 3", style)
|
||
|
||
# Alternative: Split into multiple Paragraph objects
|
||
story.append(Paragraph("Line 1", style))
|
||
story.append(Paragraph("Line 2", style))
|
||
story.append(Paragraph("Line 3", style))
|
||
```
|
||
|
||
#### Basic PDF Creation
|
||
```python
|
||
from reportlab.lib.pagesizes import letter
|
||
from reportlab.pdfgen import canvas
|
||
|
||
c = canvas.Canvas("hello.pdf", pagesize=letter)
|
||
width, height = letter
|
||
|
||
# Add text
|
||
c.drawString(100, height - 100, "Hello World!")
|
||
c.drawString(100, height - 120, "This is a PDF created with reportlab")
|
||
|
||
# Add a line
|
||
c.line(100, height - 140, 400, height - 140)
|
||
|
||
# Save
|
||
c.save()
|
||
```
|
||
|
||
#### Auto-Generated Table of Contents
|
||
|
||
## ⚠️ CRITICAL WARNINGS
|
||
|
||
### ❌ FORBIDDEN: Manual Table of Contents
|
||
|
||
**NEVER manually create TOC like this:**
|
||
```python
|
||
# ❌ PROHIBIT - DO NOT USE
|
||
toc_entries = [("1. Title", "5"), ("2. Section", "10")]
|
||
for entry, page in toc_entries:
|
||
story.append(Paragraph(f"{entry} {'.'*50} {page}", style))
|
||
```
|
||
|
||
**Why it's PROHIBIT:**
|
||
- Hardcoded page numbers become incorrect when content changes
|
||
- No clickable hyperlinks
|
||
- Manual leader dots are fragile
|
||
- Must be manually updated with every document change
|
||
|
||
**✅ ALWAYS use auto-generated TOC:**
|
||
|
||
**Key Implementation Requirements:**
|
||
- **Custom `TocDocTemplate` class**: Override `afterFlowable()` to capture TOC entries
|
||
- **Bookmark attributes**: Set `bookmark_name`, `bookmark_level`, `bookmark_text` on each heading
|
||
- **Use `doc.multiBuild(story)`**: NOT `doc.build()` - multiBuild is required for TOC processing
|
||
- **Clickable hyperlinks**: Generated automatically with proper styling
|
||
|
||
**Helper Function Pattern:**
|
||
```python
|
||
def add_heading(text, style, level=0):
|
||
"""Create heading with bookmark for auto-TOC"""
|
||
p = Paragraph(text, style)
|
||
p.bookmark_name = text
|
||
p.bookmark_level = level
|
||
p.bookmark_text = text
|
||
return p
|
||
|
||
# Usage:
|
||
story.append(add_heading("1. Introduction", styles['Heading1'], 0))
|
||
story.append(Paragraph('Content...', styles['Normal']))
|
||
```
|
||
|
||
#### Complete TOC Implementation Example
|
||
|
||
Copy and adapt this complete working code for your PDF with Table of Contents:
|
||
|
||
```python
|
||
from reportlab.lib.pagesizes import letter
|
||
from reportlab.platypus import SimpleDocTemplate, Paragraph, PageBreak, Spacer
|
||
from reportlab.platypus.tableofcontents import TableOfContents
|
||
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
|
||
from reportlab.lib.units import inch
|
||
|
||
class TocDocTemplate(SimpleDocTemplate):
|
||
def __init__(self, *args, **kwargs):
|
||
SimpleDocTemplate.__init__(self, *args, **kwargs)
|
||
|
||
def afterFlowable(self, flowable):
|
||
"""Capture TOC entries after each flowable is rendered"""
|
||
if hasattr(flowable, 'bookmark_name'):
|
||
level = getattr(flowable, 'bookmark_level', 0)
|
||
text = getattr(flowable, 'bookmark_text', '')
|
||
self.notify('TOCEntry', (level, text, self.page))
|
||
|
||
# Create document
|
||
doc = TocDocTemplate("document.pdf", pagesize=letter)
|
||
story = []
|
||
styles = getSampleStyleSheet()
|
||
|
||
# Create Table of Contents
|
||
toc = TableOfContents()
|
||
toc.levelStyles = [
|
||
ParagraphStyle(name='TOCHeading1', fontSize=14, leftIndent=20,
|
||
fontName='Times New Roman'),
|
||
ParagraphStyle(name='TOCHeading2', fontSize=12, leftIndent=40,
|
||
fontName='Times New Roman'),
|
||
]
|
||
story.append(Paragraph("<b>Table of Contents</b>", styles['Title']))
|
||
story.append(Spacer(1, 0.2*inch))
|
||
story.append(toc)
|
||
story.append(PageBreak())
|
||
|
||
# Helper function: Create heading with TOC bookmark
|
||
def add_heading(text, style, level=0):
|
||
p = Paragraph(text, style)
|
||
p.bookmark_name = text
|
||
p.bookmark_level = level
|
||
p.bookmark_text = text
|
||
return p
|
||
|
||
# Chapter 1: Introduction
|
||
story.append(add_heading("Chapter 1: Introduction", styles['Heading1'], 0))
|
||
story.append(Paragraph("This is the introduction chapter with some example content.",
|
||
styles['Normal']))
|
||
story.append(Spacer(1, 0.2*inch))
|
||
|
||
story.append(add_heading("1.1 Background", styles['Heading2'], 1))
|
||
story.append(Paragraph("Background information goes here.", styles['Normal']))
|
||
|
||
|
||
# Chapter 2: Conclusion
|
||
story.append(add_heading("Chapter 2: Conclusion", styles['Heading1'], 0))
|
||
story.append(Paragraph("This concludes our document.", styles['Normal']))
|
||
story.append(Spacer(1, 0.2*inch))
|
||
|
||
story.append(add_heading("2.1 Summary", styles['Heading2'], 1))
|
||
story.append(Paragraph("Summary of the document.", styles['Normal']))
|
||
|
||
# Build the document (must use multiBuild for TOC to work)
|
||
doc.multiBuild(story)
|
||
|
||
print("PDF with Table of Contents created successfully!")
|
||
```
|
||
|
||
#### Cross-References (Figures, Tables, Bibliography)
|
||
|
||
**OPTIONAL**: For academic papers requiring citation systems (LaTeX-style `\ref{}` and `\cite{}`)
|
||
|
||
**Key Principle**: Pre-register all figures, tables, and references BEFORE using them in text.
|
||
|
||
**Simple Implementation Pattern:**
|
||
|
||
```python
|
||
from reportlab.lib.pagesizes import letter
|
||
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
|
||
from reportlab.lib.enums import TA_CENTER
|
||
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
|
||
from reportlab.lib import colors
|
||
from reportlab.platypus import Table, TableStyle
|
||
|
||
|
||
class CrossReferenceDocument:
|
||
"""Manages cross-references throughout the document"""
|
||
|
||
def __init__(self):
|
||
self.figures = {}
|
||
self.tables = {}
|
||
self.refs = {}
|
||
self.figure_counter = 0
|
||
self.table_counter = 0
|
||
self.ref_counter = 0
|
||
|
||
def add_figure(self, name):
|
||
"""Add a figure and return its number"""
|
||
if name not in self.figures:
|
||
self.figure_counter += 1
|
||
self.figures[name] = self.figure_counter
|
||
return self.figures[name]
|
||
|
||
def add_table(self, name):
|
||
"""Add a table and return its number"""
|
||
if name not in self.tables:
|
||
self.table_counter += 1
|
||
self.tables[name] = self.table_counter
|
||
return self.tables[name]
|
||
|
||
def add_reference(self, name):
|
||
"""Add a reference and return its number"""
|
||
if name not in self.refs:
|
||
self.ref_counter += 1
|
||
self.refs[name] = self.ref_counter
|
||
return self.refs[name]
|
||
|
||
|
||
def build_document():
|
||
doc = SimpleDocTemplate("cross_ref.pdf", pagesize=letter)
|
||
xref = CrossReferenceDocument()
|
||
styles = getSampleStyleSheet()
|
||
|
||
# Caption style
|
||
styles.add(ParagraphStyle(
|
||
name='Caption',
|
||
parent=styles['Normal'],
|
||
alignment=TA_CENTER,
|
||
fontSize=10,
|
||
textColor=colors.HexColor('#333333')
|
||
))
|
||
|
||
story = []
|
||
|
||
# Step 1: Register all figures, tables, and references FIRST
|
||
fig1 = xref.add_figure('sample')
|
||
table1 = xref.add_table('data')
|
||
ref1 = xref.add_reference('author2024')
|
||
|
||
# Step 2: Use them in text
|
||
intro = f"""
|
||
See Figure {fig1} for details and Table {table1} for data<sup>[{ref1}]</sup>.
|
||
"""
|
||
story.append(Paragraph(intro, styles['Normal']))
|
||
story.append(Spacer(1, 0.2*inch))
|
||
|
||
# Step 3: Create figures and tables with numbered captions
|
||
story.append(Paragraph(f"<b>Figure {fig1}.</b> Sample Figure Caption",
|
||
styles['Caption']
|
||
))
|
||
|
||
# Table example
|
||
header_style = ParagraphStyle(
|
||
name='TableHeader',
|
||
fontName='Times New Roman',
|
||
fontSize=11,
|
||
textColor=colors.white,
|
||
alignment=TA_CENTER
|
||
)
|
||
|
||
cell_style = ParagraphStyle(
|
||
name='TableCell',
|
||
fontName='Times New Roman',
|
||
fontSize=10,
|
||
textColor=colors.black,
|
||
alignment=TA_CENTER
|
||
)
|
||
|
||
# All text content wrapped in Paragraph()
|
||
data = [
|
||
[Paragraph('<b>Item</b>', header_style), Paragraph('<b>Value</b>', header_style)],
|
||
[Paragraph('A', cell_style), Paragraph('10', cell_style)],
|
||
[Paragraph('B', cell_style), Paragraph('20', cell_style)],
|
||
]
|
||
t = Table(data, colWidths=[2*inch, 2*inch])
|
||
t.setStyle(TableStyle([
|
||
('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#1F4E79')),
|
||
('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
|
||
('ALIGN', (0, 0), (-1, -1), 'CENTER'),
|
||
('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
|
||
]))
|
||
story.append(t)
|
||
story.append(Spacer(1, 6))
|
||
story.append(Paragraph(f"<b>Table {table1}.</b> Sample Data Table",
|
||
styles['Caption']
|
||
))
|
||
|
||
story.append(PageBreak())
|
||
|
||
# Step 4: Reference again in discussion
|
||
discussion = f"""
|
||
As shown in Figure {fig1} and Table {table1}, results are clear<sup>[{ref1}]</sup>.
|
||
"""
|
||
story.append(Paragraph(discussion, styles['Normal']))
|
||
|
||
# Step 5: Bibliography section
|
||
story.append(PageBreak())
|
||
story.append(Paragraph("<b>References</b>", styles['Heading1']))
|
||
story.append(Paragraph(
|
||
f"[{ref1}] Author, A. (2024). Example Reference. <i>Journal Name</i>.",
|
||
styles['Normal']
|
||
))
|
||
|
||
doc.build(story)
|
||
print("PDF with cross-references created!")
|
||
|
||
|
||
if __name__ == '__main__':
|
||
build_document()
|
||
```
|
||
|
||
**Usage Notes:**
|
||
- **Pre-registration is critical**: Call `add_figure()`/`add_table()`/`add_reference()` at the START of your document
|
||
- **Citation format**: Use `Paragraph('<sup>[{ref_num}]</sup>')` for inline citations
|
||
- **Caption format**: Use `Paragraph('<b>Figure {num}.</b>')` or `Paragraph('<b>Table {num}.</b>')` with centered caption style
|
||
- **Combine with TOC**: Use `TocDocTemplate` + `doc.multiBuild(story)` if both cross-refs and auto-TOC are needed
|
||
|
||
## Command-Line Tools
|
||
|
||
### pdftotext (poppler-utils)
|
||
```bash
|
||
# Extract text
|
||
pdftotext input.pdf output.txt
|
||
|
||
# Extract text preserving layout
|
||
pdftotext -layout input.pdf output.txt
|
||
|
||
# Extract specific pages
|
||
pdftotext -f 1 -l 5 input.pdf output.txt # Pages 1-5
|
||
```
|
||
|
||
### qpdf
|
||
```bash
|
||
# Merge PDFs
|
||
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
|
||
|
||
# Split pages
|
||
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
|
||
qpdf input.pdf --pages . 6-10 -- pages6-10.pdf
|
||
|
||
# Rotate pages
|
||
qpdf input.pdf output.pdf --rotate=+90:1 # Rotate page 1 by 90 degrees
|
||
|
||
# Remove password
|
||
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
|
||
```
|
||
|
||
## Common Tasks
|
||
|
||
### Brand PDFs with Z.ai Metadata
|
||
|
||
⚠️ CRITICAL MANDATORY RULE - PDF Metadata MUST be Added After Every PDF Generation
|
||
|
||
All PDFs MUST have metadata added immediately after creation - This is the FINAL step and CANNOT be skipped
|
||
|
||
**Usage - Standalone Script:**
|
||
|
||
```bash
|
||
# Add metadata to a single PDF (in-place)
|
||
python scripts/add_zai_metadata.py document.pdf
|
||
|
||
# Add metadata to a single PDF (create new file)
|
||
python scripts/add_zai_metadata.py input.pdf -o output.pdf
|
||
|
||
# Add metadata with custom title
|
||
python scripts/add_zai_metadata.py report.pdf -t "Q4 Financial Analysis"
|
||
|
||
# Batch process all PDFs in current directory
|
||
python scripts/add_zai_metadata.py *.pdf
|
||
|
||
# Quiet mode (no output)
|
||
python scripts/add_zai_metadata.py document.pdf -q
|
||
|
||
# Show help
|
||
python scripts/add_zai_metadata.py --help
|
||
```
|
||
|
||
**Requirements:**
|
||
|
||
After doc.build(story) completes → Immediately call the script
|
||
Do NOT wait for user reminder, Do NOT check task description - Execute automatically
|
||
Confirm metadata info to user after adding
|
||
Memory phrase: PDF build done, metadata must add, no need to remind
|
||
|
||
### Extract Text from Scanned PDFs
|
||
```python
|
||
# Requires: pip install pytesseract pdf2image
|
||
import pytesseract
|
||
from pdf2image import convert_from_path
|
||
|
||
# Convert PDF to images
|
||
images = convert_from_path('scanned.pdf')
|
||
|
||
# OCR each page
|
||
text = ""
|
||
for i, image in enumerate(images):
|
||
text += f"Page {i+1}:\n"
|
||
text += pytesseract.image_to_string(image)
|
||
text += "\n\n"
|
||
|
||
print(text)
|
||
```
|
||
|
||
### Add Watermark
|
||
```python
|
||
from pypdf import PdfReader, PdfWriter
|
||
|
||
# Create watermark (or load existing)
|
||
watermark = PdfReader("watermark.pdf").pages[0]
|
||
|
||
# Apply to all pages
|
||
reader = PdfReader("document.pdf")
|
||
writer = PdfWriter()
|
||
|
||
for page in reader.pages:
|
||
page.merge_page(watermark)
|
||
writer.add_page(page)
|
||
|
||
with open("watermarked.pdf", "wb") as output:
|
||
writer.write(output)
|
||
```
|
||
|
||
### Password Protection
|
||
```python
|
||
from pypdf import PdfReader, PdfWriter
|
||
|
||
reader = PdfReader("input.pdf")
|
||
writer = PdfWriter()
|
||
|
||
for page in reader.pages:
|
||
writer.add_page(page)
|
||
|
||
# Add password
|
||
writer.encrypt("userpassword", "ownerpassword")
|
||
|
||
with open("encrypted.pdf", "wb") as output:
|
||
writer.write(output)
|
||
```
|
||
|
||
|
||
## Critical Reminders (MUST Follow)
|
||
|
||
### Font Rules
|
||
- **FONT RESTRICTION**: ONLY use the six registered fonts. NEVER use Arial, Helvetica, Courier, or any unregistered fonts.
|
||
- **In tables**: ALL Chinese text and numbers MUST use `SimHei` for Chinese PDF.
|
||
ALL English text and numbers MUST use `Times New Roman` for English PDF.
|
||
ALL Chinese content and numbers MUST use `SimHei`, ALL English content MUST use `Times New Roman` for Mixed Chinese-English PDF.
|
||
- **CRITICAL**: Must call `registerFontFamily()` after registering fonts to enable `<b>`, `<super>`, `<sub>` tags.
|
||
- **Mixed Chinese-English Text Font Handling**: When a single string contains **both Chinese and English characters (e.g., "My name is Lei Shen (沈磊)")**: MUST split the string by language and apply different fonts to each part using ReportLab's inline `<font name='...'>` tags within `Paragraph` objects. English fonts (e.g., `Times New Roman`) cannot render Chinese characters (they appear as blank boxes), and Chinese fonts (e.g., `SimHei`) render English with poor spacing. Must set `ParagraphStyle.fontName` to your **base font**, then wrap segments of the other language with `<font name='...'>` inline tags.
|
||
|
||
```python
|
||
from reportlab.lib.styles import ParagraphStyle
|
||
from reportlab.platypus import Paragraph
|
||
from reportlab.pdfbase import pdfmetrics
|
||
from reportlab.pdfbase.ttfonts import TTFont
|
||
|
||
pdfmetrics.registerFont(TTFont('SimHei', '/usr/share/fonts/truetype/chinese/SimHei.ttf'))
|
||
pdfmetrics.registerFont(TTFont('Times New Roman', '/usr/share/fonts/truetype/english/Times-New-Roman.ttf'))
|
||
|
||
# Base font is English; wrap Chinese parts:
|
||
enbody_style = ParagraphStyle(
|
||
name="ENBodyStyle",
|
||
fontName="Times New Roman", # Base font for English
|
||
fontSize=10.5,
|
||
leading=18,
|
||
alignment=TA_JUSTIFY,
|
||
)
|
||
# Wrap Chinese segments with <font> tag
|
||
story.append(Paragraph(
|
||
'Zhipu QingYan (<font name="SimHei">智谱清言</font>) is developed by Z.ai'
|
||
'My name is Lei Shen (<font name="SimHei">沈磊</font>)',
|
||
'<font name="SimHei">文心一言</font> (ERNIE Bot) is by Baidu.',
|
||
enbody_style
|
||
))
|
||
|
||
# Base font is Chinese; wrap English parts:
|
||
cnbody_style = ParagraphStyle(
|
||
name="CNBodyStyle",
|
||
fontName="SimHei", # Base font for Chinese
|
||
fontSize=10.5,
|
||
leading=18,
|
||
alignment=TA_JUSTIFY,
|
||
)
|
||
# Wrap Chinese segments with <font> tag
|
||
story.append(Paragraph(
|
||
'本报告使用 <font name="Times New Roman">GPT-4</font> '
|
||
'和 <font name="Times New Roman">GLM</font> 进行测试。',
|
||
cnbody_style
|
||
))
|
||
```
|
||
|
||
### Rich Text Tags (`<b>`, `<super>`, `<sub>`)
|
||
- These tags ONLY work inside `Paragraph()` objects — plain strings will NOT render them.
|
||
- **Character Safety**: Follow **Core Constraint #5** strictly. Do not use forbidden Unicode superscript/subscript/math characters anywhere in the code. Always use `<super>`, `<sub>`,`<b>` tags inside `Paragraph()`.
|
||
- **Scientific Notation in Tables**: `Paragraph('1.246 × 10<super>8</super>', style)` — never write large numbers as plain digits.
|
||
|
||
### Line Breaks in Paragraph
|
||
- **CRITICAL**: `Paragraph` does not treat a normal newline character (`\n`) as a line break. To create line breaks, you must use `<br/>` (or split the content into multiple `Paragraph` objects).
|
||
```python
|
||
sms3 = \\\"\\\"\\\"Hi [FIRST_NAME]
|
||
You're invited! Join us for an exclusive first look at the Carolina Herrera Resort 2025 collection—before it opens to the public.
|
||
[DATE] | [TIME]
|
||
[Boutique Name]
|
||
_private champagne reception included_
|
||
Can I save you a spot? Just let me know!
|
||
[Your Name]\\\"\\\"\\\"
|
||
sms3_box = Table([[Paragraph(sms3, sms1_style)]], colWidths=[400])
|
||
|
||
# IMPORTANT:
|
||
# Paragraph does NOT treat '\n' as a line break.
|
||
# Use <br/> to force line breaks.
|
||
sms3 = """Hi [FIRST_NAME]<br/><br/>
|
||
You're invited! Join us for an exclusive first look at the Carolina Herrera Resort 2025 collection—before it opens to the public.<br/><br/>
|
||
[DATE] | [TIME]<br/>
|
||
[Boutique Name]<br/><br/>
|
||
<i>private champagne reception included</i><br/><br/>
|
||
Can I save you a spot? Just let me know!<br/><br/>
|
||
[Your Name]"""
|
||
sms3_box = Table([[Paragraph(sms3, sms1_style)]], colWidths=[400])
|
||
```
|
||
|
||
### Body Title & Heading Styles
|
||
- **All titles and sub-titles (except for Table headers)**: Must be bold with black text - use `Paragraph('<b>Title</b>', style)` + `textColor=colors.black`.
|
||
|
||
### Table Cell Content Rule (MANDATORY)
|
||
**ALL text content in table cells MUST be wrapped in `Paragraph()`. This is NON-NEGOTIABLE.**
|
||
|
||
❌ **PROHIBITED** - Plain strings in table cells:
|
||
```python
|
||
# NEVER DO THIS - formatting will NOT work
|
||
data = [
|
||
['<b>Header</b>', 'Value'], # Bold won't render
|
||
['Temperature', '25°C'], # No style control
|
||
['Pressure', '1.01 × 10<super>5</super>'], # Superscript won't work
|
||
]
|
||
```
|
||
|
||
✅ **REQUIRED** - All table text MUST wrapped in Paragraph:
|
||
```python
|
||
# ALWAYS DO THIS
|
||
data = [
|
||
[Paragraph('<b>Header</b>', header_style), Paragraph('Value', header_style)],
|
||
[Paragraph('Temperature', cell_style), Paragraph('25°C', cell_style)],
|
||
[Paragraph('Pressure', cell_style), Paragraph('1.01 × 10<super>5</super>', cell_style)],
|
||
]
|
||
```
|
||
|
||
**Why this is mandatory:**
|
||
- Rendering formatting tags (`<b>`, `<super>`, `<sub>`, `<i>`)
|
||
- Proper font application
|
||
- Correct text alignment within cells
|
||
- Consistent styling across the table
|
||
|
||
**The ONLY exception**: `Image()` objects can be placed directly in table cells without Paragraph wrapping.
|
||
|
||
### Table Style Specifications
|
||
- **Header style**: Must be bold with white text on dark blue background - use `Paragraph('<b>Header</b>', header_style)` + `textColor=colors.white`.
|
||
- **Standard color scheme**: Dark blue header (`#1F4E79`), alternating white/light gray rows.
|
||
- **Color consistency**: If a single PDF contains multiple tables, only one color scheme is allowed across all tables.
|
||
- **Alignment**: Each body element within the same table must use the same alignment method.
|
||
- **Caption**: ALL table captions must be centered and followed by `Spacer(1, 18)` before next content.
|
||
- **Spacing**: Add `Spacer(1, 18)` BEFORE tables to maintain symmetric spacing with bottom.
|
||
|
||
### Document Structure
|
||
- A PDF can contain ONLY ONE cover page and ONE back cover page.
|
||
- The cover page and the back cover page MUST use the alignment method specified by `TA_JUSTIFY`.
|
||
- **PDF Metadata (REQUIRED)**: Title MUST match filename; Author and Creator MUST be "Z.ai"; Subject SHOULD describe purpose.
|
||
|
||
|
||
### Image Handling
|
||
- **Preserve aspect ratio**: Never adjust image aspect ratio. Must insert according to the original ratio.
|
||
```python
|
||
from PIL import Image as PILImage
|
||
from reportlab.platypus import Image
|
||
# Get original dimensions
|
||
pil_img = PILImage.open('image.png')
|
||
orig_w, orig_h = pil_img.size
|
||
# Scale to fit width while preserving aspect ratio
|
||
target_width = 400
|
||
scale = target_width / orig_w
|
||
img = Image('image.png', width=target_width, height=orig_h * scale)
|
||
```
|
||
|
||
## Final Code Check
|
||
- Verify function parameter order against documentation.
|
||
- Confirm list/array element type consistency; test-run immediately.
|
||
- Use `Paragraph` (not `Preformatted`) for body text and formulas.
|
||
|
||
### MANDATORY: Post-Generation Forbidden Character Sanitization
|
||
|
||
**After the complete Python code is written and BEFORE executing it**, you MUST sanitize the code using the pre-built script located at:
|
||
|
||
```
|
||
scripts/sanitize_code.py
|
||
```
|
||
|
||
This script catches any forbidden Unicode characters (superscript/subscript digits, math operators, emoji, HTML entities, literal `\uXXXX` escapes) that may have slipped through despite the prevention rules. It converts them to safe ReportLab `<super>`/`<sub>` tags or ASCII equivalents.
|
||
|
||
**⚠️ CRITICAL RULE**: You MUST ALWAYS write PDF generation code to a `.py` file first, then sanitize it, then execute it. **NEVER use `python -c "..."` or heredoc (`python3 << 'EOF'`) to run PDF generation code directly** — these patterns bypass the sanitization step and risk forbidden characters reaching the final PDF.
|
||
|
||
**Mandatory workflow (NO EXCEPTIONS):**
|
||
|
||
```bash
|
||
# Step 1: ALWAYS write code to a .py file first
|
||
cat > generate_pdf.py << 'PYEOF'
|
||
# ... your PDF generation code here ...
|
||
PYEOF
|
||
|
||
# Step 2: Sanitize forbidden characters (MUST run before execution)
|
||
python scripts/sanitize_code.py generate_pdf.py
|
||
|
||
# Step 3: Execute the sanitized code
|
||
python generate_pdf.py
|
||
```
|
||
|
||
**Forbidden patterns — NEVER do any of the following:**
|
||
```bash
|
||
# ❌ PROHIBITED: python -c with inline code (cannot be sanitized)
|
||
python -c "from reportlab... doc.build(story)"
|
||
|
||
# ❌ PROHIBITED: heredoc without saving to file first (cannot be sanitized)
|
||
python3 << 'EOF'
|
||
from reportlab...
|
||
EOF
|
||
|
||
# ❌ PROHIBITED: executing the .py file WITHOUT sanitizing first
|
||
python generate_pdf.py # Missing sanitization step!
|
||
```
|
||
|
||
**✅ CORRECT: The ONLY allowed execution pattern:**
|
||
```bash
|
||
# 1. Write to file → 2. Sanitize → 3. Execute
|
||
cat > generate_pdf.py << 'PYEOF'
|
||
...code...
|
||
PYEOF
|
||
python scripts/sanitize_code.py generate_pdf.py
|
||
python generate_pdf.py
|
||
```
|
||
|
||
**⚠️ This sanitization step is NON-OPTIONAL.** Even if you believe the code contains no forbidden characters, you MUST still run the sanitization script. It serves as a safety net to catch any characters that bypassed prevention rules.
|
||
|
||
## Quick Reference
|
||
|
||
| Task | Best Tool | Command/Code |
|
||
|------|-----------|--------------|
|
||
| Merge PDFs | pypdf | `writer.add_page(page)` |
|
||
| Split PDFs | pypdf | One page per file |
|
||
| Extract text | pdfplumber | `page.extract_text()` |
|
||
| Extract tables | pdfplumber | `page.extract_tables()` |
|
||
| Create PDFs | reportlab | Canvas or Platypus |
|
||
| Command line merge | qpdf | `qpdf --empty --pages ...` |
|
||
| OCR scanned PDFs | pytesseract | Convert to image first |
|
||
| Fill PDF forms | pdf-lib or pypdf (see forms.md) | See forms.md |
|
||
|
||
## Next Steps
|
||
|
||
- For advanced pypdfium2 usage, see reference.md
|
||
- For JavaScript libraries (pdf-lib), see reference.md
|
||
- If you need to fill out a PDF form, follow the instructions in forms.md
|
||
- For troubleshooting guides, see reference.md
|
||
- For advanced table of content template, see reference.md |