Files
mantle-ai-trader/skills/pdf/briefs/academic.md
2026-06-06 05:21:10 +00:00

1059 lines
45 KiB
Markdown
Executable File
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Brief: Academic Production
Scholarly documents via LaTeX/Tectonic: academic papers, theses, dissertations, mathematical manuscripts, IEEE/ACM submissions, academic CVs.
```
Academic request
├─ Paper / thesis / journal → §Standard Paper Workflow
│ Embed as needed during Phase 3:
│ ├─ Heavy math (>3 equations) → use §Scenario A preamble + environments
│ ├─ Vector diagrams → §Scenario B (simple→TikZ, complex→Playwright+CSS)
│ └─ Algorithm pseudocode → use §Scenario C template
└─ Standalone diagram for other brief → §Scenario B Path B (Playwright+CSS → PNG)
└─ Resume / CV
├─ Creative / tech / startup → §Template A (AltaCV dual-column)
└─ Academic position / PhD → §Template B (Academic CV)
```
**No typesetting assets needed** - LaTeX templates handle their own design system. The `palette.md` color system does not apply to LaTeX documents.
---
## §Standard Paper Workflow
Six phases. Rules and guardrails are embedded in each phase where they apply.
### Phase 1 - BRIEF
Confirm with the user:
- Document type (journal article, thesis chapter, conference paper, technical report)
- Template requirement (IEEE, ACM, custom class, or plain `article`)
- Bibliography format (natbib superscript, numeric brackets, biblatex)
- Expected length → if >5 pages or 3+ complex elements (wide tables, block equations, TikZ), split into `\input{}` modules (~500 lines each)
**Recognising the document's personality:**
| Personality | Examples | Key concern |
|-------------|----------|-------------|
| Scholarly | Journal articles, conference proceedings | Academic conventions, bibliography accuracy |
| Utilitarian | Technical reports, manuals, specs | Information density + scannability |
| Persuasive | Proposals, pitch documents | Professional polish + 1-2 visual high-points |
| Expressive | Portfolios, brand guidebooks | Bold typographic choices |
### Phase 2 - SETUP
**Title Page (Cover) Rules:**
- **Academic route covers are generated via HTML/Playwright**, using Templates 08-11 from `typesetting/cover.md`. Templates 08-10 replicate LaTeX title page aesthetics (dark backgrounds, serif titles, symmetric layouts) in HTML/CSS. **Template 11 (Institutional)** is for thesis proposals, dissertations, and formal institutional submissions (white bg + black border frame).
- **Pipeline:** Generate body PDF via Tectonic (no title page in `.tex`) → Generate cover HTML (Template 08/09/10/11) → Playwright `page.pdf()` → Merge cover as page 0 via pypdf
- **Template selection:** For thesis proposals (开题报告), dissertations (毕业论文), and institutional submissions → **default to Template 11**. For research papers, preprints, journal submissions → Templates 08-10.
- **NEVER use `\maketitle`** - it produces ugly default output with cramped spacing
- **NEVER use `\begin{titlepage}...\end{titlepage}`** - the cover is generated separately via HTML/Playwright
- **NEVER use LaTeX TikZ overlay for full-page covers** - TikZ `current page` coordinates are unreliable with `margin=0pt`, causing backgrounds to not fill the page (right/bottom white edges). HTML/CSS full-bleed is pixel-exact.
- Title page is OPTIONAL - skip it for short documents (≤ 2 pages), letters, memos, or when content scanning is priority
- **`\tableofcontents` must be the FIRST page** of the body PDF (after merge, it becomes page 2)
- If no TOC, content starts on page 1 of the body PDF
**Cover Generation Pipeline (Academic route):**
```
1. Write .tex WITHOUT any title page
2. Run poster_validate.py check-tex on .tex file - fix table overflow / image width ERRORs
3. Compile with tectonic → body.pdf
4. Write cover HTML using Template 08/09/10/11 from typesetting/cover.md (Template 11 for thesis proposals/dissertations)
5. Run poster_validate.py check-html on cover HTML - fix any ERRORs
6. Run cover_validate.js on cover HTML - fix any text-line overlaps
7. Render cover HTML → PDF via Playwright (`html2poster.js`) — **NOT `html2pdf-next.js`** (which converts absolute→static and destroys cover layout)
8. Merge: insert cover as page 0 of body PDF via pypdf
```
**Cover HTML → PDF rendering:**
```bash
# ALWAYS use html2poster.js for cover rendering (NOT html2pdf-next.js)
# Cover pages use position:absolute layout — html2pdf-next.js pre-render hooks
# convert absolute→static and destroy the layout. html2poster.js preserves it.
node "$PDF_SKILL_DIR/scripts/html2poster.js" cover.html --output cover.pdf --width 794px
```
Or from Python:
```python
import subprocess, os
def render_cover(html_path, pdf_path):
"""
Render HTML cover to PDF via html2poster.js.
⚠️ ALWAYS use html2poster.js for covers (NOT html2pdf-next.js).
Cover HTML uses position:absolute for layout. html2pdf-next.js pre-render
hooks convert absolute→static to prevent multi-page overlap, which
destroys cover layouts. html2poster.js preserves absolute positioning.
"""
scripts_dir = os.path.join(PDF_SKILL_DIR, 'scripts') # PDF_SKILL_DIR from SKILL.md § Script Path Setup
subprocess.run([
'node', os.path.join(scripts_dir, 'html2poster.js'),
html_path, '--output', pdf_path,
'--width', '794px',
], check=True)
```
**Merge cover + body:**
```python
from pypdf import PdfReader, PdfWriter, Transformation
A4_W, A4_H = 595.28, 841.89 # A4 in points
def normalize_page_to_a4(page):
"""Scale a page to A4 if its dimensions don't match."""
box = page.mediabox
w, h = float(box.width), float(box.height)
if abs(w - A4_W) > 2 or abs(h - A4_H) > 2:
sx, sy = A4_W / w, A4_H / h
page.add_transformation(Transformation().scale(sx=sx, sy=sy))
page.mediabox.lower_left = (0, 0)
page.mediabox.upper_right = (A4_W, A4_H)
return page
writer = PdfWriter()
cover_page = normalize_page_to_a4(PdfReader('cover.pdf').pages[0])
writer.add_page(cover_page)
for page in PdfReader('body.pdf').pages:
writer.add_page(page)
with open('final.pdf', 'wb') as f:
writer.write(f)
```
**→ Full cover templates: see §PART 4.5 in `typesetting/cover.md` (Templates 08-10).**
> **⚠️ Why HTML/Playwright covers?** LaTeX TikZ `remember picture, overlay` with `margin=0pt` frequently fails to fill the page (right/bottom edges show white). HTML/CSS with `@page { margin: 0 }` and full-bleed background is pixel-exact, with zero ambiguity. This also unifies all three routes (Report, Creative, Academic) under one cover system.
Write the preamble. Start from this foundation and customise per document:
```latex
\documentclass{article}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{geometry}
\usepackage{amsmath} % Load before hyperref
% hyperref - ALWAYS last among content packages
\usepackage[
colorlinks=true,
linkcolor=blue,
citecolor=darkgray,
urlcolor=blue,
bookmarks=true,
bookmarksnumbered=true,
unicode=true
]{hyperref}
\geometry{a4paper, top=2.5cm, bottom=2.5cm, left=2.5cm, right=2.5cm}
% ⚠️ left and right MUST be equal - asymmetric margins cause off-center content
\usepackage[numbers,super,sort&compress]{natbib}
\bibliographystyle{unsrtnat}
\usepackage{tcolorbox}
\usepackage{colortbl}
\usepackage{booktabs}
\usepackage{enumitem}
\usepackage{tabularx} % Auto-width columns (X type) - prevents table overflow
\usepackage{adjustbox} % \adjustbox{max width=\columnwidth} for emergency table fitting
```
**Guardrails for SETUP:**
- `hyperref` must load after virtually every other package - option clashes are the #1 preamble bug
- When using a Scenario template (A/B/C) or Resume template, use that template's own preamble instead
- `babel` and `polyglossia` are incompatible - load only one
- CJK: `\usepackage{ctex}` - Tectonic auto-downloads fonts, zero manual setup
- System fonts via `\setmainfont{}`: probe first with `fc-list :lang=XX`
- **🔴 Margin symmetry:** `\geometry{left=X, right=X}` - left and right MUST be equal. Asymmetric margins = off-center content = critical bug
- **🔴 Minimum margins with fancyhdr:** When using `fancyhdr` for headers/footers, `geometry` margins must leave enough room. **Minimum: `top >= 2.0cm`, `bottom >= 1.8cm`**. Also set `\setlength{\headheight}{14pt}` in the preamble. Margins smaller than this cause headers/footers to be pushed outside the page boundary (negative y-coordinates), making them invisible in print.
- **🔴 Quotation marks (English):** NEVER use straight quotes `"..."`。English text must use LaTeX curly quotes: ` ``left quote'' ` for double, `` `single' `` for single. Straight `"` in LaTeX means right double quote only.
- **🔴 Quotation marks (Chinese — CRITICAL):** Chinese quoted text like "北漂" MUST use Unicode smart quotes "…" (U+201C/U+201D) directly in the `.tex` source. **NEVER use ASCII `"` for Chinese quotes** — LaTeX interprets `"` as a right double quote (`"`), so `"北漂"` renders as `"北漂"` (two right quotes, no left quote). The correct LaTeX source is: `"北漂"` (literal Unicode characters). `\usepackage{csquotes}` is a safety net but does NOT fix raw ASCII `"` in Chinese text.
- **Scope:** This rule applies ONLY to Chinese-language body text. Do NOT replace `"` in English paragraphs (use ` ``...'' ` instead), `verbatim`/`lstlisting`/`minted` environments, `\texttt{}`/`\verb||`/`\url{}`/`\href{}{}` arguments, or BibTeX `.bib` field values.
- **🔴 Title page isolation:** Cover is generated via HTML/Playwright and merged as page 0 via pypdf - isolation is inherent in the merge pipeline. `\tableofcontents` should be the first page of the `.tex` body. Verify: does TOC start on the page immediately after the cover in the merged PDF?
- **🔴 TOC requires a cover page:** Unless the user explicitly requests no cover, if the document has `\tableofcontents`, it MUST have a cover page. Structure: Cover (page 1) → TOC (page 2) → Content (page 3+). Do not generate a TOC without a preceding cover page. This rule is consistent with `briefs/report.md`.
**When no style is specified**, apply a measured, high-craft system:
1. **Contrast** - clear figure-ground separation
2. **Hierarchy** - size, weight, hue variation for reading order
3. **White space** - ample margins and leading
4. **Coherence** - one typeface family, one accent colour, one spacing rhythm
Add enrichment proactively when content benefits:
- Callout boxes, sidebars → `tcolorbox`
- Theorem/definition/proof → `amsthm` + `tcolorbox`
- Headers/footers → `fancyhdr`; chapter openers → `titlesec`
### Phase 3 - BUILD
Write LaTeX content: sections, equations, figures, tables, bibliography.
**→ Overflow prevention**: See `typesetting/overflow.md` for the LaTeX-specific patterns (tabularx, adjustbox, widowpenalty, etc.). Key rules:
- Tables: always use `tabularx` or `tabular*` with `\columnwidth` constraint - never plain `tabular` for 5+ columns
- Images: always `\includegraphics[max width=\columnwidth, max height=0.4\textheight]` or `adjustbox` - the `max height` prevents a single figure from occupying an entire page
- Orphans/widows: set `\widowpenalty=10000` and `\clubpenalty=10000`
- Long tables: use `longtable` with `\endhead` for header repetition on every page
**Embedding components**: If the document needs heavy math, diagrams, or algorithms, refer to the Scenario sections below and embed their templates/environments into your `.tex` file. Scenarios are not separate tasks - they are building blocks for Phase 3.
**Content density guidance (for textbooks, lecture notes, tutorials):**
Documents meant for learning should maintain a healthy balance of prose and formal elements. A page full of equations with no explanation reads like a reference manual, not a textbook.
Guidelines:
- Every equation/equation group should be **preceded** by a sentence explaining what it represents and **followed** by a sentence interpreting the result or stating its significance
- Every figure should have (a) a descriptive `\caption{}` and (b) at least one sentence in the surrounding text referencing it
- Every theorem/definition should be followed by an intuitive explanation or worked example before proceeding to the next theorem
- Avoid stacking 3+ formal elements (equations, figures, tables) with zero narrative text between them
- For worked examples: state the problem, show the solution steps, then summarize the key takeaway
**When high density is acceptable**: research papers (especially methods sections where equation groups naturally cluster), formula sheets, reference appendices, and conference papers with tight page limits. In these cases, prioritise completeness over readability padding.
**Source hygiene - catch these model-generation slips:**
- **Prohibited**: emoji glyphs (tofu), markdown `*asterisk*` formatting (compile errors)
- **Use instead**: `\textbf{bold}`, `\emph{emphasis}`
**Table placement - prevent header orphans:**
- Short tables (≤15 rows): wrap in `\begin{table}[htbp]` - LaTeX keeps it together
- Long tables (>15 rows): use `longtable` with repeated header:
```latex
\usepackage{longtable}
\begin{longtable}{lll}
\toprule
Header 1 & Header 2 & Header 3 \\
\midrule
\endfirsthead
\toprule
Header 1 & Header 2 & Header 3 \\ % repeated on continuation pages
\midrule
\endhead
\bottomrule
\endfoot
Row 1 & data & data \\
Row 2 & data & data \\
\end{longtable}
```
- **Never** let a table header sit alone at the bottom of a page with no data rows following it
**Table width management - prevent column overflow (⚠️ CRITICAL):**
Tables overflowing the column width is the most common LaTeX layout bug in dual-column papers. The table looks fine in single-column preview but clips in IEEE/ACM two-column format.
**Prevention strategy (in priority order):**
1. **Use `tabular*` or `tabularx` to constrain width** (RECOMMENDED):
```latex
% tabular* - fixed total width, stretches inter-column space
\begin{table}[htbp]
\centering
\caption{Results.}\label{tab:results}
\begin{tabular*}{\columnwidth}{@{\extracolsep{\fill}} l cccc}
\toprule
Method & P@10 & R@10 & NDCG@10 & MRR \\
\midrule
Ours & \textbf{0.082} & \textbf{0.054} & \textbf{0.043} & \textbf{0.029} \\
\bottomrule
\end{tabular*}
\end{table}
% tabularx - fixed total width, X columns auto-stretch
\usepackage{tabularx}
\begin{tabularx}{\columnwidth}{l X X X X}
...
\end{tabularx}
```
2. **Reduce font size inside table** (common in conference papers):
```latex
\begin{table}[htbp]
\centering
\small % or \footnotesize for very wide tables
\caption{Comparison.}\label{tab:comp}
\begin{tabular}{lcccccccc}
...
\end{tabular}
\end{table}
```
3. **`\resizebox` as last resort** (scales the entire table to fit):
```latex
\begin{table}[htbp]
\centering
\caption{Full results.}\label{tab:full}
\resizebox{\columnwidth}{!}{%
\begin{tabular}{lcccccccc}
\toprule
...
\bottomrule
\end{tabular}
}
\end{table}
```
⚠️ `\resizebox` scales fonts too - verify the smallest text is still readable (≥ 6pt effective).
4. **Span both columns** for genuinely wide tables (8+ data columns):
```latex
\begin{table*}[t] % table* spans full width in twocolumn
\centering
\caption{Cross-dataset results.}\label{tab:cross}
\begin{tabular}{lccccccccccc}
...
\end{tabular}
\end{table*}
```
**Decision checklist before writing any table:**
| Data columns | Single-column (`\columnwidth`) | Action |
|-------------|-------------------------------|--------|
| ≤ 4 | Fits comfortably | Normal `tabular` |
| 5-6 | Tight fit | `\small` + `tabular*{\columnwidth}` |
| 7-8 | Won't fit | `\footnotesize` + `tabular*`, or `\resizebox` |
| ≥ 9 | Definitely won't fit | Use `table*` (full width), or split into two tables |
**Never**: use a plain `tabular` with 8+ columns in a two-column paper without width constraint - it WILL overflow.
**⚠️ CRITICAL: `\resizebox{\columnwidth}` NOT `\resizebox{\textwidth}` in two-column layouts!**
In dual-column documents (`twocolumn`, `sigconf`, etc.), `\textwidth` = **full page width** (both columns), while `\columnwidth` = **single column width**. Using `\resizebox{\textwidth}` inside a `table` (single-column float) scales the table to the full page width, causing it to overflow the column boundary by ~50%. **Always use `\resizebox{\columnwidth}` for single-column floats.** Only use `\resizebox{\textwidth}` inside `table*` (full-width float).
```latex
% ❌ WRONG in two-column layout
\begin{table}[t]
\resizebox{\textwidth}{!}{% <-- \textwidth = full page, table overflows column!
\begin{tabular}{lcccccccc} ... \end{tabular}}
\end{table}
% ✅ CORRECT for single-column float
\begin{table}[t]
\resizebox{\columnwidth}{!}{% <-- \columnwidth = single column width
\begin{tabular}{lcccccccc} ... \end{tabular}}
\end{table}
% ✅ CORRECT for full-width float
\begin{table*}[t]
\resizebox{\textwidth}{!}{% <-- \textwidth OK here because table* spans both columns
\begin{tabular}{lcccccccc} ... \end{tabular}}
\end{table*}
```
---
**Equation overflow prevention (⚠️ CRITICAL for dual-column papers):**
Long equations are the **#2 overflow source** after tables in dual-column papers. Column width in ACM `sigconf` is ~241pt; in IEEE `twocolumn` ~252pt. Many standard math expressions exceed this.
**Overflow patterns and fixes:**
| Pattern | Problem | Fix |
|---------|---------|-----|
| Two equations side-by-side with `\quad` | Combined width > column | Split into `align` with one equation per line |
| Deep fraction nesting (softmax, attention) | Denominator sum too wide | Use `\smash` + separate definition, or `split` |
| Long subscripts/superscripts with `\text{}` | `\text{collab}`, `\text{social}` are wide | Use short math abbreviations: `c`, `s`, or define `\newcommand` |
| `equation` with multiple terms separated by `\quad` | Horizontal overflow | Use `aligned` inside `equation`, or `align` |
**Rule M1 — Never put two independent equations on one line in dual-column:**
```latex
% ❌ WRONG — two full equations on one line, guaranteed overflow in sigconf
\begin{equation}
\mathbf{e}_u^{(l+1)} = \sum_{i} \frac{1}{\sqrt{|N_R(u)|\cdot|N_R(i)|}} \mathbf{e}_i^{(l)}, \quad
\mathbf{e}_i^{(l+1)} = \sum_{u} \frac{1}{\sqrt{|N_R(i)|\cdot|N_R(u)|}} \mathbf{e}_u^{(l)}
\end{equation}
% ✅ CORRECT — split into aligned or separate equations
\begin{align}
\mathbf{e}_u^{(l+1)} &= \sum_{i \in \mathcal{N}_R(u)} \frac{\mathbf{e}_i^{(l)}}{\sqrt{|\mathcal{N}_R(u)| \cdot |\mathcal{N}_R(i)|}}, \label{eq:collab_u} \\
\mathbf{e}_i^{(l+1)} &= \sum_{u \in \mathcal{N}_R(i)} \frac{\mathbf{e}_u^{(l)}}{\sqrt{|\mathcal{N}_R(i)| \cdot |\mathcal{N}_R(u)|}}. \label{eq:collab_i}
\end{align}
```
**Rule M2 — Wide fractions: use `split` or `multline`:**
```latex
% ❌ WRONG — softmax with long denominator
\begin{equation}
\alpha_{uv} = \frac{\exp(\text{LeakyReLU}(\mathbf{a}^{\top}[\mathbf{W}\mathbf{e}_u \| \mathbf{W}\mathbf{e}_v]))}{\sum_{k \in \mathcal{N}_S(u)} \exp(\text{LeakyReLU}(\mathbf{a}^{\top}[\mathbf{W}\mathbf{e}_u \| \mathbf{W}\mathbf{e}_k]))}
\end{equation}
% ✅ CORRECT — define numerator/denominator separately
\begin{equation}
\alpha_{uv} = \frac{\exp\bigl(f(\mathbf{e}_u, \mathbf{e}_v)\bigr)}{\sum_{k \in \mathcal{N}_S(u)} \exp\bigl(f(\mathbf{e}_u, \mathbf{e}_k)\bigr)},
\end{equation}
\text{where } f(\mathbf{e}_u, \mathbf{e}_v) = \text{LeakyReLU}\bigl(\mathbf{a}^{\top} [\mathbf{W}\mathbf{e}_u \| \mathbf{W}\mathbf{e}_v]\bigr).
```
**Rule M3 — Self-check: if `equation` body > 60 characters (excluding `\label`), it probably overflows dual-column:**
This is a quick mental check. If the raw LaTeX math string is very long, it almost certainly won't fit in ~241pt. Use `align`, `split`, `multline`, or factor out sub-expressions.
**Rule M4 — Contrastive loss / InfoNCE: always use `multline` or `split`:**
Contrastive losses with `\frac{\exp(...)}{\sum \exp(...)}` inside `\log` are notoriously wide. Always break them across lines:
```latex
\begin{multline}
\mathcal{L}_{\text{SSL}}^u = -\log \frac{\exp\bigl(\text{sim}(\mathbf{z}_u', \mathbf{z}_u'') / \tau\bigr)}
{\sum_{v \neq u} \exp\bigl(\text{sim}(\mathbf{z}_u', \mathbf{z}_v'') / \tau\bigr)}.
\end{multline}
```
---
**Algorithm overflow prevention (dual-column papers):**
Algorithm boxes with long `\KwInput` lines or verbose pseudocode frequently overflow column width.
**Rule A1 — Always set `\SetAlFnt{\small}` and limit line width:**
```latex
\SetAlFnt{\small} % Smaller font inside algorithm
\SetAlCapFnt{\small} % Smaller caption font
\SetAlCapNameFnt{\small} % Smaller "Algorithm N:" prefix
```
**Rule A2 — Break long Input/Output lines:**
```latex
% ❌ WRONG — all parameters on one line
\KwInput{Interaction graph $\mathcal{G}_R$, social graph $\mathcal{G}_S$, embedding dimension $d$, number of GNN layers $L$, learning rate $\eta$, regularization $\lambda$, SSL weight $\gamma$, temperature $\tau$}
% ✅ CORRECT — break into multiple lines
\KwInput{Interaction graph $\mathcal{G}_R$, social graph $\mathcal{G}_S$\\\quad embedding dim $d$, GNN layers $L$, learning rate $\eta$\\\quad regularization $\lambda$, SSL weight $\gamma$, temperature $\tau$}
```
**Rule A3 — Use `algorithm*` for genuinely wide algorithms:**
If the algorithm has many columns or very long lines that can't be shortened, use `\begin{algorithm*}` to span both columns.
**Rule A4 — Abbreviate variable names in pseudocode:**
Use compact notation: `emb` not `embedding`, `lr` not `learning\_rate`, `reg` not `regularization`. Define abbreviations in the Input line.
---
**Clickable navigation - every reference must be a live link:**
Attach `\label{}` right after each numbered element, cite with `\ref{}`:
```latex
\section{Background}\label{sec:bg}
\begin{figure}[htbp]
\includegraphics{...}
\caption{Overview}\label{fig:overview}
\end{figure}
\begin{equation}\label{eq:energy}
E = mc^2
\end{equation}
% All produce clickable hyperlinks:
Section~\ref{sec:bg}...
Figure~\ref{fig:overview}...
Equation~\eqref{eq:energy}... % \eqref auto-wraps in parentheses
```
**Label conventions**: `sec:`, `fig:`, `tab:`, `eq:`, `lst:` - use `~` (non-breaking space) before `\ref`.
**Bibliography** (three approaches):
```latex
% Approach 1: natbib superscript (preferred academic)
\usepackage[numbers,super,sort&compress]{natbib}
This has been studied\cite{smith2023}. % → studied^[1]
\bibliography{refs}
% Approach 2: natbib brackets
\usepackage[numbers]{natbib}
\cite{smith2023} % [1]
\citet{smith2023} % Smith (2023)
% Approach 3: biblatex
\usepackage[backend=biber,style=numeric-comp]{biblatex}
\addbibresource{refs.bib}
\printbibliography
```
**PDF metadata** (add before `\end{document}`):
```latex
\hypersetup{
pdftitle={Document Title},
pdfauthor={Author Name},
pdfsubject={Topic},
pdfkeywords={keyword1, keyword2}
}
```
**Table of Contents** (auto-clickable with hyperref):
```latex
\tableofcontents
\listoffigures % optional
\listoftables % optional
```
### Phase 4 - COMPILE
```bash
python3 "$PDF_SKILL_DIR/scripts/pdf.py" convert.latex main.tex --runs 2
```
Default: 2 passes (resolves cross-references). Use `--runs 3` when bibliography back-references are needed. Add `--keep-logs` for debugging.
**NEVER invoke Tectonic directly** - always use the wrapper. It strips noise, surfaces errors, and reports PDF statistics.
### Phase 5 - PREFLIGHT
The wrapper classifies diagnostics into three tiers:
| Tier | Impact | Action |
|------|--------|--------|
| Errors | Build aborts | Fix before anything else |
| Layout defects | Overfull/underfull boxes, missing glyphs | Repair prior to delivery |
| Advisories | Other warnings | Assess individually; fix when feasible |
**Never acceptable**: shrugging off warnings with "they don't affect the final PDF."
**Navigation troubleshooting:**
| Symptom | Fix |
|---------|-----|
| `??` in text | Recompile with `--runs 2` |
| Links not coloured | Add `colorlinks=true` to hyperref |
| `[?]` beside citations | Check `.bib` path; rebuild |
| No PDF bookmarks | Set `bookmarks=true` |
### Phase 6 - DELIVER
Final PDF. Confirm page count, cross-references resolved, no `??` placeholders.
---
## §Scenario A: Math-Heavy Technical Documents
> Use standalone for a pure-math document, or embed the preamble additions + environments into a §Standard Paper Workflow document.
When the document has **more than 3 non-trivial equations** (matrices, aligned systems, integrals, summations), use this template instead of plain `article`.
**Decision rule**: If you find yourself writing `<super>` tags or KaTeX CDN includes for math, stop and switch here.
```latex
\documentclass[11pt,a4paper]{article}
\usepackage{amsmath,amssymb,amsthm}
\usepackage[margin=2.5cm]{geometry}
\usepackage{ctex} % CJK support via Tectonic
\usepackage{booktabs}
\usepackage{hyperref}
\theoremstyle{definition}
\newtheorem{definition}{Definition}[section]
\newtheorem{theorem}{Theorem}[section]
\newtheorem{lemma}[theorem]{Lemma}
\title{Document Title}
\author{Author}
\date{\today}
\begin{document}
% ═══════════════════════════════════════════════════════
% NO TITLE PAGE IN .tex - Cover is generated separately
% via HTML/Playwright and merged as page 0.
% Body PDF starts directly with TOC or content.
% ═══════════════════════════════════════════════════════
\tableofcontents
\newpage
% Aligned equation group
\begin{align}
\nabla \cdot \mathbf{E} &= \frac{\rho}{\varepsilon_0} \\
\nabla \times \mathbf{B} &= \mu_0 \mathbf{J} + \mu_0\varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
\end{align}
% Matrix
\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\]
\end{document}
```
---
## §Scenario B: Diagram Generation for Academic Papers
> **Complexity-based routing**: Simple diagrams use TikZ natively (vector, font-consistent). Complex diagrams use Playwright+CSS → PNG → `\includegraphics` (models produce broken TikZ for complex branching logic).
### Decision: TikZ or Playwright+CSS?
| Criteria | → TikZ native | → Playwright+CSS → PNG |
|---|---|---|
| Node count | ≤6 | >6 |
| Topology | Linear chain, simple tree, layer stack | Branching, multi-path, feedback loops |
| Annotations | Minimal (labels only) | Side notes, legends, callout boxes |
| Math in nodes | Yes (LaTeX math rendering matters) | No (plain text labels) |
| Output | Vector (`tikzpicture` in document) | Raster PNG @2× (300dpi, publication-ready) |
### Path A: Simple Diagrams → TikZ Native
For ≤6 node linear/tree diagrams, embed `tikzpicture` directly or use standalone:
```latex
\documentclass[tikz,border=5pt]{standalone}
\usetikzlibrary{arrows.meta,positioning,calc,shapes.geometric}
\begin{document}
\begin{tikzpicture}[
box/.style={draw, rounded corners=2pt, minimum width=2.4cm,
minimum height=0.7cm, align=center, font=\small\sffamily},
arr/.style={-{Stealth[length=4pt]}, thick, gray!70}
]
\node[box, fill=blue!8] (a) {Input};
\node[box, fill=orange!8, right=1.2cm of a] (b) {Process};
\node[box, fill=green!8, right=1.2cm of b] (c) {Output};
\draw[arr] (a) -- (b);
\draw[arr] (b) -- (c);
\end{tikzpicture}
\end{document}
```
**Node text overflow prevention:**
When nodes contain multi-word labels (e.g. "CNN Spatial Encoder", "Multi-Head Attention"), text can overflow or overlap adjacent nodes. Prevent this:
```latex
% Use text width (not minimum width) to enable line wrapping inside nodes
box/.style={draw, rounded corners=2pt, text width=2.8cm,
minimum height=0.7cm, align=center, font=\small\sffamily},
% For nodes with long labels, use explicit line breaks
\node[box] {CNN Spatial\\Encoder};
\node[box] {Multi-Head\\Attention};
```
Rules:
- Prefer `text width` over `minimum width` when labels exceed 2 words - it wraps text instead of clipping
- When 3+ nodes sit side by side, verify total width fits within `\columnwidth` (single-column) or `0.45\textwidth` (dual-column)
- If labels still overflow, use abbreviations or `\scriptsize` - never let text clip outside node borders
- Always `\resizebox{\columnwidth}{!}{...}` when embedding in dual-column papers (see TikZ in multi-column section below)
**TikZ standalone embedding (simple diagrams only):**
```bash
# 1. Compile TikZ standalone to PDF
python3 "$PDF_SKILL_DIR/scripts/pdf.py" convert.latex diagram.tex # → diagram.pdf (tight-cropped)
# 2. Embed as vector in LaTeX
# \begin{figure}[t]
# \centering
# \includegraphics[width=\columnwidth]{diagram.pdf} % vector, not PNG
# \caption{System Architecture}\label{fig:arch}
# \end{figure}
```
**🚫 FORBIDDEN: Using TikZ standalone for Report or Creative briefs.** Those routes have no LaTeX compiler. See SKILL.md § "Diagram Generation Strategy".
### Path B: Complex Diagrams → Playwright+CSS → PNG
For >6 node diagrams with branching, annotations, or multi-path flows:
```bash
# 1. LLM generates diagram.html (CSS grid/flexbox nodes + SVG/CSS arrows)
# 2. Screenshot at 2× device scale factor for 300dpi print quality
python3 "$PDF_SKILL_DIR/scripts/pdf.py" convert.blueprint diagram.html --device-scale-factor 2 --output diagram.png
# 3. Embed in LaTeX
# \begin{figure}[t]
# \centering
# \includegraphics[width=\columnwidth]{diagram.png}
# \caption{System Architecture}\label{fig:arch}
# \end{figure}
```
**Why not TikZ for complex diagrams?** Models frequently produce broken TikZ code for >6 node graphs with conditional branches, multi-path convergence, and side annotations. The compilation-debug cycle wastes time. Playwright+CSS handles any layout natively, and 2× screenshots at A4 width give ~300dpi - indistinguishable from vector at print resolution.
### Complex Diagram Decomposition
When a diagram exceeds **12 nodes, 3 subgroups, or fills more than 40% of page height**, decompose into table + simplified overview:
```latex
% Step 1: Table carries the details
\begin{table}[htbp]
\centering
\caption{System Pipeline - Detailed Phases}\label{tab:pipeline}
\begin{tabular}{llp{5cm}}
\toprule
Phase & Module & Key Tasks \\
\midrule
Data Collection & Crawler & Crawl reviews, scrape prices, API calls \\
Data Processing & ETL Pipeline & Clean, tokenize, label, validate \\
Model Training & Trainer & Fine-tune LLM, distributed 64-GPU \\
Evaluation & Benchmark & A/B test, offline metrics, human eval \\
\bottomrule
\end{tabular}
\end{table}
% Step 2: Simplified TikZ shows only the flow skeleton
\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
box/.style={draw, rounded corners=2pt, minimum width=2cm,
minimum height=0.6cm, align=center, font=\small\sffamily, fill=blue!5},
arr/.style={-{Stealth[length=4pt]}, thick, gray!60}
]
\node[box] (a) {Collection};
\node[box, right=0.8cm of a] (b) {Processing};
\node[box, right=0.8cm of b] (c) {Training};
\node[box, right=0.8cm of c] (d) {Evaluation};
\draw[arr] (a)--(b); \draw[arr] (b)--(c); \draw[arr] (c)--(d);
\end{tikzpicture}
\caption{Pipeline overview (see Table~\ref{tab:pipeline} for details).}\label{fig:pipeline-overview}
\end{figure}
```
**Rule of thumb**: The diagram gives intuition at a glance; the table carries precision.
**Common TikZ patterns**:
- Flowcharts: `positioning` + `arrows.meta` libraries
- Neural network layers: `fit` library + nested nodes
- Timelines: single axis with `\draw` segments
- Tree diagrams: `child` syntax or `forest` package
**CRITICAL: TikZ in multi-column documents (IEEE, ACM)**
Never place `tikzpicture` directly in multi-column body text - it overflows. Always wrap:
```latex
% Single-column figure (fits one column)
\begin{figure}[t]
\centering
\resizebox{\columnwidth}{!}{%
\begin{tikzpicture}[...]
...
\end{tikzpicture}
}
\caption{Architecture.}\label{fig:arch}
\end{figure}
% Full-width figure (spans both columns)
\begin{figure*}[t]
\centering
\resizebox{0.85\textwidth}{!}{%
\begin{tikzpicture}[...]
...
\end{tikzpicture}
}
\caption{System pipeline.}\label{fig:pipeline}
\end{figure*}
```
Rules: `\resizebox{\columnwidth}{!}` constrains width; `figure*` for tall diagrams; every figure needs `\caption` + `\label`; prefer `[t]` placement in two-column mode.
---
## §Scenario C: Algorithm Pseudocode
> Use standalone for a single algorithm sheet, or embed the `algorithm` environment into a §Standard Paper Workflow document.
For formal algorithm descriptions (research papers, technical specs):
```latex
\documentclass[11pt,a4paper]{article}
\usepackage[margin=2.5cm]{geometry}
\usepackage[ruled,vlined,linesnumbered]{algorithm2e}
\begin{document}
\begin{algorithm}[H]
\SetAlgoLined
\KwIn{Training set $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^N$, learning rate $\eta$}
\KwOut{Optimized parameters $\theta^*$}
Initialize $\theta$ randomly\;
\For{epoch $= 1$ \KwTo $T$}{
\ForEach{mini-batch $B \subset \mathcal{D}$}{
$\mathcal{L} \leftarrow \frac{1}{|B|}\sum_{(x,y)\in B} \ell(f_\theta(x), y)$\;
$\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}$\;
}
}
\Return{$\theta$}\;
\caption{Stochastic Gradient Descent}
\end{algorithm}
\end{document}
```
---
## §Exam Paper Rules (LaTeX)
> **Load this section** when generating exam papers, quizzes, worksheets, or exercises with mathematical content via LaTeX.
### Critical Anti-Pattern: `\parskip` + Loose Numbering = Blank Lines After Question Numbers
**This is the #1 visual defect in LaTeX exam papers.** When `\parskip` is set (e.g. `0.5em`) and question numbers are separated from question text by a blank line in the `.tex` source, LaTeX treats them as separate paragraphs — the `\parskip` gap makes it look like an intentional blank line after every number.
```latex
% ❌ WRONG — blank line between number and text = new paragraph + parskip gap
\noindent\textbf{3.}
某工厂原计划生产1200件产品……
% ❌ ALSO WRONG — even without blank line, if \parskip is large
\noindent\textbf{3.} % line break here
某工厂原计划生产1200件产品…… % LaTeX treats this as same paragraph, but confusing
% ✅ CORRECT — number and text on the SAME LINE, no break
\noindent\textbf{3.}\;某工厂原计划生产1200件产品……
```
### Iron Rules
**Rule E1 — Question number + text = same paragraph (MANDATORY):**
The question number (`\textbf{3.}`) and the question text MUST be on the same line in the `.tex` source, joined by `\;` or `\quad`. NEVER put a blank line or even a line break between them.
**Rule E2 — `\parskip` must be 0 or minimal for exam papers:**
```latex
% RECOMMENDED for exams: no parskip, control spacing explicitly
\setlength{\parskip}{0pt} % No automatic paragraph spacing
\setlength{\parindent}{0pt} % No indentation
% Use \vspace{} explicitly between questions for precise control
```
If `\parskip` is needed for other reasons, keep it ≤0.3em and be extra careful about blank lines in the source.
**Rule E3 — Use `enumitem` for structured numbering (PREFERRED):**
Instead of manual `\noindent\textbf{1.}\;`, prefer `enumitem` with custom formatting:
```latex
\usepackage{enumitem}
% Section-level numbering: 一、二、三
% Question-level: use enumerate with custom label
\begin{enumerate}[label=\textbf{\arabic*.}, leftmargin=0pt, itemindent=2em,
labelsep=0.5em, itemsep=0.8em, parsep=0pt]
\item 2024年巴黎奥运会共设有32个大项……
\item 中国空间站“天宫”在距地面……
\end{enumerate}
```
This guarantees number and text are in the same paragraph (LaTeX `\item` handles it internally).
**Rule E4 — `tasks` environment for multi-column calculation items:**
```latex
\usepackage{tasks}
% 4-column oral calculation
\begin{tasks}[counter-format={(1)}, label-align=left,
label-offset={0.5em}, label-width={2.5em},
item-indent=3em, column-sep=2em](4)
\task $\dfrac{3}{4}\times\dfrac{2}{9}=$ \underline{\hspace{2cm}}
\task $\dfrac{5}{6}\div\dfrac{1}{3}=$ \underline{\hspace{2cm}}
\end{tasks}
```
**Rule E5 — Answer space reservation:**
| Question Type | LaTeX Implementation |
|--------------|---------------------|
| Fill-in-the-blank | `\underline{\hspace{2cm}}` inline |
| Short answer | `\vspace{2cm}` after question |
| Calculation | `\vspace{4cm}` (show-your-work space) |
| Proof / derivation | `\vspace{5cm}` |
| Drawing / graphing | TikZ grid or `\vspace{4cm}` |
**Rule E6 — Page breaks for exams:**
- `page-break-after: always` between major sections (一、二、三) is OK
- NEVER break within a single question (keep question + options + answer space together)
- Use `\needspace{5cm}` before long questions to prevent orphaning
### Complete Exam Preamble Template
```latex
\documentclass[12pt,a4paper]{article}
\usepackage[margin=2cm]{geometry}
\usepackage{ctex} % CJK support
\usepackage{amsmath,amssymb} % Math
\usepackage{enumitem} % Structured lists
\usepackage{tasks} % Multi-column exercises
\usepackage{tikz} % Diagrams
\usepackage{graphicx}
\usepackage{array,tabularx}
\pagestyle{plain}
\setlength{\parindent}{0pt} % No indent for exam papers
\setlength{\parskip}{0pt} % ❗ ZERO parskip — control spacing via \vspace
% Utility commands
\newcommand{\blank}[1]{\underline{\hspace{#1}}}
\newcommand{\fn}[2]{\dfrac{#1}{#2}}
% Section header: 一、填空题每空1分共计10分
\newcommand{\examsection}[1]{%
\vspace{0.5cm}
\noindent{\heiti #1}
\vspace{0.3cm}
}
```
---
## §Resume / CV Templates
> **Skip this section** unless building a resume or CV.
Two templates available as separate files. Load the one you need:
| Template | Style | Best for | File |
|----------|-------|----------|------|
| **A: AltaCV** | Dual-column, sidebar, skill dots | Creative/tech/startup roles | `references/resume-altacv.tex` |
| **B: Academic CV** | Single-column, multi-page, publications | PhD apps, academic positions | `references/resume-academic.tex` |
**Route selection:**
| Scenario | Recommended |
|----------|-------------|
| Corporate job / ATS parsing | **Report brief** - ReportLab ATS-friendly template |
| Creative/tech/startup | **This brief** - Template A |
| Academic position / PhD | **This brief** - Template B |
| Chinese-only, simple format | **Report brief** - ReportLab (best CJK support) |
**Usage:**
1. Read the template file: `references/resume-altacv.tex` or `references/resume-academic.tex`
2. Replace placeholder content with user's information
3. Compile: `python3 "$PDF_SKILL_DIR/scripts/pdf.py" convert.latex resume.tex --runs 2`
**🔴 Resume Text Overlap Prevention:**
> AltaCV dual-column resumes are the #1 source of text overlap bugs. The sidebar and main column share vertical space but are positioned independently.
- **Column ratio sanity check:** `\columnratio{0.62}` means left column = 62%, right = 38%. If either column overflows, content bleeds into the other. Reduce ratio if left column has too much content.
- **Section spacing:** Always use `\medskip` or `\smallskip` between items - zero spacing causes lines to overlap
- **Skill dots:** Each `\cvskill` row has fixed height. If more than 8-10 skills, switch to a compact text list instead of dots
- **Experience entries:** Long descriptions in `\cvevent` can push content below the page. Use \textbf{max 3-4 bullet points per entry}
- **Two-page overflow:** If content exceeds 1 page, explicitly add `\newpage` and restart column layout. Do NOT let LaTeX auto-break the `paracol` environment - it misaligns columns on page 2
- **Compile twice:** Always `--runs 2` to resolve cross-references and stabilize column breaks
**🔴 Resume Minimum Font Size:**
- **Hard floor: 12px (9pt).** No text in the resume may render smaller than 12px, including footnotes, contact info, dates, and skill labels.
**🔴 Resume Line-Break Rules:**
- English: prefer breaking at word boundaries. Long words may be split at syllable boundaries with a hyphen (`-`) - standard typographic practice (e.g., `experi-\nence`).
- CJK: break between characters, but never separate punctuation from preceding character.
- Mixed content: respect both rules.
- Dates/ranges ("Jan 2022 - Present") must stay as one unit.
**🔴 Resume Page-Fill:**
- Content must fill ≥85% of page height. If content is sparse, increase spacing (`\medskip``\bigskip`), increase font size slightly, or add sections (Summary, Awards, Projects). Never leave visible blank area > 3cm at page bottom.
**Template A customisation quick-reference:**
| What to change | How |
|---------------|-----|
| Column ratio | `\columnratio{0.62}` → e.g. `{0.55}` |
| Accent colour | `\definecolor{accent}{HTML}{3B82F6}` → any hex |
| Skill dot count | `\foreach \x in {1,...,5}``{1,...,4}` |
| Icons | [FontAwesome5 gallery](http://texdoc.net/pkg/fontawesome5) |
| Font family | Replace `roboto`/`lato` with any LaTeX font package |
| Page margins | `\geometry{margin=1.25cm,...}` |
**Template B customisation:**
| What to change | How |
|---------------|-----|
| Accent colour | `\definecolor{accent}{HTML}{1F4E79}` → any hex |
| Header text | `\fancyhead[L]` content |
| Section style | `\titleformat{\section}` block |
---
## Reference
### Package Catalogue
**Foundational**: `hyperref` · `geometry` · `listings` · `enumitem`
**Tabular**: `booktabs` · `longtable` · `multirow` · `array` · `colortbl`
**Visual & Charting**: `tikz` · `pgfplots` · `float` · `wrapfig` · `subfig` / `subcaption`
**International & Typography**: `fontspec` (XeLaTeX/LuaLaTeX) · `ctex`
**Mathematical**: `amsmath` · `amssymb` · `amsthm` · `natbib` · `biblatex` · `siunitx`
**Algorithmic & Domain-Specific**: `algorithm` + `algpseudocode` · `chemfig`
**Page Design**: `tcolorbox` · `fancyhdr` · `titlesec` · `tocloft` · `multicol` · `setspace` · `microtype` · `parskip` · `adjustbox` · `marginnote`
**Code Listings**: `listings` · `minted` (depends on Pygments)
### Scripts & Backends
| Script | Purpose |
|--------|---------|
| `pdf.py convert.latex` | Tectonic wrapper - log sanitisation, error highlighting, PDF metrics |
### Operational Notes
**CJK**: `\usepackage{ctex}` - Tectonic pulls font bundles on the fly, zero manual install.
**Cold-start**: First compilation downloads packages (1-5 min). Cached builds: 10-30s.
**Offline**: Cached packages in `~/.cache/Tectonic/` work offline. New packages require network.
**Tectonic vs TeX Live:**
| Dimension | Tectonic | Traditional pdflatex |
|-----------|----------|---------------------|
| Package acquisition | On-demand, transparent | Manual via `tlmgr` |
| Multi-pass compilation | Handled by engine | Explicit re-invocations |
| Disk footprint | Single binary | Full TeX Live ≈ 4 GB |
**Bundled binary note:** The `scripts/tectonic` binary shipped with this skill is a **macOS arm64 (Apple Silicon)** executable. It will NOT work on other platforms. If `pdf.py convert.latex` reports "tectonic command not found", run `python3 pdf.py status` to see platform-specific installation instructions, or install manually:
| Platform | Install Command |
|----------|----------------|
| macOS (Homebrew) | `brew install tectonic` |
| macOS (binary) | `curl -sSL https://drop-sh.fullyjustified.net \| sh` |
| Debian / Ubuntu | `apt install tectonic` (if available) or conda/binary |
| Arch Linux | `pacman -S tectonic` |
| Conda (any OS) | `conda install -c conda-forge tectonic` |
| Windows (scoop) | `scoop install tectonic` |
| Windows (choco) | `choco install tectonic` |
After installing, verify: `tectonic --version`. The `_find_tectonic()` function searches: `scripts/tectonic``~/tectonic` → system PATH.
---
> **⚠️ Legacy Note:** Academic covers previously used ReportLab canvas API (cover_recipe_A/B/C/D/L). This approach is **fully deprecated**. All academic covers now use HTML/Playwright Templates 08-11 (see `typesetting/cover.md` and the pipeline at line 58-118 of this file). Do NOT write ReportLab cover code.
### ⚠️ Post-Cover Generation Checks (Mandatory)
After generating the cover HTML and before converting to PDF, run `poster_validate.py check-html`; after generating the cover PDF, run `pdf_qa.py`:
```bash
# Step 1: HTML check
python3 "$PDF_SKILL_DIR/scripts/poster_validate.py" check-html cover.html
# Step 2: Cover overlap check
node "$PDF_SKILL_DIR/scripts/cover_validate.js" cover.html
# Step 3: Convert to PDF
node "$PDF_SKILL_DIR/scripts/html2poster.js" cover.html --output cover.pdf --width 794px
# Step 4: PDF check
python3 "$PDF_SKILL_DIR/scripts/pdf_qa.py" final.pdf --skip-cover --formulas
# MANDATORY: Post-generation pipeline
python3 "$PDF_SKILL_DIR/scripts/pdf.py" meta.brand final.pdf
python3 "$PDF_SKILL_DIR/scripts/pdf.py" pages.clean final.pdf -o final_clean.pdf
python3 "$PDF_SKILL_DIR/scripts/pdf.py" font.check final.pdf
python3 "$PDF_SKILL_DIR/scripts/pdf.py" toc.check final.pdf
```
- `poster_validate.py` checks: font fallback, overflow:hidden, @media screen, background color consistency, etc.
- `pdf_qa.py` checks: cover full-bleed, blank pages, CJK punctuation, overflow, margin symmetry, font embedding, metadata, formula overflow
- `meta.brand` writes Title/Author/Creator metadata
- `pages.clean` removes accidental blank pages
- `font.check` scans for missing glyphs (□ tofu)
- `toc.check` verifies TOC entries, page numbers, and links
- Any ERROR item → fix and regenerate
### Merging Cover with Body
```python
from pypdf import PdfReader, PdfWriter
def merge_cover_body(cover_path, body_path, output_path):
writer = PdfWriter()
for page in PdfReader(cover_path).pages:
writer.add_page(page)
for page in PdfReader(body_path).pages:
writer.add_page(page)
with open(output_path, 'wb') as f:
writer.write(f)
```