--- name: web-to-markdown description: "Use ONLY when the user explicitly says: 'use the skill web-to-markdown ...' (or 'use a skill web-to-markdown ...'). Converts webpage URLs to clean Markdown by calling the local web2md CLI (Puppeteer + Readability), suitable for JS-rendered pages." metadata: version: 0.1.0 --- # web-to-markdown Convert web pages to clean Markdown by driving a locally installed browser (via `web2md`). ## Hard trigger gate (must enforce) This skill MUST NOT be used unless the user explicitly wrote **exactly** a phrase like: - `use the skill web-to-markdown ...` - `use a skill web-to-markdown ...` If the user did not explicitly request this skill by name, stop and ask them to re-issue the request including: `use the skill web-to-markdown`. ## What this skill does - Handles JS-rendered pages (Puppeteer → user Chrome). - Works best with Chromium-family browsers (Chrome/Chromium/Brave/Edge) via `puppeteer-core`. - Extracts main content (Readability). - Converts to Markdown (Turndown) with cleaned links and optional YAML frontmatter. ## Non-goals - Do not use Playwright or other browser automation stacks; the mechanism is `web2md`. ## Inputs you should collect (ask only if missing) - `url` (or a list of URLs) - Output preference: - Print to stdout (`--print`), OR - Save to a file (`--out ./file.md`), OR - Save to a directory (`--out ./some-dir/` to auto-name by page title) - Optional rendering controls for tricky pages: - `--chrome-path ` (if Chrome auto-detection fails) - `--interactive` (show Chrome and pause so the user can complete human checks/login, then press Enter) - `--wait-until load|domcontentloaded|networkidle0|networkidle2` - `--wait-for ''` - `--wait-ms ` - `--headful` (debug) - `--no-sandbox` (sometimes required in containers/CI) - `--user-data-dir ` (login/session; use a dedicated profile directory) ## Workflow 1) Confirm the user explicitly invoked the skill (`use the skill web-to-markdown`). 2) Validate URL(s) start with `http://` or `https://`. 3) Ensure `web2md` is installed: - Run: `command -v web2md` - If missing, instruct the user to install it (assume the project exists at `~/workspace/softaworks/projects/web2md`): - `cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm link` - Or: `cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm install -g .` 4) Convert: - Single URL → file: - `web2md '' --out ./page.md` - Single URL → auto-named file in directory: - `mkdir -p ./out && web2md '' --out ./out/` - Human verification / login walls (interactive): - `mkdir -p ./out && web2md '' --interactive --user-data-dir ./tmp/web2md-profile --out ./out/` - Then: complete the check in the browser window and press Enter in the terminal to continue. - Print to stdout: - `web2md '' --print` - Multiple URLs (batch): - Create output dir (e.g. `./out/`) then run one `web2md` command per URL using `--out ./out/` 5) Validate output: - If writing files, verify they exist and are non-empty (e.g. `ls -la ` and `wc -c `). 6) Return: - The saved file path(s), or the Markdown (stdout mode). ## Defaults (recommended) - For most pages: `--wait-until networkidle2` - For heavy apps: start with `--wait-until domcontentloaded --wait-ms 2000`, then add `--wait-for 'main'` (or another stable selector) if needed.