PRD: wkhtmltopdf - Chrome Headless Wrapper #2

Closed
opened 2026-05-20 15:29:48 -06:00 by ppreeper · 0 comments
Owner

PRD: wkhtmltopdf - Chrome Headless Wrapper

Problem Statement

Odoo and other systems depend on wkhtmltopdf for HTML-to-PDF conversion. The original wkhtmltopdf is based on a deprecated Qt WebKit engine, producing inferior PDFs and receiving no active development. Users want a drop-in replacement binary called wkhtmltopdf that accepts the same CLI arguments but delegates to headless Google Chrome for superior rendering, all built with Python 3 stdlib only.

Solution

Build wkhtmltopdf as a Python 3 CLI binary (stdlib only, zero external dependencies) that:

  • Parses wkhtmltopdf-compatible CLI arguments
  • Translates them into headless Chrome command-line flags
  • Invokes Chrome to generate the PDF
  • Returns wkhtmltopdf-compatible exit codes
  • Supports cookies, headers, footers, margins, page sizes, and other wkhtmltopdf options
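The translation step above can be sketched roughly as follows. This is a minimal illustration, not the wrapper's actual code: the function name `build_chrome_command` and its parameters are hypothetical, and only a few options are shown (output path, viewport, JavaScript delay).

```python
from typing import List, Optional

# Flags the PRD says are always passed to Chrome.
BASE_FLAGS = [
    "--headless=new",
    "--disable-gpu",
    "--no-sandbox",
    "--disable-dev-shm-usage",
]


def build_chrome_command(
    chrome: str,
    input_url: str,
    output_pdf: str,
    viewport: Optional[str] = None,
    javascript_delay_ms: Optional[int] = None,
) -> List[str]:
    """Translate a few wkhtmltopdf-style options into a Chrome argv list.

    Hypothetical helper: the name and signature are assumptions.
    """
    cmd = [chrome, *BASE_FLAGS, f"--print-to-pdf={output_pdf}"]
    if viewport:
        # wkhtmltopdf style is WIDTHxHEIGHT; Chrome expects WIDTH,HEIGHT.
        width, height = viewport.lower().split("x")
        cmd.append(f"--window-size={int(width)},{int(height)}")
    if javascript_delay_ms is not None:
        cmd.append(f"--virtual-time-budget={javascript_delay_ms}")
    # The page to print goes last.
    cmd.append(input_url)
    return cmd
```

Returning a plain list keeps this step free of side effects, so it can be unit-tested without launching Chrome.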

User Stories

  1. As an Odoo admin, I want to replace the wkhtmltopdf binary with this wrapper so that my existing Odoo config continues to work without changes.
  2. As a developer, I want the binary to be a single Python 3 file with no pip dependencies so that deployment is trivial.
  3. As a sysadmin, I want the wrapper to auto-detect Chrome/Chromium on the system so I don't have to configure paths manually.
  4. As a user, I want --page-size A4 to produce an A4 PDF so that my reports match expected dimensions.
  5. As a user, I want --page-size Letter to produce a Letter-sized PDF so that US reports work correctly.
  6. As a user, I want --orientation landscape to rotate the page so that wide tables fit.
  7. As a user, I want --margin-top 20 to set a 20mm top margin so that my content doesn't touch the edge.
  8. As a user, I want --margin-bottom, --margin-left, --margin-right to work independently so I can control all margins.
  9. As a user, I want --page-width and --page-height to set custom page dimensions in mm so I can produce non-standard sizes.
  10. As a user, I want --dpi 300 to produce a high-resolution PDF so that images look sharp.
  11. As a user, I want --disable-local-file-access to block local file references so that untrusted HTML can't read my filesystem.
  12. As a user, I want --enable-local-file-access to allow local file references so that embedded images work.
  13. As a user, I want --header-html file.html to inject a header on every page so that I get consistent document headers.
  14. As a user, I want --footer-html file.html to inject a footer on every page so that I get page numbers and footers.
  15. As a user, I want --header-spacing to control the gap between header and content so the layout looks correct.
  16. As a user, I want --header-line to draw a line under the header so it visually separates from content.
  17. As a user, I want --cookie-jar cookies.txt to pass cookies to Chrome so that authenticated pages render correctly.
  18. As a user, I want --javascript-delay 5000 to wait 5 seconds before printing so that JS-heavy pages finish rendering.
  19. As a user, I want --quiet to suppress all non-error output so that logs stay clean.
  20. As a user, I want multiple input HTML files to be concatenated so I can merge documents.
  21. As a user, I want --zoom 1.5 to scale the page content so that small text becomes readable.
  22. As a user, I want --disable-smart-shrinking to prevent Chrome from auto-scaling so my CSS controls the layout exactly.
  23. As a system, I want exit code 0 on success so that callers know the PDF was generated.
  24. As a system, I want exit code 4 when an input file is missing so that callers can handle file-not-found.
  25. As a system, I want exit code 6 when the Chrome binary is not found so that callers know the dependency is missing.
  26. As a system, I want exit code 3 on timeout so that callers can retry or fail gracefully.
  27. As a system, I want exit code 7 on invalid arguments so that callers can show usage help.
  28. As a sysadmin, I want to set WKHTMLTOPDF_CHROME_PATH env var to override auto-detection so I can pin a specific Chrome binary.
  29. As a sysadmin, I want to set WKHTMLTOPDF_TIMEOUT env var to change the default timeout so long-running renders don't get killed.
  30. As a sysadmin, I want a config file at ~/.wkhtmltopdf.json for persistent settings so I don't repeat env vars.
  31. As a user, I want --chrome-path CLI flag to specify Chrome location so I can override per-invocation.
  32. As a user, I want --timeout CLI flag to set max wait time so I can control rendering timeout per call.
  33. As a user, I want --print-media-type to use print CSS media queries so that print-specific styles apply.
  34. As a developer, I want helpful error messages on stderr so I can debug issues quickly.
  35. As a developer, I want warnings for unsupported flags so I know what features are not yet mapped.
  36. As a user, I want --viewport-size WIDTHxHEIGHT to set the browser viewport so that responsive layouts render at the right width.
  37. As a user, I want --extra-chromium-args to pass additional Chrome flags so I can access experimental features.
  38. As a user, I want the wrapper to work inside Docker containers with --no-sandbox so that containerized deployments work.
  39. As a user, I want --version to print the wrapper version so I can verify the installed version.
  40. As a user, I want --help to print usage information so I can discover available options.
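Stories 3, 28, 30 and 31 together imply an order of precedence for locating Chrome. The sketch below assumes one plausible order (CLI flag, then env var, then config file, then PATH search); the PRD does not fix this order, and the config key `chrome_path` and both function names are hypothetical.

```python
import json
import shutil
from pathlib import Path
from typing import Mapping, Optional

# Binary names listed in the PRD for auto-detection.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium-browser",
    "chromium",
)


def load_config(path: Path) -> dict:
    """Return the JSON config as a dict, or {} if it is missing or invalid."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def resolve_chrome_path(
    cli_value: Optional[str],
    env: Mapping[str, str],
    config: Mapping[str, str],
) -> Optional[str]:
    """Pick the Chrome binary: CLI flag, env var, config file, then PATH.

    The precedence order and the "chrome_path" key are assumptions.
    """
    explicit = (
        cli_value
        or env.get("WKHTMLTOPDF_CHROME_PATH")
        or config.get("chrome_path")
    )
    if explicit:
        return explicit
    for name in CHROME_CANDIDATES:
        found = shutil.which(name)
        if found:
            return found
    # Caller would map this to exit code 6 (Chrome not found).
    return None
```

A caller might combine the two as `resolve_chrome_path(args.chrome_path, os.environ, load_config(Path.home() / ".wkhtmltopdf.json"))`, where `args` is the parsed CLI namespace.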

Implementation Decisions

  • Language: Python 3, stdlib only. No pip packages.
  • Binary name: wkhtmltopdf (shebang #!/usr/bin/env python3).
  • Auto-detect Chrome: Search PATH for google-chrome, google-chrome-stable, chromium-browser, chromium.
  • Always pass headless flags: --headless=new --disable-gpu --no-sandbox --disable-dev-shm-usage.
  • Margin conversion: wkhtmltopdf margins default to mm; Chrome's PDF print parameters (for example, the margins of DevTools Page.printToPDF) are in inches. Convert mm → inches (mm / 25.4).
  • Page size mapping: Map standard names (A4, Letter, Legal, Tabloid, A3, A5, B4, B5) to Chrome's --print-to-pdf paper size.
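  • Cookie handling: Parse Netscape cookie jar format, write to Chrome JSON cookie file, pass via --cookies-file, clean up temp file after. The --cookies-file switch still needs to be confirmed against the target Chrome build; the DevTools Network.setCookies command is the documented way to inject cookies.
  • Header/Footer: Use Chrome's --header-template and --footer-template (or --print-to-pdf-header-template / --print-to-pdf-footer-template) with HTML content. Which of these switches the target Chrome build accepts still needs to be confirmed; the documented equivalents are the headerTemplate and footerTemplate parameters of DevTools Page.printToPDF.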
  • JavaScript delay: Use --virtual-time-budget (milliseconds) or --timeout for wait.
  • Multiple inputs: Concatenate HTML files into a single temp file before passing to Chrome.
  • Zoom: Apply via CSS transform: scale() wrapper or --force-device-scale-factor.
  • Viewport size: Pass via --window-size=WIDTH,HEIGHT.
  • Print media type: Chrome's --print-to-pdf already renders with print media, so --print-media-type needs no extra flag; use --run-all-compositor-stages-before-draw for full rendering.
  • Modules:
    • argparse (stdlib) for CLI parsing
    • subprocess (stdlib) for Chrome invocation
    • os, sys, tempfile, json, pathlib for utilities
  • Single-file architecture initially for simplicity.
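Three of the decisions above (margin conversion, page size mapping, cookie jar parsing) are pure data transformations and could look something like the sketch below. The function names are hypothetical, the B4/B5 dimensions assume the ISO sizes, and the output dictionary shape for cookies is an assumption modelled on DevTools cookie fields rather than a confirmed Chrome file format.

```python
from typing import Dict, List, Tuple

MM_PER_INCH = 25.4

# Portrait sizes in mm (width, height). B4/B5 use ISO values (assumption).
PAGE_SIZES_MM: Dict[str, Tuple[float, float]] = {
    "A3": (297.0, 420.0),
    "A4": (210.0, 297.0),
    "A5": (148.0, 210.0),
    "B4": (250.0, 353.0),
    "B5": (176.0, 250.0),
    "Letter": (215.9, 279.4),
    "Legal": (215.9, 355.6),
    "Tabloid": (279.4, 431.8),
}


def mm_to_inches(mm: float) -> float:
    """Convert millimetres to inches (mm / 25.4)."""
    return mm / MM_PER_INCH


def page_size_inches(name: str, landscape: bool = False) -> Tuple[float, float]:
    """Look up a page size by case-insensitive name, returned in inches."""
    lookup = {key.lower(): size for key, size in PAGE_SIZES_MM.items()}
    try:
        width_mm, height_mm = lookup[name.lower()]
    except KeyError:
        raise ValueError(f"unknown page size: {name!r}") from None
    if landscape:
        width_mm, height_mm = height_mm, width_mm
    return mm_to_inches(width_mm), mm_to_inches(height_mm)


def parse_netscape_cookies(text: str) -> List[dict]:
    """Parse Netscape cookie-jar text (7 tab-separated fields per line)."""
    cookies = []
    for line in text.splitlines():
        # curl-style jars mark HttpOnly cookies with this prefix.
        http_only = line.startswith("#HttpOnly_")
        if http_only:
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # skip malformed lines
        domain, _subdomains, path, secure, expires, name, value = fields
        cookies.append({
            "name": name,
            "value": value,
            "domain": domain,
            "path": path,
            "secure": secure.upper() == "TRUE",
            "httpOnly": http_only,
            "expires": int(expires) if expires.isdigit() else 0,
        })
    return cookies
```

For example, `--margin-top 20` would become `mm_to_inches(20)`, roughly 0.787 inches.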

Testing Decisions

  • Unit tests: Test flag translation logic (wkhtmltopdf args → Chrome args) in isolation.
  • Integration tests: Run wrapper against real Chrome with sample HTML files, verify PDF output exists and has correct properties.
  • No mocking of Chrome in integration tests - use actual headless Chrome.
  • Test fixtures: Small HTML files with known content for deterministic output checks.
  • Test framework: Python unittest (stdlib only).
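A unit test for the flag translation logic might look like the sketch below. It is illustrative only: `translate_viewport` is a hypothetical helper defined inline so the example is self-contained, whereas a real test would import it from the wrapper.

```python
import unittest


def translate_viewport(value: str) -> str:
    """Map wkhtmltopdf's WIDTHxHEIGHT to Chrome's --window-size flag.

    Hypothetical helper, included here only so the test can run.
    """
    width, sep, height = value.lower().partition("x")
    if not sep or not width.isdigit() or not height.isdigit():
        raise ValueError(f"invalid viewport size: {value!r}")
    return f"--window-size={int(width)},{int(height)}"


class ViewportTranslationTest(unittest.TestCase):
    def test_valid_size(self):
        self.assertEqual(
            translate_viewport("1280x1024"), "--window-size=1280,1024"
        )

    def test_uppercase_separator(self):
        self.assertEqual(translate_viewport("800X600"), "--window-size=800,600")

    def test_invalid_size_is_rejected(self):
        for bad in ("1280", "x", "axb", "1280x"):
            with self.subTest(value=bad):
                with self.assertRaises(ValueError):
                    translate_viewport(bad)
```

Saved as a test module, this runs with `python3 -m unittest` and needs no Chrome installation, which keeps it separate from the integration tests described above.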

Out of Scope

  • PDF forms, PDF attachments, encryption.
  • --web-spacing (CSS zoom via wkhtmltopdf-specific spacing).
  • Screenshot/preview mode.
  • TOC generation (--toc).
  • Cover page (--cover).
  • Grayscale output (--grayscale).
  • Outline generation (--outline, --outline-depth).

Further Notes

  • The wrapper must be a drop-in replacement. Any Odoo config pointing to wkhtmltopdf should work without modification.
  • Chrome's --headless=new mode (Chrome 112+) provides the best rendering fidelity.
  • The --print-to-pdf flag in Chrome is the core mechanism; every supported wkhtmltopdf flag is translated into an option that shapes this one print job.
  • Error handling must be robust - Chrome may crash, timeout, or fail to start.
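One way to cover the failure modes named above (crash, timeout, failure to start) is to wrap the Chrome invocation and map each outcome to an exit code. In this sketch the codes 0, 3 and 6 come from the user stories, while the generic failure code 1 and the function name `run_chrome` are assumptions.

```python
import subprocess
import sys
from typing import Sequence

EXIT_OK = 0
EXIT_ERROR = 1  # assumption: the PRD does not name a generic failure code
EXIT_TIMEOUT = 3
EXIT_CHROME_NOT_FOUND = 6


def run_chrome(cmd: Sequence[str], timeout_s: float) -> int:
    """Run the Chrome command and return a wrapper exit code."""
    try:
        proc = subprocess.run(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,
            check=False,
        )
    except (FileNotFoundError, PermissionError):
        print(f"error: cannot execute Chrome binary: {cmd[0]}", file=sys.stderr)
        return EXIT_CHROME_NOT_FOUND
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        print(f"error: Chrome timed out after {timeout_s}s", file=sys.stderr)
        return EXIT_TIMEOUT
    if proc.returncode != 0:
        # Forward Chrome's own diagnostics to help debugging.
        sys.stderr.write(proc.stderr.decode(errors="replace"))
        print(f"error: Chrome exited with {proc.returncode}", file=sys.stderr)
        return EXIT_ERROR
    return EXIT_OK
```

The wrapper's entry point could then end with `sys.exit(run_chrome(cmd, timeout_s))`.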
Reference
ppreeper/wkhtmltopdf#2