CLI Reference
This page documents the html2rss command-line interface (CLI).
For detailed documentation on the Ruby API, please refer to the official YARD documentation.
📚 View the Ruby API Docs on rubydoc.info
Commands
The html2rss executable is the primary way to interact with the gem from your terminal.
Auto
Automatically discovers items from a page and prints the generated RSS feed to stdout.
html2rss auto https://example.com/articles
html2rss auto https://example.com/app --strategy browserless
html2rss auto https://example.com/app --strategy browserless --max-redirects 5 --max-requests 6
html2rss auto https://example.com/articles --items_selector ".post-card"
Command: html2rss auto URL
URL Surface Guidance For auto
auto works best when the input URL already exposes a server-rendered list of entries.
- High-success surfaces:
  - newsroom or press listing pages
  - blog/category/tag listing pages
  - changelog/release notes/update listing pages
  - paginated archive/list views
- Low-success surfaces:
  - generic homepages with heavy promo/navigation chrome
  - search results pages
  - client-rendered app shells (#app, #root, #__next, etc.)
When possible, pass a direct listing/update URL instead of a top-level homepage or app entrypoint.
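The app-shell case above can be approximated before running auto. The following is a minimal sketch, not part of html2rss: the function name looks_like_app_shell is hypothetical, and the check is only a heuristic, since a server-rendered page can also contain one of these mount points.

```shell
# Heuristic only: succeeds (exit 0) when the HTML on stdin contains a
# typical single-page-app mount point: id="app", id="root", or id="__next".
# Hypothetical helper, not an html2rss feature.
looks_like_app_shell() {
  grep -Eq "id=[\"'](app|root|__next)[\"']"
}

# Possible usage (assumes curl is installed):
# curl -fsSL https://example.com/app | looks_like_app_shell \
#   && echo "mount point found; consider a direct listing URL" >&2
```

A match is a hint to try a direct listing URL first, not proof that auto will fail.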
Failure Outcomes You Should Expect
When no extractable items are found, auto now classifies likely causes instead of only returning a generic message:
- blocked surface likely (anti-bot or interstitial):
  - retry with --strategy browserless
  - try a more specific public listing URL
- app-shell surface detected:
  - retry with --strategy browserless
  - switch to a direct listing/update URL
- unsupported extraction surface for auto mode:
  - switch to listing/changelog/category URLs
  - use explicit selectors in a feed config
Known anti-bot interstitial responses (for example Cloudflare challenge pages) are surfaced explicitly as blocked-surface errors.
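The "retry with --strategy browserless" advice can be scripted. This is a sketch under one assumption that this page does not confirm: that html2rss exits with a non-zero status when extraction fails. The function name auto_with_fallback is hypothetical.

```shell
# Try the default strategy first; fall back to browserless on failure.
# Assumption: html2rss exits non-zero when it cannot extract any items.
auto_with_fallback() {
  url="$1"
  if html2rss auto "$url"; then
    return 0
  fi
  echo "default strategy failed for $url; retrying with browserless" >&2
  html2rss auto "$url" --strategy browserless
}

# Possible usage:
# auto_with_fallback https://example.com/updates > feed.xml
```

The retry only helps for the blocked-surface and app-shell cases; an unsupported surface still needs a different URL or explicit selectors.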
Browserless Setup And Diagnostics (CLI)
browserless is opt-in for CLI usage.
# 1) Start Browserless in the background
docker run -d --rm --name html2rss-browserless -p 3000:3000 -e "CONCURRENT=10" -e "TOKEN=6R0W53R135510" ghcr.io/browserless/chromium
# 2) Run html2rss against Browserless
BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000" BROWSERLESS_IO_API_TOKEN="6R0W53R135510" html2rss auto https://example.com/updates --strategy browserless
# 3) Stop Browserless when done
docker stop html2rss-browserless
If you see Browserless connection failed, check:
- BROWSERLESS_IO_WEBSOCKET_URL points to a reachable Browserless endpoint
- BROWSERLESS_IO_API_TOKEN matches the Browserless TOKEN
- the Browserless service is running and reachable from your shell environment
For custom Browserless endpoints, BROWSERLESS_IO_API_TOKEN is required.
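Part of the checklist above can be run as a preflight step. The sketch below only confirms that both environment variables are set; it does not test whether the endpoint is reachable or whether the token matches. The function name check_browserless_env is hypothetical.

```shell
# Report which Browserless variables are unset or empty.
# Returns 0 when both are present, 1 otherwise.
check_browserless_env() {
  missing=0
  for name in BROWSERLESS_IO_WEBSOCKET_URL BROWSERLESS_IO_API_TOKEN; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage:
# check_browserless_env && html2rss auto https://example.com/updates --strategy browserless
```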
Feed
Loads a YAML config, builds the feed, and prints the RSS XML to stdout.
html2rss feed single.yml
html2rss feed feeds.yml my-first-feed
html2rss feed single.yml --strategy browserless
html2rss feed single.yml --max-redirects 5 --max-requests 6
html2rss feed single.yml --params id:42 foo:bar
Command: html2rss feed YAML_FILE [feed_name]
The CLI keeps strategy as a top-level override and writes runtime request limits into the generated config under request.
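Because feed prints the RSS XML to stdout, saving it is a matter of shell redirection. The sketch below writes to a temporary file first so that a failed run does not overwrite an existing feed. It assumes html2rss exits non-zero on failure, which this page does not state; write_feed is a hypothetical helper.

```shell
# Write the feed to "$2" only if html2rss succeeds, so a failed run
# never replaces a previously generated file.
# Assumption: html2rss exits non-zero on failure.
write_feed() {
  config="$1"
  output="$2"
  tmp="$(mktemp)" || return 1
  if html2rss feed "$config" > "$tmp"; then
    mv "$tmp" "$output"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Possible usage:
# write_feed single.yml public/feed.xml
```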
Schema
Prints the exported JSON Schema for the current gem version.
html2rss schema
html2rss schema --no-pretty
html2rss schema --write tmp/html2rss-config.schema.json
Command: html2rss schema
Validate
Validates a config with the runtime validator without generating a feed.
html2rss validate single.yml
html2rss validate feeds.yml my-first-feed
Command: html2rss validate YAML_FILE [feed_name]
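Validation fits naturally into a pre-commit hook or CI step. The following sketch checks several config files in one pass and reports every failure rather than stopping at the first. It assumes validate exits non-zero for an invalid config, which this page does not confirm; validate_all is a hypothetical helper.

```shell
# Validate each config passed as an argument.
# Returns 1 if any file fails, after checking all of them.
# Assumption: html2rss validate exits non-zero for an invalid config.
validate_all() {
  status=0
  for config in "$@"; do
    if ! html2rss validate "$config"; then
      echo "invalid: $config" >&2
      status=1
    fi
  done
  return "$status"
}

# Possible usage:
# validate_all configs/*.yml
```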
Help
Displays the help message with available commands and options.
Command: html2rss help
Version
Displays the installed version of html2rss.
Command: html2rss --version