CLI Reference
This page documents the html2rss command-line interface (CLI).
For detailed documentation on the Ruby API, please refer to the official YARD documentation.
📚 View the Ruby API Docs on rubydoc.info
Commands
The html2rss executable is the primary way to interact with the gem from your terminal.
Auto
Automatically discovers items from a page and prints the generated RSS feed to stdout.
html2rss auto https://example.com/articles
html2rss auto https://example.com/app --strategy browserless --max-redirects 5 --max-requests 6
BOTASAURUS_SCRAPER_URL="http://localhost:4010" html2rss auto https://example.com/protected --strategy botasaurus
html2rss auto https://example.com/articles --items_selector ".post-card"
Command: html2rss auto URL
The default is --strategy auto, which tries faraday first, then botasaurus, then browserless.
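To see which tier actually produces a feed for a given URL, the fallback order can be stepped through by hand. The following is a minimal sketch, not part of html2rss: `first_working_strategy` is a hypothetical helper, and it assumes that html2rss exits non-zero on failure and that faraday is accepted as an explicit --strategy value. The botasaurus and browserless tiers still need their environment variables to be set.

```shell
# Hypothetical helper: try each strategy explicitly, in the documented order,
# and print the first one that produces a feed.
# Assumptions: html2rss exits non-zero on failure, and "faraday" is accepted
# as an explicit --strategy value.
first_working_strategy() {
  url="$1"
  for strategy in faraday botasaurus browserless; do
    if html2rss auto "$url" --strategy "$strategy" >/dev/null 2>&1; then
      printf '%s\n' "$strategy"
      return 0
    fi
  done
  # No strategy produced a feed.
  return 1
}
```

For example, `first_working_strategy https://example.com/articles` prints the name of the first strategy that succeeded, or returns a non-zero status if none did.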
URL Surface Guidance For auto
auto works best when the input URL already exposes a server-rendered list of entries.
- High-success surfaces:
  - newsroom or press listing pages
  - blog/category/tag listing pages
  - changelog/release notes/update listing pages
  - paginated archive/list views
- Low-success surfaces:
  - generic homepages with heavy promo/navigation chrome
  - search results pages
  - client-rendered app shells (#app, #root, #__next, etc.)
When possible, pass a direct listing/update URL instead of a top-level homepage or app entrypoint.
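A quick way to spot a client-rendered app shell before running auto is to look for the mount points listed above in the raw HTML. This is a rough heuristic sketch; `looks_like_app_shell` is a hypothetical helper, not an html2rss feature, and server-rendered frameworks can emit the same ids, so treat a match as a hint only.

```shell
# Hypothetical heuristic: read HTML on stdin and succeed (exit 0) when it
# contains an element id commonly used as a client-side mount point.
looks_like_app_shell() {
  grep -Eq "id=[\"'](app|root|__next)[\"']"
}
```

It can be fed from any HTTP client, for example `curl -fsSL https://example.com/app | looks_like_app_shell && echo "consider a direct listing URL"`.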
Failure Outcomes You Should Expect
When no extractable items are found, auto classifies likely causes instead of only returning a generic message:
- blocked surface likely (anti-bot or interstitial):
  - try a more specific public listing URL
- app-shell surface detected:
  - switch to a direct listing/update URL
- unsupported extraction surface for auto mode:
  - switch to listing/changelog/category URLs
  - use explicit selectors in a feed config
Known anti-bot interstitial responses (for example Cloudflare challenge pages) are surfaced explicitly as blocked-surface errors.
If all fallback tiers run but still extract zero items, html2rss raises:
No RSS feed items extracted after auto fallback ...
If failures continue after URL/surface fixes, retry with an explicit browser-based override (--strategy browserless), or --strategy botasaurus when BOTASAURUS_SCRAPER_URL is configured.
Start by changing the input URL to a direct listing/update page, then move to explicit selectors if needed.
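In scripts, the classifications above can be turned into a next-step hint. The sketch below is illustrative only: `suggest_next_step` is a hypothetical helper, and it assumes the phrases quoted in this section appear verbatim in the CLI's error output.

```shell
# Hypothetical helper: map html2rss error text (read on stdin) to the next
# step recommended in this section.
# Assumption: the quoted phrases appear verbatim in the error output.
suggest_next_step() {
  message="$(cat)"
  case "$message" in
    *"blocked surface likely"*)
      echo "try a more specific public listing URL" ;;
    *"app-shell surface detected"*)
      echo "switch to a direct listing/update URL" ;;
    *"unsupported extraction surface"*)
      echo "switch to a listing/changelog/category URL or use explicit selectors in a feed config" ;;
    *"No RSS feed items extracted after auto fallback"*)
      echo "retry with --strategy browserless or --strategy botasaurus" ;;
    *)
      echo "no known hint for this error" ;;
  esac
}
```

Assuming errors are written to stderr, a call could look like `html2rss auto https://example.com/app 2>&1 >/dev/null | suggest_next_step`.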
Browserless Setup And Diagnostics (CLI)
browserless is an explicit override for CLI usage.
# 1) Start Browserless in the background
docker run -d --rm --name html2rss-browserless -p 3000:3000 -e "CONCURRENT=10" -e "TOKEN=6R0W53R135510" ghcr.io/browserless/chromium
# 2) Run html2rss against Browserless
BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000" BROWSERLESS_IO_API_TOKEN="6R0W53R135510" html2rss auto https://example.com/updates --strategy browserless
# 3) Stop Browserless when done
docker stop html2rss-browserless
If you see Browserless connection failed, check:
- BROWSERLESS_IO_WEBSOCKET_URL points to a reachable Browserless endpoint
- BROWSERLESS_IO_API_TOKEN matches the Browserless TOKEN
- the Browserless service is running and reachable from your shell environment
For custom Browserless endpoints, BROWSERLESS_IO_API_TOKEN is required.
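For the custom-endpoint case, the first two checks can be scripted as a preflight before calling html2rss. This is a sketch under stated assumptions: `check_browserless_env` is a hypothetical helper, and the requirement that the URL starts with ws:// or wss:// is inferred from the example above rather than documented. It does not test whether the endpoint is reachable.

```shell
# Hypothetical preflight for a custom Browserless endpoint.
# Assumption: the endpoint URL uses the ws:// or wss:// scheme.
check_browserless_env() {
  case "${BROWSERLESS_IO_WEBSOCKET_URL:-}" in
    ws://*|wss://*)
      ;; # looks like a WebSocket URL
    "")
      echo "BROWSERLESS_IO_WEBSOCKET_URL is not set" >&2
      return 1 ;;
    *)
      echo "BROWSERLESS_IO_WEBSOCKET_URL should start with ws:// or wss://" >&2
      return 1 ;;
  esac
  if [ -z "${BROWSERLESS_IO_API_TOKEN:-}" ]; then
    echo "BROWSERLESS_IO_API_TOKEN is not set" >&2
    return 1
  fi
  return 0
}
```

A typical call is `check_browserless_env && html2rss auto https://example.com/updates --strategy browserless`.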
Botasaurus Environment Requirement (CLI)
botasaurus is an explicit override for CLI usage and requires BOTASAURUS_SCRAPER_URL:
BOTASAURUS_SCRAPER_URL="http://localhost:4010" html2rss auto https://example.com/updates --strategy botasaurus
If you see a Botasaurus configuration error, check:
- BOTASAURUS_SCRAPER_URL is set
- BOTASAURUS_SCRAPER_URL is a valid URL
- the Botasaurus scrape API is reachable from the shell environment running html2rss
Feed
Loads a YAML config, builds the feed, and prints the RSS XML to stdout.
html2rss feed single.yml
html2rss feed feeds.yml my-first-feed
html2rss feed single.yml --strategy auto
html2rss feed single.yml --strategy browserless
BOTASAURUS_SCRAPER_URL="http://localhost:4010" html2rss feed single.yml --strategy botasaurus
html2rss feed single.yml --max-redirects 5 --max-requests 6
html2rss feed single.yml --params id:42 foo:bar
Command: html2rss feed YAML_FILE [feed_name]
The CLI keeps strategy as a top-level override and writes runtime request limits into the generated config under request.
Schema
Prints the exported JSON Schema for the current gem version.
html2rss schema
html2rss schema --no-pretty
html2rss schema --write tmp/html2rss-config.schema.json
Command: html2rss schema
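Because the schema is tied to the gem version, a saved copy can drift after an upgrade. The following sketch checks a copy against the installed gem. `schema_is_current` is a hypothetical helper, and it assumes the copy was saved by redirecting stdout (`html2rss schema > FILE`), so both sides use the same formatting.

```shell
# Hypothetical check: succeed when the file passed as $1 is byte-identical
# to the schema printed by the installed html2rss.
schema_is_current() {
  html2rss schema | cmp -s - "$1"
}
```

In CI this could run as `schema_is_current config.schema.json || echo "schema copy is out of date"`, where the file name is only an example.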
Validate
Validates a config with the runtime validator without generating a feed.
html2rss validate single.yml
html2rss validate feeds.yml my-first-feed
Command: html2rss validate YAML_FILE [feed_name]
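Since validate does not generate a feed, it pairs naturally with feed in a script: validate first, then build only if the config passes. A minimal sketch, assuming both commands exit non-zero on failure; `build_feed` and the file names are hypothetical.

```shell
# Hypothetical helper: validate a config, then write the feed to a file.
# Assumption: html2rss validate and html2rss feed exit non-zero on failure.
build_feed() {
  config="$1"
  output="$2"
  feed_name="${3:-}" # optional, for files that define several feeds
  # ${feed_name:+...} expands to nothing when no feed name was given.
  html2rss validate "$config" ${feed_name:+"$feed_name"} || return 1
  # Write to a temporary file so a failed run never truncates an existing feed.
  html2rss feed "$config" ${feed_name:+"$feed_name"} > "$output.tmp" || {
    rm -f "$output.tmp"
    return 1
  }
  mv "$output.tmp" "$output"
}
```

For example, `build_feed feeds.yml my-first-feed.xml my-first-feed` leaves any existing output file untouched when validation or feed generation fails.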
Help
Displays the help message with available commands and options.
Command: html2rss help
Version
Displays the installed version of html2rss.
Command: html2rss --version