Troubleshooting

This guide provides solutions to common issues encountered when using html2rss.

Your browser’s developer tools are essential for troubleshooting. Use them to inspect the HTML structure of a webpage and find the correct CSS selectors.

  • To open: Right-click an element on a webpage and select “Inspect” or “Inspect Element.”

auto Picks The Wrong Surface Or Finds No Items

The auto flow is URL-surface sensitive.

  • Higher-success inputs:
    • newsroom/press listing URLs
    • category/tag/listing/archive URLs
    • changelog/release/update listing URLs
  • Lower-success inputs:
    • generic homepages
    • search result pages
    • client-rendered app-shell entrypoints

If extraction quality is poor, switch to a more specific listing/update URL before tuning selectors.

If your feed is empty, check the following:

  • URL: Ensure the url in your configuration is correct and accessible.
  • items.selector: Verify that the items.selector matches the elements on the page.
  • Website Changes: Websites change their HTML structure frequently. Your selectors may be outdated.
  • JavaScript Content: If the content is loaded via JavaScript, use the browserless strategy instead of faraday.
  • Authentication: Some sites require authentication. Check whether you need to add headers or use a different strategy.

auto classifies no-scraper failures with actionable hints:

  • Blocked surface likely (anti-bot or interstitial):
    • retry with --strategy browserless
    • try a more specific public listing URL
  • App-shell surface detected:
    • retry with --strategy browserless
    • target a direct listing/update page instead of homepage/shell entrypoint
  • Unsupported extraction surface for auto mode:
    • switch to listing/changelog/category URLs
    • or use explicit selectors in YAML config

Known anti-bot interstitial patterns (for example Cloudflare challenge pages) are surfaced as blocked-surface errors instead of silent empty extraction results.

If you receive Browserless connection failed (...):

  1. Confirm Browserless is running and reachable from the machine running html2rss.
  2. Confirm BROWSERLESS_IO_WEBSOCKET_URL points at that running service.
  3. Confirm BROWSERLESS_IO_API_TOKEN matches the Browserless TOKEN.

Example local startup:

docker run --rm -p 3000:3000 -e "CONCURRENT=10" -e "TOKEN=6R0W53R135510" ghcr.io/browserless/chromium

Then run with:

BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000" BROWSERLESS_IO_API_TOKEN="6R0W53R135510" html2rss auto https://example.com/updates --strategy browserless

For custom websocket endpoints, BROWSERLESS_IO_API_TOKEN is required.
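
The environment checks above can be partly scripted before a retry. This is a minimal sketch, not something html2rss ships: the function name check_browserless_env is hypothetical, and the ws:// or wss:// prefix test is an assumption about what a valid websocket URL looks like. It only inspects the two environment variables and does not contact Browserless.

```shell
# Hypothetical helper (not part of html2rss): sanity-check the
# Browserless environment variables before running the CLI.
check_browserless_env() {
  if [ -z "${BROWSERLESS_IO_WEBSOCKET_URL:-}" ]; then
    echo "BROWSERLESS_IO_WEBSOCKET_URL is not set" >&2
    return 1
  fi
  # Assumption: a websocket endpoint starts with ws:// or wss://.
  case "$BROWSERLESS_IO_WEBSOCKET_URL" in
    ws://*|wss://*) ;;
    *)
      echo "BROWSERLESS_IO_WEBSOCKET_URL should start with ws:// or wss://" >&2
      return 1
      ;;
  esac
  if [ -z "${BROWSERLESS_IO_API_TOKEN:-}" ]; then
    echo "BROWSERLESS_IO_API_TOKEN is not set" >&2
    return 1
  fi
  echo "Browserless environment looks complete"
}
```

With both variables exported, check_browserless_env && html2rss auto https://example.com/updates --strategy browserless runs the CLI only when the check passes. A passing check says nothing about whether the token is correct or the service is reachable.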

Common configuration-related errors:

  • UnsupportedResponseContentType: The website returned content that html2rss can’t parse (not HTML or JSON).
  • UnsupportedStrategy: The specified strategy is not available. Use faraday or browserless.
  • Configuration must include at least 'selectors' or 'auto_source': You need to specify either manual selectors or enable auto-source.
  • stylesheet.type invalid: Only text/css and text/xsl are supported for stylesheets.

If parts of your items (e.g., title, link) are missing, check the following:

  • Selector: Ensure the selector for the missing part is correct and relative to the items.selector.
  • Extractor: Verify that you are using the correct extractor (e.g., text, href, attribute).
  • Dynamic Content: faraday does not render JavaScript. If content loads dynamically, run with --strategy browserless (with the Browserless service available) so the page can be rendered before extraction.

If you are having issues with date/time parsing, check the following:

  • Date Format: The parse_time post-processor automatically detects common date formats using Ruby’s Time.parse. Ensure your date strings are in a recognizable format.
  • time_zone: Specify the correct time_zone if the website uses a specific time zone.

If you are getting a “command not found” error, try the following:

  • Re-install: Re-install html2rss to ensure it is installed correctly: gem install html2rss.
  • Check PATH: Ensure that the directory where Ruby gems are installed is in your system’s PATH.
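
The PATH check above can be sketched in shell. The helper names path_contains and report_gem_bin are hypothetical. The sketch assumes ruby is installed and relies on Gem.bindir, which reports RubyGems' default executable directory; gems installed with --user-install or through a version manager may place executables elsewhere.

```shell
# Hypothetical helper: succeed if the given directory is an entry in PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask RubyGems where it installs executables and report whether that
# directory is on PATH. Requires ruby to be installed.
report_gem_bin() {
  gem_bin="$(ruby -e 'puts Gem.bindir')" || return 1
  if path_contains "$gem_bin"; then
    echo "$gem_bin is in PATH"
  else
    echo "$gem_bin is missing from PATH"
  fi
}
```

Running report_gem_bin after gem install html2rss shows whether the shell can find the installed executable. The comparison is an exact string match, so a PATH entry written with a trailing slash will not match.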

  • Verify Docker is installed and running:
    docker --version
  • Check logs for errors:
    docker compose logs
  • Ensure the app port (default compose binding: 4000) isn’t already in use:
    lsof -i :4000
  • If the app exits immediately in production, check that HTML2RSS_SECRET_KEY is set.
  • Confirm your firewall allows traffic on port 4000 or your reverse-proxy ports.
  • Try accessing via the server’s IP instead of a domain name.
  • Double-check that containers are running:
    docker compose ps
  • 401 Unauthorized when creating feeds: The create-feed API expects a bearer token. Re-enter a valid access token in the UI or send Authorization: Bearer ... to POST /api/v1/feeds.
  • 403 Forbidden when creating feeds: Automatic feed generation may be disabled (AUTO_SOURCE_ENABLED=false) or the requested URL may not be allowed for the authenticated account.
  • 500 Internal Server Error: Check the application logs for detailed error information.
  • Health endpoint failures: Use GET /api/v1/health/live, GET /api/v1/health/ready, or authenticated GET /api/v1/health depending on which probe you are testing.
  • Some sites may require JavaScript rendering; ensure the browserless service is running.
  • Check the feed configuration in feeds.yml for typos or invalid selectors.
  • Look for parsing errors in the logs:
    docker compose logs html2rss-web
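
The health endpoints listed above can be probed from the command line. This is a minimal sketch under stated assumptions: the helper names health_url and probe_health and the probe labels live, ready, and full are hypothetical, and the base URL http://localhost:4000 assumes the default compose port binding.

```shell
# Hypothetical helper: map a probe label to one of the health endpoints.
# $1 is the base URL of the app, $2 is one of: live, ready, full.
health_url() {
  base="${1%/}"   # drop one trailing slash, if present
  case "$2" in
    live)  echo "$base/api/v1/health/live" ;;
    ready) echo "$base/api/v1/health/ready" ;;
    full)  echo "$base/api/v1/health" ;;
    *)     echo "unknown probe: $2" >&2; return 1 ;;
  esac
}

# Print only the HTTP status code returned for a probe.
probe_health() {
  url="$(health_url "$1" "$2")" || return 1
  curl -s -o /dev/null -w '%{http_code}\n' "$url"
}
```

For example, probe_health http://localhost:4000 live prints the status code for the liveness probe. The sketch sends no credentials, so the authenticated GET /api/v1/health endpoint is expected to reject the request unless you add them.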

  • Mobile Redirects: Check that the channel URL does not redirect to a mobile page with a different markup structure.
  • curl and pup: For static sites, use curl and pup to quickly find selectors: curl URL | pup.
  • CSS Selectors: For a comprehensive overview of CSS selectors, see the W3C documentation.