Troubleshooting

Name: html2rss
Author: html2rss

This guide provides solutions to common issues encountered when using html2rss.

Essential Tools

Your browser’s developer tools are essential for troubleshooting. Use them to inspect the HTML structure of a webpage and find the correct CSS selectors.

To open: Right-click an element on a webpage and select “Inspect” or “Inspect Element.”

Common Issues (Ruby Gem / CLI)

`auto` Picks The Wrong Surface Or Finds No Items

The auto flow is sensitive to the kind of page (the URL surface) it is pointed at.

Inputs with a higher success rate:
- newsroom/press listing URLs
- category/tag/listing/archive URLs
- changelog/release/update listing URLs

Inputs with a lower success rate:
- generic homepages
- search result pages
- client-rendered app-shell entrypoints

If extraction quality is poor, switch to a more specific listing/update URL before tuning selectors.

Empty Feeds

If your feed is empty, check the following:

- URL: Ensure the url in your configuration is correct and accessible.
- items.selector: Verify that the items.selector matches the elements on the page.
- Website Changes: Websites change their HTML structure frequently. Your selectors may be outdated.
- JavaScript Content: If the content is loaded via JavaScript, use the browserless strategy instead of faraday.
- Authentication: Some sites require authentication. Check whether you need to add headers or use a different strategy.
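
To tell an outdated selector apart from JavaScript-loaded content, it can help to check whether a marker from the items you expect appears in the raw HTML that a non-rendering client receives. The following is a minimal sketch, assuming curl is installed. The function name, the URL, and the marker string are hypothetical placeholders and not part of html2rss.

```shell
# Report whether a literal marker (for example a class name copied from the
# browser's inspector) appears in the HTML served without JavaScript rendering.
# Hypothetical helper, not part of html2rss.
static_html_contains() {
  url="$1"
  marker="$2"
  # -s: no progress output, -L: follow redirects; grep -F matches the marker literally
  if curl -sL "$url" | grep -qF -- "$marker"; then
    echo "present in static HTML: re-check items.selector"
  else
    echo "absent from static HTML: content may be rendered by JavaScript or the request was blocked"
  fi
}

# Example call (placeholder URL and marker):
# static_html_contains "https://example.com/updates" 'class="post"'
```

If the marker is visible in the browser's inspector but absent here, that points toward the browserless strategy rather than further selector tuning.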

`No scrapers found` Failure Taxonomy (`auto`)

auto classifies no-scraper failures with actionable hints:

Blocked surface likely (anti-bot or interstitial):
- retry with --strategy browserless
- try a more specific public listing URL

App-shell surface detected:
- retry with --strategy browserless
- target a direct listing/update page instead of homepage/shell entrypoint

Unsupported extraction surface for auto mode:
- switch to listing/changelog/category URLs
- or use explicit selectors in YAML config

Known anti-bot interstitial patterns (for example Cloudflare challenge pages) are surfaced as blocked-surface errors instead of silent empty extraction results.

Browserless Connection / Setup Failures

If you receive Browserless connection failed (...):

- Confirm Browserless is running and reachable from the machine running html2rss.
- Confirm BROWSERLESS_IO_WEBSOCKET_URL points at that running service.
- Confirm BROWSERLESS_IO_API_TOKEN matches the Browserless TOKEN.

Example local startup:

docker run --rm -p 3000:3000 -e "CONCURRENT=10" -e "TOKEN=6R0W53R135510" ghcr.io/browserless/chromium

Then run with:

BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000" BROWSERLESS_IO_API_TOKEN="6R0W53R135510" html2rss auto https://example.com/updates --strategy browserless

For custom websocket endpoints, BROWSERLESS_IO_API_TOKEN is required.
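
When the variables are exported in the shell rather than set inline, a small preflight can confirm that both are present before html2rss runs. This is a sketch only: `check_browserless_env` is a hypothetical helper, and it checks only that the variables are non-empty, not that the service is reachable or that the token is correct.

```shell
# Report which Browserless variables are unset or empty in the current shell.
# Returns 0 when both are set, 1 otherwise. Hypothetical helper, not part of html2rss.
check_browserless_env() {
  rc=0
  if [ -z "${BROWSERLESS_IO_WEBSOCKET_URL:-}" ]; then
    echo "missing: BROWSERLESS_IO_WEBSOCKET_URL"
    rc=1
  fi
  if [ -z "${BROWSERLESS_IO_API_TOKEN:-}" ]; then
    echo "missing: BROWSERLESS_IO_API_TOKEN"
    rc=1
  fi
  return "$rc"
}

# Example: run html2rss only when the preflight passes.
# check_browserless_env && html2rss auto https://example.com/updates --strategy browserless
```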

Configuration Errors

Common configuration-related errors:

- UnsupportedResponseContentType: The website returned content that html2rss can’t parse (not HTML or JSON).
- UnsupportedStrategy: The specified strategy is not available. Use faraday or browserless.
- Configuration must include at least 'selectors' or 'auto_source': You need to specify either manual selectors or enable auto-source.
- stylesheet.type invalid: Only text/css and text/xsl are supported for stylesheets.

Missing Item Parts

If parts of your items (e.g., title, link) are missing, check the following:

- Selector: Ensure the selector for the missing part is correct and relative to the items.selector.
- Extractor: Verify that you are using the correct extractor (e.g., text, href, attribute).
- Dynamic Content: faraday does not render JavaScript. If content loads dynamically, run with --strategy browserless (with the Browserless service available) so the page can be rendered before extraction.

Date/Time Parsing Errors

If you are having issues with date/time parsing, check the following:

- Date Format: The parse_time post-processor automatically detects common date formats using Ruby’s Time.parse. Ensure your date strings are in a recognizable format.
- time_zone: Specify the correct time_zone if the website uses a specific time zone.

`html2rss` Command Not Found

If you are getting a “command not found” error, try the following:

- Re-install: Run gem install html2rss again to make sure the gem is installed correctly.
- Check PATH: Ensure that the directory where RubyGems installs executables is in your system’s PATH.
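
The PATH check can be sketched as follows. It assumes RubyGems is installed and relies on the "EXECUTABLE DIRECTORY" line printed by `gem environment`. `path_contains` is a hypothetical helper that compares entries literally, so a trailing slash or a symlinked directory may produce a false mismatch. Gems installed with --user-install may live in a different directory that this sketch does not inspect.

```shell
# Return 0 if directory $1 is one of the colon-separated entries in $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only run the check when RubyGems is available on this machine.
if command -v gem >/dev/null 2>&1; then
  # `gem environment` prints an "EXECUTABLE DIRECTORY" line naming where gem binaries go.
  gem_bin="$(gem environment | sed -n 's/^.*EXECUTABLE DIRECTORY: //p')"
  if path_contains "$gem_bin" "$PATH"; then
    echo "gem executables directory is on PATH: $gem_bin"
  else
    echo "add this directory to PATH: $gem_bin"
  fi
fi
```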

Web Application Issues (html2rss-web)

Instance Won’t Start

Verify Docker is installed and running:
```
docker --version
```
Check logs for errors:
```
docker compose logs
```
Ensure the app port (default compose binding: 4000) isn’t already in use:
```
lsof -i :4000
```
If the app exits immediately in production, check that HTML2RSS_SECRET_KEY is set.

Can’t Access the Web Interface

Confirm your firewall allows traffic on port 4000 or on your reverse-proxy ports.
Try accessing the instance via the server’s IP address instead of a domain name.
Double-check that containers are running:
```
docker compose ps
```

Authentication Errors

- 401 Unauthorized when creating feeds: The create-feed API expects a bearer token. Re-enter a valid access token in the UI or send Authorization: Bearer ... to POST /api/v1/feeds.
- 403 Forbidden when creating feeds: Automatic feed generation may be disabled (AUTO_SOURCE_ENABLED=false) or the requested URL may not be allowed for the authenticated account.
- 500 Internal Server Error: Check the application logs for detailed error information.
- Health endpoint failures: Use GET /api/v1/health/live, GET /api/v1/health/ready, or authenticated GET /api/v1/health depending on which probe you are testing.
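
The two unauthenticated probes can be checked from a terminal with a sketch like the one below. It assumes curl is installed and that the instance listens on the default compose port (4000). `probe_health` is a hypothetical helper. The authenticated GET /api/v1/health probe is left out because it needs credentials.

```shell
# Print the HTTP status code returned by each unauthenticated health probe.
# $1 is the base URL of the instance, for example http://localhost:4000.
# Hypothetical helper, not part of html2rss-web.
probe_health() {
  base="${1%/}"   # drop one trailing slash so paths join cleanly
  for path in /api/v1/health/live /api/v1/health/ready; do
    # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
    code="$(curl -s -o /dev/null -w '%{http_code}' "$base$path")"
    echo "$path $code"
  done
}

# Example call:
# probe_health "http://localhost:4000"
```

curl prints 000 as the status code when it cannot connect at all, which usually means the container is not running or the port is not published.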

Feed Problems

Some sites require JavaScript rendering. Ensure the browserless service is running.
Check the feed configuration in feeds.yml for typos or invalid selectors.
Look for parsing errors in the logs:
```
docker compose logs html2rss-web
```

Tips & Tricks

- Mobile Redirects: Check that the channel URL does not redirect to a mobile page with a different markup structure.
- curl and pup: For static sites, use curl and pup to quickly find selectors: curl URL | pup.
- CSS Selectors: For a comprehensive overview of CSS selectors, see the W3C documentation.
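
The mobile-redirect check can be sketched with curl alone. `final_url` and `check_redirect` are hypothetical helpers. Some servers redirect based on the User-Agent header, so the result for curl may differ from what html2rss sees, and curl may normalize the URL (for example by appending a trailing slash), which this sketch would report as a redirect.

```shell
# Print the URL that curl ends up at after following redirects.
final_url() {
  # -s: quiet, -L: follow redirects, -o /dev/null: discard the body,
  # -w '%{url_effective}': print the last URL that was fetched
  curl -sL -o /dev/null -w '%{url_effective}' "$1"
}

# Compare the configured URL with the URL curl finally landed on.
check_redirect() {
  landed="$(final_url "$1")"
  if [ "$landed" = "$1" ]; then
    echo "no redirect"
  else
    echo "redirected to: $landed"
  fi
}

# Example call (placeholder URL):
# check_redirect "https://example.com/news"
```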

Still Stuck?

Join our community discussions
Review the deployment guide for production best practices