Troubleshooting
This guide provides solutions to common issues encountered when using html2rss.
Essential Tools
Your browser’s developer tools are essential for troubleshooting. Use them to inspect the HTML structure of a webpage and find the correct CSS selectors.
- To open: Right-click an element on a webpage and select “Inspect” or “Inspect Element.”
Common Issues (Ruby Gem / CLI)
`auto` Picks the Wrong Surface or Finds No Items
How well `auto` works depends on the kind of page (URL surface) you point it at.
- Higher success inputs:
  - newsroom/press listing URLs
  - category/tag/listing/archive URLs
  - changelog/release/update listing URLs
- Lower success inputs:
  - generic homepages
  - search result pages
  - client-rendered app-shell entrypoints
If extraction quality is poor, switch to a more specific listing/update URL before tuning selectors.
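Here is a minimal POSIX shell sketch of that advice. It assumes `html2rss auto` writes RSS XML to stdout and exits non-zero on failure. The helper `try_auto_urls` and the example URLs are hypothetical. The helper tries candidates from most to least specific and keeps the first feed that contains at least one `<item>`.

```shell
# Hypothetical helper: try candidate URLs in order (most specific first)
# and print the first feed that contains at least one <item>.
# Assumption: `html2rss auto URL` prints RSS XML to stdout on success.
try_auto_urls() {
  for url in "$@"; do
    if feed=$(html2rss auto "$url") && printf '%s\n' "$feed" | grep -q '<item>'; then
      printf 'Using %s\n' "$url" >&2
      printf '%s\n' "$feed"
      return 0
    fi
  done
  echo "No candidate URL produced any items" >&2
  return 1
}

# Example (hypothetical URLs):
# try_auto_urls https://example.com/news/press-releases https://example.com/news https://example.com/ > feed.xml
```

Ordering the candidates from most to least specific mirrors the success ranking above, so a generic homepage is only used as a last resort.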
Empty Feeds
If your feed is empty, check the following:
- URL: Ensure the `url` in your configuration is correct and accessible.
- `items.selector`: Verify that the `items.selector` matches the elements on the page.
- Website Changes: Websites change their HTML structure frequently, so your selectors may be outdated.
- JavaScript Content: If the content is loaded via JavaScript, use the `browserless` strategy instead of `faraday`.
- Authentication: Some sites require authentication; check whether you need to add headers or use a different strategy.
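An empty feed and a failed run look different: a successful run still prints a channel, just with no items. The hypothetical helper below counts `<item>` elements in RSS read from stdin. It assumes the feed is RSS 2.0 XML written to stdout. The `html2rss feed` subcommand in the example comment is an assumption about the CLI, not something this page documents.

```shell
# Hypothetical helper: count <item> elements in RSS XML read from stdin.
# grep -o prints each match on its own line, so this also works when the
# whole feed is on a single line. Prints 0 for an empty feed.
count_items() {
  grep -o '<item>' | wc -l | tr -d ' '
}

# Examples:
# html2rss auto https://example.com/news | count_items
# html2rss feed your-config.yml | count_items   # assumed subcommand
```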
`No scrapers found` Failure Taxonomy (`auto`)
When `auto` finds no usable scraper, it classifies the failure and attaches an actionable hint:
- Blocked surface likely (anti-bot or interstitial):
  - retry with `--strategy browserless`
  - try a more specific public listing URL
- App-shell surface detected:
  - retry with `--strategy browserless`
  - target a direct listing/update page instead of the homepage or shell entrypoint
- Unsupported extraction surface for auto mode:
  - switch to listing/changelog/category URLs
  - or use explicit selectors in a YAML config
Known anti-bot interstitial patterns (for example Cloudflare challenge pages) are surfaced as blocked-surface errors instead of silent empty extraction results.
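The `--strategy browserless` retry suggested for blocked and app-shell surfaces can be scripted. `auto_with_fallback` is a hypothetical wrapper, a sketch that is not part of html2rss. It assumes the CLI exits non-zero when it reports `No scrapers found`. It also assumes a Browserless service is configured as described in the next section.

```shell
# Hypothetical wrapper: run `auto` with the default strategy first and,
# if that run fails, retry once with --strategy browserless.
auto_with_fallback() {
  url=$1
  if html2rss auto "$url"; then
    return 0
  fi
  echo "auto failed for $url; retrying with --strategy browserless" >&2
  html2rss auto "$url" --strategy browserless
}

# Example:
# auto_with_fallback https://example.com/updates > feed.xml
```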
Browserless Connection / Setup Failures
If you receive `Browserless connection failed (...)`:
- Confirm Browserless is running and reachable from the machine running `html2rss`.
- Confirm `BROWSERLESS_IO_WEBSOCKET_URL` points at that running service.
- Confirm `BROWSERLESS_IO_API_TOKEN` matches the Browserless `TOKEN`.
Example local startup:

```shell
docker run --rm -p 3000:3000 -e "CONCURRENT=10" -e "TOKEN=6R0W53R135510" ghcr.io/browserless/chromium
```

Then run with:

```shell
BROWSERLESS_IO_WEBSOCKET_URL="ws://127.0.0.1:3000" BROWSERLESS_IO_API_TOKEN="6R0W53R135510" html2rss auto https://example.com/updates --strategy browserless
```

For custom websocket endpoints, `BROWSERLESS_IO_API_TOKEN` is required.
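Before a browserless run, a quick preflight can catch missing variables. The helper `check_browserless_env` is hypothetical. It checks both variables because the example above sets both. The `ws://`/`wss://` scheme check is an assumption based on that example URL. It does not test whether the service is actually reachable.

```shell
# Hypothetical preflight: report missing or malformed Browserless settings.
# Returns non-zero if anything looks wrong; messages go to stderr.
check_browserless_env() {
  status=0
  if [ -z "${BROWSERLESS_IO_WEBSOCKET_URL:-}" ]; then
    echo "BROWSERLESS_IO_WEBSOCKET_URL is not set" >&2
    status=1
  fi
  if [ -z "${BROWSERLESS_IO_API_TOKEN:-}" ]; then
    echo "BROWSERLESS_IO_API_TOKEN is not set" >&2
    status=1
  fi
  # Assumption: the endpoint is a websocket URL, as in the example above.
  case "${BROWSERLESS_IO_WEBSOCKET_URL:-}" in
    ws://*|wss://*|'') ;;
    *) echo "BROWSERLESS_IO_WEBSOCKET_URL should start with ws:// or wss://" >&2
       status=1 ;;
  esac
  return "$status"
}

# Example:
# check_browserless_env && html2rss auto https://example.com/updates --strategy browserless
```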
Configuration Errors
Common configuration-related errors:
- `UnsupportedResponseContentType`: The website returned content that html2rss can’t parse (not HTML or JSON).
- `UnsupportedStrategy`: The specified strategy is not available. Use `faraday` or `browserless`.
- `Configuration must include at least 'selectors' or 'auto_source'`: You need to specify either manual selectors or enable `auto_source`.
- `stylesheet.type invalid`: Only `text/css` and `text/xsl` are supported for stylesheets.
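To check whether a URL is likely to trigger `UnsupportedResponseContentType`, you can inspect its `Content-Type` header first. This sketch assumes html2rss accepts HTML and JSON responses, as described above. `content_type_ok` is a hypothetical helper that reads `curl -sIL` header output on stdin. Some servers answer `HEAD` requests differently from `GET`, so treat the result as a hint.

```shell
# Hypothetical helper: read HTTP response headers on stdin and report
# whether the final Content-Type looks like HTML or JSON.
# tail -n 1 picks the last header block when curl followed redirects.
content_type_ok() {
  ct=$(grep -i '^content-type:' | tail -n 1 | cut -d: -f2- | tr -d ' \r')
  case "$ct" in
    text/html*|application/json*)
      echo "ok: $ct"
      return 0 ;;
    *)
      echo "unsupported: ${ct:-<none>}" >&2
      return 1 ;;
  esac
}

# Example:
# curl -sIL https://example.com/news | content_type_ok
```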
Missing Item Parts
If parts of your items (e.g., title, link) are missing, check the following:
- Selector: Ensure the selector for the missing part is correct and relative to the `items.selector`.
- Extractor: Verify that you are using the correct `extractor` (e.g., `text`, `href`, `attribute`).
- Dynamic Content: `faraday` does not render JavaScript. If content loads dynamically, run with `--strategy browserless` (with the Browserless service available) so the page can be rendered before extraction.
Date/Time Parsing Errors
If you are having issues with date/time parsing, check the following:
- Date Format: The `parse_time` post-processor detects common date formats using Ruby’s `Time.parse`. Ensure your date strings are in a format `Time.parse` recognizes.
- `time_zone`: Specify the correct `time_zone` if the website publishes its dates in a specific local time zone.
html2rss Command Not Found
If you are getting a “command not found” error, try the following:
- Re-install: Re-install `html2rss` to ensure it is installed correctly: `gem install html2rss`.
- Check `PATH`: Ensure that the directory where Ruby gems install their executables is in your system’s `PATH`.
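Here is a quick way to check the second point in POSIX shell. `path_contains` is a hypothetical helper. `Gem.bindir` is the RubyGems method that returns the directory where gem executables are installed. Gems installed with `--user-install` typically place executables under `Gem.user_dir` instead.

```shell
# Hypothetical helper: succeed if the given directory is an entry in $PATH.
# Wrapping both sides in colons avoids matching partial directory names.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: check the RubyGems executable directory and print the line to add
# to your shell profile if it is missing.
# bindir=$(ruby -e 'puts Gem.bindir')
# path_contains "$bindir" || echo "export PATH=\"$bindir:\$PATH\""
```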
Web Application Issues (html2rss-web)
Instance Won’t Start
- Verify Docker is installed and running: `docker --version` confirms the installation, and `docker info` fails if the Docker daemon isn’t running.
- Check the logs for errors: `docker compose logs`
- Ensure the app port (default compose binding: 4000) isn’t already in use: `lsof -i :4000`
- If the app exits immediately in production, check that `HTML2RSS_SECRET_KEY` is set.
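If `HTML2RSS_SECRET_KEY` is missing, you need a random value for it. This sketch assumes any sufficiently long random hex string is acceptable; check the deployment guide for the exact requirement. `gen_secret` is a hypothetical helper. Where OpenSSL is installed, `openssl rand -hex 32` produces an equivalent value.

```shell
# Hypothetical helper: print 32 random bytes from /dev/urandom as 64 hex
# characters (od prints the bytes as hex; tr strips spaces and newlines).
gen_secret() {
  od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
}

# Example (assumed setup: docker compose reads variables from an .env file
# next to your compose file and passes HTML2RSS_SECRET_KEY to the app):
# echo "HTML2RSS_SECRET_KEY=$(gen_secret)" >> .env
```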
Can’t Access the Web Interface
- Confirm your firewall allows traffic on port 4000 or on your reverse-proxy ports.
- Try accessing the app via the server’s IP address instead of a domain name to rule out DNS problems.
- Double-check that the containers are running: `docker compose ps`
Authentication Errors
- 401 Unauthorized when creating feeds: The create-feed API expects a bearer token. Re-enter a valid access token in the UI, or send an `Authorization: Bearer ...` header with `POST /api/v1/feeds`.
- 403 Forbidden when creating feeds: Automatic feed generation may be disabled (`AUTO_SOURCE_ENABLED=false`), or the requested URL may not be allowed for the authenticated account.
- 500 Internal Server Error: Check the application logs for detailed error information.
- Health endpoint failures: Use `GET /api/v1/health/live`, `GET /api/v1/health/ready`, or the authenticated `GET /api/v1/health`, depending on which probe you are testing.
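To narrow down which of these applies, compare the HTTP status codes of the health probes with and without a token. `probe` is a hypothetical helper. The examples assume the default compose port 4000 on `localhost`. The token is sent in the same `Authorization: Bearer` form the create-feed API expects.

```shell
# Hypothetical helper: print the HTTP status code of an html2rss-web endpoint.
# Usage: probe BASE_URL ENDPOINT_PATH [TOKEN]
# The optional TOKEN is sent as "Authorization: Bearer TOKEN".
probe() {
  base=$1 endpoint=$2 token=${3:-}
  if [ -n "$token" ]; then
    curl -s -o /dev/null -w '%{http_code}\n' -H "Authorization: Bearer $token" "$base$endpoint"
  else
    curl -s -o /dev/null -w '%{http_code}\n' "$base$endpoint"
  fi
}

# Examples:
# probe http://localhost:4000 /api/v1/health/live
# probe http://localhost:4000 /api/v1/health/ready
# probe http://localhost:4000 /api/v1/health "$YOUR_TOKEN"
```

A `401` from `/api/v1/health` without a token but `200` with one suggests the token itself is fine and the problem lies elsewhere.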
Feed Problems
- Some sites require JavaScript rendering; ensure the `browserless` service is running.
- Check the feed configuration in `feeds.yml` for typos or invalid selectors.
- Look for parsing errors in the logs: `docker compose logs html2rss-web`
Tips & Tricks
- Mobile Redirects: Check that the channel URL does not redirect to a mobile page with a different markup structure.
- `curl` and `pup`: For static sites, use `curl` and `pup` to quickly find selectors: `curl URL | pup`. Add a selector argument, such as `pup 'article h2'`, to print only the matching elements.
- CSS Selectors: For a comprehensive overview of CSS selectors, see the W3C documentation.
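Both tips can be run from a terminal. The sketch below lists the redirect hops for a URL, so a switch to a mobile host stands out. `redirect_targets` is a hypothetical helper that reads `curl -sIL` output on stdin. The `pup` line in the comments uses pup's `text{}` display function to print only the matched text.

```shell
# Hypothetical helper: print each redirect target (Location header) found in
# `curl -sIL` output read from stdin, one per line, without trailing CRs.
redirect_targets() {
  grep -i '^location:' | cut -d' ' -f2- | tr -d '\r'
}

# Examples:
# curl -sIL https://example.com/news | redirect_targets
# curl -s https://example.com/news | pup 'article h2 text{}'
```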
Still Stuck?
- Join our community discussions
- Review the deployment guide for production best practices