Auto Source
The auto_source scraper automatically finds items on a page, so you don’t have to specify CSS selectors.
To enable it, add auto_source: {} to your configuration:
    channel:
      url: https://example.com
    auto_source: {}

How It Works
auto_source uses the following strategies to find content:
- schema: Parses <script type="application/ld+json"> tags containing structured data (e.g., Schema.org).
- semantic_html: Searches for semantic HTML5 tags like <article>, <main>, and <section>.
- html: Analyzes the HTML structure to find frequently occurring selectors that are likely to contain the main content.
- json_state: Single-page applications often stash pre-rendered article data in <script type="application/json"> tags or global variables such as window.__NEXT_DATA__, window.__NUXT__, or window.STATE. The JSON-state scraper walks those blobs, finds arrays with title/url pairs, and converts them into the same hashes produced by HtmlExtractor.
json_state Limitations: the scraper requires discoverable arrays of hashes containing clear title and url fields. Minified or
obfuscated state objects, heavily encoded values, or blobs that require executing embedded functions are ignored.
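
To make the json_state idea concrete, here is a minimal sketch of walking a JSON blob and collecting arrays of hashes that carry both a title and a url. It is not html2rss's own code: find_article_arrays, extract_items, and the sample STATE_JSON blob are hypothetical names invented for illustration, and the real scraper may apply different matching rules.

```ruby
require 'json'

# Hypothetical helper (not part of html2rss): recursively walk parsed JSON and
# collect every non-empty array whose elements are all hashes with string
# "title" and "url" values.
def find_article_arrays(node, found = [])
  case node
  when Array
    if !node.empty? && node.all? { |e| e.is_a?(Hash) && e['title'].is_a?(String) && e['url'].is_a?(String) }
      found << node
    else
      node.each { |child| find_article_arrays(child, found) }
    end
  when Hash
    node.each_value { |child| find_article_arrays(child, found) }
  end
  found
end

# Hypothetical helper: parse the blob and reduce each match to title and url.
def extract_items(json_text)
  find_article_arrays(JSON.parse(json_text)).flatten(1).map do |entry|
    { title: entry['title'], url: entry['url'] }
  end
end

# Made-up sample, shaped like state a <script type="application/json"> tag might hold.
STATE_JSON = <<~JSON
  {
    "props": {
      "pageProps": {
        "articles": [
          { "title": "First post", "url": "/posts/1", "author": "a" },
          { "title": "Second post", "url": "/posts/2" }
        ],
        "tags": ["ruby", "rss"]
      }
    }
  }
JSON

extract_items(STATE_JSON).each { |item| puts "#{item[:title]} -> #{item[:url]}" }
```

The sketch also shows why the limitations above matter: if the state object has no array with plain title and url keys, the walk simply finds nothing.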
Fine-Tuning
You can customize auto_source to improve its accuracy.
Scraper Options
Enable or disable specific scrapers and adjust their settings:
    auto_source:
      scraper:
        schema:
          enabled: false # default: true
        semantic_html:
          enabled: false # default: true
        json_state:
          enabled: false # default: true
        html:
          enabled: true
          minimum_selector_frequency: 3 # default: 2
          use_top_selectors: 3 # default: 5

Cleanup Options
Remove unwanted items from the results:
    auto_source:
      cleanup:
        keep_different_domain: false # default: true
        min_words_title: 4 # default: 3

For detailed documentation on the Ruby API, see the official YARD documentation.
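
As a rough illustration of what the cleanup options could mean in practice, the sketch below filters a list of items by title length and by host. It assumes that min_words_title is a minimum word count for the title and that keep_different_domain: false drops items whose URL points at a host other than the channel's. The cleanup method and SAMPLE_ITEMS are hypothetical and are not taken from the html2rss codebase.

```ruby
require 'uri'

# Hypothetical stand-in for the cleanup step (not html2rss's implementation).
# Assumptions: titles are split on whitespace to count words, and "domain"
# is compared via the URL host.
def cleanup(items, channel_url:, keep_different_domain: true, min_words_title: 3)
  channel_host = URI.parse(channel_url).host

  items.select do |item|
    enough_words = item[:title].to_s.split.size >= min_words_title
    # URI.join resolves relative item URLs against the channel URL.
    same_host = URI.join(channel_url, item[:url]).host == channel_host
    enough_words && (keep_different_domain || same_host)
  end
end

# Made-up sample items for illustration.
SAMPLE_ITEMS = [
  { title: 'Release notes for version two', url: '/news/release-2' },
  { title: 'Home', url: '/' },
  { title: 'A partner site wrote about us', url: 'https://other.example.org/post' }
].freeze

kept = cleanup(SAMPLE_ITEMS,
               channel_url: 'https://example.com',
               keep_different_domain: false,
               min_words_title: 4)
kept.each { |item| puts item[:title] }
```

With the stricter settings from the configuration above, only the first sample item survives: "Home" is too short, and the partner post lives on a different host.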