Auto Source
The auto_source scraper automatically finds items on a page, so you don’t have to specify CSS selectors.
To enable it, add auto_source: {} to your configuration:
channel:
  url: https://example.com
auto_source: {}
How It Works
auto_source uses the following strategies to find content:
schema: Parses <script type="application/ld+json"> tags containing structured data (e.g., Schema.org).
semantic_html: Searches for semantic HTML5 tags like <article>, <main>, and <section>.
html: Analyzes the HTML structure to find frequently occurring selectors that are likely to contain the main content.
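To make the schema strategy more concrete, here is a minimal sketch of pulling JSON-LD objects out of a page. It is not the library's implementation: the helper name extract_json_ld and the sample markup are hypothetical, it assumes the conventional application/ld+json script type, and it uses a regular expression only to stay dependency-free where a real scraper would use an HTML parser.

```ruby
require 'json'

# Matches <script> tags whose type is the JSON-LD MIME type and captures the body.
# A regex keeps this sketch stdlib-only; it is not robust HTML parsing.
JSON_LD_TAG = %r{<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>}mi

# Hypothetical helper: returns every JSON-LD object embedded in the HTML string.
def extract_json_ld(html)
  html.scan(JSON_LD_TAG).flat_map do |(body)|
    parsed = JSON.parse(body)
    parsed.is_a?(Array) ? parsed : [parsed]
  rescue JSON::ParserError
    [] # skip malformed blocks instead of failing the whole page
  end
end

html = <<~HTML
  <html><head>
    <script type="application/ld+json">
      {"@type": "Article", "headline": "Hello world", "url": "https://example.com/hello"}
    </script>
  </head><body></body></html>
HTML

# Keep only nodes that describe an article, then print title and link.
articles = extract_json_ld(html).select { |node| node['@type'] == 'Article' }
articles.each { |article| puts "#{article['headline']} -> #{article['url']}" }
```

Structured data is tried first in this sketch because, when present, it names the title and URL explicitly, so no guessing about markup is needed.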
Fine-Tuning
You can customize auto_source to improve its accuracy.
Scraper Options
Enable or disable specific scrapers and adjust their settings:
auto_source:
  scraper:
    schema:
      enabled: false # default: true
    semantic_html:
      enabled: false # default: true
    html:
      enabled: true
      minimum_selector_frequency: 3 # default: 2
      use_top_selectors: 3 # default: 5
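The two html options are easier to reason about with a small example. The sketch below shows one plausible reading of them, which is an assumption rather than documented behavior: a selector must occur at least minimum_selector_frequency times to count, and only the use_top_selectors most frequent survivors are used. The method pick_selectors and the sample selectors are hypothetical.

```ruby
# Hypothetical sketch of frequency-based selector picking.
# Assumed semantics: drop selectors seen fewer than `minimum_selector_frequency`
# times, then keep the `use_top_selectors` most frequent of the rest.
def pick_selectors(selectors, minimum_selector_frequency: 2, use_top_selectors: 5)
  selectors.tally
           .select { |_selector, count| count >= minimum_selector_frequency }
           .sort_by { |selector, count| [-count, selector] } # most frequent first, ties by name
           .first(use_top_selectors)
           .map(&:first)
end

# Selectors under which links were found on an imaginary page.
found = [
  'ul.posts > li', 'ul.posts > li', 'ul.posts > li',
  'nav > a', 'nav > a',
  'footer > a'
]

p pick_selectors(found)
p pick_selectors(found, minimum_selector_frequency: 3, use_top_selectors: 3)
```

Under this reading, raising minimum_selector_frequency filters out small repeated groups such as navigation links, while lowering use_top_selectors limits how many candidate groups are considered at all.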
Cleanup Options
Remove unwanted items from the results:
auto_source:
  cleanup:
    keep_different_domain: false # default: true
    min_words_title: 4 # default: 3
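As a rough illustration of what these cleanup options could do to a result set, the sketch below filters a few sample items. It assumes that keep_different_domain: false drops items whose host differs from the channel URL's host, and that min_words_title drops items whose title has fewer words than the threshold. The Item struct and the cleanup method are hypothetical and not part of the library's API.

```ruby
require 'uri'

# Hypothetical item shape used only for this sketch.
Item = Struct.new(:title, :url, keyword_init: true)

# Assumed semantics: an item survives if its title has enough words and,
# unless different domains are allowed, its host matches the channel's host.
def cleanup(items, channel_url:, keep_different_domain: true, min_words_title: 3)
  channel_host = URI.parse(channel_url).host
  items.select do |item|
    same_domain = URI.parse(item.url).host == channel_host
    enough_words = item.title.to_s.split.size >= min_words_title
    (keep_different_domain || same_domain) && enough_words
  end
end

items = [
  Item.new(title: 'A long and descriptive headline', url: 'https://example.com/a'),
  Item.new(title: 'Read more', url: 'https://example.com/b'),
  Item.new(title: 'Sponsored story from a partner', url: 'https://ads.example.net/c')
]

kept = cleanup(items, channel_url: 'https://example.com',
               keep_different_domain: false, min_words_title: 4)
kept.each { |item| puts item.title }
```

With the stricter settings, short link texts such as "Read more" and off-site links are removed, leaving only the on-site item with a descriptive title.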
For detailed documentation on the Ruby API, see the official YARD documentation.