Auto Source

The auto_source scraper automatically finds items on a page, so you don’t have to specify CSS selectors.

To enable it, add auto_source: {} to your configuration:

channel:
  url: https://example.com
auto_source: {}

How It Works

auto_source uses the following strategies to find content:

  1. schema: Parses <script type="application/ld+json"> tags containing structured data (e.g., Schema.org).
  2. semantic_html: Searches for semantic HTML5 tags like <article>, <main>, and <section>.
  3. html: Analyzes the HTML structure to find frequently occurring selectors that are likely to contain the main content.
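
To make the schema strategy more concrete, here is a minimal Python sketch of collecting JSON-LD blocks from a page. It is an illustration only, not auto_source's own code: the names JsonLdCollector and extract_json_ld are hypothetical, and it assumes the standard JSON-LD media type, application/ld+json.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Hypothetical helper: collects parsed JSON-LD payloads from script tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Only scripts declared as JSON-LD are of interest.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks instead of failing the whole page.


def extract_json_ld(html):
    """Return every JSON-LD payload found in the given HTML string."""
    collector = JsonLdCollector()
    collector.feed(html)
    return collector.items
```

A real scraper would go on to map fields such as headline or url onto feed items; that mapping is left out here.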

Fine-Tuning

You can customize auto_source to improve its accuracy.

Scraper Options

Enable or disable specific scrapers and adjust their settings:

auto_source:
  scraper:
    schema:
      enabled: false # default: true
    semantic_html:
      enabled: false # default: true
    html:
      enabled: true
      minimum_selector_frequency: 3 # default: 2
      use_top_selectors: 3 # default: 5
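
As a rough illustration of how the two html settings could interact, the Python sketch below counts selectors and keeps the most frequent ones. This is an assumption based on the option names, not the scraper's actual implementation: pick_selectors is a hypothetical function, and its defaults mirror the documented defaults (2 and 5).

```python
from collections import Counter


def pick_selectors(selectors, minimum_selector_frequency=2, use_top_selectors=5):
    """Hypothetical sketch: return the most frequent selectors that occur often enough.

    `selectors` holds one selector string per candidate element on the page.
    """
    counts = Counter(selectors)
    # most_common() sorts by count, highest first; drop selectors that are too rare.
    frequent = [
        selector
        for selector, count in counts.most_common()
        if count >= minimum_selector_frequency
    ]
    # Keep only the top N of the remaining selectors.
    return frequent[:use_top_selectors]
```

Under this reading, raising minimum_selector_frequency discards selectors that match only a few elements, and lowering use_top_selectors narrows the search to the most repeated structures on the page.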

Cleanup Options

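Filter unwanted items out of the results, such as items that link to a different domain than the channel URL or whose titles have too few words: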

auto_source:
  cleanup:
    keep_different_domain: false # default: true
    min_words_title: 4 # default: 3
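
The effect of these cleanup options can be sketched in Python as a simple filter. This is a hedged illustration of the semantics the option names suggest, not the library's code: the cleanup function and the title/url item keys are hypothetical, and the defaults mirror the documented ones (true and 3).

```python
from urllib.parse import urlparse


def cleanup(items, channel_url, keep_different_domain=True, min_words_title=3):
    """Hypothetical sketch: drop items with short titles or foreign-domain links."""
    channel_host = urlparse(channel_url).hostname
    kept = []
    for item in items:
        # Titles with fewer words than the minimum are treated as noise.
        if len(item["title"].split()) < min_words_title:
            continue
        # Optionally drop items that link away from the channel's host.
        if not keep_different_domain and urlparse(item["url"]).hostname != channel_host:
            continue
        kept.append(item)
    return kept
```

With the configuration above (keep_different_domain: false, min_words_title: 4), an item titled "Too short" or one linking to another host would be filtered out in this sketch.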