Creating Feed Configurations
Welcome to the guide for html2rss-configs
. This document explains how to create your own configuration files to convert any website into an RSS feed.
You can find a list of all community-contributed configurations in the Feed Directory.
Core Concepts
An html2rss
config is a YAML file that defines how to extract data from a web page. It consists of two main building blocks: channel
and selectors
.
The channel
Block
The channel
block contains metadata about the RSS feed itself, such as its title and the source URL.
Example:
channel:
url: https://example.com/blog
title: My Awesome Blog
For a complete list of all available channel options, please see the Channel Reference.
The selectors
Block
The selectors
block is the core of the configuration, defining the rules for extracting content. It always contains an items
selector to identify the list of articles and individual selectors for the data points within each item (e.g., title
, link
).
Example:
selectors:
items:
selector: "article.post"
title:
selector: "h2 a"
link:
selector: "h2 a"
For a comprehensive guide on all available selectors, extractors, and post-processors, please see the Selectors Reference.
Tutorial: Your First Config
This tutorial walks you through creating a basic configuration file from scratch.
Step 1: Identify the Target Content
First, identify the HTML structure of the website you want to create a feed for. For this example, we’ll use a simple blog structure:
<div class="posts">
<article class="post">
<h2><a href="/post/1">First Post</a></h2>
<p>This is the summary of the first post.</p>
</article>
<article class="post">
<h2><a href="/post/2">Second Post</a></h2>
<p>This is the summary of the second post.</p>
</article>
</div>
Step 2: Create the Config File and Define the Channel
Create a new YAML file (e.g., my-blog.yml
) and define the channel
:
# my-blog.yml
channel:
url: https://example.com/blog
title: My Awesome Blog
description: The latest news from my awesome blog.
Step 3: Define the Selectors
Next, add the selectors
block to extract the content for each post.
# my-blog.yml
selectors:
items:
selector: "article.post"
title:
selector: "h2 a"
link:
selector: "h2 a"
description:
selector: "p"
-
items
: This CSS selector identifies the container for each article. -
title
,link
,description
: These selectors target the specific data points within each item. For alink
selector,html2rss
defaults to extracting thehref
attribute from the matched<a>
tag.
Advanced Techniques
Handling Pagination
To aggregate content from multiple pages, use the pagination
option within the items
selector.
selectors:
items:
selector: ".post-listing .post"
pagination:
selector: ".pagination .next-page"
limit: 5 # Optional: sets the maximum number of pages to follow
Dynamic Feeds with Parameters
Use the parameters
block to create flexible configs. This is useful for feeds based on search terms, categories, or regions.
# news-search.yml
parameters:
query:
type: string
default: "technology"
channel:
url: "https://news.example.com/search?q={query}"
title: "News results for '{query}'"
Contributing Your Config
Have you created a config that others might find useful? We strongly encourage you to contribute it to the project! By sharing your config, you make it available to all users of the public html2rss-web
service and the Feed Directory.
To contribute, please create a pull request to the html2rss-configs
repository.
Usage and Integration
With html2rss-web
Once your pull request is reviewed and merged, your config will become available on the public html2rss-web
instance. You can then access it at the path /<domainname.tld/path>.rss
.
Programmatic Usage in Ruby
You can also use html2rss-configs
programmatically in your Ruby applications.
Add this to your Gemfile:
gem 'html2rss-configs', git: 'https://github.com/html2rss/html2rss-configs.git'
And use it in your code:
require 'html2rss/configs'
config = Html2rss::Configs.find_by_name('domainname.tld/whatever')
rss = Html2rss.feed(config)