Scraping JSON Responses
When a website returns a JSON response (i.e., with a Content-Type of application/json), html2rss converts the JSON to XML, allowing you to use CSS selectors for data extraction.
[!NOTE] The JSON response must be an Array or a Hash for the conversion to work.
JSON to XML Conversion Examples
JSON Object
A JSON object like this:
{
  "data": [{ "title": "Headline", "url": "https://example.com" }]
}
is converted to this XML structure:
<object>
  <data>
    <array>
      <object>
        <title>Headline</title>
        <url>https://example.com</url>
      </object>
    </array>
  </data>
</object>
You would use array > object as your items selector.
JSON Array
A JSON array like this:
[{ "title": "Headline", "url": "https://example.com" }]
is converted to this XML structure:
<array>
  <object>
    <title>Headline</title>
    <url>https://example.com</url>
  </object>
</array>
You would use array > object as your items selector.
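The mapping in both examples above can be sketched in plain Ruby. This is only an illustration of the structure, not html2rss's actual implementation: json_to_xml is a hypothetical helper, and it assumes every Hash becomes an object element with one child per key, every Array becomes an array element, and scalar values become escaped text.

```ruby
require 'json'
require 'cgi'

# Hypothetical helper (not part of html2rss): mirrors the JSON-to-XML
# mapping shown in the examples above.
def json_to_xml(value)
  case value
  when Hash
    # Each key becomes a child element wrapping its converted value.
    children = value.map { |key, child| "<#{key}>#{json_to_xml(child)}</#{key}>" }
    "<object>#{children.join}</object>"
  when Array
    "<array>#{value.map { |child| json_to_xml(child) }.join}</array>"
  else
    # Scalars are emitted as escaped text.
    CGI.escapeHTML(value.to_s)
  end
end

object_json = '{"data": [{"title": "Headline", "url": "https://example.com"}]}'
array_json  = '[{"title": "Headline", "url": "https://example.com"}]'

object_xml = json_to_xml(JSON.parse(object_json))
array_xml  = json_to_xml(JSON.parse(array_json))

puts object_xml
puts array_xml
```

In both outputs each item ends up as an object element directly inside an array element, which is why array > object works as the items selector in either case.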
Configuration Examples
Ruby
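require 'html2rss'

Html2rss.feed(
  # Ask the server for a JSON response.
  headers: {
    Accept: 'application/json'
  },
  channel: {
    url: 'http://domainname.tld/whatever.json'
  },
  # Selectors are matched against the XML converted from the JSON.
  selectors: {
    title: { selector: 'foo' }
  }
)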
YAML
headers:
  Accept: application/json
channel:
  url: "http://domainname.tld/whatever.json"
selectors:
  title:
    selector: "foo"
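To see that the YAML and Ruby configurations describe the same data, the YAML can be loaded with Ruby's standard library and compared with the hash from the Ruby example. This is a minimal sketch that only uses the yaml library. Passing the loaded hash on to Html2rss.feed in exactly this form is an assumption, so the call is left out here.

```ruby
require 'yaml'

# The YAML configuration from above, with its nesting expressed by indentation.
yaml_config = <<~CONFIG
  headers:
    Accept: application/json
  channel:
    url: "http://domainname.tld/whatever.json"
  selectors:
    title:
      selector: "foo"
CONFIG

# symbolize_names turns the keys into symbols, matching the Ruby example.
config = YAML.safe_load(yaml_config, symbolize_names: true)

p config
```

The nesting is carried entirely by indentation, so a flattened copy of the YAML would parse into a different structure.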