Scraping JSON Responses
When a website returns a JSON response (i.e., with a Content-Type of application/json), html2rss converts the JSON to XML, allowing you to use CSS selectors for data extraction.
[!NOTE] The JSON response must be an Array or a Hash for the conversion to work.
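As a rough illustration of the note above, the following sketch checks whether a response body parses to an Array or a Hash at the top level. The helper name `convertible?` is hypothetical and not part of html2rss; it only mirrors the stated requirement and does not describe how the library validates input internally.

```ruby
require 'json'

# Hypothetical helper (not an html2rss API): returns true only when the
# body is valid JSON whose top-level value is an Array or a Hash.
def convertible?(body)
  parsed = JSON.parse(body)
  parsed.is_a?(Array) || parsed.is_a?(Hash)
rescue JSON::ParserError
  # Invalid JSON cannot be converted either.
  false
end
```

A check like this could be run against a saved response while debugging a feed that unexpectedly yields no items.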
JSON to XML Conversion Examples
JSON Object
A JSON object like this:
    {
      "data": [{ "title": "Headline", "url": "https://example.com" }]
    }
is converted to this XML structure:
    <object>
      <data>
        <array>
          <object>
            <title>Headline</title>
            <url>https://example.com</url>
          </object>
        </array>
      </data>
    </object>
You would use array > object as your items selector.
JSON Array
A JSON array like this:
    [{ "title": "Headline", "url": "https://example.com" }]
is converted to this XML structure:
    <array>
      <object>
        <title>Headline</title>
        <url>https://example.com</url>
      </object>
    </array>
You would use array > object as your items selector.
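The mapping shown in the two examples can be approximated with a short recursive sketch. This is not html2rss's actual implementation; `json_to_xml` and `XML_ESCAPES` are hypothetical names, and the sketch assumes that every JSON key is a valid XML element name. It only shows why each item ends up under array > object.

```ruby
require 'json'

# Characters that must be escaped inside XML text nodes.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Hypothetical helper (not an html2rss API): a Hash becomes <object>, with one
# child element per key; an Array becomes <array>; anything else becomes text.
def json_to_xml(value)
  case value
  when Hash
    children = value.map { |key, child| "<#{key}>#{json_to_xml(child)}</#{key}>" }
    "<object>#{children.join}</object>"
  when Array
    "<array>#{value.map { |child| json_to_xml(child) }.join}</array>"
  else
    value.to_s.gsub(/[&<>]/, XML_ESCAPES)
  end
end

json = '[{ "title": "Headline", "url": "https://example.com" }]'
puts json_to_xml(JSON.parse(json))
```

Apart from whitespace, the printed string has the same structure as the JSON Array example above. Wrapping the same array in an object under a data key adds the outer object and data elements from the JSON Object example.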
Configuration Examples
In Ruby:

    Html2rss.feed(
      headers: { Accept: 'application/json' },
      channel: { url: 'http://domainname.tld/whatever.json' },
      selectors: {
        # items selector added to match the YAML configuration below
        items: { selector: 'array > object' },
        title: { selector: 'foo' }
      }
    )

The equivalent YAML configuration:

    headers:
      Accept: application/json
    channel:
      url: "http://domainname.tld/whatever.json"
    selectors:
      items:
        selector: "array > object"
      title:
        selector: "foo"