Scraping JSON Responses

When a website responds with JSON (that is, with a Content-Type of application/json), html2rss converts the JSON to XML so that you can extract data with CSS selectors.

[!NOTE] The top-level value of the JSON response must be an array or an object (an Array or a Hash once parsed in Ruby) for the conversion to work.

JSON to XML Conversion Examples

JSON Object

A JSON object like this:

{
  "data": [{ "title": "Headline", "url": "https://example.com" }]
}

is converted to this XML structure:

<object>
  <data>
    <array>
      <object>
        <title>Headline</title>
        <url>https://example.com</url>
      </object>
    </array>
  </data>
</object>

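You would use array > object as your items selector. The more specific data > array > object also matches here and helps when the response contains more than one array.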

JSON Array

A JSON array like this:

[{ "title": "Headline", "url": "https://example.com" }]

is converted to this XML structure:

<array>
  <object>
    <title>Headline</title>
    <url>https://example.com</url>
  </object>
</array>

You would use array > object as your items selector.
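The mapping shown in both examples can be reproduced with a short recursive function. The following is a minimal sketch of the idea, not html2rss's actual implementation: `json_to_xml_sketch` is a hypothetical helper name, it uses only Ruby's standard `json` library, and it emits compact XML without indentation.

```ruby
require 'json'

# Hypothetical helper (an assumption for illustration, not part of html2rss):
# renders parsed JSON as XML using the <object>/<array> mapping shown above.
def json_to_xml_sketch(value)
  case value
  when Hash
    # Each key becomes an element wrapping the converted value.
    inner = value.map { |key, child| "<#{key}>#{json_to_xml_sketch(child)}</#{key}>" }.join
    "<object>#{inner}</object>"
  when Array
    # Array entries are converted in order inside a single <array> element.
    "<array>#{value.map { |child| json_to_xml_sketch(child) }.join}</array>"
  else
    # Scalars become text; escape characters that are special in XML text.
    value.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
  end
end

array_json  = '[{ "title": "Headline", "url": "https://example.com" }]'
object_json = '{ "data": [{ "title": "Headline", "url": "https://example.com" }] }'

array_xml  = json_to_xml_sketch(JSON.parse(array_json))
object_xml = json_to_xml_sketch(JSON.parse(object_json))

puts array_xml
# => <array><object><title>Headline</title><url>https://example.com</url></object></array>
puts object_xml
```

In both outputs every item ends up as an `<object>` directly inside an `<array>`, which is why the same `array > object` items selector works for either response shape.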

Configuration Examples

Ruby

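require 'html2rss'

Html2rss.feed(
  # Ask the server for a JSON response.
  headers: {
    Accept: 'application/json'
  },
  channel: {
    url: 'http://domainname.tld/whatever.json'
  },
  # 'foo' is a placeholder; selectors match element names in the converted XML.
  selectors: {
    title: { selector: 'foo' }
  }
)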

YAML

headers:
  Accept: application/json
channel:
  url: "http://domainname.tld/whatever.json"
selectors:
  title:
    selector: "foo"
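The Ruby and YAML configurations describe the same nested structure. As a small check, the sketch below parses the YAML with Ruby's standard `yaml` library and compares it with the equivalent hash. The variable names `yaml_config` and `ruby_config` are illustrative, and how html2rss itself loads a YAML file is not covered here.

```ruby
require 'yaml'

# The YAML configuration from above, inlined as a string.
yaml_config = <<~YAML
  headers:
    Accept: application/json
  channel:
    url: "http://domainname.tld/whatever.json"
  selectors:
    title:
      selector: "foo"
YAML

# symbolize_names converts the string keys to symbols, as used in the Ruby example.
config = YAML.safe_load(yaml_config, symbolize_names: true)

# The same options written as a Ruby hash.
ruby_config = {
  headers: { Accept: 'application/json' },
  channel: { url: 'http://domainname.tld/whatever.json' },
  selectors: { title: { selector: 'foo' } }
}

puts config == ruby_config
# => true
```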