Twingly Blog Search API v3

Introduction

Twingly Blog Search API is a commercial XML-over-HTTP API that enables machine access to Twingly’s blog search index. Currently, the last 12 months of data are searchable through the API. Retrieving data requires an API key issued by Twingly; the key grants access to blog data for one or more languages. If you don’t have access, you can sign up for a free trial.

API Endpoint

A GET request to /search retrieves blog posts which match the specified query. By default, the blog posts are returned in date order, starting with the newest. At most 1000 hits are returned for a given query unless the page size is raised (see Pagination).

HTTP options

Request Parameters

Parameter values must be URL encoded.

Parameter  Format        Example                               Notes
apikey     String        E67EFC65-08A9-4086-BE92-074BBD7F78EA  Required
q          Search query  tag:fashion sort:published            Required
format     String        xml                                   Optional, defaults to xml (which is the only allowed value)

Example query

This will search for all posts with the tag “fashion” from the blog blogg.veckorevyn.com/fridagrahn. The result will be returned in ascending order by publish date.

tag:fashion blog:blogg.veckorevyn.com/fridagrahn sort:published sort-order:asc

See our search language documentation for definitions.

Example Request

With the search pattern banan page-size:1:

curl -s "https://api.twingly.com/blog/search/api/v3/search?apikey=KEY&q=banan%20page-size:1"
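
The same request URL can be built programmatically. The following is a minimal Python sketch, assuming only the standard library; the helper name build_search_url is hypothetical and not part of any Twingly client. It percent-encodes every parameter value, so the colon in the search pattern becomes %3A, which is equivalent to the unencoded colon in the curl example.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.twingly.com/blog/search/api/v3/search"


def build_search_url(api_key: str, pattern: str) -> str:
    """Return the GET URL for a search pattern (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 (as in the curl example)
    # instead of the "+" that urlencode produces by default.
    query = urlencode({"apikey": api_key, "q": pattern}, quote_via=quote)
    return f"{BASE_URL}?{query}"


url = build_search_url("KEY", "banan page-size:1")
print(url)
```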

Response

<twinglydata numberOfMatchesReturned="1" secondsElapsed="0.001" numberOfMatchesTotal="114628" incompleteResult="false">
  <post>
    <id>13735921747213240519</id>
    <author>Janetember</author>
    <url>
      http://nouw.com/janetember/oatmeal-carrot-cake-30001633
    </url>
    <title>Oatmeal carrot cake</title>
    <text>
      Jättegott alternativ till frukost när man har en lite längre morgon! En oatmeal carrot cake, på bara fyra ingredienser! Blanda havregryn, kanel, rivna morötter och mosad banan i en bunke. Tillsätt lite vatten om det behövs, den ska vara hyfsat rinnig. Låt stå på 200 grader en kort stund i ugnen! Smakar precis som en morotskaka!
    </text>
    <languageCode>sv</languageCode>
    <locationCode>se</locationCode>
    <coordinates/>
    <links/>
    <tags>
      <tag>Foodporn</tag>
      <tag>Recept i världsklass</tag>
    </tags>
    <images/>
    <indexedAt>2017-05-03T05:04:00Z</indexedAt>
    <publishedAt>2017-05-03T05:03:35Z</publishedAt>
    <reindexedAt>0001-01-01T00:00:00Z</reindexedAt>
    <inlinksCount>0</inlinksCount>
    <blogId>9008121184606352387</blogId>
    <blogName>the mind of jane</blogName>
    <blogUrl>http://nouw.com/janetember</blogUrl>
    <blogRank>1</blogRank>
    <authority>0</authority>
  </post>
</twinglydata>
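
A response like the one above can be read with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of the example response; the function name parse_result and the keys of the returned dictionaries are illustrative choices, not part of the API. Text values are stripped because elements such as url may contain surrounding whitespace.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example response above.
SAMPLE = """<twinglydata numberOfMatchesReturned="1" secondsElapsed="0.001" numberOfMatchesTotal="114628" incompleteResult="false">
  <post>
    <id>13735921747213240519</id>
    <url>
      http://nouw.com/janetember/oatmeal-carrot-cake-30001633
    </url>
    <title>Oatmeal carrot cake</title>
    <tags>
      <tag>Foodporn</tag>
      <tag>Recept i världsklass</tag>
    </tags>
    <publishedAt>2017-05-03T05:03:35Z</publishedAt>
  </post>
</twinglydata>"""


def parse_result(xml_text):
    """Extract the root attributes and a few fields of each <post>."""
    root = ET.fromstring(xml_text)
    posts = []
    for post in root.findall("post"):
        posts.append({
            "id": post.findtext("id", "").strip(),
            "url": post.findtext("url", "").strip(),
            "title": post.findtext("title", "").strip(),
            "tags": [t.text.strip() for t in post.findall("tags/tag") if t.text],
            "published_at": post.findtext("publishedAt", "").strip(),
        })
    return {
        "returned": int(root.get("numberOfMatchesReturned")),
        "total": int(root.get("numberOfMatchesTotal")),
        "seconds_elapsed": float(root.get("secondsElapsed")),
        "posts": posts,
    }


result = parse_result(SAMPLE)
print(result["total"], result["posts"][0]["title"])
```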

Elements and attributes:

<twinglydata> is the root element with the following attributes:

Within the root element, <twinglydata>, there can be zero or more <post> elements.

<post> is the root element for a post and contains the following child elements:

Rate limits

We allow a reasonable number of concurrent requests and impose no hard limits. Using the same key for production, staging, and development is also perfectly fine. If you need to run lots of queries in parallel for a prolonged time, we’d appreciate it if you told us beforehand.

Cache

Search responses are cached for 5 minutes, so you must wait at least 5 minutes to get fresh results for a given search query. The cache key is the digest of the search pattern and the parameters; to circumvent the cache you can, for instance, change the timestamps in the query slightly.

Errors

We strive to respond with the correct HTTP status code and with valid XML. But since computers can be tricky at times, you should ensure your client doesn’t blow up if we give a broken response (please contact us if we do).

Errors from the API look like this:

<error code="123">
<message>foo</message>
</error>
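
A client can read the code and message defensively, so that a broken or unexpected body does not raise. The following Python sketch assumes only the error shape shown above; the function name parse_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_error(body):
    """Return (code, message) for an <error> document, else None.

    None is returned when the body is not well-formed XML or when its
    root element is not <error>, so callers can fall back to the HTTP
    status code alone.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None
    if root.tag != "error":
        return None
    return root.get("code"), (root.findtext("message") or "").strip()


print(parse_error('<error code="123">\n<message>foo</message>\n</error>'))
```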

In the GitHub repository for the Blog Search Ruby client developed by Twingly, you can find example responses for most of the errors documented below.

4xx Client errors

Client errors most likely mean that your client is sending invalid requests, but please contact us if you can’t figure it out.

HTTP status  Error code  Description
400          40001       Invalid parameters (see the message for more info)
400          40002       Invalid query (see the message for more info)
400          40003       Invalid query (see the message for more info)
401          40101       Unauthorized
402          40201       Access to language(s) denied (see the message for more info)
404          40401       Not Found

5xx Server errors

HTTP status  Error code  Description
500          50001       Internal Server Error. Unexpected conditions were encountered, indicating a server-side bug.
503          50301       Service Unavailable. Retry later.
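
Since 503 is the only status documented with "Retry later", a client might retry on that status alone. The sketch below is one possible approach, not a documented recommendation: the exponential backoff, the attempt count, and the send callable (which is assumed to perform one request and return an HTTP status and body) are all assumptions.

```python
import time


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-503 status or attempts run out.

    send is a hypothetical callable returning (http_status, body).
    The delay doubles after each failed attempt (assumed policy).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 503 or attempt == max_attempts:
            return status, body
        sleep(base_delay * 2 ** (attempt - 1))
```

Other 4xx and 5xx responses are returned to the caller immediately, since repeating an invalid request will not change the outcome.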

Best practices

Pagination

If numberOfMatchesTotal is greater than numberOfMatchesReturned, you will need to paginate through the result in order to retrieve all posts. The best way to do this is to use start-date and/or end-date to create a sliding time-based window.

1. Sort with sort-order:asc sort:published
2. Set start-date: to the published time of the newest returned post and repeat your query.
3. Repeat until numberOfMatchesTotal equals numberOfMatchesReturned.

This technique can be seen in the examples for the Search API Ruby client.
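
The sliding-window technique described above could be sketched in Python roughly as follows. Several things here are assumptions: search is a hypothetical callable that runs one query and returns numberOfMatchesTotal together with the returned posts, each post is a dictionary with id and published_at keys, and start-date is assumed to accept the publishedAt value verbatim (see the search language documentation for the actual format). Because it is not stated whether start-date is inclusive, posts are de-duplicated by id.

```python
def fetch_all(search, pattern):
    """Collect every post matching pattern using a sliding start-date.

    search(query) -> (total, posts) is a hypothetical callable; posts
    must be sorted ascending by published time.
    """
    seen = {}
    start = None
    while True:
        query = f"{pattern} sort:published sort-order:asc"
        if start is not None:
            query += f" start-date:{start}"
        total, posts = search(query)
        for post in posts:
            seen.setdefault(post["id"], post)  # drop overlap between windows
        if not posts or total == len(posts):
            break  # everything in the current window has been returned
        newest = posts[-1]["published_at"]
        if newest == start:
            break  # window cannot advance; a larger page-size is needed
        start = newest
    return list(seen.values())
```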

In the (unusual) case where your search query yields a numberOfMatchesTotal greater than 1,000 (the default page size) and all posts are published at the same time, you can increase numberOfMatchesReturned by adding the page-size option to your search pattern. This works up to the maximum page-size of 10,000. In the very unlikely event that your query still yields more than 10,000 hits, you will need to add keywords to the query, or add filters such as lang.

Example search pattern with page-size:

"christmas page-size:5000"

If you need to keep the API response small, it’s possible to paginate through up to 10,000 matches, with the page option added to your search pattern. Note that the maximum value for page is 100.

Example search pattern with page and page-size:

"christmas page:2 page-size:100"
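
A small helper can append these options while enforcing the documented maxima (100 for page, 10,000 for page-size). This is an illustrative Python sketch; the function name paged_pattern is hypothetical, and rejecting values below 1 is an assumption, since the lower bounds are not stated here.

```python
MAX_PAGE = 100          # documented maximum for page
MAX_PAGE_SIZE = 10_000  # documented maximum for page-size


def paged_pattern(terms, page=None, page_size=None):
    """Append page and page-size options to a search pattern."""
    parts = [terms]
    if page is not None:
        # Lower bound of 1 is an assumption; only the maximum is documented.
        if not 1 <= page <= MAX_PAGE:
            raise ValueError(f"page must be between 1 and {MAX_PAGE}")
        parts.append(f"page:{page}")
    if page_size is not None:
        if not 1 <= page_size <= MAX_PAGE_SIZE:
            raise ValueError(f"page-size must be between 1 and {MAX_PAGE_SIZE}")
        parts.append(f"page-size:{page_size}")
    return " ".join(parts)


print(paged_pattern("christmas", page=2, page_size=100))
```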

Available data

As soon as we get new blog content it will be searchable through the Search API. Most often it takes just a few seconds for data to flow through our system, though it may take a few minutes during maintenance.

See also our page about the ingestion system details and challenges.


Known issues

Search timeouts

As the search result property incompleteResult is not yet implemented, it is hard to determine whether the search query was subject to timeouts or not. However, our current search agent timeout is set to 6 seconds. This means that if the secondsElapsed attribute is greater than 6 seconds, then there’s a good chance that at least one search agent timed out – leading to an incomplete result. If this happens repeatedly, consider rewriting your query to a less expensive form.
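
The heuristic above can be expressed as a simple check on the secondsElapsed attribute. The Python sketch below assumes the 6-second timeout quoted in this section; the function and constant names are hypothetical, and a True result only indicates a likely timeout, not a confirmed one.

```python
import xml.etree.ElementTree as ET

SEARCH_AGENT_TIMEOUT_SECONDS = 6.0  # value quoted in this section


def possibly_incomplete(xml_text):
    """Return True if secondsElapsed exceeds the search agent timeout."""
    root = ET.fromstring(xml_text)
    return float(root.get("secondsElapsed")) > SEARCH_AGENT_TIMEOUT_SECONDS


print(possibly_incomplete('<twinglydata secondsElapsed="0.001"/>'))
```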


Clients


Changes from previous version

We have focused on simplifying the API request and extending the response.

Changes to output XML:

Migrating from Search v2

Documentation changelog

API changelog