[ckan-dev] API abuse

David Read david.read at hackneyworkshop.com
Thu Apr 18 13:52:47 UTC 2013


Thanks David, that's very useful and we'll give it a try.
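
For anyone wanting to see what that limit amounts to, here is a minimal Python sketch of a per-IP limiter at 1 request per second (the figure from the quoted message below). It is only an illustration of the idea, not what nginx does internally and not what the datahub runs; the class name, its parameters and the injectable clock are all made up for this sketch.

```python
import time


class PerIpRateLimiter:
    """Allow roughly `rate` requests per second from each client IP.

    Hypothetical helper for illustration only: one token bucket per IP.
    """

    def __init__(self, rate=1.0, burst=1, clock=time.monotonic):
        self.rate = float(rate)    # tokens added per second
        self.burst = float(burst)  # maximum tokens an IP can hold
        self.clock = clock         # injectable so the logic can be tested
        self._buckets = {}         # ip -> (tokens, time of last request)

    def allow(self, ip):
        """Return True if the request from `ip` may proceed."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill according to the time elapsed, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

With these settings a bot averaging 10 requests per second from one IP would have about nine in ten requests refused, while other clients are unaffected. The dictionary grows with the number of distinct IPs, so anything beyond a sketch would need to expire old entries.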

I can't help thinking that it would be well worth having a Skype
discussion at some point about setting up infrastructure. It's a fair
chunk of effort to set up and tune monitoring, alerts, log files,
caching, replication, fail-over, memory, security etc., and I think
just capturing the sorts of things we're all doing would be useful.

BTW I'm not keen on the suggestion of requiring an API key, since I
think it adds a serious amount to the difficulty of using the API in
even a simple way. The RESTful API works like web pages - it is all
GETs with params in the URL - so is easy to cache and tends to avoid
these load issues.
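
To illustrate the caching point: a cache in front of the site can key a GET purely on its URL, whereas a POST carries its parameters in the body and is normally passed straight through to the application. A minimal sketch follows; the class and the way the backend is called are illustrative assumptions, not how our cache is actually configured.

```python
class TinyCache:
    """Toy front-end cache: stores GET responses keyed by URL only."""

    def __init__(self, backend):
        self.backend = backend  # callable(method, url, body) -> response
        self.store = {}         # url -> cached response
        self.backend_hits = 0   # how many requests reached the backend

    def request(self, method, url, body=None):
        if method == "GET" and url in self.store:
            # Served from the cache without touching the backend.
            return self.store[url]
        self.backend_hits += 1
        response = self.backend(method, url, body)
        if method == "GET":
            self.store[url] = response
        return response
```

Ten identical GETs through this cache reach the backend once; ten identical POSTs reach it ten times, which is the kind of load that hit us. A real cache would also expire entries and respect cache-control headers, which this sketch ignores.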

David

On 18 April 2013 14:25, David Raznick <david.raznick at okfn.org> wrote:
> Hello,
>
> The simplest approach to this, and we have done this on the datahub, is to
> limit API requests to 1 req/s per IP. This was done with nginx though.
> http://wiki.nginx.org/HttpLimitReqModule
>
> Thanks
>
> David
>
>
> On Thu, Apr 18, 2013 at 1:25 PM, Toby Dacre <toby.okfn at gmail.com> wrote:
>>
>>
>>
>> On 18 April 2013 11:13, David Read <david.read at hackneyworkshop.com> wrote:
>>>
>>> We had an incident yesterday caused by a Java web bot making
>>> simultaneous connections to our CKAN API. Averaging 10 requests per
>>> second, it caused serious server problems - Postgres saturating the
>>> CPU, Apache spawning lots of processes. Normally big loads are not a
>>> problem for us because we use a cache in front of CKAN, but because
>>> the API v3 is not easily cached, it caused the problems.
>>>
>>> The user was POSTing requests to package_show, without an API key. Nagios
>>> alerted us to the slowing server and we banned their IP manually
>>> within a few minutes, which brought it back to normal. But it has
>>> become a concern.
>>>
>>> Does anyone have any thoughts on how the CKAN community might deal
>>> with this sort of behaviour better, either in the design of CKAN or
>>> with server software?
>>
>>
>> Hmmm,
>>
>> there are a few approaches.
>>
>> 1) we could look at rate limiting API calls, but that might be painful,
>> have overheads and block some legitimate use
>>
>> 2) we could insist on a valid API key much earlier in the process (do we
>> allow anon API access? or add a config option for this)
>>
>> 3) package_show is hard because we get the package from the db even if we
>> then find we won't show it - this is hard to avoid, though maybe we could
>> look at pre-auth checking before even hitting the db, but I'm not sure
>> that would help here
>>
>> I think 2 is the easiest/quickest approach but it might be too blunt
>>
>> Toby
>>
>>
>>>
>>> David
>>>
>>> _______________________________________________
>>> ckan-dev mailing list
>>> ckan-dev at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/ckan-dev
>>> Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev
>>
>>
>>
>>
>
>
>



