[ckan-dev] datastore - large queries
Alice Heaton
a.heaton at nhm.ac.uk
Fri Mar 20 12:52:42 UTC 2015
Hi,
On 09/03/15 11:30, Alex Gartner wrote:
> Hi,
>
> thank you for the response.
> I think that in the short-term we'll follow your advises about
> limiting requests from nginx and merging the PR. If we decide to go
> beyond that for a longer-term solution will let you know.
>
Something else I just noticed going through our logs: you might want to
block external access to /datastore/dump/<resource id> for large
resources as this attempts to generate a CSV of the whole resource. On
our 2.7M resource this would time out, but use a lot of memory in the
process. Repeated access even below our request rate threshold would
have been painful.
Best Wishes,
Alice
> Thanks again,
> Alex
>
>
> On Fri, Mar 6, 2015 at 12:41 PM, Alice Heaton <a.heaton at nhm.ac.uk
> <mailto:a.heaton at nhm.ac.uk>> wrote:
>
> Hello,
>
> There is no setting in CKAN (to my knowledge) to help with this.
> Things we have done here:
>
> - Use nginx to limit the request rate (total and from a single IP)
> to the datastore api;
> - Ensure our servers can deal with as many requests as we allow
> (so worst case the site is blocked, but the servers won't go down);
> - Clear the response after each request. At the moment this
> doesn't happen, so a worker keeps the last response in memory
> until it used again. With 20 workers returning very large
> responses, this can kill your memory very quickly. You'll need to
> merge this PR to do this: https://github.com/ckan/ckan/pull/2262
>
> I have been thinking of imposing a hard limit on number of rows
> returned per request. This could be implemented as a middleware
> which simply returns a 400 error if there is a limit higher than a
> configured number. This is not a priority for us, but we might
> come round to doing it at some point. If you implement something
> like this we will definitely help with testing!
>
> Another approach currently under discussion is to have a
> streaming/chunked API, so the response is only build as the client
> consumes it. This is only at discussion phase, but worth keeping
> an eye on: https://github.com/ckan/ideas-and-roadmap/issues/128
>
> Alice
>
>
> On 06/03/15 00:32, Alex Gartner wrote:
>> Hi everyone,
>>
>> I have a question related to the datastore API being used by a
>> user with *bad intentions* to achieve a denial of service of some
>> kind. Since the project that I'm working on plans to have
>> datastore tables with around 1 million rows I'm thinking this
>> might be used against the system.
>> To give a few examples:
>>
>> * The following request to the "datastore_search_sql" endpoint
>> takes on my laptop around 2 minutes to complete for a
>> datastore_table with 2 500 rows and limit 250 000. Without
>> the limit I imagine it would take around 50 mins (There would
>> be 2 500 x 2 500 rows in the response).
>> o curl -G localhost:5000/api/action/datastore_search_sql
>> --data-urlencode "sql=SELECT a.* from datastore_table a,
>> datastore_table b limit 250000"
>> * accessing the "datastore_search" endpoint with a limit of 250
>> 000 takes also about 2 mins ( for a table of around 500 000
>> rows )
>> o curl
>> "http://localhost/api/action/datastore_search?resource_id=resource_id&limit=250000"
>>
>> I imagine that somebody hitting the datastore API with X
>> simultaneous requests for data with a limit of 1 million could
>> block the server (while using all the db connections).
>>
>> Is there a way to set a hard limit that cannot be overwritten by
>> the user for the number of results returned by a query to the
>> datastore (to force pagination in way) ?
>>
>> And in more general terms, what would be the best practice for
>> avoiding such issues ? Are there some CKAN settings that help
>> with this ? Should we setup the web server ( nginx, apache ) to
>> use a harsher limit on the number of simultaneous HTTP requests
>> to the datastore API endpoints ?
>>
>> Thanks for the help,
>> Alex
>>
>>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org <mailto:ckan-dev at lists.okfn.org>
>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>> Unsubscribe:https://lists.okfn.org/mailman/options/ckan-dev
>
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org <mailto:ckan-dev at lists.okfn.org>
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>
>
>
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/ckan-dev/attachments/20150320/d3e494ef/attachment-0003.html>
More information about the ckan-dev
mailing list