[ckan-dev] datastore - large queries
Alex Gartner
alexandru.gartner+ckan at gmail.com
Mon Mar 9 11:30:22 UTC 2015
Hi,
thank you for the response.
I think that in the short-term we'll follow your advises about limiting
requests from nginx and merging the PR. If we decide to go beyond that for
a longer-term solution will let you know.
Thanks again,
Alex
On Fri, Mar 6, 2015 at 12:41 PM, Alice Heaton <a.heaton at nhm.ac.uk> wrote:
> Hello,
>
> There is no setting in CKAN (to my knowledge) to help with this. Things we
> have done here:
>
> - Use nginx to limit the request rate (total and from a single IP) to the
> datastore api;
> - Ensure our servers can deal with as many requests as we allow (so worst
> case the site is blocked, but the servers won't go down);
> - Clear the response after each request. At the moment this doesn't
> happen, so a worker keeps the last response in memory until it used again.
> With 20 workers returning very large responses, this can kill your memory
> very quickly. You'll need to merge this PR to do this:
> https://github.com/ckan/ckan/pull/2262
>
> I have been thinking of imposing a hard limit on number of rows returned
> per request. This could be implemented as a middleware which simply returns
> a 400 error if there is a limit higher than a configured number. This is
> not a priority for us, but we might come round to doing it at some point.
> If you implement something like this we will definitely help with testing!
>
> Another approach currently under discussion is to have a streaming/chunked
> API, so the response is only build as the client consumes it. This is only
> at discussion phase, but worth keeping an eye on:
> https://github.com/ckan/ideas-and-roadmap/issues/128
>
> Alice
>
>
> On 06/03/15 00:32, Alex Gartner wrote:
>
> Hi everyone,
>
> I have a question related to the datastore API being used by a user with *bad
> intentions* to achieve a denial of service of some kind. Since the
> project that I'm working on plans to have datastore tables with around 1
> million rows I'm thinking this might be used against the system.
> To give a few examples:
>
> - The following request to the "datastore_search_sql" endpoint takes
> on my laptop around 2 minutes to complete for a datastore_table with 2 500
> rows and limit 250 000. Without the limit I imagine it would take around 50
> mins (There would be 2 500 x 2 500 rows in the response).
> - curl -G localhost:5000/api/action/datastore_search_sql
> --data-urlencode "sql=SELECT a.* from datastore_table a, datastore_table b
> limit 250000"
> - accessing the "datastore_search" endpoint with a limit of 250 000
> takes also about 2 mins ( for a table of around 500 000 rows )
> - curl "
> http://localhost/api/action/datastore_search?resource_id=resource_id&limit=250000
> "
>
> I imagine that somebody hitting the datastore API with X simultaneous
> requests for data with a limit of 1 million could block the server (while
> using all the db connections).
>
> Is there a way to set a hard limit that cannot be overwritten by the
> user for the number of results returned by a query to the datastore (to
> force pagination in way) ?
>
> And in more general terms, what would be the best practice for avoiding
> such issues ? Are there some CKAN settings that help with this ? Should we
> setup the web server ( nginx, apache ) to use a harsher limit on the number
> of simultaneous HTTP requests to the datastore API endpoints ?
>
> Thanks for the help,
> Alex
>
>
> _______________________________________________
> ckan-dev mailing listckan-dev at lists.okfn.orghttps://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>
>
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/ckan-dev/attachments/20150309/83899a89/attachment-0003.html>
More information about the ckan-dev
mailing list