[ckan-dev] datastore - large queries
Alex Gartner
alexandru.gartner+ckan at gmail.com
Fri Mar 6 00:32:42 UTC 2015
Hi everyone,
I have a question about the datastore API being used by someone with *bad
intentions* to mount a denial of service of some kind. The project I'm
working on plans to have datastore tables with around 1 million rows, so
I'm worried this could be turned against the system.
To give a few examples:
- The following request to the "datastore_search_sql" endpoint takes
  around 2 minutes on my laptop for a datastore_table with 2,500 rows and
  a limit of 250,000. Without the limit I imagine it would take around 50
  minutes (the cross join produces 2,500 x 2,500 = 6,250,000 rows in the
  response):

      curl -G localhost:5000/api/action/datastore_search_sql \
          --data-urlencode "sql=SELECT a.* FROM datastore_table a, datastore_table b LIMIT 250000"
- Accessing the "datastore_search" endpoint with a limit of 250,000 also
  takes about 2 minutes (for a table of around 500,000 rows):

      curl "http://localhost/api/action/datastore_search?resource_id=resource_id&limit=250000"
I imagine that somebody hitting the datastore API with X simultaneous
requests, each with a limit of 1 million, could block the server by
tying up all the database connections.
Is there a way to set a hard limit, one that cannot be overridden by the
user, on the number of results returned by a query to the datastore (to
force pagination, in a way)?
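If there isn't such a setting, I was imagining something along these
lines: a plugin that overrides the core datastore_search action and
clamps the limit. This is a rough, untested sketch using the IActions
interface; the cap value and plugin name are just placeholders of mine:

    # Untested sketch: clamp "limit" on datastore_search to a hard cap.
    # HARD_LIMIT and the plugin class name are placeholders.
    import ckan.plugins as p
    from ckanext.datastore.logic.action import datastore_search as core_search

    HARD_LIMIT = 1000  # hypothetical maximum rows per request

    def capped_datastore_search(context, data_dict):
        requested = int(data_dict.get('limit', 100))  # 100 is the API default
        data_dict['limit'] = min(requested, HARD_LIMIT)
        return core_search(context, data_dict)

    class DatastoreLimitPlugin(p.SingletonPlugin):
        p.implements(p.IActions)

        def get_actions(self):
            # replace the core action with the capped wrapper
            return {'datastore_search': capped_datastore_search}

Something similar would presumably be needed for datastore_search_sql,
though there the limit would have to be enforced on the SQL itself
(e.g. by wrapping the user's query in a subquery with a LIMIT), which
looks harder.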
And in more general terms, what would be the best practice for avoiding
such issues? Are there CKAN settings that help with this? Should we set
up the web server (nginx, apache) to enforce a stricter limit on the
number of simultaneous HTTP requests to the datastore API endpoints?
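For the web server route, I was thinking of something like nginx's
limit_req module, just as a sketch (the zone name, rate, and upstream
address below are made up):

    # in the http {} block: one shared zone keyed by client IP
    limit_req_zone $binary_remote_addr zone=datastore:10m rate=2r/s;

    # in the server {} block; this prefix also matches datastore_search_sql
    location /api/action/datastore_search {
        limit_req zone=datastore burst=5;
        proxy_pass http://127.0.0.1:5000;
    }

That would only throttle the request rate, though, not the cost of a
single huge query, so it probably complements rather than replaces a
hard limit on the number of rows.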
Thanks for the help,
Alex