[ckan-dev] datastore - large queries

Alex Gartner alexandru.gartner+ckan at gmail.com
Fri Mar 6 00:32:42 UTC 2015


Hi everyone,

I have a question about the datastore API being abused by a user with *bad
intentions* to cause a denial of service. Since the project I'm working on
plans to have datastore tables with around 1 million rows, I'm worried this
could be used against the system.
To give a few examples:

   - The following request to the "datastore_search_sql" endpoint takes
     around 2 minutes on my laptop for a datastore_table with 2,500 rows
     and a limit of 250,000. Without the limit I imagine it would take
     around 50 minutes, since the cross join would return 2,500 x 2,500 =
     6,250,000 rows:

     curl -G localhost:5000/api/action/datastore_search_sql \
       --data-urlencode "sql=SELECT a.* FROM datastore_table a, datastore_table b LIMIT 250000"

   - Accessing the "datastore_search" endpoint with a limit of 250,000
     also takes about 2 minutes, for a table of around 500,000 rows:

     curl "http://localhost/api/action/datastore_search?resource_id=resource_id&limit=250000"

I imagine that somebody hitting the datastore API with X simultaneous
requests, each asking for a million rows, could block the server by tying
up all the database connections.

Is there a way to set a hard limit on the number of results returned by a
datastore query that cannot be overridden by the user (to force pagination,
in a way)?
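
For what it's worth, one approach I've been considering is overriding the
"datastore_search" action from an extension via the IActions plugin
interface and clamping the limit there. A minimal, untested sketch; the
MAX_ROWS value and all the names are mine, and I'm assuming the core action
can be imported from ckanext.datastore.logic.action:

    import ckan.plugins as p

    # assumption: this is where the datastore extension keeps its actions
    from ckanext.datastore.logic.action import datastore_search as core_search

    MAX_ROWS = 1000  # hypothetical hard cap, tune to what the server can take


    def capped_datastore_search(context, data_dict):
        # Clamp the user-supplied limit before delegating to the core
        # action (datastore_search defaults to 100 rows when no limit
        # is given).
        limit = int(data_dict.get('limit', 100))
        data_dict['limit'] = min(limit, MAX_ROWS)
        return core_search(context, data_dict)


    class DatastoreLimitPlugin(p.SingletonPlugin):
        p.implements(p.IActions)

        def get_actions(self):
            # Register the wrapper under the original action name so the
            # API endpoint serves the capped version.
            return {'datastore_search': capped_datastore_search}

That still wouldn't help with "datastore_search_sql", where the cost is in
the query itself; there I suspect a PostgreSQL statement_timeout on the role
the datastore uses for read-only queries is the more natural guard.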

And in more general terms, what is the best practice for avoiding such
issues? Are there CKAN settings that help with this? Should we set up the
web server (nginx, apache) to enforce a stricter limit on the number of
simultaneous HTTP requests to the datastore API endpoints?
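
On the nginx side, the limit_conn / limit_req modules look like a
reasonable first line of defence. A sketch of what I have in mind, assuming
CKAN is proxied on 127.0.0.1:8080 and that limiting per client IP is
acceptable (the zone names and numbers are arbitrary):

    # in the http {} block: track connections and request rate per client IP
    limit_conn_zone $binary_remote_addr zone=ds_conn:10m;
    limit_req_zone  $binary_remote_addr zone=ds_req:10m rate=5r/s;

    # in the server {} block: the prefix match covers both datastore_search
    # and datastore_search_sql
    location /api/action/datastore_search {
        limit_conn ds_conn 2;             # at most 2 simultaneous requests per IP
        limit_req  zone=ds_req burst=10;  # and roughly 5 requests/second
        proxy_pass http://127.0.0.1:8080;
    }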

Thanks for the help,
Alex