[ckan-dev] Specification of data processing web services.

Sean Hammond sean.hammond at okfn.org
Thu Jun 14 13:29:23 UTC 2012


Hey David, I have no idea whether this is reinventing some wheel or not,
but I think it's a really cool idea!

One general point I wonder about: what are the efficiency
implications of having dozens of small Flask web services running on a
server and communicating with each other over HTTP, versus having one
monolithic Pylons app?

If two web services happen to be running on the same server, is HTTP the
best way for them to communicate with each other?

What are the implications for installing and setting up the system? You
would have to select which web services you want and install each of
them, as opposed to doing one CKAN install (and then having to enable
and maybe install CKAN extensions). If they're all easy to install and
you install them all in the same way, maybe that would be simple enough.
Or would there be some sort of common installer, where you run a single
command and pass the names of the services you want, and it installs
them all?

> This is a very rough specification for web-services whose main purpose
> is processing data to be used with ckan, but that also can be used as
> standalone services in their own right. These services can be split
> into two types, "long running" and "synchronous".

Should we call them synchronous and asynchronous, for symmetry?

> If there is an error the json should be of the form:
>   {
>   "task_type": "csv_parse",
>   "task_id": "fdfdsfsafdsfafdfa",
>   "requested_timestamp": "2013-01-01 19:12:33",
>   "completed_timestamp": "2013-01-01 20:12:33",
>   "sent_data": {"source_url": "www.somedata.org/data.csv",
>                 "target_url": "www.thedatahub.io/api/data/abf3131-3213-312-321"},
>   "error": "error info"
>   }

Do we want more than just a string for the value of the "error" key? I'm
thinking we might need e.g. a unique error number as well as a string error
message, to make it easier for clients to handle specific errors. I know
we'll have the HTTP status code, but maybe we'll want our own
service-specific error codes as well?
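
To make that concrete, here is a rough sketch of what a structured "error"
value might look like. The "code", "name" and "message" field names and the
error numbers are made up for illustration and are not part of the spec, and
I've left the timestamps out for brevity.

```python
import json

# Hypothetical service-specific error codes; the numbers are illustrative only.
ERROR_CODES = {
    "source_unreachable": 1001,
    "invalid_csv": 1002,
}


def make_error(name, message):
    """Build a structured value for the "error" key instead of a bare string."""
    return {"code": ERROR_CODES[name], "name": name, "message": message}


def error_response(task_type, task_id, sent_data, name, message):
    """Build the error JSON from the spec, with a structured "error" value."""
    return {
        "task_type": task_type,
        "task_id": task_id,
        "sent_data": sent_data,
        "error": make_error(name, message),
    }


# Example: a client could branch on error["code"] rather than parse the message.
body = error_response(
    "csv_parse",
    "fdfdsfsafdsfafdfa",
    {"source_url": "www.somedata.org/data.csv"},
    "invalid_csv",
    "could not parse the file as CSV",
)
print(json.dumps(body, indent=2))
```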

> I would like to go ahead and make a very simple flask based
> implementation of this.  I think that using a simple embedded thread
> based scheduler (i.e http://packages.python.org/APScheduler/)  to
> queue the tasks with a database table to store them would be
> sufficient for this.

Is this going to be a library that we can use to quickly and easily create
web services that conform to this spec? We're obviously going to want
such a library because there'll be a lot of code shared between the
different web services.
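
Something like the following is the kind of shared code I'm imagining. It's
only a sketch under my own assumptions: `TaskRegistry`, `register` and `run`
are hypothetical names, the "data" key for a successful result is my guess,
and it leaves out the Flask routing, the scheduler and the database table
entirely. It just shows the part that wraps a task function in the response
envelope from the spec.

```python
import datetime
import uuid


def _now():
    """Current UTC time in the timestamp format used in the spec's example."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return now.strftime("%Y-%m-%d %H:%M:%S")


class TaskRegistry:
    """Maps task_type names to the functions that do the work (hypothetical)."""

    def __init__(self):
        self._tasks = {}

    def register(self, task_type):
        """Decorator that registers a function under the given task_type."""
        def decorator(func):
            self._tasks[task_type] = func
            return func
        return decorator

    def run(self, task_type, sent_data):
        """Run a task synchronously and wrap the outcome in the envelope."""
        result = {
            "task_type": task_type,
            "task_id": uuid.uuid4().hex,
            "requested_timestamp": _now(),
            "sent_data": sent_data,
        }
        if task_type not in self._tasks:
            result["error"] = "unknown task type: " + task_type
        else:
            try:
                # "data" as the success key is an assumption, not from the spec.
                result["data"] = self._tasks[task_type](sent_data)
            except Exception as exc:
                result["error"] = str(exc)
        result["completed_timestamp"] = _now()
        return result
```

Each web service would then only need to register its own task functions,
and the envelope, IDs and timestamps would come from the shared code.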



