[ckan-dev] Specification of data processing web services.

David Raznick kindly at gmail.com
Thu Jun 14 16:05:49 UTC 2012


On Thu, Jun 14, 2012 at 2:29 PM, Sean Hammond <sean.hammond at okfn.org> wrote:
> Hey David, I have no idea whether this is reinventing some wheel or not,
> but I think it's a really cool idea!
>
> One general point that I wonder about, what are the efficiency
> implications of having dozens of small Flask web services running on a
> server and communicating with each other over http, versus having one
> monolithic Pylons app?

For long-running tasks, one big Pylons app is a bad idea. You will need
some way to run asynchronous tasks, and embedding a worker pool in
Pylons does not work well: you would end up with one pool per process.

So whatever happens, we will need some kind of queue. The question is
whether to have one monolithic queue or one queue per service. I opted
for the second, as it is better to decentralize things.
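A minimal sketch of the "each service has its own queue" option, assuming one in-memory queue and one worker thread per service, using only Python's standard library. A real deployment would want a persistent queue so tasks survive restarts. The class and method names (`TaskQueue`, `submit`, `status`) are hypothetical, not part of the spec.

```python
import queue
import threading
import uuid
from datetime import datetime, timezone


def _now():
    # Same "YYYY-MM-DD HH:MM:SS" shape as the timestamps in the spec.
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


class TaskQueue:
    """Hypothetical per-service queue: one in-memory queue, one worker thread."""

    def __init__(self, task_type, process):
        self.task_type = task_type
        self.process = process        # function(sent_data) -> result
        self.tasks = {}               # task_id -> task record
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, sent_data):
        """Record the task and return its id at once; work runs in the background."""
        task_id = uuid.uuid4().hex
        record = {
            "task_type": self.task_type,
            "task_id": task_id,
            "requested_timestamp": _now(),
            "sent_data": sent_data,
        }
        with self._lock:
            self.tasks[task_id] = record
        self._queue.put(task_id)
        return task_id

    def status(self, task_id):
        """Return a copy of the task record as it currently stands."""
        with self._lock:
            return dict(self.tasks[task_id])

    def join(self):
        """Block until every submitted task has been processed."""
        self._queue.join()

    def _run(self):
        while True:
            task_id = self._queue.get()
            with self._lock:
                sent_data = self.tasks[task_id]["sent_data"]
            try:
                outcome = {"result": self.process(sent_data)}
            except Exception as exc:
                # A failing task is recorded, not allowed to kill the worker.
                outcome = {"error": str(exc)}
            outcome["completed_timestamp"] = _now()
            with self._lock:
                self.tasks[task_id].update(outcome)
            self._queue.task_done()


if __name__ == "__main__":
    # Hypothetical "csv_parse" processor that just measures the URL length.
    tasks = TaskQueue("csv_parse", lambda data: len(data["source_url"]))
    task_id = tasks.submit({"source_url": "www.somedata.org/data.csv"})
    tasks.join()
    print(tasks.status(task_id)["result"])  # 25
```

Because each service owns its queue and worker, a slow or crashed service only stalls its own tasks, which is the decentralization argued for above.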
>
> If two web services happen to be running on the same server, is http the
> best way for them to communicate with each other?

Nope, nor is HTTP the best way over the web. However, it is good
enough. I am more worried about making reliable services than speedy
ones. Daemonizing anything reliably is tricky, whereas HTTP-based web
sites are very, very well tested...

>
> What are the implications for installing and setting up the system? You
> would have to select which web services you want and install each of
> them, as opposed to doing one ckan install (and then having to enable
> and maybe install ckan extensions). If they're all easy to install and
> you install them all in the same way, maybe that would be simple enough.
> Or would there be some sort of common installer, where you run a single
> command and pass the names of the services you want, and it installs
> them all?

My take on this is that setting up a web service should be one of the
most common things web developers do, and the barrier to entry should
be pretty low. I have not worked out the details, but adding another
virtual host should not be too difficult to automate.
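As a rough illustration of how little a per-service virtual host involves, the sketch below renders one. It assumes Apache with `mod_proxy` and `mod_proxy_http` enabled and each service listening on its own local port; the function name, hostnames and ports are hypothetical.

```python
def apache_vhost(server_name, port, host="127.0.0.1"):
    """Render an Apache virtual host that proxies a hostname to a local service.

    Assumes mod_proxy and mod_proxy_http are enabled.
    """
    backend = f"http://{host}:{port}/"
    return "\n".join([
        "<VirtualHost *:80>",
        f"    ServerName {server_name}",
        "    ProxyPreserveHost On",
        f"    ProxyPass / {backend}",
        f"    ProxyPassReverse / {backend}",
        "</VirtualHost>",
        "",
    ])


if __name__ == "__main__":
    # Hypothetical services, each running on its own port.
    services = {"csvparse.example.org": 5001, "geocode.example.org": 5002}
    for name, port in services.items():
        print(apache_vhost(name, port))
```

On Debian-style Apache installs, each rendered block could be written to `sites-available/` and enabled with `a2ensite` followed by a reload.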

>
>> This is a very rough specification for web-services whose main purpose
>> is processing data to be used with ckan, but that also can be used as
>> standalone services in their own right. These services can be split
>> into two types, "long running" and "synchronous".
>
> Should we call them synchronous and asynchronous, for symmetry?

yes!

>
>> If there is an error, the JSON should be of the form:
>>   {
>>   "task_type": "csv_parse",
>>   "task_id": "fdfdsfsafdsfafdfa",
>>   "requested_timestamp": "2013-01-01 19:12:33",
>>   "completed_timestamp": "2013-01-01 20:12:33",
>>   "sent_data": {"source_url": "www.somedata.org/data.csv",
>>                 "target_url": "www.thedatahub.io/api/data/abf3131-3213-312-321"},
>>   "error": "error info"
>>   }
>
> Do we want more than just a string for the value of the "error" key? I'm
> thinking we might need e.g. a unique error number as well as a string error
> message, to make it easier for clients to handle specific errors. I know
> we'll have the HTTP status code, but maybe we'll want our own
> service-specific error codes as well?

Hmm, I first had it as a dictionary, then changed my mind and made it a
simple string. Now I am not sure.
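A hedged sketch of the error document from the spec quoted above, built with Python's standard `json` module. While the format of `error` is undecided, the helper accepts either the plain string or a dictionary; the `code`/`message` keys of the dictionary form are a hypothetical shape following Sean's suggestion, not part of the spec. The function names are illustrative.

```python
import json
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the spec's example timestamps


def error_document(task_type, task_id, requested, sent_data, error,
                   completed=None):
    """Serialise a failed task; `error` may be a string or a dict."""
    if completed is None:
        completed = datetime.now(timezone.utc)
    return json.dumps({
        "task_type": task_type,
        "task_id": task_id,
        "requested_timestamp": requested.strftime(TIMESTAMP_FORMAT),
        "completed_timestamp": completed.strftime(TIMESTAMP_FORMAT),
        "sent_data": sent_data,
        "error": error,
    }, indent=2)


def read_error(document):
    """Client side: return (code, message) whichever form `error` takes.

    A plain string carries no service-specific code, so code is None.
    """
    error = json.loads(document)["error"]
    if isinstance(error, str):
        return None, error
    return error.get("code"), error.get("message", "")


if __name__ == "__main__":
    doc = error_document(
        "csv_parse", "fdfdsfsafdsfafdfa",
        requested=datetime(2013, 1, 1, 19, 12, 33),
        sent_data={"source_url": "www.somedata.org/data.csv",
                   "target_url": "www.thedatahub.io/api/data/abf3131-3213-312-321"},
        error="error info",
        completed=datetime(2013, 1, 1, 20, 12, 33),
    )
    print(doc)
    print(read_error(doc))  # (None, 'error info')
```

Returning a `(code, message)` pair in both cases means clients written against the string form would keep working if the dictionary form were adopted later.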

> Is this going to be a library that we can use to quickly and easily create
> web services that conform to this spec? We're obviously going to want
> such a library because there'll be a lot of code shared between the
> different web services.
>

The plan was to make a template Flask app, so that people only need to
specify the few things they want to change. I am a bit worried about
repeated code here, but simple forks of the template could work.
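One alternative to forking is to keep the shared code in a factory that each service calls with only the parts that differ. The sketch below uses the standard library's WSGI support instead of Flask so that it is self-contained, and shows only the synchronous case. `make_service` and the `process` callback are hypothetical names, and mapping processing failures to a 500 response is an assumption.

```python
import json
import uuid
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def make_service(task_type, process):
    """Return a WSGI app that runs `process(sent_data)` on POSTed JSON."""

    def app(environ, start_response):
        if environ["REQUEST_METHOD"] != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        try:
            sent_data = json.loads(environ["wsgi.input"].read(length) or b"{}")
        except ValueError:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"request body must be JSON"]
        doc = {
            "task_type": task_type,
            "task_id": uuid.uuid4().hex,
            "requested_timestamp": _now(),
            "sent_data": sent_data,
        }
        try:
            doc["result"] = process(sent_data)
            status = "200 OK"
        except Exception as exc:
            doc["error"] = str(exc)
            status = "500 Internal Server Error"
        doc["completed_timestamp"] = _now()
        body = json.dumps(doc).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    return app


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Only the task type and the processing function differ per service.
    app = make_service("word_count", lambda data: len(data["text"].split()))
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Bug fixes to the shared part then land in one place, rather than having to be merged into every fork of the template.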

David



