[okfn-help] [get.theinfo] datahub 0.8 is available

Rufus Pollock rufus.pollock at okfn.org
Fri Dec 4 09:54:32 GMT 2009


2009/12/3 Lukasz Szybalski <szybalski at gmail.com>:
> On Thu, Dec 3, 2009 at 8:27 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>> 2009/12/2 Lukasz Szybalski <szybalski at gmail.com>:
>>> On Wed, Dec 2, 2009 at 2:24 PM, Jonathan Gray <jonathan.gray at okfn.org> wrote:

[...]

>>> datapkg on the other hand seems to do the latter... query, search and
>>> upload/load packages?
>>>
>>> Let me know what exactly datapkg does at this point?
>>
>> Yes, datapkg allows you to register packages on ckan.net and query
>> existing packages on CKAN.
>
> datahub by default is a python package so pypi is best deployment option

Right, that's what we did first, in v0.1 of datapkg a couple of years
ago. But what happens if you want to handle stuff that *isn't* a
Python package? Plus PyPI may not be that happy if you start uploading
packages with 100s of MB in them (or even GBs). Data is so much larger
than code that I think we need a slightly different architecture.
Also, not everyone wants to plug in to Python. Thus, while datapkg
supports Python packages straight up, it is also designed so that it
can consume other stuff easily.

[...]

> what do you use to query ckan? xmlrpc? or?

A JSON API.
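
For illustration, a minimal Python sketch of what talking to such a
JSON API could look like. The base URL is the one quoted further down
in this thread; the "package" path and the shape of the responses are
assumptions made for the example, and the function names are
hypothetical (this is not datapkg's own code).

```python
import io
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# Base URL as quoted elsewhere in this thread.
BASE = "http://ckan.net/api/rest/"


def package_url(name=None, base=BASE):
    """Build a URL for the package register, or for one package.

    The "package" path segment is an assumption for this sketch.
    """
    url = urljoin(base, "package")
    if name is not None:
        url = url + "/" + quote(name)
    return url


def fetch_json(url, opener=urlopen):
    """Fetch a URL and decode the body as JSON.

    `opener` is injectable so the function can be tried offline.
    """
    with opener(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fake_opener(url):
    """Offline stand-in for urlopen that returns a canned JSON list."""
    return io.BytesIO(b'["browser_stats", "uk_house_prices"]')


if __name__ == "__main__":
    # Uses the canned response, so no network access is needed.
    print(fetch_json(package_url(), opener=fake_opener))
```

Dropping the opener argument makes fetch_json use a real HTTP request
instead of the canned response.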

> what do you use to expose ckan info?

?

>> We're just in the process of reworking the
>> install support -- this is rather more complex than in the code case
>> because of a need to support package payloads which are e.g. apis
>> rather than actual chunks of data.
>
> What's "apis/package payload"?

Well imagine a package that points to a massive database. You might
not want to install 5TB on your machine but you might want to just
talk to the API. So you'd like to support "packages" which just expose
APIs.
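
A rough Python sketch of that idea, purely as illustration: a package
whose payload is either a downloadable file or just an API endpoint,
with install deciding what to do based on which one it finds. All
class, field and function names here are hypothetical and do not
reflect how datapkg is actually structured.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FilePayload:
    """A payload that is an actual chunk of data to download."""
    download_url: str
    size_bytes: int


@dataclass
class ApiPayload:
    """A payload that is only an API to talk to; nothing is downloaded."""
    endpoint: str


@dataclass
class Package:
    name: str
    payload: Union[FilePayload, ApiPayload]


def install_plan(pkg):
    """Describe what 'installing' the package would mean for its payload."""
    if isinstance(pkg.payload, ApiPayload):
        return "record endpoint %s" % pkg.payload.endpoint
    return "download %d bytes from %s" % (
        pkg.payload.size_bytes,
        pkg.payload.download_url,
    )


if __name__ == "__main__":
    # Example values are made up for the sketch.
    big = Package("massive-db", ApiPayload("http://example.org/api"))
    small = Package("small-csv", FilePayload("http://example.org/d.csv", 2048))
    print(install_plan(big))
    print(install_plan(small))
```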

>> Current datapkg documentation (for trunk):
>>
>> <http://knowledgeforge.net/ckan/doc/datapkg/>
>>
>> Instructions for installation are here if you want to give it a whirl:
>>
>> <http://knowledgeforge.net/ckan/doc/datapkg/install.html>
>
> similar installation for datahub
> easy_install datahub
>
> I read over your man page, but I got a little lost in setting up the
> repository. Also the
> datapkg --repository=http://ckan.net/api/rest/ list didn't work.

Where does this come from? It is not in the online manual, I think (for
the latest version of the code).

Did you run from "HEAD" (i.e. from the mercurial repo)? As I said in my
previous email, "You'll need to install from the mercurial repository
to get up to date code ...".

The easy_install version from PyPI is rather out of date (0.2 when
we're about to release 0.4).

[...]

> I've looked at the uk_house_prices and browser_stats...
> In my case I would split the crawl,parse, and load.
>
> Is there a reason you use swiss package instead of tools that already
> exist like pyexcelerator, etc.?

swiss is just a small set of helper modules that builds on existing
packages. E.g. all XLS stuff is done by xlrd.

Rufus
-- 
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/


