[okfn-help] [get.theinfo] datahub 0.8 is available

Lukasz Szybalski szybalski at gmail.com
Fri Dec 4 16:22:08 GMT 2009


> [...]
>
>>>> datapkg, on the other hand, seems to do the latter: query, search and
>>>> upload/load packages?
>>>>
>>>> Let me know what exactly datapkg does at this point?
>>>
>>> Yes datapkg allows you to register packages on ckan.net, query
>>> existing packages on ckan.
>>
>> datahub is by default a Python package, so PyPI is the best deployment option
>
> Right, that's what we did first in v0.1 of datapkg a couple of
> years ago. But what happens if you want to handle stuff that *isn't* a
> Python package? Plus PyPI may not be that happy if you start uploading
> packages with 100s of MB in them (or even GBs).

I agree on that. I would assume that almost any package can be created
with just code, and the data should reside somewhere else? Do you have
any examples of data that needs to be included with a package, instead
of being downloaded from an outside source?

> Data is so much larger
> than code stuff that I think we need a slightly different
> architecture. Also not everyone wants to plug in to Python. Thus while
> datapkg supports Python packages straight up it is also designed so it
> can consume other stuff easily.

I agree that forcing Python is not going to work, so I use the Python
package as a medium only. All the programs/tools within it are your
choice: you can use anything from shell scripts to Perl code. I think
that as long as you document what is required, everything else should be
automated in process.sh, so that the user only installs the required
packages/programs and runs process.sh.
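
To make the "package as a medium" idea concrete, here is a minimal sketch
of a Python entry point that does nothing except hand off to process.sh.
The function name run_process and the layout (process.sh sitting at the
top of the package directory) are my assumptions for illustration, not
how datahub is actually laid out.

```python
import subprocess
from pathlib import Path


def run_process(package_dir, script="process.sh"):
    """Run the package's processing script and return what it printed.

    Assumption: the script lives at the top level of package_dir and can
    be run with plain `sh`. Whatever it calls inside (Perl, awk, more
    Python) is up to the package author.
    """
    package_path = Path(package_dir).resolve()
    script_path = package_path / script
    if not script_path.is_file():
        raise FileNotFoundError("no %s in %s" % (script, package_path))
    # check=True raises CalledProcessError if the script exits non-zero.
    result = subprocess.run(
        ["sh", str(script_path)],
        cwd=str(package_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

The Python part stays tiny on purpose: it only locates and runs the
script, so the real work can be in whatever language the author likes.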


Do you know how many packages created with datapkg exist?


>> what do you use to query ckan? xmlrpc? or?
>
> A JSON API.

Nice... so as long as the data is exposed on your website, you are able
to query it via HTTP?
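
Something like the following is what I have in mind, as a rough sketch
only. The base URL is the one I passed to datapkg --repository; the
"package" path and the assumption that it returns a JSON list of package
names are my guesses and are not checked against the ckan documentation.

```python
import json
from urllib.request import urlopen

# Base URL taken from the datapkg --repository option I tried.
BASE_URL = "http://ckan.net/api/rest/"


def package_list_url(base_url=BASE_URL):
    """Build the URL of the package register (path is an assumption)."""
    return base_url.rstrip("/") + "/package"


def parse_package_list(body):
    """Decode a JSON response body that is expected to be a list of names."""
    names = json.loads(body)
    if not isinstance(names, list):
        raise ValueError("expected a JSON list of package names")
    return sorted(names)


def fetch_package_names(base_url=BASE_URL, timeout=10):
    """Fetch and decode the package list over plain HTTP."""
    with urlopen(package_list_url(base_url), timeout=timeout) as response:
        return parse_package_list(response.read().decode("utf-8"))
```

Keeping the parsing separate from the fetching means the decoding can be
tried on a saved response without touching the network.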

>
>>> We're just in the process of reworking the
>>> install support -- this is rather more complex than in the code case
>>> because of a need to support package payloads which are e.g. apis
>>> rather than actual chunks of data.
>>
>> What's "apis/package payload"?
>
> Well imagine a package that points to a massive database. You might
> not want to install 5TB on your machine but you might want to just
> talk to the API. So you'd like to support "packages" which just expose
> APIs.

Wouldn't that be done in setup.py under required packages?

>
>>> Current datapkg documentation (for trunk):
>>>
>>> <http://knowledgeforge.net/ckan/doc/datapkg/>
>>>
>>> Instructions for installation are here if you want to give it a whirl:
>>>
>>> <http://knowledgeforge.net/ckan/doc/datapkg/install.html>
>>
>> similar installation for datahub
>> easy_install datahub
>>
>> I read over your man page, but I got a little lost in setting up the
>> repository. Also, the command
>> datapkg --repository=http://ckan.net/api/rest/ list
>> didn't work.
>
> Where does this come from? Not in online manual I think (for latest
> version of code).

datapkg man


> Did you run from "HEAD" (ie. from mercurial repo)? As I said in
> previous email "You'll need to install from the mercurial repository
> to get up to date code ...".

not yet.

>
> easy_install version from PyPI is rather out of date (0.2 when we're
> about to release 0.4).
>


Do you know of any datapkg packages that parse public.resource.org?
Is there a list of the entries on http://www.ckan.net/package/list
that have a datapkg package?
Do you query data.gov for available datasets?
Looking at http://www.ckan.net/package/list, how can one query for a
list of "data sources/packages" that have downloadable data or a
downloadable parser?

I guess datapkg and datahub, as well as their users, would benefit from
being able to "query for available data", "query for downloadable
data", and "query for parsers of data".
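
As a sketch of the kind of query I mean, assuming each package record is
a dict with a "name" key plus optional URL fields: the field names
"download_url" and "parser_url" below are hypothetical on my part, and I
don't know what the ckan records actually contain.

```python
def packages_with(packages, field):
    """Return the names of package records whose `field` is non-empty.

    `packages` is a list of dicts, each with at least a "name" key.
    A missing field, None, or a blank string all count as "not there".
    """
    names = []
    for record in packages:
        value = (record.get(field) or "").strip()
        if value:
            names.append(record["name"])
    return names


def downloadable_data(packages):
    """Packages that point at downloadable data (field name assumed)."""
    return packages_with(packages, "download_url")


def downloadable_parsers(packages):
    """Packages that point at a downloadable parser (field name assumed)."""
    return packages_with(packages, "parser_url")
```

The same filter would work on either side (datapkg or datahub) as long
as both expose their package metadata in a comparable shape.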

Thanks,
Lucas


