[okfn-help] [get.theinfo] datahub 0.8 is available

Jonathan Gray jonathan.gray at okfn.org
Wed Dec 2 20:24:07 GMT 2009


Dear Lukasz,

I've added you to the okfn-help list and will have to defer the
technical details of datapkg to those on this list.

All: Lukasz is working on DataHub, which aims to automate "download,
parse and load" of data. Like CKAN, it's in Python, and it would be
great to work with him if there are areas where the two projects could
talk to each other! He's got some questions about datapkg (see below).
It seems to me like there is some potential synergy?

Lukasz: pardon my ignorance, but just to clarify, could you give an
example use case for DataHub to give me a better idea of what it does
and how it works? We're currently working on improving download URL
(and file type metadata) in CKAN - I don't know if this is relevant to
what you're doing...

Best wishes,

Jonathan

On Wed, Dec 2, 2009 at 8:12 PM, Lukasz Szybalski <szybalski at gmail.com> wrote:
> On Wed, Dec 2, 2009 at 12:36 PM, Jonathan Gray <jonathan.gray at okfn.org> wrote:
>> Hi Lukasz,
>>
>> I can't remember where we were up to in our discussion about Datahub
>> and CKAN - but it would be great to pick this up again if there are
>> useful ways in which we could ensure they work together!
>>
>> Especially regarding automated downloads, storing copies of data, etc.
>> Two options here that spring to mind are Internet Archive, and Talis
>> Connected Commons. We're also working on an open data grid for
>> decentralised storage. What are your thoughts here?
>
> As for the vision for datahub, I see it as a starting tool where you
> put your code to download, parse, and load a data source, etc. There
> are no specifics on where you store the data. Your data could be stored
> in IA or on your distributed storage. The primary concern is
> automating the "download, parse and load" of the data.
>
> Here is the first package created using datahub:
>
> http://pypi.python.org/pypi/datahub.gov.dot.nhtsa.recall/0.2dev
>
> So process is:
> datahub -> default template -> start project -> automate it all ->
> publish source.
> datahub -> few weeks or months -> datahub.gov.dot.nhtsa.recall (now it
> only takes 1 min to get from download to load.)
>
>
>
> I've looked at
> http://www.knowledgeforge.net/ckan/trac/browser/datapkg/trunk and
> datapkg seems similar, but it's already in phase 2, meaning it allows
> you to list packages that are available. Is that implemented in
> datapkg? Where do you get a list? What else can you do with it?
>
>
>
> I haven't really looked at storing the whole package with data on some
> archive site simply because the size of data I use is small, or data
> is available somewhere else. Do you have any packages that load and
> parse the data? or are using datapkg?
>
> Thanks,
> Lucas
>
>
>>
>> Best wishes,
>>
>> Jonathan
>>
>> On Wed, Dec 2, 2009 at 5:35 PM, Lukasz Szybalski <szybalski at gmail.com> wrote:
>>> http://pypi.python.org/pypi/datahub/0.8.90dev
>>>
>>>    * Datahub is a tool that allows faster download/crawl, parsing,
>>> loading, and visualization of data. It achieves this by allowing you to
>>> divide each step into its own work folder. In each work folder you
>>> get sample files that you can start coding in.
>>>    * Datahub is for people who have found an interesting data source
>>> and want to download it, parse it, load it into a database,
>>> provide some documentation, and visualize it. Datahub will speed up
>>> the process by creating a folder for each of these actions. You will
>>> create all the programs from our base default template and move on to
>>> analyzing the data in no time.
>>>
>>>
>>> If you are doing data conversions from public/private datasets this
>>> tool is for you.
>>>
>>> A few packages that use datahub are coming soon (e.g. the recall
>>> database from NHTSA: crawl, parse, and load into a db as easily as
>>> running "sh process.sh").
>>>
>>> Enjoy.
>>>
>>> Lucas
>>>
>>> --
>>> [from the http://groups.google.com/group/get-theinfo mailing list]
>>>
>>
>>
>>
>> --
>> Jonathan Gray
>>
>> Community Coordinator
>> The Open Knowledge Foundation
>> http://www.okfn.org
>>
>
>
>
> --
> Setup CalendarServer for your company.
> http://lucasmanual.com/mywiki/CalendarServer
> Automotive Recall Database - See if your vehicle has a recall
> http://lucasmanual.com/recall
>
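
For readers unfamiliar with the "one work folder per step" idea described in
the quoted announcement, here is a minimal sketch in Python. It is not
datahub's actual template or API: the function name create_work_folders, the
folder names, and the stub file names are all hypothetical, chosen only to
illustrate the layout (one folder each for download, parse, load and
visualize, with a sample file to start coding in).

```python
from pathlib import Path
import tempfile

# Step names follow the announcement; the folder layout and stub file
# names below are assumptions, not datahub's real template.
STEPS = ("download", "parse", "load", "visualize")


def create_work_folders(root):
    """Create one work folder per step, each holding a stub script."""
    created = []
    for step in STEPS:
        folder = Path(root) / step
        folder.mkdir(parents=True, exist_ok=True)
        stub = folder / f"{step}.py"
        if not stub.exists():
            # A placeholder file the user can start coding in.
            stub.write_text(f'"""Stub for the {step} step."""\n')
        created.append(folder)
    return created


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list the folders.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_work_folders(Path(tmp) / "myproject"):
            print(folder.name)
```

Keeping each step in its own folder means a step can be rerun or rewritten
without touching the others, which is the speed-up the announcement refers to.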



-- 
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://www.okfn.org


