[okfn-labs] website of well-curated datasets

Rufus Pollock rufus.pollock at okfn.org
Thu Feb 14 13:38:55 UTC 2013


On 14 February 2013 13:03, Martin Keegan <martin.keegan at okfn.org> wrote:
> On Thu, Feb 14, 2013 at 10:16 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>
>>> Was it at http://datasets.okfnlabs.org/ and in any case, does anyone
>>> know which site I'm referring to and where it is now?
>>
>> Apologies, temporarily down due to a domain config snafu (my fault) at
>> heroku! It's back now!
>
> No worries.
>
> What's the QA process like for getting datasets on there? Is the

The QA process is via issues and discussion in: https://github.com/datasets/registry

I'd propose:

1. You create an issue proposing the idea (if you can already link to
a dataset repo, all the better)
2. You create a dataset repo (as per the structure described on
datasets.okfnlabs.org/about/). This repo can either live in your personal
account or (if you have the privileges!) be created directly in
github.com/datasets
3. When it's ready, you create a pull request on the registry repo (list.txt
...), which triggers discussion and review
4. The pull request is merged (and, if necessary, the repo is relocated to datasets ....)
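Before opening the pull request in step 3, it can help to sanity-check the dataset repo locally. The following is a minimal sketch, not an official checker. It assumes a hypothetical layout that loosely follows the Data Packages spec: a `datapackage.json` descriptor at the root (required by the spec), plus a `README.md`, a `data/` directory and a `scripts/` directory (assumed conventions). The function name `check_dataset_repo` is made up for illustration, and the authoritative structure is whatever datasets.okfnlabs.org/about/ describes.

```python
import json
from pathlib import Path


def check_dataset_repo(repo: Path) -> list[str]:
    """Return a list of layout problems in a dataset repo (empty list if none).

    Hypothetical layout assumed here:
      datapackage.json   descriptor, required by the Data Packages spec
      README.md          human-readable description (assumed convention)
      data/              cleaned data files (assumed convention)
      scripts/           wrangling code (assumed convention)
    """
    problems = []
    descriptor = repo / "datapackage.json"
    if not descriptor.is_file():
        problems.append("missing datapackage.json")
    else:
        try:
            meta = json.loads(descriptor.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"datapackage.json is not valid JSON: {exc}")
        else:
            # The descriptor should be a JSON object carrying at least a name.
            if not isinstance(meta, dict) or "name" not in meta:
                problems.append("datapackage.json has no 'name'")
    if not (repo / "README.md").is_file():
        problems.append("missing README.md")
    for sub in ("data", "scripts"):
        if not (repo / sub).is_dir():
            problems.append(f"missing {sub}/ directory")
    return problems
```

Running `check_dataset_repo(Path("."))` from the repo root returns an empty list when the assumed layout is present. Any reported problems are a reasonable thing to fix before asking for review.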

> source code for any wrangling available?

The idea is that you keep all the wrangling source code in a scripts
directory of the dataset repo; see the Data Packages spec:

http://www.dataprotocols.org/en/latest/data-packages.html

Or the summary at:

http://datasets.okfnlabs.org/about
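The core of a data package is its `datapackage.json` descriptor. Below is a minimal sketch of generating one in Python. It assumes only a small subset of fields: `name`, a human-readable `title`, and a `resources` list whose entries give a `path` and a `format`. The spec linked above defines many more fields (licenses, schemas, sources, etc.), so treat this as illustrative. The helper names `make_descriptor` and `write_descriptor` are hypothetical.

```python
import json
from pathlib import Path


def make_descriptor(name: str, title: str, data_files: list[str]) -> dict:
    """Build a minimal Data Package descriptor (sketch; see the spec for all fields)."""
    return {
        "name": name,    # short, lowercase, URL-friendly identifier
        "title": title,  # human-readable title
        "resources": [
            # One resource per data file; format is guessed from the extension.
            {"path": f, "format": Path(f).suffix.lstrip(".")}
            for f in data_files
        ],
    }


def write_descriptor(repo: Path, descriptor: dict) -> Path:
    """Write the descriptor as pretty-printed JSON to <repo>/datapackage.json."""
    out = repo / "datapackage.json"
    out.write_text(json.dumps(descriptor, indent=2) + "\n", encoding="utf-8")
    return out
```

For example, `make_descriptor("example", "Example dataset", ["data/example.csv"])` produces a single CSV resource at `data/example.csv`. In practice the descriptor is usually hand-edited after generation to add licensing and field descriptions.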

Here are some examples:

https://github.com/datasets/s-and-p-500/tree/master/scripts
https://github.com/datasets/gdp/blob/master/process.py

(note that in this case the script is not in the scripts directory; I'm still
in two minds about how strict to be on this ...)
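As a rough illustration of what a wrangling script in `scripts/` might do, here is a minimal Python sketch. It does not reproduce the linked s-and-p-500 or gdp scripts. The source and output paths (`archive/source.csv`, `data/data.csv`) are hypothetical, and the cleaning steps (trim whitespace, lowercase the header, drop blank rows) are just a plausible example of normalisation.

```python
import csv
import io
from pathlib import Path

# Hypothetical paths: where the raw download is cached, and where clean data goes.
SOURCE = Path("archive/source.csv")
OUTPUT = Path("data/data.csv")


def clean(raw_text: str) -> list[list[str]]:
    """Trim whitespace in every cell, lowercase the header, and drop blank rows."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    if not rows:
        return []
    header = [h.strip().lower() for h in rows[0]]
    body = [
        [cell.strip() for cell in row]
        for row in rows[1:]
        if any(cell.strip() for cell in row)  # skip rows that are entirely empty
    ]
    return [header] + body


def main() -> None:
    if not SOURCE.is_file():
        print(f"no raw data at {SOURCE}; download it first")
        return
    rows = clean(SOURCE.read_text(encoding="utf-8"))
    OUTPUT.parent.mkdir(parents=True, exist_ok=True)
    with OUTPUT.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    main()
```

Keeping the cleaning logic in a pure function like `clean` separate from file I/O makes the script easy to re-run whenever the upstream source updates. It also lets reviewers check the transformation on a small sample.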

Rufus
