[okfn-labs] public bodies project and documentation

Augusto Herrmann augusto.herrmann at gmail.com
Wed Jul 10 18:17:59 UTC 2013


Hi, all!

I came across the publicbodies.org project and it looks cool. The site
invites discussion on this list, but browsing through the archives, I
couldn't find any threads about it.

Anyway, what I would like to ask about and discuss here is the type and
meaning of each column [1] in the CSV files. I think a description of each
of them should be provided as project documentation. Below are some
questions about specific fields.

1) "updated"
Which format should the "updated" field use? ISO 8601?
Does it mean the date and time when the data last actually changed, or just
the moment when the data was last checked against the original source?

2) "slug"
Should the slug be generated specifically for this project when there is no
existing official one? Are there any guidelines on how to do so (list of
allowed characters, character substitution rules, what to do with accented
characters, and so on)?

3) "category"
Should this be an officially labeled category, if one exists? Or is it a
categorization effort specific to this project? Is there a list of
categories to choose from, and if so, where is it?
Should we use numerical codes or textual descriptions of categories?
I've looked into the current csv data searching for examples, but the
fields seem to be empty so far.

4) "jurisdiction" and "jurisdiction_code"
What should go into these two fields?

5) "address"
How should this be encoded? Street name, city, etc., all combined into a
single string? Also, there's no field for a postal code.

6) "contact"
I'm guessing this is the name of the contact to which the "email" field
corresponds (either a department within the public body's structure or a
person).

7) "tags"
How should these be set?
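
To make questions 1 and 2 more concrete, here is a minimal Python sketch of
one possible convention. It is only an assumption on my part, not something
the project has defined: timestamps as ISO 8601 in UTC, and slugs as
lowercase ASCII with accents stripped and every other run of characters
collapsed into a hyphen. The function names iso_timestamp and make_slug are
made up for illustration.

```python
import re
import unicodedata
from datetime import datetime, timezone


def iso_timestamp(moment):
    """Format a timezone-aware datetime as an ISO 8601 string in UTC."""
    return moment.astimezone(timezone.utc).isoformat()


def make_slug(name):
    """Build a lowercase ASCII slug (assumed convention, not an official rule)."""
    # NFKD splits an accented character into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Encoding to ASCII with "ignore" drops the combining marks.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of characters outside [a-z0-9] into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
```

For example, make_slug("Ministério da Educação") gives
"ministerio-da-educacao". Whether accents should be stripped like this or
handled some other way is exactly what I'd like the guidelines to say.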

Also, I think it would be useful to include somewhere in the repository [2]
the scripts that have been used to extract the information from the
available open data sources into the CSV files. That way, anyone can easily
update the data again just by running the scripts. What should we name the
directory where those scripts would go? "scripts"? "import"?
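
As an illustration of what such a script could look like, here is a minimal
Python sketch of the output step only. The column list contains just the
fields discussed in this message, not the full set of columns, and
bodies_to_csv is a hypothetical name. How the rows are extracted would
depend on each data source.

```python
import csv
import io

# Assumption: only the fields discussed in this message, in that order.
COLUMNS = ["updated", "slug", "category", "jurisdiction",
           "jurisdiction_code", "address", "contact", "email", "tags"]


def bodies_to_csv(rows):
    """Serialize an iterable of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        restval="",             # fields missing from a row are left empty
        extrasaction="ignore",  # keys outside COLUMNS are silently dropped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Each source-specific script would then only need to produce dicts with these
keys and write the returned text to a file.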

Last but not least, some good news. I've been working on a script to load
the Brazilian federal government's organizational structure (4910 public
bodies) into this dataset.

[1] https://github.com/okfn/publicbodies#building-the-sqlite-db
[2] https://github.com/okfn/publicbodies

Best regards,
Augusto Herrmann