[open-government] typology of (government) datasets
Josh Tauberer
tauberer at govtrack.us
Fri Sep 2 14:17:52 UTC 2011
On 09/02/2011 04:48 AM, Paul Hermans wrote:
> Has someone already been working on a typology for datasets used and
> produced in government settings?
I've been thinking a lot about that, for public data sets, and have
covered it in a couple of talks I've given [1] [2].
There are different ways to break things into a typology.
The first way that I typically break things down is into three groups:
primary legal materials, government operational records, and civic capital.
Carl Malamud at law.resource.org has best articulated the value of
primary legal materials: improved civic education, deeper research in
universities, innovation (and reduction in costs) in the legal
information market, savings to the government, reducing the cost for
small business of maintaining legal compliance, and greater access
to justice.
Government operational records are typically administrative records
(rather than legal records) that the public can use to identify
conflicts of interest and to perform oversight: contracts and grants,
personal financial disclosures, etc. These documents have little utility
aside from oversight.
The last category, "civic capital", covers government records whose
utility has nothing to do with government per se: environmental and
weather data, health and safety data, corporate compliance records.
These records can save lives, make markets more efficient, etc. These
records are closest to Government as a Platform.
A second way to break things down is whether the data informs current
policy decisions; that's particularly useful for prioritization.
A third way to break things down is in terms of general accessibility,
with government reports on the one hand (things the general public can
consume directly) and raw data on the other (things that information
specialists can transform into something new). A fourth dimension is
whether the data is purely public, or whether it is mixed with
non-public information that must be redacted.
And a fifth way to break things down is in terms of data quality, which
you can plot on two dimensions: accuracy and precision (relative to cost
of processing).
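To make the five dimensions above concrete, here is a rough sketch of
how one might record them per data set. Everything in it is an
assumption for illustration: the class and field names, the 0-to-1
scales for accuracy and precision, and the example data sets and their
values are all made up, not an established schema.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    # The three groups from the first breakdown.
    PRIMARY_LEGAL = "primary legal materials"
    OPERATIONAL = "government operational records"
    CIVIC_CAPITAL = "civic capital"


class Accessibility(Enum):
    # The two ends of the third breakdown.
    REPORT = "report"      # consumable directly by the general public
    RAW_DATA = "raw data"  # needs an information specialist to transform


@dataclass(frozen=True)
class DatasetProfile:
    """One data set described along the five dimensions (hypothetical schema)."""
    name: str
    purpose: Purpose
    informs_current_policy: bool
    accessibility: Accessibility
    needs_redaction: bool  # mixed with non-public information
    accuracy: float        # assumed 0.0-1.0 scale
    precision: float       # assumed 0.0-1.0 scale


def prioritize(profiles):
    """Put data sets that inform current policy first, then sort by name."""
    return sorted(profiles, key=lambda p: (not p.informs_current_policy, p.name))


# Illustrative entries only; the values are invented.
examples = [
    DatasetProfile("weather observations", Purpose.CIVIC_CAPITAL,
                   False, Accessibility.RAW_DATA, False, 0.9, 0.8),
    DatasetProfile("pending bill text", Purpose.PRIMARY_LEGAL,
                   True, Accessibility.REPORT, False, 0.95, 0.9),
    DatasetProfile("federal contract awards", Purpose.OPERATIONAL,
                   False, Accessibility.RAW_DATA, True, 0.7, 0.6),
]

for profile in prioritize(examples):
    print(profile.name, "-", profile.purpose.value)
```

Keeping each dimension as a separate field, rather than one combined
category, reflects that the breakdowns are independent of one another.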
[1] http://razor.occams.info/publications.xpd
[2] http://razor.occams.info/pubdocs/2011-07-25%20American%20Assoc%20Law%20Libraries%20text.pdf
- Josh Tauberer
- GovTrack.us / POPVOX.com
http://razor.occams.info | www.govtrack.us | www.popvox.com
On 09/02/2011 04:48 AM, Paul Hermans wrote:
> Hi,
>
>
> Has someone already been working on a typology for datasets used and
> produced in government settings?
> If yes, any pointer is highly appreciated.
>
>
> Regards,
>
>
> Paul