[open-government] typology of (government) datasets

Josh Tauberer tauberer at govtrack.us
Fri Sep 2 14:17:52 UTC 2011


On 09/02/2011 04:48 AM, Paul Hermans wrote:
 > Has someone already been working on a typology for datasets used and
 > produced in government settings?

I've been thinking a lot about that for public data sets, and have 
covered it in some talks I've given [1] [2].

There are different ways to break things into a typology.

The first way that I typically break things down is into three groups: 
primary legal materials, government operational records, and civic capital.

Carl Malamud at law.resource.org has best articulated the value of 
primary legal materials: improved civic education, deeper research in 
universities, innovation (and reduction in costs) in the legal 
information market, savings to the government, reduced cost of legal 
compliance for small businesses, and greater access to justice.

Government operational records are typically administrative records 
(rather than legal records) that the public can use to identify 
conflicts of interest and to perform oversight: contracts and grants, 
personal financial disclosures, etc. These documents have little utility 
aside from oversight.

The last category, "civic capital", is government records whose 
utility has nothing to do with government per se: environmental and 
weather data, health and safety data, corporate compliance records. 
These records can save lives, make markets more efficient, etc. They 
are the closest to Government as a Platform.

A second way to break things down is whether the data informs current 
policy decisions --- that's particularly useful for prioritization.

A third way to break things down is in terms of general accessibility, 
with government reports on the one hand (things the general public can 
consume directly) and raw data on the other (things that information 
specialists can transform into something new).

A fourth way is whether the data is purely public, or whether it is 
mixed with non-public information that must be redacted.

And a fifth way to break things down is in terms of data quality, which 
you can plot on two dimensions: accuracy and precision (relative to cost 
of processing).
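
As a rough sketch, the five breakdowns above could be recorded as fields 
on one record per data set. Everything in it is an assumption made up 
for illustration and not taken from the talks: the class and field 
names, the 0-to-1 quality scores, the two example records, and the 
choice to break ties by accuracy.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    """First breakdown: what the records are for."""
    PRIMARY_LEGAL = "primary legal materials"
    OPERATIONAL = "government operational records"
    CIVIC_CAPITAL = "civic capital"


class Form(Enum):
    """Third breakdown: general accessibility."""
    REPORT = "report"      # the general public can consume it directly
    RAW_DATA = "raw data"  # information specialists transform it


@dataclass(frozen=True)
class DatasetProfile:
    name: str
    purpose: Purpose
    informs_current_policy: bool  # second breakdown
    form: Form
    needs_redaction: bool         # fourth breakdown: mixed with non-public info
    accuracy: float               # fifth breakdown; hypothetical 0..1 score
    precision: float              # fifth breakdown; hypothetical 0..1 score


def prioritize(profiles):
    """Policy-relevant data sets first; ties broken by higher accuracy (an assumption)."""
    return sorted(profiles, key=lambda p: (not p.informs_current_policy, -p.accuracy))


# Illustrative records only; the scores are invented, not measurements.
weather = DatasetProfile("weather observations", Purpose.CIVIC_CAPITAL,
                         False, Form.RAW_DATA, False, 0.9, 0.8)
contracts = DatasetProfile("contracts and grants", Purpose.OPERATIONAL,
                           True, Form.RAW_DATA, True, 0.7, 0.6)

ordered = prioritize([weather, contracts])
```

Keeping each breakdown as a separate field, rather than one combined 
label, means a data set can be sorted or filtered along any one of them 
independently.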


[1] http://razor.occams.info/publications.xpd
[2] http://razor.occams.info/pubdocs/2011-07-25%20American%20Assoc%20Law%20Libraries%20text.pdf


- Josh Tauberer
- GovTrack.us / POPVOX.com

http://razor.occams.info | www.govtrack.us | www.popvox.com

On 09/02/2011 04:48 AM, Paul Hermans wrote:
> Hi,
>
>
> Has someone already been working on a typology for datasets used and
> produced in government settings?
> If yes, any pointer is highly appreciated.
>
>
> Regards,
>
>
> Paul