[open-government] [openhouseproject] The Four "A"s of Open Government Data

Gregory Slater tenkyuu at pacbell.net
Sun Feb 12 18:21:43 UTC 2012


What about 'API' for the fourth 'A'?

On the other hand, one might argue that ease of programmatic readability is another facet of 'Accessibility', since in the age of 'big data', data is not really accessible if it isn't formatted for programmatic access.  In fact, one way of thwarting transparency is to overwhelm the user with enormous volumes of documents that effectively cannot be parsed, summarized and searched efficiently.  Think of the last scene of 'Raiders of the Lost Ark'…
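A minimal Python sketch of that point, assuming the records were published as CSV: once each field is explicit, summarizing and searching take a few lines of standard-library code. The column names, payees, and amounts below are invented for illustration and do not reflect any real House schema or data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, made-up sample rows in a machine-readable (CSV) layout.
# Column names are illustrative assumptions, not a real published schema.
SAMPLE_CSV = """office,payee,amount
Office A,Acme Supplies,120.50
Office A,Acme Supplies,79.50
Office B,Metro Printing,310.00
"""


def totals_by_payee(text: str) -> dict[str, float]:
    """Summarize: sum the amount column per payee.

    This only works because each field is explicitly delimited.
    """
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["payee"]] += float(row["amount"])
    return dict(totals)


def search(text: str, term: str) -> list[dict[str, str]]:
    """Search: return rows whose payee contains the term (case-insensitive)."""
    return [
        row
        for row in csv.DictReader(io.StringIO(text))
        if term.lower() in row["payee"].lower()
    ]


if __name__ == "__main__":
    print(totals_by_payee(SAMPLE_CSV))
    print(search(SAMPLE_CSV, "printing"))
```

Running it prints the per-payee totals and the one matching row. Getting the same summary out of a scanned PDF would first require an extraction step, which is exactly where errors creep in.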

Anyway, I totally agree that programmatic machine readability is absolutely key for big data.
Thanks for the thoughts,

 - Greg Slater


On Feb 11, 2012, at 5:43 PM, Josh Tauberer wrote:

> Last week the House Committee on House Administration (here in the U.S.)
> held a conference on legislative data and transparency. Reynold
> Schweickhardt, the committee’s director of technology policy, made an
> interesting observation at the start of the day that policy for public
> information often is framed in terms of 3 A's:
> 
>    accessibility,
>    authenticity, and
>    accuracy.
> 
> I thought about that over the next few hours. They are good principles.
> And yet we data geeks so often find ourselves having to start from
> scratch explaining why clean data is so important. It seems
> contradictory: if accuracy is a concept practitioners in government get,
> and if 'clean' is a type of accuracy, then there must be some
> communications failure here if we're having a hard time explaining open
> data to government agencies. (To be clear, Reynold totally gets it.)
> 
>    --------------------------------------------
>    TLDR version: Read chapter 5 of my book at:
>    http://opengovdata.io/2012-02/page/5/principles-open-government-data
>    --------------------------------------------
> 
> So I was thinking that morning, what other word do we need to add to
> those 3 As to work open data in there? At first I thought about adding
> "precision". Precision is one thing we're usually asking for when we ask
> for open data. Precision is basically granularity. Compared to, say, a
> PDF, XHTML is more granular because it is explicit about section
> boundaries and paragraphs, and it can mark where in the document the
> important things, like names and dollar amounts, are. (It is more granular with
> respect to the meaning of the document, though not its pagination.)
> 
> But precision is too narrow. When Congress releases its institutional
> spending records, it does so in a PDF. That PDF has high precision ---
> it gets down practically to line items. The problem with the PDF is that
> it has low accuracy because getting it into a spreadsheet format and
> de-duping names introduces errors.
> 
> But accuracy is already one of the three As. So what's missing here?
> 
> The Association for Computing Machinery’s Recommendation on Open
> Government (February 2009) figured this out:
> 
>> "Data published by the government should be in formats and approaches
>> that promote analysis and reuse of that data."
> http://www.acm.org/public-policy/open-government
> 
> Not only is it right, but "analysis" starts with the letter A. Plus, in order to do any useful analysis on large amounts of information, we need automation --- another A word. That is fate if I ever saw it.
> 
> Proposing a whole 17 distinct principles of open government data (read the chapter!) might be, let's say, overwhelming in any practical situation. If we had to get by with just four words, maybe these will do:
> 
>    accessible,
>    authentic,
>    accurate, and
>    analyzable (using automation, because data is big these days).
> 
> Analyzable gives deeper meaning to the other three words. Accuracy alone is too vague. You can't measure accuracy in the absence of some process. In the computer science world, accuracy is how often something comes out right. I think government documents people have taken that 'something' to be whether a Xerox machine copies enough pixels correctly. That's not sufficient for analysis anymore. We can't go hiring thousands of interns to read all of the documents governments produce. We didn't build computers for nothing.
> 
> With analyzable added, the meaning of accuracy is that an *automated computer process* will get it right. If someone says a document is accurate because it is a scan, I'll say that's what accurate meant in the 1960s. If the fourth "A" of government information is analyzable, we can redefine accuracy for 2012.
> 
> But if you want the full 17 principles, read the rest of the chapter, which tackles data quality (accuracy & precision), machine processability, and other concepts in more detail. There's also a case study on the House disbursements documents, looking at whether and how it met the 17 principles:
> 
>    http://opengovdata.io/2012-02/page/5/principles-open-government-data
> 
> Thanks,
> 
> - Josh Tauberer (@JoshData)
> - GovTrack.us | POPVOX.com
> 
> http://razor.occams.info | www.govtrack.us | www.popvox.com
> 
> -- 
> You received this message because you are subscribed to the Google Groups "Open House Project" group.
> 
