[open-government] Definition of machine readability?

Josh Tauberer tauberer at govtrack.us
Thu Jul 22 11:12:53 UTC 2010


It's hard to define because, in a way, it sits on a sliding scale. For 
instance, with text you can have an embedded-image-only PDF, a PDF with 
text that comes out garbled when you try to copy it, a PDF with text 
that isn't garbled, a Tagged PDF (whatever that is), or HTML, or HTML with 
semantic markup...

And it might be confusing because it's not about the file format but 
about the type of information the human wants to get out of it. An 
image-only PDF holds lots of "information" besides the raw text, but 
we're normally talking about machine processability of the text. The 
document margins are machine-processable, but that's not relevant.

So I would say machine processable is:

When the information of interest is provided in a manner that supports 
its analysis and reuse through computing technology.

- Josh Tauberer
- CivicImpulse / GovTrack.us

http://razor.occams.info | www.govtrack.us | civicimpulse.com

"Members of both sides are reminded not to use guests of the
House as props."

On 07/22/2010 06:23 AM, Jonathan Gray wrote:
> Does anyone know of a good working definition of machine readability?
> Something we hear very often in relation to opening up government data
> -- but something I've more often heard illustrated (databases, PDFs,
> etc) than defined (e.g. criteria). Feel like necessary/sufficient
> conditions might be tough. Any ideas?
>
