[open-government] Definition of machine readability?
Josh Tauberer
tauberer at govtrack.us
Thu Jul 22 11:12:53 UTC 2010
It's hard to define because it comes on a sliding scale. For
instance, with text you can have an image-only PDF, a PDF with
text that comes out garbled when you try to copy it, a PDF with text
that isn't garbled, a Tagged PDF (whatever that is), HTML, or HTML with
semantic markup...
It might also be confusing because it's not about the file format but
about the type of information the human wants to get out of it. An
image-only PDF holds lots of "information" besides the raw
text, but we're normally talking about machine processability of the
text. The document margins are machine processable, but that's not relevant.
So I would say machine processable means:
The information of interest is provided in a manner that supports
its analysis and reuse through computing technology.
- Josh Tauberer
- CivicImpulse / GovTrack.us
http://razor.occams.info | www.govtrack.us | civicimpulse.com
"Members of both sides are reminded not to use guests of the
House as props."
On 07/22/2010 06:23 AM, Jonathan Gray wrote:
> Does anyone know of a good working definition of machine readability?
> Something we hear very often in relation to opening up government data
> -- but something I've more often heard illustrated (databases, PDFs,
> etc) than defined (e.g. criteria). Feel like necessary/sufficient
> conditions might be tough. Any ideas?
>