[Open-data-census] machine readable definition
Graeme Jones
jonesiom at gmail.com
Mon Oct 7 23:08:14 BST 2013
I've also discussed structured tweets for the new parliamentary year with
@TynwaldInfo, and perhaps as a lazy machine-readable option for other
departments with little or no budget to support external apps and services.
One example is the courts system: the fire brigade investigated a
disgruntled complaint about overcrowding when about 80 defendants in minor
cases all had to arrive at 9:30am, even though most clearly had to wait
hours for their case to come up....
On 7 October 2013 19:42, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> On 5 October 2013 21:43, Andrew Stott <andrew.stott at dirdigeng.com> wrote:
>
>> Rufus
>>
>> I’m rather more relaxed about properly structured HTML where the data
>> could be programmatically extracted (although most examples would fail the
>> bulk download case).
>>
>
> Hmmm, I'm in two minds on this but incline to saying HTML is not machine
> readable as you almost always have to do significant work to re-extract
> info. More below ...
>
>
>> For instance if an agency wants to make a data table available as HTML
>> under an open licence and this is both viewable and reliably parsable
>> by a program in order to get the data then it is hard to see how this is
>> not open data.
>>
>> However it would not be open data if:
>>
>> (1) the data is shown as, for instance, images within the HTML – not
>> programmatically extractable.
>>
>> (2) the data is shown only through formatting rather than as data
>> itself (eg colouring – cf the OKFN Census league table (!))
>>
>> (3) the data “appears” as the result of user interaction and/or the
>> execution of scripts – that defeats automatic, programmable parsing.
>>
>> Conversely at one time UK Civil Service vacancies (largely structured
>> text) were shown on various UK Government websites with RDFa attributes in
>> the HTML tags precisely in order to be scrapable. This sort of technology
>> could also be a solution to publication of contractual documents – frankly
>> more useful than downloadable PDFs or Microsoft Word files.
>>
>
> I think RDFa is one thing (and I'd put RDFa as the format rather than HTML
> or perhaps HTML/RDFa) but I'd say that, by default, HTML is not
> machine-readable because it always needs parsing (and most HTML is quite
> bad HTML).
>
>
>> As Ivan Begtin has pointed out, simply because a dataset is expressed in
>> XML it does not mean that it is machine readable in any sort of practical
>> way.
>>
>
> I'd say it is much more machine-readable ;-)
>
> Machine-readability is definitely one of the more subtle items when you
> get to the edges - I actually have a series of "bad-data" examples in
> progress to illustrate some of the edge cases at
>
> http://okfnlabs.org/bad-data/
>
>> And there are a number of mapping and postcode cases where the results are
>> in open formats but are not machine-readable in the sense that you could
>> extract the data and reuse it.
>>
>> In my view we should look at machine readable as a combination of fact
>> and objective judgement, and not say that a particular format is
>> automatically machine-readable or not machine-readable.
>>
>
> That is definitely a good point but I would say that *usually* HTML would
> not be machine readable (perhaps we need a weak and strong form ;-) of it!)
>
> Rufus
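
As a rough illustration of the "viewable and programmatically parsable" HTML
table case discussed in the thread, here is a minimal Python sketch using only
the standard library's html.parser. The names TableExtractor and extract_rows
and the sample table are made up for illustration; nothing here comes from a
real agency page. It assumes a single, well-formed, non-nested table.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> as a list of rows (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text fragments of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a cell counts as data; images and styling carry none.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample: the last cell shows its value as an image, as in case (1).
SAMPLE = """
<table>
  <tr><th>Dataset</th><th>Openly licensed</th></tr>
  <tr><td>Postcodes</td><td>yes</td></tr>
  <tr><td>Timetables</td><td><img src="no.png"></td></tr>
</table>
"""

rows = extract_rows(SAMPLE)
```

The first two rows come out as plain text, while the cell that shows its value
as an image comes out empty. That is the sense in which case (1) is viewable
but not programmatically extractable.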
>
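
The RDFa point can be sketched in the same spirit. The snippet below is not a
real RDFa processor (it ignores vocab, prefixes, typeof and nesting); it only
shows how values tagged with a property attribute can be pulled out without
guessing at page layout. The markup, the example.org vocabulary and the names
PropertyExtractor and extract_properties are assumptions for illustration, not
the markup the UK Civil Service sites actually used. It also assumes each
tagged element contains plain text only.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._prop = None   # property name of the element being read, if any
        self._text = []     # text fragments collected for that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # An explicit content attribute supplies the value directly.
            self.pairs.append((prop, attrs["content"]))
        else:
            self._prop = prop
            self._text = []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no nested tags inside a tagged element (see lead-in).
        if self._prop is not None:
            self.pairs.append((self._prop, "".join(self._text).strip()))
            self._prop = None


def extract_properties(html):
    """Return the (property, value) pairs found in an HTML string."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Made-up vacancy markup using a hypothetical vocabulary.
SAMPLE = """
<div vocab="http://example.org/vacancy#" typeof="Vacancy">
  <h2 property="title">Policy Adviser</h2>
  <span property="location">London</span>
  <meta property="closingDate" content="2013-10-31">
</div>
"""

pairs = extract_properties(SAMPLE)
```

Here the consumer relies on the property names rather than on where things sit
on the page, which is roughly why HTML/RDFa is treated in the thread as a
different case from plain HTML.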