[od-discuss] Machine readability in v2.1 (was: Re: [okfn-discuss] Open Definition 2.1 final draft)

Rufus Pollock rufus.pollock at okfn.org
Wed Jul 29 08:53:15 UTC 2015


Just forking subject as the thread was heading off in new directions!

I appreciate, as Mike points out, that there will be variation and context
specificity in what exactly constitutes machine readability but I think the
general principle can be made clear. I also appreciate that we are
attempting that with the current phrasing. In the spirit of offering
something concrete, what about a new section 1.4 as follows:

1.4 Machine Readability

The work should be provided in "machine-readable" form, that is, one that
can be easily accessed and processed by a computer, and which is in a form in
which modifications to individual data elements can easily be performed.

I also note we have the following definition of machine readable in the
Open Data Handbook:

http://opendatahandbook.org/glossary/en/terms/machine-readable/

<quote>
Data in a data format that can be automatically read and processed by a
computer, such as CSV, JSON, XML, etc. Machine-readable data must be
structured data. Compare human-readable.

Non-digital material (for example printed or hand-written documents) is by
its non-digital nature not machine-readable. But even digital material need
not be machine-readable. For example, consider a PDF document containing
tables of data. These are definitely digital but are not machine-readable
because a computer would struggle to access the tabular information - even
though they are very human readable. The equivalent tables in a format such
as a spreadsheet would be machine readable.

As another example scans (photographs) of text are not machine-readable
(but are human readable!) but the equivalent text in a format such as a
simple ASCII text file or a text-processing format such as Microsoft Word
file is machine readable.

Note: The appropriate machine readable format may vary by type of data -
so, for example, machine readable formats for geographic data may differ
from those for tabular data.
</quote>
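
To make the distinction concrete, here is a minimal sketch of what "easily
accessed and processed" and "modifications to individual data elements" could
look like for a small CSV table. It is not part of the handbook text; the
sample data, the column names and the update_cell helper are all made up for
illustration.

```python
import csv
import io

# Hypothetical sample table, invented purely for illustration.
SAMPLE_CSV = "city,population\nLeeds,750000\nYork,200000\n"


def update_cell(csv_text, key_column, key, column, new_value):
    """Return csv_text with one cell changed, located via a key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # every row becomes a dict keyed by column name
    for row in rows:
        if row[key_column] == key:
            row[column] = new_value  # modify a single data element
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


updated = update_cell(SAMPLE_CSV, "city", "York", "population", "210000")
```

The same change against a scanned image or a PDF table would first need OCR
or table extraction, which is roughly the gap the handbook definition is
pointing at.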

Regards,

Rufus

On 28 July 2015 at 22:05, Andrew Rens <andrewrens at gmail.com> wrote:

> Hi
>
> Perhaps it would be useful to be specific about "machine readable" in
> respect of data but expressly state that this specificity flows from the
> general principle in 1.3
> "The work *should* be provided in the form preferred for working with and
> making modifications to it" or whatever final wording is agreed.
> Additional wording would then stipulate: "When a work consists of or
> contains data then the preferred form for that data is a form that enables
> a recipient to use automated processes to use or modify the data as a whole or
> in part."
>
> This would help by showing how the principle would be applied to one kind
> of knowledge.
>
> Of course "automated processes", like "machine readable", requires some
> refinement - algorithmic processes perhaps?
>
> Andrew
>
>
>
> Andrew Rens
>
>
>
> On 28 July 2015 at 15:59, Aaron Wolf <wolftune at riseup.net> wrote:
>
>>
>>
>> On 07/28/2015 03:44 PM, Mike Linksvayer wrote:
>> > On 07/28/2015 10:21 AM, Aaron Wolf wrote:
>> >> On 07/28/2015 01:07 PM, Benjamin Ooghe-Tabanou wrote:
>> >>> Yes I agree also that the "as a whole" is fine regarding "bulk"
>> >>>
>> >>> As Rufus pointed out my main concern left is on machine-readability.
>> >>> Aaron I understand we want the OD to handle a larger picture than just
>> >>> data, but since it has historically been used primarily for data, I
>> >>> just want to make sure we can keep doing it afterwards and do not lose
>> >>> actual specific requirements.
>> >>> That's why I proposed to simply replace the blurred "in a form
>> >>> preferred" sentence with a sentence spelling out the specific case of
>> >>> data, as it was agreed on earlier in the process.
>> >>> As such, 1.3 first concerns "work" globally. Having at the end a "Data
>> >>> must be machine readable" would add the proper precision.
>> >>>
>> >>> Benjamin Ooghe-Tabanou
>> >>>
>> >>>
>> >>
>> >> Adding "Data must be machine readable" to the end of 1.3 sounds fine to
>> >> me. Let's do that.
>> >
>> > Looks like superfluous jargon to me:
>> >
>> > - the underlying issue of works being provided in a manner that the work
>> > in question can be easily processed and manipulated is not specific to
>> > data (even from a data-centric worldview, eg to mine data from
>> > 'content')
>> >
>>
>> I am willing to consent to others' concerns, but I'm with Mike: 'should
>> be provided in the form preferred for making modifications to it' — in
>> principle, that means you have data you can actually use, i.e.
>> machine-readable if that's the way you would usually manage the data.
>>
>> But, I could see changing 'making modifications' to 'working with and
>> modifying' — working with data may be analyzing it but not modifying the
>> data. So, to do analysis, you'd want it to be machine-readable, but this
>> is independent of modifying the data.
>>
>> So, I think we need to have a better generalized wording here.
>>
>> I suggest 'provided in the form preferred for working with and making
>> modifications to it'
>>
>> My concern here is about the "must" vs "should" aspect: If we used
>> "must" would that say that my video is not "open" unless I provide all
>> the source files? I have mixed feelings about that but certainly don't
>> want it any stronger than "available upon request". We don't want to
>> block the distribution of videos by making *all* distributions
>> necessarily include all source files.
>>
>>
>> > - machine-readability is not defined (with respect to what? eg a bitmap
>> > image is read by a machine, even if it encodes a scan of 'data' from
>> > a printout)
>> >
>>
>> I had this same concern about "machine-readability", but I thought
>> qualifying this as data-specific would be acceptable. But I'm not sure.
>>
>>
>> > Mike
>> >
>> > _______________________________________________
>> > od-discuss mailing list
>> > od-discuss at lists.okfn.org
>> > https://lists.okfn.org/mailman/listinfo/od-discuss
>> > Unsubscribe: https://lists.okfn.org/mailman/options/od-discuss
>> >
>>
>> --
>> Aaron Wolf
>> co-founder, Snowdrift.coop
>> music teacher, wolftune.com
>>
>
>
>
>


-- 

Rufus Pollock
Founder and President | skype: rufuspollock | @rufuspollock <https://twitter.com/rufuspollock>
Open Knowledge <http://okfn.org/> - see how data can change the world
http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on Facebook <https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/>