[od-discuss] Machine readability in v2.1

Peter Murray-Rust pm286 at cam.ac.uk
Wed Jul 29 12:19:55 UTC 2015


s/form/a form/

I think this works well for me at first glance.
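
As a rough illustration of the last clause ("modifications to individual
elements of the content can easily be performed"), here is a minimal Python
sketch. It assumes the data is supplied as CSV, one of the formats given as
an example in the Open Data Handbook entry quoted below. The sample rows,
the figures in them and the update_cell helper are invented for this note
and do not come from the thread.

```python
import csv
import io

# Hypothetical sample data, invented purely for illustration.
CSV_TEXT = "city,population\nCambridge,123900\nOxford,152000\n"


def update_cell(csv_text, key_column, key, column, new_value):
    """Return csv_text with `column` set to `new_value` in every row
    whose `key_column` equals `key`. All other cells are left as-is."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # reading the rows also populates reader.fieldnames
    for row in rows:
        if row[key_column] == key:
            row[column] = new_value

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


updated = update_cell(CSV_TEXT, "city", "Oxford", "population", "152450")
print(updated)
```

Making the same single-value change to a PDF or a scanned image of the table
would first require extracting the table, which is the distinction the
"content, not form" wording is aimed at.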


On Wed, Jul 29, 2015 at 1:15 PM, Andrew Rens <andrewrens at gmail.com> wrote:

> +1
>
> Andrew Rens
>
>
>
> On 29 July 2015 at 08:14, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>
>> Good suggested amendment Andrew. To summarize:
>>
>> 1.4 Machine Readability
>>
>>
>>
>> The work should be provided in "machine-readable" form, that is one in
>> which the content can easily be accessed and processed by a computer, and
>> which is in form in which modifications to individual data/content elements
>> can easily be performed.
>>
>>
>> Rufus
>>
>>
>>
>> On 29 July 2015 at 10:35, Andrew Stott <andrew.stott at dirdigeng.com>
>> wrote:
>>
>>> I would still be worried that this formulation could be interpreted as
>>> allowing PDFs of data. It needs to be the *content*, not the *form*, which
>>> needs to be easily accessed and processed by a computer. (Believe me a raw
>>> PDF file is *much* easier for a computer to read than a human!). So what
>>> about:
>>>
>>>
>>>
>>> The work should be provided in "machine-readable" form, that is one IN
>>> WHICH THE CONTENT can be easily accessed and processed by a computer, and
>>> which is in form in which modifications to individual elements OF THE
>>> CONTENT can easily be performed.
>>>
>>>
>>>
>>> Regards
>>>
>>>
>>>
>>> Andrew
>>>
>>> *From:* od-discuss [mailto:od-discuss-bounces at lists.okfn.org] *On
>>> Behalf Of *Rufus Pollock
>>> *Sent:* 29 July 2015 09:53
>>> *To:* Andrew Rens
>>> *Cc:* od-discuss at lists.okfn.org
>>> *Subject:* [od-discuss] Machine readability in v2.1 (was: Re:
>>> [okfn-discuss] Open Definition 2.1 final draft)
>>>
>>>
>>>
>>> Just forking subject as the thread was heading off in new directions!
>>>
>>>
>>>
>>> I appreciate, as Mike points out, that there will be variation and
>>> context specificity in what exactly constitutes machine readability but I
>>> think the general principle can be made clear. I also appreciate that we
>>> are attempting that with the current phrasing. In the spirit of offering
>>> something concrete, what about a new section 1.4 as follows:
>>>
>>>
>>>
>>> 1.4 Machine Readability
>>>
>>>
>>>
>>> The work should be provided in "machine-readable" form, that is one that
>>> can be easily accessed and processed by a computer, and which is in form in
>>> which modifications to individual data elements can easily be performed.
>>>
>>>
>>>
>>> I also note we have the following definition of machine readable in the
>>> Open Data Handbook:
>>>
>>>
>>>
>>> http://opendatahandbook.org/glossary/en/terms/machine-readable/
>>>
>>>
>>>
>>> <quote>
>>>
>>> Data in a data format that can be automatically read and processed by a
>>> computer, such as CSV, JSON, XML, etc. Machine-readable data must be
>>> structured data. Compare human-readable.
>>>
>>>
>>>
>>> Non-digital material (for example printed or hand-written documents) is
>>> by its non-digital nature not machine-readable. But even digital material
>>> need not be machine-readable. For example, consider a PDF document
>>> containing tables of data. These are definitely digital but are not
>>> machine-readable because a computer would struggle to access the tabular
>>> information - even though they are very human readable. The equivalent
>>> tables in a format such as a spreadsheet would be machine readable.
>>>
>>>
>>>
>>> As another example scans (photographs) of text are not machine-readable
>>> (but are human readable!) but the equivalent text in a format such as a
>>> simple ASCII text file or a text-processing format such as Microsoft Word
>>> file is machine readable.
>>>
>>>
>>>
>>> Note: The appropriate machine readable format may vary by type of data -
>>> so, for example, machine readable formats for geographic data may differ
>>> from those for tabular data.
>>>
>>> </quote>
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Rufus
>>>
>>>
>>>
>>> On 28 July 2015 at 22:05, Andrew Rens <andrewrens at gmail.com> wrote:
>>>
>>> Hi
>>>
>>> Perhaps it would be useful to be specific about "machine readable" in
>>> respect of data but expressly state that this specificity flows from the
>>> general principle in 1.3
>>> "The work *should* be provided in the form preferred for working with
>>> and making modifications to it" or whatever final wording is agreed.
>>> Additional wording would then stipulate: "When a work consists of or
>>> contains data then the preferred form for that data is a form that enables
>>> a recipient to use automated processes to use or modify the data as a whole
>>> or in part."
>>>
>>> This would help by showing how the principle would be applied to one
>>> kind of knowledge.
>>>
>>> Of course "automated processes", like "machine readable", requires some
>>> refinement - algorithmic processes perhaps?
>>>
>>> Andrew
>>>
>>>
>>>
>>>
>>>
>>>
>>> Andrew Rens
>>>
>>>
>>>
>>> On 28 July 2015 at 15:59, Aaron Wolf <wolftune at riseup.net> wrote:
>>>
>>>
>>>
>>> On 07/28/2015 03:44 PM, Mike Linksvayer wrote:
>>> > On 07/28/2015 10:21 AM, Aaron Wolf wrote:
>>> >> On 07/28/2015 01:07 PM, Benjamin Ooghe-Tabanou wrote:
>>> >>> Yes I agree also that the "as a whole" is fine regarding "bulk"
>>> >>>
>>> >>> As Rufus pointed out my main concern left is on machine-readability.
>>> >>> Aaron, I understand we want the OD to handle a larger picture than just
>>> >>> data, but since it has historically been used primarily for data, I
>>> >>> just want to make sure we can keep doing it afterwards and do not lose
>>> >>> actual specific requirements.
>>> >>> That's why I proposed to simply replace the blurred "in a form
>>> >>> preferred" sentence with a sentence specifying the particular case of
>>> >>> data, as was agreed on earlier in the process.
>>> >>> As such, 1.3 first concerns "work" globally. Having at the end a "Data
>>> >>> must be machine readable" would add the proper precision.
>>> >>>
>>> >>> Benjamin Ooghe-Tabanou
>>> >>>
>>> >>>
>>> >>
>>> >> Adding "Data must be machine readable" to the end of 1.3 sounds fine to
>>> >> me. Let's do that.
>>> >
>>> > Looks like superfluous jargon to me:
>>> >
>>> > - the underlying issue of works being provided in a manner that the work
>>> > in question can be easily processed and manipulated is not specific to
>>> > data (even from a data-centric worldview, eg to mine data from 'content')
>>> >
>>>
>>> I am willing to consent to others' concerns, but I'm with Mike: 'should
>>> be provided in the form preferred for making modifications to it' — in
>>> principle, that means you have data you can actually use, i.e.
>>> machine-readable if that's the way you would usually manage the data.
>>>
>>> But, I could see changing 'making modifications' to 'working with and
>>> modifying' — working with data may be analyzing it but not modifying the
>>> data. So, to do analysis, you'd want it to be machine-readable, but this
>>> is independent of modifying the data.
>>>
>>> So, I think we need to have a better generalized wording here.
>>>
>>> I suggest 'provided in the form preferred for working with and making
>>> modifications to it'
>>>
>>> My concern here is about the "must" vs "should" aspect: If we used
>>> "must" would that say that my video is not "open" unless I provide all
>>> the source files? I have mixed feelings about that but certainly don't
>>> want it any stronger than "available upon request". We don't want to
>>> block the distribution of videos by making *all* distributions
>>> necessarily include all source files.
>>>
>>>
>>> > - machine-readability is not defined (with respect to what? eg a bitmap
>>> > image is read by a machine, even if it encodes a scan of 'data' from
>>> > a printout)
>>> >
>>>
>>> I had this same concern about "machine-readability", but I thought
>>> qualifying this as data-specific would be acceptable. But I'm not sure.
>>>
>>>
>>>
>>> > Mike
>>> >
>>> > _______________________________________________
>>> > od-discuss mailing list
>>>
>>> > od-discuss at lists.okfn.org
>>> > https://lists.okfn.org/mailman/listinfo/od-discuss
>>> > Unsubscribe: https://lists.okfn.org/mailman/options/od-discuss
>>> >
>>>
>>> --
>>> Aaron Wolf
>>> co-founder, Snowdrift.coop
>>> music teacher, wolftune.com
>>>
>>> --
>>>
>>> Rufus Pollock
>>>
>>> Founder and President | skype: rufuspollock | @rufuspollock
>>> <https://twitter.com/rufuspollock>
>>>
>>> Open Knowledge <http://okfn.org/> - *see how data can change the world*
>>>
>>> http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on
>>> Facebook <https://www.facebook.com/OKFNetwork> |  Blog
>>> <http://blog.okfn.org/>
>>>
>>>
>>
>>
>> --
>>
>> Rufus Pollock
>>
>> Founder and President | skype: rufuspollock | @rufuspollock
>> <https://twitter.com/rufuspollock>
>>
>> Open Knowledge <http://okfn.org/> - *see how data can change the world*
>>
>> http://okfn.org/ <http://okfn.org/> | @okfn <http://twitter.com/OKFN> |
>> Open Knowledge on Facebook <https://www.facebook.com/OKFNetwork> |
>> Blog <http://blog.okfn.org/>
>>
>>
>>
>
>
>


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069