[od-discuss] Machine readability in v2.1

Andrew Stott andrew.stott at dirdigeng.com
Wed Jul 29 09:35:40 UTC 2015


I would still be worried that this formulation could be interpreted as allowing PDFs of data. It needs to be the *content*, not the *form*, which needs to be easily accessed and processed by a computer. (Believe me, a raw PDF file is *much* easier for a computer to read than for a human!) So what about:

 

The work should be provided in "machine-readable" form, that is one IN WHICH THE CONTENT can be easily accessed and processed by a computer, and which is in a form in which modifications to individual elements OF THE CONTENT can easily be performed.

 

Regards

 

Andrew

From: od-discuss [mailto:od-discuss-bounces at lists.okfn.org] On Behalf Of Rufus Pollock
Sent: 29 July 2015 09:53
To: Andrew Rens
Cc: od-discuss at lists.okfn.org
Subject: [od-discuss] Machine readability in v2.1 (was: Re: [okfn-discuss] Open Definition 2.1 final draft)

 

Just forking subject as the thread was heading off in new directions!

 

I appreciate, as Mike points out, that there will be variation and context specificity in what exactly constitutes machine readability but I think the general principle can be made clear. I also appreciate that we are attempting that with the current phrasing. In the spirit of offering something concrete, what about a new section 1.4 as follows:

 

1.4 Machine Readability

 

The work should be provided in "machine-readable" form, that is one that can be easily accessed and processed by a computer, and which is in a form in which modifications to individual data elements can easily be performed.

 

I also note we have the following definition of machine readable in the Open Data Handbook:

 

http://opendatahandbook.org/glossary/en/terms/machine-readable/

 

<quote>

Data in a data format that can be automatically read and processed by a computer, such as CSV, JSON, XML, etc. Machine-readable data must be structured data. Compare human-readable.

 

Non-digital material (for example printed or hand-written documents) is by its non-digital nature not machine-readable. But even digital material need not be machine-readable. For example, consider a PDF document containing tables of data. These are definitely digital but are not machine-readable because a computer would struggle to access the tabular information - even though they are very human readable. The equivalent tables in a format such as a spreadsheet would be machine readable.

 

As another example scans (photographs) of text are not machine-readable (but are human readable!) but the equivalent text in a format such as a simple ASCII text file or a text-processing format such as Microsoft Word file is machine readable.

 

Note: The appropriate machine readable format may vary by type of data - so, for example, machine readable formats for geographic data may differ from those for tabular data.

</quote>
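
To make the distinction concrete, here is a minimal sketch in Python of what "modifications to individual data elements can easily be performed" might look like for tabular data held as CSV. The sample data and the helper name update_cell are invented for illustration; nothing here comes from the Handbook or the Open Definition text.

```python
import csv
import io

# Hypothetical sample data, invented purely for illustration.
CSV_TEXT = "region,count\nnorth,10\nsouth,20\n"


def update_cell(csv_text, key_column, key, column, new_value):
    """Return csv_text with one cell changed, located via a key column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        # Nothing to modify: hand the text back unchanged.
        return csv_text
    for row in rows:
        if row[key_column] == key:
            row[column] = new_value
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Change a single element of the content and leave the rest untouched.
updated = update_cell(CSV_TEXT, "region", "south", "count", "25")
print(updated)
```

The same table embedded in a PDF or a scanned image would offer no such direct route: a program can read the file's bytes, but it would first have to reconstruct the rows and columns before any single value could be found or changed.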

 

Regards,

 

Rufus

 

On 28 July 2015 at 22:05, Andrew Rens <andrewrens at gmail.com> wrote:

Hi

Perhaps it would be useful to be specific about "machine readable" in respect of data, but expressly state that this specificity flows from the general principle in 1.3: "The work should be provided in the form preferred for working with and making modifications to it", or whatever final wording is agreed. Additional wording would then stipulate: "When a work consists of or contains data then the preferred form for that data is a form that enables a recipient to use automated processes to use or modify the data as a whole or in part."

This would help by showing how the principle would be applied to one kind of knowledge. 

Of course "automated processes", like "machine readable", requires some refinement - algorithmic processes perhaps?

Andrew

 

 




Andrew Rens



 

On 28 July 2015 at 15:59, Aaron Wolf <wolftune at riseup.net> wrote:



On 07/28/2015 03:44 PM, Mike Linksvayer wrote:
> On 07/28/2015 10:21 AM, Aaron Wolf wrote:
>> On 07/28/2015 01:07 PM, Benjamin Ooghe-Tabanou wrote:
>>> Yes I agree also that the "as a whole" is fine regarding "bulk"
>>>
>>> As Rufus pointed out my main concern left is on machine-readability.
>>> Aaron I understand we want the OD to handle a larger picture than just
>>> data, but since it has historically been used primarily for data, I
>>> just want to make sure we can keep doing it afterwards and do not lose
>>> actual specific requirements.
>>> That's why I proposed to simply replace the vague "in a form
>>> preferred" sentence with a sentence spelling out the specific case of
>>> data, as it was agreed on earlier in the process.
>>> As such, 1.3 first concerns "work" globally. Having at the end a "Data
>>> must be machine readable" would add the proper precision.
>>>
>>> Benjamin Ooghe-Tabanou
>>>
>>>
>>
>> Adding "Data must be machine readable" to the end of 1.3 sounds fine to
>> me. Let's do that.
>
> Looks like superfluous jargon to me:
>
> - the underlying issue of works being provided in a manner that the work
> in question can be easily processed and manipulated is not specific to
> data (even from a data-centric worldview, eg to mine data from 'content')
>

I am willing to consent to others' concerns, but I'm with Mike: 'should
be provided in the form preferred for making modifications to it' — in
principle, that means you have data you can actually use, i.e.
machine-readable if that's the way you would usually manage the data.

But, I could see changing 'making modifications' to 'working with and
modifying' — working with data may be analyzing it but not modifying the
data. So, to do analysis, you'd want it to be machine-readable, but this
is independent of modifying the data.

So, I think we need to have a better generalized wording here.

I suggest 'provided in the form preferred for working with and making
modifications to it'

My concern here is about the "must" vs "should" aspect: If we used
"must" would that say that my video is not "open" unless I provide all
the source files? I have mixed feelings about that but certainly don't
want it any stronger than "available upon request". We don't want to
block the distribution of videos by making *all* distributions
necessarily include all source files.


> - machine-readability is not defined (with respect to what? eg a bitmap
> image is read by a machine, even if it encodes a scan of 'data' from
> a printout)
>

I had this same concern about "machine-readability", but I thought
qualifying this as data-specific would be acceptable. But I'm not sure.



> Mike
>
> _______________________________________________
> od-discuss mailing list

> od-discuss at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/od-discuss
> Unsubscribe: https://lists.okfn.org/mailman/options/od-discuss
>

--
Aaron Wolf
co-founder, Snowdrift.coop
music teacher, wolftune.com


 







 

-- 

Rufus Pollock

Founder and President | skype: rufuspollock | @rufuspollock <https://twitter.com/rufuspollock>

Open Knowledge <http://okfn.org/> - see how data can change the world

http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on Facebook <https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/>


