[od-discuss] Open Definition 2.1 final draft - Open Format

Peter Murray-Rust pm286 at cam.ac.uk
Wed Jul 29 11:24:26 UTC 2015


I have been meaning to contribute, but have been unclear what to say.

The issue of "machine-readability" is both fundamentally complex and
rapidly changing. I have been involved for many years in trying to create
"semantic" or "machine-processable" or "machine-understandable"
information. I use these terms in preference to "machine-readable" and indeed
sometimes contrast them with it.

The problem is that there are often many formats and technologies for
conveying information and they often form a hierarchy of value. A JPEG may
be better than a unique piece of paper somewhere as it's more easily
copied. Many people would regard this as "machine-readable" in that it can
be read into a machine and viewed by a human. But this is usually severely
suboptimal.

The higher up the hierarchy we go, the higher the effort and/or cost. And,
unfortunately, some people wish to make stuff as difficult to use as
possible. Thus in scholarly publishing Digital Science (was Macmillan) has
created ReadCube (https://www.readcube.com/). It distributes PDFs in a
form that cannot be printed, saved, scraped from the screen, etc. It's
effectively DRM. But it's "machine-readable".

I can't see that a single phrase will convey the meaning. To give an
example from my own work in chemistry:
 - The spectrometer produces high-quality digital data (a spectrum) with 16K
data points and 16-bit resolution for each. This is what people actually
want. Unfortunately many of the formats are proprietary. The journal
requires "supplemental data", so we have:
 * paper copy of spectrum (not machine-readable): unacceptable
 * scanned coffee-stained spectrum (has been deposited as supplemental data
and it's "machine-readable")
 * PDF copy of spectrum as a bitmap (very common): only useful for human eyes
 * PDF containing EPS: most people can't read this except with human eyes;
my software is starting to be able to do it
 * JCAMP ASCII file (an acceptable standard, and software exists, but no one
uses it)
 * Bruker proprietary format: only usable on Bruker machines
 * CML (Chemical Markup Language, PMR): designed specifically for the
purpose, virtually unused

Result: spectra published per year = 10 million
spectra reused per year = ca. 0.01%

The goal is to have something that is as useful to the user as it was to
the creator. Some phrase of that sort expresses what I'd like to see.

I think the only way to tackle this in practice is to explain each case in
the detail that I have just done. Obviously we can't do this for each case.
But we really need to persuade the various communities of practice to
create a set of minimal levels for each case.





On Wed, Jul 29, 2015 at 5:13 AM, Stephen Gates <stephen.gates at me.com> wrote:

> I’d like to test my understanding of 1.3 Open Format in the current draft.
>
>
>
> *Someone publishes an open format photograph of tabular data - is it open?*
>
> Given the current draft:
>
> “The *work* *must* be provided in an open format. An open format is one
> which places no restrictions, monetary or otherwise, upon its use and can
> be fully processed with at least one free/libre/open-source software tool.
> The work *should* be provided in the form preferred for making
> modifications to it.”
>
>
>
> Then, in the example above, you could argue that by publishing the work as
> a photo, you have placed a restriction on the work being fully processed –
> hence it is not open.  But when you apply the last  sentence, then the work
> may be open.
>
>
> Given the definition for "Should":
>
> “SHOULD - This word, or the adjective "RECOMMENDED", means that there may
> exist valid reasons in particular circumstances to ignore a particular
> item, but the full implications must be understood and carefully weighed
> before choosing a different course.” - RFC2119.
>
>
>
> So:
>
>    - If you only had a hard copy of the tabular data and you published a
>    photo of it, then it is Open.
>    - If you photographed the tabular data as shown on your computer
>    screen in Excel, then it is Not Open.
>
> Is that how others interpret the draft definition?
>
>
>
> *Preferred:*
>
> I get stuck on the word “preferred”.  Whose preference is it – the
> publisher's or the consumer's? One consumer may prefer XML and another CSV, and
> the publisher something else.
>
>
>
> Perhaps, “The work *should* be provided in an open format that allows it
> to be modified”.
>
>
>
> I know “allows” was not supported in discussions at
> github.com/okfn/opendefinition/issues/68  but now that we have "must" in
> the preceding sentence, perhaps “allows” is a little more acceptable as the
> format can be fully processed by open source software.
>
>
> I'm happy that machine-readability is implied by the definition above.
> "Machine-readable" causes its own set of issues that I previously
> explored - http://opendefinition.org/ofd/
>
>
>
> *Fully Processed:*
>
> Along a different line of thinking, if the work must be able to be fully
> processed and fully processing includes modification, then the last
> sentence is not needed and we just got stricter.
>
>
> I don't mind where we land on this; I'm just suggesting some potential
> points of confusion.
>
>
> *Asides:*
>
>    - I plan to update opendefinition.org/ofd/
>    <http://opendefinition.org/ofd/> and focus down on open formats
>    and add content from github.com/okfn/opendatacensus/issues/585
>    <http://github.com/okfn/opendatacensus/issues/585>
>    - I'm thinking about creating a set of examples on
>    opendefinition.org/ofd/ <http://opendefinition.org/ofd/> to
>    provide practical advice on what counts as an open format - thoughts?
>    - Was a decision made about moving the mailing list to the Open
>    Knowledge Discussion Forum discuss.okfn.org/?
>
> thanks
> Stephen Gates
>
>
> On 29 Jul, 2015,at 03:08 AM, Benjamin Ooghe-Tabanou <b.ooghe at gmail.com>
> wrote:
>
> Yes I agree also that the "as a whole" is fine regarding "bulk"
>
> As Rufus pointed out my main concern left is on machine-readability.
> Aaron I understand we want the OD to handle a larger picture than just
> data, but since it has historically been used primarily for data, I
> just want to make sure we can keep doing it afterwards and do not lose
> actual specific requirements.
> That's why I proposed to simply replace the blurred "in a form
> preferred" sentence with a sentence specifying the specific case of
> data, as it was agreed on earlier in the process.
> As such, 1.3 first concerns "work" globally. Having at the end a "Data
> must be machine readable" would add the proper precision.
>
> Benjamin Ooghe-Tabanou
>
>
> On Tue, Jul 28, 2015 at 6:47 PM, Rufus Pollock <rufus.pollock at okfn.org>
> wrote:
>
> I also think "as a whole" is also satisfactory (though I like bulk too ;-)
> ...).
>
> On the "machine readability" point I really think that has got a bit lost
> as Benjamin also suggested. I don't think "in a form preferred for making
> modifications" quite does it. I really wonder if for this we want a 1.4 as
> it is so central and is distinct from open format.
>
> I do apologize for coming in a bit late on this process and want to
> acknowledge the huge improvements we have seen and efforts towards that -
> as well as the exemplary cat-herding from Herb and others!
>
> Rufus
>
>
> On 28 July 2015 at 17:27, Leigh Dodds <leigh.dodds at theodi.org> wrote:
>
>
> Personally I'm fine with "as a whole", I think it conveys the intention
> well enough. "Bulk" does seem like jargon to me.
>
> Cheers,
>
> L.
>
>
> On 27 July 2015 at 18:15, Herb Lainchbury <herb.lainchbury at gmail.com>
> wrote:
>
> As Stephen Gates explains here, in 2.1 the "bulk" requirement is now a
> *must*. We use the words "as a whole" rather than "bulk", so 2.1 starts off
> as:
>
> "The work must be provided as a whole and..."
>
> We could instead say something like:
>
> "The work must be provided in bulk and..."
>
> but "bulk" seems to me like data-specific jargon, so it seems a bit out of
> place used with "The work".
>
> I think the question to ask is - does "as a whole" sufficiently convey
> what we mean here? If so, then I think 2.1 stands as is. If not, then
> let's tweak it so it does explicitly convey what we want.
>
>
>
>
>
>
>
>
> On Mon, Jul 20, 2015 at 1:17 AM, Rufus Pollock <rufus.pollock at okfn.org>
> wrote:
>
> I'm also +1 on a strong explicit bulk statement.
>
> On 19 July 2015 at 21:58, Benjamin Ooghe-Tabanou <b.ooghe at gmail.com>
> wrote:
>
> Hi Herb and everyone, and thanks a lot for the mailing-list notice.
>
> I seem to have missed the latest updates regarding 1.3 and I'm only
> catching up now, which I feel a bit guilty about... :/
>
> I've been exploring all the latest commits and I'm worried the
> successive changes have lost along the way the references both to bulk
> access (which was indeed moved to 1.2, but then removed as redundant with
> "as a whole") and to machine-readability (which makes me feel like the
> current 1.3 could now make PDF acceptable for data, for instance).
>
> In exchange we got this final sentence that sounds a bit unclear and
> blurred to me: "The work should be provided in the form preferred for
> making modifications to it."
>
> Although I understand we want to move towards a more global
> Open Definition than one addressing only data, I feel like it will still
> be one of the reference documents for data and should therefore still
> state clear specifics regarding data.
>
> So with this in mind, I feel like one of the previous formulations of
> Art. 1.3 in the rewriting process was a lot clearer in spelling out,
> specifically for data, these two required features: "Data must be
> machine-readable and should be provided in bulk."
> (cf. this version:
> https://github.com/okfn/opendefinition/blob/2766b3fd209799993d5ada55a3e7ac92a5d1115c/source/open-definition-2.1-dev.markdown#13-open-format
> )
>
> Benjamin Ooghe-Tabanou
>
>
>
> On Fri, Jul 17, 2015 at 8:30 PM, Herb Lainchbury
> <herb.lainchbury at gmail.com> wrote:
>
> > After further discussion, consideration and much input from various people
> > in the community I think we're ready to consider the current Open Definition
> > draft 2.1 dev for acceptance.
> >
> > You can find the current draft 2.1 dev version here:
> >
> > https://github.com/okfn/opendefinition/blob/master/source/open-definition-2.1-dev.markdown
> >
> > The actual diff can be viewed here: http://git.io/vm6W8
> > (note: this diff includes all changes to the repository so use the "Files
> > Changed" tab to see just the changes to the
> > "source/open-definition-2.1-dev.markdown" file).
> >
> > The main discussions centred around the preamble as well as clauses 1.3,
> > 2.2.3, 2.2.5 and 2.2.6.
> >
> > Most of the issues addressed are also documented here:
> >
> > https://github.com/okfn/opendefinition/issues?utf8=%E2%9C%93&q=label%3A2.1
> >
> > Please pay particular attention to 1.3 in your review as that clause was one
> > of the main reasons for this update and we want to ensure it is as good as
> > we can make it. See discussions here and here and here.
> >
> > An attribution clause has also been added to the definition to recognize the
> > work the definition is based on.
> >
> > Please submit any further comments on the od-discuss list.
> >
> > Please take this opportunity to raise any final objections to voting on
> > final acceptance of this draft. If no objections are received I will call
> > for a vote in approximately one week.
> >
> > Please disseminate this note further as you see fit and if you know of
> > another list that we should notify, please let me know.
> >
> > Thank you,
> > Herb Lainchbury
> > Chair, Open Definition Advisory Council
> >
> > ----------
>
> >
>
> > In summary, the changes from 2.0 to the current 2.1dev are:
> >
> > Preamble
> >
> > - reference to OSD changed to wikipedia
> > - change to summary section to simplify and improve clarity of the term
> >   **license**
> >
> > 1.
> >
> > - fixed formatting typo
> >
> > 1.2
> >
> > - from shall to must and from preferable to should
> >
> > 1.3
> >
> > - from "or" to "and"
> > - from "processed" to "fully processed"
> > - removed bulk suggestion - already covered in 1.2
> > - added *should* be provided in form preferred for making modifications to
> >   it
> >
> > 2.
> >
> > - added “should be compatible”
> > - fixed formatting typo
> >
> > 2.2
> >
> > - changed shall to must
> >
> > 2.2.1
> >
> > - added missing comma
> >
> > 2.2.3
> >
> > -The **license** *may* require copies or derivatives of a licensed work to
> > remain under a license the same as or similar to the original.
> >
> > +The **license** *may* require distributions of the work to remain under the
> > same license or a similar license.
> >
> > 2.2.5
> >
> > -The **license** *may* require modified works to be made available in a form
> > preferred for further modification.
> >
> > +The **license** *may* require that anyone distributing the work provide
> > recipients with access to the preferred form for making modifications.
> >
> > 2.2.6
> >
> > -The **license** *may* prohibit distribution of the work in a manner where
> > technical measures impose restrictions on the exercise of otherwise allowed
> > rights.
> >
> > +The **license** *may* require that distributions of the work remain free of
> > any technical measures that would restrict the exercise of otherwise allowed
> > rights.
> >
> > Attribution
> >
> > +The Open Definition was initially derived from the Open Source Definition,
> > which in turn was derived from the original Debian Free Software Guidelines,
> > and the Debian Social Contract of which they are a part, which were created
> > by Bruce Perens and the Debian Developers. Bruce later used the same text in
> > creating the Open Source Definition. This definition is substantially
> > derivative of those documents and retains their essential principles.
> > Richard Stallman was the first to push the ideals of software freedom which
> > we continue.
>
> >
>
> >
>
> >
>
> >
>
> > --
>
> > Herb
>
> >
>
> > _______________________________________________
>
> > od-discuss mailing list
>
> > od-discuss at lists.okfn.org
>
> > https://lists.okfn.org/mailman/listinfo/od-discuss
>
> > Unsubscribe: https://lists.okfn.org/mailman/options/od-discuss
>
> >
>
> _______________________________________________
>
> okfn-discuss mailing list
>
> okfn-discuss at lists.okfn.org
>
> https://lists.okfn.org/mailman/listinfo/okfn-discuss
>
> Unsubscribe: https://lists.okfn.org/mailman/options/okfn-discuss
>
>
>
>
>
> --
>
>
> Rufus Pollock
>
>
> Founder and President | skype: rufuspollock | @rufuspollock
>
>
> Open Knowledge - see how data can change the world
>
>
> http://okfn.org/ | @okfn | Open Knowledge on Facebook | Blog
>
>
>
>
>
> --
>
> --
>
> Herb
>
>
>
>
>
>
>
> --
>
> Leigh Dodds, Senior Consultant, theODI.org
>
> @ldodds
>
> The ODI, 65 Clifton Street, London EC2A 4JE
>
>
>
>
>
>
>
>
>


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069