[od-discuss] A harmonised Open Format definition

Rufus Pollock rufus.pollock at okfn.org
Wed Apr 22 07:41:02 UTC 2015


On 21 April 2015 at 20:16, Andrew Stott <andrew.stott at dirdigeng.com> wrote:

> I had been puzzling over some of the same issues as Aaron.
>
>
>
> A set of PNG format files would be a re-usable way of sharing digital
> images of paintings in a gallery's collection but it would not be an
> (easily) re-usable way of sharing a national budget.
>
>
>
> We also need to be careful about terms like "machine-readable" - a PNG
> file of a national budget is machine-readable (or, at least, more readable
> by a machine than by a human!) but its machine-readability does not make
> the data in it easily reusable.
>
This is a good point. We tried to come up with a more precise definition of
machine readability a couple of years back:

Material (data or content) is machine readable if it is in a format that
can be easily processed by a computer.

Non-digital material (for example printed or hand-written documents) is by
its non-digital nature not machine-readable. But even digital material need
not be machine-readable. For example, consider a PDF document containing
tables of data. These are definitely digital but are not machine-readable
because a computer would struggle to access the tabular information (even
though they are very human readable!). The equivalent tables in a format
such as a spreadsheet would be machine readable.

As another example, scans (photographs) of text are not machine-readable
(but are human readable!), whereas the equivalent text in a format such as a
simple ASCII text file or a text-processing format such as a Microsoft Word
file is machine readable.

Note: The appropriate machine readable format may vary by type – so, for
example, machine readable form for geographic data may be different than
for tabular data.

http://webarchive.okfn.org/okfn.org/201404/opendata/glossary/#machine-readable
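
To make the spreadsheet-versus-PDF contrast concrete, here is a minimal
Python sketch. The budget figures and the function name are invented for
illustration and are not part of the glossary text. When the table is
available as CSV, a standard-library parser recovers the rows and columns
directly, which is the sense of "easily processed" in the definition above:

```python
import csv
import io

# Hypothetical budget table, made up purely for illustration.
BUDGET_CSV = """department,amount
Health,120
Education,95
Transport,40
"""


def total_amount(csv_text):
    """Sum the 'amount' column of a CSV document given as a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["amount"]) for row in reader)


print(total_amount(BUDGET_CSV))  # prints 255
```

The same table rendered into a PDF or a scanned image carries no explicit
row or column structure, so no comparably short program would work on it.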

Regards,

Rufus

> HTML illustrates a further difficulty - the reusability of a dataset may
> depend on how it is encoded in HTML - if it is just text then it is more
> difficult to parse programmatically than if it is structured with semantic
> tagging. (Similarly there is some pretty tricky XML around for the same
> reason.)
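
A rough sketch of that difference, using only Python's standard library.
The sample markup, the class and the function names are all hypothetical.
Data marked up as a table can be pulled out by following the tags, while
the same facts written as running prose give the parser nothing to hold on
to:

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of <th>/<td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Text outside table cells is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical inputs carrying similar information in two encodings.
STRUCTURED = (
    "<table><tr><th>department</th><th>amount</th></tr>"
    "<tr><td>Health</td><td>120</td></tr></table>"
)
UNSTRUCTURED = "<p>Health gets 120 and Education gets 95.</p>"

print(extract_rows(STRUCTURED))
print(extract_rows(UNSTRUCTURED))
```

For the prose version the extractor returns no rows at all; getting the
figures out would need ad hoc text matching specific to that one page.
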
>
>
>
> A lot of work was done a few years ago on the definition of an "Open
> Standard", including a legal text that a number of Members of the European
> Parliament tried to insert into EU law. There is a link between that work
> and the Open Format initiative.
>
>
>
> -----Original Message-----
> From: od-discuss [mailto:od-discuss-bounces at lists.okfn.org] On Behalf Of
> Aaron Wolf
> Sent: 21 April 2015 17:31
> To: od-discuss at lists.okfn.org
> Subject: Re: [od-discuss] A harmonised Open Format definition
>
>
>
> I think this concept is excellent, the consolidation and the clarification
> of having a page of recognized Open formats.
>
>
>
> Your points here bring up a serious problem though.
>
>
>
> If HTML and JPEG formats are considered non-machine-readable then we
> absolutely *have* to *remove* the machine-readable requirement from the
> Open Definition. This is a very serious issue. The OD covers things like
> photographs and other images along with stuff like the writings on a blog.
> It is absolutely unacceptable to have the "machine-readable" part in the
> requirements for format in the OD if it excludes these things!
>
>
>
> I think we need to fix OD 2.1 to clarify that what is considered an Open
> Format depends on the type of content. Obviously, a JPEG screenshot of a
> webpage is not an Open Format for the webpage content, but HTML is. But we
> cannot say that a JPEG photograph is non-Open format. We could say that
> Open Data specifically should not be HTML… but I'm not certain about that
> bit.
>
>
>
> This absolutely must be addressed and clarified.
>
>
>
> FWIW, I collected some initial bits about what qualifies as Open Format at
> https://snowdrift.coop/p/snowdrift/w/en/formats-repositories and in that
> case it is listed by the sort of project we're talking about. I would love
> to see this more formally included in the OD.
>
>
>
> Best,
>
> Aaron
>
>
>
> On 04/21/2015 06:18 AM, Stephen Gates wrote:
>
> > Hello Open Knowledge and Open Data Institute friends,
>
> >
>
> > I would like to explore the possibility of aligning the Open
>
> > Definition <http://opendefinition.org/od/>, Open Data Census
>
> > <http://census.okfn.org> and Open Data Certificates
>
> > <https://certificates.theodi.org> definitions for Open Format. This
>
> > would enable the Census, Certificate and other open data tools to
>
> > refer to the Open Definition for a definition of Open Format, in the
>
> > same way they currently do for Open Licences.
>
> >
>
> > To extend this concept further, I would like to mirror the Conformant
>
> > Licenses <http://opendefinition.org/licenses/> page in the Open
>
> > Definition with a Conformant Formats page. This would provide a list
>
> > of file formats that conform with the Open Format definition. New
>
> > formats could be submitted for assessment. Common formats (e.g. XML,
>
> > JSON, KML, CSV, etc.) would be seeded on the page. Similar to the
>
> > non-conformant licences
>
> > <http://opendefinition.org/licenses/nonconformant/>  partially
>
> > conforming formats could also be captured (e.g. XLS, SHP). This would
>
> > cater for the spectrum of open file formats proposed by Tim Berners-Lee
>
> > in his 5 star scheme <http://5stardata.info>.
>
> >
>
> >
>
> > The respective definitions or help text are:
>
> >
>
> > *Open Definition* draft 2.1
>
> > https://github.com/okfn/opendefinition/blob/master/source/open-definition-2.1-dev.markdown
>
> >
>
> > The *work* /must/ be machine-readable and provided in an open format. An
>
> > open format is one which places no restrictions, monetary or
>
> > otherwise, upon its use and can be fully processed with at least one
>
> > free/libre/open-source software tool. Data /should/ be provided in bulk
>
> > where possible.
>
> >
>
> > *Open Data Census*
>
> >
>
> > see format, machine readable and bulk rows in Google Sheet,
>
> >
>
> > https://docs.google.com/spreadsheet/ccc?key=0AqR8dXc6Ji4JdFI0QkpGUEZyS0wxYWtLdG1nTk9zU3c&usp=drive_web#gid=0
>
> >
>
> > *Format*:
>
> > This question describes the form that the data is available in. For
>
> > example, for tabular data it might be: Excel, CSV, HTML or even PDF.
>
> > For geodata it might be shapefiles, geojson or something else. If
>
> > available in multiple formats, the format descriptors are listed
>
> > separated with commas. Any further information is put in the comments
>
> > section.
>
> >
>
> > *Machine Readable*:
>
> > Files are digital, yes, but not all can be processed or parsed easily
>
> > by a computer. In order to answer this question, you would need to
>
> > look at the dataset's file type.
>
> >
>
> > As a rule of thumb the following file types are machine readable:
>
> >
>
> > - XLS
>
> > - CSV
>
> > - JSON
>
> > - XML
>
> >
>
> > If the files are in the following formats, they are NOT machine readable:
>
> >
>
> > - HTML
>
> > - PDF
>
> > - DOC
>
> > - GIF
>
> > - JPEG
>
> > - PPT
>
> >
>
> > If you have a different file type and you don’t know if it’s machine
>
> > readable or not, send an email to the Open Data Census list.
>
> >
>
> > *Bulk*:
>
> > Data is available in bulk if the whole dataset can be downloaded easily.
>
> > It is considered non-bulk if the citizens are limited to getting parts
>
> > of the dataset through an online interface.
>
> >
>
> > For example, if restricted to querying a web form and retrieving a few
>
> > results at a time from a very large database.
>
> >
>
> > *Open Data Certificates*
>
> >
>
> > Question: Is this data in a standard open format?
>
> >
>
> > Help Text: Open standards are created through a fair, transparent and
>
> > collaborative process. Anyone can implement them and there’s lots of
>
> > support so it’s easier for you to share data with more people. For
>
> > example, XML, CSV and JSON are open standards. _Read more_… (links to
>
> > https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/183962/Open-Standards-Principles-FINAL.pdf)
>
> >
>
> >
>
> > *Proposed changes*
>
> > *A harmonised definition*
>
> >
>
> > The *work* /must/ be machine-readable and provided in an open format.
>
> > An open format is one which places no restrictions, monetary or
>
> > otherwise, upon its use and can be fully processed with at least one
>
> > free/libre/open-source software tool.
>
> >
>
> > In addition:
>
> > - Data /should/ be provided in bulk, i.e. the whole dataset can be
>
> > downloaded easily.
>
> > - An open format /should/ be documented so it can be freely
>
> > implemented by others.
>
> > - An open format /should/ be defined through a fair, transparent and
>
> > collaborative process.
>
> >
>
> > *Open Data Census and Open Data Certificates* Adjust questions and
>
> > help text to reference the Open Format definition and/or conformant
>
> > licenses page.
>
> >
>
> > *Open Definition site*
>
> > - Consider changing the page names from “Conformant Licences” and
>
> > “Conformant Formats” to “Open Licences” and “Open Formats”.
>
> > - Delete the open format definition page
>
> > <http://opendefinition.org/ofd/>. It is replaced by the Open Formats
>
> > page and the updated Open Definition.
>
> >
>
> >
>
> >
>
> > *What do you think?*
>
> > Is this worth progressing? Could this extend to Open APIs like Web Map
>
> > Services (WMS)?
>
> >
>
> >
>
> > thanks
>
> >
>
> > Stephen Gates
>
> > (localiser of the Open Data Census and Open Data Certificates in
>
> > Australia)
>
> >
>
> >
>
> >
>
> >
>
> > _______________________________________________
>
> > od-discuss mailing list
>
> > od-discuss at lists.okfn.org
>
> > https://lists.okfn.org/mailman/listinfo/od-discuss
>
> > Unsubscribe: https://lists.okfn.org/mailman/options/od-discuss
>
> >
>
>
>


-- 

Rufus Pollock
Founder and President | skype: rufuspollock | @rufuspollock <https://twitter.com/rufuspollock>
Open Knowledge <http://okfn.org/> - see how data can change the world
http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on Facebook <https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/>