[od-discuss] A harmonised Open Format definition

Andrew Stott andrew.stott at dirdigeng.com
Tue Apr 21 18:16:28 UTC 2015


I had been puzzling over some of the same issues as Aaron.

 

A set of PNG format files would be a re-usable way of sharing digital images
of paintings in a gallery's collection but it would not be an (easily)
re-usable way of sharing a national budget.

 

We also need to be careful about terms like "machine-readable" - a PNG file
of a national budget is machine-readable (or, at least, more readable by a
machine than by a human!) but its machine-readability does not make the data
in it easily reusable. 

 

HTML illustrates a further difficulty - the reusability of a dataset may
depend on how it is encoded in HTML - if it is just text then it is more
difficult to parse programmatically than if it is structured with semantic
tagging. (Similarly there is some pretty tricky XML around for the same
reason.)
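
As a rough illustration of the HTML point, here is a minimal sketch using
only Python's standard html.parser module. The budget figures are invented,
and the TableExtractor and extract_rows names are hypothetical helpers made
up for this example. The same numbers come out as rows when they are tagged
as a table, but yield nothing when they are just text in a paragraph.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> row found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in html as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical budget figures, invented for illustration only.
STRUCTURED = """
<table>
  <tr><th>Department</th><th>Budget</th></tr>
  <tr><td>Health</td><td>120</td></tr>
  <tr><td>Education</td><td>95</td></tr>
</table>
"""

# The same information as free text, with no semantic tagging.
PLAIN = "<p>Health receives 120 and Education receives 95.</p>"

structured_rows = extract_rows(STRUCTURED)
plain_rows = extract_rows(PLAIN)
```

Getting the figures out of the second version would need language-specific
text matching, which is the harder and more fragile kind of parsing.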

 

A lot of work was done a few years ago on the definition of an "Open
Standard", including a legal text that a number of Members of the European
Parliament tried to insert into EU law. There is a link between that work
and the Open Format initiative.

 

-----Original Message-----
From: od-discuss [mailto:od-discuss-bounces at lists.okfn.org] On Behalf Of
Aaron Wolf
Sent: 21 April 2015 17:31
To: od-discuss at lists.okfn.org
Subject: Re: [od-discuss] A harmonised Open Format definition

 

I think this concept is excellent, the consolidation and the clarification
of having a page of recognized Open formats.

 

Your points here bring up a serious problem though.

 

If HTML and JPEG formats are considered non-machine-readable then we
absolutely *have* to *remove* the machine-readable requirement from the Open
Definition. This is a very serious issue. The OD covers things like
photographs and other images along with stuff like the writings on a blog.
It is absolutely unacceptable to have the "machine-readable" part in the
requirements for format in the OD if it excludes these things!

 

I think we need to fix OD 2.1 to clarify that what is considered an Open
Format depends on the type of content. Obviously, a JPEG screenshot of a
webpage is not an Open Format for the webpage content, but HTML is. But we
cannot say that a JPEG photograph is in a non-Open format. We could say that
Open Data specifically should not be HTML, but I'm not certain about that bit.

 

This absolutely must be addressed and clarified.

 

FWIW, I collected some initial bits about what qualifies as Open Format at
https://snowdrift.coop/p/snowdrift/w/en/formats-repositories and in that
case it is listed by the sort of project we're talking about. I would love
to see this more formally included in the OD.

 

Best,

Aaron

 

On 04/21/2015 06:18 AM, Stephen Gates wrote:

> Hello Open Knowledge and Open Data Institute friends,
>
> I would like to explore the possibility of aligning the Open
> Definition <http://opendefinition.org/od/>, Open Data Census
> <http://census.okfn.org> and Open Data Certificates
> <https://certificates.theodi.org> definitions for Open Format. This
> would enable the Census, Certificate and other open data tools to
> refer to the Open Definition for a definition of Open Format, in the
> same way they currently do for Open Licences.
>
> To extend this concept further, I would like to mirror the Conformant
> Licenses <http://opendefinition.org/licenses/> page in the Open
> Definition with a Conformant Formats page. This would provide a list
> of file formats that conform with the Open Format definition. New
> formats could be submitted for assessment. Common formats (e.g. XML,
> JSON, KML, CSV, etc.) would be seeded on the page. Similar to the
> non-conformant licences
> <http://opendefinition.org/licenses/nonconformant/>, partially
> conforming formats could also be captured (e.g. XLS, SHP). This would
> cater for the spectrum of open file formats proposed by Tim Berners-Lee
> in his 5 star scheme <http://5stardata.info>.
>
> The respective definitions or help text are:
>
> *Open Definition* draft 2.1
> <https://github.com/okfn/opendefinition/blob/master/source/open-definition-2.1-dev.markdown>
>
> The *work* /must/ be machine-readable and provided in an open format. An
> open format is one which places no restrictions, monetary or
> otherwise, upon its use and can be fully processed with at least one
> free/libre/open-source software tool. Data /should/ be provided in bulk
> where possible.
>
> *Open Data Census*
>
> see format, machine readable and bulk rows in Google Sheet,
> <https://docs.google.com/spreadsheet/ccc?key=0AqR8dXc6Ji4JdFI0QkpGUEZyS0wxYWtLdG1nTk9zU3c&usp=drive_web#gid=0>
>
> *Format*:
> This question describes the form that the data is available in. For
> example, for tabular data it might be: Excel, CSV, HTML or even PDF.
> For geodata it might be shapefiles, geojson or something else. If
> available in multiple formats, the format descriptors are listed
> separated with commas. Any further information is put in the comments
> section.
>
> *Machine Readable*:
> Files are digital, yes, but not all can be processed or parsed easily
> by a computer. In order to answer this question, you would need to
> look at the dataset's file type.
>
> As a rule of thumb the following file types are machine readable:
>
> - XLS
> - CSV
> - JSON
> - XML
>
> If the files are in the following formats, they are NOT machine readable:
>
> - HTML
> - PDF
> - DOC
> - GIF
> - JPEG
> - PPT
>
> If you have a different file type and you don't know if it's machine
> readable or not, send an email to the Open Data Census list.
>
> *Bulk*:
> Data is available in bulk if the whole dataset can be downloaded easily.
> It is considered non-bulk if the citizens are limited to getting parts
> of the dataset through an online interface.
>
> For example, if restricted to querying a web form and retrieving a few
> results at a time from a very large database.
>
> *Open Data Certificates*
>
> Question: Is this data in a standard open format?
>
> Help Text: Open standards are created through a fair, transparent and
> collaborative process. Anyone can implement them and there's lots of
> support so it's easier for you to share data with more people. For
> example, XML, CSV and JSON are open standards. _Read more_. (links to
> <https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/183962/Open-Standards-Principles-FINAL.pdf>)
>
> *Proposed changes*
>
> *A harmonised definition*
>
> The *work* /must/ be machine-readable and provided in an open format.
> An open format is one which places no restrictions, monetary or
> otherwise, upon its use and can be fully processed with at least one
> free/libre/open-source software tool.
>
> In addition:
> - Data /should/ be provided in bulk, i.e. the whole dataset can be
>   downloaded easily.
> - An open format /should/ be documented so it can be freely
>   implemented by others.
> - An open format /should/ be defined through a fair, transparent and
>   collaborative process.
>
> *Open Data Census and Open Data Certificates*
> Adjust questions and help text to reference the Open Format definition
> and/or conformant licenses page.
>
> *Open Definition site*
> - Consider changing the page names from "Conformant Licences" and
>   "Conformant Formats" to "Open Licences" and "Open Formats".
> - Delete the open format definition page
>   <http://opendefinition.org/ofd/>. It is replaced by the Open Formats
>   page and the updated Open Definition.
>
> *What do you think?*
> Is this worth progressing? Could this extend to Open APIs like Web Map
> Services (WMS)?
>
> thanks
>
> Stephen Gates
> (localiser of the Open Data Census and Open Data Certificates in
> Australia)
>
> _______________________________________________
> od-discuss mailing list
> od-discuss at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/od-discuss
> Unsubscribe: https://lists.okfn.org/mailman/options/od-discuss

