[od-discuss] A harmonised Open Format definition
andrew.stott at dirdigeng.com
Tue Apr 21 18:16:28 UTC 2015
I had been puzzling over some of the same issues as Aaron.
A set of PNG format files would be a re-usable way of sharing digital images
of paintings in a gallery's collection but it would not be an (easily)
re-usable way of sharing a national budget.
We also need to be careful about terms like "machine-readable" - a PNG file
of a national budget is machine-readable (or, at least, more readable by a
machine than by a human!) but its machine-readability does not make the data
in it easily reusable.
HTML illustrates a further difficulty - the reusability of a dataset may
depend on how it is encoded in HTML - if it is just text then it is more
difficult to parse programmatically than if it is structured with semantic
tagging. (Similarly there is some pretty tricky XML around for the same
reason.)
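To make the HTML point concrete, here is a rough sketch in Python using only the standard library's html.parser module. The class name TableRows, the helper extract_rows and the sample budget figures are all invented for illustration; this is one possible approach, not a recommended tool. With semantic table markup the rows fall out directly; with the same facts written as running text the parser has nothing to hold on to.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of each <tr> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text is only kept while we are inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows found in an HTML string (hypothetical helper)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample data, for illustration only.
TAGGED = """
<table>
  <tr><th>Department</th><th>Budget</th></tr>
  <tr><td>Health</td><td>120</td></tr>
  <tr><td>Education</td><td>95</td></tr>
</table>
"""
UNTAGGED = "<p>Health gets 120 and Education gets 95.</p>"

print(extract_rows(TAGGED))    # rows recovered from the tagged table
print(extract_rows(UNTAGGED))  # nothing recovered from the plain text
```

Getting the same figures out of the untagged paragraph would need text heuristics specific to that one page, which is the reusability problem in a nutshell.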
A lot of work was done a few years ago on the definition of an "Open
Standard", including a legal text that a number of Members of the European
Parliament tried to insert into EU law. There is a link between that work
and the Open Format initiative.
From: od-discuss [mailto:od-discuss-bounces at lists.okfn.org] On Behalf Of
Sent: 21 April 2015 17:31
To: od-discuss at lists.okfn.org
Subject: Re: [od-discuss] A harmonised Open Format definition
I think this concept is excellent, the consolidation and the clarification
of having a page of recognized Open formats.
Your points here bring up a serious problem though.
If HTML and JPEG formats are considered non-machine-readable then we
absolutely *have* to *remove* the machine-readable requirement from the Open
Definition. This is a very serious issue. The OD covers things like
photographs and other images along with stuff like the writings on a blog.
It is absolutely unacceptable to have the "machine-readable" part in the
requirements for format in the OD if it excludes these things!
I think we need to fix OD 2.1 to clarify that what is considered an Open
Format depends on the type of content. Obviously, a JPEG screenshot of a
webpage is not an Open Format for the webpage content, but HTML is. But we
cannot say that a JPEG photograph is non-Open format. We could say that Open
Data specifically should not be HTML, but I'm not certain about that bit.
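As a rough illustration of "depends on the type of content", the check could be keyed on the pair of content type and format rather than on the format alone. The sketch below is Python; the table SUITABLE and the function is_suitable are hypothetical names, and the entries are only the examples already raised in this thread, not a proposed official list.

```python
# Pairs taken from the examples in this thread; any other pair is unknown.
SUITABLE = {
    ("photograph", "JPEG"): True,        # a JPEG photograph is fine
    ("painting image", "PNG"): True,     # PNG images of a gallery's paintings
    ("webpage", "HTML"): True,           # HTML is the open format for a webpage
    ("webpage", "JPEG"): False,          # a JPEG screenshot of a webpage is not
    ("national budget", "PNG"): False,   # a PNG of a budget is not easily reusable
}


def is_suitable(content_type, file_format):
    """Return True or False for pairs discussed in the thread, None otherwise."""
    return SUITABLE.get((content_type.lower(), file_format.upper()))


print(is_suitable("photograph", "jpeg"))
print(is_suitable("webpage", "JPEG"))
print(is_suitable("national budget", "CSV"))
```

The same format (JPEG) gets a different answer depending on the content, and anything not yet assessed comes back as None rather than being guessed.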
This absolutely must be addressed and clarified.
FWIW, I collected some initial bits about what qualifies as Open Format at
https://snowdrift.coop/p/snowdrift/w/en/formats-repositories and in that
case it is listed by the sort of project we're talking about. I would love
to see this more formally included in the OD.
On 04/21/2015 06:18 AM, Stephen Gates wrote:
> Hello Open Knowledge and Open Data Institute friends,
> I would like to explore the possibility of aligning the Open
> Definition <http://opendefinition.org/od/>, Open Data Census
> <http://census.okfn.org> and Open Data Certificates
> <https://certificates.theodi.org> definitions for Open Format. This
> would enable the Census, Certificate and other open data tools to
> refer to the Open Definition for a definition of Open Format, in the
> same way they currently do for Open Licences.
> To extend this concept further, I would like to mirror the Conformant
> Licenses <http://opendefinition.org/licenses/> page in the Open
> Definition with a Conformant Formats page. This would provide a list
> of file formats that conform with the Open Format definition. New
> formats could be submitted for assessment. Common formats (e.g. XML,
> JSON, KML, CSV, etc.) would be seeded on the page. Similar to the
> non-conformant licences
> <http://opendefinition.org/licenses/nonconformant/> page,
> non-conforming formats could also be captured (e.g. XLS, SHP). This would
> cater for the spectrum of open file formats proposed by Tim Berners-Lee in
> his 5 star scheme <http://5stardata.info>.
> The respective definitions or help text are:
> *Open Definition* draft 2.1
> The *work* /must/ be machine-readable and provided in an open format. An
> open format is one which places no restrictions, monetary or
> otherwise, upon its use and can be fully processed with at least one
> free/libre/open-source software tool. Data /should/ be provided in bulk
> where possible.
> *Open Data Census*
> see format, machine readable and bulk rows in Google Sheet,
> This question describes the form that the data is available in. For
> example, for tabular data it might be: Excel, CSV, HTML or even PDF.
> For geodata it might be shapefiles, geojson or something else. If
> available in multiple formats, the format descriptors are listed
> separated with commas. Any further information is put in the comments
> *Machine Readable*:
> Files are digital, yes, but not all can be processed or parsed easily
> by a computer. In order to answer this question, you would need to
> look at the dataset's file type.
> As a rule of thumb the following file types are machine readable:
> - XLS
> - CSV
> - JSON
> - XML
> If the files are in the following formats, they are NOT machine readable:
> - HTML
> - PDF
> - DOC
> - GIF
> - JPEG
> - PPT
> If you have a different file type and you don't know if it's machine
> readable or not, send an email to the Open Data Census list.
> Data is available in bulk if the whole dataset can be downloaded easily.
> It is considered non-bulk if the citizens are limited to getting parts
> of the dataset through an online interface.
> For example, if restricted to querying a web form and retrieving a few
> results at a time from a very large database.
> *Open Data Certificates*
> Question: Is this data in a standard open format?
> Help Text: Open standards are created through a fair, transparent and
> collaborative process. Anyone can implement them and there's lots of
> support so it's easier for you to share data with more people. For
> example, XML, CSV and JSON are open standards. _Read more_. (links to
> *Proposed changes*
> *A harmonised definition*
> The *work* /must/ be machine-readable and provided in an open format.
> An open format is one which places no restrictions, monetary or
> otherwise, upon its use and can be fully processed with at least one
> free/libre/open-source software tool.
> In addition:
> - Data /should/ be provided in bulk, i.e. the whole dataset can be
> downloaded easily.
> - An open format /should/ be documented so it can be freely
> implemented by others.
> - An open format /should/ be defined through a fair, transparent and
> collaborative process.
> *Open Data Census and Open Data Certificates* Adjust questions and
> help text to reference the Open Format definition and/or conformant
> licenses page.
> *Open Definition site*
> - Consider changing the page names from "Conformant Licences" and
> "Conformant Formats" to "Open Licences" and "Open Formats".
> - Delete the open format definition page
> <http://opendefinition.org/ofd/>. It is replaced by the Open Formats
> page and the updated Open Definition.
> *What do you think?*
> Is this worth progressing? Could this extend to Open APIs like Web Map
> Services (WMS)?
> Stephen Gates
> (localiser of the Open Data Census and Open Data Certificates in
> od-discuss mailing list
> od-discuss at lists.okfn.org
> Unsubscribe: <https://lists.okfn.org/mailman/options/od-discuss>