[od-discuss] A harmonised Open Format definition

Stephen Gates stephen.gates at me.com
Tue Apr 21 13:18:17 UTC 2015


Hello Open Knowledge and Open Data Institute friends,

I would like to explore the possibility of aligning the Open Definition <http://opendefinition.org/od/>, Open Data Census <http://census.okfn.org/> and Open Data Certificates <https://certificates.theodi.org/> definitions for Open Format. This would enable the Census, Certificate and other open data tools to refer to the Open Definition for a definition of Open Format, in the same way they currently do for Open Licences.

To extend this concept further, I would like to mirror the Conformant Licenses <http://opendefinition.org/licenses/> page in the Open Definition with a Conformant Formats page. This would provide a list of file formats that conform with the Open Format definition. New formats could be submitted for assessment. Common formats (e.g. XML, JSON, KML, CSV, etc.) would be seeded on the page. Similar to the non-conformant licences <http://opendefinition.org/licenses/nonconformant/>, partially conforming formats could also be captured (e.g. XLS, SHP). This would cater for the spectrum of open file formats proposed by Tim Berners-Lee in his 5 star scheme <http://5stardata.info/>.


The respective definitions or help text are:

Open Definition draft 2.1
https://github.com/okfn/opendefinition/blob/master/source/open-definition-2.1-dev.markdown
The work must be machine-readable and provided in an open format. An open format is one which places no restrictions, monetary or otherwise, upon its use and can be fully processed with at least one free/libre/open-source software tool. Data should be provided in bulk where possible.



Open Data Census 

See the Format, Machine Readable and Bulk rows in the Google Sheet: https://docs.google.com/spreadsheet/ccc?key=0AqR8dXc6Ji4JdFI0QkpGUEZyS0wxYWtLdG1nTk9zU3c&usp=drive_web#gid=0

Format:
This question describes the form that the data is available in. For example, for tabular data it might be: Excel, CSV, HTML or even PDF. For geodata it might be shapefiles, geojson or something else. If available in multiple formats, the format descriptors are listed separated with commas. Any further information is put in the comments section.

Machine Readable:
Files are digital, yes, but not all can be processed or parsed easily by a computer. In order to answer this question, you would need to look at the dataset’s file type.

As a rule of thumb the following file types are machine readable:

- XLS
- CSV
- JSON
- XML

If the files are in the following formats, they are NOT machine readable:

- HTML
- PDF
- DOC
- GIF
- JPEG
- PPT

If you have a different file type and you don’t know if it’s machine readable or not, send an email to the Open Data Census list.
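
To illustrate how a tool might apply this rule of thumb, here is a minimal Python sketch. The function name and the "None means unknown" convention are my own assumptions, not part of the Census; only the two lists of file types come from the help text above.

```python
from pathlib import PurePosixPath

# File types taken from the Census rule of thumb quoted above.
MACHINE_READABLE = {"xls", "csv", "json", "xml"}
NOT_MACHINE_READABLE = {"html", "pdf", "doc", "gif", "jpeg", "ppt"}


def machine_readable(filename):
    """Hypothetical helper: True or False per the rule of thumb,
    None when the file type is not on either list."""
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    if ext in MACHINE_READABLE:
        return True
    if ext in NOT_MACHINE_READABLE:
        return False
    return None  # unknown type: ask the Open Data Census list


if __name__ == "__main__":
    for name in ("budget.CSV", "report.pdf", "roads.shp"):
        print(name, machine_readable(name))
```

A file type that is on neither list (SHP here) comes back as unknown rather than being guessed at, which matches the advice to ask the list.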

Bulk:
Data is available in bulk if the whole dataset can be downloaded easily. It is considered non-bulk if the citizens are limited to getting parts of the dataset through an online interface.

For example, data is non-bulk if users are restricted to querying a web form and retrieving a few results at a time from a very large database.


Open Data Certificates
Question: Is this data in a standard open format?

Help Text: Open standards are created through a fair, transparent and collaborative process. Anyone can implement them and there’s lots of support so it’s easier for you to share data with more people. For example, XML, CSV and JSON are open standards. Read more… (links to https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/183962/Open-Standards-Principles-FINAL.pdf)


Proposed changes
A harmonised definition
The work must be machine-readable and provided in an open format. An open format is one which places no restrictions, monetary or otherwise, upon its use and can be fully processed with at least one free/libre/open-source software tool.

In addition:
- Data should be provided in bulk, i.e. the whole dataset can be downloaded easily.
- An open format should be documented so it can be freely implemented by others.
- An open format should be defined through a fair, transparent and collaborative process.
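
As a rough sketch of how a format could be assessed against this wording, the criteria that apply to the format itself could be recorded as a checklist. Everything here is an assumption for discussion: the class and field names are hypothetical, and so is the rule that a format meeting the "must" criteria but not both "should" criteria counts as partially conformant. The bulk criterion is left out because it describes how the data is published, not the format.

```python
from dataclasses import dataclass


@dataclass
class FormatAssessment:
    """Hypothetical record of one format's assessment."""
    name: str
    machine_readable: bool      # must
    unrestricted_use: bool      # must: no restrictions, monetary or otherwise
    foss_tool_available: bool   # must: fully processable with one FLOSS tool
    documented: bool            # should: freely implementable by others
    open_process: bool          # should: fair, transparent, collaborative


def classify(a):
    """Assumed mapping from criteria to a listing category."""
    if not (a.machine_readable and a.unrestricted_use and a.foss_tool_available):
        return "non-conformant"
    if a.documented and a.open_process:
        return "conformant"
    return "partially conformant"


if __name__ == "__main__":
    # Illustrative values only, not an actual assessment of any format.
    example = FormatAssessment("example-format", True, True, True, True, False)
    print(example.name, classify(example))
```

Whether the "should" criteria ought to affect the category at all is exactly the kind of question a Conformant Formats page would need to settle.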



Open Data Census and Open Data Certificates
Adjust questions and help text to reference the Open Format definition and/or conformant licenses page.

Open Definition site
- Consider changing the page names from “Conformant Licences” and “Conformant Formats” to “Open Licences” and “Open Formats”. 
- Delete the open format definition page <http://opendefinition.org/ofd/>. It is replaced by the Open Formats page and the updated Open Definition.



What do you think? 
Is this worth progressing? Could this extend to Open APIs like Web Map Services (WMS)?


thanks

Stephen Gates
(localiser of the Open Data Census and Open Data Certificates in Australia)

