[od-discuss] A harmonised Open Format definition
Andrew Stott
andrew.stott at dirdigeng.com
Tue Apr 21 18:16:28 UTC 2015
I had been puzzling over some of the same issues as Aaron.
A set of PNG format files would be a re-usable way of sharing digital images
of paintings in a gallery's collection but it would not be an (easily)
re-usable way of sharing a national budget.
We also need to be careful about terms like "machine-readable" - a PNG file
of a national budget is machine-readable (or, at least, more readable by a
machine than by a human!) but its machine-readability does not make the data
in it easily reusable.
HTML illustrates a further difficulty - the reusability of a dataset may
depend on how it is encoded in HTML - if it is just text then it is more
difficult to parse programmatically than if it is structured with semantic
tagging. (Similarly there is some pretty tricky XML around for the same
reason.)
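
As a rough illustration of the HTML point, here is a minimal Python sketch using only the standard library. The class name, the helper function and the sample budget figures are all invented for this example and do not come from the Census, the Certificates or any other tool mentioned in this thread. It shows that the same figures can be pulled out mechanically when they sit in table markup, while the identical extractor finds nothing to work with when they are buried in a sentence.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_from_html(markup):
    """Return the table rows found in an HTML string (hypothetical helper)."""
    parser = TableRows()
    parser.feed(markup)
    parser.close()
    return parser.rows


# Hypothetical budget figures, invented for illustration only.
STRUCTURED = """
<table>
  <tr><th>Department</th><th>Budget</th></tr>
  <tr><td>Health</td><td>120</td></tr>
  <tr><td>Education</td><td>95</td></tr>
</table>
"""
UNSTRUCTURED = "<p>Health was allocated 120 and Education 95.</p>"

structured_rows = rows_from_html(STRUCTURED)
unstructured_rows = rows_from_html(UNSTRUCTURED)
```

Both inputs are valid HTML, but only the first yields rows. Recovering the figures from the second would need text heuristics specific to that one sentence, which is the sense in which it is harder to reuse.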
A lot of work was done a few years ago on the definition of an "Open
Standard", including a legal text that a number of Members of the European
Parliament tried to insert into EU law. There is a link between that work
and the Open Format initiative.
-----Original Message-----
From: od-discuss [mailto:od-discuss-bounces at lists.okfn.org] On Behalf Of
Aaron Wolf
Sent: 21 April 2015 17:31
To: od-discuss at lists.okfn.org
Subject: Re: [od-discuss] A harmonised Open Format definition
I think this concept is excellent, the consolidation and the clarification
of having a page of recognized Open formats.
Your points here bring up a serious problem though.
If HTML and JPEG formats are considered non-machine-readable then we
absolutely *have* to *remove* the machine-readable requirement from the Open
Definition. This is a very serious issue. The OD covers things like
photographs and other images along with stuff like the writings on a blog.
It is absolutely unacceptable to have the "machine-readable" part in the
requirements for format in the OD if it excludes these things!
I think we need to fix OD 2.1 to clarify that what is considered an Open
Format depends on the type of content. Obviously, a JPEG screenshot of a
webpage is not an Open Format for the webpage content, but HTML is. But we
cannot say that a JPEG photograph is non-Open format. We could say that Open
Data specifically should not be HTML, but I'm not certain about that bit.
This absolutely must be addressed and clarified.
FWIW, I collected some initial bits about what qualifies as Open Format at
https://snowdrift.coop/p/snowdrift/w/en/formats-repositories and in that
case it is listed by the sort of project we're talking about. I would love
to see this more formally included in the OD.
Best,
Aaron
On 04/21/2015 06:18 AM, Stephen Gates wrote:
> Hello Open Knowledge and Open Data Institute friends,
>
> I would like to explore the possibility of aligning the Open
> Definition <http://opendefinition.org/od/>, Open Data Census
> <http://census.okfn.org> and Open Data Certificates
> <https://certificates.theodi.org> definitions for Open Format. This
> would enable the Census, Certificate and other open data tools to
> refer to the Open Definition for a definition of Open Format, in the
> same way they currently do for Open Licences.
>
> To extend this concept further, I would like to mirror the Conformant
> Licenses <http://opendefinition.org/licenses/> page in the Open
> Definition with a Conformant Formats page. This would provide a list
> of file formats that conform with the Open Format definition. New
> formats could be submitted for assessment. Common formats (e.g. XML,
> JSON, KML, CSV, etc.) would be seeded on the page. Similar to the
> non-conformant licences
> <http://opendefinition.org/licenses/nonconformant/>, partially
> conforming formats could also be captured (e.g. XLS, SHP). This would
> cater for the spectrum of open file formats proposed by Tim Berners-Lee
> in his 5 star scheme <http://5stardata.info>.
>
>
> The respective definitions or help text are:
>
> *Open Definition* draft 2.1
> https://github.com/okfn/opendefinition/blob/master/source/open-definition-2.1-dev.markdown
>
> The *work* /must/ be machine-readable and provided in an open format. An
> open format is one which places no restrictions, monetary or
> otherwise, upon its use and can be fully processed with at least one
> free/libre/open-source software tool. Data /should/ be provided in bulk
> where possible.
>
> *Open Data Census*
>
> see format, machine readable and bulk rows in Google Sheet,
>
> https://docs.google.com/spreadsheet/ccc?key=0AqR8dXc6Ji4JdFI0QkpGUEZyS0wxYWtLdG1nTk9zU3c&usp=drive_web#gid=0
>
> *Format*:
> This question describes the form that the data is available in. For
> example, for tabular data it might be: Excel, CSV, HTML or even PDF.
> For geodata it might be shapefiles, geojson or something else. If
> available in multiple formats, the format descriptors are listed
> separated with commas. Any further information is put in the comments
> section.
>
> *Machine Readable*:
> Files are digital, yes, but not all can be processed or parsed easily
> by a computer. In order to answer this question, you would need to
> look at the dataset's file type.
>
> As a rule of thumb the following file types are machine readable:
>
> - XLS
> - CSV
> - JSON
> - XML
>
> If the files are in the following formats, they are NOT machine readable:
>
> - HTML
> - PDF
> - DOC
> - GIF
> - JPEG
> - PPT
>
> If you have a different file type and you don't know if it's machine
> readable or not, send an email to the Open Data Census list.
>
> *Bulk*:
> Data is available in bulk if the whole dataset can be downloaded easily.
> It is considered non-bulk if the citizens are limited to getting parts
> of the dataset through an online interface.
>
> For example, if restricted to querying a web form and retrieving a few
> results at a time from a very large database.
>
> *Open Data Certificates*
>
> Question: Is this data in a standard open format?
>
> Help Text: Open standards are created through a fair, transparent and
> collaborative process. Anyone can implement them and there's lots of
> support so it's easier for you to share data with more people. For
> example, XML, CSV and JSON are open standards. _Read more_. (links to
> https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/183962/Open-Standards-Principles-FINAL.pdf)
>
>
> *Proposed changes*
> *A harmonised definition*
>
> The *work* /must/ be machine-readable and provided in an open format.
> An open format is one which places no restrictions, monetary or
> otherwise, upon its use and can be fully processed with at least one
> free/libre/open-source software tool.
>
> In addition:
> - Data /should/ be provided in bulk, i.e. the whole dataset can be
> downloaded easily.
> - An open format /should/ be documented so it can be freely
> implemented by others.
> - An open format /should/ be defined through a fair, transparent and
> collaborative process.
>
> *Open Data Census and Open Data Certificates*
> Adjust questions and help text to reference the Open Format definition
> and/or conformant licenses page.
>
> *Open Definition site*
> - Consider changing the page names from "Conformant Licences" and
> "Conformant Formats" to "Open Licences" and "Open Formats".
> - Delete the open format definition page
> <http://opendefinition.org/ofd/>. It is replaced by the Open Formats
> page and the updated Open Definition.
>
>
>
> *What do you think?*
> Is this worth progressing? Could this extend to Open APIs like Web Map
> Services (WMS)?
>
>
> thanks
>
> Stephen Gates
> (localiser of the Open Data Census and Open Data Certificates in
> Australia)
>
>
>
>
> _______________________________________________
> od-discuss mailing list
> od-discuss at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/od-discuss
> Unsubscribe: https://lists.okfn.org/mailman/options/od-discuss
>