[open-government] More than eighty formats for open government data

Adi Eyal adi at burgercom.co.za
Sat Sep 8 18:08:06 UTC 2012


It seems like wasted effort to try to standardise file formats. As long as
the format in question is open, and preferably has open source libraries, I
think it should be good enough. Many of the formats mentioned in your list
cannot be merged into one another.

Here are some comments:
CSV: This is the king of simple formats. It is a de facto standard but
doesn't (to the best of my knowledge) have an official definition. Typically
it's preferable to store tabular data in CSV rather than XLS, but it isn't
always clear how to implement the conversion. How do you cope with multiple
sheets, formulae, etc.?
XML: This is a hierarchical data format that cannot, in general, be mapped
to CSV. It is related to JSON, but XML is the more formal of the two,
defining doctypes, schemata and namespaces.
PDF: I'm guessing this format is included as a last resort - i.e. for
cases where it's difficult or extremely costly to convert the data to
another format. No-one wants data in PDF, but it's better to upload
existing PDFs than nothing at all.
HTML: This could be just about anything - an HTML table, a JavaScript
calculator, unstructured text - so we cannot define a generic conversion.
KML, GeoRSS, SHP, KMZ and others are geographical file formats - I don't
know enough about them to argue for one or the other but it's not clear
that they are 100% equivalent.
DOC: Probably best to convert these to an open document format.
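
To make the XML point a bit more concrete, here is a rough Python sketch of
what "flattening" a small hierarchy into CSV might look like. This is only
an illustration: the sample document, the element names and the helper
functions (flatten_departments, rows_to_csv) are all made up for this
example, and it uses nothing beyond the standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: each department nests a variable number of employees.
SAMPLE_XML = """
<departments>
  <department name="Health">
    <employee>Alice</employee>
    <employee>Bob</employee>
  </department>
  <department name="Transport">
    <employee>Carol</employee>
  </department>
</departments>
"""


def flatten_departments(xml_text):
    """Turn the nested structure into flat rows, one per employee.

    The parent's name is repeated on every child row. That is one possible
    flattening choice among several, not a canonical mapping.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    for dept in root.findall("department"):
        for emp in dept.findall("employee"):
            rows.append({"department": dept.get("name"), "employee": emp.text})
    return rows


def rows_to_csv(rows):
    """Serialise the flat rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["department", "employee"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(flatten_departments(SAMPLE_XML)))
```

Note that the hierarchy only survives because the department name is
repeated on each row, and a department with no employees would disappear
from the output altogether. Every deeper level of nesting forces another
decision like this, which is why a generic XML-to-CSV conversion is hard to
define.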

The other formats are less common but generally have a specific purpose
(and often are consumed by specific software). Converting these may either
be impossible or extremely costly as the consuming software may have to be
changed as well to accommodate the new file format.

In short - we should be promoting open standards but at the same time be as
pragmatic as possible so as to avoid increasing the burden on those
attempting to publish data.

Adi


On 8 September 2012 10:05, neeta <neeta at nic.in> wrote:

> I have done a small study to understand the kinds of data formats accepted
> by different government data portals worldwide,
> and was surprised to find that there are more than 80 formats (Excel
> sheet enclosed for details).
>
> While I understand that it is possible to convert many of these formats
> from one to another,
> it is still an inconvenience and an extra effort!
>
> Should we not work towards some sort of standardization, or at least shorten
> the list of data formats,
> maybe to within ten?
>
> --
> neeta verma
> senior technical director
> data centre & web services division
> national informatics centre, India
> http://india.gov.in


-- 
Adi Eyal
Data Specialist
phone: +27 78 014 2469
skype: adieyalcas
linkedin: http://za.linkedin.com/pub/dir/Adi/Eyal