[open-government] More than eighty formats for open government data

Stefan Szilva stefan_szilva at users.sourceforge.net
Sun Sep 9 20:03:23 UTC 2012


> I have done a small study to understand the kind of data formats acceptable
> by different govt data portals worldwide..

Hi,

as a member of some working groups of the Slovak Committee for  
Standardization of Information Systems of Public Administration (IS  
PA), I would like to add some information about mandatory file formats  
in Slovakia (Slovak Republic):

Public administration in Slovakia must use the file formats defined in the
Edict About Standards for IS PA (published and amended several times
since 2006). It is a legislative document. In the case of
non-compliance, the Ministry of Finance may impose sanctions from 2 000 EUR
up to 35 000 EUR (to date, no sanctions have been imposed despite
cases of non-compliance).
http://www.informatizacia.sk/standards-for-is-pa/4632s

- for text files, public administration must use one of these file  
formats: HTML, XHTML, OpenDocument 1.0-1.2, PDF 1.3-1.5, RTF, TXT

- for files that include tables, public administration must use one of  
these file formats:
a) if the file is a "static" list of data: HTML, XHTML, OpenDocument  
1.0-1.2, PDF 1.3-1.5, RTF, TXT
b) if it is necessary to preserve some active functions/formulas in  
the table: arbitrary file format; open and technologically neutral  
file formats are recommended (but this vague rule should change to  
some concrete open formats)

- for graphic files:
a) raster graphics: GIF, PNG, JPEG, TIFF
b) vector graphics: PDF 1.3-1.5, SVG, SWF

- for audio files:
a) LPCM in WAV or AIFF
b) Ogg Vorbis
c) MPEG formats

- for video files:
a) containers: Ogg or MPEG formats
b) video compression: Theora or MPEG compressions
c) audio compression: Vorbis or MPEG compressions

- for file compression: ZIP 2.0, GZIP, TAR

- for data exchange between information systems of public  
administration: XML, XSD, XSLT, GML

- for web services: SOAP, WSDL, UDDI, HTTP,
(including map services) OpenGIS WebMap Service, OpenGIS Web Feature  
Service, OpenGIS Web Coverage Service, OpenGIS Web Processing Service,  
OpenGIS Catalog Service for Web

... etc
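
As a rough illustration of how a portal could pre-check uploads against a
list like the one above, here is a minimal Python sketch. It is not part of
the edict: the extension table, the category names and the function names
are assumptions made for the example, and only a few of the categories are
covered. The one format-level fact it relies on is that a PDF file starts
with a header of the form %PDF-x.y.

```python
import os

# Hypothetical mapping from common file extensions to some of the categories
# listed above. The edict names formats, not extensions, so this table is an
# assumption made for illustration only.
ALLOWED_EXTENSIONS = {
    "text": {".html", ".htm", ".xhtml", ".odt", ".pdf", ".rtf", ".txt"},
    "raster": {".gif", ".png", ".jpg", ".jpeg", ".tif", ".tiff"},
    "vector": {".pdf", ".svg", ".swf"},
    "compression": {".zip", ".gz", ".tar"},
}


def extension_allowed(filename: str, category: str) -> bool:
    """Return True if the file extension is listed for the given category."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS.get(category, set())


def pdf_version(header: bytes):
    """Parse (major, minor) from a PDF header such as b'%PDF-1.4', else None."""
    if not header.startswith(b"%PDF-"):
        return None
    token = header[5:8].decode("ascii", errors="replace")
    major, sep, minor = token.partition(".")
    if sep != "." or not (major.isdigit() and minor.isdigit()):
        return None
    return int(major), int(minor)


def pdf_version_allowed(header: bytes) -> bool:
    """Check the header version against the PDF 1.3-1.5 range from the list."""
    version = pdf_version(header)
    return version is not None and (1, 3) <= version <= (1, 5)


if __name__ == "__main__":
    print(extension_allowed("report.PDF", "text"))
    print(pdf_version_allowed(b"%PDF-1.4\n"))
    print(pdf_version_allowed(b"%PDF-1.7\n"))
```

A check like this only catches obvious cases: an extension says nothing
about the OpenDocument version, and a PDF can declare a later version in
its document catalog than in its header.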

Stefan Szilva


Quoting neeta <neeta at nic.in>:

> Thank you all for your valuable input.
> Here are my two concerns:
> 1. Structured formats, from which data can easily be extracted
> or converted to another format, are OK (still, if we could reduce this
> list to 10, that would be good!).
> But when we start publishing in unstructured formats such as HTML,
> PDF, etc., it may become very difficult to extract data from them.
>
> 2. I agree that we should allow these formats if it is not
> feasible (practically, cost-effectively) to convert to any
> structured/open format.
> The problem in that case would be that once we allow publishing in
> these unstructured formats, monitoring would be difficult.
> Often we may get data in these formats out of sheer
> convenience or, more often, a lack of appreciation of structured vs
> unstructured formats.
>
> neeta
>
> On 09/08/12, Adi Eyal <adi at burgercom.co.za> wrote:
>>  It seems like wasted effort to try to standardise file formats. As  
>> long as the format in question is open and preferably has open  
>> source libraries I think it should be good enough. Many of the  
>> formats mentioned in your list cannot be merged.
>>
>> Here are some comments:
>> CSV: This is the king of simple formats. It is a de facto standard
>> but doesn't (to the best of my knowledge) have an official
>> definition. Typically it's preferable to store tabular data in CSV
>> rather than XLS, but it isn't always clear how to implement the
>> conversion. How do you cope with multiple sheets, formulae, etc.?
>> XML: This is a hierarchical data format that cannot be mapped to
>> CSV. It is related to JSON, but the former is the more formal of
>> the two, defining doctypes, schemata and namespaces.
>> PDF: I'm guessing this data type is included in the case of last  
>> resort - i.e. it's difficult or extremely costly to convert it to  
>> another format. No-one wants data in PDF but it's better to upload  
>> existing PDFs than nothing at all
>> HTML: This could be just about anything - an HTML table, a JavaScript
>> calculator, unstructured text - we cannot define a generic conversion.
>> KML, GeoRSS, SHP, KMZ and others are geographical file formats - I  
>> don't know enough about them to argue for one or the other but it's  
>> not clear that they are 100% equivalent.
>> DOC: Probably best to convert these to an open document format
>>
>>
>> The other formats are less common but generally have a specific  
>> purpose (and often are consumed by specific software). Converting  
>> these may either be impossible or extremely costly as the consuming  
>> software may have to be changed as well to accommodate the new file  
>> format.
>>
>>
>> In short - we should be promoting open standards but at the same  
>> time be as pragmatic as possible so as to avoid increasing the  
>> burden on those attempting to publish data.
>>
>>
>> Adi
>>
>>
>>
>>
>>  On 8 September 2012 10:05, neeta <neeta at nic.in> wrote:
>>
>> > I have done a small study to understand the kind of data formats acceptable
>> > by different govt data portals worldwide..
>> > and was surprised to find that there are more than 80 formats..(excel sheet enclosed for details)
>> >
>> > while i understand, it is possible to convert many of these formats from one to other
>> > still it is a kind of inconvenience or an extra effort !
>> >
>> > should we not work towards some sort of standardization or at least shorten the list of data formats
>> > may be within ten?
>> >
>> > --
>> > neeta verma
>> > senior technical director
>> > data centre & web services division
>> > national informatics centre, India
>> > http://india.gov.in/
>> >
>> > _______________________________________________
>> > open-government mailing list
>> > open-government at lists.okfn.org
>> > http://lists.okfn.org/mailman/listinfo/open-government
>> >
>> >
>>
>>
>>
>>
>>
>> --
>> Adi Eyal
>> Data Specialist
>> phone: +27 78 014 2469
>> skype: adieyalcas
>> linkedin: http://za.linkedin.com/pub/dir/Adi/Eyal
>>
>>
>>
>>
>>
> --
> neeta verma
> senior technical director
> data centre & web services division
> national informatics centre
>







