[open-government] More than eighty formats for open government data
Stefan Szilva
stefan_szilva at users.sourceforge.net
Sun Sep 9 20:03:23 UTC 2012
> I have done a small study to understand the kind of data formats acceptable
> by different govt data portals worldwide..
Hi,
as a member of some working groups of the Slovak Committee for
Standardization of Information Systems of Public Administration (IS
PA), I would like to add some information about mandatory file formats
in Slovakia (Slovak Republic):
Public administration in Slovakia must use the file formats defined in
the Edict About Standards for IS PA (published and amended several
times since 2006). It is a legislative document. In the case of
non-compliance, the Ministry of Finance may impose sanctions from
2 000 EUR up to 35 000 EUR (to date, no sanctions have been imposed,
in spite of cases of non-compliance).
http://www.informatizacia.sk/standards-for-is-pa/4632s
- for text files, public administration must use one of these file
formats: HTML, XHTML, OpenDocument 1.0-1.2, PDF 1.3-1.5, RTF, TXT
- for files that include tables, public administration must use one of
these file formats:
a) if the file is a "static" list of data: HTML, XHTML, OpenDocument
1.0-1.2, PDF 1.3-1.5, RTF, TXT
b) if it is necessary to preserve some active functions/formulas in
the table: arbitrary file format; open and technologically neutral
file formats are recommended (but this vague rule should be replaced
by a list of specific open formats)
- for graphic files:
a) raster graphics: GIF, PNG, JPEG, TIFF
b) vector graphics: PDF 1.3-1.5, SVG, SWF
- for audio files:
a) LPCM in WAV or AIFF
b) Ogg Vorbis
c) MPEG formats
- for video files:
a) containers: Ogg or MPEG formats
b) video compression: Theora or MPEG compressions
c) audio compression: Vorbis or MPEG compressions
- for file compression: ZIP 2.0, GZIP, TAR
- for data exchange between information systems of public
administration: XML, XSD, XSLT, GML
- for web services: SOAP, WSDL, UDDI, HTTP and, for map services,
OpenGIS Web Map Service, OpenGIS Web Feature Service, OpenGIS Web
Coverage Service, OpenGIS Web Processing Service, OpenGIS Catalog
Service for Web
... etc
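
To make the list above a bit more concrete, here is a minimal sketch of
how a portal could check an uploaded file against the permitted formats
for its category. The format names come from the list above; the
category keys, the file extensions mapped to each format and the
function name `is_permitted` are assumptions made for illustration only
and are not taken from the Edict. Checking the extension alone also
cannot verify version limits such as PDF 1.3-1.5.

```python
from pathlib import PurePath

# Format names follow the list above. The extensions and category keys are
# assumptions for this illustration; the Edict itself is not quoted here.
PERMITTED = {
    "text": {".html", ".htm", ".xhtml", ".odt", ".pdf", ".rtf", ".txt"},
    "static_table": {".html", ".htm", ".xhtml", ".ods", ".pdf", ".rtf", ".txt"},
    "raster": {".gif", ".png", ".jpg", ".jpeg", ".tif", ".tiff"},
    "vector": {".pdf", ".svg", ".swf"},
    "compression": {".zip", ".gz", ".tar"},
}


def is_permitted(filename: str, category: str) -> bool:
    """Return True if the file's extension is allowed for the category.

    Unknown categories are treated as "nothing permitted".
    """
    suffix = PurePath(filename).suffix.lower()
    return suffix in PERMITTED.get(category, set())
```

For example, `is_permitted("budget.xls", "static_table")` would be
rejected, while `is_permitted("report.PDF", "text")` would pass.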
Stefan Szilva
Quoting neeta <neeta at nic.in>:
> thank you all for your valuable input..
> here are my two concerns
> 1. while structured formats from which data can be easily extracted
> or converted to other formats are ok (still, if we could reduce this
> list to 10.. would be good!)
> but when we start publishing in unstructured formats such as HTML,
> PDF etc.. it may become very difficult to extract data from them
>
> 2. i agree that we should allow these formats if it is not
> feasible (practically, cost effective..) to convert to any
> structured/ open format..
> the problem in that case would be that once we allow publishing in
> these unstructured formats, monitoring would be difficult..
> many times we may get data in these formats due to sheer
> convenience or, more often, a lack of appreciation of structured vs
> unstructured formats
>
> neeta
>
> On 09/08/12, Adi Eyal <adi at burgercom.co.za> wrote:
>> It seems like wasted effort to try to standardise file formats. As
>> long as the format in question is open and preferably has open
>> source libraries I think it should be good enough. Many of the
>> formats mentioned in your list cannot be merged.
>>
>> Here are some comments:
>> CSV: This is the king of simple formats. It is a de facto standard
>> but doesn't (to the best of my knowledge) have an official
>> definition. Typically it's preferable to store tabular data in CSV
>> rather than XLS but it isn't always clear how to implement the
>> conversion. How do you cope with multiple sheets, formulae etc?
>> XML: This is a hierarchical data format that cannot be mapped to
>> CSV. It is related to JSON, but the former is the more formal of
>> the two, defining doctypes, schemata and namespaces.
>> PDF: I'm guessing this data type is included in the case of last
>> resort - i.e. it's difficult or extremely costly to convert it to
>> another format. No-one wants data in PDF but it's better to upload
>> existing PDFs than nothing at all
>> HTML: This could be just about anything - an HTML table, a JavaScript
>> calculator, unstructured text - we cannot define a generic conversion
>> KML, GeoRSS, SHP, KMZ and others are geographical file formats - I
>> don't know enough about them to argue for one or the other but it's
>> not clear that they are 100% equivalent.
>> DOC: Probably best to convert these to an open document format
>>
>>
>> The other formats are less common but generally have a specific
>> purpose (and often are consumed by specific software). Converting
>> these may either be impossible or extremely costly as the consuming
>> software may have to be changed as well to accommodate the new file
>> format.
>>
>>
>> In short - we should be promoting open standards but at the same
>> time be as pragmatic as possible so as to avoid increasing the
>> burden on those attempting to publish data.
>>
>>
>> Adi
>>
>>
>>
>>
>> On 8 September 2012 10:05, neeta <neeta at nic.in> wrote:
>>
>> > I have done a small study to understand the kind of data formats
>> > acceptable by different govt data portals worldwide..
>> > and was surprised to find that there are more than 80
>> > formats.. (excel sheet enclosed for details)
>> >
>> > while i understand it is possible to convert many of these
>> > formats from one to another,
>> > it is still an inconvenience and an extra effort!
>> >
>> > should we not work towards some sort of standardization, or
>> > at least shorten the list of data formats,
>> > maybe to within ten?
>> >
>> > --
>> > neeta verma
>> > senior technical director
>> > data centre & web services division
>> > national informatics centre, India
>> > http://india.gov.in/
>> >
>> > _______________________________________________
>> > open-government mailing list
>> > open-government at lists.okfn.org
>> > http://lists.okfn.org/mailman/listinfo/open-government
>> >
>> >
>>
>>
>>
>>
>>
>> --
>> Adi Eyal
>> Data Specialist
>> phone: +27 78 014 2469
>> skype: adieyalcas
>> linkedin: http://za.linkedin.com/pub/dir/Adi/Eyal
>>
>>
>>
>>
>>
> --
> neeta verma
> senior technical director
> data centre & web services division
> national informatics centre
>