[open-government] More than eighty formats for open government data

Michael Hausenblas michael.hausenblas at deri.org
Sun Sep 9 10:00:14 UTC 2012


Neeta,

> 1. While structured formats, from which data can easily be extracted or converted to another format, are fine (though it would still be good if we could reduce this list to 10),
> once we start publishing in unstructured formats such as HTML, PDF, etc., it may become very difficult to extract data from them.

Agreed. Hence, there is a step-by-step migration plan: http://5stardata.info/

Cheers,
	   Michael

--
Dr. Michael Hausenblas, Research Fellow
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel.: +353 91 495730
http://mhausenblas.info/

On 9 Sep 2012, at 10:42, neeta wrote:

> Thank you all for your valuable input.
> Here are my two concerns:
> 1. While structured formats, from which data can easily be extracted or converted to another format, are fine (though it would still be good if we could reduce this list to 10),
> once we start publishing in unstructured formats such as HTML, PDF, etc., it may become very difficult to extract data from them.
>  
> 2. I agree that we should allow these formats when it is not feasible (practically or cost-effectively) to convert to a structured/open format.
> The problem in that case is that once we allow publishing in these unstructured formats, monitoring becomes difficult.
> We may frequently get data in these formats out of sheer convenience or, more often, a lack of appreciation of the difference between structured and unstructured formats.
>  
> neeta
>  
> On 09/08/12, Adi Eyal <adi at burgercom.co.za> wrote:
>> It seems like wasted effort to try to standardise file formats. As long as the format in question is open and preferably has open source libraries I think it should be good enough. Many of the formats mentioned in your list cannot be merged. 
>> 
>> Here are some comments:
>> CSV: This is the king of simple formats. It is a de facto standard but doesn't (to the best of my knowledge) have an official definition. Typically it's preferable to store tabular data in CSV rather than XLS, but it isn't always clear how to implement the conversion. How do you cope with multiple sheets, formulae, etc.?
>> XML: This is a hierarchical data format that cannot be mapped to CSV. It is related to JSON, but XML is the more formal of the two, defining doctypes, schemata and namespaces.
>> PDF: I'm guessing this format is included as a last resort, i.e. when it's difficult or extremely costly to convert to another format. No one wants data in PDF, but it's better to upload existing PDFs than nothing at all.
>> HTML: This could be just about anything (an HTML table, a JavaScript calculator, unstructured text), so we cannot define a generic conversion.
>> KML, GeoRSS, SHP, KMZ and others are geographical file formats - I don't know enough about them to argue for one or the other but it's not clear that they are 100% equivalent.
>> DOC: Probably best to convert these to an open document format
>> 
>> The other formats are less common but generally have a specific purpose (and often are consumed by specific software). Converting these may either be impossible or extremely costly as the consuming software may have to be changed as well to accommodate the new file format.
>> 
>> In short - we should be promoting open standards but at the same time be as pragmatic as possible so as to avoid increasing the burden on those attempting to publish data.
>> 
>> Adi
>> 
>> 
>> On 8 September 2012 10:05, neeta <neeta at nic.in> wrote:
>> I have done a small study to understand the kinds of data formats accepted
>> by different government data portals worldwide,
>> and was surprised to find that there are more than 80 formats (Excel sheet enclosed for details).
>>  
>> While I understand that it is possible to convert many of these formats from one to another,
>> it is still an inconvenience and an extra effort!
>>  
>> Should we not work towards some sort of standardization, or at least shorten the list of data formats,
>> maybe to within ten?
>>  
>> --
>> neeta verma
>> senior technical director
>> data centre & web services division
>> national informatics centre, India
>> http://india.gov.in
>> 
>> _______________________________________________
>> open-government mailing list
>> open-government at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-government
>> 
>> 
>> 
>> 
>> -- 
>> Adi Eyal
>> Data Specialist
>> phone: +27 78 014 2469
>> skype: adieyalcas
>> linkedin: http://za.linkedin.com/pub/dir/Adi/Eyal
>> 
>> 
> --
> neeta verma
> senior technical director
> data centre & web services division
> national informatics centre




