[ckan-discuss] CKAN CSV alert

David Read david.read at okfn.org
Mon Jun 7 16:28:46 BST 2010


Christos,

Related to your enquiry, I've done some work on the CSV representation
of the data.gov.uk data, as found at: http://ckan.net/dump/

The escaping of quote marks is now improved. The CSV imported fine
into OpenOffice, but I understand that the mix of escaping and
quotation characters in the specific example you gave may well fox all
but the most sophisticated of parsers. I've also cleaned up the data
itself, removing extraneous slashes and quotes (which were down to
mis-reading the escape characters during import from another
database). The record of interest should now parse fine; it now looks
like this: The """"lower quartile"""" property

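For anyone checking a parser against this, here is a minimal sketch
using Python's standard csv module. It assumes the snippet above is the
inside of a double-quoted field and that quotes are escaped by doubling
(the default for that module). The helper name parse_single_field is
made up for illustration.

```python
import csv
import io


def parse_single_field(line):
    """Parse one CSV line and return its first field."""
    return next(csv.reader(io.StringIO(line)))[0]


# Assumption: in the file, the snippet is wrapped in one pair of
# enclosing double quotes, like any other quoted CSV field.
raw_line = '"The """"lower quartile"""" property"\r\n'
value = parse_single_field(raw_line)

# Each doubled quote inside the field decodes to one literal quote.
print(value)

# Writing the value back out with doubled-quote escaping should
# reproduce the original line.
out = io.StringIO()
csv.writer(out, quoting=csv.QUOTE_ALL).writerow([value])
print(out.getvalue() == raw_line)
```

A parser that treats backslash as the escape character instead would
need to be configured differently, so it is worth checking which
convention your own tool expects.
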
There are columns for up to 128 resources per package, which is a lot,
but this magnitude is needed for about a dozen packages, which have
figures released weekly, going back a few years. I agree this bulks
out the CSV file, since most records don't use these columns, but such
is the CSV format. If we just used a single column with JSON-formatted
resources (your suggestion), then you might as well use the fully JSON
formatted file that we supply.
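
To illustrate the trade-off, here is a rough sketch of how a
variable-length list of resources ends up as a fixed set of columns.
The column naming (resource-0-url and so on) and the shape of the
package dict are assumptions for the example, not necessarily what the
dump uses.

```python
MAX_RESOURCES = 128  # the column limit mentioned above


def flatten_resources(package, max_resources=MAX_RESOURCES):
    """Spread a package's resource URLs over a fixed number of columns.

    Hypothetical layout: one "resource-<i>-url" column per slot, left
    empty when the package has fewer resources than slots.
    """
    row = {"name": package["name"]}
    resources = package.get("resources", [])
    for i in range(max_resources):
        url = resources[i]["url"] if i < len(resources) else ""
        row["resource-%d-url" % i] = url
    return row


example = {
    "name": "example-package",
    "resources": [{"url": "http://example.org/week1.csv"}],
}
row = flatten_resources(example)
# One "name" column plus 128 resource columns, nearly all of them empty.
print(len(row))
```

Every row carries all the slots whether it uses them or not, which is
where the bulk comes from.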

To make it more readable to the human eye, I've sorted the columns
into a sensible order (rather than allowing a random one), so that you
get: "id","name","title","version","url","author","author_email"
etc. The JSON file has been changed in the same way.
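
As an illustration of pinning the column order, here is a minimal
sketch with Python's csv.DictWriter. Only the seven column names quoted
above come from the dump; the function name, the sample record and the
choice to quote every field are assumptions.

```python
import csv
import io

# The leading columns as listed above; the real file has more.
COLUMNS = ["id", "name", "title", "version", "url", "author", "author_email"]


def write_csv(records, columns=COLUMNS):
    """Write dict records as CSV with a fixed, explicit column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=columns,
        restval="",             # missing keys become empty fields
        extrasaction="ignore",  # keys not in `columns` are dropped
        quoting=csv.QUOTE_ALL,
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# The key order in the input dict does not affect the output order.
text = write_csv([{"name": "example-package", "id": "1"}])
print(text)
```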

One last thing: I've changed the file compression from gzip to zip, so
that Windows machines can open these files more easily.
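
A zip archive can also be read directly from a script without unpacking
it first. Here is a minimal sketch with Python's zipfile module,
assuming the archive holds a single UTF-8 encoded .csv member. The
function name and the encoding are assumptions, not a description of
the actual dump.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_bytes):
    """Return all rows of the first .csv member found in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        member = next(n for n in archive.namelist() if n.endswith(".csv"))
        with archive.open(member) as raw:
            # newline="" lets the csv module handle line endings itself.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))
```

The bytes can come from a downloaded file, for example
open(path, "rb").read() on a local copy.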

Many thanks for the feedback - I hope these changes cover your wishes.
Any more comments are most welcome.

David

On 12 May 2010 14:38, koumenides c.l. (clk1v07) <clk1v07 at ecs.soton.ac.uk> wrote:
> Hi,
>
> Thanks for your response. I will keep you posted on any other issues that might arise :)
>
> Best,
> Christos
> ________________________________________
> From: okfn.rufus.pollock at googlemail.com [okfn.rufus.pollock at googlemail.com] On Behalf Of Rufus Pollock [rufus.pollock at okfn.org]
> Sent: 12 May 2010 13:43
> To: Jonathan Gray
> Cc: koumenides c.l. (clk1v07); CKAN discuss
> Subject: Re: CKAN CSV alert
>
> Dear Christos,
>
> Thanks for reporting this. We'll look into this and try to get this
> fixed asap. In the meantime you could try using the JSON dump which
> is also available.
>
> Rufus
>
> On 12 May 2010 13:21, Jonathan Gray <jonathan.gray at okfn.org> wrote:
>> Thanks for flagging this up, Christos! I'm copying this to our
>> ckan-discuss list.
>>
>> Rufus, David, John: any ideas what might be up?
>>
>> All the best,
>>
>> Jonathan
>>
>> On Wed, May 12, 2010 at 1:58 PM, koumenides c.l. (clk1v07)
>> <clk1v07 at ecs.soton.ac.uk> wrote:
>>> Hi,
>>>
>>> I believe there is an error in all the CSV files of the CKAN dump - after - the January release.
>>>
>>> The error I find while parsing the files is this thing ->  The \\"lower quartile\"\"
>>> appearing inside an element, which breaks everything after.
>>>
>>> The same record has a name "ratio_of_lower_quartile_workplace_earnings_to_lower_quartile_house_prices".
>>>
>>> Regards,
>>>
>>> Christos Koumenides
>>>
>>> School of Electronics and Computer Science
>>> University of Southampton
>>> Southampton, SO17 1BJ
>>> United Kingdom
>>> ______________________
>>> clk1v07 at ecs.soton.ac.uk
>>> clk08r at ecs.soton.ac.uk
>>> (+44) 7975795758
>>
>>
>>
>> --
>> Jonathan Gray
>>
>> Community Coordinator
>> The Open Knowledge Foundation
>> http://blog.okfn.org
>>
>> http://twitter.com/jwyg
>> http://identi.ca/jwyg
>>
>
>
>
> --
> Open Knowledge Foundation
> Promoting Open Knowledge in a Digital Age
> http://www.okfn.org/ - http://blog.okfn.org/
>
> _______________________________________________
> ckan-discuss mailing list
> ckan-discuss at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-discuss
>
