[ckan-discuss] CKAN CSV alert

David Read david.read at okfn.org
Tue Jun 8 09:31:23 BST 2010


Christos,

Very interesting to see your linked data. Looks great! Will Waites has
been doing something similar and it will be interesting to see the
fruits of these.

David

On 8 June 2010 00:54, koumenides c.l. (clk1v07) <clk1v07 at ecs.soton.ac.uk> wrote:
> Hi David,
>
> Thanks for the update. I think it would also be helpful if you could keep the order of the columns consistent across different releases. But I think this is something that you have already suggested.
>
> Everything else is perfectly fine on our end. The CSV files parse without problems. We have a linked data version @ http://bagatelles.ecs.soton.ac.uk/psi/federator/data.gov.uk (you can use the URIs from data.gov.uk to access the records) which we are working on to integrate with other catalogues.
>
> Thanks again for the update.
>
> Regards,
> Christos
> ________________________________________
> From: d.t.read at gmail.com [d.t.read at gmail.com] On Behalf Of David Read [david.read at okfn.org]
> Sent: 07 June 2010 16:28
> To: koumenides c.l. (clk1v07)
> Cc: rufus.pollock at okfn.org; Jonathan Gray; CKAN discuss
> Subject: Re: [ckan-discuss] CKAN CSV alert
>
> Christos,
>
> Related to your enquiry, I've done some work on the CSV representation
> of the data.gov.uk data, as found at: http://ckan.net/dump/
>
> The escaping of quote marks is now improved. The CSV imported fine
> into OpenOffice, but I understand that the mix of escaping and
> quotation characters in the specific example you gave may well fox all
> but the most sophisticated of parsers. I've improved the data itself,
> avoiding extraneous backslashes and quotes (which were down to mis-reading
> the escape characters during import from another database), and the
> record of interest should now parse fine, now looking like this: The
> """"lower quartile"""" property
>
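A minimal sketch of the two escaping conventions discussed above, assuming Python's standard csv module (the thread does not say what tooling produced or parsed the dumps). The default dialect escapes an embedded quote by doubling it. A backslash-escaped field only parses correctly if the reader is explicitly configured for it, which is why a mix of both styles in one field trips up most parsers. The sample value is illustrative, not the actual record text.

```python
import csv
import io


def encode_row(fields):
    """Serialise one row with the csv module's default ("excel") dialect.

    QUOTE_ALL wraps every field in quotes; an embedded double quote is
    escaped by doubling it, as in RFC 4180.
    """
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(fields)
    return buf.getvalue()


def decode_row(text, **fmtparams):
    """Parse the first CSV record in text; fmtparams go straight to csv.reader."""
    return next(csv.reader(io.StringIO(text), **fmtparams))


if __name__ == "__main__":
    # Illustrative value only, not the actual record text from the dump.
    value = 'The "lower quartile" property'

    encoded = encode_row(["example", value])
    print(encoded, end="")  # the embedded quotes are written doubled
    assert decode_row(encoded)[1] == value

    # A backslash-escaped field only round-trips if the reader is told so.
    backslashed = '"example","The \\"lower quartile\\" property"\r\n'
    assert decode_row(backslashed, escapechar="\\", doublequote=False)[1] == value
    # With the default dialect the backslashes leak into the parsed value.
    assert decode_row(backslashed)[1] != value
```

Sticking to doubled quotes throughout is what most CSV readers, spreadsheets included, expect by default.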
> There are columns for up to 128 resources per package, which is a lot,
> but this magnitude is needed for about a dozen packages, which have
> figures released weekly, going back a few years. I agree this bulks
> out the CSV file, since most records don't use these columns, but such
> is the CSV format. If we just used a single column with JSON-formatted
> resources (your suggestion), then you might as well use the fully JSON
> formatted file that we supply.
>
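To illustrate the trade-off described above, here is a hypothetical Python sketch of flattening a package's resources into a fixed block of 128 slots, alongside the single JSON-valued column that was suggested. The column names (`resource-0-url` and so on) and the three resource fields are assumptions for illustration, not the dump's actual layout.

```python
import json

MAX_RESOURCES = 128  # slots per package, as described in the message
RESOURCE_KEYS = ("url", "format", "description")  # hypothetical subset of fields


def flat_header():
    """Hypothetical column names: resource-0-url, resource-0-format, ... slot 127."""
    return [f"resource-{i}-{key}" for i in range(MAX_RESOURCES) for key in RESOURCE_KEYS]


def flatten_resources(resources):
    """Spread a package's resources across the fixed slots, padding unused ones with ''."""
    if len(resources) > MAX_RESOURCES:
        raise ValueError(f"package has {len(resources)} resources; only {MAX_RESOURCES} fit")
    cells = []
    for i in range(MAX_RESOURCES):
        res = resources[i] if i < len(resources) else {}
        cells.extend(res.get(key, "") for key in RESOURCE_KEYS)
    return cells


def json_column(resources):
    """The alternative: all resources packed into one JSON-valued cell."""
    return json.dumps(resources, sort_keys=True)
```

For a package with two fully populated resources, the flattened form still emits 384 cells, 378 of them empty; the JSON column stays compact but leaves the consumer to parse JSON inside CSV, at which point the full JSON dump is the simpler choice.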
> To make it more readable to the human eye, I've put the columns into a
> sensible order (rather than the previous random one) so
> that you get: "id","name","title","version","url","author","author_email"
> etc. This is also changed for the JSON file.
>
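On the consumer side, one way to cope with column order changing between releases (the concern Christos raises at the top of the thread) is to write with an explicit column list and read by header name rather than position. A minimal sketch assuming Python's csv module; only the leading columns named in the message are listed, and appending any remaining columns alphabetically is a hypothetical policy, not what the dump does.

```python
import csv
import io

# Leading columns as listed in the message; the rest ("etc.") are not reproduced.
LEADING_COLUMNS = ["id", "name", "title", "version", "url", "author", "author_email"]


def write_packages(packages, out):
    """Write package dicts with an explicit, stable column order."""
    # Hypothetical policy: any other keys follow the leading ones alphabetically.
    extra = sorted({k for p in packages for k in p} - set(LEADING_COLUMNS))
    writer = csv.DictWriter(out, fieldnames=LEADING_COLUMNS + extra, restval="")
    writer.writeheader()
    writer.writerows(packages)


def read_packages(stream):
    """Read rows keyed by header name, so consumers don't depend on column positions."""
    return list(csv.DictReader(stream))


if __name__ == "__main__":
    buf = io.StringIO()
    write_packages([{"name": "a", "id": "1", "notes": "x"}, {"id": "2", "name": "b"}], buf)
    for row in read_packages(io.StringIO(buf.getvalue())):
        print(row["id"], row["name"])
```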
> One last thing: I've changed the file compression from gzip to zip, so
> that Windows machines can open these files more easily.
>
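For anyone scripting against the zipped dumps, a minimal sketch of reading a CSV straight out of a zip archive with Python's standard zipfile module, without unpacking it to disk first. The UTF-8 encoding and the "first .csv member" rule are assumptions; check the actual archive contents before relying on them.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source, member=None, encoding="utf-8"):
    """Yield rows (dicts keyed by header) from a CSV member of a zip archive.

    zip_source may be a path or a binary file-like object. If member is None,
    the first entry ending in .csv is used (an assumption about the layout).
    """
    with zipfile.ZipFile(zip_source) as zf:
        if member is None:
            csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
            if not csv_names:
                raise ValueError("no .csv member found in archive")
            member = csv_names[0]
        with zf.open(member) as raw:
            # newline="" lets the csv module handle quoted line breaks itself.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.DictReader(text)
```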
> Many thanks for the feedback - I hope these changes cover your wishes.
> Any more comments are most welcome.
>
> David
>
> On 12 May 2010 14:38, koumenides c.l. (clk1v07) <clk1v07 at ecs.soton.ac.uk> wrote:
>> Hi,
>>
>> Thanks for your response. I will keep you posted of any other issues that might arise :)
>>
>> Best,
>> Christos
>> ________________________________________
>> From: okfn.rufus.pollock at googlemail.com [okfn.rufus.pollock at googlemail.com] On Behalf Of Rufus Pollock [rufus.pollock at okfn.org]
>> Sent: 12 May 2010 13:43
>> To: Jonathan Gray
>> Cc: koumenides c.l. (clk1v07); CKAN discuss
>> Subject: Re: CKAN CSV alert
>>
>> Dear Christos,
>>
>> Thanks for reporting this. We'll look into this and try to get this
>> fixed asap. In the meantime you could try using the JSON dump, which
>> is also available.
>>
>> Rufus
>>
>> On 12 May 2010 13:21, Jonathan Gray <jonathan.gray at okfn.org> wrote:
>>> Thanks for flagging this up, Christos! I'm copying this to our
>>> ckan-discuss list.
>>>
>>> Rufus, David, John: any ideas what might be up?
>>>
>>> All the best,
>>>
>>> Jonathan
>>>
>>> On Wed, May 12, 2010 at 1:58 PM, koumenides c.l. (clk1v07)
>>> <clk1v07 at ecs.soton.ac.uk> wrote:
>>>> Hi,
>>>>
>>>> I believe there is an error in all the CSV files in the CKAN dumps that followed the January release.
>>>>
>>>> The error I get when parsing the files is this:  The \\"lower quartile\"\"
>>>> appearing inside an element, which breaks everything after it.
>>>>
>>>> The record in question has the name "ratio_of_lower_quartile_workplace_earnings_to_lower_quartile_house_prices".
>>>>
>>>> Regards,
>>>>
>>>> Christos Koumenides
>>>>
>>>> School of Electronics and Computer Science
>>>> University of Southampton
>>>> Southampton, SO17 1BJ
>>>> United Kingdom
>>>> ______________________
>>>> clk1v07 at ecs.soton.ac.uk
>>>> clk08r at ecs.soton.ac.uk
>>>> (+44) 7975795758
>>>
>>>
>>>
>>> --
>>> Jonathan Gray
>>>
>>> Community Coordinator
>>> The Open Knowledge Foundation
>>> http://blog.okfn.org
>>>
>>> http://twitter.com/jwyg
>>> http://identi.ca/jwyg
>>>
>>
>>
>>
>> --
>> Open Knowledge Foundation
>> Promoting Open Knowledge in a Digital Age
>> http://www.okfn.org/ - http://blog.okfn.org/
>>
>> _______________________________________________
>> ckan-discuss mailing list
>> ckan-discuss at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-discuss
>>
>


