[ckan-dev] experimenting with datapusher on 2.2; possible to integrate dataexplorer in CKAN where messytables fails?

Ross Jones ross at servercode.co.uk
Thu Feb 13 16:25:25 UTC 2014


I think the problem might be worse than you think. There appear to be what are intended as table separators every so often in the file; I guess it was an attempt at merging multiple tables into one file.

Check http://datapipes.okfnlabs.org/csv/html?url=http://www.bis.org/statistics/cbs-hanx9e.csv at line 2285 - and there are others.

Definitely one for http://okfnlabs.org/bad-data/

Sorry I can’t be more useful. I think the header row isn’t being used because of the first empty column, but I can’t think of an easy online tool to help with that, and even if you could fix it, the other ‘tables’ in the doc would cause problems.
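
If you do end up pre-cleaning by hand, something along these lines might be a starting point. It is only a rough sketch using the standard csv module, and it rests on assumptions I have not checked against the whole file: that a row with only its first cell filled is a title/separator line, that a row whose first two cells are empty is a header row, and that everything after a header belongs to that table. The column names "label" and "series_code" are made up for illustration, and the second table in SAMPLE is invented test data, not taken from the BIS file.

```python
import csv
import io


def split_tables(text):
    """Split a multi-table CSV into a list of {"preamble", "header", "rows"} dicts."""
    tables = []
    preamble = []
    current = None
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row]
        if not any(cells):
            continue  # skip fully blank rows
        if cells[0] and not any(cells[1:]):
            # Assumption: only the first cell filled means a title/separator line.
            if current is not None:
                # A title line after data rows starts a new table's preamble.
                current = None
                preamble = []
            preamble.append(cells[0])
        elif not cells[0] and not cells[1] and any(cells[2:]):
            # Assumption: two empty leading cells mark the header row.
            # "label" and "series_code" are hypothetical names for those columns.
            header = ["label", "series_code"] + cells[2:]
            current = {"preamble": preamble, "header": header, "rows": []}
            tables.append(current)
        elif current is not None:
            # Data row: pair each cell with its header name.
            current["rows"].append(dict(zip(current["header"], cells)))
    return tables


# First table: lines shortened from the file; second table: invented for the demo.
SAMPLE = """Preliminary report: January 2014,,,
Total of 24 countries,,,
,,Dec.10,Mar.11
All countries,Q:M:U:B:S:5A:3P,24932945,26375354
All countries,Q:M:U:B:G:5A:3P,4729120,5257768
A second made-up table title,,,
,,Dec.10,Mar.11
Some country,X:Y:Z,1,2
"""

if __name__ == "__main__":
    for table in split_tables(SAMPLE):
        print(table["preamble"], table["header"], len(table["rows"]))
```

Each table comes back with named columns, so the rows could then be pushed to the datastore one table at a time rather than relying on header detection.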

Ross


On 1 Feb 2014, at 13:23, Colum McCoole <colum.mccoole at btinternet.com> wrote:

> Hi,
> I’ve been playing around with datapusher on CKAN 2.2-7.
> In this case, I loaded a new resource pointing to this url: http://www.bis.org/statistics/cbs-hanx9e.csv
> Admittedly, it’s not an ideally structured csv, with lots of miscellaneous headers and dates running across columns.
>  
> Preliminary report: January 2014,,,,,,,,,,,,,
> Table 9E: Consolidated foreign claims and other potential exposures - ultimate risk basis,,,,,,,,,,,,,
> On individual countries by nationality of reporting banks / Amounts outstanding,,,,,,,,,,,,,
> In millions of US dollars,,,,,,,,,,,,,
> Total of 24 countries,,,,,,,,,,,,,
> ,,Dec.10,Mar.11,Jun.11,Sep.11,Dec.11,Mar.12,Jun.12,Sep.12,Dec.12,Mar.13,Jun.13,Sep.13
> All countries,Q:M:U:B:S:5A:3P,24932945,26375354,26905472,26188157,24833052,25793807,25119987,25440726,25434542,25231152,25021882,25430824
> All countries,Q:M:U:B:G:5A:3P,4729120,5257768,5385779,5486604,5185414,5577508,5591167,5682772,5861812,5613650,5700823,5745401
> All countries,Q:M:U:B:F:5A:3P,5637158,5900122,5957440,5755002,5192428,5269278,5035970,5089680,4972414,5102973,4989814,5024444
>  
> Datapusher (with the aid of messytables) copes reasonably well (see screenshot below), although it takes the first row of values as column headers.
>  
> <image001.png>
>  
> Is there any possibility of intervening in how this is parsed?
> I can see various interesting initiatives in okfnlabs, such as dataexplorer and datapipes.
> I know something like dataexplorer would help, but is it possible to integrate that into a CKAN instance in any way?
> If not, is there any way to associate a parsing script (or a datapipes command) with that resource to clean data before it gets pushed to the datastore?
> Failing that, I suppose one would just pre-clean the data and then use ckanapi to push the cleaned data as a data_dict.
>  
> Thanks,
> Colum
>  
