[ckan-dev] experimenting with datapusher on 2.2; possible to integrate dataexplorer in CKAN where messytables fails?

Colum McCoole colum.mccoole at btinternet.com
Sat Feb 1 13:23:46 UTC 2014


Hi,

I've been playing around with datapusher on CKAN 2.2-7.

In this case, I loaded a new resource pointing to this url:
http://www.bis.org/statistics/cbs-hanx9e.csv

Admittedly, it's not an ideally structured CSV: it has several lines of
miscellaneous headers, and the dates run across the columns.

 

Preliminary report: January 2014,,,,,,,,,,,,,
Table 9E: Consolidated foreign claims and other potential exposures - ultimate risk basis,,,,,,,,,,,,,
On individual countries by nationality of reporting banks / Amounts outstanding,,,,,,,,,,,,,
In millions of US dollars,,,,,,,,,,,,,
Total of 24 countries,,,,,,,,,,,,,
,,Dec.10,Mar.11,Jun.11,Sep.11,Dec.11,Mar.12,Jun.12,Sep.12,Dec.12,Mar.13,Jun.13,Sep.13
All countries,Q:M:U:B:S:5A:3P,24932945,26375354,26905472,26188157,24833052,25793807,25119987,25440726,25434542,25231152,25021882,25430824
All countries,Q:M:U:B:G:5A:3P,4729120,5257768,5385779,5486604,5185414,5577508,5591167,5682772,5861812,5613650,5700823,5745401
All countries,Q:M:U:B:F:5A:3P,5637158,5900122,5957440,5755002,5192428,5269278,5035970,5089680,4972414,5102973,4989814,5024444

 

Datapusher (with the aid of messytables) copes reasonably well (see
screenshot below), although it takes the first row of values as column
headers.

 



 

Is there any way to intervene in how this is parsed?

I can see various interesting initiatives in okfnlabs, such as dataexplorer
and datapipes.

I know something like dataexplorer <https://github.com/okfn/dataexplorer>
would help, but is it possible to integrate that into a CKAN instance in any
way?

If not, is there any way to associate a parsing script (or a datapipes
<https://github.com/okfn/datapipes>  command) with that resource to clean
data before it gets pushed to the datastore?

Failing that, I suppose one would just pre-clean the data and then use
ckanapi to push the cleaned data as a data_dict.
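
To make that fallback concrete, here is a minimal sketch of the pre-clean
step for a file shaped like the excerpt above. The function names, the rule
for spotting the header row (first row whose first two cells are empty) and
the column ids "country" and "series_code" are my own assumptions, not
anything datapusher or messytables does. Dots in the period labels are
replaced with underscores purely as a cautious choice for column names. The
push uses ckanapi's RemoteCKAN and the datastore_create action; the URL,
API key and resource id are placeholders to be supplied by the caller.

```python
import csv
import io


def clean_bis_csv(text):
    """Return (fields, records) for a CSV with preamble lines above the header.

    Assumption: the real header is the first row whose first two cells are
    empty and whose third cell holds a period label such as "Dec.10".
    """
    rows = [r for r in csv.reader(io.StringIO(text))
            if any(cell.strip() for cell in r)]  # drop fully blank rows

    header_idx = next(
        i for i, r in enumerate(rows)
        if len(r) > 2 and not r[0].strip() and not r[1].strip() and r[2].strip()
    )
    # Hypothetical naming choice: "Dec.10" becomes "Dec_10".
    periods = [c.strip().replace(".", "_") for c in rows[header_idx][2:]]

    records = []
    for r in rows[header_idx + 1:]:
        rec = {"country": r[0].strip(), "series_code": r[1].strip()}
        for period, value in zip(periods, r[2:]):
            value = value.strip()
            # Anything that is not a plain integer is stored as a null.
            rec[period] = int(value) if value.isdigit() else None
        records.append(rec)

    fields = [{"id": "country", "type": "text"},
              {"id": "series_code", "type": "text"}]
    fields += [{"id": p, "type": "numeric"} for p in periods]
    return fields, records


def push_to_datastore(ckan_url, api_key, resource_id, fields, records):
    """Send the cleaned rows to the DataStore (needs: pip install ckanapi)."""
    from ckanapi import RemoteCKAN  # imported here so cleaning works without it

    ckan = RemoteCKAN(ckan_url, apikey=api_key)
    ckan.action.datastore_create(
        resource_id=resource_id, fields=fields, records=records
    )
```

Usage would be along the lines of fields, records = clean_bis_csv(raw_text)
followed by push_to_datastore(...) with the instance URL, an API key and the
id of the resource created for the file. Depending on how the resource was
created, the DataStore may treat it as read-only and need extra parameters,
which this sketch does not handle.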

 

Thanks,

Colum

 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 30600 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/ckan-dev/attachments/20140201/76065f09/attachment-0002.png>

