[ckan-dev] Offering the same resource in multiple file formats

Derek Hohls dhohls at csir.co.za
Mon Mar 27 10:30:23 UTC 2017


Hi Florian

I am speaking somewhat "from the side" - our group is involved in a CKAN implementation but I am not at the core level. Nonetheless, a large part of my work does deal with data ingestion and processing. Speaking with my 'relational database' hat on, I would think that keeping multiple copies of the same dataset is very problematic. Given how powerful the tools in the Python data science libraries are, I would argue for keeping one version (preferably the original) of the source data, and putting tools or data processing chains into place (these can be run in the background; for example, as Celery tasks) that allow on-the-fly conversion to formats such as Excel or JSON. Add in some caching and the more common requests should be handled fairly efficiently.
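
As a rough illustration of the "one original plus cached on-the-fly conversion" idea, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the `ORIGINALS` store, the `CONVERTERS` registry, and the `get_converted` function are hypothetical names and not part of CKAN, and an in-process `lru_cache` merely stands in for whatever cache a real deployment would use.

```python
import csv
import io
import json
from functools import lru_cache

# Hypothetical stand-in for the single stored original of each resource;
# a real setup would read this from the file store or an upstream URL.
ORIGINALS = {
    "stations": "id,name\n1,North\n2,South\n",
}


def csv_to_json(csv_text):
    """Convert CSV text (with a header row) to a JSON array of objects."""
    rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    return json.dumps(rows)


# Registry of target formats (assumption: one function per format).
# An XLSX converter could be registered here in the same way.
CONVERTERS = {"json": csv_to_json}


@lru_cache(maxsize=128)
def get_converted(resource_id, fmt):
    """Return the resource in the requested format, converting on first use."""
    original = ORIGINALS[resource_id]
    if fmt == "csv":
        return original
    return CONVERTERS[fmt](original)
```

Calling `get_converted("stations", "json")` twice converts only once; the second call is answered from the cache. When the original changes, the cached conversions have to be discarded (here via `get_converted.cache_clear()`), which is the only synchronisation step this design needs.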

A longer-term option would be to renegotiate with the upstream data provider to supply data in the format that most users seem to be asking for (or perhaps one that is most amenable to many transformation options).

Just some ideas for discussion.

Derek


>>> <Florian.Brucker at it.karlsruhe.de> 03/24/17 5:00 PM >>>
I've been thinking a bit about how to present the same resource in multiple formats to the user from a UI perspective. 
 
The obvious way is to create a separate copy of the resource for each secondary format (say, an XLSX-copy of each CSV-resource). This has the benefit that the secondary resource is, from a UI perspective, just another resource, and all of CKAN's features (search, facets, API access, ...) work as expected. A disadvantage, however, is that we now have two (or even more) copies of the same resource that only differ in their format. Not only do all of those copies need to be kept in sync (can be automated, but still), but it might confuse users who now wonder if there are any differences between these resources. 
 
A second possibility would therefore be to somehow "augment" the original resource with the other formats. There are multiple ways of doing this (e.g. injecting conversion links via the templates), but all of these will break many CKAN features. 
 
Finally, one could use a hybrid approach by creating full-blown resources as in the first approach but combining them into a single pseudo-resource for display purposes in the templates. 
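
The grouping step of this hybrid approach could be sketched roughly as follows. This is only an illustration under stated assumptions: resources are plain dicts, and the `derived_from` field (holding the id of the original resource) is a hypothetical extra field, not something CKAN provides out of the box. A template could then render each group as one pseudo-resource with several download formats.

```python
from collections import OrderedDict


def group_by_original(resources):
    """Group resource dicts so each original is listed with its converted copies.

    Assumption: a converted copy carries a hypothetical 'derived_from' field
    containing the id of the resource it was generated from.
    """
    groups = OrderedDict()
    for res in resources:
        key = res.get("derived_from") or res["id"]
        groups.setdefault(key, []).append(res)
    for group in groups.values():
        # Stable sort: the original (no 'derived_from') comes first.
        group.sort(key=lambda r: r.get("derived_from") is not None)
    return list(groups.values())
```

For example, a CSV resource `a`, an XLSX copy `b` with `derived_from` set to `a`, and an unrelated PDF resource `c` end up as two groups: one containing `a` and `b`, and one containing only `c`.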
 
Honestly, I'm not happy with any of these approaches, so I'd love to hear some other ideas on how to tackle this. 
 
 
Regards, 
Florian
  
 
"ckan-dev" <ckan-dev-bounces at lists.okfn.org> wrote on 07.03.2017 14:19:24:
 
 > From: Florian.Brucker at it.karlsruhe.de
 > To: ckan-dev at lists.okfn.org
 > Date: 07.03.2017 14:19
 > Subject: [ckan-dev] Offering the same resource in multiple file formats
 > Sent by: "ckan-dev" <ckan-dev-bounces at lists.okfn.org>
 >
 > Hi everybody, 
 > 
 > I often would like to offer the same resource in multiple file 
 > formats. For example, Excel's auto-import for CSV is rather broken, 
 > so instead of mangling all our CSV-files to suit Excel's needs I'd 
 > rather just offer XLSX-files of the same data in addition to 
 > "standard"-compliant CSV-files for everybody else. 
 > 
 > However, I definitely don't want to manually maintain the separate 
 > versions. Has anybody set up automated ways of doing this? Off the 
 > top of my head, I could imagine 
 > 
 > 1. Generating converted copies when the original resource is created/modified
 > 2. Generating converted copies when they are requested 
 > 
 > Both have their pros and cons, so I'd love to hear some real-world 
 > experiences. 
 > 
 > In addition I'm wondering about the best way to present this choice 
 > to the user. 
 > 
 > 
 > Regards, 
 > Florian
 

