[ckan-dev] Relationship between resource and datastore

Dominik Moritz dominik.moritz at okfn.org
Sun Dec 30 00:34:56 UTC 2012


Hi Lasse,

On 27 Dec 2012, at 13:21, Lasse Vestergaard <ibbernik at gmail.com> wrote:

> Hi all.
> 
> I have been trying to learn CKAN for a couple of weeks, and I have now come
> to a point where I don't get the conceptual idea behind the relationship
> between resource and the datastore.
> 
> As I understand it, you have to attach a datastore to an existing resource in
> CKAN, but there doesn't have to be any resemblance between the datastore
> and the resource. This means that I can have an image as a resource, and
> then have a completely arbitrary datastore related to it. Someone could
> argue that this makes sense because the datastore is collecting data that
> relates to the image - the datastore could have data about how much people
> like the image. That might be a valid argument, but it seems odd, to me,
> that I can upload e.g. a CSV file as a resource, and then put it into the
> datastore through datastorer (I haven't been able to get the datastorer up
> and running yet, but I assume that the datastorer creates a datastore that
> relates to the CSV resource?).

Yes, the DataStorer downloads the file from the resource URL, parses it, and writes the raw data to the DataStore.
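
As a rough illustration of that download, parse and write flow, here is a minimal Python sketch. It is not the DataStorer's actual code: the helper names (parse_csv, build_create_payload), the resource id and the sample data are made up for the example, every column is treated as text, and it assumes the DataStore's datastore_create action takes resource_id, fields and records. The HTTP calls are left out so that the sketch runs offline.

```python
import csv
import io
import json


def parse_csv(text):
    """Parse CSV text into (column names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    records = [dict(row) for row in reader]
    return list(reader.fieldnames or []), records


def build_create_payload(resource_id, csv_text):
    """Build the JSON body for a datastore_create call (assumed shape)."""
    columns, records = parse_csv(csv_text)
    return {
        "resource_id": resource_id,
        # Assumption: every column is stored as text; a real importer
        # would try to guess numeric and date types.
        "fields": [{"id": name, "type": "text"} for name in columns],
        "records": records,
    }


# Invented file content, standing in for what was downloaded from the resource URL.
sample = "name,likes\nimage-a,3\nimage-b,5\n"
payload = build_create_payload("my-resource-id", sample)
print(json.dumps(payload, indent=2))
```

The resulting dictionary would then be sent as JSON in a POST request to the datastore_create action of the CKAN site, together with an API key that is allowed to edit the resource.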

> This means that I have the CSV file as the
> resource and users can download this file. At the same time I can access
> the CSV file data through the datastore. Furthermore this means that I can
> add, update and delete data in the datastore. Let's say I have regularly
> been inserting new rows into the datastore, and after a while I want to get
> a dump of what is going on. If I am a normal user, I would just go and
> download the CSV file, but that would be wrong, because the CSV file
> doesn't hold any of the updates that have been made through the datastore.

The advantage of having a DataStore table for a static CSV file is that the resource's data can be accessed through a defined API, and that the data stays available for as long as CKAN itself is available. I do see the problem that arises once you edit data in the DataStore.

This is how I would handle it: if the DataStore table of a resource will never be edited, use the DataStorer to import the file into the DataStore, and let your users benefit from the powerful API and the availability. Anyone who wants the file can still download the original resource. If, however, the data is edited in the DataStore, use only the DataStore.

Nick Jackson had a great idea: add a converter that turns the JSON into CSV on the fly. I am currently working on an implementation of that at https://github.com/okfn/ckan-datastore-export-service.
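
To make the JSON to CSV idea concrete, here is a small Python sketch of such a conversion. It is only an illustration under assumptions, not the code of the export service: it assumes that a datastore_search response carries fields (each with an id) and records under result, and the function name search_result_to_csv and the sample response are invented for the example.

```python
import csv
import io


def search_result_to_csv(response):
    """Convert a datastore_search style JSON response (as a dict) to CSV text."""
    result = response["result"]
    columns = [field["id"] for field in result["fields"]]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for record in result["records"]:
        # Keys not listed in `columns` are ignored; missing keys become empty cells.
        writer.writerow(record)
    return out.getvalue()


# Invented sample response, shaped like the assumed datastore_search output.
sample_response = {
    "success": True,
    "result": {
        "fields": [{"id": "name", "type": "text"}, {"id": "likes", "type": "int4"}],
        "records": [{"name": "image-a", "likes": 3}, {"name": "image-b", "likes": 5}],
    },
}
print(search_result_to_csv(sample_response))
```

Because the CSV is generated from the current DataStore contents, a download produced this way would include any rows that were added or changed after the original file was imported.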

We are also working on a datastorer implementation as a service which will make the whole process of importing data into the DataStore much easier. There will be no need to install anything in CKAN anymore. However, for now, you will need the CKAN DataStorer extension.

> 
> All in all this means that the users would be disappointed when they find
> out that nothing has happened in relation to the resource. That is just one
> dimension of the issue. As I perceive it the CSV file is useless from the
> very beginning. The only reason why I would have the CSV file in the
> resource is because I wanted to make the template for the datastore table.
> After that the CSV file would be deprecated because it doesn't get updated
> with the datastore. If that is the case, why couldn't I just have the
> datastore as a resource by itself in the first place? I could just set up
> the table template through the datastore API.
> 
> It might easily be me that is missing some point about CKAN.
> 
> Regards
> 
> Lasse Vestergaard (Denmark)

Best,
Dominik
