[ckan-dev] Relationship between resource and datastore

Lasse Vestergaard ibbernik at gmail.com
Mon Dec 31 08:24:04 UTC 2012


Thanks for your replies.

2012/12/30 Dominik Moritz <dominik.moritz at okfn.org>

> Hi Lasse,
>
> On 27 Dec 2012, at 13:21, Lasse Vestergaard <ibbernik at gmail.com> wrote:
>
> > Hi all.
> >
> > I have been trying to learn CKAN for a couple of weeks, and I have now
> > come to a point where I don't get the conceptual idea behind the
> > relationship between resource and the datastore.
> >
> > As I understand it, you have to attach a datastore to an existing resource
> > in CKAN, but there doesn't have to be any resemblance between the datastore
> > and the resource. This means that I can have an image as a resource, and
> > then have a completely arbitrary datastore related to it. Someone could
> > argue that this makes sense because the datastore is collecting data that
> > relates to the image - the datastore could have data about how much people
> > like the image. That might be a valid argument, but it seems odd, to me,
> > that I can upload e.g. a CSV file as a resource, and then put it into the
> > datastore through datastorer (I haven't been able to get the datastorer up
> > and running yet, but I assume that the datastorer creates a datastore that
> > relates to the CSV resource?).
>
> Yes, the DataStorer downloads the file from the url, parses it and writes
> the raw data to the DataStore.
>

I managed to install the Datastorer, but when I create a new CSV resource
it seems to try to update an existing datastore instead of creating a new
one. This seems semantically wrong to me. Furthermore, I can't choose
whether I want the CSV file to be put into the datastore or not - it
happens automatically. This is fine, but wouldn't it yield errors if the
datastorer doesn't recognize the file format?

I'm still testing this, but I just wanted to put it out here in case any
of you have experienced the same issue with the datastore being updated
instead of created.


>
> > This means that I have the CSV file as the
> > resource and users can download this file. At the same time I can access
> > the CSV file data through the datastore. Furthermore this means that I can
> > add, update and delete data in the datastore. Let's say I have regularly
> > been inserting new rows into the datastore, and after a while I want to get
> > a dump of what is going on. If I am a normal user, I would just go and
> > download the CSV file, but that would be wrong, because the CSV file
> > doesn't hold any of the updates that have been made through the datastore.
>
> The advantage of having a datastore table for a static CSV file is that
> there is a defined API to access the data of the resource and that it will
> be available as long as CKAN is available. I see the problem that arises if
> you edit data in the datastore. The way I'd do it is the following: If
> there will never be any edits to the datastore table of a resource, use the
> datastorer to import the file into the datastore and let your users benefit
> from the advantage of the powerful API and availability. In case someone
> wants to download the file, he/she can download the original resource.
> However, if there are edits to the resource in the datastore, only use the
> datastore.


This was also how I thought of it, but how do I *only* use the datastore? I
might be misinterpreting you, but aren't you saying that I can have a
datastore as a resource? That way I wouldn't need to upload a CSV file, and
I could just create a datastore with a specific structure? Or do you just
mean that I shouldn't use the CSV file I uploaded?
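
To make the first reading concrete, here is a rough sketch of what I imagine
"creating a datastore with a specific structure" would look like. I am
assuming the `datastore_create` action under `/api/3/action/`, called with a
`resource_id` and a `fields` list; the site URL, API key, resource id and the
helper name `build_datastore_create_request` are all placeholders I made up.
The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_datastore_create_request(site_url, api_key, resource_id, fields):
    """Build (but do not send) a POST request for the datastore_create action.

    Hypothetical helper: the URL, key and resource id are supplied by the
    caller and are placeholders in this sketch.
    """
    payload = {
        "resource_id": resource_id,  # the existing resource the table belongs to
        "fields": fields,            # column definitions: [{"id": ..., "type": ...}]
    }
    return urllib.request.Request(
        site_url.rstrip("/") + "/api/3/action/datastore_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": api_key,
        },
        method="POST",
    )


# Example table structure for the "how much people like the image" case.
request = build_datastore_create_request(
    "http://localhost:5000",
    "my-api-key",
    "my-resource-id",
    [{"id": "name", "type": "text"}, {"id": "likes", "type": "int"}],
)
print(request.full_url)
print(request.data.decode("utf-8"))
# Passing `request` to urllib.request.urlopen() would actually send it.
```

If this is roughly right, the resource would still have to exist first, which
is exactly the part I am unsure about.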


> Nick Jackson had a great idea to add a converter that converts the JSON to
> CSV on the fly. I am currently working on an implementation for that at
> https://github.com/okfn/ckan-datastore-export-service.
>

Conversion of JSON to CSV is great. I will definitely look into this
extension!
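
Just to check that I understand the idea, here is a minimal sketch of such a
conversion. It is not how the export service does it; I am only assuming that
a datastore search returns a `result` object with a `fields` list (each entry
having an `id`) and a `records` list of row dictionaries. The function name
`records_to_csv` and the sample data are made up.

```python
import csv
import io


def records_to_csv(result):
    """Turn a datastore-style result dict into CSV text.

    Assumes result["fields"] is a list of {"id": ...} dicts and
    result["records"] is a list of row dicts keyed by those ids.
    """
    columns = [field["id"] for field in result["fields"]]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # skip keys that are not declared as fields
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(result["records"])  # missing keys become empty cells
    return buffer.getvalue()


# Made-up sample shaped like the assumed search result.
sample = {
    "fields": [{"id": "name", "type": "text"}, {"id": "likes", "type": "int"}],
    "records": [
        {"name": "a.png", "likes": 3},
        {"name": "b, c.png", "likes": 5},
    ],
}
print(records_to_csv(sample))
```

With something like this, the download would always reflect the current rows
in the datastore rather than the CSV file that was originally uploaded.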


>
> We are also working on a datastorer implementation as a service which will
> make the whole process of importing data into the DataStore much easier.
> There will be no need to install anything in CKAN anymore. However, for
> now, you will need the CKAN DataStorer extension.
>

That's perfect! Do you have any estimates on when that will be released?


>
> >
> > All in all this means that the users would be disappointed when they find
> > out that nothing has happened in relation to the resource. That is just one
> > dimension of the issue. As I perceive it the CSV file is useless from the
> > very beginning. The only reason why I would have the CSV file in the
> > resource is because I wanted to make the template for the datastore table.
> > After that the CSV file would be deprecated because it doesn't get updated
> > with the datastore. If that is the case, why couldn't I just have the
> > datastore as a resource by itself in the first place? I could just set up
> > the table template through the datastore API.
> >
> > It might easily be me who is missing some point about CKAN.
> >
> > Regards
> >
> > Lasse Vestregaard (Denmark)
>
> Best,
> Dominik



/Lasse