[ckan-dev] Relationship between resource and datastore

Mon Dec 31 15:57:13 UTC 2012

Hello,

On 31 Dec 2012, at 09:24, Lasse Vestergaard <ibbernik at gmail.com> wrote:

> Thanks for your replies.
> 
> 2012/12/30 Dominik Moritz <dominik.moritz at okfn.org>
> 
>> Hi Lasse,
>> 
>> On 27 Dec 2012, at 13:21, Lasse Vestergaard <ibbernik at gmail.com> wrote:
>> 
>>> Hi all.
>>> 
>>> I have been trying to learn CKAN for a couple of weeks, and I have now come
>>> to a point where I don't get the conceptual idea behind the relationship
>>> between resource and the datastore.
>>> 
>>> As I understand it, you have to attach a datastore to an existing resource in
>>> CKAN, but there doesn't have to be any resemblance between the datastore
>>> and the resource. This means that I can have an image as a resource, and
>>> then have a completely arbitrary datastore related to it. Someone could
>>> argue that this makes sense because the datastore is collecting data that
>>> relates to the image - the datastore could have data about how much people
>>> like the image. That might be a valid argument, but it seems odd, to me,
>>> that I can upload e.g. a CSV file as a resource, and then put it into the
>>> datastore through datastorer (I haven't been able to get the datastorer up
>>> and running yet, but I assume that the datastorer creates a datastore that
>>> relates to the CSV resource?).
>> 
>> Yes, the DataStorer downloads the file from the url, parses it and writes
>> the raw data to the DataStore.
>> 
> 
> I managed to install the Datastorer, but it seems that it tries to update
> an existing datastore instead of creating a new datastore, when I create a
> new CSV resource. This seems semantically wrong to me?

It does update an existing resource. 

> Furthermore, I can't
> check whether I want the CSV file to be put into the datastore or not - it
> does it automatically. This is fine, but wouldn't it yield errors if the
> datastorer doesn't recognize the file format?

It only tries to import files it can parse (CSV, Excel).

> 
> I'm still making tests on this problem, but I just wanted to put it out
> here, in case any of you have experienced the same issue with datastore
> updating instead of creating.
> 
> 
>> 
>>> This means that I have the CSV file as the
>>> resource and users can download this file. At the same time I can access
>>> the CSV file data through the datastore. Furthermore this means that I can
>>> add, update and delete data in the datastore. Let's say I have regularly
>>> been inserting new rows into the datastore, and after a while I want to get
>>> a dump of what is going on. If I am a normal user, I would just go and
>>> download the CSV file, but that would be wrong, because the CSV file
>>> doesn't hold any of the updates that have been made through the datastore.
>> 
>> The advantage of having a datastore table for a static CSV file is that
>> there is a defined API to access the data of the resource and that it will
>> be available as long as CKAN is available. I see the problem that arises if
>> you edit data in the datastore. The way I'd do it is the following: If
>> there will never be any edits to the datastore table of a resource, use the
>> datastorer to import the file into the datastore and let your users benefit
>> from the advantage of the powerful API and availability. In case someone
>> wants to download the file, he/she can download the original resource.
>> However, if there are edits to the resource in the datastore, only use the
>> datastore.
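
To make the "defined API" point concrete, here is a minimal sketch of reading rows back from a datastore table. It assumes the datastore_search action of the CKAN action API and its resource_id, limit and q parameters; the exact path can differ between CKAN versions. The site URL, the resource id and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def datastore_search_url(site_url, resource_id, limit=5, q=None):
    """Build the URL for a datastore_search call (assumed action name)."""
    params = {"resource_id": resource_id, "limit": limit}
    if q:
        # Optional full-text query across the table.
        params["q"] = q
    return "{0}/api/3/action/datastore_search?{1}".format(
        site_url.rstrip("/"), urllib.parse.urlencode(params))


def fetch_records(url):
    """Fetch the URL and return the list of row dicts from the reply."""
    with urllib.request.urlopen(url) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["result"]["records"]
```

Calling fetch_records(datastore_search_url("http://ckan.example", "abc-123")) would then return the first five rows as dictionaries, provided such a site and resource exist.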
> 
> 
> This was also how I thought of it, but how do I *only* use the datastore? I
> might be misinterpreting you, but aren't you saying that I can have a
> datastore as a resource? This way I don't need to upload a CSV file, and I
> can just create a datastore with a specific structure? Or do you just mean
> that I shouldn't use the CSV file I uploaded?

To be honest, there is no clean way at the moment. You will need to create a new resource with a temporary url and then create the datastore table manually through the API. Then change the url to point to the datastore (or to a url that points to the converter that returns CSV). In this case you won't need the original CSV file, but you will have to do a lot of things manually.
Alternatively, you can add the original resource as a CSV, let the datastorer import it and then change the url of the resource so that it points to the datastore. The second option has the advantage that the data will already be in the datastore, so it is probably the preferred option.

Imho, this is the best way to handle this case ATM. To be honest, we haven't considered this case yet.
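
For the manual route (temporary url, create the table, then repoint the url), a rough sketch of the three API calls could look like the following. It assumes the resource_create, datastore_create and resource_update actions of the CKAN action API and an API key sent in the Authorization header; check the API docs of your CKAN version. The placeholder url, the final url and all helper names are hypothetical.

```python
import json
import urllib.request


def action_request(site_url, action, payload, api_key=None):
    """Build (but do not send) a JSON POST request for a CKAN action."""
    url = "{0}/api/3/action/{1}".format(site_url.rstrip("/"), action)
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = api_key
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


def call(request):
    """Send a prepared request and return the "result" part of the reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["result"]


def create_datastore_only_resource(site_url, api_key, package_id, fields,
                                   final_url, send=call):
    # 1. Create the resource with a temporary, placeholder url.
    resource = send(action_request(site_url, "resource_create", {
        "package_id": package_id,
        "url": "http://example.com/placeholder",  # hypothetical temp url
    }, api_key))
    # 2. Create the datastore table for that resource by hand.
    send(action_request(site_url, "datastore_create", {
        "resource_id": resource["id"],
        "fields": fields,  # e.g. [{"id": "name", "type": "text"}]
    }, api_key))
    # 3. Point the resource url at the datastore (or at a CSV converter).
    resource["url"] = final_url
    return send(action_request(site_url, "resource_update", resource,
                               api_key))
```

The send argument is only there so that the sequence can be tried out without a live CKAN instance; by default the requests are really sent.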

> 
> 
>> Nick Jackson had a great idea to add a converter that converts the JSON to
>> CSV on the fly. I am currently working on an implementation for that at
>> https://github.com/okfn/ckan-datastore-export-service.
>> 
> 
> Conversion of JSON to CSV is great. I will definitely look into this
> extension!

You are very welcome to fork it and send pull requests ;-)
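
To give an idea of what such a conversion involves, here is a small sketch that turns datastore-style JSON (a list of field descriptions plus a list of row dictionaries) into CSV text. This is not the code of the export service; the function name and the sample data are made up for illustration.

```python
import csv
import io


def records_to_csv(fields, records):
    """Render a list of row dicts as CSV text, columns ordered as in fields."""
    names = [field["id"] for field in fields]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=names,
                            extrasaction="ignore",  # skip unknown keys
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keys missing from a record are written as empty cells.
        writer.writerow(record)
    return buffer.getvalue()


fields = [{"id": "name", "type": "text"}, {"id": "likes", "type": "int4"}]
records = [{"name": "a, b", "likes": 3}, {"name": "c"}]
csv_text = records_to_csv(fields, records)
```

The csv module takes care of quoting, so a value such as "a, b" ends up as one quoted cell instead of being split into two columns.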

> 
> 
>> 
>> We are also working on a datastorer implementation as a service which will
>> make the whole process of importing data into the DataStore much easier.
>> There will be no need to install anything in CKAN anymore. However, for
>> now, you will need the CKAN DataStorer extension.
>> 
> 
> That's perfect! Do you have any estimates on when that will be released?

Not really. Between two weeks and two months, depending on my time and on any problems that come up with dependencies.

I suggest that you continue using the old datastorer for now and follow the development at https://github.com/okfn/ckan-importer-service and https://github.com/okfn/ckan-service-provider.

As you can see, we continue to improve the datastorer and your feedback is very welcome.

> 
> 
>> 
>>> 
>>> All in all this means that the users would be disappointed when they find
>>> out that nothing has happened in relation to the resource. That is just one
>>> dimension of the issue. As I perceive it the CSV file is useless from the
>>> very beginning. The only reason why I would have the CSV file in the
>>> resource is because I wanted to make the template for the datastore table.
>>> After that the CSV file would be deprecated because it doesn't get updated
>>> with the datastore. If that is the case, why couldn't I just have the
>>> datastore as a resource by itself in the first place? I could just set up
>>> the table template through the datastore API.
>>> 
>>> It might easily be me who is missing some point about CKAN.
>>> 
>>> Regards
>>> 
>>> Lasse Vestergaard (Denmark)
>> 
>> Best,
>> Dominik
> 
> 
> 
> /Lasse

Dominik
