[ckan-dev] linking data in private S3 buckets

David Raznick david.raznick at okfn.org
Wed Feb 19 15:58:08 UTC 2014


On 19 February 2014 12:37, Anton Lundin <anton at dohi.se> wrote:
> Hello.
>
> This choice is a showstopper for upgrading to 2.2 for us.
>
> In our usage of Ckan, we make extensive use of S3 to store files and
> have S3 do all the heavy lifting for us and that makes the database the
> only state we need to keep on the machine.
>
>
> I haven't had the time to dig deeper into the new implementation, but
> we would need to extend it to store the files in S3 before we can
> upgrade.
>
> I've seen the ckanext-s3archive extension, but that only moves the files
> to S3 after they have been uploaded to local disk.

I would recommend this as a viable option nonetheless. You can run
ckanext-s3archive on a regular basis, every hour or even in a
continuous loop if you like.
The issue with using S3 directly is that, for uploads to be reliable,
you have to upload from the browser straight to S3 with client-side
JavaScript. This has several drawbacks:

* There is no way to ensure that files in S3 stay aligned with
resources in CKAN, i.e. there is no atomic way of knowing that both the
resource was created and the file was uploaded (with the new
implementation this is a single POST request). So you would most likely
end up with files in S3 that do not actually belong to a resource.
* Privacy of the S3 bucket is harder to implement. In the 2.1 version of
CKAN, data uploaded to S3 is not private.
* It is not really accessible if you do not have JavaScript enabled. We
make it a policy not to require JavaScript for core CKAN
functionality.
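To make the first drawback concrete: files that reach S3 without a matching resource have to be found and cleaned up after the fact. Below is a minimal Python sketch of that reconciliation step. It assumes a hypothetical key layout of `resources/<resource_id>/<filename>`, and that you have already fetched the bucket's key listing and the set of resource ids from CKAN (neither fetch is shown):

```python
from typing import Iterable, List, Set


def find_orphaned_keys(s3_keys: Iterable[str], resource_ids: Set[str],
                       prefix: str = "resources/") -> List[str]:
    """Return S3 keys that do not belong to any known CKAN resource.

    Assumes a hypothetical key layout "<prefix><resource_id>/<filename>".
    Keys outside the prefix are ignored; keys under the prefix that do not
    match the layout are reported as orphans.
    """
    orphans = []
    for key in s3_keys:
        if not key.startswith(prefix):
            continue  # not managed by this layout, leave it alone
        rest = key[len(prefix):]
        resource_id, sep, _filename = rest.partition("/")
        if not sep or resource_id not in resource_ids:
            orphans.append(key)
    return sorted(orphans)
```

Note the race this cannot avoid: a file whose resource is still being created looks orphaned, so in practice you would only delete keys older than some grace period. That gap is exactly the missing atomicity described above.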

The main downside to using ckanext-s3archive is that the files are
temporarily on disk, possibly causing space issues, but as mentioned they
can be moved off disk very regularly.
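Here is a minimal Python sketch of the "move files off disk regularly" pattern. It is not ckanext-s3archive's actual code. `upload(local_path, key)` is a hypothetical callable you would back with an S3 client, and the whole storage directory is assumed to be safe to move:

```python
import os
import time
from typing import Callable, List


def archive_once(storage_dir: str, upload: Callable[[str, str], None]) -> List[str]:
    """Push every file under storage_dir to remote storage, then delete the local copy.

    `upload(local_path, key)` is a hypothetical callable (e.g. a thin wrapper
    around an S3 client). It must raise on failure so the local file is kept
    and retried on the next pass.
    """
    moved = []
    for root, _dirs, files in os.walk(storage_dir):
        for name in files:
            path = os.path.join(root, name)
            # Use the path relative to the storage root as the object key.
            key = os.path.relpath(path, storage_dir).replace(os.sep, "/")
            upload(path, key)
            os.remove(path)  # only reached if upload() did not raise
            moved.append(key)
    return sorted(moved)


def archive_forever(storage_dir: str, upload: Callable[[str, str], None],
                    interval_seconds: int = 3600) -> None:
    """Run archive_once in a loop; 3600 seconds corresponds to "every hour"."""
    while True:
        archive_once(storage_dir, upload)
        time.sleep(interval_seconds)
```

A local file is removed only after `upload` returns without raising, so a failed upload is simply retried on the next pass. A real job would also need to skip files that are still being written and make sure CKAN then serves the S3 copy. The sketch handles neither.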

For very large files, uploading to S3 directly and doing what
Sefan says makes sense. You can also get much more reliable uploads to
S3 using a desktop client (as you can do multipart, resumable uploads),
and that is pretty much impossible to do using just JavaScript (the
only implementations of this are proprietary Java applets).
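The bookkeeping that makes a desktop client's uploads resumable can be sketched in a few lines of Python. S3 multipart uploads require every part except the last to be at least 5 MiB, cap a part at 5 GiB, and allow at most 10,000 parts. The sketch only plans the byte ranges and skips parts already recorded as done. The actual initiate/upload-part/complete calls are not shown, and `completed` is assumed to come from the client's own log or from S3's ListParts operation:

```python
from typing import Iterable, List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024          # S3 minimum for every part except the last
MAX_PART_SIZE = 5 * 1024 * 1024 * 1024   # S3 maximum size of a single part
MAX_PARTS = 10000                        # S3 limit on parts per multipart upload

Part = Tuple[int, int, int]  # (part_number, byte_offset, length)


def plan_parts(file_size: int, part_size: int = MIN_PART_SIZE) -> List[Part]:
    """Split a file into parts; S3 part numbers start at 1.

    A zero-byte file yields no parts; send it with a single ordinary PUT instead.
    """
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size outside the S3 limits")
    parts = []
    offset, number = 0, 1
    while offset < file_size:
        length = min(part_size, file_size - offset)  # last part may be short
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts


def remaining_parts(plan: List[Part], completed: Iterable[int]) -> List[Part]:
    """Resume an interrupted upload by dropping parts already uploaded."""
    done = set(completed)
    return [part for part in plan if part[0] not in done]
```

Because each part is independent, a dropped connection only costs the part in flight, which is what a single browser POST cannot offer.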

David


>
>
> //Anton
>
>
>
> On 19 February, 2014 - Nigel Babu wrote:
>
>> Hello Ivan,
>>
>> On ckan 2.2 and above, we removed the support for external filestores. Only
>> local filestores are supported. The old implementation was causing more
>> trouble than it's worth. We will, in the future, build an interface for
>> extensions to support multiple external filestores.
>>
>> Nigel Babu
>>
>> Developer  |  @nigelbabu <https://twitter.com/nigelbabu>
>>
>> The Open Knowledge Foundation <http://okfn.org/>
>>
>> Empowering through Open Knowledge
>>
>> http://okfn.org/  |  @okfn <http://twitter.com/OKFN>  |  OKF on
>> Facebook<https://www.facebook.com/OKFNetwork> |
>> Blog <http://blog.okfn.org/>  |  Newsletter<http://okfn.org/about/newsletter>
>>
>>  CKAN | http://ckan.org/ | @CKANproject
>> <http://twitter.com/CKANproject> |the world's leading open-source data
>> portal platform
>>
>>
>> On 12 February 2014 21:06, Ivan <vanzaj at gmail.com> wrote:
>>
>> > Hello,
>> >
>> > Sorry if I'm missing something obvious. I can't find any info in the docs,
>> > wikis, github issues, or elsewhere.
>> > Is there a way to create a private dataset linked to a file stored in a
>> > private S3 bucket?
>> >
>> > I have ofs.aws_access_key_id, and ofs.aws_secret_access_key in my
>> > <deploy>.ini, but it doesn't seem to be enough (I know it's not an auth
>> > issue as s3cmd with the same keys from the same host works fine). This is
>> > on ckan 2.3a.
>> >
>> > thanks,
>> > Ivan
>> >
>> >
>> > _______________________________________________
>> > ckan-dev mailing list
>> > ckan-dev at lists.okfn.org
>> > https://lists.okfn.org/mailman/listinfo/ckan-dev
>> > Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>> >
>
>
>
> --
> Anton Lundin
>
> anton at dohi.se
> +46702-161604
>
> http://www.dohi.se/


