[ckan-discuss] Where do city and other governments store their CKAN data?

Florian May florian.wendelin.mayer at gmail.com
Thu Mar 20 09:28:47 UTC 2014


Hi Mark,

I work for the Western Australian Department of Parks & Wildlife. We have
created a Docker [0] image for CKAN (as well as for a few other packages)
[1] to provide a few handy CKAN extensions, and to make customisation and
deployment as simple as possible while separating code and data. Bragging
rights: once the base image is built, spinning up a CKAN instance
end-to-end takes about 10 minutes.
We have just begun to run two of those CKANs - one as an internal data
catalog, one as a public-facing catalog (currently branded as "demo", will
move to data.dpaw.wa.gov.au).

Content: We have a 90-9-1 volume ratio in our data - 90% is raster data
(hosted as WMS), 9% is vector data (hosted as shapefiles on file shares
and WMS services), and the remaining 1% is text data (spreadsheets and
relational databases of observation and measurement data from our research
and monitoring activities). Only the text data will be uploaded directly to
CKAN. All sensor streams go onto OPeNDAP servers and are accessed via web
services from there.
We're in the lucky position to be one fiber-optic cable away from the new
Pawsey petascale data center, which will store radio-astronomy data from
the Square Kilometre Array. We can store, if need be, all of our bulky data
on scraps of storage falling off their enormous radio-astronomy-sized table.

Internally, we store text data (as CSV with WGS 84 decimal lat/lon and
timestamps per record), the convex hull of the total spatial extent of the
dataset in the "spatial" field as used by the ckanext-spatial extension,
the actual spatial footprint of our transects (from where the data is
collected) as polygons in a GeoJSON file, and any products derived from the
dataset. Unfortunately the server is only accessible from inside our
intranet, so you'll have to take my word for it.
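
To illustrate the "spatial" field mentioned above, here is a minimal Python
sketch of deriving a convex hull from per-record lon/lat values in a CSV and
serialising it as a GeoJSON Polygon string, which is the kind of value
ckanext-spatial reads from the "spatial" extra. It is not our production
code: the column names "lon" and "lat" and the function names are assumptions
made for the example, and the hull is computed with the standard monotone
chain algorithm using only the standard library.

```python
import csv
import io
import json


def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone chain convex hull; returns vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def spatial_extra_from_csv(csv_text, lon_field="lon", lat_field="lat"):
    """Build a GeoJSON Polygon string from the records' coordinates.

    The column names are hypothetical; adjust them to the actual CSV header.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    points = [(float(r[lon_field]), float(r[lat_field])) for r in rows]
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    # GeoJSON uses [lon, lat] order and a closed ring (first == last).
    ring = [list(p) for p in hull] + [list(hull[0])]
    return json.dumps({"type": "Polygon", "coordinates": [ring]})
```

The returned string could then be stored as the value of a dataset extra
with the key "spatial". For a real deployment, a geometry library would
probably be a better choice than a hand-rolled hull.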

On the external-facing catalog [2], we share the metadata, contact details,
total spatial extent, plus (only if cleared for public release) the data.
We keep the two catalogs completely separate as our datasets are very
sensitive (we monitor threatened species - we don't want to point the
general public towards the last known rare orchid, or the best fishing
spots) and we'd rather draw the line between two databases than inside the
same CKAN with the "public" and "private" settings.

Re deployment: We use Docker images running inside Ubuntu 12.04 LTS VMs
running on Amazon AWS as well as local VMs. For dev and test, our CKANs run
in an Ubuntu 13.04 VM.
Our IT crew loves Docker and Linux containers as they provide all of the
benefits of VMs, but none of the bloat.

We'd be glad to receive your feedback on our docker image and deployment
process!

Cheers,

Florian Mayer
Marine Science Information Management
Department of Parks and Wildlife WA

[0] https://www.docker.io/
[1] https://bitbucket.org/dpaw/dpaw_docker/src
[2] http://data-demo.dpaw.wa.gov.au/




On Thu, Mar 20, 2014 at 4:43 PM, Antoine Logean
<antoine.logean at opendata.ch> wrote:

> Hi Mark,
>
> Very good point. I would also be very interested to learn what the best
> alternatives are for covering this operational side of CKAN. At the company
> (~10,000 employees) where I work, I have started a pilot to evaluate how we
> can use CKAN as an internal data catalog. For us it is clear that CKAN will
> contain ONLY metadata and so should remain a catalog. The actual storage of
> the data (with all the associated non-functional requirements) is handled
> by other dedicated platforms. For the moment our CKAN instance is running
> on a virtual Linux machine in an internal cloud platform.
>
> I look forward to the others' answers.
>
> Kind regards
>
> Antoine
>
> On Wednesday, 19 March 2014, Mark Boyd <mark at mgboyd.com> wrote:
>
>> Hi all
>>
>> I hope it is alright to ask this here in the CKAN email discussion list.
>>
>> I am a follower of CKAN and have written about the platform several times
>> for ProgrammableWeb. I am currently working on an article about the new
>> FI-Ware open source platform that I understand will partner with CKAN in
>> the near future. FI-Ware is, in part, aiming to set up data centers to
>> provide governments (and others) with independent distributed data storage
>> solutions for their smart cities projects, and for storage of their open
>> data platforms.
>>
>> This leads me to ask existing CKAN users: *How do you manage your CKAN
>> instances and open data storage now?* Are you using cloud storage
>> (Amazon?) or internal servers? If open data is to grow and, at a (smart)
>> city level, include real-time sensor data, what storage needs will you
>> have and what is the current thinking on solving this challenge?
>>
>> Again, I apologise if this is inappropriate to ask on this list. Feel
>> free to email me directly if you prefer.
>>
>> Thanks for your help and for making CKAN such a great public resource.
>>
>> Mark Boyd
>> mark at mgboyd.com
>>
>> Web: http://www.programmableweb.com/profile/MarkBoyd
>> Twitter: @mgboydcom
>> Skype: mark.boyd.es
>> Google+: https://plus.google.com/100669793976591887741/posts/p/pub
>> LinkedIn: http://es.linkedin.com/pub/mark-boyd/36/869/283
>> Phone: +34 650 527 143
>>
>
>
> --
>
> Opendata.ch - Enabling Open Government Data in Switzerland
> Antoine Logean | Founding Board Member | Community & Communication FR +41 79 3518482 | antoine.logean at opendata.ch | http://twitter.com/ecolix