[ckan-dev] road map for horizontal scaling?

Fawcett, David (MNIT) David.Fawcett at state.mn.us
Thu Oct 6 15:02:51 UTC 2016


RMX,

We currently don’t store many, if any, of the datasets in the database.  We put CKAN in front of an internal data distribution system, with our CKAN instance essentially becoming just another node on the system.  When a dataset is updated in the system, it gets pushed out to all designated nodes, and we run a script nightly to read dataset metadata and push new/updated records to CKAN via API.
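Below is a minimal sketch of what such a nightly sync step might look like. It is not the script MNIT actually runs. It assumes Python 3 and uses only the standard CKAN Action API calls `package_show`, `package_create` and `package_update`. `CKAN_URL`, `API_KEY`, `NotFound`, `call_action` and `upsert_dataset` are hypothetical names chosen for illustration. Because `package_update` replaces the whole dataset, the sketch merges new fields into the record returned by `package_show` before sending it back.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Dict

CKAN_URL = "https://ckan.example.org"  # hypothetical CKAN site
API_KEY = "YOUR-API-KEY"               # hypothetical; sent in the Authorization header

ActionFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class NotFound(Exception):
    """The requested CKAN object does not exist (HTTP 404 from the Action API)."""


def call_action(action: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON body to /api/3/action/<action> and return its 'result' member."""
    req = urllib.request.Request(
        f"{CKAN_URL}/api/3/action/{action}",
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": API_KEY},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            raise NotFound(data.get("id")) from err
        raise
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return body["result"]


def upsert_dataset(pkg: Dict[str, Any], action: ActionFn = call_action) -> str:
    """Create the dataset if CKAN has never seen it, otherwise update it in place.

    Returns the name of the action that was called.
    """
    try:
        existing = action("package_show", {"id": pkg["name"]})
    except NotFound:
        action("package_create", pkg)
        return "package_create"
    # package_update replaces the whole dataset, so merge the new fields into
    # the current record rather than sending only the changed ones.
    action("package_update", {**existing, **pkg})
    return "package_update"
```

A nightly job would call `upsert_dataset` once per new or changed metadata record. The `action` parameter is injectable so the logic can be exercised without a live CKAN site.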

Here is an example dataset (we call them resources because they include web apps, desktops, and data):
https://gisdata.mn.gov/dataset/env-buffer-protection-mn

Most of the info on the page comes from the spatial metadata: the overview text comes from the metadata Abstract element, the tags come from the metadata keywords, etc.
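The field mapping described above could be sketched roughly as follows. This assumes the spatial metadata is FGDC CSDGM-style XML with the element paths shown in the docstring; the thread does not say which metadata standard is used. `metadata_to_package` is a hypothetical helper. It fills CKAN's `title`, `notes` (the dataset description) and `tags` fields and uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def metadata_to_package(xml_text: str, name: str) -> Dict[str, Any]:
    """Map an FGDC-style metadata record onto CKAN dataset fields.

    Assumed element paths (FGDC CSDGM): idinfo/citation/citeinfo/title,
    idinfo/descript/abstract and idinfo/keywords/theme/themekey.
    """
    root = ET.fromstring(xml_text)
    title = (root.findtext("idinfo/citation/citeinfo/title") or "").strip()
    abstract = (root.findtext("idinfo/descript/abstract") or "").strip()
    keywords = [
        el.text.strip()
        for el in root.iterfind("idinfo/keywords/theme/themekey")
        if el.text and el.text.strip()
    ]
    return {
        "name": name,
        "title": title or name,
        "notes": abstract,  # CKAN displays 'notes' as the dataset description
        # dict.fromkeys drops duplicate keywords while keeping their order
        "tags": [{"name": kw} for kw in dict.fromkeys(keywords)],
    }
```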

Our state manages a lot of data in ESRI’s proprietary file geodatabase format, but to make the data accessible, we automatically generate shapefile and geopackage copies of the data and publish them as well.  This allows people to access the data without expensive licenses and proprietary software.

In this example, you can also see that there is a link to view the full metadata record, and because this resource has an associated Web map, there is a button to go there too.

The file-based datasets and metadata documents are not stored on the same server as our CKAN instance.  They are on a different FTP server.  E.g. ftp://ftp.gisdata.mn.gov/pub/gdrs/data/pub/us_mn_state_dnr/env_buffer_protection_mn/shp_env_buffer_protection_mn.zip
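Because the files live on the FTP server, CKAN only needs resource entries whose `url` points there. The sketch below builds such entries. The path layout is inferred from the single shapefile URL above. The `formats` mapping and any prefix other than `shp` are hypothetical, and so is the `ftp_resources` helper. A list like this can be sent as the `resources` field of a dataset dict in `package_create` or `package_update`.

```python
from typing import Dict, List

# Base path inferred from the example shapefile URL in this message.
FTP_BASE = "ftp://ftp.gisdata.mn.gov/pub/gdrs/data/pub"


def ftp_resources(org: str, dataset: str, formats: Dict[str, str]) -> List[Dict[str, str]]:
    """Build CKAN resource dicts that link to files on the external FTP server.

    `formats` maps a CKAN format label (e.g. "SHP") to the file-name prefix
    used on the server (e.g. "shp"); prefixes other than "shp" are assumptions.
    """
    base = f"{FTP_BASE}/{org}/{dataset}"
    return [
        {"name": f"{label} download", "format": label, "url": f"{base}/{prefix}_{dataset}.zip"}
        for label, prefix in formats.items()
    ]
```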

David.



From: ckan-dev [mailto:ckan-dev-bounces at lists.okfn.org] On Behalf Of Ruima E.
Sent: Thursday, October 06, 2016 1:08 AM
To: CKAN Development Discussions <ckan-dev at lists.okfn.org>
Subject: Re: [ckan-dev] road map for horizontal scaling?

Thank you David,

That is very good to know.
Do all those datasets fit on one machine?
Are you using PostgreSQL to store the datasets, or just the metadata?

Best regards,
RMX

On Thu, Oct 6, 2016 at 3:07 AM, Fawcett, David (MNIT) <David.Fawcett at state.mn.us> wrote:
RMX,

Our US state is running CKAN on Postgres.  We currently have about 600 datasets, and we are not anywhere close to being limited by the database.

data.gov has about 190,000 datasets and performs fine.

David.
________________________________
From: ckan-dev [ckan-dev-bounces at lists.okfn.org] on behalf of Ruima E. [ruimaximo at gmail.com]
Sent: Wednesday, October 05, 2016 2:40 PM
To: CKAN Development Discussions
Subject: Re: [ckan-dev] road map for horizontal scaling?
Thank you Tim!
I am asking these questions because I am considering installing CKAN as a data hub for a city. It seems a very promising idea, but I am concerned that if the number of datasets grows and we need to distribute it across several machines, PostgreSQL might become a bottleneck and a headache.
When I think about scale, I have Hadoop in mind as an example. If the datasets no longer fit on one machine, you just add another node, edit a few text files, and it works seamlessly. I am afraid that this is not the case with PostgreSQL, or am I wrong?

Best regards,
RMX


On Wed, Oct 5, 2016 at 8:52 PM, Timothy Giles <timothy.giles at slu.se> wrote:

Hi RMX.

Could you give a concrete example of what you mean by scale? Since this is a dev forum/mailing list, I think it would be helpful to quantify your issue(s)/concern(s). There are instances of CKAN with hundreds of thousands and even millions of datasets, as well as individual datasets that are extremely large (hundreds of GBs).

MvH Tim

________________________________
From: ckan-dev <ckan-dev-bounces at lists.okfn.org> on behalf of Ruima E. <ruimaximo at gmail.com>
Sent: 05 October 2016 02:40 PM
To: ckan-dev at lists.okfn.org
Subject: [ckan-dev] road map for horizontal scaling?

Hi,

At the moment CKAN relies on PostgreSQL as a data store. I was shocked when I found that such a nice project relies on a data store that is not suitable for scaling. Open data in smart cities is expected to be Big Data and to scale, so a store that cannot scale could jeopardize the success of the whole initiative in the near future.

Is scaling using open source technologies part of the road map for CKAN?

Thank you,
RMX

_______________________________________________
ckan-dev mailing list
ckan-dev at lists.okfn.org
https://lists.okfn.org/mailman/listinfo/ckan-dev
Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev

