[ckan-dev] road map for horizontal scaling?

Ruima E. ruimaximo at gmail.com
Thu Oct 13 06:57:19 UTC 2016


I second this.
We are in the same situation. We are a university that wants to build a
data capture framework (Hadoop/ Cassandra/ Spark) for the data provider in
our city and expose it to the community via CKAN.
Thank you!
RMX

On Thu, Oct 13, 2016 at 4:48 AM, Claire Herbert <Claire.Herbert at umanitoba.ca> wrote:

> Sounds very interesting, David. Would you be able to share a high-level
> diagram of the architecture? It sounds potentially very useful in helping
> organizations like ours (a university) plan a larger deployment.
>
>
> Claire
>
>
>
>
>
>
> ------------------------------
> *From:* Fawcett, David (MNIT) <David.Fawcett at state.mn.us>
> *Sent:* 06 October 2016 10:02
> *To:* CKAN Development Discussions
> *Subject:* Re: [ckan-dev] road map for horizontal scaling?
>
>
> RMX,
>
>
>
> We currently don’t store many, if any, of the datasets in the database.
> We put CKAN in front of an internal data distribution system, with our CKAN
> instance essentially becoming just another node on the system.  When a
> dataset is updated in the system, it gets pushed out to all designated
> nodes, and we run a script nightly to read dataset metadata and push
> new/updated records to CKAN via API.
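A minimal sketch of what the "push new/updated records to CKAN via API" step could look like, assuming the CKAN Action API (`package_create` / `package_update`) and an API key sent in the `Authorization` header. This is not David's actual script: the site URL, the key, the helper names and the record layout are all hypothetical.

```python
import json
import urllib.request


def build_action_request(site_url, action, payload, api_key):
    """Build (but do not send) a POST request for a CKAN Action API call.

    site_url and api_key are placeholders for a real CKAN site and a
    real user's API key.
    """
    url = f"{site_url.rstrip('/')}/api/3/action/{action}"
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", "Authorization": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def plan_sync(records, existing_names):
    """Pair each record with the action needed to bring CKAN up to date.

    records: full dataset dicts, each with at least a "name" key.
    existing_names: names of datasets already present in CKAN.
    """
    return [
        ("package_update" if record["name"] in existing_names else "package_create", record)
        for record in records
    ]
```

Each planned pair could then be turned into a request with `build_action_request(...)` and sent with `urllib.request.urlopen(request)`. Note that `package_update` expects the full dataset dict, not just the changed fields.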
>
>
>
> Here is an example dataset (we call them resources because they include
> web apps, desktops, and data):
>
> https://gisdata.mn.gov/dataset/env-buffer-protection-mn
>
>
>
> Most of the info on the page comes from the spatial metadata.  The
> overview text comes from the metadata Abstract element, the tags come from
> the metadata keywords, etc.
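The mapping described above (Abstract to overview text, keywords to tags) might be sketched as follows, assuming the metadata has already been parsed into a plain dict. The input keys (`title`, `abstract`, `keywords`) and both function names are hypothetical; the output keys (`name`, `title`, `notes`, `tags`) are standard CKAN dataset fields. Deriving `name` from the title is only an assumption: the example dataset above evidently uses its own identifier (`env-buffer-protection-mn`).

```python
import re


def slugify(title):
    """Lowercase the title and collapse runs of other characters into '-'.

    CKAN dataset names may only contain lowercase alphanumerics, '-' and '_'.
    """
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def metadata_to_package(record):
    """Map a parsed metadata record to a CKAN dataset dict."""
    return {
        "name": slugify(record["title"]),
        "title": record["title"],
        # The metadata Abstract becomes the dataset description ("notes").
        "notes": record["abstract"],
        # Each non-empty keyword becomes a tag; CKAN expects dicts with a "name".
        "tags": [{"name": kw.strip()} for kw in record["keywords"] if kw.strip()],
    }
```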
>
>
>
> Our state manages a lot of data in ESRI’s proprietary file geodatabase
> format, but to make the data accessible, we automatically generate
> shapefile and geopackage copies of the data and publish them as well.  This
> allows people to access the data without expensive licenses and proprietary
> software.
>
>
>
> In this example, you can also see that there is a link to view the full
> metadata record, and this resource has an associated Web map, so there is
> a button to go there too.
>
>
>
> The file-based datasets and metadata documents are not stored on the same
> server as our CKAN instance.  They are on a different FTP server.  E.g.
> ftp://ftp.gisdata.mn.gov/pub/gdrs/data/pub/us_mn_state_dnr/env_buffer_protection_mn/shp_env_buffer_protection_mn.zip
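Because the files live on a separate server, the CKAN side only needs to record a link to each one. A hedged sketch of such a link-only resource entry, using the standard CKAN resource fields `url`, `name` and `format`; the helper name is hypothetical and the format label is supplied by the caller rather than inferred:

```python
import posixpath
from urllib.parse import urlparse


def external_resource(url, fmt):
    """Describe a file hosted elsewhere as a CKAN resource dict.

    Nothing is uploaded to CKAN; the resource just points at the URL.
    """
    filename = posixpath.basename(urlparse(url).path)
    return {"url": url, "name": filename, "format": fmt}
```

A list of such dicts can be placed under the `resources` key of a dataset dict.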
>
>
>
> David.
>
>
>
>
>
>
>
> *From:* ckan-dev [mailto:ckan-dev-bounces at lists.okfn.org] *On Behalf Of* Ruima E.
> *Sent:* Thursday, October 06, 2016 1:08 AM
> *To:* CKAN Development Discussions <ckan-dev at lists.okfn.org>
> *Subject:* Re: [ckan-dev] road map for horizontal scaling?
>
>
>
> Thank you David,
>
>
>
> That is very good to know.
>
> Do all those datasets fit on one machine?
>
> Are you using PostgreSQL to store the datasets, or just the metadata?
>
>
>
> Best regards,
>
> RMX
>
>
>
> On Thu, Oct 6, 2016 at 3:07 AM, Fawcett, David (MNIT) <
> David.Fawcett at state.mn.us> wrote:
>
> RMX,
>
> Our US state is running CKAN on Postgres.  We currently have about 600
> datasets, and we are not anywhere close to being limited by the database.
>
> data.gov has about 190,000 datasets and performs fine.
>
> David.
> ------------------------------
>
> *From:* ckan-dev [ckan-dev-bounces at lists.okfn.org] on behalf of Ruima E. [
> ruimaximo at gmail.com]
> *Sent:* Wednesday, October 05, 2016 2:40 PM
> *To:* CKAN Development Discussions
> *Subject:* Re: [ckan-dev] road map for horizontal scaling?
>
> Thank you Tim!
>
> I am asking these questions because I am considering installing CKAN as
> a data hub for a city. It seems a very promising idea, but I am concerned
> that if the number of datasets grows and we need to distribute it
> across several machines, PostgreSQL might become a bottleneck
> and a headache.
>
> When I think about scale I have in mind the example of Hadoop. If tomorrow
> the datasets cannot fit on one machine, you just add one more node, edit a few
> text files, and it works seamlessly. I am afraid that with PostgreSQL that is
> not the case, or am I wrong?
>
>
>
> Best regards,
>
> RMX
>
>
>
>
>
> On Wed, Oct 5, 2016 at 8:52 PM, Timothy Giles <timothy.giles at slu.se>
> wrote:
>
> Hi RMX.
>
>
>
> I wonder if you can give a concrete example of what you mean by scale?
> Since this is a dev forum/mailing list, I think it would be helpful to
> quantify your issue(s) / concern(s). There are instances of CKAN with
> hundreds of thousands and millions of datasets, as well as individual
> datasets that are extremely large (hundreds of GB).
>
>
>
> MvH Tim
>
>
>
>
>
>
> ------------------------------
>
> *From:* ckan-dev <ckan-dev-bounces at lists.okfn.org> on behalf of Ruima E. <
> ruimaximo at gmail.com>
> *Sent:* 05 October 2016 02:40 PM
> *To:* ckan-dev at lists.okfn.org
> *Subject:* [ckan-dev] road map for horizontal scaling?
>
>
>
> Hi,
>
>
>
> At the moment CKAN relies on PostgreSQL as a data store. I was shocked
> when I found that such a nice project relies on a data store that is not
> suitable for scaling. Open data in smart cities is expected to be Big Data and
> is expected to scale, so this could jeopardize the success of the whole
> initiative in the near future.
>
>
>
> Is scaling by using open source technologies part of the road map for
> CKAN?
>
>
>
> Thank you,
>
> RMX
>
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev