[ckan-dev] road map for horizontal scaling?

Adrià Mercader adria.mercader at okfn.org
Thu Oct 6 11:03:36 UTC 2016


Hi RMX,

As others on this thread have said, Postgres won't be a blocker in
terms of growth of your instance (Postgres can absolutely be scaled
horizontally if you plan your architecture well). There are also
techniques at the CKAN level that can improve performance for truly
massive instances (in terms of number of datasets, i.e. metadata). I
think that you are confusing the storage of the actual data with the
number of datasets. As Florian says, the DataStore extension is
optional, and of course if you need to publish Big Data sources you
would need to integrate with tools suited for the job, like Hadoop,
BigQuery, etc.

So don't let this scare you away from CKAN! :)

Adrià


On 6 October 2016 at 09:40, Angelos Tzotsos <gcpp.kalxas at gmail.com> wrote:
> Last time I checked, the data.gov database was about 200 GB, and it is
> obviously not on a single machine.
> Also, the 190,000 datasets listed on data.gov are actually dataset
> collections; the number of single datasets is about 2 million.
>
> http://catalog.data.gov/csw-all?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementsetname=brief
>
> Best,
> Angelos
>
>
>
> On 10/06/2016 09:08 AM, Ruima E. wrote:
>
> Thank you David,
>
> That is very good to know.
> Do all those datasets fit on one machine?
> Are you using PostgreSQL to store the datasets, or just the metadata?
>
> Best regards,
> RMX
>
> On Thu, Oct 6, 2016 at 3:07 AM, Fawcett, David (MNIT) <
> David.Fawcett at state.mn.us> wrote:
>
> RMX,
>
> Our US state is running CKAN on Postgres.  We currently have about 600
> datasets, and we are not anywhere close to being limited by the database.
>
> data.gov has about 190,000 datasets and performs fine.
>
> David.
> ------------------------------
> *From:* ckan-dev [ckan-dev-bounces at lists.okfn.org] on behalf of Ruima E. [
> ruimaximo at gmail.com]
> *Sent:* Wednesday, October 05, 2016 2:40 PM
> *To:* CKAN Development Discussions
> *Subject:* Re: [ckan-dev] road map for horizontal scaling?
>
> Thank you Tim!
> I am asking these questions because I am considering installing CKAN as
> a data hub for a city. It seems a very promising idea, but I am concerned
> that if tomorrow the number of datasets grows and we need it to be
> distributed across several machines, PostgreSQL might be a bottleneck
> and a headache.
> When I think about scale I have in mind the example of Hadoop. If tomorrow
> the datasets cannot fit on one machine, just add one more node, edit a few
> text files and it works seamlessly. I am afraid that with PostgreSQL that is
> not the case, or am I wrong?
>
> Best regards,
> RMX
>
>
> On Wed, Oct 5, 2016 at 8:52 PM, Timothy Giles <timothy.giles at slu.se>
> wrote:
>
> Hi RMX.
>
>
> I wonder if you can give a concrete example of what you mean by scale?
> Since this is a dev forum/mailing list, I think it would be helpful to
> quantify your issue(s) / concern(s). There are instances of CKAN with
> hundreds of thousands and millions of datasets, as well as individual
> datasets being extremely large (hundreds of GB).
>
>
> MvH Tim
>
>
>
>
> ------------------------------
> *From:* ckan-dev <ckan-dev-bounces at lists.okfn.org> on behalf of Ruima E.
> <ruimaximo at gmail.com>
> *Sent:* 05 October 2016 02:40 PM
> *To:* ckan-dev at lists.okfn.org
> *Subject:* [ckan-dev] road map for horizontal scaling?
>
> Hi,
>
> At the moment CKAN relies on PostgreSQL as a data store. I was shocked
> when I found that such a nice project relies on a data store that is not
> suitable to scale. Open data in smart cities is expected to be Big Data and
> it is expected to scale, so this jeopardizes the success of the whole
> initiative in the near future.
>
> Is scaling by using open source technologies part of the road map for
> CKAN?
>
> Thank you,
> RMX
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>
>
>
> --
> Angelos Tzotsos, PhD
> OSGeo Charter Member
> http://users.ntua.gr/tzotsos
>
>


