[ckan-dev] road map for horizontal scaling?

Angelos Tzotsos gcpp.kalxas at gmail.com
Thu Oct 6 08:40:27 UTC 2016


Last time I checked, the data.gov database was about 200 GB, and it is 
obviously not on a single machine.
Also, the 190,000 datasets listed on data.gov are actually dataset 
collections; the number of individual datasets is about 2 million.

http://catalog.data.gov/csw-all?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementsetname=brief
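
For anyone who wants to check that count themselves, here is a minimal
sketch in Python, assuming the endpoint returns a standard CSW 2.0.2
GetRecordsResponse with a numberOfRecordsMatched attribute on
csw:SearchResults. The function names are my own, not part of CKAN or
any CSW library, and fetching the URL is left to whatever HTTP client
you prefer.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OGC CSW 2.0.2 specification.
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def build_getrecords_url(endpoint, element_set="brief"):
    """Build a KVP GetRecords URL like the one quoted above.

    Hypothetical helper: only the query parameters come from the
    original URL.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typenames": "csw:Record",
        "elementsetname": element_set,
    }
    # Keep ":" unescaped so "csw:Record" appears as in the original URL.
    return endpoint + "?" + urlencode(params, safe=":")


def records_matched(response_xml):
    """Return the total match count from a GetRecordsResponse document.

    Assumes csw:SearchResults is a direct child of the root element,
    as in the CSW 2.0.2 schema.
    """
    root = ET.fromstring(response_xml)
    results = root.find("{%s}SearchResults" % CSW_NS)
    if results is None:
        raise ValueError("no csw:SearchResults element in response")
    return int(results.attrib["numberOfRecordsMatched"])
```

Calling build_getrecords_url("http://catalog.data.gov/csw-all") gives the
URL above; passing the body of the response to records_matched() should
give the total number of records the catalogue reports.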

Best,
Angelos



On 10/06/2016 09:08 AM, Ruima E. wrote:
> Thank you David,
>
> That is very good to know.
> Do all those datasets fit on one machine?
> Are you using PostgreSQL to store the datasets, or just the metadata?
>
> Best regards,
> RMX
>
> On Thu, Oct 6, 2016 at 3:07 AM, Fawcett, David (MNIT) <
> David.Fawcett at state.mn.us> wrote:
>
>> RMX,
>>
>> Our US state is running CKAN on Postgres.  We currently have about 600
>> datasets, and we are not anywhere close to being limited by the database.
>>
>> data.gov has about 190,000 datasets and performs fine.
>>
>> David.
>> ------------------------------
>> *From:* ckan-dev [ckan-dev-bounces at lists.okfn.org] on behalf of Ruima E. [
>> ruimaximo at gmail.com]
>> *Sent:* Wednesday, October 05, 2016 2:40 PM
>> *To:* CKAN Development Discussions
>> *Subject:* Re: [ckan-dev] road map for horizontal scaling?
>>
>> Thank you Tim!
>> I am asking these questions because I am considering installing CKAN as
>> a data hub for a city. It seems a very promising idea, but I am concerned
>> that if the number of datasets grows tomorrow and we need to distribute
>> it across several machines, PostgreSQL might become a bottleneck
>> and a headache.
>> When I think about scale I have in mind the example of Hadoop. If tomorrow
>> the datasets cannot fit on one machine, you just add one more node, edit a
>> few text files, and it works seamlessly. I am afraid that with PostgreSQL
>> that is not the case, or am I wrong?
>>
>> Best regards,
>> RMX
>>
>>
>> On Wed, Oct 5, 2016 at 8:52 PM, Timothy Giles <timothy.giles at slu.se>
>> wrote:
>>
>>> Hi RMX.
>>>
>>>
>>> I wonder if you can give a concrete example of what you mean by scale?
>>> Since this is a dev forum/mailing list, I think it would be helpful to
>>> quantify your issue(s) / concern(s). There are instances of CKAN with
>>> hundreds of thousands and millions of datasets, as well as individual
>>> datasets that are extremely large (hundreds of GB).
>>>
>>>
>>> MvH Tim
>>>
>>>
>>>
>>>
>>> ------------------------------
>>> *From:* ckan-dev <ckan-dev-bounces at lists.okfn.org> on behalf of Ruima E.
>>> <ruimaximo at gmail.com>
>>> *Sent:* 05 October 2016 02:40 PM
>>> *To:* ckan-dev at lists.okfn.org
>>> *Subject:* [ckan-dev] road map for horizontal scaling?
>>>
>>> Hi,
>>>
>>> At the moment CKAN relies on PostgreSQL as a data store. I was shocked
>>> when I found that such a nice project relies on a data store that is not
>>> suited to scaling. Open data in smart cities is expected to be Big Data
>>> and to need to scale, so this could jeopardize the success of the whole
>>> initiative in the near future.
>>>
>>> Is scaling by using open source technologies part of the road map for
>>> CKAN?
>>>
>>> Thank you,
>>> RMX
>>>
>>> _______________________________________________
>>> ckan-dev mailing list
>>> ckan-dev at lists.okfn.org
>>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>>>
>>>
>>
>>
>
>


-- 
Angelos Tzotsos, PhD
OSGeo Charter Member
http://users.ntua.gr/tzotsos
