[ckan-dev] road map for horizontal scaling?

Matthew Fullerton matt.fullerton at gmail.com
Fri Oct 7 04:59:36 UTC 2016


As Adrià said, I think there has to be a clear distinction between datasets
(metadata) and the data itself. I think there's a lot of scope for allowing
CKAN's datastore (the bit that can handle the actual rows of data for you,
although let's not limit it to "rows") to interface with other [big data]
databases. This came up a few days ago on this list:
https://lists.okfn.org/pipermail/ckan-dev/2016-September/010350.html
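
To make the metadata/data distinction concrete, here is a minimal sketch of
what a dataset record pushed through CKAN's action API (package_create) might
look like when the files themselves live on another server. The helper name,
the example.org URLs, the placeholder API key and the shape of the input
dictionary are all assumptions for illustration, not anyone's actual setup.
The request is only built, not sent.

```python
import json
import urllib.request


def build_package_create_request(ckan_url, api_key, metadata):
    """Build (but do not send) a package_create request for one dataset.

    `metadata` is a hypothetical dict with keys: name, title, abstract,
    keywords, files. Only metadata and file URLs go to CKAN; the data
    files stay wherever they are hosted.
    """
    payload = {
        "name": metadata["name"],        # URL slug of the dataset
        "title": metadata["title"],
        "notes": metadata["abstract"],   # description shown on the dataset page
        "tags": [{"name": kw} for kw in metadata["keywords"]],
        # Each resource is just a pointer to a file hosted elsewhere.
        "resources": [
            {"url": f["url"], "name": f["name"], "format": f["format"]}
            for f in metadata["files"]
        ],
        # Some CKAN configurations also require "owner_org"; omitted here.
    }
    return urllib.request.Request(
        ckan_url.rstrip("/") + "/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


# Hypothetical example values.
example = {
    "name": "example-roads",
    "title": "Example roads",
    "abstract": "Road centrelines for an example city.",
    "keywords": ["roads", "transport"],
    "files": [
        {"url": "ftp://files.example.org/roads/shp_roads.zip",
         "name": "Shapefile", "format": "SHP"},
    ],
}
request = build_package_create_request(
    "https://ckan.example.org", "PLACEHOLDER-API-KEY", example)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it. The point is that
the CKAN database only ever holds the small metadata record, however large the
file behind the resource URL is.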

As your application is cities/smart cities, you might want to look at the
FIWARE project:
http://www.fiware.org,
https://www.fiware.org/devguides/publishing-open-data-in-fiware/how-to-publish-context-information-as-open-data-in-ckan/

It is a big collection of software services for smart cities and has
integrated CKAN so that open data portals can be linked in. I'm excited to
see whether, over the coming months and years, smart cities lead to more
open data or just to more proprietary platforms...!

Best,
Matt


On 6 October 2016 at 17:02, Fawcett, David (MNIT) <David.Fawcett at state.mn.us> wrote:

> RMX,
>
>
>
> We currently don’t store many, if any, of the datasets in the database.
> We put CKAN in front of an internal data distribution system, with our CKAN
> instance essentially becoming just another node on the system.  When a
> dataset is updated in the system, it gets pushed out to all designated
> nodes, and we run a script nightly to read dataset metadata and push
> new/updated records to CKAN via API.
>
>
>
> Here is an example dataset (we call them resources because they include
> web apps, desktops, and data):
>
> https://gisdata.mn.gov/dataset/env-buffer-protection-mn
>
>
>
> Most of the info on the page comes from the spatial metadata.  The
> overview text comes from the metadata Abstract element, the tags come from
> the metadata keywords, etc.
>
>
>
> Our state manages a lot of data in ESRI’s proprietary file geodatabase
> format, but to make the data accessible, we automatically generate
> shapefile and geopackage copies of the data and publish them as well.  This
> allows people to access the data without expensive licenses and proprietary
> software.
>
>
>
> In this example, you can also see that there is a link to view the full
> metadata record, and this resource has an associated Web map, so there is
> a button to go there too.
>
>
>
> The file-based datasets and metadata documents are not stored on the same
> server as our CKAN instance.  They are on a separate FTP server.  E.g.
> ftp://ftp.gisdata.mn.gov/pub/gdrs/data/pub/us_mn_state_dnr/env_buffer_protection_mn/shp_env_buffer_protection_mn.zip
>
>
>
> David.
>
> *From:* ckan-dev [mailto:ckan-dev-bounces at lists.okfn.org] *On Behalf Of* Ruima E.
> *Sent:* Thursday, October 06, 2016 1:08 AM
> *To:* CKAN Development Discussions <ckan-dev at lists.okfn.org>
>
> *Subject:* Re: [ckan-dev] road map for horizontal scaling?
>
>
>
> Thank you David,
>
>
>
> That is very good to know.
>
> Do all those datasets fit on one machine?
>
> Are you using PostgreSQL to store the datasets, or just the metadata?
>
>
>
> Best regards,
>
> RMX
>
>
>
> On Thu, Oct 6, 2016 at 3:07 AM, Fawcett, David (MNIT) <
> David.Fawcett at state.mn.us> wrote:
>
> RMX,
>
> Our US state is running CKAN on Postgres.  We currently have about 600
> datasets, and we are not anywhere close to being limited by the database.
>
> data.gov has about 190,000 datasets and performs fine.
>
> David.
> ------------------------------
>
> *From:* ckan-dev [ckan-dev-bounces at lists.okfn.org] on behalf of Ruima E. [
> ruimaximo at gmail.com]
> *Sent:* Wednesday, October 05, 2016 2:40 PM
> *To:* CKAN Development Discussions
> *Subject:* Re: [ckan-dev] road map for horizontal scaling?
>
> Thank you Tim!
>
> I am asking these questions because I am considering installing CKAN as
> a data hub for a city. It seems a very promising idea, but I am concerned
> that if the number of datasets grows tomorrow and we need to distribute it
> across several machines, PostgreSQL might be a bottleneck
> and a headache.
>
> When I think about scale I have in mind the example of Hadoop. If tomorrow
> the datasets cannot fit on one machine, just add one more node, edit a few
> text files and it works seamlessly. I am afraid that with PostgreSQL that is
> not the case, or am I wrong?
>
>
>
> Best regards,
>
> RMX
>
>
>
>
>
> On Wed, Oct 5, 2016 at 8:52 PM, Timothy Giles <timothy.giles at slu.se>
> wrote:
>
> Hi RMX.
>
>
>
> I wonder if you can give a concrete example of what you mean by scale?
> Since this is a dev forum/mailing list, I think it would be helpful to
> quantify your issue(s) / concern(s). There are instances of CKAN with
> hundreds of thousands and millions of datasets, as well as individual
> datasets that are extremely large (100s of GB).
>
>
>
> MvH Tim
>
>
>
>
>
>
> ------------------------------
>
> *From:* ckan-dev <ckan-dev-bounces at lists.okfn.org> on behalf of Ruima E. <
> ruimaximo at gmail.com>
> *Sent:* 05 October 2016 02:40 PM
> *To:* ckan-dev at lists.okfn.org
> *Subject:* [ckan-dev] road map for horizontal scaling?
>
>
>
> Hi,
>
>
>
> At the moment CKAN relies on PostgreSQL as a data store. I was shocked
> when I found that such a nice project relies on a data store that is not
> suitable to scale. Open data in smart cities is expected to be Big Data and
> is expected to scale, which could jeopardize the success of the whole
> initiative in the near future.
>
>
>
> Is scaling by using open source technologies part of the road map for
> CKAN?
>
>
>
> Thank you,
>
> RMX
>
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>