[okfn-labs] Next generation data catalogues.

Pieter Colpaert pieter.colpaert at okfn.org
Tue Aug 27 09:42:28 UTC 2013


Hi all,

Maybe we should also make a clear semantic distinction between a data 
portal and a data catalogue.

What do we want to do?
- Are we going to build a system through which people can query data?
- Are we going to build a system for maintaining meta-data and finding 
datasets?
- Are we going to build a system for data governance?
- etc?

In my dictionary, a *Data Portal* is a system that puts everything in 
place for a certain domain (or region) to stimulate data reuse. It is 
*the* system which will make data used and useful (for a domain): it 
connects all stakeholders needed.

At this moment, I believe each attempt at making a *generic* data portal 
focused on all "5 stars of Open Data Portals" [1] will fail, because 
each domain needs its own tools:
- Geographically focused domains will need things like WMS services, 
GeoJSON, map visualizers, etc.
- Open Transport people need GTFS validators, unique identifiers for 
stop areas, tools to maintain interconnections between stop areas and 
stop points, and so on.
- Open Access people need ETL tools which convert MARC to e.g. RDF.
- ... (you get the point)

I do think, however, that we can provide the right subtools so that every 
"Open" domain can create its own Open Data Portal to support data 
becoming used and useful. Everyone can then choose the right tools for 
their Open Data Portal (e.g. Geomajas + CKAN + dat, or Open Trip 
Planner + CKAN + GTFS validators + GitHub for URIs, or Catmandu + 
Drupal + triple stores...).

Therefore, one thing we need is an Open Data Catalogue: something that 
focuses on maintaining meta-data and enables you to:
- Harvest meta-data (or aggregate different subcatalogues)
- Publish the meta-data as open data (so that your meta-data can be 
aggregated again)
- Have unique URIs for datasets
- query an API to find links to the datasets that
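
To make the list above a bit more concrete, here is a minimal sketch in 
Python of what such a catalogue does. It is not CKAN's API or data model: 
the class names, field names, URI pattern and example.org URLs are all 
made up for illustration, and everything lives in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata about one dataset. Field names are illustrative only."""
    uri: str            # unique identifier for the dataset
    title: str
    download_url: str   # link to the actual data
    keywords: tuple = ()


class Catalogue:
    """Hypothetical in-memory catalogue: metadata only, no data storage."""

    def __init__(self, base_uri):
        self.base_uri = base_uri.rstrip("/")
        self._records = {}  # uri -> DatasetRecord

    def add(self, slug, title, download_url, keywords=()):
        # Mint a URI under this catalogue's own base (assumed URI pattern).
        uri = f"{self.base_uri}/dataset/{slug}"
        record = DatasetRecord(uri, title, download_url, tuple(keywords))
        self._records[uri] = record
        return record

    def export(self):
        # Publish the metadata itself, so another catalogue can aggregate it.
        return sorted(self._records.values(), key=lambda r: r.uri)

    def harvest(self, other):
        # Aggregate a subcatalogue, keeping the URIs it already minted.
        added = 0
        for record in other.export():
            if record.uri not in self._records:
                self._records[record.uri] = record
                added += 1
        return added

    def search(self, term):
        # Return links to datasets whose title or keywords match the term.
        term = term.lower()
        return [
            r.download_url
            for r in self.export()
            if term in r.title.lower()
            or any(term == k.lower() for k in r.keywords)
        ]


transport = Catalogue("http://transport.example.org")
transport.add("bus-stops", "Bus stops",
              "http://transport.example.org/files/stops.zip", ["gtfs"])

geo = Catalogue("http://geo.example.org")
geo.add("boundaries", "Municipal boundaries",
        "http://geo.example.org/files/boundaries.geojson", ["geojson"])

national = Catalogue("http://data.example.org")
national.harvest(transport)
national.harvest(geo)
links = national.search("gtfs")
```

Because harvest() keeps the URI minted by the original catalogue, the 
same dataset keeps one identifier however many times its metadata is 
aggregated, and harvesting the same subcatalogue twice adds nothing new.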

CKAN is a perfect tool for this at the moment, and I don't think we 
should change the scope for something called a "new generation data 
catalog". Seeing how many organisations use CKAN today, shouldn't we 
focus on turning CKAN into something (even) better instead?

Kind regards,

Pieter

[1] http://lists.okfn.org/pipermail/okfn-labs/2013-August/001055.html


On 08/27/2013 10:37 AM, Ross Jones wrote:
> Hi Rufus,
>
> On 27 Aug 2013, at 00:12, Rufus Pollock <rufus.pollock at okfn.org 
> <mailto:rufus.pollock at okfn.org>> wrote:
>
>> Very interesting.
>>
>> I would observe that http://data.okfn.org/ is in some ways a bit of 
>> an experiment in precisely this regard - it's a data catalog but a 
>> radically stripped down one with a strong orientation to a) data 
>> packages b) small data c) data stored in git (and github) [d) being 
>> written in js!]
>>
>> We've also had experiments with pure JS data catalogs (e.g. 
>> https://github.com/okfn/datacatalog.js) individual data stores, 
>> Friedrich's datahub, lists stored in wikis, extensions to CMS'es and 
>> more over the years ;-)
>
> I think they're all cool ideas (and more importantly, implementations) 
> and I don't doubt we'll be stealing as many ideas as possible, and 
> possibly even code.
>
>> I think, as already being discussed later in this thread, the key 
>> question is:
>>
>> * What user stories do you have (what features do you want)?
>
> Initially, built in multi-lingual datasets (this is important), DCAT 
> metadata, datatank style serialisation. There's some discussion about 
> API or Services, I don't think there's a solid consensus yet though.
>> Related to both of these here are some diagrams indicating thoughts 
>> at the time about what a "DataHub" might offer (and also suggesting a 
>> fairly separated architecture):
>>
>> http://notebook.okfn.org/2012/06/22/datahub-small-pieces-loosely-joined/
>> http://notebook.okfn.org/2011/04/27/data-hubs-data-management-systems-and-ckan/ 
>> (older)
>
> Think I've seen this before, but very useful, thanks.
>
>> PS: you've already mentioned it below but I'll mention it again: 
>> "beware second-system syndrome" ;-) This does not mean one should not 
>> build new (and certainly the discussion of problems and possibilities 
>> re new is *very* useful) but before actually starting on something 
>> (completely) new you want to have a very strong benefit/cost in 
>> favour of that versus fix/extend/enhance of existing system (as a 
>> side note: the original version of CKAN started life as various wikis 
>> on exactly this logic - it was only after we got *really* sick of 
>> trying to mod-MoinMoin that CKAN in its current pythonic form began …)
>
> Yeah, it was something I mulled over, but I'm not really any form of 
> lead on it, I'm just sticking my oar in (as usual ;) ). I think DKAN 
> is a fairly good example of why just trying to re-write something 
> isn't always a very good idea (primarily because it adds nothing).
>
> Ross
