[ckan-discuss] Topic list (vocabulary/valuelist) for classifying datasets .

David Read david.read at hackneyworkshop.com
Wed Jun 13 11:28:10 BST 2012

Edo & Pascal,

Great topic!

Using the Eurovoc top level also has the advantage of the 27-language
translation work going on. And SKOS has a lot of traction - is
that a closed vocab that might be considered? Edo's point about hospital
usefully coming under two categories is also good - there is little point
having a strict hierarchy.

By actually doing categorisation of datasets I think we will learn what
categories are missing. And when people start using it, we will realise
more what people need/want. I think this is just as important in the
process, as picking a well-known vocab.

FYI at data.gov.uk someone came up with this list of categories:
Health, Environment, Education, Finance, Society, Defence, Transportation,
Location, Administration, Spending data
We've not done much with these yet, so I am happy to try and steer
data.gov.uk towards Eurovoc or other good ideas that come up here.

And at our Office for National Statistics (which provides a sizeable chunk
of the datasets) they use these categories:
Agriculture and Environment
Business and Energy
Children, Education and Skills
Crime and Justice
Health and Social Care
Labour Market
People and Places
Travel and Transport
plus interestingly, two categories that cross-cut against the others:
Equality and Diversity
So again, this is allowing a dataset to be put in more than one category.
(Like Gmail labels, as opposed to folders.) This might seem to be turning
into basic tagging, but the key thing I see about 'categorisation' is
that the vocab is closed, and categorisations are comprehensively applied
to datasets.
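
To make the labels-versus-folders point concrete, here is a minimal Python
sketch. The category list is the data.gov.uk one quoted above; the function
names, the dataset name and the rule that every dataset must carry at least
one category are assumptions made for illustration, not anything CKAN
provides.

```python
# Hypothetical sketch: a closed vocabulary applied as labels (many per
# dataset) rather than folders (exactly one per dataset).

# The closed vocabulary: only these values are accepted.
CATEGORIES = frozenset({
    "Health", "Environment", "Education", "Finance", "Society",
    "Defence", "Transportation", "Location", "Administration",
    "Spending data",
})


def categorise(catalogue, dataset_name, labels):
    """Attach one or more categories from the closed vocabulary."""
    labels = frozenset(labels)
    if not labels:
        # Assumed rule: "comprehensively applied" means no dataset is
        # left uncategorised.
        raise ValueError("a dataset needs at least one category")
    unknown = labels - CATEGORIES
    if unknown:
        # This is what separates categories from free tagging.
        raise ValueError("not in the vocabulary: %s" % sorted(unknown))
    catalogue[dataset_name] = labels


def browse(catalogue, category):
    """Return the names of all datasets carrying the given category."""
    return sorted(name for name, labels in catalogue.items()
                  if category in labels)


catalogue = {}
# A hospital dataset usefully sits under two categories at once.
categorise(catalogue, "hospital-locations", ["Health", "Location"])
```

With this shape, browsing by either "Health" or "Location" finds the same
dataset, while a free-text label such as "Hospitals" is rejected.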

One more suggestion is to have quite a short list. Anything more than a
dozen (top-level) categories is going to be a bit bewildering for the user
browsing by them, which I guess is the aim for all of this?


On 13 June 2012 10:34, <p.romain at cg33.fr> wrote:

> Hi Edo,
> We are working on the same issue, as is the team currently building the
> future open data portal for the EU Commission (see
> http://blog.okfn.org/2012/01/31/open-knowledge-foundationss-ckan-software-to-power-new-european-commission-data-portal/ and
> https://github.com/okfn/ckanext-ecportal for code and info).
> For the moment we are implementing the French national thesaurus, available in a
> SKOS version, but we are thinking of using the INSPIRE GEMET vocabulary as a
> complement for building a cross-domain topic list.
> See http://www.eionet.europa.eu/gemet/gemet-groups.rdf?langcode=nl or
> http://www.eionet.europa.eu/gemet/
> Regards,
> Pascal Romain
> Document IT project manager
> Service Projets Etudes Conseils
> Direction des Systèmes d'Information
> 05 56 99 33 33 ext. 6643
> @datalocale
> <http://www.gironde.fr/>
> From:        "Plantinga, Edo" <Edo.Plantinga at koop.wmrijk.nl>
> To:        <ckan-discuss at lists.okfn.org>
> Cc:        "Overbeek, Hans" <Hans.Overbeek at koop.wmrijk.nl>
> Date:        13/06/2012 11:22
> Subject:        [ckan-discuss] Topic list (vocabulary/valuelist) for
> classifying datasets .
> Sent by:        ckan-discuss-bounces at lists.okfn.org
> ------------------------------
> Hi all,
> We're currently looking into a suitable topic list (i.e.
> vocabulary/valuelist) for the Dutch open data portal (data.overheid.nl).
> The main idea is to make the datasets easier to find when searching on the
> site. When comparing the topics used by http://data.gov.uk/,
> http://www.data.gov/ and http://publicdata.eu/,
> one can see the differences in how datasets are categorized are quite big.
> Apparently there is no Obvious Best Way to do this. To me, therefore, it
> would make sense to find a topic list that is not just suitable for the
> open data domain, but also for other domains. That way it becomes easier to
> also show datasets on other websites than just open data portals.
> We are considering using the (highest level) domains of EUROVOC as such a
> topic list (see http://eurovoc.europa.eu/drupal/?q=navigation&cl=en).
> It seems to be widely used within the EU. The list is not perfect for the
> open data domain, but I guess no reasonably short list ever will be. I
> noticed that some of these topics seem to correspond with the topics on
> publicdata.eu.
> Is anyone working on this at the moment? Does anyone know how
> publicdata.eu arrived at the current topic list? Is an international best
> practice emerging? Any other thoughts on this?
> Best regards,
> Edo Plantinga
> Data.overheid.nl
>  _______________________________________________
> ckan-discuss mailing list
> ckan-discuss at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-discuss