[ckan-discuss] Topic list (vocabulary/valuelist) for classifying datasets

Pablo Mendes pablomendes at gmail.com
Thu Jun 14 13:39:12 BST 2012


What about IPTC Media Topics? They are used extensively for news: 17 top-level
terms, with up to 5 levels of subtopics.
http://cv.iptc.org/newscodes/mediatopic

Available in several formats:
http://iptc.cms.apa.at/site/NewsCodes/View_NewsCodes#medtop

Cheers,
Pablo


On Thu, Jun 14, 2012 at 11:32 AM, Plantinga, Edo <Edo.Plantinga at koop.wmrijk.nl> wrote:

> Hi Pascal & David,
>
> I agree on the importance of having a value list that offers unambiguous
> definitions, and international translations are definitely a bonus. I also
> agree with you and David that the number of topics displayed to a user
> should be limited. A maximum of around 12 *displayed to the visitor* would
> be ideal; I agree with David on that. EUROVOC offers 21 top-level topics.
> However, I expect this will not be a big problem, since not all topics are
> likely to be filled when a data portal contains fewer than, say, 5,000
> datasets (and for larger data portals, having 21 topics may be justifiable).
> Furthermore, on a large data portal a visitor will most likely search for a
> specific keyword and then narrow the results by topic (using faceted
> search). A specific keyword is unlikely to have hits in every topic, so the
> list of topics (facets) displayed will be shorter than the full 21. For
> example, if a visitor searches for "Car", one would expect hits in the
> TRANSPORT or ENVIRONMENT (etc.) topics, but not in the POLITICS (etc.)
> topics, which limits the number of topics displayed to the visitor.
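The faceting argument can be illustrated with a small sketch. It assumes each dataset carries one or more EUROVOC top-level topics and uses a naive substring match on titles as the "search"; the sample datasets, the `MAX_DISPLAYED` cap of 12 and the function name `topic_facets` are hypothetical, and this is not how CKAN itself computes facets.

```python
from collections import Counter

# Hypothetical display cap, following the "around 12" figure suggested above.
MAX_DISPLAYED = 12

# Hypothetical sample catalogue (not real portal data): each dataset has a
# title and one or more EUROVOC top-level topics.
DATASETS = [
    {"title": "Car registrations per municipality", "topics": ["48 TRANSPORT"]},
    {"title": "Car exhaust emissions", "topics": ["52 ENVIRONMENT", "48 TRANSPORT"]},
    {"title": "Election results by district", "topics": ["04 POLITICS"]},
    {"title": "Hospital quality indicators", "topics": ["28 SOCIAL QUESTIONS"]},
]


def topic_facets(datasets, keyword, limit=MAX_DISPLAYED):
    """Return (topic, hit count) pairs for datasets whose title contains keyword.

    Uses a naive case-insensitive substring match. Topics with zero hits
    never appear, so the facet list is usually much shorter than the full
    vocabulary.
    """
    keyword = keyword.lower()
    counts = Counter(
        topic
        for ds in datasets
        if keyword in ds["title"].lower()
        for topic in ds["topics"]
    )
    # Most frequent topics first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    for topic, count in topic_facets(DATASETS, "car"):
        print(f"{topic} ({count})")
```

Running it prints `48 TRANSPORT (2)` and `52 ENVIRONMENT (1)`; 04 POLITICS never shows up because no matching dataset carries it, which is the effect described in the message.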
>
> Regarding your comment about tagging datasets with 72 GEOGRAPHY: I think
> having a geographic component does not automatically place a dataset in
> that category. For many geo datasets, 48 TRANSPORT, 52 ENVIRONMENT or 56
> AGRICULTURE, FORESTRY AND FISHERIES will make more sense.
>
> @Pascal: From your response, do I gather that you agree EUROVOC is the
> best option to use?
>
> I am not sure I follow what you mean when you say:
> "we could implement a select dropdown that offers the subset to the end
> user but offer him the ability to describe its dataset with the top-level
> of this vocabulary". Could you elaborate?
>
> Regards,
>
> Edo Plantinga
> data.overheid.nl
>
>
>  -----Original message-----
> *From:* p.romain at cg33.fr [mailto:p.romain at cg33.fr]
> *Sent:* Wednesday 13 June 2012 12:34
> *To:* Plantinga, Edo
> *CC:* ckan-discuss at lists.okfn.org
> *Subject:* Re: [ckan-discuss] Topic list (vocabulary/valuelist)
> for classifying datasets
>
> Hi Edo,
>
> I totally agree with you. Eurovoc is a more suitable vocabulary, but what
> matters is the possible alignment those vocabularies offer.
> For example, the term gezondheidszorg
> http://www.eionet.europa.eu/gemet/concept?cp=3866&langcode=nl&ns=1 has
> an exact match in Eurovoc and Agrovoc, and a close match in DBpedia.
> No one today would use the Eurovoc 72 GEOGRAPHY category to describe an
> open data dataset that includes geolocated data. In my opinion, the issue we
> are facing is the number of thematic entries we offer to the end user: too
> many and we lose them, too few and we might confuse them.
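The alignment idea can be sketched as a lookup over SKOS-style mapping links (`skos:exactMatch` and `skos:closeMatch` are real SKOS mapping properties). The GEMET URL is the one cited in the message; the Eurovoc, Agrovoc and DBpedia targets are hypothetical placeholders, since the thread does not give their identifiers, and `aligned_concepts` is an illustrative helper, not part of any library.

```python
# Mapping links from one GEMET concept to concepts in other vocabularies.
# The GEMET URL comes from the thread; every target identifier below is a
# hypothetical placeholder, not a real concept URI.
GEMET_GEZONDHEIDSZORG = (
    "http://www.eionet.europa.eu/gemet/concept?cp=3866&langcode=nl&ns=1"
)

ALIGNMENTS = {
    GEMET_GEZONDHEIDSZORG: [
        ("skos:exactMatch", "eurovoc:EXAMPLE-HEALTHCARE"),
        ("skos:exactMatch", "agrovoc:EXAMPLE-HEALTHCARE"),
        ("skos:closeMatch", "dbpedia:EXAMPLE-HEALTHCARE"),
    ],
}


def aligned_concepts(concept, vocabulary_prefix, allowed=("skos:exactMatch",)):
    """Return the concepts in one target vocabulary that `concept` maps to.

    Only links whose mapping property is in `allowed` are followed; by
    default that means exact matches only.
    """
    return [
        target
        for relation, target in ALIGNMENTS.get(concept, [])
        if relation in allowed and target.startswith(vocabulary_prefix + ":")
    ]


if __name__ == "__main__":
    print(aligned_concepts(GEMET_GEZONDHEIDSZORG, "eurovoc"))
    print(aligned_concepts(
        GEMET_GEZONDHEIDSZORG, "dbpedia",
        allowed=("skos:exactMatch", "skos:closeMatch"),
    ))
```

Defaulting to exact matches is a design choice under the assumption that a close match is too weak to reuse a classification automatically; a portal could opt in to `skos:closeMatch` where that trade-off is acceptable.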
>
> As a response to this issue we could implement a select dropdown that
> offers the subset to the end user but offer him the ability to describe its
> dataset with the top-level of this vocabulary.
> That is the approach we are currently taking in our work on CKAN. We could
> probably share some code on that, couldn't we?
>
> Best,
> Pascal Romain
> Project manager, documentary information systems
> Projects, Studies and Consulting Service
> Information Systems Department
> 05 56 99 33 33 ext. 6643
> @datalocale
> <http://www.gironde.fr/>
>
>
>
> From:        "Plantinga, Edo" <Edo.Plantinga at koop.wmrijk.nl>
> To:        <ckan-discuss at lists.okfn.org>
> Date:        13/06/2012 12:15
> Subject:        Re: [ckan-discuss] Topic list (vocabulary/valuelist) for
>      classifying datasets
> Sent by:        ckan-discuss-bounces at lists.okfn.org
> ------------------------------
>
>
>
> Hi Pascal,
>
> Thank you for your swift response.
> Looking at the GEMET vocabulary, it seems to be limited to geographic
> datasets. Naturally, many open datasets are geographic, but there are also
> many datasets outside this domain. How would you classify the datasets on
> the EU portal that currently fall under the categories Finance and
> Budgeting, Education and Communication, Economy and Industry, Social
> Questions, and Population & Health? The GEMET vocabulary does not seem
> suitable for this.
> In my opinion, a subset of EUROVOC (while still not perfect, judging by the
> datasets we have) is a better fit for datasets outside the geo domain. By
> the way, I see a fairly large overlap between the current EU portal topics
> and the EUROVOC topics.
>
> To avoid confusion, these are the EUROVOC categories I am referring to:
> 04 POLITICS
> 08 INTERNATIONAL RELATIONS
> 10 EUROPEAN COMMUNITIES
> 12 LAW
> 16 ECONOMICS
> 20 TRADE
> 24 FINANCE
> 28 SOCIAL QUESTIONS
> 32 EDUCATION AND COMMUNICATIONS
> 36 SCIENCE
> 40 BUSINESS AND COMPETITION
> 44 EMPLOYMENT AND WORKING CONDITIONS
> 48 TRANSPORT
> 52 ENVIRONMENT
> 56 AGRICULTURE, FORESTRY AND FISHERIES
> 60 AGRI-FOODSTUFFS
> 64 PRODUCTION, TECHNOLOGY AND RESEARCH
> 66 ENERGY
> 68 INDUSTRY
> 72 GEOGRAPHY
> 76 INTERNATIONAL ORGANISATIONS
>
> The advantage of EUROVOC is that it has subcategories that can be used to
> classify datasets that do not fall into an obvious category. For example:
> under which category should a hospital quality dataset fall? Searching for
> 'Hospital' on the EUROVOC site returns health policies under "MT 2841
> HEALTH", so according to EUROVOC this falls under "28 SOCIAL QUESTIONS".
> Still not perfect, but then no categorization system will be.
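The step from "MT 2841 HEALTH" to "28 SOCIAL QUESTIONS" can be sketched as follows, assuming (as the MT 2841 → 28 example suggests) that the first two digits of a EUROVOC microthesaurus code identify its top-level domain. The domain table is the list quoted in the message; `domain_for_microthesaurus` is a hypothetical helper, and the keyword search that leads from "Hospital" to MT 2841 is not modelled here.

```python
# EUROVOC top-level domains, as listed in the message.
EUROVOC_DOMAINS = {
    "04": "POLITICS",
    "08": "INTERNATIONAL RELATIONS",
    "10": "EUROPEAN COMMUNITIES",
    "12": "LAW",
    "16": "ECONOMICS",
    "20": "TRADE",
    "24": "FINANCE",
    "28": "SOCIAL QUESTIONS",
    "32": "EDUCATION AND COMMUNICATIONS",
    "36": "SCIENCE",
    "40": "BUSINESS AND COMPETITION",
    "44": "EMPLOYMENT AND WORKING CONDITIONS",
    "48": "TRANSPORT",
    "52": "ENVIRONMENT",
    "56": "AGRICULTURE, FORESTRY AND FISHERIES",
    "60": "AGRI-FOODSTUFFS",
    "64": "PRODUCTION, TECHNOLOGY AND RESEARCH",
    "66": "ENERGY",
    "68": "INDUSTRY",
    "72": "GEOGRAPHY",
    "76": "INTERNATIONAL ORGANISATIONS",
}


def domain_for_microthesaurus(mt_code):
    """Map a microthesaurus code such as "MT 2841" to its top-level domain.

    Assumes the domain number is the first two digits of the 4-digit code.
    """
    code = mt_code.strip()
    if code.upper().startswith("MT"):
        code = code[2:].strip()
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"unexpected microthesaurus code: {mt_code!r}")
    prefix = code[:2]
    if prefix not in EUROVOC_DOMAINS:
        raise ValueError(f"no EUROVOC domain {prefix!r} for {mt_code!r}")
    return f"{prefix} {EUROVOC_DOMAINS[prefix]}"


if __name__ == "__main__":
    print(domain_for_microthesaurus("MT 2841"))  # 28 SOCIAL QUESTIONS
```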
>
> I'd be interested to hear your take on this.
>
> Best regards,
>
> Edo Plantinga - Data.overheid.nl
> ---------------------------------------------------------------
> Are you visiting a central government (Rijksoverheid) location soon?
>
> If so, you must carry a valid Rijkspas or a valid identity document
> (passport, national identity card, driving licence or residence document
> for foreign nationals). If you cannot show a valid identity document when
> checked, you will be refused access. Identity documents from other
> organisations are not accepted.
>
> This message may contain information that is not intended for you. If you
> are not the addressee or if this message was sent to you by mistake, you
> are requested to inform the sender and delete the message. The State
> accepts no liability for damage of any kind resulting from the risk
> inherent in the electronic transmission of messages.
>
>
> _______________________________________________
> ckan-discuss mailing list
> ckan-discuss at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-discuss
>
>
> __________________________________________________________________
>
> This message and any attachments are confidential and intended solely for
> their addressees. This message does not constitute an official document.
> Only documents bearing the signature of the President of the Conseil Général
> or one of their delegates are binding on the Département.
> Any unauthorised use or distribution is prohibited. Electronic messages can
> be altered, and the Département de la Gironde accepts no liability for this
> message if it has been altered, distorted or falsified.
>