[ckan-discuss] Topic list

p.romain at cg33.fr p.romain at cg33.fr
Fri Jun 15 08:31:48 BST 2012


Hi all,

Yes, I agree that Eurovoc is the best candidate, as long as alignments with 
existing vocabularies are available or there are plans underway to produce 
them.

"we could implement a select dropdown that offers the subset to the end 
user but offer him the ability to describe its dataset with the top-level 
of this vocabulary".

The idea would be to have a select box that holds the 22 Eurovoc top topics.
When the user selects one, a second select box is filled with all the 
subtopics linked to that top topic.

However, while selecting a top topic should be mandatory, selecting a 
subtopic could be optional.
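
A minimal sketch of that two-step selection, in Python. It only assumes a 
mapping from top topics to subtopics; the names TOPICS, top_topic_options, 
subtopic_options and validate are hypothetical and not part of CKAN. The 
codes and labels are taken from this thread, and the table is a small 
excerpt rather than the full vocabulary.

```python
from typing import List, Optional, Tuple

# Hypothetical excerpt of the vocabulary: top-level codes and labels come
# from the list quoted in this thread; the full subtopic tables are omitted.
TOPICS = {
    "28": {"label": "SOCIAL QUESTIONS", "subtopics": {"2841": "HEALTH"}},
    "48": {"label": "TRANSPORT", "subtopics": {}},
    "52": {"label": "ENVIRONMENT", "subtopics": {}},
}


def top_topic_options() -> List[Tuple[str, str]]:
    """Options for the first select box, sorted by code."""
    return [(code, topic["label"]) for code, topic in sorted(TOPICS.items())]


def subtopic_options(top_code: str) -> List[Tuple[str, str]]:
    """Options for the second select box once a top topic is chosen."""
    return sorted(TOPICS[top_code]["subtopics"].items())


def validate(top_code: Optional[str], sub_code: Optional[str] = None) -> List[str]:
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    if not top_code:
        # The top topic is the mandatory field.
        errors.append("A top topic is required.")
    elif top_code not in TOPICS:
        errors.append("Unknown top topic.")
    elif sub_code and sub_code not in TOPICS[top_code]["subtopics"]:
        # The subtopic is optional, but if given it must match its parent.
        errors.append("Subtopic does not belong to the selected top topic.")
    return errors
```

With this shape, validate("28") passes without a subtopic, while 
validate("48", "2841") is rejected because 2841 belongs to another top topic.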

Regarding the IPTC topics, I was not aware that they were available in a 
SKOS version. That's interesting. However, I would trust the EU agency 
maintaining the Eurovoc vocabulary (plus the translation aspect) more than 
the IPTC, but I might be wrong.

@Pablo: are you currently implementing this vocabulary in your CKAN instance?

Regards,
Pascal Romain
Project manager, document information systems
Service Projets Etudes Conseils
Direction des Systèmes d'Information
05 56 99 33 33 ext. 6643
@datalocale




From:    Pablo Mendes <pablomendes at gmail.com>
To:      "Plantinga, Edo" <Edo.Plantinga at koop.wmrijk.nl>
Cc:      p.romain at cg33.fr, ckan-discuss at lists.okfn.org
Date:    14/06/2012 14:39
Subject: Re: [ckan-discuss] Topic list (vocabulary/valuelist) for 
classifying datasets




What about IPTC Media Topics? Used extensively for news. 17 top level 
terms, up to 5 levels of subtopics
http://cv.iptc.org/newscodes/mediatopic 

Available in several formats:
http://iptc.cms.apa.at/site/NewsCodes/View_NewsCodes#medtop 

Cheers,
Pablo


On Thu, Jun 14, 2012 at 11:32 AM, Plantinga, Edo <
Edo.Plantinga at koop.wmrijk.nl> wrote:
Hi Pascal & David,
 
I agree on the importance of having a value list that offers unambiguous 
definitions. And international translations are definitely a bonus. I also 
agree with you and David that the number of topics displayed to a user 
should be limited. A maximum of around 12 *displayed to the visitor* would 
be ideal: I agree with David on that. EUROVOC offers 22 topics. However, I 
guess this will not be such a big problem, since it is likely that not all 
topics will be filled when a data portal contains fewer than, say, 5000 
datasets (and for larger data portals, having 22 topics may be 
justifiable). Furthermore, on large data portals a visitor will most likely 
search on a specific keyword and then filter down by topic (using faceted 
search). It is unlikely that a specific keyword will have hits in every 
topic, so the list of topics (facets) displayed will contain fewer than the 
full 22. For example: if a visitor searches on "Car", one would expect hits 
in the TRANSPORT or ENVIRONMENT (etc.) topics, but not in the POLITICS 
(etc.) topics, which limits the number of topics displayed to the visitor.
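
The facet argument above can be illustrated with a small Python sketch. It 
assumes each search hit carries a single top-level topic label; the function 
name topic_facets and the sample hits are made up for illustration and do 
not reflect CKAN's actual search API.

```python
from collections import Counter
from typing import Dict, List, Tuple


def topic_facets(hits: List[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Count hits per topic, most frequent first.

    Topics without any hit never appear, so the facet list is usually
    shorter than the full list of top-level topics.
    """
    counts = Counter(hit["topic"] for hit in hits)
    return counts.most_common()


# Made-up results for a search on "Car".
hits = [
    {"title": "Car registrations", "topic": "TRANSPORT"},
    {"title": "Car emissions", "topic": "ENVIRONMENT"},
    {"title": "Car traffic counts", "topic": "TRANSPORT"},
]

facets = topic_facets(hits)
print(facets)  # [('TRANSPORT', 2), ('ENVIRONMENT', 1)]
```

Only two facets are offered here, even though the vocabulary has many more 
top-level topics.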
 
Regarding your comment about tagging datasets with 72 GEOGRAPHY: I think 
having a geo component in a dataset does not automatically classify it in 
that category. For many geo datasets 48 TRANSPORT, 52 ENVIRONMENT or 56 
AGRICULTURE, FORESTRY AND FISHERIES will make more sense. 
 
@Pascal: So from your response: do I gather you agree that Eurovoc is the 
best option to use? 
 
I am not sure if I follow what you mean when you say:
"we could implement a select dropdown that offers the subset to the end 
user but offer him the ability to describe its dataset with the top-level 
of this vocabulary". Could you elaborate?
 
Regards,
 
Edo Plantinga
data.overheid.nl
 
 
 -----Original message-----
From: p.romain at cg33.fr [mailto:p.romain at cg33.fr] 
Sent: Wednesday 13 June 2012 12:34
To: Plantinga, Edo
CC: ckan-discuss at lists.okfn.org
Subject: Re: [ckan-discuss] Topic list (vocabulary/valuelist) for 
classifying datasets

Hi Edo,

I totally agree with you. Eurovoc is a more suitable vocabulary, but what 
matters is the possible alignments those vocabularies offer.
For example, the term gezondheidszorg (Dutch for "health care") 
http://www.eionet.europa.eu/gemet/concept?cp=3866&langcode=nl&ns=1 
has an exact match in Eurovoc and Agrovoc and a close match in DBpedia.
No one today would use the Eurovoc 72 GEOGRAPHY category to describe an 
open data dataset that includes geolocated data. The issue we are facing, 
in my opinion, is the number of thematic entries we offer to the end user: 
too many and we lose them, too few and we might confuse them.

As a response to this issue, we could implement a select dropdown that 
offers the subset to the end user but still gives them the ability to 
describe their dataset with the top level of this vocabulary.
That is the approach we are currently taking in our work on CKAN. We could 
probably share some code on this, couldn't we?

Best,
Pascal Romain
Project manager, document information systems
Service Projets Etudes Conseils
Direction des Systèmes d'Information
05 56 99 33 33 ext. 6643
@datalocale




From:    "Plantinga, Edo" <Edo.Plantinga at koop.wmrijk.nl>
To:      <ckan-discuss at lists.okfn.org>
Date:    13/06/2012 12:15
Subject: Re: [ckan-discuss] Topic list (vocabulary/valuelist) for 
classifying datasets
Sent by: ckan-discuss-bounces at lists.okfn.org



Hi Pascal,

Thank you for your swift response. 
Looking at the GEMET vocabulary, it seems to be limited to geographic
datasets. Naturally, many open datasets are geographic datasets.
However, there are also many datasets outside this domain. How would you
classify the datasets on the EU portal that are now under the current
categories Finance and Budgeting, Education and Communication, Economy
and Industry, Social Questions 
Population & Health? The GEMET vocabulary does not seem to be suitable
for this. 
A subset of the EUROVOC (while still not perfect, looking at the
datasets we have), seems to be a better fit for datasets outside the geo
domain, in my opinion. I can see a fairly big overlap between the
current EU portal topics and the EUROVOC topics, by the way.

These are the EUROVOC categories I am referring to, to avoid confusion.
04 POLITICS
08 INTERNATIONAL RELATIONS
10 EUROPEAN COMMUNITIES
12 LAW
16 ECONOMICS
20 TRADE
24 FINANCE
28 SOCIAL QUESTIONS
32 EDUCATION AND COMMUNICATIONS
36 SCIENCE
40 BUSINESS AND COMPETITION
44 EMPLOYMENT AND WORKING CONDITIONS
48 TRANSPORT
52 ENVIRONMENT
56 AGRICULTURE, FORESTRY AND FISHERIES
60 AGRI-FOODSTUFFS
64 PRODUCTION, TECHNOLOGY AND RESEARCH
66 ENERGY
68 INDUSTRY
72 GEOGRAPHY
76 INTERNATIONAL ORGANISATIONS

The advantage of EUROVOC is that there are subcategories that can be
used for classifying datasets that do not fall into an obvious category.
For example: under what category should a hospital quality dataset fall?
Searching for 'Hospital' on the EUROVOC site returns health policy under
"MT 2841 HEALTH", so according to EUROVOC this falls under
"28 SOCIAL QUESTIONS". Still not perfect, but hey, no categorization
system will be.
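
The hospital example can be sketched in Python. This assumes, based only on 
the "MT 2841 HEALTH" and "28 SOCIAL QUESTIONS" pair above, that the first 
two digits of a four-digit microthesaurus code name its top-level category. 
The function name domain_for_microthesaurus is hypothetical, and DOMAINS is 
an excerpt of the list quoted in this message.

```python
from typing import Tuple

# Excerpt of the top-level categories listed in this thread.
DOMAINS = {
    "28": "SOCIAL QUESTIONS",
    "48": "TRANSPORT",
    "52": "ENVIRONMENT",
    "56": "AGRICULTURE, FORESTRY AND FISHERIES",
    "72": "GEOGRAPHY",
}


def domain_for_microthesaurus(mt_code: str) -> Tuple[str, str]:
    """Map a microthesaurus code such as "2841" to its top-level category.

    Assumption: the first two digits of the code are the category code.
    """
    code = mt_code.strip()
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a 4-digit code, got {mt_code!r}")
    prefix = code[:2]
    if prefix not in DOMAINS:
        raise KeyError(f"no top-level category known for prefix {prefix!r}")
    return prefix, DOMAINS[prefix]


code, label = domain_for_microthesaurus("2841")
print(code, label)  # 28 SOCIAL QUESTIONS
```

A dataset tagged with a subcategory can then be filed under its top-level 
topic without asking the publisher to pick both.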

I'd be interested to hear your take on this.

Best regards,

Edo Plantinga - Data.overheid.nl
---------------------------------------------------------------
Will you soon be visiting a Rijksoverheid (central government) location?

If so, you must hold a valid Rijkspas or a valid identity document 
(passport, national identity card, driving licence or aliens document). If 
you cannot show a valid identity document when checked, access will be 
refused. Identification issued by other organisations is not accepted.

This message may contain information that is not intended for you. If you 
are not the addressee or if this message was sent to you by mistake, you 
are requested to inform the sender and delete the message. The State 
accepts no liability for damage of any kind resulting from the risk 
inherent in the electronic transmission of messages.


_______________________________________________
ckan-discuss mailing list
ckan-discuss at lists.okfn.org
http://lists.okfn.org/mailman/listinfo/ckan-discuss


__________________________________________________________________

This message and all attachments are confidential and intended exclusively 
for its addressees. This message does not constitute an official document. 
Only documents bearing the signature of the President of the Conseil 
Général or of one of his delegates are binding on the Département. 
Any unauthorised use or dissemination is prohibited. Any electronic message 
is liable to be altered, and the Département de la Gironde declines all 
responsibility for this message if it has been altered, distorted or 
falsified.
 
 