[ckan-discuss] (no subject)

Mark Wainwright mark.wainwright at okfn.org
Mon Feb 27 18:47:16 GMT 2012


Dear all,

I have pasted below the notes from the online meeting on CKAN and Linked
Data on 2 February. They're taken from the etherpad for the meeting here:
<http://ckan.okfnpad.org/meetup-2012-02-02>. Many thanks to Richard Cyganiak
for facilitating the meeting and for tidying up the etherpad.

This is clearly an area where there's still plenty to say, so we hope to
schedule a follow-up meeting in a couple of months' time.

While I'm writing - another plug for this week's meeting on the Webstore,
or the following one (in 2 weeks) on metadata standards:

Webstore: http://ckan.okfnpad.org/meetup-2012-03-01
Metadata standards: http://ckan.okfnpad.org/meetup-2012-03-15

Regards,

Mark

-- 
Mark Wainwright, CKAN Community Co-ordinator
Open Knowledge Foundation http://okfn.org/
Skype: m.wainwright
*= Community meetup: Linked Data & CKAN, Feb 2nd 2012 =*

*Community page:* http://wiki.ckan.org/Linked_Data

2nd February 2012
17:00 UTC (17:00 GMT, 18:00 CET)
Duration: 90 minutes

*Fall back to #ckan on freenode if there are problems with Skype!*

   - http://webchat.freenode.net/ – #ckan channel


   - or use your IRC client at irc.freenode.net – Port 6667 – #ckan


*== Participants ==*
Please enter your skype details below:

*Present:*

   - *Host: Mark Wainwright (m.wainwright)* - CKAN Community Co-ordinator


   - Richard Cyganiak (richard.cyganiak) - DERI, Galway; LODcloud; DCAT
   standard


   - Jindřich Mynarz (jindrich.mynarz) - Prague University of Economics;
   cz.ckan.net


   - Uroš Milošević, IMP, Serbia (white_pawn) - Belgrade - LOD2 project -
   publishing stat. data


   - *Skype hoster: David Raznick (draznick)* - CKAN tech. lead - Eurovoc,
   multilingual


   - Hugh Williams (hwilliams62) - OpenLink - virtuoso universal server -
   LOD2 proj - statistics on datasets


   - Daniel Dietrich (ddie22) - chair OKF Germany - Open Gov Data & Open
   Data EU w. grps - offenedaten.de: data catalogues for German cities


   - Valentina Janev, IMP, Serbia (impvalentina) - part of LOD2 consortium
   - serbian CKAN


   - Pablo Mendes (pablonascimentomendes), F.U. Berlin, planet-data.eu,
   lod2.eu


*Apologies:*

   - Phil Archer (philarcher)


   - Roberto García (rogargon)


*== Agenda ==*

   - Brief intro from participants (name, organization, keywords of
   interest)


   - Review of progress on topics from last meetup


   - Topics of interest - see below


*== Review of CKAN+LD news ==*

   - David: working on better RDF import/export


   - Making it easier to add custom form/validation for groups


   - Improving vocabulary/taxonomy support. Currently there's only free
   tagging. Want to support existing taxonomies like Eurovoc


   - Richard/David Read working on CKAN relationships supporting LODcloud
   links


   - More doc on wiki.ckan.org/contrib


*== Topic list ==*

*Please add your name under any topics that you'd like to talk/hear about,
and add your own topics!*

*Describing the internal structure of datasets*

   - Interested: Richard, Roberto, Valentina, Uroš


   - For the Czech CKAN we are experimenting with reengineering the entity
   schema of individual (non-RDF to date, mostly XLS) datasets into UML class
   diagrams


   - The diagram files are associated with the datasets (using links in
   dataset metadata) and should serve as guidance for the design of RDFization
   and interlinking


   - Example:
   http://cz.ckan.net/dataset/state_employment_policy_spendings_cechova


   - Serbian CKAN: Use RDF Data Cube Vocabulary to describe the internal
   structure of RDF datasets from the Serbian Statistical Office


   - define Data Structure Definition (DSD) files for statistical areas,
   e.g. National Accounts or Prices.


   - use the CKAN extra fields to describe Categories (topics), Geographic
   coverage, Temporal coverage from, Temporal granularity.


   - On Publicdata.eu: http://publicdata.eu/package?extras_eu_country=RS


   - Should be on http://rs.ckan.net/ but seems not to be working today :-(


   - Please announce it on the publishing-statistical-data Google Group
   when it's ready :-)


   - Where should the description of CKAN datasets be put?


   - Should it be a separate resource? Should we put it inside the
   description of dataset?


   - Where should we host the documentation files (UML diagram images,
   DSDs)?


   - Is it appropriate to use CKAN Storage extension for this purpose?
   (yes!)


   - Relates to the question of what to do with additional documentation,
   extra resources, schema, manual, images, etc.?


   - Could also be facilitated by resource types offered via dropdown
   (standardized types)


   - Where should we store the provenance information for the documentation
   describing CKAN datasets? For instance, in some cases the documentation
   (e.g., the UML diagram in the case of Czech CKAN) might not be provided by
   the dataset's author/maintainer, so it would be good to have this
   information stored somewhere.


*Integrating quality assurance information with CKAN*

   - Interested: Richard, Pablo


   - see http://wiki.ckan.org/Data_Quality


   - see http://labs.mondeca.com/sparqlEndpointsStatus/index.html


   - see http://www4.wiwiss.fu-berlin.de/lodcloud/state/


   - See http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/


   - Pablo: working in LOD2 + Planet Data on a conceptual model +
   implementation for data quality


   - Conceptual model is generic, based on material like Chris Bizer's work
   on information quality


   - Implementation is planned for RDF-based datasets


   - Push evaluation results to semantic.ckan.net?


   - Can the quality assurance information be "reduced" and serialized into
   Extra data fields in standard CKAN?


   - In the long run semantic.ckan.net to be integrated more closely with
   the core.
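
One possible reading of the "reduce and serialize" question above is to flatten a nested quality report into the flat string key/value pairs that CKAN extras hold. The following is only a minimal sketch under stated assumptions: the report fields, the `qa` key prefix and the `flatten_report` helper are all hypothetical and were not agreed at the meeting.

```python
import json


def flatten_report(report, prefix="qa"):
    """Flatten a nested dict into flat 'prefix:key:subkey' -> string pairs.

    The 'qa' prefix and the ':' separator are assumptions for illustration.
    """
    extras = {}
    for key, value in report.items():
        name = f"{prefix}:{key}"
        if isinstance(value, dict):
            # Recurse into nested sections, extending the key path.
            extras.update(flatten_report(value, prefix=name))
        elif isinstance(value, (list, tuple)):
            # Lists have no flat equivalent, so store them as a JSON string.
            extras[name] = json.dumps(list(value))
        else:
            extras[name] = str(value)
    return extras


# Hypothetical quality report for one dataset.
report = {
    "endpoint": {"available": True, "uptime_7d": 0.98},
    "broken_links": 3,
    "vocabularies": ["foaf", "dcterms"],
}
extras = flatten_report(report)
for key in sorted(extras):
    print(key, "=", extras[key])
```

Anything that does not reduce cleanly to a short string (full validation logs, for example) would probably still need to live in a separate resource or in semantic.ckan.net.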


*Projects producing additional information about Data Hub datasets*

   - Mondeca SPARQL endpoint status


   -  http://labs.mondeca.com/sparqlEndpointsStatus/index.html


   - For example:
   http://labs.mondeca.com/sparqlEndpointsStatus/details/dbpedia.html


   - University Leipzig's LODStats project


   - http://stats.lod2.eu/


   - State of the LOD Cloud / CKAN validator


   - http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/validate.php?package=dbpedia



   - Pablo's upcoming “Data Quality Analyser (?)”


   - Third-party metadata about datasets in CKAN: how to hook these back
   into TheDataHub.org, what to do with datasets for which these don't make
   sense?


   - Create a “bot” that periodically adds an extra field or link to the
   dataset description


   - Create a "live field" where you add a URL pattern that gets queried
   via ajax when the package is loaded.


   - Could be done as part of a special per-group view page that might come
   along with the custom per-group forms
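
The "live field" idea above comes down to expanding a URL pattern with the dataset name. Here is a rough sketch of that expansion step only, assuming a hypothetical `{name}` placeholder syntax and a hypothetical `expand_live_field` helper; the two patterns are derived from the example URLs listed in these notes, and nothing is fetched.

```python
from urllib.parse import quote


def expand_live_field(pattern, dataset_name):
    """Fill the (hypothetical) {name} placeholder with a URL-escaped dataset name."""
    return pattern.replace("{name}", quote(dataset_name, safe=""))


# URL patterns modelled on the third-party services mentioned in the notes.
PATTERNS = {
    "validator": (
        "http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/"
        "validate.php?package={name}"
    ),
    "sparql_status": (
        "http://labs.mondeca.com/sparqlEndpointsStatus/details/{name}.html"
    ),
}

links = {label: expand_live_field(p, "dbpedia") for label, p in PATTERNS.items()}
for label, url in links.items():
    print(label, url)
```

A "bot" could store the expanded URLs as extra fields, while a "live field" would query them from the browser when the package page loads. In both cases, datasets for which a service makes no sense would need some way to opt out.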



*Integration with LOD2, WebID*
*Brief discussion of questions e-mailed in advance from Bert Van Nuffelen
(unable to attend in person)*

   - How do you see the CKAN integration in the LOD2 stack? The Stack is
   RDF based, and one of the goals of the LOD2 project is to more tightly
   integrate that.


   - Jonathan Gray promised during the review that there would be work done
   to make the interaction on the metadata RDF based.


   - David: only publicdata.eu has that at present. RDF export will be
   available for others as well using the CKAN RDF extension:

            http://wiki.ckan.org/Extensions#CKAN_RDF

   - For uploading to a CKAN repository an account is required. Does CKAN
   support WebID? It has been chosen as the overall supported
   authentication method.


   - David: OpenID has been deprecated. WebID is something to definitely
   think about. It is an interesting candidate.



*== Post-meetup evaluation/comments ==*

   - Pablo: useful to touch base again, will have lots more to report in 2
   mths


   - Valentina - very fruitful meeting, thanks for suggestions


   - Richard - good to see what people are working on


*== Review of progress on topics from last meetup ==*

   - Keith to look into creating the converter to get native dcat/VoID into
   the CKAN API


   - Richard (with Anja, Pablo) to come up with HTML form capturing the
   lodcloud metadata


   - Richard to write a script that takes existing links from extra fields
   and turns them into proper relationships using the API


   - Comments on making the API better would be well received ;-)


   - Pablo and Pierre-Yves to explore a metadata enricher that adds
   additional fields (number of triples, vocabularies used) by looking at the
   dumps that are already listed: in progress; relates to Pablo's so-called
   "Data Quality Analyzer" :)


   - Pierre-Yves to add his stuff to http://wiki.ckan.org/Contrib


   - Rufus to add some links to quick&dirty CKAN bulk import scripts to
   http://wiki.ckan.org/Contrib
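
For the script that turns links held in extra fields into proper relationships, the core step could look roughly like the sketch below. It makes several assumptions that these notes do not confirm: that link counts sit in extras named `links:<target>`, that `links_to` is the relationship type wanted, and that the resulting dicts are later sent to the CKAN API by some other code. The `links_to_relationships` helper and the sample data are hypothetical.

```python
def links_to_relationships(dataset_name, extras, prefix="links:"):
    """Build one relationship record per 'links:<target>' extra field.

    Only constructs the records; submitting them to CKAN is left out.
    """
    relationships = []
    for key, value in sorted(extras.items()):
        if not key.startswith(prefix):
            continue  # not a link field, e.g. 'triples'
        target = key[len(prefix):].strip()
        if not target:
            continue  # malformed key with no target dataset
        relationships.append(
            {
                "subject": dataset_name,
                "object": target,
                "type": "links_to",  # assumed relationship type
                "comment": f"{value} links (from extra field '{key}')",
            }
        )
    return relationships


# Hypothetical extras for a hypothetical dataset.
extras = {"links:dbpedia": "1200", "links:geonames": "300", "triples": "50000"}
rels = links_to_relationships("example-dataset", extras)
for rel in rels:
    print(rel)
```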




*== Left-over topics, consider for next meetup ==*

*Better use or integration of linked data related tools into CKAN*

   - How are APIs currently being used?


   - What tools are most popular/downloaded (apps)?


   - What types of data do they provide?


   - How can we link CKAN functionality with them?


   - What duplication of functionality is there between the tools, and what
   does that mean for us?


   - Tools to improve CKAN search possibilities


   - Interested: Valentina, Uroš…


*Previewing linked datasets in CKAN*

   - using existing tools for viewing triples and sparql endpoints?


   - Interested: Richard, Roberto,…


*Native triple storage for CKAN (data, not metadata)*

   - as we now have native tabular storage - especially for examples or
   similar


   - see http://wiki.ckan.org/Storage


   - Interested: …


*Showing data summary information*

   - should we start showing dataset summary stats prominently in search
   results and dataset pages?


   - e.g., number of triples


   - There's a related extension for OntoWiki, from which some code may be
   adapted: https://github.com/AKSW/void.ontowiki


   - Interested: ...
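
As an illustration of the kind of summary stats meant here, the sketch below counts triples and the vocabularies (predicate namespaces) used in an N-Triples dump. It is a simplified line-based reading rather than a full parser, the `summarize_ntriples` helper and sample data are hypothetical, and it is not taken from the OntoWiki extension mentioned above.

```python
from collections import Counter


def summarize_ntriples(text):
    """Count triples and predicate namespaces in N-Triples text.

    Simplification: one triple per non-empty, non-comment line, and the
    predicate is taken to be the second whitespace-separated term.
    """
    triples = 0
    vocabularies = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # not a complete triple
        predicate = parts[1].strip("<>")
        # Namespace = everything up to the last '#' or '/'.
        cut = max(predicate.rfind("#"), predicate.rfind("/"))
        vocabularies[predicate[: cut + 1]] += 1
        triples += 1
    return {"triples": triples, "vocabularies": dict(vocabularies)}


sample = """\
<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .
# a comment line
<http://example.org/b> <http://www.w3.org/2000/01/rdf-schema#label> "Bob" .
"""
summary = summarize_ntriples(sample)
print(summary)
```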




*== Addendum ==*


*Integration with PoolParty*
Remarks from Martin (SWC, LOD2) added after the meeting

As discussed with Jonathan some weeks or months ago, we (SWC) could support
the metadata layer of CKAN by connecting CKAN to PoolParty (PPT,
http://www.poolparty.biz), our thesaurus management software. PPT manages
SKOS vocabularies in RDF format and can publish these thesauri / controlled
vocabularies as linked open data, which would enable e.g. autocomplete
mechanisms for metadata management in CKAN on the basis of linked open
controlled vocabularies. This integration project could be done in the
course of LOD2, to bring more sense into the metadata layer of CKAN by using
controlled vocabularies instead of 100% free tagging by users (also for
federated CKAN instances). Jonathan and I called the idea 'from (metadata)
soup 2 sense'. PoolParty will become part of the LOD2 stack as an open
source version in 2013; until then we can use the commercial version of PPT
as LOD2 partners and also as CKAN partners (SWC is an official CKAN
partner). Looking forward to discussing this in more detail in a call as
well as at the LOD2 plenary in Vienna in March 2012!