[public-lod2] (no subject)
mark.wainwright at okfn.org
Mon Feb 27 18:47:16 UTC 2012
I have pasted below the notes from the online meeting on CKAN and Linked
Data on 2 February. They're taken from the etherpad for the meeting here:
<http://ckan.okfnpad.org/meetup-2012-02-02>. Many thanks to Richard Cyganiak
for facilitating the meeting and for tidying up the etherpad.
This is clearly an area where there's still plenty to say - we will
hopefully schedule in a follow-up meeting in a couple of months' time.
While I'm writing - another plug for this week's meeting on the Webstore,
or the following one (in 2 weeks) on metadata standards:
Metadata standards: http://ckan.okfnpad.org/meetup-2012-03-15
Mark Wainwright, CKAN Community Co-ordinator
Open Knowledge Foundation http://okfn.org/
*= Community meetup: Linked Data & CKAN, Feb 2nd 2012 =*
2nd February 2012
17:00 UTC (17:00 GMT, 18:00 CET)
Duration: 90 minutes
*Fallback to #ckan on freenode if there are problems with Skype!*
- http://webchat.freenode.net/ – #ckan channel
- or use your IRC client at irc.freenode.net – Port 6667 – #ckan
*== Participants ==*
Please enter your Skype details below:
- *Host: Mark Wainwright (m.wainwright)* - CKAN Community Co-ordinator
- Richard Cyganiak (richard.cyganiak) - DERI, Galway; LODcloud; DCAT
- Jindřich Mynarz (jindrich.mynarz) - Prague University of Economics;
- Uroš Milošević, IMP, Serbia (white_pawn) - Belgrade - LOD2 project -
publishing stat. data
- *Skype hoster: David Raznick (draznick)* - CKAN tech. lead - Eurovoc,
- Hugh Williams (hwilliams62) - OpenLink - virtuoso universal server -
LOD2 proj - statistics on datasets
- Daniel Dietrich (ddie22) - chair OKF Germany - Open Gov Data & Open
Data EU working groups - offenedaten.de: data catalogues for German cities
- Valentina Janev, IMP, Serbia (impvalentina) - part of LOD2 consortium
- Serbian CKAN
- Pablo Mendes (pablonascimentomendes), F.U. Berlin, planet-data.eu,
- Phil Archer (philarcher)
- Roberto García (rogargon)
*== Agenda ==*
- Brief intro from participants (name, organization, keywords of
- Review of progress on topics from last meetup
- Topics of interest - see below
*== Review of CKAN+LD news ==*
- David: working on better RDF import/export
- Making it easier to add custom form/validation for groups
- Improving vocabulary/taxonomy support. Currently there's only free
tagging. Want to support existing taxonomies like Eurovoc
- Richard / David Read working on CKAN relationships supporting LODcloud
- More doc on wiki.ckan.org/contrib
*== Topic list ==*
*Please add your name under any topics that you'd like to talk/hear about,
and add your own topics!*
*Describing the internal structure of datasets*
- Interested: Richard, Roberto, Valentina, Uroš
- For the Czech CKAN we are experimenting with reengineering the entity
schema of individual (non-RDF to date, mostly XLS) datasets into UML class
- The diagram files are associated with the datasets (using links in
dataset metadata) and should serve as guidance for the design of RDFization
- Serbian CKAN: Use RDF Data Cube Vocabulary to describe the internal
structure of RDF datasets from the Serbian Statistical Office
- define Data Structure Definition (DSD) files for statistical areas,
e.g. National Accounts or Prices.
- use the CKAN extra fields to describe Categories (topics), Geographic
coverage, Temporal coverage from, Temporal granularity.
- On Publicdata.eu: http://publicdata.eu/package?extras_eu_country=RS
- Should be on http://rs.ckan.net/ but seems not to be working today :-(
- Please announce it on the publishing-statistical-data Google Group
when it's ready :-)
- Where should the description of CKAN datasets be put?
- Should it be a separate resource? Should we put it inside the
description of dataset?
- Where should we host the documentation files (UML diagram images,
- Is it appropriate to use CKAN Storage extension for this purpose?
- Relates to the question of what to do with additional documentation,
extra resources, schema, manual, images, etc.?
- Could also be facilitated by resource types offered via dropdown
- Where should we store the provenance information for the documentation
describing CKAN datasets? For instance, in some cases the documentation
(e.g., the UML diagram in the case of Czech CKAN) might not be provided by
the dataset's author/maintainer, so it would be good to have this
information stored somewhere.
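
As a rough illustration of the Serbian CKAN approach noted above (describing
topic, coverage and granularity through CKAN extra fields), a dataset record
might be assembled as in the sketch below. The dataset name, the extra-field
keys and the values are all made up for illustration and are not the keys the
Serbian CKAN actually uses; extras are kept as a plain dict here, and how the
CKAN API expects them may differ by API version.

```python
import json

# Hypothetical extra-field keys; the real Serbian CKAN keys may differ.
EXTRA_KEYS = ("categories", "geographic_coverage",
              "temporal_coverage_from", "temporal_granularity")


def build_dataset(name, title, **extras):
    """Build a CKAN-style dataset dict, rejecting unknown extra fields."""
    unknown = set(extras) - set(EXTRA_KEYS)
    if unknown:
        raise ValueError(f"unexpected extra fields: {sorted(unknown)}")
    return {"name": name, "title": title, "extras": dict(extras)}


dataset = build_dataset(
    "example-national-accounts",        # made-up dataset name
    "National Accounts (example)",      # made-up title
    categories="National Accounts",
    geographic_coverage="Serbia",
    temporal_coverage_from="2005",
    temporal_granularity="year",
)
print(json.dumps(dataset, indent=2, sort_keys=True))
```

Restricting the extras to a fixed set of keys is one way to keep such fields
consistent across datasets while they remain free-form in CKAN itself.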
*Integrating quality assurance information with CKAN*
- Interested: Richard, Pablo
- see http://wiki.ckan.org/Data_Quality
- see http://labs.mondeca.com/sparqlEndpointsStatus/index.html
- see http://www4.wiwiss.fu-berlin.de/lodcloud/state/
- see http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/
- Pablo: working in LOD2 + Planet Data on a conceptual model +
implementation for data quality
- Conceptual model is generic, based on material like Chris Bizer's work
on information quality
- Implementation is planned for RDF-based datasets
- Push evaluation results to semantic.ckan.net?
- Can the quality assurance information be "reduced" and serialized into
Extra data fields in standard CKAN?
- In the long run semantic.ckan.net to be integrated more closely with
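
One possible reading of "reducing" quality assurance information into extra
fields is to flatten a nested report into flat key/value strings. The sketch
below assumes a made-up report structure and a made-up "qa:" key prefix;
neither comes from the meeting or from any existing CKAN extension.

```python
def flatten_report(report, prefix="qa"):
    """Flatten a nested dict into {"qa:a:b": "value"} string pairs."""
    flat = {}
    for key, value in report.items():
        name = f"{prefix}:{key}"
        if isinstance(value, dict):
            # Recurse into nested sections, extending the key path.
            flat.update(flatten_report(value, name))
        else:
            # Store every leaf as a string, like a free-form extra field.
            flat[name] = str(value)
    return flat


# Hypothetical quality report for one dataset.
report = {
    "endpoint": {"available": True, "uptime_pct": 97.5},
    "dump_parses": False,
}
qa_extras = flatten_report(report)
print(qa_extras)
```

Flattening loses the structure of the original report, so a link back to the
full evaluation would probably still be needed alongside the reduced fields.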
*Projects producing additional information about Data Hub datasets*
- For example:
- Mondeca SPARQL endpoint status
- University Leipzig's LODStats project
- State of the LOD Cloud / CKAN validator
- Pablo's upcoming “Data Quality Analyser (?)”
- Third-party metadata about datasets in CKAN: how to hook these back
into TheDataHub.org, what to do with datasets for which these don't make
- Create a “bot” that periodically adds an extra field or link to the
- Create a "live field" where you add a URL pattern that gets queried
via ajax when the package is loaded.
- Could be done as part of a special per-group view page that might come
along with the custom per-group forms
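
The "bot" and "live field" ideas above both come down to deriving a
third-party URL from a dataset's name. A minimal sketch, assuming a
hypothetical "{name}" placeholder syntax, a made-up service URL and a made-up
extra-field key (writing the result back to CKAN is left out):

```python
from urllib.parse import quote


def expand_pattern(pattern, dataset_name):
    """Fill the {name} placeholder with the URL-quoted dataset name."""
    return pattern.format(name=quote(dataset_name, safe=""))


def add_link_extra(dataset, key, pattern):
    """Return a copy of the dataset dict with one extra pointing at third-party metadata."""
    extras = dict(dataset.get("extras", {}))
    extras[key] = expand_pattern(pattern, dataset["name"])
    return {**dataset, "extras": extras}


# Made-up dataset and third-party service, for illustration only.
original = {"name": "example dataset", "extras": {"triples": "5000"}}
updated = add_link_extra(
    original, "stats_url", "http://stats.example.org/datasets/{name}.json"
)
print(updated["extras"]["stats_url"])
```

A bot would store the expanded URL periodically, while a "live field" would
keep only the pattern and expand it when the package page is loaded.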
*Integration with LOD2, WebID*
*Brief discussion of questions e-mailed in advance from Bert Van Nuffelen
(unable to attend in person)*
- How do you see the CKAN integration in the LOD2 stack? The Stack is
RDF based, and one of the goals of the LOD2 project is to more tightly
- Jonathan Gray promised during the review that there would be work done
to make the interaction on the metadata RDF based.
- David: only publicdata.eu has that at present. RDF export will be
available for others as well using the CKAN RDF extension:
- For uploading to a CKAN repository an account is required. Does CKAN
support WebID? As that is chosen to be the overall supported
- David: OpenID has been deprecated. WebID is something to definitely
think about. It is an interesting candidate.
*== Post-meetup evaluation/comments ==*
- Pablo: useful to touch base again, will have lots more to report in 2
- Valentina - very fruitful meeting, thanks for suggestions
- Richard - good to see what people are working on
*== Review of progress on topics from last meetup ==*
- Keith to look into creating the converter to get native dcat/VoID into
the CKAN API
- Richard (with Anja, Pablo) to come up with HTML form capturing the
- Richard to write a script that takes existing links from extra fields
and turns them into proper relationships using the API
- Comments on making the API better would be well received ;-)
- Pablo and Pierre-Yves to explore a metadata enricher that adds
additional fields (number of triples, vocabularies used) by looking at the
dumps that are already listed: in progress, relates to Pablo's so-called
"Data Quality Analyzer" :)
- Pierre-Yves to add his stuff to http://wiki.ckan.org/Contrib
- Rufus to add some links to quick&dirty CKAN bulk import scripts to
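
The script mentioned above, turning existing links in extra fields into
proper relationships, might start from a transformation like the following
sketch. It assumes that links are recorded in extras with a "links:" key
prefix followed by the target dataset name, and that "links_to" is the
relationship type wanted; both are assumptions, and the step of sending the
result to the CKAN API is deliberately left out.

```python
def links_to_relationships(dataset, prefix="links:"):
    """Turn extras like {"links:dbpedia": "1200"} into relationship dicts."""
    relationships = []
    for key, value in sorted(dataset.get("extras", {}).items()):
        if not key.startswith(prefix):
            continue  # not a link extra
        target = key[len(prefix):]
        if not target:
            continue  # malformed key with no target name
        relationships.append({
            "subject": dataset["name"],
            "type": "links_to",          # assumed relationship type
            "object": target,
            "comment": f"{value} links",
        })
    return relationships


# Made-up dataset record for illustration.
example = {
    "name": "example-set",
    "extras": {"links:dbpedia": "1200", "links:geonames": "30", "triples": "5000"},
}
for rel in links_to_relationships(example):
    print(rel)
```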
*== Left-over topics, consider for next meetup ==*
*Better use or integration of linked data related tools into CKAN*
- How are APIs currently being used?
- What tools are most popular/downloaded (apps)?
- What types of data do they provide?
- How can we link CKAN functionality with them?
- What duplication of functionality is there between the tools and what
does that mean for us?
- Tools to improve CKAN search possibilities
- Interested: Valentina, Uroš…
*Previewing linked datasets in CKAN*
- using existing tools for viewing triples and SPARQL endpoints?
- Interested: Richard, Roberto,…
*Native triple storage for CKAN (data, not metadata)*
- as we now have native tabular storage - especially for examples or
- see http://wiki.ckan.org/Storage
- Interested: …
*Showing data summary information*
- should we start showing dataset summary stats prominently in search
results and dataset pages?
- e.g., number of triples
- There's a related extension for OntoWiki, from which some code may be
- Interested: ...
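
Summary statistics such as the number of triples could be prototyped from an
N-Triples dump with something like the sketch below. It is a deliberately
naive line-based reading (one triple per non-comment line, predicate taken as
the second whitespace-separated token), not a conforming RDF parser, and the
sample data is made up.

```python
from collections import Counter


def namespace_of(iri):
    """Cut an IRI after its last '#' or '/' to approximate the vocabulary namespace."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[:cut + 1] if cut >= 0 else iri


def summarise_ntriples(lines):
    """Count triples and predicate namespaces in N-Triples-like lines."""
    triples = 0
    vocabularies = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # not enough tokens to be a triple
        triples += 1
        predicate = parts[1].strip("<>")
        vocabularies[namespace_of(predicate)] += 1
    return {"triples": triples, "vocabularies": dict(vocabularies)}


sample = """\
<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
# a comment line
<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob" .
"""
summary = summarise_ntriples(sample.splitlines())
print(summary)
```

For real dumps a proper RDF parser would be the safer choice; the point here
is only the shape of the summary that could be shown on a dataset page.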
*== Addendum ==*
*Integration with PoolParty*
Remarks from Martin (SWC, LOD2) added after the meeting
As discussed with Jonathan some weeks or months ago, we (SWC) could support
the metadata layer of CKAN by connecting CKAN to PoolParty (PPT,
http://www.poolparty.biz), our thesaurus management software. PoolParty
manages SKOS vocabularies in RDF format and can publish these thesauri /
controlled vocabularies as linked open data, which would enable e.g.
autocomplete mechanisms for metadata management in CKAN on the basis of
linked open controlled vocabularies. This integration project could be done
in the course of LOD2, to bring more sense into the metadata layer of CKAN by
using controlled vocabularies instead of 100% free tagging by users (also for
federated CKAN instances). Jonathan and I called the idea 'from (metadata)
soup 2 sense'. PoolParty will become part of the LOD2 stack as an open source
version in 2013; until then we can use the commercial version of PPT as LOD2
partners, and also as CKAN partners (SWC is an official CKAN partner).
Looking forward to discussing this in more detail in a call as well as at the
LOD2 plenary in Vienna in March 2012!