[ckan-discuss] Updates to CKAN Quality Assurance Extension
Richard Cyganiak
richard at cyganiak.de
Wed Dec 21 15:02:35 GMT 2011
Hi John,
This sounds awesome!
Three comments on the method for calculating the five-star rating as indicated here:
http://wiki.ckan.org/Data_Quality
First, this page doesn't mention licensing. According to (most of) the definitions of the five-star scheme, the first star already requires an open license, so a dataset shouldn't get any stars at all if it's not under an open license.
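To make the licensing point concrete, here is a minimal sketch of the gate I have in mind. It assumes the dataset is available as a dict with a boolean `isopen` key; both that key and the function name `gate_stars_on_license` are assumptions for illustration, not the QA extension's actual code.

```python
def gate_stars_on_license(dataset, format_stars):
    """Return the format-based star count only for openly licensed datasets.

    `dataset` is assumed to be a dict with a boolean "isopen" key
    (an assumption of this sketch). A missing key counts as not open.
    """
    if not dataset.get("isopen", False):
        return 0  # no open license means no stars at all
    return format_stars
```

With this in place, a dataset whose resources would otherwise earn three stars drops to zero when no open license is recorded.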
Second, regarding the fourth star: besides considering the media type of resources, it would also make sense to check for the presence of a SPARQL endpoint. SPARQL endpoints are recorded for more than 300 datasets on the Data Hub using the pseudo-type "api/sparql". A few more are recorded with the format "SPARQL". I suggest that datasets with such resources should also be considered for the fourth star.
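A rough sketch of such a check follows. It assumes each resource is a dict with a `format` key; the function name `has_sparql_endpoint` and the case-insensitive comparison are my own choices for illustration.

```python
# Format values that mark a resource as a SPARQL endpoint,
# compared after lowercasing and trimming whitespace.
SPARQL_FORMATS = {"api/sparql", "sparql"}


def has_sparql_endpoint(resources):
    """Return True if any resource is recorded as a SPARQL endpoint.

    `resources` is assumed to be a list of dicts, each with an
    optional "format" key (an assumption of this sketch).
    """
    return any(
        (resource.get("format") or "").strip().lower() in SPARQL_FORMATS
        for resource in resources
    )
```

A dataset for which this returns True would then be a candidate for the fourth star, regardless of the media types of its other resources.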
Third, regarding the fifth star (is the dataset linked to others?): this cannot be determined automatically just by looking at the format. It requires either inspection of the actual data or information about links in the metadata. As you're probably aware, we've established conventions for recording information on data links in CKAN [1], as part of the work of the lodcloud group on the Data Hub. Link information is captured for hundreds of datasets. I would claim that the majority of four-star datasets are covered there, so you can determine whether they should get the fifth star by checking for the presence of a links:xxx field.
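As a sketch of that last check, assuming the dataset's custom fields are available as a dict keyed by field name (the function names and the dict shape are assumptions here, not the extension's API):

```python
LINKS_PREFIX = "links:"


def linked_datasets(extras):
    """Return the sorted names of datasets referenced by links:xxx fields.

    `extras` is assumed to be a dict of custom field names to values
    (an assumption of this sketch). A bare "links:" key with no
    target name is ignored.
    """
    return sorted(
        key[len(LINKS_PREFIX):]
        for key in extras
        if key.startswith(LINKS_PREFIX) and len(key) > len(LINKS_PREFIX)
    )


def qualifies_for_fifth_star(extras):
    """A four-star dataset qualifies if at least one links:xxx field exists."""
    return bool(linked_datasets(extras))
```

This only tests for the presence of link fields; it does not inspect the data itself, so it is only as good as the recorded metadata.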
Are there any plans for enabling the QA extension on the Data Hub?
All the best,
Richard
[1] http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation#Custom_CKAN_fields
On 21 Dec 2011, at 14:42, John Glover wrote:
> Hello all,
>
> One of the new additions in the (soon to arrive) CKAN 1.5.1 release is
> the ability to schedule tasks/processes to run in the background. This
> allows us to update information and perform potentially slow tasks
> much more frequently in response to user actions.
>
> One of the first CKAN extensions to make use of this is the QA
> extension. We can now calculate the 5 star ratings for dataset
> resources as soon as they are added or updated. This is hopefully a
> good first step towards users getting quick feedback for some of their
> work and keeping QA information up to date. In future we hope to
> improve the actual work done by the QA system in order to provide a
> more in-depth analysis of CKAN resources, and there are also plans to
> integrate the QA data with our web user interface.
>
> I have updated our wiki with some information on the updates to the QA
> extension and the addition to our domain model. See:
> http://wiki.ckan.org/Data_Quality
> http://wiki.ckan.org/Domain_Model/Task_Status
>
> Thoughts/comments welcome. This should be going live on thedatahub this week.
>
> Cheers,
> John
>
> _______________________________________________
> ckan-discuss mailing list
> ckan-discuss at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-discuss