[ckan-dev] ckanext-archiver/qa upgraded

David Read david.read at hackneyworkshop.com
Wed Jan 27 16:20:00 UTC 2016


Popular CKAN extensions 'Archiver' and 'QA' have recently been
significantly upgraded. Now it is relatively simple to add automatic
broken link checking and 5 stars of openness grading to any CKAN site.
At a time when many open data portals suffer from quality problems,
adding these reports makes it easy to identify the problems and to get
credit when they are resolved.

Whilst these extensions have been around for a few years, most of the
development has happened on forks, and the core has been languishing.
In the past couple of months there has been a big push to merge all
the efforts from the US (data.gov), Finland, Greece, Slovakia and
the Netherlands, and particularly those from the UK (data.gov.uk),
into core.
It's been a big leap forward in functionality. Now installers no
longer need to customize templates - you get details of broken links
and 5 stars shown on every dataset simply by installing and
configuring the extensions. And now that we're all on the same page,
we can work together better from now on.

The Archiver Extension regularly tries out all datasets' data links to
see if they are still working. File URLs that do work are downloaded
and the user is offered the 'cached' copy. Otherwise, URLs that are
broken are marked in red and listed in a report. See more:
ckanext-archiver repo, docs and demo images -
https://github.com/ckan/ckanext-archiver
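
To illustrate the kind of check described above, here is a minimal
sketch of testing one data link. It is not the Archiver's own code:
the function name check_link, the timeout default and the injectable
opener parameter are all assumptions made for this example, and it
only uses Python's standard urllib.

```python
import urllib.error
import urllib.request


def check_link(url, timeout=10, opener=urllib.request.urlopen):
    """Return (ok, reason) for one data link.

    Hypothetical helper for illustration only; `opener` is injectable
    so the check can be exercised without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.getcode()
    except (urllib.error.URLError, ValueError, OSError) as exc:
        # Covers DNS failures, refused connections, timeouts, malformed
        # URLs and HTTP error responses (urlopen raises HTTPError for
        # those, which is a subclass of URLError).
        return False, str(exc)
    # Non-HTTP schemes (e.g. file://) report no status code at all.
    if status is None or status < 400:
        return True, "HTTP %s" % status
    return False, "HTTP %s" % status
```

A link for which this returns (False, reason) would be one to mark as
broken and list in a report, with the reason shown alongside it.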

The QA Extension analyses the data files that Archiver has downloaded
to reliably determine their format - CSV, XLS, PDF, etc. - rather than
trusting the format that the publisher has declared. This
information is combined with the data license and whether the data is
currently accessible to give a rating out of 5 according to Tim
Berners-Lee's 5 Stars of Openness. A file that has no open licence, or
is not available, gets 0 stars. If it passes those tests but is only a
PDF then it gets 1 star. A machine-readable but proprietary format like
XLS gets 2 stars, and an open format like CSV gets 3 stars. 4
and 5 star data is that which uses standard schemas and references
other datasets, which tends to mean RDF. See ckanext-qa repo, docs and
demo images - https://github.com/ckan/ckanext-qa
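
The scoring rules above can be sketched in a few lines. This is only
an illustration of the rules as described in this message, not the QA
extension's implementation: the names FORMAT_STARS and openness_stars
are made up for the example, giving 1 star to any unlisted format is
an assumption, and the 4 and 5 star cases are left out.

```python
# Stars per detected format, taken from the examples in the text.
# Hypothetical table for illustration; the real extension knows many
# more formats.
FORMAT_STARS = {
    "PDF": 1,  # available under an open licence, not machine-readable
    "XLS": 2,  # machine-readable but proprietary
    "CSV": 3,  # machine-readable and open
}


def openness_stars(has_open_licence, is_available, detected_format):
    """Return a 0-3 star score for one data file (4 and 5 not modelled)."""
    # No open licence, or a link that cannot be fetched, scores nothing
    # regardless of format.
    if not has_open_licence or not is_available:
        return 0
    # Assumption: a format not in the table still earns 1 star for
    # being openly licensed and available.
    return FORMAT_STARS.get(detected_format.upper(), 1)
```

For instance, openness_stars(True, True, "csv") gives 3, while the
same file behind a broken link gives 0.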

David


