[ckan-discuss] Using CKAN for PublicData.eu

Jonathan Gray jonathan.gray at okfn.org
Tue Jun 22 12:31:42 BST 2010


Dear CKAN developers (and others!),

This is a note to kick off discussion about how we might be able to
use CKAN for a project provisionally called PublicData.eu, which will
start this autumn as part of a bigger project on governmental linked
data around Europe.

Below is some information about proposed features for PublicData.eu
(not yet for general circulation). I thought it would be useful to
bear some of this in mind for current work on CKAN. In the medium term
I guess the next step is to comb through this document, making sure
the key points are covered by new or existing tickets in Trac.

Perhaps we could arrange a quick Skype chat / brainstorm about this --
say in the week starting 5th July? If there is interest I'll set up a
Doodle poll to close on date...

All the best,

Jonathan

= PublicData.eu – Publishing Governmental Information as Linked Data =

== Overview ==

The purpose of the PublicData.eu use case is to increase public access
to high-value, machine-readable data sets generated by European,
national and regional governments and public administrations.
Although this effort will be similar to developments in other parts of
the world, in Europe it will be more challenging because of the
greater organizational and linguistic diversity, which makes it an
ideal application scenario for Linked Data technologies.

The adaptation and deployment of the LOD2 Stack for the PublicData.eu use
case will make it easier for the public to find, download and use
data sets that are generated and held by various governmental branches
and institutions in Europe. PublicData.eu will provide descriptions of
the data sets (metadata), information about how to access them, a
facility for sharing views and reports, and tools that leverage
government data sets. Building on the LOD2 research results, we will
deploy tools and services to classify and interlink data sets
automatically, to assess their information quality, and to enrich and
repair the published data sets.

Public participation and collaboration will be one of the keys to the
success of PublicData.eu. In this use case we will enable the public to
participate by providing downloadable data sets with which to build
applications, conduct analyses and perform research. PublicData.eu will
improve based on feedback, comments and recommendations from the
public; we will therefore implement ways for individuals to suggest data
sets they would like to see, to rate and comment on current data sets,
and to suggest how the data sets and the PublicData.eu website could be
improved. A primary goal of PublicData.eu is to improve access to
governmental data and to expand creative use of that data beyond the
walls of government by encouraging innovative ideas (e.g. mashups and
web applications). PublicData.eu will make government more transparent
and will help create an unprecedented level of openness in
government. Such openness will strengthen European democracy and
promote efficiency and effectiveness in government.

A considerable amount of governmental and administrative information in
Europe is already publicly available in structured form. This includes,
for example, information about elected representatives, financial
transparency information and statistical data. Unfortunately, this data
is scattered and uses a variety of incompatible data formats,
identifiers and schemata. PublicData.eu will not immediately solve
these problems, but by developing guidelines, best practices and
showcases we will start a process of making European governmental and
administrative data more accessible, compatible and digestible for
ordinary citizens. In particular, the semi-automatic classification,
interlinking, enrichment and repair methods developed in LOD2 will be
of significant benefit, since they allow the data to be more easily
explored, analyzed and mashed together.

== Objectives ==

The objective of WP9 is to showcase the wide applicability of the LOD2
Stack through the design, specification, implementation, testing and
user evaluation of a case study targeting ordinary citizens of the
European Union. In WP9 we will develop an LOD2 infrastructure able to
make governmental data accessible to everybody. We will deploy the
LOD2 Stack as a Web service allowing governments and governmental
agencies to publish their data based on open standards. The data will
be automatically classified and interlinked with other relevant Data
Web sources using the components developed in LOD2; provenance will be
tracked and information quality assessed. Citizens will be able to
browse the data according to different access paradigms, such as
spatial, chronological and topical. Citizens will also be able to
subscribe to information relevant to them based on their location and
interests. Furthermore, the PublicData.eu platform will allow citizens to
comment on and discuss information. WP7 involves LOD end users and
stakeholders from start to finish. Users and stakeholders will be
consulted in the initial user requirements phase and will be involved
in two system evaluations (interim and final). In this way, we ensure
the relevance of the LOD2 project to its target beneficiaries.

== Description of work ==

 * Task 1: Adaptation and Deployment of the LOD2 Stack for
PublicData.eu: This task will adapt and deploy the LOD2 Stack on the
website PublicData.eu. We will implement a user interface enabling
faceted browsing of both the metadata and the data sets themselves. The
data sets will be catalogued according to various dimensions, e.g.
geographical coverage, temporal coverage, type of data and origin. In
addition to cataloguing the data sets, we will provide functionality to
submit tools and mashups for exploring and browsing the data sets. All
data sets will be available in formats adhering to open standards, in
particular RDF and Linked Data. For convenience we will deploy
automatic conversion tools to other data formats, such as CSV, Excel
and KML.
 * Task 2: Personalization of PublicData.eu: This task is devoted to
adding personalization features to the PublicData.eu website.
Citizens will be able to register and log in (including via OpenID and
FOAF+SSL) to PublicData.eu. Once logged in, users can rate and comment on
existing data sets; they will be able to provide a wish list of
missing data sets and add further data sets with accompanying
descriptions; they can upload revised data set versions and develop
visualization tools and mashups on top of the data sets. We will also
work on a notification and subscription service, which will notify
users when new data sets matching their personal preferences (e.g.
region, language, type of data) are added to PublicData.eu.
 * Task 3: Development of Data Curation, Publication and Access
Strategies for Governments and Public Administration Based on Open
Standards and Linked Data Principles: In order to support European
governments as well as public administrations we will develop a guide
on data curation, publication and access strategies based on open
standards and Linked Data principles. The guide will enable people in
governments and administrations to identify and use methods and tools
for publishing and curating data benefiting the general public. The
guide will cover topics such as identifier creation and selection,
reuse of popular vocabularies for common information, generating and
maintaining links to other data sources, recipes for serving Linked
Data (in particular from relational representations), testing and
debugging open data sets, and discovery of open data on the Web. We will
devote particular attention to elaborating specific best practices
from the governmental and administrative domain and will stress their
particular importance for participatory governance and citizen
involvement.
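
As a rough illustration of the faceted browsing (Task 1) and preference-based notifications (Task 2) described above, here is a minimal Python sketch. It assumes each data set's metadata is a plain dictionary of facet values; the field names (`region`, `language`, `type`) and the functions `facet_counts`, `filter_by_facets` and `matching_subscribers` are hypothetical and not part of CKAN's or the LOD2 Stack's API.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical metadata record: facet name -> value. The field names used
# below are illustrative assumptions, not an existing CKAN or PublicData.eu schema.
Dataset = Dict[str, str]


def facet_counts(datasets: List[Dataset], facet: str) -> Dict[str, int]:
    """Count how many data sets share each value of one facet (e.g. 'region')."""
    return dict(Counter(d[facet] for d in datasets if facet in d))


def filter_by_facets(datasets: List[Dataset], selected: Dict[str, str]) -> List[Dataset]:
    """Drill down: keep only data sets matching every selected facet value."""
    return [d for d in datasets
            if all(d.get(k) == v for k, v in selected.items())]


def matching_subscribers(new_dataset: Dataset,
                         subscriptions: Dict[str, Dict[str, str]]) -> List[str]:
    """Return (sorted) users whose stated preferences all match a newly added data set."""
    return sorted(user for user, prefs in subscriptions.items()
                  if all(new_dataset.get(k) == v for k, v in prefs.items()))


if __name__ == "__main__":
    # Hypothetical sample catalogue, for demonstration only.
    catalogue = [
        {"title": "Regional budgets", "region": "Saxony", "language": "de", "type": "financial"},
        {"title": "Election results", "region": "Saxony", "language": "de", "type": "statistical"},
        {"title": "Council members", "region": "Flanders", "language": "nl", "type": "representatives"},
    ]
    print(facet_counts(catalogue, "region"))  # {'Saxony': 2, 'Flanders': 1}
    selected = filter_by_facets(catalogue, {"region": "Saxony", "type": "financial"})
    print([d["title"] for d in selected])  # ['Regional budgets']
    subs = {"anna": {"region": "Saxony"}, "bram": {"language": "nl"}}
    print(matching_subscribers(catalogue[0], subs))  # ['anna']
```

In practice facet counting would more likely be delegated to a search index than computed in memory; the sketch only shows the underlying logic, and reuses the same "all selected values match" rule for both browsing and subscriptions.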

== Deliverables ==

 * First release of the PublicData.eu website and tools. The first
release of the PublicData.eu web portal will include all crucial
functionality for publishing, searching, browsing and exploring
government data. This includes facet-based browsing of data set
metadata along various dimensions (data set type, spatial/temporal
coverage, origin etc.). We will collaborate with governments and
public administrations to obtain and integrate a significant number of
data sets for this first public release of PublicData.eu.
 * Intermediate release of the PublicData.eu website and tools. For the
intermediate release, we will integrate algorithms and tools developed
in the RTD work packages of LOD2, in particular for automatic
classification, interlinking, enrichment and repair of data sets. We
will annotate each data set with an information quality score, which
gives potential users insight into the likely quality of the
data.
 * Final release of the PublicData.eu website and tools. The final
release will be devoted to deploying improvements to the user
interfaces and end-user functionality, as well as to the
classification, interlinking, enrichment and repair algorithms.
Additional requirements and deployment targets for the final release
will be derived from a large-scale user evaluation of the
intermediate PublicData.eu releases.
 * Release of PublicData.eu including personalization features. Building
on the first release of the PublicData.eu website and tools, we will
deploy personalization features such as rating, commenting, wish lists,
user-contributed metadata and data set revisions, as well as tools and
mashups. This release will also allow users to subscribe to
information related to their region of interest, language and other
personal preferences.
 * Guide and best practices presentation
 * Guide and best practices brochure
