[Okfn-francophone] What data counts in Europe? Towards a public debate on Europe’s high value data and the PSI Directive

Pierre Chrzanowski pierre.chrzanowski at okfn.fr
Thu Jan 17 08:35:29 UTC 2019


Hello everyone,

I am sharing this blog post
<https://blog.okfn.org/2019/01/16/what-data-counts-in-europe-towards-a-public-debate-on-europes-high-value-data-and-the-psi-directive/>
(in English) on the revision of the European PSI Directive, co-written by
*Danny Lämmerhirt* <https://twitter.com/DanLammerhirt?lang=de>, myself
<https://twitter.com/pzwsk?lang=de> and *Sander van der Waal*
<https://twitter.com/sandervdwaal>.

Best regards

---

January 16, 2019, by Danny Lämmerhirt
<https://blog.okfn.org/author/dannylammerhirt/>

*This blogpost was co-authored by Danny Lämmerhirt*
<https://twitter.com/DanLammerhirt?lang=de>*, Pierre Chrzanowski*
<https://twitter.com/pzwsk?lang=de> *and Sander van der Waal*
<https://twitter.com/sandervdwaal> *(author note at the bottom)*

January 22 will mark a crucial moment for the future of open data in
Europe. That day, the final trilogue between the European Commission,
Parliament, and Council is planned to decide on the ratification of the
updated PSI Directive. Among other things, the European institutions will
decide what counts as ‘high value’ data. What essential information should
be made available to the public, and how those data infrastructures should
be funded and managed, are critical questions for the future of the EU.

As we will discuss below, there are many ways one might envision the
collective ‘value’ of those data. This is a democratic question and we
should not be satisfied by an ill-defined and overly broad proposal. We
therefore propose to organise a public debate to collectively define what
counts as high value data in Europe.

What does the PSI Directive say about high value datasets?

The European Commission provides several hints in the current revision of
the PSI Directive on how it envisions high value datasets. They are
determined by one of the following ‘value indicators’:

   - The potential to generate significant social, economic, or
   environmental benefits,
   - The potential to generate innovative services,
   - The number of users, in particular SMEs,
   - The revenues they may help generate,
   - The data’s potential for being combined with other datasets,
   - The expected impact on the competitive situation of public
   undertakings.

Given the strategic role of open data for Europe’s Digital Single Market,
these indicators are not surprising. But as we will discuss below, there
are several challenges in defining them. There are also different ways of
understanding the importance of data.

The annex of the PSI Directive
<http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A8-2018-0438&language=EN>
also includes a list of preliminary high value data, drawing primarily from
the key datasets defined by Open Knowledge International’s (OKI’s) Global
Open Data Index <https://index.okfn.org/dataset/>, as well as the G8 Open
Data Charter Technical Annex
<https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex>.
See the proposed list in the table below.

List of categories and high-value datasets (category, followed by its
description):

   1. Geospatial data: postcodes, national and local maps (cadastral,
   topographic, marine, administrative boundaries).
   2. Earth observation and environment: space and in situ data (monitoring
   of the weather and of the quality of land and water, seismicity, energy
   consumption, the energy performance of buildings and emission levels).
   3. Meteorological data: weather forecasts, rain, wind and atmospheric
   pressure.
   4. Statistics: national, regional and local statistical data with main
   demographic and economic indicators (gross domestic product, age,
   unemployment, income, education).
   5. Companies: company and business registers (list of registered
   companies, ownership and management data, registration identifiers).
   6. Transport data: public transport timetables of all modes of transport,
   information on public works and the state of the transport network
   including traffic information.



According to the proposal, regardless of who provides them, these datasets
shall be available for free, machine-readable and accessible for download
and, where appropriate, via APIs. The conditions for re-use shall be
compatible with open standard licences.

Towards a public debate on high value datasets at EU level

There have been attempts by EU Member States to define what constitutes
high-value data at national level, with different results. In Denmark, basic
data
<https://en.digst.dk/media/14139/grunddata_uk_web_05102012_publication.pdf>
have been defined as the five core sets of information that public
authorities use in their day-to-day case processing and should release. In
France, the law for
a Digital Republic aims to make available reference datasets
<https://www.data.gouv.fr/fr/reference> that have the greatest economic and
social impact. In Estonia, the country relies on the X-Road infrastructure
to connect core public information systems, but most of the data remains
restricted.

Now is the time for a shared and common definition of what constitutes
high-value datasets at EU level. And this implies an agreement on how we
should define them. However, as it stands, there are several issues with
the value indicators that the European Commission proposes.

For example, how does one define the data’s potential for innovative
services? How to confidently attribute revenue gains to the use of open
data? How does one assess and compare the social, economic, and
environmental benefits of opening up data? Anyone designing these
indicators must be very cautious, as metrics to compare social, economic,
and environmental benefits may come with methodological biases. Research
has found, for example, that comparing economic and environmental benefits
can unfairly favour data of economic value at the expense of fuzzier social
benefits
<http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1000202>,
as economic benefits are often more easily quantifiable and definable by
default.

One way of debating high value datasets could be to discuss what data
currently gets published by governments and why. For instance, with its
Global Open Data Index, Open Knowledge International has long advocated for
the publication of disaggregated, transactional spending figures. Another
example is OKI’s Open Data For Tax Justice
<https://datafortaxjustice.net/> initiative,
which wanted to influence the requirements for multinational companies to
report their activities in each country (so-called
‘Country-By-Country-Reporting’), and influence a standard for publicly
accessible key data.

A public debate on high value data should critically examine the European
Commission’s considerations regarding the distortion of competition. What
market dynamics are engendered by opening up data? To what extent do
existing markets rely on scarce and closed information? Does closed data
bring about market failure, as some argue (Zinnbauer 2018
<https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3125074>)? Could it
otherwise hamper fair price mechanisms (for a discussion of these dynamics
in open access publishing, see Lawson, Gray and Mauri 2015
<https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2690570>)? How would
open data change existing market dynamics? Which actors claim that
opening data could cause market distortion, and whose interests do they
represent?

Lastly, the European Commission does not yet consider cases of government
agencies generating revenue from selling particularly valuable data. The
Dutch national company register has for a long time been such a case, as
has the German Weather Service. Beyond considering competition, a public
debate around high value data should take into account how marginal cost
recovery regimes currently work.

What we want to achieve

For these reasons, we want to organise a public discussion to collectively
define

   1. What should count as a high value dataset, and based on what
   criteria,
   2. What information high value datasets should include,
   3. What the conditions for access and re-use should be.

The PSI Directive will set the baseline for open data policies across the
EU. We are therefore at a critical moment to define what European societies
value as key public information. What is at stake is not only a question of
economic impact, but the question of how to democratise European
institutions, and the role the public can play in determining what data
should be opened.

How you can participate

   1. We will use the Open Knowledge forum as the main channel for
   coordination, exchange of information and debate. To join the debate,
   please add your thoughts to this thread
   <https://discuss.okfn.org/t/psi-directive-review-your-opinion-on-the-proposed-changes/6679/12>
   or feel free to start a new discussion for specific topics.
   2. We gather proposals for high value datasets in this spreadsheet
   <https://docs.google.com/spreadsheets/d/1VsZVVXPFo9REics_Ie1VaJIITYUSBdQ9fxYWQLbpGAs/edit#gid=0>.
   Please feel free to use it as a discussion document, where we can
   crowdsource alternative ways of valuing data.
   3. We use the PSI Directive Data Census <http://psi.survey.okfn.org/> to
   assess the openness of high value datasets.

We also welcome any references to scientific papers, blog posts, etc.
discussing the issue of high-value datasets. Once we have gathered
suggestions for high value datasets, we would like to assess how open the
proposed datasets are. This will help to provide European countries with a
diagnosis of the openness of key data.





*Author note: *

*Danny Lämmerhirt is a senior researcher on open data, data governance and
data commons, as well as on metrics to improve open governance. He formerly
worked with Open Knowledge International, where he led its research
activities, including the methodology development of the Global Open Data
Index 2016/17. His work focuses, among other things, on the role of metrics
for open government, and the effects metrics have on the way institutions
work and make decisions. He has supervised and edited several pieces on this
topic, including the Open Data Charter’s Measurement Guide.*

*Pierre Chrzanowski is a Data Specialist with the World Bank Group and a
co-founder of the Open Knowledge France local group. As part of his work, he
developed the Open Data for Resilience Initiative (OpenDRI) Index, a tool
to assess the openness of key datasets for disaster risk management
projects. He has also participated in the impact assessment prior to the
new PSI Directive proposal and has contributed to the Global Open Data
Index as well as the Web Foundation’s Open Data Barometer.*

*Sander van der Waal is Programme Lead for Fiscal Transparency at Open
Knowledge International. Furthermore, he’s responsible for the team at Open
Knowledge International that works in the areas of Research,
Communications, and Community. Sander combines a background in Computer
Science with Philosophy and has a passion for ‘open’ in all its forms,
ranging from open data to open access and open source software.*