[open-data-handbook] Wording relating to cost of data?

Ton Zijlstra ton.zijlstra at gmail.com
Thu Mar 21 12:46:55 UTC 2013


Hi all, I had sent an answer earlier, but now realize I only sent it
to Peter and not the list. So here's my response from earlier, largely
overlapping with Rufus' answer:

Hi Peter, all,

The "freely used" is free as in speech, I'd say.

As to cost of access, this can indeed get confusing easily, esp. in the
EU, because charging is partly regulated there when it comes to
government-held data. That makes public sector data different from
data held by the private sector.

Selling data by definition falls outside the scope of open data, as it
puts a barrier to entry at the bottom of the market. Incidentally, that
is where all the real innovation potential lives, which is why existing
companies re-using gov data say they don't mind paying for data: it
nicely seals off the bottom of their markets, limiting competition from
new players that may have new ways of doing things.

Not all charges are about selling, however. There's the cost of
delivery. The existing regulations in the EU PSI Directive for gov
data talk about being able to charge for the costs you incur, with a
reasonable margin on top. That is not open data, both because it
basically allows any charging and because it seems to say you can
count as cost all the things you did to get the data, even though most
of that would be part of the public task of the data holder. This is
the way quite a number of independent public sector agencies work: at
some point they started treating the data as assets to generate
revenue.

The new regulations under discussion in the EU, and the de facto
actions of a range of EU nations, talk about charging for 'marginal
costs'. These are the incremental costs of delivering the data
specifically to you (so not anything they did to get the data, or to be
able to share data in general, like having a website or data portal).
This is akin to FOIA requests: the info is always free of charge, but
copying costs etc. may be charged to you. Marginal costs theoretically
always apply (use of bandwidth, or whatever), but in practice, for all
but the largest datasets, it means digital delivery can/should be free
as in gratis, either because the real costs of delivering it to you
specifically are insignificantly small, or because charging you for
data provision takes more effort than the revenue gained. An example
here would be Dutch meteo data: it is free as in both speech and beer,
unless you want to have the data delivered in real time from the
sensor network to you. That costs 20.000 per year to make possible. If
you can wait 3 minutes, it's downloadable from the website without
charge.
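
As a rough illustration of that reasoning (not a legal test, and not
taken from any regulation), the decision can be sketched in Python.
The function name, the "negligible" threshold, and the billing
overhead figure are all hypothetical; only the 20.000 per year for
real-time delivery comes from the example above, and its currency is
left unspecified here as well.

```python
def should_charge(marginal_cost: float,
                  billing_overhead: float,
                  negligible_below: float = 1.0) -> bool:
    """Decide whether charging the marginal cost is worthwhile.

    marginal_cost    -- incremental cost of delivering the data to one
                        specific re-user (not the cost of collecting it)
    billing_overhead -- hypothetical cost of invoicing and collecting
    negligible_below -- hypothetical threshold under which the marginal
                        cost is treated as insignificant
    """
    # Case 1: delivery cost is insignificantly small, so deliver gratis.
    if marginal_cost < negligible_below:
        return False
    # Case 2: charging costs more than it brings in, so deliver gratis.
    if marginal_cost <= billing_overhead:
        return False
    # Otherwise the marginal cost may be passed on to the re-user.
    return True


# Ordinary download from a website: a few cents of bandwidth (assumed).
download = should_charge(marginal_cost=0.02, billing_overhead=15.0)

# Real-time feed from the sensor network: 20.000 per year (from the text).
realtime = should_charge(marginal_cost=20000.0, billing_overhead=15.0)
```

Under these assumed figures the ordinary download stays gratis and only
the real-time feed is charged for, which matches the meteo example.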


Marginal costing is generally accepted as not being detrimental to the
openness of open data.


In practice the issue is that many public sector bodies have no real
idea of the actual costs they incur for data provision, nor can they
make a proper distinction between marginal costs and other costs.
From various studies (like POPSIS) and personal experience, it
seems to me the agencies that are loudest in saying they absolutely
need the revenue are the ones with the least knowledge of their
operating costs or the cost of data provision. In contrast, those that
have done the math in detail conclude free/gratis data provision is
the cheapest way (Norwegian Meteo, e.g.).

So yes, let's try and make the text in the ODH clearer, in the sense
that marginal costs may apply but other charges are seen as breaching
open data principles. A link to an explanation of what marginal costs
are would then also be useful, making clear that 'marginal costing' is
a term with a specific meaning, used in legislation.

best,
Ton


---------------------------------------------------------------------
Interdependent Thoughts
Ton Zijlstra

ton at tonzijlstra.eu
+31-6-34489360

http://zylstra.org/blog

---------------------------------------------------------------------


On Thu, Mar 21, 2013 at 1:25 PM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> On 19 March 2013 07:15, Peter Krantz <peter at peterkrantz.se> wrote:
>>
>> Hi!
>>
>> In the ODH there are some places where cost of access to data is
>> discussed. From time to time I meet people who are confused regarding
>> the possibility of charging for data.
>>
>> For example this page has ambiguous statements:
>> http://opendatahandbook.org/en/what-is-open-data/index.html
>>
>> "Open data is data that can be freely used, reused and redistributed
>> by anyone - subject only, at most, to the requirement to attribute and
>> sharealike."
>>
>> and in the first bullet point below:
>>
>> "Availability and Access: the data must be available as a whole and at
>> no more than a reasonable reproduction cost, preferably by downloading
>> over the internet."
>
> Good point. The first quote is the summary of the Open Definition and
> the second quote seems a slight misquote of the formal Open Definition
> point 1 which says:
>
> The work shall be available as a whole and at no more than a
> reasonable reproduction cost, preferably downloading via the Internet
> without charge ...
>
> Seems like we should correct that second quote.
>
>> So in the first paragraph it is free but in the bullet point it seems
>> like it is OK to charge for data. The second statement has been used
>
> Strictly it is *ok* to charge for data. What you must do is make data
> available in bulk at cost of reproduction (which in general will be
> free or nearly free).
>
>> as an argument by people from a gvmt agency that have a business model
>> where a single row of data costs about 0.6 EUR. (getting the entire
>> database would cost around 400 000 EUR). As "open data" is gaining in
>> popularity they like to be part of that and thus consider the 0.6 EUR
>> a "reasonable reproduction cost".
>
> But that can't be the cost of reproduction. The cost of reproduction
> is essentially 0 for a single row and even for whole DB bulk access
> for GBs today is cents (so little that it's basically not worth
> charging ...)
>
>> I think the Open data handbook has to be clarified to reduce
>> ambiguity. Expensive data is not open data and maybe open data
>> definitions need to be at the "end of the scale" stressing that data
>> need to be free to be truly open. Experience from discussions about
>> software patents (RAND terms etc) shows that "Reasonable" can mean
>> very different things to different people.
>
> I think http://opendefinition.org/okd/ is pretty clear and we should
> inline that pretty much directly into the handbook.
>
>> As a second alternative ambiguity can be reduced by providing an
>> example of what "reasonable reproduction costs" can be, maybe by
>> explaining the cost of the medium (e.g. a DVD) and that it only
>> applies to a dataset as a whole.
>
> Agreed as per above. Delighted if you want to submit a pull request
> :-) https://github.com/okfn/opendatahandbook
>
> Rufus



