[Open-data-census] Penalty for no bulk data not appropriate for realtime or big data

Tue May 19 05:11:42 UTC 2015

Hello,

I think the bulk data question in the census needs to change. It is not always possible to publish open data in bulk. As pointed out in the Open Data Handbook <http://opendatahandbook.org/glossary/en/terms/bulk/>, publishing bulk data is not practical for real-time or big data.

Can I suggest that the current question is reworded from:

Is the data available in bulk? - Data is available in bulk if the whole dataset can be downloaded easily. It is not available in bulk, if access to the data is through a web page that provides access to only part of the database.

to something like:

Is the data available in bulk or via a real-time feed? - Data is available in bulk if the whole dataset can be downloaded easily. It is not available in bulk, if access to the data is through a web page that provides access to only part of the database. A real-time feed provides access to a subset of a database that changes frequently and is too large to download in bulk.

As an example, in my view, a real-time public transport feed in GTFS-RT <https://developers.google.com/transit/gtfs-realtime/> format should not be penalised 10 points for not being available in bulk.

What do you think? Should the question be changed? If so, what’s the process for changing it (assuming most censuses reference the “master” question sheet)?

Thanks,

Stephen Gates
Australia’s Regional Open Data Census <http://australia.census.okfn.org/>
