[open-government] Long term preservation and archival for Open Data

Ivan Begtin ibegtin at gmail.com
Thu Oct 10 02:56:16 UTC 2013


Hi Jonathan,
   thanks for the link, LOCKSS is an interesting project.

I think that we need a similar initiative for datasets, to keep most of them
archived and mirrored around the world.
It's also a question of how to collect datasets and how to provide
access. Data portals are different from ordinary websites. We can't archive
them the same way as we archive other web resources.

So I think we need to include recommendations to implement interfaces for
archival purposes, such as bulk download of metadata and datasets.

Another idea is to start measuring storage requirements for datasets. Since
CKAN is the most widely used data portal software, one idea is to
implement something like what wikiapiary.com does for MediaWiki: a monitoring
project that could be used to measure the actual size of datasets and to
launch archival bots.
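
To make this more concrete, here is a minimal sketch of how such a
monitoring bot might start, assuming the portal exposes the standard CKAN
Action API (package_list and package_show). The function names and the
limit parameter are hypothetical, chosen only for illustration. The "size"
field on a resource is optional and often missing or inaccurate, so the
result is at best a rough lower bound, not a real measurement.

```python
import json
import urllib.request
from urllib.parse import urlencode


def ckan_action(portal_url, action, **params):
    """Call a CKAN Action API endpoint and return its 'result' field."""
    url = f"{portal_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN action {action} failed")
    return payload["result"]


def declared_size(package):
    """Sum the declared resource sizes (in bytes) of one dataset.

    Returns (total_bytes, resources_without_usable_size).
    """
    total = 0
    unknown = 0
    for resource in package.get("resources", []):
        try:
            total += int(resource.get("size"))
        except (TypeError, ValueError):
            # Size is absent, null, or not a number: count it as unknown.
            unknown += 1
    return total, unknown


def survey_portal(portal_url, limit=10):
    """Estimate declared storage for the first `limit` datasets (hypothetical helper)."""
    total = 0
    unknown = 0
    for name in ckan_action(portal_url, "package_list")[:limit]:
        package = ckan_action(portal_url, "package_show", id=name)
        size, missing = declared_size(package)
        total += size
        unknown += missing
    return total, unknown
```

Calling survey_portal with the base URL of a CKAN portal would return the
declared bytes and the number of resources with no usable size. A real
archival bot would probably need to download or HEAD-request each resource
to learn its actual size.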

Best Regards,
  Ivan Begtin


2013/10/9 Jonathan Gray <jonathan.gray at okfn.org>

> I think it would be really interesting to think about ways of creating
> full archives or 'mirrors' of official data sources (like you have - for
> example - with open source software repositories).
>
> Also wonder if you could learn from library/archival initiatives like
> LOCKSS (Lots of Copies Keep Stuff Safe)? [1]
>
> Jonathan
>
> [1] http://www.lockss.org/about/what-is-lockss/
>
>
> On 9 October 2013 18:43, Ton Zijlstra <ton.zijlstra at gmail.com> wrote:
>
>> Interesting question Ivan!
>>
>> In general I think governments cannot be presumed to keep supplying data
>> for the sake of re-users only, for instance when the government's purpose
>> for the data collection no longer exists.
>>
>> There are however various scenarios where mirroring of data might make
>> sense:
>> - Government bodies reneging on earlier open data commitments or taking
>>   steps towards less transparency
>> - Government shutdowns as in the US (unlikely elsewhere in the world)
>> - Government bodies dissolving without transfer of data
>>   tasks/responsibilities
>> - Budget cuts hitting open data provision
>> - etc.
>>
>> A lot depends on the data itself as well. Some data rapidly becomes
>> useless or outdated once archived, other than for archival purposes
>> themselves.
>> For other types of data, having historic data may actually be more
>> valuable than just the current data. (E.g. I've been involved in a small
>> project where government only published today's values of data, but provided
>> no historic data, which we addressed by archiving the daily releases.)
>>
>> best,
>> Ton
>>
>>
>> ---------------------------------------------------------------------------
>> Interdependent Thoughts
>> Ton Zijlstra
>>
>> ton at tonzijlstra.eu
>> +31-6-34489360
>>
>> http://zylstra.org/blog
>>
>>
>> ---------------------------------------------------------------------------
>>
>>
>> On Wed, Oct 9, 2013 at 6:13 PM, Ivan Begtin <ibegtin at gmail.com> wrote:
>>
>>> Dear colleagues,
>>>    most of us are involved in open data activities, and the availability
>>> of open data is a critical issue when we want to re-use it.
>>>
>>> Right now we have a few examples where data published earlier disappears
>>> later.
>>> Sometimes it happens because government information systems are updated
>>> or closed, sometimes when a "Government shutdown" happens (like data.gov
>>> right now) and sometimes when government agencies are disbanded.
>>>
>>> I know that there are some archival initiatives related to government
>>> websites: the UK web archival initiative (
>>> http://www.nationalarchives.gov.uk/webarchive/) and similar projects in
>>> other countries (USA, Australia, Hong Kong and so on).
>>>
>>> As I understand it, no such initiative covers datasets, and when data.gov
>>> is unavailable the only chance to get the data is to look at other
>>> commercial/non-profit projects that re-publish data.gov datasets for
>>> their own use.
>>>
>>> So I would like to launch a discussion about long-term preservation and
>>> archival of datasets published by governments, and not only by governments.
>>>
>>> What do you think, from your experience in your countries: do we need to
>>> launch long-term preservation, or is it not an issue right now?
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>>   Ivan Begtin
>>>
>>> Director of NGO "Informational Culture"
>>> email: ibegtin at infoculture.ru
>>> phone: +7 499 500 96 58, +7 910 426 68 83
>>> website: http://infoculture.ru
>>>
>>> _______________________________________________
>>> open-government mailing list
>>> open-government at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/open-government
>>> Unsubscribe: http://lists.okfn.org/mailman/options/open-government
>>>
>>>
>>
>>
>
>
> --
>
> Jonathan Gray
>
> Director of Policy and Ideas  |  @jwyg <https://twitter.com/jwyg>
>
> The Open Knowledge Foundation <http://okfn.org/>
>
> Empowering through Open Knowledge
>
> okfn.org  |  @okfn <http://twitter.com/OKFN>  |  OKF on Facebook <https://www.facebook.com/OKFNetwork>  |
> Blog <http://blog.okfn.org/>  |  Newsletter <http://okfn.org/about/newsletter>
>



-- 
Best Regards,
  Ivan Begtin

Director of NGO "Informational Culture"
email: ibegtin at infoculture.ru
phone: +7 499 500 96 58, +7 910 426 68 83
website: http://infoculture.ru