[open-government] Long term preservation and archival for Open Data

Christophe Guéret christophe.gueret at dans.knaw.nl
Thu Oct 10 09:43:11 UTC 2013


Hi Ivan, all,

Thanks for the nice discussion! I work at a digital archive called "DANS"
(Data Archiving and Networked Services) - http://www.dans.knaw.nl/ - here
in the Netherlands. We focus on archiving data from research in the
humanities, as well as data that could be useful to those researchers.
But we are also thinking about the digital preservation of other kinds of
datasets, governmental or not, and what needs to be done for them. So I'm
very much looking forward to discussing this with you all ;-)

Regarding some earlier comments, I agree with Ton that it is important to
serve up-to-date data. In that sense mirroring could make more sense than
archival (these are two different things from a data storage regime point
of view). For mirroring, you could have a look at "OAI-PMH", a popular
protocol used by digital archives to share metadata. This protocol can be
used to mirror the content of a CKAN archive by letting the two instances
compare their content.
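
To make the comparison step concrete, here is a minimal Python sketch. It
assumes that both instances expose an OAI-PMH endpoint (for CKAN this is an
assumption, not something confirmed in this thread) and that their
ListIdentifiers responses have already been fetched as XML text. The function
names and the example.org identifiers are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol for its response elements.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_identifiers(xml_text):
    """Return {identifier: datestamp} from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    records = {}
    for header in root.iterfind(".//oai:ListIdentifiers/oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue  # deleted records are simply skipped in this sketch
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        records[identifier] = datestamp
    return records


def records_to_sync(source, mirror):
    """Identifiers missing from the mirror, or older there than at the source.

    Datestamps are compared as strings, which assumes both sides use the
    same UTC ISO 8601 granularity.
    """
    return sorted(
        identifier
        for identifier, datestamp in source.items()
        if identifier not in mirror or mirror[identifier] < datestamp
    )


def _response(headers):
    """Build a tiny ListIdentifiers response for the example (illustrative)."""
    body = "".join(
        f"<header><identifier>{i}</identifier><datestamp>{d}</datestamp></header>"
        for i, d in headers
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        f"<ListIdentifiers>{body}</ListIdentifiers></OAI-PMH>"
    )


SOURCE_XML = _response([
    ("oai:example.org:dataset-1", "2013-10-01"),
    ("oai:example.org:dataset-2", "2013-10-09"),
    ("oai:example.org:dataset-3", "2013-10-10"),
])
MIRROR_XML = _response([
    ("oai:example.org:dataset-1", "2013-10-01"),
    ("oai:example.org:dataset-2", "2013-09-01"),
])

if __name__ == "__main__":
    source = parse_list_identifiers(SOURCE_XML)
    mirror = parse_list_identifiers(MIRROR_XML)
    print(records_to_sync(source, mirror))
```

A real harvester would also have to follow resumption tokens for long lists
and decide what to do with deleted records; both are left out here.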

There would also be interesting things to discuss around the
dereferenceability of servers. The problem with a government shutdown is
that the URLs don't resolve: even if a mirror exists, users will not be
redirected to it. I'm curious how policies could be put in place to do
round-robin of URLs among the original source and some mirrors...
Related to that, DANS is also looking at the preservation of Linked Data.
The most interesting question there is whether one may want to keep URIs
that dereference to the archived content or not...
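
As a rough illustration of the round-robin idea, the Python sketch below
rotates over the original source and its mirrors and skips hosts that are
down. Everything in it is hypothetical: the class name, the example hosts,
and the `is_alive` health check, which is passed in as a plain function so
the sketch stays independent of any particular network library.

```python
from itertools import cycle


class MirrorResolver:
    """Rotate over the original source and its mirrors, skipping dead hosts."""

    def __init__(self, base_urls, is_alive):
        self._base_urls = list(base_urls)
        self._is_alive = is_alive  # callable: base URL -> bool (hypothetical)
        self._rotation = cycle(self._base_urls)

    def resolve(self, path):
        """Return a full URL for `path` on the next reachable host, or None."""
        # Try each host at most once per call, continuing the rotation
        # from wherever the previous call stopped.
        for _ in range(len(self._base_urls)):
            base = next(self._rotation)
            if self._is_alive(base):
                return base.rstrip("/") + "/" + path.lstrip("/")
        return None


if __name__ == "__main__":
    hosts = [
        "http://data.example.gov",      # original source (made-up host)
        "http://mirror-a.example.org",  # mirrors (made-up hosts)
        "http://mirror-b.example.org",
    ]
    down = {"http://data.example.gov"}  # pretend the source is shut down
    resolver = MirrorResolver(hosts, lambda base: base not in down)
    print(resolver.resolve("dataset/1.csv"))
    print(resolver.resolve("dataset/1.csv"))
```

Note that such a resolver has to run somewhere other than the original
server (in the client, or at a third party), otherwise it goes down together
with the source. Where it should live is exactly the policy question.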

Cheers,
Christophe


On 10 October 2013 04:56, Ivan Begtin <ibegtin at gmail.com> wrote:

> Hi Jonathan,
>    thanks for the link, LOCKSS is an interesting project.
>
> I think that we need a similar initiative for datasets, to keep most of them
> archived and mirrored around the world.
> It's also a question of how to collect datasets and how to provide
> access. Data portals are different from common websites. We can't archive
> them the same way as we archive other web resources.
>
> So I think we need to include recommendations to implement interfaces for
> archival purposes, such as bulk download of metadata and datasets.
>
> Another idea is to start measuring storage requirements for datasets.
> Since CKAN is the most often used data portal software, one idea is to
> implement something like what wikiapiary.com does for MediaWiki: a monitoring
> project that could be used to measure the actual size of datasets and to launch
> archival bots.
>
> Best Regards,
>   Ivan Begtin
>
>
> 2013/10/9 Jonathan Gray <jonathan.gray at okfn.org>
>
>> I think it would be really interesting to think about ways of creating
>> full archives or 'mirrors' of official data sources (like you have - for
>> example - with open source software repositories).
>>
>> Also wonder if you could learn from library/archival initiatives like
>> LOCKSS (Lots of Copies Keep Stuff Safe)? [1]
>>
>> Jonathan
>>
>> [1] http://www.lockss.org/about/what-is-lockss/
>>
>>
>> On 9 October 2013 18:43, Ton Zijlstra <ton.zijlstra at gmail.com> wrote:
>>
>>> Interesting question Ivan!
>>>
>>> In general I think governments cannot be presumed to keep supplying data
>>> for the sake of re-users only, for instance when the government's purpose
>>> for the data collection no longer exists.
>>>
>>> There are however various scenarios where mirroring of data might make
>>> sense:
>>> - Government bodies reneging on earlier open data commitments or taking
>>>   steps towards less transparency
>>> - Government shutdowns as in the US (unlikely elsewhere in the world)
>>> - Government bodies dissolving without transfer of data
>>>   tasks/responsibilities
>>> - Budget cuts hitting open data provision
>>> - etc.
>>>
>>> A lot depends on the data itself as well. For some data, archiving may mean
>>> that said data rapidly becomes useless / outdated, other than for archival
>>> purposes themselves.
>>> For other types of data, having historic data may actually be more
>>> valuable than just the current data. (E.g. I've been involved in a small
>>> project where government only published today's values of data, but provided
>>> no historic data, which we addressed by archiving the daily releases.)
>>>
>>> best,
>>> Ton
>>>
>>>
>>> ---------------------------------------------------------------------------
>>> Interdependent Thoughts
>>> Ton Zijlstra
>>>
>>> ton at tonzijlstra.eu
>>> +31-6-34489360
>>>
>>> http://zylstra.org/blog
>>>
>>>
>>> ---------------------------------------------------------------------------
>>>
>>>
>>> On Wed, Oct 9, 2013 at 6:13 PM, Ivan Begtin <ibegtin at gmail.com> wrote:
>>>
>>>> Dear colleagues,
>>>>    most of us are involved in open data activities, and the availability of
>>>> open data is a critical issue when we want to re-use it.
>>>>
>>>> Right now we have a few examples where data that was published earlier
>>>> disappeared later.
>>>> Sometimes it happens because government information systems are updated
>>>> or closed, sometimes when a "Government shutdown" happens (like data.gov
>>>> right now), and sometimes when government agencies are disbanded.
>>>>
>>>> I know that there are some archival initiatives related to government
>>>> websites: the UK web archival initiative (
>>>> http://www.nationalarchives.gov.uk/webarchive/) and similar projects
>>>> in other countries (USA, Australia, Hong Kong and so on).
>>>>
>>>> As I understand it, no such initiative covers datasets, and when
>>>> data.gov is unavailable the only chance to get the data is to look at
>>>> other commercial/non-profit projects that re-publish data.gov datasets
>>>> for their own use.
>>>>
>>>> So I would like to launch a discussion about long-term preservation and
>>>> archival of datasets published by government, and not only government.
>>>>
>>>> What do you think, from your experience in your countries: do we need to
>>>> launch long-term preservation, or is it not an issue right now?
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>>   Ivan Begtin
>>>>
>>>> Director of NGO "Informational Culture"
>>>> email: ibegtin at infoculture.ru
>>>> phone: +7 499 500 96 58, +7 910 426 68 83
>>>> website: http://infoculture.ru
>>>>
>>>> _______________________________________________
>>>> open-government mailing list
>>>> open-government at lists.okfn.org
>>>> http://lists.okfn.org/mailman/listinfo/open-government
>>>> Unsubscribe: http://lists.okfn.org/mailman/options/open-government
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Jonathan Gray
>>
>> Director of Policy and Ideas  |  @jwyg <https://twitter.com/jwyg>
>>
>> The Open Knowledge Foundation <http://okfn.org/>
>>
>> Empowering through Open Knowledge
>>
>> okfn.org  |  @okfn <http://twitter.com/OKFN>  |  OKF on Facebook <https://www.facebook.com/OKFNetwork>  |
>> Blog <http://blog.okfn.org/>  |  Newsletter <http://okfn.org/about/newsletter>
>>
>
>
>
> --
> Best Regards,
>   Ivan Begtin
>
> Director of NGO "Informational Culture"
> email: ibegtin at infoculture.ru
> phone: +7 499 500 96 58, +7 910 426 68 83
> website: http://infoculture.ru
>



-- 
Researcher
+31(0)6 14576494
christophe.gueret at dans.knaw.nl

*Data Archiving and Networked Services (DANS)*
DANS promotes sustainable access to digital research data.
See www.dans.knaw.nl for more information and contact details.
DANS is an institute of KNAW and NWO.

*Let's build a World Wide Semantic Web!*
http://worldwidesemanticweb.org/

*e-Humanities Group (KNAW)*
http://ehumanities.nl/