[openspending-dev] OpenSpending Source Archive

Friedrich Lindenberg friedrich.lindenberg at okfn.org
Mon Dec 24 12:32:02 UTC 2012


Hey guys,

I think the fundamental problem with this is that it still archives the
data at a somewhat arbitrary time. What we really want is to store the
exact version of the data that was loaded into OpenSpending, i.e. the
application should first copy the source file into the archive (with a
name that includes the Run.id) and then load from that archived copy, not
from the streaming version off the web.
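
As a rough sketch of that order of operations (archive first, then load
from the archived copy). The function names, the `run-<id>.csv` naming
scheme and the directory layout are assumptions made up for illustration,
not how OpenSpending is actually implemented:

```python
import csv
import shutil
import urllib.request
from pathlib import Path


def archive_source(source_url, dataset, run_id, archive_root):
    """Copy the source into the archive under a name that includes the run id.

    The 'run-<id>.csv' naming scheme is a hypothetical choice.
    """
    target_dir = Path(archive_root) / dataset
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"run-{run_id}.csv"
    # Write the remote bytes to disk once; nothing is parsed at this stage.
    with urllib.request.urlopen(source_url) as response, open(target, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return target


def load_rows(archived_path):
    """Yield rows from the archived copy, never from the live URL."""
    with open(archived_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            yield row
```

The point of splitting it this way is that the loader only ever sees the
file that ended up in the archive, so a later reload of a given run reads
exactly the same bytes.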

As for S3: the archive is currently picked up by the overnight backup job,
so technically it's there already. I agree, though, that it would be nicer
to push this straight through boto. I like the idea of hacking something up
that does S3 directory listings. I have a lot of things on there that would
benefit from an index -- maybe this could be independent from OpenSpending?
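
A minimal sketch of what such a listing tool could do, independent of
OpenSpending: take a flat list of keys and group it into a per-directory
index that can be dumped as index.json. The input shape (dicts with `name`
and `size`) and the function names are assumptions; with boto the same
fields should be available on the key objects returned when listing a
bucket, but that part is left out here so the example runs without
credentials:

```python
import json
from collections import defaultdict


def build_index(keys):
    """Group flat key names like 'dataset/file.csv' by their directory."""
    index = defaultdict(list)
    for key in keys:
        directory, _, filename = key["name"].rpartition("/")
        if not filename:
            # Skip directory placeholder keys such as 'dataset/'.
            continue
        index[directory].append({"name": filename, "size": key["size"]})
    # Sort directories and files so the output is stable between runs.
    return {
        directory: sorted(files, key=lambda f: f["name"])
        for directory, files in sorted(index.items())
    }


def render_index(keys):
    """Serialise the index as JSON, e.g. for upload as index.json."""
    return json.dumps(build_index(keys), indent=2, sort_keys=True)
```

Keys at the top level of the bucket end up under the empty string as
their directory name.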

Cheers,

- Friedrich


On Fri, Dec 21, 2012 at 10:26 PM, Vitor Baptista <vitor at vitorbaptista.com> wrote:

> True. With the current ~20 GB, we would save just 90 cents per month.
> Definitely not worth the hassle :P
>
> BTW, doesn't archive.org save that data as well?
>
>
> 2012/12/21 Rufus Pollock <rufus.pollock at okfn.org>
>
>> Very true but I also imagine that:
>>
>> a) If there were a crash or outage and lost data it would be nice not
>> to have to wait a couple of days
>> b) it's nice to have this browsable archive in its own right
>>
>> Of course if the data really got very big you are quite right that
>> glacier might be a better cost/benefit trade-off :-)
>>
>> rufus
>>
>> On 21 December 2012 18:31, Vitor Baptista <vitor at vitorbaptista.com> wrote:
>> > Well, that's the use case Friedrich described in his first e-mail,
>> > right? The data is already available on OpenSpending, it's just if
>> > "we need to reload the database but the original data has become
>> > inaccessible".
>> >
>> >
>> > 2012/12/21 Rufus Pollock <rufus.pollock at okfn.org>
>> >>
>> >> I don't think Glacier is really suitable as it can take a few days to
>> >> get access to your data ;-)
>> >>
>> >> Glacier is more for long-term archival storage AFAICT ...
>> >>
>> >> Rufus
>> >>
>> >> On 21 December 2012 16:42, Vitor Baptista <vitor at vitorbaptista.com> wrote:
>> >> > Although there's not much data, this sounds like a great use for
>> >> > the new Amazon Glacier (http://aws.amazon.com/glacier/).
>> >> >
>> >> >
>> >> > 2012/12/21 Rufus Pollock <rufus.pollock at okfn.org>
>> >> >>
>> >> >> This is brilliant Friedrich.
>> >> >>
>> >> >> One thought: what about S3 (or similar) for permanence (as opposed
>> >> >> to local disk)? These seem like important resources and having
>> >> >> them preserved would be good.
>> >> >>
>> >> >> Also, some ideas (feel free to shoot down!):
>> >> >>
>> >> >> * What about one directory per dataset for consistency?
>> >> >> * What about adding some datapackage.json style metadata in each
>> >> >> directory? (I volunteer to do this)
>> >> >> * If on S3 we could still have a nice index page with a small
>> >> >> index.json file and some JS (again I volunteer)
>> >> >>
>> >> >> Rufus
>> >> >>
>> >> >> On 19 December 2012 13:58, Friedrich Lindenberg
>> >> >> <friedrich.lindenberg at okfn.org> wrote:
>> >> >> > Hi all,
>> >> >> >
>> >> >> > I've begun collecting all the source data for OpenSpending (all
>> >> >> > datasets) at http://archive.openspending.org/ to protect us from
>> >> >> > a situation where we need to reload the database but the
>> >> >> > original data has become inaccessible.
>> >> >> > It's also a nice set of CSV just to play with.
>> >> >> >
>> >> >> > Have fun,
>> >> >> >
>> >> >> >  - Friedrich
>> >> >> >
>> >> >> > --
>> >> >> > Friedrich Lindenberg | OpenSpending & OKFN Labs | @pudo
>> >> >> > http://openspending.org | http://okfnlabs.org | http://pudo.org
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > openspending-dev mailing list
>> >> >> > openspending-dev at lists.okfn.org
>> >> >> > http://lists.okfn.org/mailman/listinfo/openspending-dev
>> >> >> > Unsubscribe: http://lists.okfn.org/mailman/options/openspending-dev
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Co-Founder, Open Knowledge Foundation
>> >> >> Promoting Open Knowledge in a Digital Age
>> >> >> http://www.okfn.org/ - http://blog.okfn.org/
>> >> >
>> >> >
>> >
>> >
>
>


-- 
Friedrich Lindenberg | OpenSpending & OKFN Labs | @pudo
http://openspending.org | http://okfnlabs.org | http://pudo.org