[okfn-labs] Not browserfarm

Chris Mear chrismear at gmail.com
Tue Aug 19 12:21:45 UTC 2014


The Archive Team has something called the Warrior:

http://www.archiveteam.org/index.php?title=Warrior

which they use to do distributed downloading of websites that are going offline soon (due to acquisitions, etc.) and things like that. Could be a starting point?

Components and source code here:

http://www.archiveteam.org/index.php?title=Dev/Source_Code

Chris

On 19 Aug 2014, at 12:52, Rufus Pollock <rufus.pollock at okfn.org> wrote:

> As I understand it you are especially interested in "distributed scraping" - is that right?
> 
> This is definitely a nice (and classic) problem - I know Friedrich did something with "flockscrape" (https://pypi.python.org/pypi/flockscrape-client) a while back, and BrowserFarm (which never really got beyond the ideas stage) was about distributed processing and scraping using the browser as the platform.
> 
> Rufus
> 
> 
> On 18 August 2014 11:39, Thomas Levine <_ at thomaslevine.com> wrote:
> I've been working on getting information from Delaware,
> which might involve getting around rate limits based on
> IP addresses.
> http://thomaslevine.com/dada/delaware/
> 
> Rufus thinks some of you might take interest.
> _______________________________________________
> okfn-labs mailing list
> okfn-labs at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/okfn-labs
> Unsubscribe: https://lists.okfn.org/mailman/options/okfn-labs
> 
> 
> 
> -- 
> Rufus Pollock
> Founder and President  |  skype: rufuspollock  |  @rufuspollock
> Open Knowledge - see how data can change the world
> http://okfn.org/  |  @okfn  |  Open Knowledge on Facebook  |  Blog
> 
> The Open Knowledge Foundation is a not-for-profit organisation.  It is incorporated in England & Wales as a company limited by guarantee, with company number 05133759.  VAT Registration № GB 984404989. Registered office address: Open Knowledge Foundation, St John’s Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK.  
