[okfn-labs] Not browserfarm

Thomas Levine _ at thomaslevine.com
Tue Aug 19 14:16:57 UTC 2014


I don't use the term "distributed scraping", but I suppose it works.

I am specifically interested in getting the Delaware data, and
my "distributed" Delaware scraper already works perfectly, so I don't
plan on changing it substantially.

I haven't come across other things I've wanted to get where rate
limits were such an issue. Has anyone else come across something
with similar difficulties?

On 19 Aug 13:21, Chris Mear wrote:
> The Archive Team has something called the Warrior:
> 
> http://www.archiveteam.org/index.php?title=Warrior
> 
> which they use to do distributed downloading of websites that are going offline soon (due to acquisitions, etc.) and things like that. Could be a starting point?
> 
> Components and source code here:
> 
> http://www.archiveteam.org/index.php?title=Dev/Source_Code
> 
> Chris
> 
> On 19 Aug 2014, at 12:52, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> 
> > As I understand it you are especially interested in "distributed scraping" - is that right?
> > 
> > This is definitely a nice (and classic) problem - I know Friedrich did something with "flockscrape" (https://pypi.python.org/pypi/flockscrape-client) a while back, and BrowserFarm (which never really got beyond the ideas stage) was about distributed processing and scraping using the browser as the platform.
> > 
> > Rufus
> > 
> > 
> > On 18 August 2014 11:39, Thomas Levine <_ at thomaslevine.com> wrote:
> > I've been working on getting information from Delaware,
> > which might involve getting around rate limits based on
> > IP addresses.
> > http://thomaslevine.com/dada/delaware/
> > 
> > Rufus thinks some of you might take interest.
> > _______________________________________________
> > okfn-labs mailing list
> > okfn-labs at lists.okfn.org
> > https://lists.okfn.org/mailman/listinfo/okfn-labs
> > Unsubscribe: https://lists.okfn.org/mailman/options/okfn-labs
> > 
> > 
> > 
> > -- 
> > Rufus Pollock
> > Founder and President  |  skype: rufuspollock  |  @rufuspollock
> > Open Knowledge - see how data can change the world
> > http://okfn.org/  |  @okfn  |  Open Knowledge on Facebook  |  Blog
> > 
> > The Open Knowledge Foundation is a not-for-profit organisation.  It is incorporated in England & Wales as a company limited by guarantee, with company number 05133759.  VAT Registration № GB 984404989. Registered office address: Open Knowledge Foundation, St John’s Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK.  
> 


