[okfn-labs] Not browserfarm
Thomas Levine
_ at thomaslevine.com
Tue Aug 19 14:16:57 UTC 2014
I don't use the term "distributed scraping", but I suppose it works.
I am specifically interested in getting the Delaware data, and
my "distributed" Delaware scraper already works perfectly, so I don't
plan on changing it substantially.
I haven't come across other things I've wanted to get where rate
limits were such an issue. Has anyone else come across something
with similar difficulties?
On 19 Aug 13:21, Chris Mear wrote:
> The Archive Team has something called the Warrior:
>
> http://www.archiveteam.org/index.php?title=Warrior
>
> which they use to do distributed downloading of websites that are going offline soon (due to acquisitions, etc.) and things like that. Could be a starting point?
>
> Components and source code here:
>
> http://www.archiveteam.org/index.php?title=Dev/Source_Code
>
> Chris
>
> On 19 Aug 2014, at 12:52, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>
> > As I understand it you are especially interested in "distributed scraping" - is that right?
> >
> > This is definitely a nice (and classic) problem - I know Friedrich did something with "flockscrape" (https://pypi.python.org/pypi/flockscrape-client) a while back and BrowserFarm (which never really got beyond the ideas stage) was about distributed processing and scraping using the browser as the platform.
> >
> > Rufus
> >
> >
> > On 18 August 2014 11:39, Thomas Levine <_ at thomaslevine.com> wrote:
> > I've been working on getting information from Delaware,
> > which might involve getting around rate limits based on
> > IP addresses.
> > http://thomaslevine.com/dada/delaware/
> >
> > Rufus thinks some of you might take interest.
> > _______________________________________________
> > okfn-labs mailing list
> > okfn-labs at lists.okfn.org
> > https://lists.okfn.org/mailman/listinfo/okfn-labs
> > Unsubscribe: https://lists.okfn.org/mailman/options/okfn-labs
> >
> >
> >
> > --
> > Rufus Pollock
> > Founder and President | skype: rufuspollock | @rufuspollock
> > Open Knowledge - see how data can change the world
> > http://okfn.org/ | @okfn | Open Knowledge on Facebook | Blog
> >
> > The Open Knowledge Foundation is a not-for-profit organisation. It is incorporated in England & Wales as a company limited by guarantee, with company number 05133759. VAT Registration № GB 984404989. Registered office address: Open Knowledge Foundation, St John’s Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK.
>