[okfn-labs] Scraping with Javascript

Ross Jones ross.jones at okfn.org
Mon Apr 23 19:37:48 UTC 2012


I had a brief discussion with Rufus and Friedrich on IRC today about scraping with JS. I suggested we use something like PhantomJS ( http://phantomjs.org/ ), which is now 100% headless (on Linux), to inject JS into a browser page in order to scrape it. PhantomJS is pretty neat if you want to run JS in a browser without a browser, and I am sure you can think of plenty of other uses for it (my screenshot server is only about 40 lines of JS, for instance).
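
To give a rough idea of what I mean by injecting JS into a page, here is a minimal sketch rather than anything we have actually settled on. It assumes PhantomJS's webpage and system modules; the extractLinks helper, the choice of scraping anchor tags, and the default URL are all made up for illustration.

```javascript
// Hypothetical helper: collects the text and target of every link.
// It runs inside the page when handed to page.evaluate(). PhantomJS
// serialises the function, so it must not use variables from this script.
function extractLinks(doc) {
  doc = doc || document; // inside the page, fall back to the global document
  return Array.prototype.map.call(doc.querySelectorAll('a'), function (a) {
    return { text: (a.textContent || '').trim(), href: a.href };
  });
}

// Only drive a page when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  // Assumption: the URL is the first command-line argument, with a default.
  var url = system.args[1] || 'http://phantomjs.org/';

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + url);
      phantom.exit(1);
      return;
    }
    // The return value comes back as plain JSON-serialisable data.
    var links = page.evaluate(extractLinks);
    console.log(JSON.stringify(links, null, 2));
    phantom.exit(0);
  });
}
```

Saved as, say, links.js (the name is arbitrary), it would be run as "phantomjs links.js http://example.org/". Keeping the extraction in a function that takes the document as an optional argument also means it can be exercised outside PhantomJS with a stub document.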

Also, someone mentioned node.io ( https://github.com/chriso/node.io ) to me a while ago, and although I never got around to using it, it looks very interesting as a framework for scraping with JS. It is still incomplete, but I expect it could be reasonably efficient if paired with Node's new cluster API.


