[okfn-labs] scraping PDF files
Michael Bauer
michael.bauer at okfn.org
Fri Dec 6 08:51:29 UTC 2013
Hi,
I've done PDF scraping with ScraperWiki's pdftoxml. It works quite well for
what it is: a workaround that converts the PDF into another format that's
easier to parse. What I generally do is convert the PDF to XML, figure out
where the text I need sits in it, and write XPath expressions to extract it.
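
To make that workflow concrete, here is a minimal sketch of the XPath step. It assumes the XML looks like what pdftohtml -xml produces (page elements containing text elements with top/left attributes). The sample document, its values, and the helper name column_text are made up for illustration. With a real PDF you would get the XML string from scraperwiki.pdftoxml(pdfdata) instead, which needs pdftohtml installed. The standard library's ElementTree only supports a subset of XPath, so anything more involved would probably want lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like pdftohtml -xml output.
# A real document would come from scraperwiki.pdftoxml(pdfdata).
SAMPLE_XML = """<pdf2xml>
  <page number="1" top="0" left="0" height="1188" width="918">
    <text top="100" left="50" width="80" height="12" font="0">Country</text>
    <text top="100" left="300" width="60" height="12" font="0">Budget</text>
    <text top="120" left="50" width="80" height="12" font="0">Senegal</text>
    <text top="120" left="300" width="60" height="12" font="0">1200</text>
    <text top="140" left="50" width="80" height="12" font="0">Austria</text>
    <text top="140" left="300" width="60" height="12" font="0">3400</text>
  </page>
</pdf2xml>"""


def column_text(xml_string, left):
    """Return the text of every <text> element at a given left offset.

    In this kind of XML a table column usually shares one 'left' value,
    so selecting on that attribute picks out the column.
    """
    root = ET.fromstring(xml_string)
    path = ".//page/text[@left='%s']" % left
    # itertext() also collects text nested in <b> or <i> children.
    return ["".join(el.itertext()).strip() for el in root.findall(path)]


names = column_text(SAMPLE_XML, 50)
values = column_text(SAMPLE_XML, 300)

# Skip the header row and pair the two columns up.
rows = list(zip(names[1:], values[1:]))
print(rows)
```

The left offsets (50 and 300 here) are what you find by looking through the XML by hand first. They differ from PDF to PDF and sometimes from page to page.
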
Happy to give more detailed instructions if needed.
Michael
On Thu, Dec 05, 2013 at 08:53:39PM +0100, Alioune Dia wrote:
> I'm looking for the best Python library for scraping a bunch of
> PDF files. I'm currently focusing on the
> https://github.com/scraperwiki/scraperwiki-python library. Has anyone
> already tried it? Is there a more interesting library? Any
> help will be appreciated.
> --Ad
--
Data Diva | skype: mihi_tr | @mihi_tr
The Open Knowledge Foundation | School of Data
http://okfn.org | http://schoolofdata.org
GPG/PGP key: http://tentacleriot.eu/mihi.asc