[okfn-labs] scraping PDF files

Michael Bauer michael.bauer at okfn.org
Mon Dec 9 08:11:13 UTC 2013


Alioune,

On Fri, Dec 06, 2013 at 07:30:23PM +0100, Alioune Dia wrote:
> Hi All
> 
> For raw text scraping, it seems that many tools give good
> results. I experimented with PDFMiner in the past and got good
> results. The problem is table scraping: with many tools, I
> had problems such as multi-line rows, useless
> information, empty lines, etc. With ScraperWiki, I am also

Yes, this is always a problem. When scraping PDFs I tend to clean up
the data afterwards. The output is always messy, no matter what tool
you use.
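
As a rough illustration of what such a clean-up pass might look like,
here is a minimal Python sketch. It is not tied to any particular
extraction tool. It assumes the table has already been extracted as a
list of rows (lists of cell strings, possibly None), and it assumes that
a row with an empty first cell is the continuation of the row above it.
Both the function name clean_rows and that continuation rule are
hypothetical and would need adjusting for a real document.

```python
def clean_rows(rows):
    """Drop empty rows and fold continuation rows into the row above.

    Assumption (not guaranteed by any tool): a row whose first cell is
    empty continues the previous row.
    """
    cleaned = []
    for row in rows:
        # Normalise cells: treat None as empty and trim whitespace.
        cells = [(cell or "").strip() for cell in row]

        # Skip rows that contain no text at all.
        if not any(cells):
            continue

        # Continuation row: append each non-empty cell to the same
        # column of the previous row.
        if cleaned and not cells[0]:
            prev = cleaned[-1]
            for i, cell in enumerate(cells):
                if not cell:
                    continue
                if i < len(prev):
                    prev[i] = (prev[i] + " " + cell).strip()
                else:
                    # Previous row was shorter: pad it, then add the cell.
                    prev.extend([""] * (i - len(prev)) + [cell])
            continue

        cleaned.append(cells)
    return cleaned


if __name__ == "__main__":
    raw = [
        ["ID", "Name", "Amount"],
        ["", "", ""],
        ["1", "Ministry of", "100"],
        ["", "Health", ""],
        [None, None, None],
        ["2", "Education", "200"],
    ]
    for row in clean_rows(raw):
        print(row)
```

The rule for spotting continuation rows is the part most likely to
differ between documents, so in practice it is worth checking the
result against the PDF by eye.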

Michael

-- 
Data Diva | skype: mihi_tr | @mihi_tr
The Open Knowledge Foundation | School of Data
http://okfn.org | http://schoolofdata.org 
GPG/PGP key: http://tentacleriot.eu/mihi.asc


