[ddj] Unlocking PDF data

Greg Barila gregbarila at gmail.com
Mon May 27 22:27:31 UTC 2013


Thank you Luis!

On May 27, 2013, at 8:16 AM, Luis Martínez-Uribe <l.martinezuribe at gmail.com> wrote:

> An obvious one is ScraperWiki; see this blog post explaining how to extract data from a PDF using their tool.
> 
> 
> Luis Martinez-Uribe
> Research Data Analyst
> Australian National Data Service (ANDS)
> 
> 
> On 27 May 2013 00:24, David Weisz <davidaaronweisz at gmail.com> wrote:
> Hey Greg,
> 
> Here's a review roundup of some PDF-cracking tools from Duke University's Reporters' Lab.
> 
> http://www.reporterslab.org/pdf-to-spreadsheet-update/
> 
> I hope this helps!
> 
> Sincerely,
> 
> David
> 
> 
> On Sun, May 26, 2013 at 5:36 AM, Mehdi GUIRAUD <mehdi.guiraud at gmail.com> wrote:
> Some tools were shared on this list not long ago:
> 
> http://tabula.nerdpower.org/
> https://knightcenter.utexas.edu/blog/00-13785-five-tools-extract-locked-data-pdfs
> 
> Most of the time Google Docs and Adobe Reader are enough for me, so I have never used them. If any of them work well for you, please let us know.
> 
> 
> 
> 
> 
> Mehdi Guiraud
> Journaliste multimédia, EMI-CFD
> t. @mguiraud
> m. 06 95 92 51 33
> Tèl. : 09 53 14 98 49
> 
> 
> 2013/5/26 Greg Barila <gregbarila at gmail.com>
> Cheers. Much appreciated. 
> 
> 
> On Sun, May 26, 2013 at 1:15 PM, M. Edward (Ed) Borasky <znmeb at znmeb.net> wrote:
> If the PDF is text-based and not scanned, you can sometimes open it in
> a PDF reader (evince or okular on Linux, Acrobat Reader on Windows)
> and copy-paste the text tables right into Excel! You may have to do a
> text column split and adjust some rows after the paste, but it's worth
> a try.
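
The "text column split" step mentioned above can also be done outside Excel. The following is only a minimal sketch, assuming the pasted text separates columns with tabs or with runs of two or more spaces; the function names and the sample rows are made up for illustration and do not come from any tool named in this thread.

```python
import csv
import io
import re


def split_pasted_table(text, min_gap=2):
    """Split each non-empty line into cells.

    Assumption: columns are separated by tabs or by at least `min_gap`
    whitespace characters, so single spaces inside a cell survive.
    """
    splitter = re.compile(r"\t+|\s{%d,}" % min_gap)
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines left over from the paste
            rows.append(splitter.split(line))
    return rows


def rows_to_csv(rows):
    """Serialise rows as CSV text that Excel or LibreOffice can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for text copied out of a PDF reader.
SAMPLE = """\
Region name     Count    Share
Region A        10       2.5
Region B        7        1.75
"""

if __name__ == "__main__":
    print(rows_to_csv(split_pasted_table(SAMPLE)))
```

Rows that come out with the wrong number of cells still need the manual adjustment described above; the sketch makes no attempt to repair them.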
> 
> I've got pretty much every open source PDF data extraction tool
> available in my Computational Journalism Publishers Workbench (Fedora
> and Ubuntu Linux). For scanned PDFs, you'll need an optical character
> recognition tool - I use Tesseract.
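
For the scanned case, one possible workflow is sketched below. It assumes that the `pdftoppm` utility (from Poppler) and the `tesseract` command-line program are both installed and on the PATH; `pdftoppm` is an addition of this sketch and is not mentioned in the thread, and the helper names and the `page` file prefix are hypothetical. Tesseract works on images, so the PDF pages are rendered to PNG first.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, image_prefix, dpi=300):
    """Build the command that renders each PDF page to a PNG image."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the OCR command; tesseract writes its text to <output_base>.txt."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_scanned_pdf(pdf_path, work_dir):
    """Render a scanned PDF to images, OCR each page, return the text files.

    Assumption: both external programs are installed; check=True makes a
    failing command raise an error instead of being ignored.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftoppm_command(pdf_path, work_dir / "page"), check=True)

    text_files = []
    for image in sorted(work_dir.glob("page-*.png")):
        output_base = image.with_suffix("")  # e.g. page-1.png -> page-1
        subprocess.run(tesseract_command(image, output_base), check=True)
        text_files.append(output_base.with_suffix(".txt"))
    return text_files
```

The result is plain text, not a spreadsheet, so tables recovered this way still need to be split into columns afterwards.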
> 
> On Sat, May 25, 2013 at 7:28 PM, Greg Barila <gregbarila at gmail.com> wrote:
> > Hi there. I'm a journalist based in Adelaide, South Australia. I've been
> > dabbling in some simple data journalism projects over the past couple of
> > years (see some examples here: http://adelaidedatablog.tumblr.com )
> >
> > I'm interested - does anybody know of a good, open-source tool for
> > converting PDFs into editable documents, preferably Excel?
> >
> > I know about tools like Tabula - but it appears the tool is experimental and
> > not available for general use.
> >
> > Any tips would be appreciated.
> >
> > Greg
> > (@GregBarila)
> >
> > _______________________________________________
> > data-driven-journalism mailing list
> > data-driven-journalism at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/data-driven-journalism
> > Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
> >
> 
> 
> 
> --
> Twitter: http://twitter.com/znmeb; Computational Journalism Publishers Workbench
> http://j.mp/CompJournBench/
> 
> Get out of the building - and don't come back till you have the order!