[ddj] Unlocking PDF data

Greg Barila gregbarila at gmail.com
Sun May 26 08:35:15 UTC 2013


Cheers. Much appreciated.


On Sun, May 26, 2013 at 1:15 PM, M. Edward (Ed) Borasky <znmeb at znmeb.net> wrote:

> If the PDF is text-based and not scanned, you can sometimes open it in
> a PDF reader (evince or okular on Linux, Acrobat Reader on Windows)
> and copy-paste the text tables right into Excel! You may have to do a
> text column split and adjust some rows after the paste, but it's worth
> a try.
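
The text column split mentioned above can also be done outside Excel. What follows is only a minimal sketch, assuming the pasted text separates columns with tabs or with two or more spaces; real PDF text often breaks that assumption and still needs hand adjustment. The function names and the sample rows are hypothetical, made up for illustration.

```python
import csv
import io
import re

# Assumption: a tab, or a run of two or more spaces, marks a column boundary.
# A single space is treated as part of the cell text.
COLUMN_GAP = re.compile(r"(?:\t| {2,})+")


def split_columns(pasted_text):
    """Split text copied from a PDF viewer into rows of cells."""
    rows = []
    for line in pasted_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the paste
        rows.append([cell.strip() for cell in COLUMN_GAP.split(line)])
    return rows


def to_csv(rows):
    """Render the rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample text, not taken from any real PDF.
sample = "Region    Count\nRegion A  120\nRegion B  85\n"
print(to_csv(split_columns(sample)))
```

Rows that come out with too many or too few cells are the ones to fix by hand after the split.
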
>
> I've got pretty much every open source PDF data extraction tool
> available in my Computational Journalism Publishers Workbench (Fedora
> and Ubuntu Linux). For scanned PDFs, you'll need an optical character
> recognition tool - I use Tesseract.
>
> On Sat, May 25, 2013 at 7:28 PM, Greg Barila <gregbarila at gmail.com> wrote:
> > Hi there. I'm a journalist based in Adelaide, South Australia. I've been
> > dabbling in some simple data journalism projects over the past couple of
> > years (see some examples here: http://adelaidedatablog.tumblr.com )
> >
> > I'm interested - does anybody know of a good, open-source tool for
> > converting PDFs into editable documents, preferably Excel?
> >
> > I know about tools like Tabula - but it appears the tool is experimental
> > and not available for general use.
> >
> > Any tips would be appreciated.
> >
> > Greg
> > (@GregBarila)
> >
> > _______________________________________________
> > data-driven-journalism mailing list
> > data-driven-journalism at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/data-driven-journalism
> > Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
> >
>
>
>
> --
> Twitter: http://twitter.com/znmeb; Computational Journalism Publishers
> Workbench
> http://j.mp/CompJournBench/
>
> Get out of the building - and don't come back till you have the order!

