[ddj] Unlocking PDF data

Mehdi GUIRAUD mehdi.guiraud at gmail.com
Sun May 26 09:36:00 UTC 2013


Not long ago some tools were shared on this list:

http://tabula.nerdpower.org/
https://knightcenter.utexas.edu/blog/00-13785-five-tools-extract-locked-data-pdfs

Most of the time Google Docs and Adobe Reader are enough for me, so I have
never used them. If any of them work well for you, please let us know.
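
For the copy-paste route Ed describes in the quoted message below, the "text
column split" step can also be scripted. What follows is only a minimal sketch,
not a tested workflow: it assumes the pasted text separates columns with tabs
or with runs of two or more spaces, which will not hold for every PDF. The
function names and the sample rows are made up for illustration.

```python
import csv
import io
import re


def split_columns(pasted_text):
    """Split each non-empty line into cells.

    Assumption: columns are separated by tabs or by two or more spaces,
    so a single space inside a cell ("Median age") is preserved.
    """
    rows = []
    for line in pasted_text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\t+| {2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text that Excel or Google Docs can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for text copied out of a PDF reader.
SAMPLE = (
    "Area      Count   Median age\n"
    "Area One  120     38\n"
    "Area Two  45      47\n"
)

if __name__ == "__main__":
    print(rows_to_csv(split_columns(SAMPLE)))
```

Rows that come out with the wrong number of cells still need the manual
adjustment Ed mentions; the script only saves the first pass.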





Mehdi Guiraud
Multimedia journalist, EMI-CFD
t. @mguiraud
m. 06 95 92 51 33
Tel.: 09 53 14 98 49


2013/5/26 Greg Barila <gregbarila at gmail.com>

> Cheers. Much appreciated.
>
>
> On Sun, May 26, 2013 at 1:15 PM, M. Edward (Ed) Borasky <znmeb at znmeb.net> wrote:
>
>> If the PDF is text-based and not scanned, you can sometimes open it in
>> a PDF reader (evince or okular on Linux, Acrobat Reader on Windows)
>> and copy-paste the text tables right into Excel! You may have to do a
>> text column split and adjust some rows after the paste, but it's worth
>> a try.
>>
>> I've got pretty much every open source PDF data extraction tool
>> available in my Computational Journalism Publishers Workbench (Fedora
>> and Ubuntu Linux). For scanned PDFs, you'll need an optical character
>> recognition tool - I use Tesseract.
>>
>> On Sat, May 25, 2013 at 7:28 PM, Greg Barila <gregbarila at gmail.com> wrote:
>> > Hi there. I'm a journalist based in Adelaide, South Australia. I've been
>> > dabbling in some simple data journalism projects over the past couple of
>> > years (see some examples here: http://adelaidedatablog.tumblr.com )
>> >
>> > I'm interested - does anybody know of a good, open-source tool for
>> > converting PDFs into editable documents, preferably excel?
>> >
>> > I know about tools like Tabula - but it appears the tool is experimental
>> > and not available for general use.
>> >
>> > Any tips would be appreciated.
>> >
>> > Greg
>> > (@GregBarila)
>> >
>> > _______________________________________________
>> > data-driven-journalism mailing list
>> > data-driven-journalism at lists.okfn.org
>> > http://lists.okfn.org/mailman/listinfo/data-driven-journalism
>> > Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
>> >
>>
>>
>>
>> --
>> Twitter: http://twitter.com/znmeb; Computational Journalism Publishers Workbench
>> http://j.mp/CompJournBench/
>>
>> Get out of the building - and don't come back till you have the order!
>>
>>
>
>
>
>