[ddj] Unlocking PDF data

David Weisz davidaaronweisz at gmail.com
Sun May 26 14:24:52 UTC 2013


Hey Greg,

Here's a review roundup of some PDF-cracking tools from Duke University's
Reporters' Lab.

http://www.reporterslab.org/pdf-to-spreadsheet-update/

I hope this helps!

Sincerely,

David


On Sun, May 26, 2013 at 5:36 AM, Mehdi GUIRAUD <mehdi.guiraud at gmail.com> wrote:

> Not long ago some tools were shared on this list:
>
> http://tabula.nerdpower.org/
>
> https://knightcenter.utexas.edu/blog/00-13785-five-tools-extract-locked-data-pdfs
>
> Most of the time Google Docs and Adobe Reader are enough for me, so I
> have never used them. If any of them work well for you, please tell us/me.
>
>
>
>
>
> Mehdi Guiraud
> Multimedia journalist, EMI-CFD
> t. @mguiraud
> m. 06 95 92 51 33
> Tel.: 09 53 14 98 49
>
>
> 2013/5/26 Greg Barila <gregbarila at gmail.com>
>
>> Cheers. Much appreciated.
>>
>>
>> On Sun, May 26, 2013 at 1:15 PM, M. Edward (Ed) Borasky <znmeb at znmeb.net> wrote:
>>
>>> If the PDF is text-based and not scanned, you can sometimes open it in
>>> a PDF reader (evince or okular on Linux, Acrobat Reader on Windows)
>>> and copy-paste the text tables right into Excel! You may have to do a
>>> text column split and adjust some rows after the paste, but it's worth
>>> a try.
>>>
>>> I've got pretty much every open source PDF data extraction tool
>>> available in my Computational Journalism Publishers Workbench (Fedora
>>> and Ubuntu Linux). For scanned PDFs, you'll need an optical character
>>> recognition tool - I use Tesseract.
>>>
>>> On Sat, May 25, 2013 at 7:28 PM, Greg Barila <gregbarila at gmail.com>
>>> wrote:
>>> > Hi there. I'm a journalist based in Adelaide, South Australia. I've been
>>> > dabbling in some simple data journalism projects over the past couple of
>>> > years (see some examples here: http://adelaidedatablog.tumblr.com )
>>> >
>>> > I'm interested - does anybody know of a good, open-source tool for
>>> > converting PDFs into editable documents, preferably Excel?
>>> >
>>> > I know about tools like Tabula - but it appears the tool is experimental
>>> > and not available for general use.
>>> >
>>> > Any tips would be appreciated.
>>> >
>>> > Greg
>>> > (@GregBarila)
>>> >
>>> > _______________________________________________
>>> > data-driven-journalism mailing list
>>> > data-driven-journalism at lists.okfn.org
>>> > http://lists.okfn.org/mailman/listinfo/data-driven-journalism
>>> > Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
>>> >
>>>
>>>
>>>
>>> --
>>> Twitter: http://twitter.com/znmeb
>>> Computational Journalism Publishers Workbench: http://j.mp/CompJournBench/
>>>
>>> Get out of the building - and don't come back till you have the order!
>>>
>>
>>
>>
>
>
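
For anyone who would rather script the "text column split" step that Ed
mentions in the quoted message, here is a minimal Python sketch. It does not
come from anyone on this thread. It assumes the text pasted out of the PDF
separates columns with tabs or with runs of two or more spaces, and the
function names and sample rows are made up purely for illustration.

```python
import csv
import io
import re


def split_columns(pasted_text):
    """Split each non-empty line into cells on tabs or runs of 2+ whitespace."""
    rows = []
    for line in pasted_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the paste
        rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text that Excel or Google Docs can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data, standing in for text copied out of a PDF reader.
sample = (
    "Region        Count     Median age\n"
    "North area    1,200     38\n"
    "South area    3,400     45\n"
)

if __name__ == "__main__":
    print(rows_to_csv(split_columns(sample)))
```

Splitting on two or more spaces keeps single-spaced cells such as "Median
age" intact, but rows where the columns are separated by only one space will
still need adjusting by hand, as Ed notes.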

