[ddj] Unlocking PDF data

Luis Martínez-Uribe l.martinezuribe at gmail.com
Sun May 26 22:16:26 UTC 2013


An obvious one is ScraperWiki <https://scraperwiki.com/>. See this blog
post explaining how to extract data from a PDF using their tool.


Luis Martinez-Uribe
*Research Data Analyst*
Australian National Data Service (ANDS)


On 27 May 2013 00:24, David Weisz <davidaaronweisz at gmail.com> wrote:

> Hey Greg,
>
> Here's a review roundup of some PDF-cracking tools from Duke University's
> Reporters' Lab.
>
> http://www.reporterslab.org/pdf-to-spreadsheet-update/
>
> I hope this helps!
>
> Sincerely,
>
> David
>
>
> On Sun, May 26, 2013 at 5:36 AM, Mehdi GUIRAUD <mehdi.guiraud at gmail.com> wrote:
>
>> Some tools were shared on this list not long ago:
>>
>> http://tabula.nerdpower.org/
>>
>> https://knightcenter.utexas.edu/blog/00-13785-five-tools-extract-locked-data-pdfs
>>
>> Most of the time Google Docs and Adobe Reader are enough for me, so I
>> have never used them. If any of them work well for you, please let us know.
>>
>> Mehdi Guiraud
>> Multimedia journalist, EMI-CFD
>> t. @mguiraud
>> m. 06 95 92 51 33
>> Tel.: 09 53 14 98 49
>>
>>
>> 2013/5/26 Greg Barila <gregbarila at gmail.com>
>>
>>> Cheers. Much appreciated.
>>>
>>>
>>> On Sun, May 26, 2013 at 1:15 PM, M. Edward (Ed) Borasky <znmeb at znmeb.net> wrote:
>>>
>>>> If the PDF is text-based and not scanned, you can sometimes open it in
>>>> a PDF reader (evince or okular on Linux, Acrobat Reader on Windows)
>>>> and copy-paste the text tables right into Excel! You may have to do a
>>>> text column split and adjust some rows after the paste, but it's worth
>>>> a try.
>>>>
>>>> I've got pretty much every open source PDF data extraction tool
>>>> available in my Computational Journalism Publishers Workbench (Fedora
>>>> and Ubuntu Linux). For scanned PDFs, you'll need an optical character
>>>> recognition tool - I use Tesseract.
>>>>
>>>> On Sat, May 25, 2013 at 7:28 PM, Greg Barila <gregbarila at gmail.com>
>>>> wrote:
>>>> > Hi there. I'm a journalist based in Adelaide, South Australia. I've been
>>>> > dabbling in some simple data journalism projects over the past couple of
>>>> > years (see some examples here: http://adelaidedatablog.tumblr.com )
>>>> >
>>>> > I'm interested - does anybody know of a good, open-source tool for
>>>> > converting PDFs into editable documents, preferably Excel?
>>>> >
>>>> > I know about tools like Tabula - but it appears the tool is experimental and
>>>> > not available for general use.
>>>> >
>>>> > Any tips would be appreciated.
>>>> >
>>>> > Greg
>>>> > (@GregBarila)
>>>> >
>>>> > _______________________________________________
>>>> > data-driven-journalism mailing list
>>>> > data-driven-journalism at lists.okfn.org
>>>> > http://lists.okfn.org/mailman/listinfo/data-driven-journalism
>>>> > Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Twitter: http://twitter.com/znmeb; Computational Journalism Publishers
>>>> Workbench
>>>> http://j.mp/CompJournBench/
>>>>
>>>> Get out of the building - and don't come back till you have the order!
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
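
For anyone who wants to script the "text column split" step Ed mentions
above rather than do it by hand in Excel, here is a minimal Python sketch.
It assumes the pasted table separates columns with two or more spaces and
that no cell contains such a gap itself. The function names and the sample
rows are made up for illustration and do not come from any tool named in
this thread.

```python
import csv
import io
import re


def split_pasted_table(text, min_gap=2):
    """Split each non-empty line on runs of at least `min_gap` whitespace chars.

    Assumption: columns in the pasted text are separated by wide gaps, while
    words inside one cell are separated by a single space.
    """
    gap = re.compile(r"\s{%d,}" % min_gap)
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines left over from the paste
            rows.append(gap.split(line))
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text that Excel or Google Docs can open."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical sample, standing in for text copied out of a PDF reader.
sample = """Region        Count   Share
North East    120     0.4
South         180     0.6"""

rows = split_pasted_table(sample)
print(rows_to_csv(rows))
```

Rows that come out with the wrong number of columns still need fixing by
hand, as Ed says. Raising `min_gap` helps when cells contain double spaces.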