[ddj] Unlocking PDF data
Greg Barila
gregbarila at gmail.com
Mon May 27 22:27:15 UTC 2013
Thanks Peter. Great resources there.
On May 27, 2013, at 2:55 PM, Peter Borbely <pborbely at fairfaxmedia.com.au> wrote:
> Greg
>
> The tools may vary slightly depending on the OS you're using and also
> on how the PDF was created in the first place.
> If you have Acrobat Pro, you'll get a lot of great tools to export
> from PDF, including character recognition (to convert images, e.g.
> scanned text, to alphanumeric data). Google Docs may do that for you
> as well (with limited file sizes, however).
>
> OpenOffice is probably your next port of call, see this thread:
> http://forum.openoffice.org/en/forum/viewtopic.php?t=43632 and
> googling will reveal thousands more resources.
> You may find that you need specific tools / scripts particular to the
> semantic structure (or, all too often, the lack of it) of your particular
> PDF.
>
> cheers
> Peter
>
>
> On 26 May 2013 13:19, <data-driven-journalism-request at lists.okfn.org> wrote:
>>
>> Today's Topics:
>>
>> 1. Re: Tax Avoidance and Evasion Data Expedition
>> (Square One Dr Peter Troxler (KvK 24480536))
>> 2. Unlocking PDF data (Greg Barila)
>> 3. Re: Unlocking PDF data (Jesus Lopez Osorio)
>> 4. Re: Unlocking PDF data (Andrew Duffy)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sat, 25 May 2013 19:22:53 +0200
>> From: "Square One Dr Peter Troxler (KvK 24480536)" <peter at square-1.eu>
>> Subject: Re: [ddj] Tax Avoidance and Evasion Data Expedition
>> To: "List about Data Driven Journalism and Open Data in Journalism."
>> <data-driven-journalism at lists.okfn.org>
>> Cc: "schoolofdata at okfn.org" <schoolofdata at okfn.org>
>> Message-ID: <430A5CCF-1CCF-44AA-B315-F6D4672DDB18 at square-1.eu>
>> Content-Type: text/plain; charset="windows-1252"
>>
>> I wish you'd make that a key topic at OKConf in Geneva, but I doubt you'll be able to push that past the corporate agenda of the event.
>>
>> On 24 May 2013, at 19:07 , Lucy Chambers <lucy.chambers at okfn.org> wrote:
>>
>>> Hi All,
>>>
>>> Tax avoidance and tax evasion are the topic of the day at the Open Knowledge Foundation.
>>>
>>> On 6th of June the School of Data will be running a data expedition (a group-based guided journey - picking some key questions, then seeing whether it is possible to answer them) on the topic of tax evasion to help those reporting on the topic to get a grasp of key concepts and schemes. This will take place online.
>>>
>>> Places are limited as this will be quite hands on - please sign up early if you would like to be involved (you must be able to dedicate at least 3-6 hours on 6th June).
>>>
>>> More details and signup here:
>>>
>>> http://schoolofdata.org/2013/05/24/data-expedition-tax-avoidance-and-evasion/
>>>
>>> Note: In future, we will try to kick off data expeditions with "expert introductions", with people who are knowledgeable about a particular topic giving a short 15-30 minute introduction via videolink / recording, or even a Q & A. If you know someone we should invite to take the stage on tax evasion or tax avoidance, please let us know!
>>>
>>> All the best,
>>>
>>> Lucy
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Lucy Chambers
>>>
>>> Project Coordinator | skype: lucyfediachambers | tel: +44 7909 330731 | @lucyfedia
>>>
>>> The Open Knowledge Foundation
>>> Empowering through Open Knowledge
>>> http://okfn.org/ | @okfn | OKF on Facebook | Blog | Newsletter
>>>
>>> OpenSpending | http://openspending.org/ | @openspending | Tracking every government financial transaction across the world
>>> School of Data | http://schoolofdata.org | @schoolofdata | Evidence is Power
>>> _______________________________________________
>>> data-driven-journalism mailing list
>>> data-driven-journalism at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/data-driven-journalism
>>> Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Sun, 26 May 2013 11:58:17 +0930
>> From: Greg Barila <gregbarila at gmail.com>
>> Subject: [ddj] Unlocking PDF data
>> To: data-driven-journalism at lists.okfn.org
>> Message-ID:
>> <CAFv_f8QXHhcVDT7NAY+UEFkad3CKGQVdyf+BEYx8ssQkd_WJfQ at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi there. I'm a journalist based in Adelaide, South Australia. I've been
>> dabbling in some simple data journalism projects over the past couple of
>> years (see some examples here: http://adelaidedatablog.tumblr.com )
>>
>> I'm interested - does anybody know of a good, open-source tool for
>> converting PDFs into editable documents, preferably Excel?
>>
>> I know about tools like Tabula - but it appears the tool is experimental
>> and not available for general use.
>>
>> Any tips would be appreciated.
>>
>> Greg
>> (@GregBarila)
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Sun, 26 May 2013 04:43:12 +0200
>> From: Jesus Lopez Osorio <jesuslopez.osorio at elmundo.es>
>> Subject: Re: [ddj] Unlocking PDF data
>> To: List about Data Driven Journalism and Open Data in Journalism.
>> <data-driven-journalism at lists.okfn.org>
>> Message-ID:
>> <B14C05A4B058424D8AD8CE2D3C3AAACC0143BDCFDA1C at UE-MAILCCR.oficina.int>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> Have you tried Zamzar? It worked for me once.
>> Good luck down under!
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Sun, 26 May 2013 11:19:28 +0800
>> From: Andrew Duffy <andrewjamesduffy at gmail.com>
>> Subject: Re: [ddj] Unlocking PDF data
>> To: "List about Data Driven Journalism and Open Data in Journalism."
>> <data-driven-journalism at lists.okfn.org>
>> Message-ID:
>> <CAO8PDYQcbiJ2LmzYgLiynWruCHurQ0g0B4mfELgxfMRLGcJ9QQ at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Last time I checked, Google Drive could convert PDFs into editable documents
>> if you tell it to.
>>
>> --Andrew Duffy (from Perth)
>>
>>
>> --
>>
>> *Andrew Duffy - Journalist*
>>
>> *Cirrus Media*
>>
>> ------------------------------
>>
>> End of data-driven-journalism Digest, Vol 26, Issue 34
>> ******************************************************
>