[ddj] Unlocking PDF data

Greg Barila gregbarila at gmail.com
Mon May 27 22:27:15 UTC 2013


Thanks Peter. Great resources there.

On May 27, 2013, at 2:55 PM, Peter Borbely <pborbely at fairfaxmedia.com.au> wrote:

> Greg
> 
> The tools may vary slightly depending on the OS you're using and also
> on how the PDF was created in the first place.
> If you have Acrobat Pro, you'll get a lot of great tools to export
> from PDF, including optical character recognition (OCR), which converts
> images such as scanned text into alphanumeric data. Google Docs may do
> that for you as well (with limited file sizes, however).
> 
> OpenOffice is probably your next port of call; see this thread:
> http://forum.openoffice.org/en/forum/viewtopic.php?t=43632
> Googling will reveal thousands more resources.
> You may find that you need specific tools or scripts tailored to the
> semantic structure (or, all too often, the lack of it) of your
> particular PDF.
> 
> cheers
> Peter
> 
> 
> On 26 May 2013 13:19, <data-driven-journalism-request at lists.okfn.org> wrote:
>> 
>> Today's Topics:
>> 
>>   1. Re: Tax Avoidance and Evasion Data Expedition
>>      (Square One Dr Peter Troxler (KvK 24480536))
>>   2. Unlocking PDF data (Greg Barila)
>>   3. Re: Unlocking PDF data (Jesus Lopez Osorio)
>>   4. Re: Unlocking PDF data (Andrew Duffy)
>> 
>> 
>> ----------------------------------------------------------------------
>> 
>> Message: 1
>> Date: Sat, 25 May 2013 19:22:53 +0200
>> From: "Square One Dr Peter Troxler (KvK 24480536)" <peter at square-1.eu>
>> Subject: Re: [ddj] Tax Avoidance and Evasion Data Expedition
>> To: "List about Data Driven Journalism and Open Data in Journalism."
>>        <data-driven-journalism at lists.okfn.org>
>> Cc: "schoolofdata at okfn.org" <schoolofdata at okfn.org>
>> Message-ID: <430A5CCF-1CCF-44AA-B315-F6D4672DDB18 at square-1.eu>
>> Content-Type: text/plain; charset="windows-1252"
>> 
>> I wish you'd make that a key topic at OKConf in Geneva - but I doubt you'll be able to push that past the corporate agenda of the event...
>> 
>> On 24 May 2013, at 19:07 , Lucy Chambers <lucy.chambers at okfn.org> wrote:
>> 
>>> Hi All,
>>> 
>>> Tax avoidance and tax evasion are the topic of the day at the Open Knowledge Foundation.
>>> 
>>> On the 6th of June the School of Data will be running a data expedition (a group-based guided journey: picking some key questions, then seeing whether it is possible to answer them) on tax evasion, to help those reporting on the topic get a grasp of key concepts and schemes. This will take place online.
>>> 
>>> Places are limited as this will be quite hands-on - please sign up early if you would like to be involved (you must be able to dedicate at least 3-6 hours on 6th June).
>>> 
>>> More details and signup here:
>>> 
>>> http://schoolofdata.org/2013/05/24/data-expedition-tax-avoidance-and-evasion/
>>> 
>>> Note: In future, we will try to kick off data expeditions with "expert introductions", with people who are knowledgeable about a particular topic giving a short 15-30 minute introduction via video link / recording, or even a Q & A. If you know someone we should invite to take the stage on tax evasion or tax avoidance, please let us know!
>>> 
>>> All the best,
>>> 
>>> Lucy
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Lucy Chambers
>>> 
>>> Project Coordinator  | skype: lucyfediachambers  |  tel: +44 7909 330731  |  @lucyfedia
>>> 
>>> The Open Knowledge Foundation
>>> Empowering through Open Knowledge
>>> http://okfn.org/  |  @okfn  |  OKF on Facebook  |  Blog  |  Newsletter
>>> 
>>> OpenSpending | http://openspending.org/ | @openspending |  Tracking every government financial transaction across the world
>>> School of Data | http://schoolofdata.org | @schoolofdata | Evidence is Power
>>> _______________________________________________
>>> data-driven-journalism mailing list
>>> data-driven-journalism at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/data-driven-journalism
>>> Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
>> 
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <http://lists.okfn.org/pipermail/data-driven-journalism/attachments/20130525/f27f4a34/attachment-0001.htm>
>> 
>> ------------------------------
>> 
>> Message: 2
>> Date: Sun, 26 May 2013 11:58:17 +0930
>> From: Greg Barila <gregbarila at gmail.com>
>> Subject: [ddj] Unlocking PDF data
>> To: data-driven-journalism at lists.okfn.org
>> Message-ID:
>>        <CAFv_f8QXHhcVDT7NAY+UEFkad3CKGQVdyf+BEYx8ssQkd_WJfQ at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> Hi there. I'm a journalist based in Adelaide, South Australia. I've been
>> dabbling in some simple data journalism projects over the past couple of
>> years (see some examples here: http://adelaidedatablog.tumblr.com )
>> 
>> I'm interested - does anybody know of a good, open-source tool for
>> converting PDFs into editable documents, preferably excel?
>> 
>> I know about tools like Tabula - but it appears the tool is experimental
>> and not available for general use.
>> 
>> Any tips would be appreciated.
>> 
>> Greg
>> (@GregBarila)
>> 
>> ------------------------------
>> 
>> Message: 3
>> Date: Sun, 26 May 2013 04:43:12 +0200
>> From: Jesus Lopez Osorio <jesuslopez.osorio at elmundo.es>
>> Subject: Re: [ddj] Unlocking PDF data
>> To: List about Data Driven Journalism and Open Data in Journalism.
>>        <data-driven-journalism at lists.okfn.org>
>> Message-ID:
>>        <B14C05A4B058424D8AD8CE2D3C3AAACC0143BDCFDA1C at UE-MAILCCR.oficina.int>
>> Content-Type: text/plain; charset="us-ascii"
>> 
>> Have you tried Zamzar? It worked for me once.
>> Good luck down under!
>> 
>> 
>> ------------------------------
>> 
>> Message: 4
>> Date: Sun, 26 May 2013 11:19:28 +0800
>> From: Andrew Duffy <andrewjamesduffy at gmail.com>
>> Subject: Re: [ddj] Unlocking PDF data
>> To: "List about Data Driven Journalism and Open Data in Journalism."
>>        <data-driven-journalism at lists.okfn.org>
>> Message-ID:
>>        <CAO8PDYQcbiJ2LmzYgLiynWruCHurQ0g0B4mfELgxfMRLGcJ9QQ at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> Last time I checked, Google Drive could convert PDFs into editable documents
>> if you tell it to.
>> 
>> --Andrew Duffy (from Perth)
>> 
>> 
>> 
>> 
>> --
>> 
>> *Andrew Duffy - Journalist*
>> 
>> *Cirrus Media*
>> 
>> ------------------------------
>> 
>> 
>> End of data-driven-journalism Digest, Vol 26, Issue 34
>> ******************************************************
> 



