[ddj] Lagarde List

Andrea Nelson Mauro andrea.nelson.mauro at gmail.com
Wed Mar 4 15:29:00 UTC 2015


Hi Sam,

What a great example of scraping! Can you share some stats on your work
(such as how many mistakes there were and how long it took)? I have some
friends from Wikipedia who would be very interested in the tool!

-
sorry for typos, sent by mobile
-
Andrea Nelson Mauro
@nelsonmau
dataninja.it
-
On 4 March 2015 at 16:16, "Sam Leon" <sam.leon at okfn.org> wrote:

> Hi All,
>
> In case anyone is interested, I ended up using ABBYY FineReader Online
> <http://www.finereaderonline.com/en-us> to OCR the PDF (thanks for the
> recommendation, Friedrich!). It gave fairly good results, which were good
> enough for my purposes.
>
> I've attached the end product in case it's useful to anyone else.
>
> Cheers,
>
> Sam
>
> On 12 February 2015 at 09:19, Sam Leon <sam.leon at okfn.org> wrote:
>
>> Thank you, Victoria and Theresa. It's not immediately clear whether the
>> public ICIJ data contains the names from the Lagarde List. I'll dig in
>> later this week and report my findings here.
>>
>> Sam
>>
>> On 11 February 2015 at 12:50, Victoria Parsons <
>> victoria.megan.parsons at googlemail.com> wrote:
>>
>>> I haven't had a proper look but what about this link from the comments
>>> under the article:
>>> http://icij-uploads.s3-website-us-east-1.amazonaws.com/2013/10/offshore/csv.zip
>>>
>>> Vic
>>>
>>> --
>>>
>>> Victoria Parsons
>>> The Bureau of Investigative Journalism
>>> The Myddleton Building, 167-173 Goswell Road
>>> London EC1V 7HD
>>> +44 (0)20 7040 0095
>>> @vicparsons_ <https://twitter.com/vicparsons_>
>>> Public key <http://bit.ly/1BpOrVG>
>>>
>>>
>>> Follow us on Twitter <https://twitter.com/TBIJ>
>>> Like us on Facebook <https://www.facebook.com/thebureauinvestigates>
>>> Find us on LinkedIn <http://www.linkedin.com/company/the-bureau-of-investigative-journalism>
>>> Sign up for email alerts <http://eepurl.com/IUtYL> from the Bureau's covert drone war investigation
>>>
>>>
>>> On Wed, Feb 11, 2015 at 12:18 PM, Theresa Mallinson <
>>> theresa.mallinson at gmail.com> wrote:
>>>
>>>> Hmmm, or maybe not... I just read the comments under the article. But it
>>>> seems people are getting together to pressure ICIJ to release the list.
>>>> Does Tabula not work for scanned PDFs?
>>>>
>>>> *Theresa Mallinson*
>>>> Assistant editor at *The Daily Vox <http://www.thedailyvox.co.za>*
>>>> *@tcmallinson <http://www.twitter.com/tcmallinson>*
>>>> +27 76 673 4076
>>>> Subscribe to me on *Beacon *
>>>> <http://www.beaconreader.com/theresa-mallinson>
>>>>
>>>> On 11 February 2015 at 14:15, Theresa Mallinson <
>>>> theresa.mallinson at gmail.com> wrote:
>>>>
>>>>> I haven't had a chance to look at this properly yet, but could this help?
>>>>> http://www.icij.org/project/swiss-leaks/explore-swiss-leaks-data
>>>>> At the very least, ICIJ should be able to help you out with the list?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On 11 February 2015 at 14:11, Sam Leon <sam.leon at okfn.org> wrote:
>>>>>
>>>>>> Does anyone have a machine-readable copy of the "Lagarde List"
>>>>>> <http://en.wikipedia.org/wiki/Lagarde_list> they could share?
>>>>>>
>>>>>> I can only find the scanned PDFs that have been published...
>>>>>>
>>>>>> http://www.protothema.gr/files/1/2013/03/21/lagarde-list.pdf
>>>>>>
>>>>>> Sam
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Sam Leon*
>>>>>> Senior analyst & trainer | skype: samedleon | @Noel_Mas <https://twitter.com/noel_mas>
>>>>>> The Open Knowledge Foundation <http://okfn.org/>
>>>>>> Empowering through Open Knowledge
>>>>>> http://okfn.org/ | @okfn <http://twitter.com/OKFN> | OKF on Facebook <https://www.facebook.com/OKFNetwork>
>>>>>> Blog <http://blog.okfn.org/> | Newsletter <http://okfn.org/about/newsletter>
>>>>>>
>>>>>> _______________________________________________
>>>>>> data-driven-journalism mailing list
>>>>>> data-driven-journalism at lists.okfn.org
>>>>>> https://lists.okfn.org/mailman/listinfo/data-driven-journalism
>>>>>> Unsubscribe:
>>>>>> https://lists.okfn.org/mailman/options/data-driven-journalism
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
>