[open-economics] [Project-Development] Exploratory Analysis

Rodney Beard rodney.m.beard at gmail.com
Mon Mar 16 14:01:08 UTC 2015


We could also automate it with something like this:
http://stackoverflow.com/questions/3637781/converting-a-pdf-to-text-html-in-python-so-i-can-parse-it
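
A minimal sketch of what that automation might look like. Note the swap: rather than a pure-Python PDF library, this assumes poppler's pdftotext command-line tool is installed and simply shells out to it. The function names are hypothetical, and scanned contracts would still need OCR first.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, txt_path):
    """Build the argument list for poppler's pdftotext CLI (assumed installed)."""
    # -layout asks pdftotext to keep the physical layout of the page.
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def pdf_to_text(pdf_path):
    """Convert one PDF to a .txt file next to it and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    pdf_path = Path(pdf_path)
    txt_path = pdf_path.with_suffix(".txt")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

Calling pdf_to_text("contract.pdf") would then write contract.txt and return its contents (the file name is only a placeholder).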

Rodney

On Sun, Mar 15, 2015 at 11:31 AM, Carsten Pauck <
carsten.pauck at googlemail.com> wrote:

> To extract textual data from a PDF:
>
> (1) You may be able to save the PDF as text directly from Adobe Reader.
> (2) If the contract was scanned and saved as a picture inside the PDF, you
> need a tool that provides optical character recognition (OCR). ABBYY
> FineReader, a commercial product, is one example that may work well.
> OCR does not guarantee a correct conversion from picture to text file when
> the scan is unclear, so some words may fail to be converted into text.
>
> After eliminating "stop words" (a, and, the, ...), it is quite simple to
> search for keywords or to count them.
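
The stop-word step described above could be sketched roughly as follows, using only the Python standard library. The stop-word list and the sample sentence are made up for illustration; a real run would use a fuller list and the extracted contract text.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a complete list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "by"}


def keyword_counts(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words, drop stop words, count the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Made-up sentence standing in for contract text.
sample = "The contractor shall pay a royalty to the state, and the royalty is fixed."
counts = keyword_counts(sample)
print(counts.most_common(3))
```

Searching for a specific keyword is then a lookup such as counts["royalty"], which returns 0 for words that never occur.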
>
> [image: Inline image 1]
>
>
> Once the
>
> 2015-03-09 20:28 GMT+01:00 Gustavo Silva <gustavosantaremsilva at gmail.com>:
>
>> So, the first meeting took place today, Monday the 9th of March, where we
>> managed to discuss some of the research topics I gathered from the mailing
>> list. We also discussed some secondary topics in regard to the group.
>>
>> Anyway, without further delay: we decided to go for this analysis for
>> now, since it is more feasible for us. We do not have that many people
>> involved, and nobody wants to spend too much time on this alone.
>> Therefore, this analysis is a good place to start, and we will continue
>> from there. Be aware, though, that this is an exploratory investigation.
>> We do not have any specific topics to cover, so we will decide as we go.
>>
>> This analysis requires the following steps:
>>
>>    1. Detail one contract from the perspective of the literature, meaning
>>    that we should find the most important keywords for a specific type of
>>    contract;
>>    2. Identify the various types of contracts available in the Open Oil
>>    database;
>>       1. At this point, we might be able to draw some conclusions from
>>       the research. We need to check whether that is possible.
>>    3. Generalize to all contracts, in an automated way;
>>       1. Draw our final conclusions.
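
For the automated third step, one possible shape is a keyword table built over many contract texts at once. This is only a sketch: the keywords and the contract snippets below are hypothetical placeholders, and the real keyword list would come out of the literature work in step 1.

```python
import re
from collections import Counter

# Hypothetical keywords; the real ones would come from the literature review.
KEYWORDS = ["royalty", "tax", "production"]


def keyword_table(contracts, keywords=KEYWORDS):
    """Map each contract name to {keyword: number of occurrences}."""
    table = {}
    for name, text in contracts.items():
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        table[name] = {kw: counts[kw] for kw in keywords}
    return table


# Made-up snippets standing in for texts extracted from contracts.
contracts = {
    "contract_a": "Royalty is due on production. Production is measured monthly.",
    "contract_b": "The tax rate applies to all income.",
}
table = keyword_table(contracts)
for name, row in table.items():
    print(name, row)
```

Contracts with similar rows in such a table could then be compared or grouped by type, which is where steps 2 and 3 would meet.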
>>
>> Rodney Beard will help us find the most relevant literature for the
>> first task, while Carsten Pauck will help us automate the process for the
>> third step. Most of the contracts will be retrieved from Open Oil's
>> interface.
>>
>> Let's take the investigation from here. I would like to know who is
>> interested in participating. I know more people will want to take part,
>> even though only 3 people showed interest in our previous hangout
>> meeting.
>> Once everyone is on board, I will want to talk about other things that I
>> didn't cover in this meeting, such as project management software, to
>> ease our job and ensure that no information goes to waste.
>>
>> Thank you for your time.
>> --
>> Best Regards / Obrigado e com os melhores cumprimentos,
>> Gustavo Silva
>>
>> *Open Economics <http://openeconomics.net>, Work Group Coordinator*
>> *Phonebloks <https://phonebloks.com/en>, Partnership Manager*
>>
>>
>> _______________________________________________
>> open-economics mailing list
>> open-economics at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/open-economics
>> Unsubscribe: https://lists.okfn.org/mailman/options/open-economics
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 93909 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/open-economics/attachments/20150316/5b00c688/attachment-0003.png>

