[open-economics] [Project-Development] Exploratory Analysis

Carsten Pauck carsten.pauck at googlemail.com
Sun Mar 15 11:31:53 UTC 2015


To extract textual data from a PDF:

(1) You may be able to save the PDF as text directly from Adobe Reader.
(2) If the contract is stored as an image (that is, it was scanned and then
saved as a PDF), you need a tool that provides optical character recognition
(OCR). ABBYY FineReader, a commercial product, is one example that may work
well.
OCR does not guarantee a correct conversion from image to text when the scan
is unclear, so some words may fail to convert.

After eliminating "stop words" (a, and, the, ...), it is quite simple to
search for keywords or to count them.
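
As a rough illustration of the stop-word and keyword-count step, here is a
minimal Python sketch. It assumes the contract text has already been
extracted to a plain string. The stop-word list, the function name
keyword_counts and the sample sentence are all made up for this example; a
real analysis would want a much fuller stop-word list.

```python
import re
from collections import Counter

# Illustrative stop-word list only (assumption); extend it for real use.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "by"}


def keyword_counts(text, stop_words=STOP_WORDS):
    """Count lowercase alphabetic tokens in text, skipping stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


# Hypothetical sample text standing in for an extracted contract.
sample = ("The contractor shall pay a royalty to the state. "
          "The royalty is due for the production of oil and gas.")

counts = keyword_counts(sample)

# Show the most frequent remaining words.
print(counts.most_common(3))

# Look up a specific keyword of interest.
print("royalty:", counts["royalty"])
```

Note that OCR mistakes would show up here as misspelled tokens, so the
counts are only as good as the extracted text.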

[image: inline image 1]


Once the

2015-03-09 20:28 GMT+01:00 Gustavo Silva <gustavosantaremsilva at gmail.com>:

> So, the first meeting took place today, Monday the 9th of March, where we
> managed to discuss some of the research topics I gathered from the mailing
> list. We also discussed some secondary topics in regard to the group.
>
> Anyway, without further delay: we decided to go with this analysis for
> now, since it is more feasible for us. We do not have that many people
> involved, and nobody wants to spend too much time on this alone.
> Therefore, this analysis is a good place to start, and we will continue from
> there. Be aware, though, that this is an exploratory investigation. We do not
> have any specific topics to cover, so we will decide as we go.
>
> This analysis requires the following steps:
>
>    1. Detail one contract from the perspective of the literature, meaning
>    that we should find the most important keywords for a specific type of
>    contract;
>    2. Identify the various types of contracts available in the Open Oil
>    database;
>       1. At this point, we might be able to draw some conclusions from
>       the research. We need to check whether that is possible.
>    3. Generalize to all contracts, in an automated way;
>       1. Draw our final conclusions.
>
> Rodney Beard will help us find the most relevant literature for the first
> task, while Carsten Pauck will help us automate the process for the third
> step. Most of the contracts will be retrieved from Open Oil's interface.
>
> Let's take the investigation from here. I would like to know who is
> interested in participating. I know more people will want to take part,
> even though only 3 people showed interest in our previous hangout meeting.
> Once everyone is on board, I want to talk about other things that I did not
> cover in this meeting, such as project management software, to ease our
> job and also ensure that no information goes to waste.
>
> Thank you for your time.
> --
> Best Regards / Obrigado e com os melhores cumprimentos,
> Gustavo Silva
>
> *Open Economics <http://openeconomics.net>, Work Group Coordinator*
> *Phonebloks <https://phonebloks.com/en>, Partnership Manager*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 93909 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/open-economics/attachments/20150315/36b16ea7/attachment-0003.png>

