[open-economics] [Project-Development] Exploratory Analysis

Rufus Pollock rufus.pollock at okfn.org
Mon Mar 16 14:30:29 UTC 2015


You guys may be interested in:

http://okfnlabs.org/projects/tikaserver/

http://okfnlabs.org/blog/2015/02/21/documents-to-text.html

Rufus

On 16 March 2015 at 15:01, Rodney Beard <rodney.m.beard at gmail.com> wrote:

> We could also automate it with something like this:
> http://stackoverflow.com/questions/3637781/converting-a-pdf-to-text-html-in-python-so-i-can-parse-it
>
> Rodney
>
> On Sun, Mar 15, 2015 at 11:31 AM, Carsten Pauck <
> carsten.pauck at googlemail.com> wrote:
>
>> To extract textual data from a PDF ...
>>
>> (1) You may just be able to save the PDF as text using Adobe Reader.
>> (2) If the contract is stored as a picture (that is, it was scanned and
>> saved as a PDF), you need a tool that provides optical character
>> recognition (OCR). ABBYY FineReader, a commercial product, is one example
>> that may work well.
>> OCR does not guarantee a correct conversion from a picture to a text file
>> when the scan is unclear, so some words may not be converted correctly.
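
One way to decide between routes (1) and (2) automatically is to look at how much real text came out of the plain extraction step. The sketch below is only a heuristic under stated assumptions: the function name `looks_scanned` and the threshold of 20 letters are hypothetical choices, and the input is whatever string the extraction tool returned for a document.

```python
def looks_scanned(extracted_text: str, min_letters: int = 20) -> bool:
    """Guess whether a PDF is a scanned image that needs OCR.

    Assumption: a scanned PDF yields an empty or nearly empty text layer,
    so very few alphabetic characters survive plain text extraction.
    The threshold is arbitrary and should be tuned on real contracts.
    """
    letters = sum(ch.isalpha() for ch in extracted_text)
    return letters < min_letters


# An empty text layer suggests route (2), OCR.
print(looks_scanned(""))
# A normal sentence suggests route (1), plain extraction.
print(looks_scanned("This agreement is made between the State and the Contractor."))
```

A document flagged this way would still need a manual check, since a short cover page can also produce very little text.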
>>
>> After eliminating "stop words" (a, and, the, ...), it is quite simple to
>> search for keywords or count their occurrences.
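
The stop-word step described above can be sketched in a few lines of Python using only the standard library. The stop-word list and the sample sentence here are illustrative assumptions, not taken from any real contract; a practical list would be much longer.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real lists are longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "by"}


def keyword_counts(text: str) -> Counter:
    """Lowercase the text, split it into words, and count the non-stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Made-up sample sentence, for illustration only.
sample = "The contractor pays a royalty and the royalty is fixed by the state."
counts = keyword_counts(sample)
print(counts["royalty"])  # "royalty" appears twice in the sample
print(counts.most_common(3))
```

Note that the regular expression keeps only the letters a to z, so accented characters and numbers would need extra handling.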
>>
>> [image: inline image 1]
>>
>>
>> Once the
>>
>> 2015-03-09 20:28 GMT+01:00 Gustavo Silva <gustavosantaremsilva at gmail.com>
>> :
>>
>>> So, the first meeting took place today, Monday the 9th of March, where
>>> we managed to discuss some of the research topics I gathered from the
>>> mailing list. We also discussed some secondary topics in regard to the
>>> group.
>>>
>>> Anyway, without further delay: we decided to go with this analysis for
>>> now, since it is more feasible for us. We do not have that many people
>>> involved, and nobody wants to spend too much time on this alone.
>>> This analysis is therefore a good place to start, and we will continue
>>> from there. Be aware, though, that this is an exploratory investigation.
>>> We do not have any specific topics to cover, so we will decide as we go.
>>>
>>> This analysis requires the following steps:
>>>
>>>    1. Detail one contract from the literature perspective, meaning that
>>>    we should find the most important keywords for a specific type of contract;
>>>    2. Identify the various types of contracts available in the Open Oil
>>>    database;
>>>       1. At this point, we might be able to draw some conclusions from
>>>       the research. We need to check whether that is possible.
>>>    3. Generalize across all contracts, in an automated way;
>>>       1. Draw our final conclusions.
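
As a rough illustration of how steps 1 to 3 could fit together, the sketch below scores each contract text against a keyword list per contract type and picks the best match. Everything specific in it is an assumption: the type names, the keyword lists, and the function name `classify` are hypothetical placeholders for whatever the literature review in step 1 produces.

```python
import re
from collections import Counter

# Hypothetical keyword lists per contract type; step 1 would supply real ones.
TYPE_KEYWORDS = {
    "production sharing": {"cost", "profit", "sharing"},
    "concession": {"royalty", "licence", "concession"},
}


def classify(text):
    """Return (best_type, scores); best_type is None if no keyword matched."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each type by the total number of keyword occurrences.
    scores = {
        ctype: sum(words[k] for k in keywords)
        for ctype, keywords in TYPE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        best = None
    return best, scores


# Made-up snippets standing in for extracted contract text.
contracts = {
    "a.txt": "The contractor recovers cost oil and the parties share profit oil.",
    "b.txt": "The licence holder pays a royalty under this concession.",
}
for name, text in contracts.items():
    print(name, classify(text)[0])
```

Simple counts like these ignore word order and synonyms, so the result is a starting point for the exploratory analysis rather than a reliable classifier.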
>>>
>>> Rodney Beard will help us find the most relevant literature for the
>>> first task, while Carsten Pauck will help us automate the process for the
>>> third step. Most of the contracts will be retrieved from Open Oil's
>>> interface.
>>>
>>> Let's take the investigation from here. I would like to know who is
>>> interested in participating. I know more people will want to take part,
>>> even though only 3 people showed interest in our previous hangout
>>> meeting.
>>> Once everyone is on board, I would like to discuss other things that I
>>> did not cover in this meeting, such as project management software, to
>>> ease our work and ensure that no information goes to waste.
>>>
>>> Thank you for your time.
>>> --
>>> Best Regards / Obrigado e com os melhores cumprimentos,
>>> Gustavo Silva
>>>
>>> *Open Economics <http://openeconomics.net>, Work Group Coordinator*
>>> *Phonebloks <https://phonebloks.com/en>, Partnership Manager*
>>>
>>>
>>> _______________________________________________
>>> open-economics mailing list
>>> open-economics at lists.okfn.org
>>> https://lists.okfn.org/mailman/listinfo/open-economics
>>> Unsubscribe: https://lists.okfn.org/mailman/options/open-economics
>>>
>>>
>>
>


-- 

*Rufus Pollock*
*Founder and President | skype: rufuspollock | @rufuspollock <https://twitter.com/rufuspollock>*
*Open Knowledge <http://okfn.org/> - see how data can change the world*
*http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on Facebook <https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/>*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 93909 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/open-economics/attachments/20150316/b2f6dbe4/attachment-0003.png>

