[open-economics] [Project-Development] Exploratory Analysis

Gustavo Silva gustavosantaremsilva at gmail.com
Mon Mar 16 14:54:44 UTC 2015


That's very interesting, Rodney and Rufus.

Thank you for your participation! :)

2015-03-16 14:30 GMT+00:00 Rufus Pollock <rufus.pollock at okfn.org>:

> You guys may be interested in:
>
> http://okfnlabs.org/projects/tikaserver/
>
> http://okfnlabs.org/blog/2015/02/21/documents-to-text.html
>
> Rufus
>
> On 16 March 2015 at 15:01, Rodney Beard <rodney.m.beard at gmail.com> wrote:
>
>> We could also automate it with something like this:
>> http://stackoverflow.com/questions/3637781/converting-a-pdf-to-text-html-in-python-so-i-can-parse-it
>>
>> Rodney
>>
>> On Sun, Mar 15, 2015 at 11:31 AM, Carsten Pauck <
>> carsten.pauck at googlemail.com> wrote:
>>
>>> To extract textual data from a pdf ...
>>>
>>> (1) You may just be able to save the PDF as text using Adobe Reader.
>>> (2) If the contract is saved as a picture (that is, it was scanned and
>>> saved as a PDF), you need a tool that provides optical character
>>> recognition (OCR). ABBYY FineReader, a commercial product, is one
>>> example that may work well.
>>> OCR does not guarantee a correct conversion from a picture to a text
>>> file when the scan is not clear, so some words may fail to convert
>>> into text.
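
A rough way to choose between options (1) and (2) automatically is to look at how much text the plain extraction returns. Below is a minimal Python sketch, assuming the text has already been pulled out of the PDF by some other tool. The function name and the threshold of 50 letters are illustrative assumptions, not something proposed in this thread.

```python
def needs_ocr(extracted_text, min_letters=50):
    """Guess whether a PDF is a scanned image that still needs OCR.

    extracted_text: whatever a plain text extraction returned for the file.
    min_letters: hypothetical threshold; it should be tuned on real contracts.
    """
    # Count only alphabetic characters, so whitespace and stray symbols
    # left behind by an image-only PDF do not count as usable text.
    letters = sum(1 for ch in extracted_text if ch.isalpha())
    return letters < min_letters
```

A file for which this returns True would go to the OCR route, and everything else could be saved as text directly.
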
>>>
>>> After eliminating "stop words" (a, and, the, ...), it is quite simple
>>> to search for keywords or compute word counts.
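
The stop-word elimination and counting described above could look roughly like the following Python sketch. The stop-word list here is deliberately tiny and purely illustrative, and the sample sentence is made up. A real run would use a fuller list and the text of an actual contract.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "by"}

def keyword_counts(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs in text."""
    # Lowercase the text and treat runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Made-up sample sentence, only to show the call.
sample = "The contractor shall pay a royalty to the State, and the royalty is fixed."
counts = keyword_counts(sample)
print(counts.most_common(3))
```

Searching for a specific keyword is then a lookup such as `counts["royalty"]`.
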
>>>
>>> [image: inline image 1]
>>>
>>>
>>> Once the
>>>
>>> 2015-03-09 20:28 GMT+01:00 Gustavo Silva <gustavosantaremsilva at gmail.com
>>> >:
>>>
>>>> So, the first meeting took place today, Monday the 9th of March, where
>>>> we managed to discuss some of the research topics I gathered from the
>>>> mailing list. We also discussed some secondary topics in regard to the
>>>> group.
>>>>
>>>> Anyway, without further delay: we decided to go for this analysis for
>>>> now, since it is more feasible for us. We do not have that many
>>>> people involved, and nobody wants to spend too much time on this
>>>> alone. Therefore, this analysis is a good place to start, and we will
>>>> continue from there. Be aware, though, that this is an exploratory
>>>> investigation. We do not have any specific topics to cover, so we will
>>>> decide as we go.
>>>>
>>>> This analysis requires the following steps:
>>>>
>>>>    1. Detail one contract from the literature perspective, meaning
>>>>    that we should find the most important keywords for a specific type of
>>>>    contract;
>>>>    2. Identify the various types of contracts available in the Open
>>>>    Oil database;
>>>>       1. At this point, we might be able to draw some conclusions
>>>>       from the research. We need to check whether that is possible.
>>>>    3. Generalize to all contracts, in an automated way;
>>>>       1. Draw our final conclusions.
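
Steps 2 and 3 above could be automated along the following lines. This is a minimal Python sketch, assuming each contract is already available as plain text. The contract types and keyword lists are hypothetical placeholders, since the real ones would come out of the literature review in step 1.

```python
# Hypothetical keyword lists per contract type; the real lists would come
# from the literature review in step 1.
TYPE_KEYWORDS = {
    "production sharing": ["production sharing", "cost oil", "profit oil"],
    "concession": ["concession", "royalty", "licence"],
}

def guess_contract_type(text, type_keywords=TYPE_KEYWORDS):
    """Return the type whose keywords occur most often, or None if none match."""
    lowered = text.lower()
    # Score each type by the total number of keyword occurrences in the text.
    scores = {
        ctype: sum(lowered.count(kw) for kw in keywords)
        for ctype, keywords in type_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Running this over every contract gives a first, crude grouping that can then be checked by hand before any conclusions are drawn.
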
>>>>
>>>> Rodney Beard will help us find the most relevant literature for the
>>>> first task, while Carsten Pauck will help us automate the process for the
>>>> third step. Most of the contracts will be retrieved from Open Oil's
>>>> interface.
>>>>
>>>> Let's take the investigation from here. I would like to know who is
>>>> interested in participating in this investigation. I know more people
>>>> will want to participate, even though only 3 people showed interest in
>>>> our previous hangout meeting.
>>>> Once everyone is on board, I will want to talk about other things
>>>> that I didn't cover in this meeting, such as project management
>>>> software, to ease our job and ensure that no information goes to waste.
>>>>
>>>> Thank you for your time.
>>>> --
>>>> Best Regards / Obrigado e com os melhores cumprimentos,
>>>> Gustavo Silva
>>>>
>>>> *Open Economics <http://openeconomics.net>, Work Group Coordinator*
>>>> *Phonebloks <https://phonebloks.com/en>, Partnership Manager*
>>>>
>>>>
>>>> _______________________________________________
>>>> open-economics mailing list
>>>> open-economics at lists.okfn.org
>>>> https://lists.okfn.org/mailman/listinfo/open-economics
>>>> Unsubscribe: https://lists.okfn.org/mailman/options/open-economics
>>>>
>>>>
>>>
>>
>
>
> --
>
> *Rufus Pollock*
> *Founder and President | skype: rufuspollock | @rufuspollock
> <https://twitter.com/rufuspollock>*
> *Open Knowledge <http://okfn.org/> - see how data can change the world*
> *http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on
> Facebook <https://www.facebook.com/OKFNetwork> | Blog
> <http://blog.okfn.org/>*
>


-- 
Best Regards / Obrigado e com os melhores cumprimentos,
Gustavo Silva

*Open Economics <http://openeconomics.net>, Work Group Coordinator*
*Phonebloks <https://phonebloks.com/en>, Partnership Manager*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 93909 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/open-economics/attachments/20150316/793c088b/attachment-0003.png>

