[okfn-labs] what we are working on...

Matthias Schlögl m.schloegl at bath.ac.uk
Wed Feb 12 15:17:10 UTC 2014


Hi,

my name is Matthias Schlögl. I did my master's degree at the University of Vienna and I am currently working as a research assistant and PhD student at the University of Bath (Social & Policy Sciences Department, Prof. David Miller), as well as at the Commission for Development Research in Vienna. Additionally, I am involved in some minor projects (e.g. for the Arbeiterkammer, the official representation of employees in Austria).

Jonathan Gray asked me to post something to this list about what we are working on at the moment. If you are interested in anything in greater detail (e.g. the technical side), please don't hesitate to contact me.

Basically, we/I are trying to map networks (power relations) and find patterns in them. Recently, however, we have also started to include the texts themselves in our research.
I have to say right at the beginning that I am a social scientist by training and never really learned programming at school or at university. However, while still at university I started to recognize the power of these new technologies and began to teach myself a bit of programming (HTML, JavaScript, Python).

Technically speaking, we have several different projects:
We have two semantic wikis, one of which is publicly available: http://thinktanknetworkresearch.net/wiki_ttni_en/index.php?title=Main_Page It is on think tank networks and holds information on about 400 think tanks. Dieter Plehwe, Werner Krämer and I started the project at the Social Science Research Centre Berlin some years ago. Unfortunately, the information is still not equally good for all think tanks. As it is a semantic wiki, you can query it like a database, and it is easy to export data in various formats (e.g. JSON, CSV etc.). I wrote some Python scripts that can, for example, automatically generate interlocking directorates data and export files suitable for Social Network Analysis programs like Visone, Pajek or Gephi.
We use forms to enter the data into the wiki, so that researchers who have no clue about wiki markup can edit and add information very easily.
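
To illustrate the interlocking directorates step mentioned above, here is a minimal sketch. It is not one of the actual scripts: the function names, the sample data and the Source/Target/Weight column layout (a plain edge list of the kind Gephi can import as a spreadsheet) are all assumptions. It takes (person, organisation) pairs and counts how many people each pair of organisations shares.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def interlocks(memberships):
    """Count shared people per organisation pair from (person, org) tuples."""
    orgs_by_person = defaultdict(set)
    for person, org in memberships:
        orgs_by_person[person].add(org)

    weights = defaultdict(int)
    for orgs in orgs_by_person.values():
        # Every pair of organisations a person sits in is one interlock.
        for a, b in combinations(sorted(orgs), 2):
            weights[(a, b)] += 1
    return dict(weights)


def to_edge_csv(weights):
    """Serialise the interlocks as a weighted edge list (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(weights.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()


# Hypothetical sample data, not taken from the wiki.
memberships = [
    ("Person A", "Think Tank X"),
    ("Person A", "Think Tank Y"),
    ("Person B", "Think Tank X"),
    ("Person B", "Think Tank Y"),
    ("Person B", "Think Tank Z"),
]
edges = interlocks(memberships)
print(to_edge_csv(edges))
```

In a real setup the membership pairs would come from a wiki export rather than a hard-coded list.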

Additionally, we have conducted some data-mining/scraping projects over recent years. For example, we scraped the Science Media Centre articles and the WEF contributors database up to 2009 (about 14,000 people who once spoke at the WEF). We also scanned a printed version of the Trade Associations Directory (2007) and used OCR to generate a database of trade associations and their members (only in the field of addiction industries).

We mainly use Social Network Analysis and Natural Language Processing techniques (e.g. TF-IDF to compute keywords for the SMC articles) to analyze our data.
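
As a rough illustration of the TF-IDF keyword idea, here is a small sketch using only the Python standard library. The tokenizer, the weighting variant (relative term frequency times log(N/df)) and the sample sentences are assumptions made for the example, not the setup used for the SMC articles.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Very crude tokenizer: lowercase ASCII words only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, top_n=3):
    """Return the top_n highest-scoring terms for each document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    keywords = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by score (descending), then alphabetically to break ties.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords.append([term for term, _ in ranked[:top_n]])
    return keywords


# Made-up sample texts.
docs = [
    "the study on alcohol pricing and alcohol policy",
    "the study on climate models and climate data",
    "the report on vaccine safety",
]
keywords = tfidf_keywords(docs, top_n=1)
print(keywords)
```

Terms that occur in every document (like "the") get a score of zero, which is why they never show up as keywords.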

At the moment we have, in addition to the data we collected ourselves, data from several different sources and a system to collaborate (the wiki). The wiki works fine for the things we are doing right now, but it is likely to run into performance issues in the future. The semantic wiki is just not built for the complex networks we are trying to map. So I am working (also for my PhD) on a system that should be capable of combining the data we have right now and of supporting future projects. Funnily enough, like grano (the system used for openinterests.eu), it uses a graph database, though it is of course not as professional as grano.
It is built on Neo4j (graph database) and Django (web framework). It holds three different kinds of entities (nodes): people, institutions and texts. I try to process texts added to the database automatically and find connections within them using regular expressions, e.g. URLs, names, figures etc. Once running, it should automatically check RSS feeds for new entries and add them to the database.
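
The regular-expression step could look roughly like the sketch below. The patterns, the function name and the sample sentence are assumptions for illustration only. They are deliberately crude: the name pattern, for instance, only catches runs of capitalised ASCII words and would miss a name like "Schlögl".

```python
import re

# Assumed, simplified patterns; real texts would need more careful ones.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")
FIGURE_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def extract_mentions(text):
    """Find URLs, candidate names and figures in a piece of text."""
    urls = URL_RE.findall(text)
    # Blank out URLs first so digits inside links are not counted as figures.
    stripped = URL_RE.sub(" ", text)
    return {
        "urls": urls,
        "names": NAME_RE.findall(stripped),
        "figures": FIGURE_RE.findall(stripped),
    }


# Made-up sample sentence.
sample = "Jane Doe spoke at the forum in 2009, see http://example.org/talk for details."
mentions = extract_mentions(sample)
print(mentions)
```

Each match could then be turned into a relationship between the text node and a person or institution node, but how that mapping is done in the actual system is not described here.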

I hope this gives you an idea of what I/we are doing. I plan to go to the dataharvest+ conference in Brussels, so if anyone is interested we could meet there and talk about the projects in greater detail.

Kind regards,

Matthias