[okfn-labs] what we are working on...

Friedrich Lindenberg friedrich at pudo.org
Wed Feb 12 16:19:03 UTC 2014


Hey Matthias, 

it’s fascinating to hear about the work you’re doing - thanks very much for the comprehensive write-up! As a first ask: you seem to know a lot about the approaches and tools in this space, so I’m wondering whether you would be interested in contributing to the SNA tool survey that I’ve started (http://untangled.knightlab.com/readings/charting-social-network-analysis-tools.html). It also needs a better set of classification criteria :) 

I’m really interested to learn more about your work on text analysis - it would be cool to know whether you’ve actually used entity extraction techniques to generate graphs. What other connections do you see between NLP and SNA? I’ve been very interested in combining narrative and structured elements in graphs, e.g. this mock-up: http://opendatalabs.org/misc/demo/grano/_mockup/. 

It would also be interesting to hear whether you see your work relating to OpenInterests directly: do you think it might make sense to import any of the datasets you have collected into the site? I’m quite eager to combine different sources of information - as long as they are clearly attributable, it should make for interesting overlaps. 

I’m also delighted to see that you’ve already come across grano ;) The project is really new (and, to be honest, not that professional), and therefore still quite malleable in its design - so if there is any way we could make it useful to your project, I’d be happy to explore those options! It doesn’t actually use a graph database, mostly because I wanted a data schema that fully traces each fact’s source and attribution - in a way it’s more about collecting “evidence” than just making a graph.
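
To illustrate what I mean by “evidence” rather than graph, here is a very rough sketch of that kind of schema (not the actual models, just the shape of the idea - all names below are made up):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Source:
    url: str        # where the fact was found
    retrieved: str  # e.g. date of retrieval

@dataclass
class Property:
    name: str       # e.g. "registered_name"
    value: str
    source: Source  # every single fact points back to its evidence

@dataclass
class Entity:
    slug: str
    properties: List[Property] = field(default_factory=list)

@dataclass
class Relation:
    subject: Entity
    object: Entity
    label: str      # e.g. "board_member_of"
    source: Source  # relations are evidence-backed too

# A board membership is then a claim with its citation attached:
acme = Entity("acme-corp")
jane = Entity("jane-doe")
link = Relation(jane, acme, "board_member_of",
                Source("http://example.org/annual-report", "2014-02-12"))

The graph view is then just an aggregation over these attributable facts.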

My greatest challenge with the project isn’t really data mining at all. OpenInterests, for example, already has a significant amount of well-structured information available. The issue is rather: what can I let users like investigative reporters or researchers do with this, so that it becomes an everyday tool rather than just a big bucket of stuff that one stumbles across occasionally?

Technically, the answer will probably boil down to graph algorithms, list-making and aggregate reporting of some sort - but that isn’t the layer on which we can expect these groups of users to work. So there has to be an intermediate language that actually defines human activities, which is probably also going to be fairly domain-specific. Not sure this makes a lot of sense, but if you have any pointers to work in that direction, I would really appreciate it!

For the combination of technologies that you’re using (Neo4j/Django), I’d also suggest having a look at detective.io, Journalism++’s brainchild - an open source platform aimed at journalists without data modelling skills. 

All the best, 

- Friedrich 


On 12 Feb 2014, at 16:17, Matthias Schlögl <m.schloegl at bath.ac.uk> wrote:

> Hi,
> 
> my name is Matthias Schlögl. I did my master's degree at the University of Vienna and I am currently working as a research assistant and PhD student at the University of Bath (Social & Policy Sciences Department, Prof. David Miller), as well as at the Commission for Development Research in Vienna. Additionally, I am involved in some minor projects (e.g. for the Arbeiterkammer, the official representation of employees in Austria).
> 
> Jonathan Gray asked me to post something to this list about what we are working on at the moment. If you are interested in anything in greater detail (e.g. the technical side), please don't hesitate to contact me.
> 
> Basically, we/I are trying to map networks (power relations) and find patterns. Recently, however, we have also started to include the texts themselves in our research.
> I should say right at the beginning that I am a social scientist by training and never really learned programming at school or university. While still at university, though, I started to recognize the power of these new technologies and taught myself a bit of programming (HTML, JavaScript, Python).
> 
> Technically speaking we have several different projects:
> We have two semantic wikis, of which one is publicly available: http://thinktanknetworkresearch.net/wiki_ttni_en/index.php?title=Main_Page - it is about think tank networks and holds information on about 400 think tanks. Dieter Plehwe, Werner Krämer and I started the project at the Social Science Research Centre Berlin some years ago. Unfortunately, the information is still not of equal quality for all think tanks. As it is a semantic wiki, you can query it like a database and it is easy to export data in various formats (e.g. JSON, CSV etc.). I wrote some Python scripts that can, for example, automatically generate interlocking-directorates data and export files suitable for Social Network Analysis programs like Visone, Pajek or Gephi (a simplified sketch of that step follows below).
> We use forms to enter the data into the wiki, which makes it possible for researchers who have no clue about wiki markup to edit and add information very easily.
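> 
> Roughly, that export step looks something like the following (an illustrative sketch only, not the actual script - the real one reads the wiki export, and the input format here is made up):
> 
> import networkx as nx
> from networkx.algorithms import bipartite
> 
> # (person, think tank) board memberships, e.g. taken from the wiki's CSV export
> memberships = [
>     ("Jane Doe", "Think Tank A"),
>     ("Jane Doe", "Think Tank B"),
>     ("John Roe", "Think Tank B"),
> ]
> 
> # build the bipartite affiliation graph
> B = nx.Graph()
> B.add_nodes_from({p for p, _ in memberships}, bipartite=0)
> B.add_nodes_from({t for _, t in memberships}, bipartite=1)
> B.add_edges_from(memberships)
> 
> # project onto think tanks: an edge means a shared board member (an interlock)
> tanks = {n for n, d in B.nodes(data=True) if d["bipartite"] == 1}
> G = bipartite.weighted_projected_graph(B, tanks)
> 
> # write formats that Gephi and Pajek can read
> nx.write_gexf(G, "interlocks.gexf")
> nx.write_pajek(G, "interlocks.net")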
> 
> Additionally, we have conducted several data-mining/scraping projects in recent years. For example, we scraped the Science Media Centre articles and the WEF contributors database up to 2009 (about 14,000 people who have spoken at the WEF). We also scanned a printed version of the Trade Associations Directory (2007) and used OCR to generate a database of trade associations and their members (only in the field of addiction industries).
> 
> We mainly use Social Network Analysis and Natural Language Processing techniques (e.g. TF-IDF to compute keywords for the SMC articles) to analyze our data.
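> 
> The keyword step is essentially plain TF-IDF; a toy sketch of the idea (illustrative only - the real pipeline also handles proper tokenisation, stop words and stemming):
> 
> import math
> from collections import Counter
> 
> def tfidf_keywords(docs, top_n=5):
>     """Return the top_n highest-scoring TF-IDF terms for each document."""
>     tokenised = [d.lower().split() for d in docs]
>     n_docs = len(tokenised)
>     # document frequency: in how many articles does each term occur?
>     df = Counter(term for doc in tokenised for term in set(doc))
>     keywords = []
>     for doc in tokenised:
>         tf = Counter(doc)
>         scores = {t: (tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in tf}
>         keywords.append(sorted(scores, key=scores.get, reverse=True)[:top_n])
>     return keywords
> 
> articles = ["alcohol industry lobbying report ...", "tobacco advertising study press release ..."]
> print(tfidf_keywords(articles, top_n=3))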
> 
> At the moment we have - in addition to the data we collected ourselves - data from several different sources, plus a system to collaborate (the wiki) that, though working fine for what we are doing right now, is likely to run into performance issues in the future. The semantic wiki is just not built for the complex networks we are trying to map. So I am working (also for my PhD) on a system that should be capable of combining the data we have right now and supporting future projects. Funnily enough, like grano (the system used for openinterests.eu) it uses a graph database, though it is of course not as professional as grano.
> It is built on Neo4j (a graph database) and Django (a web framework). It holds three different kinds of entities (nodes): people, institutions and texts. I try to automatically process texts added to the database and find connections within them using regular expressions, e.g. URLs, names, figures etc. Once running, it should automatically check RSS feeds for new entries and add them to the database.
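> 
> The pattern matching is roughly of this kind (a much-simplified sketch - the actual rules are more elaborate, and the matches then become candidate links between the text node and other entities):
> 
> import re
> 
> URL_RE    = re.compile(r'https?://[^\s<>"]+')
> FIGURE_RE = re.compile(r'[€£$]\s?\d[\d.,]*(?:\s?(?:million|billion|bn|m))?', re.I)
> # crude name heuristic: two to four capitalised words in a row
> NAME_RE   = re.compile(r'\b(?:[A-Z][a-z]+\s){1,3}[A-Z][a-z]+\b')
> 
> def extract_candidates(text):
>     """Pull candidate connections (URLs, figures, person/organisation names) out of a text."""
>     return {
>         "urls": URL_RE.findall(text),
>         "figures": FIGURE_RE.findall(text),
>         "names": NAME_RE.findall(text),
>     }
> 
> sample = "John Smith of Acme Ltd received £2.5 million, see http://example.org/report"
> print(extract_candidates(sample))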
> 
> I hope this shows a bit of what I/we are doing. I plan to go to the Dataharvest+ conference in Brussels, so if someone is interested we could meet there and talk about the projects in greater detail.
> 
> Kind regards,
> 
> Matthias
