[openbiblio-dev] [okfn-discuss] Help needed with visualization

Tom Morris tfmorris at gmail.com
Fri Feb 24 17:18:14 UTC 2012


A visualization list would probably offer more help, but here's a
relevant question from StackOverflow:
http://stackoverflow.com/questions/7730806/visualization-dead-slow-when-using-a-large-dataset

You don't really want to naively visualize this entire thing.  In
addition to being slow, it's more than anyone can absorb.

Some techniques which can be used:

Pruning - prune all singletons (any node without an edge)
Resolution reduction - collapse all nodes with outdegree < N (allowing
the user to expand them again)
Field of view - focus on a certain subset of the graph (P plies out
from the focus node)
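A minimal Python sketch of the three techniques. It assumes the co-authorship graph is held as an undirected adjacency dict (author -> set of co-authors), so "outdegree" is simply the degree. The function names, the `n_min` and `plies` parameters, and the choice to file each collapsed node under its best-connected surviving neighbour are illustrative assumptions, not part of any particular library:

```python
from collections import deque

# Assumed representation: undirected graph as node -> set of neighbours.
Graph = dict[str, set[str]]


def prune_singletons(g: Graph) -> Graph:
    """Pruning: drop every node that has no edges."""
    return {n: set(nbrs) for n, nbrs in g.items() if nbrs}


def collapse_low_degree(g: Graph, n_min: int) -> tuple[Graph, dict[str, list[str]]]:
    """Resolution reduction: hide nodes whose degree is below n_min.

    Each hidden node is recorded under its best-connected surviving neighbour
    (ties broken by name) so a UI could expand it again later. Hidden nodes
    with no surviving neighbour are simply dropped from the view.
    """
    keep = {n for n, nbrs in g.items() if len(nbrs) >= n_min}
    hidden: dict[str, list[str]] = {}
    for n, nbrs in g.items():
        if n in keep:
            continue
        reps = [m for m in nbrs if m in keep]
        if reps:
            rep = max(reps, key=lambda m: (len(g[m]), m))
            hidden.setdefault(rep, []).append(n)
    reduced = {n: {m for m in g[n] if m in keep} for n in keep}
    return reduced, hidden


def field_of_view(g: Graph, focus: str, plies: int) -> Graph:
    """Field of view: breadth-first search keeping nodes within `plies` hops of focus."""
    dist = {focus: 0}
    queue = deque([focus])
    while queue:
        n = queue.popleft()
        if dist[n] == plies:
            continue  # do not expand beyond the horizon
        for m in g[n]:
            if m not in dist:
                dist[m] = dist[n] + 1
                queue.append(m)
    # Keep only edges whose endpoints are both inside the view.
    return {n: {m for m in g[n] if m in dist} for n in dist}
```

One option in an interactive view is to apply them in that order: prune, collapse, then restrict the layout to the neighbourhood of whatever node the user clicks.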

I'm sure someone who actually knew what they were talking about could
offer additional suggestions.

Tom

On Tue, Jan 24, 2012 at 11:47 AM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
> This is very exciting and something I have been scratching at in science
> publishing. Now the Openbiblio team has produced BibSoup/BibServer and it
> seems your application could be very well suited to BibSoup.
>
> On Tue, Jan 24, 2012 at 4:25 PM, Guo Xu <digitalepourpre at gmail.com> wrote:
>>
>> Hi folks,
>>
>> I have been working on visualizing the networks of academic publishing
>> in economics. Here's an example for the Quarterly Journal of
>> Economics:
>>
>> http://www.guoxu.org/econmap/map.html
>>
>> A link indicates that two economists have published together in the
>> QJE. The strength of a link is defined by how many times they have
>> published together.
>>
>> The size of the node indicates how many times an author has published
>> in the QJE. Bigger nodes have published more often.
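The link and node definitions above can be computed directly from a list of articles. A minimal sketch, assuming the data arrives as one list of author names per QJE article (a hypothetical input format); the names `coauthor_network`, `node_size` and `edge_weight` are illustrative:

```python
from collections import Counter
from itertools import combinations


def coauthor_network(papers):
    """Return (node_size, edge_weight) for a co-authorship network.

    papers: iterable of author-name lists, one list per article (assumed format).
    node_size[author] counts the articles an author appears on.
    edge_weight[(a, b)] counts the articles a and b wrote together, with a < b.
    """
    node_size = Counter()
    edge_weight = Counter()
    for authors in papers:
        unique = sorted(set(authors))  # ignore accidental duplicate names
        node_size.update(unique)
        # Every unordered pair of co-authors on this article gets +1.
        edge_weight.update(combinations(unique, 2))
    return node_size, edge_weight
```

If node area should be proportional to the publication count, scaling the radius by the square root of `node_size` gives that, since area grows with the square of the radius.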
>>
>> Finally, the color indicates the ranking of the economist's alma
>> mater. Blue indicates that the author obtained his/her PhD from a top
>> 10 university (according to
>>
>> http://www.topuniversities.com/university-rankings/world-university-rankings/2011/subject-rankings/social-sciences/economics);
>> orange indicates a top 11-20 university; green is for top 21-30 and
>> red is for all universities beyond top 30.
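The colour coding is a simple bucketing of the alma mater's position in the subject ranking linked above. A sketch, assuming each author's PhD institution has already been matched to an integer rank, with `None` standing for an unranked institution (an assumption about how missing ranks would be treated); `rank_color` is an illustrative name:

```python
from typing import Optional


def rank_color(rank: Optional[int]) -> str:
    """Map a PhD institution's subject ranking to the map's colour buckets."""
    if rank is None or rank > 30:
        return "red"      # beyond the top 30, or unranked (assumption)
    if rank <= 10:
        return "blue"     # top 10
    if rank <= 20:
        return "orange"   # 11-20
    return "green"        # 21-30
```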
>>
>> Couple of interesting points:
>>
>> - It seems that the core (those at the centre) is made up almost entirely
>> of authors from top-10 universities. They tend to be well-connected.
>
>
> In the UK this might be called "the old boy network" - the unofficial
> network of (men) who attended the same school or university. It does not
> necessarily indicate absolute value, but it is often correlated with getting
> grants, etc. [I have been in both Blue and Red universities (in science)]
>>
>>
>> - The hubs are: Philippe Aghion, Daron Acemoglu, Marianne Bertrand
>>
>> - Authors from universities beyond the top 30 rarely get published in the QJE.
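The message does not say how the hubs were identified. One simple option is degree, the number of distinct co-authors. A sketch assuming the same kind of adjacency dict (author -> set of co-authors); `top_hubs` is an illustrative name, and weighted degree (counting repeat collaborations) or betweenness centrality would be natural alternatives:

```python
def top_hubs(g: dict[str, set[str]], k: int = 3) -> list[str]:
    """Return the k authors with the most distinct co-authors (ties broken by name)."""
    return sorted(g, key=lambda n: (-len(g[n]), n))[:k]
```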
>>
>> The visualization is done with D3. But it is very slow on older
>> computers. Does anyone have ideas for optimizing this?
>
>
> Yes. This is a dynamics exercise and (I assume) you have a pairwise
> repulsion term to spread the points out. Many of your points have no
> connections at all, so you spend a lot of time computing forces on them
> for nothing. Unless there is some other hidden coordinate, I would just
> separate the graph into its disjoint subgraphs. It will be hugely faster,
> as instead of O(N^2) you have O(N) or less (there is a power-law
> distribution of cluster sizes).
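A sketch of that split, assuming an adjacency dict (author -> set of co-authors) and a layout whose naive repulsion step visits every pair of nodes. Laying out each connected component separately means each iteration visits the sum of n_i(n_i - 1)/2 pairs over components instead of N(N - 1)/2, which is far smaller when most components are tiny. The function names are illustrative:

```python
def connected_components(g: dict[str, set[str]]) -> list[list[str]]:
    """Split an undirected graph into connected components (iterative DFS)."""
    seen: set[str] = set()
    components: list[list[str]] = []
    for start in g:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            n = stack.pop()
            comp.append(n)
            for m in g[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        components.append(comp)
    return components


def repulsion_pairs(sizes: list[int]) -> int:
    """Pairs a naive all-pairs repulsion step visits, summed over separate layouts."""
    return sum(s * (s - 1) // 2 for s in sizes)
```

Components of size 1 need no force simulation at all and could simply be drawn in a grid alongside the larger clusters.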
>>
>>
>> Also, I have a lot more characteristics lying around that can be
>> displayed (e.g. gender - btw only 10% of the authors are female), but
>> I do not really know how to do it dynamically.
>>
>> Finally, I would ideally like to do the same visualization for the
>> *entire* network of economists. I have a 300 MB dataset scraped from
>> RePEc that gives me information on co-authoring for virtually all
>> economics journals and working paper series. But obviously this will
>> be too slow to visualize, so it would be great if someone had
>> experience in working with such big datasets (the whole dataset has
>> ~30,000 economists, which results in a 30,000 x 30,000 data matrix!!)
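A dense 30,000 x 30,000 matrix has 900 million cells (about 7.2 GB at 8 bytes per cell), and if, as is typical for co-authorship networks, each economist has relatively few co-authors, the vast majority of those cells are zero. A sparse adjacency that stores only pairs that actually co-published grows with the number of collaborations instead. A minimal sketch, assuming the RePEc data can be reduced to a stream of (author, author) co-authorship records; the record format and function names are hypothetical:

```python
from collections import defaultdict


def sparse_adjacency(pairs):
    """Build author -> {co-author: number of joint papers} from (a, b) records.

    Only pairs that actually occur are stored, so memory grows with the number
    of collaborations rather than with the square of the number of authors.
    """
    adj = defaultdict(lambda: defaultdict(int))
    for a, b in pairs:
        if a == b:
            continue  # ignore self-pairs
        adj[a][b] += 1
        adj[b][a] += 1
    return adj


def dense_cells(n_authors: int) -> int:
    """Cells in a full n x n co-authorship matrix."""
    return n_authors * n_authors


def sparse_entries(adj) -> int:
    """Stored entries in the sparse form (each collaboration is stored twice)."""
    return sum(len(nbrs) for nbrs in adj.values())
```

Taking the keys of each inner dict gives the author -> set-of-co-authors form, which suits the pruning and component-splitting ideas discussed in this thread.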
>>
> You will certainly find interest on openbiblio-dev as we are looking for
> bibliographic data sets and things to do with them
>
>>
>> Anyway, let me know what you think; I'm looking forward to suggestions!
>>
>> Guo
>>
>> _______________________________________________
>> okfn-discuss mailing list
>> okfn-discuss at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/okfn-discuss
>
>
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>
> _______________________________________________
> openbiblio-dev mailing list
> openbiblio-dev at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/openbiblio-dev
>
