[open-economics] Project idea OpenEcon and call for coders

Guo Xu digitalepourpre at gmail.com
Mon Apr 18 07:50:34 UTC 2011


I am reposting this because the last e-mail was jumbled up for some weird reason...

###

Dear list,

Just wanted to share an idea I have had for quite a while. So far, I
haven't been able to explore it further, so I wanted to use this
list as an opportunity to get some feedback.

In the course of my studies and research, I have come across many
empirical (macro) econ papers. Basically, all papers are about the
relationship between some X and some Y, for example between education
and GDP growth.

Even though presumably rigorous, most relationships prove very
fragile: After 20 years of growth regressions (Y=GDP growth), for
example, we still have not found any robust explanatory variable (X).
Another notorious example comes from the democracy-growth research: In
a review of 81 empirical studies (all using growth as Y and democracy
as X), 16% of the papers find a negative significant relationship, 20%
a negative insignificant, 38% a positive insignificant and 26% a
positive significant relationship.

Why is this so? Basically, there are too many degrees of freedom: You
can choose among a large number of datasets, use a different time
period, restrict the number of countries, vary the number of control
variables and play around with the econometric estimation (OLS,
robust, clustered, FE, RE etc.). This means that you can always get
the result you want, as long as you try hard enough and recombine the
different choices - this is what people call "kitchen sink
regressions", a result of data mining and confirmation bias.
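To make the point concrete, here is a minimal Python sketch of such a
specification search. It is only an illustration under stated
assumptions: X and Y are pure noise (so there is no true relationship),
each "specification" is just a different random subsample of countries,
and all names and parameter values (slope_t_stat, specification_search,
60 countries, 200 specifications) are made up for this example.

```python
import math
import random


def slope_t_stat(x, y):
    """t-statistic of the slope in a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta / se


def specification_search(n_countries=60, n_specs=200, sample_size=30, seed=0):
    """Run many 'specifications' (random country subsamples) on pure noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n_countries)]
    y = [rng.gauss(0, 1) for _ in range(n_countries)]  # unrelated to x by construction
    t_stats = []
    for _ in range(n_specs):
        idx = rng.sample(range(n_countries), sample_size)
        t_stats.append(slope_t_stat([x[i] for i in idx], [y[i] for i in idx]))
    return t_stats


t_stats = specification_search()
# 1.96 is the large-sample 5% critical value, used here as a rough threshold.
n_sig = sum(abs(t) > 1.96 for t in t_stats)
print(f"{n_sig} of {len(t_stats)} specifications look 'significant'")
print(f"most favourable t-statistic: {max(t_stats, key=abs):.2f}")
```

A researcher who reports only the most favourable run would present a
result that the full set of runs does not support.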

Why is this a problem? Unfortunately, published articles will not tell
you how many different specifications the authors tried until they
finally got their results (of course, this is a very unscientific
approach but it does happen a lot) - instead, the result will be
presented as if it had been obtained on the first try. There is also a
huge pile of "novel contributions" that claim to have finally found
the "real" relationship, adding to the pile of contradictory empirical
work.

My idea now is to create a public database where relationships can be
entered (let's call this project "X on Y" for now) and searched.
Basically, you would choose your Y and X and the website would return
all existing results, either entered by users or scraped from the
pile of existing studies (most of the output tables are almost
standardized so it should be doable). This would be incredibly useful
for researchers. For example, if I am considering testing the
relationship between CO2 reductions and democracy, I would not need to
go through a literature search but could simply check on the website.
The website would then list all existing findings (how many
positive significant, how many negative significant etc.), maybe
including the link to the paper or the datasets. This would save a lot
of redundant work and help research advance.
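A rough sketch of what the core of such a database could look like,
using Python's built-in sqlite3 module. Everything here is an
assumption for illustration: the table layout, the function names
(create_db, add_result, summarize, classify) and the 1.96 threshold.
The rows entered at the end are made-up placeholders, not results from
any real study.

```python
import sqlite3


def classify(coefficient, std_error, critical=1.96):
    """Label a result by sign and (rough) 5% significance."""
    sign = "positive" if coefficient > 0 else "negative"
    significant = abs(coefficient / std_error) > critical
    return f"{sign} {'significant' if significant else 'insignificant'}"


def create_db():
    """Create an in-memory database with one table of reported results."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE results (
               id INTEGER PRIMARY KEY,
               y TEXT NOT NULL,            -- dependent variable
               x TEXT NOT NULL,            -- explanatory variable
               coefficient REAL NOT NULL,
               std_error REAL NOT NULL,
               source TEXT                 -- link to paper or dataset
           )"""
    )
    return con


def add_result(con, y, x, coefficient, std_error, source):
    con.execute(
        "INSERT INTO results (y, x, coefficient, std_error, source) "
        "VALUES (?, ?, ?, ?, ?)",
        (y, x, coefficient, std_error, source),
    )


def summarize(con, y, x):
    """Count the stored findings for the pair (y, x) by sign and significance."""
    counts = {}
    rows = con.execute(
        "SELECT coefficient, std_error FROM results WHERE y = ? AND x = ?",
        (y, x),
    )
    for coefficient, std_error in rows:
        label = classify(coefficient, std_error)
        counts[label] = counts.get(label, 0) + 1
    return counts


con = create_db()
# Placeholder rows only, invented to show the mechanics.
add_result(con, "gdp_growth", "democracy", 0.8, 0.2, "placeholder A")
add_result(con, "gdp_growth", "democracy", 0.1, 0.3, "placeholder B")
add_result(con, "gdp_growth", "democracy", -0.05, 0.2, "placeholder C")
add_result(con, "gdp_growth", "democracy", -0.9, 0.3, "placeholder D")
add_result(con, "gdp_growth", "education", 0.4, 0.1, "placeholder E")
print(summarize(con, "gdp_growth", "democracy"))
```

A real version would also need to record the sample, time period,
controls and estimator for each entry, since those are exactly the
choices that make results differ.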

The website would also capture those regressions that are NOT reported
in the final research paper. This would allow others to check whether
a result is based on confirmation bias (how many different variations
did s/he try before reporting the final result?). Eventually, the
website might even turn out to be a source of validation for
researchers. The website would be THE starting point for empirical
research. People would not need to go through piles of literature but
could get the results in a standardized format.

Such a website does not exist (to my knowledge) but could have a
profound impact on how research in economics is done: 1) We could also
write a plugin for econometric software like Stata that reports each
regression you run (the resulting relationship) to the website. 2) By
doing so, the app would even ADVANCE research as it would allow the
creation of meta-regressions. These are regressions that use existing
studies (for example all 81 democracy-growth studies) to estimate a
robust relationship by controlling for differences in the method of
estimation (sample, measures, etc). Meta-regressions have become
increasingly popular these days but generally require a lot of
literature review, which could be avoided if we had that database.
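As a small illustration of the meta-regression idea, here is a minimal
Python sketch. It assumes the simplest possible case: one study-level
characteristic (a "moderator", for example a dummy for whether the
study used fixed effects) and inverse-variance weights, i.e. a
fixed-effect meta-regression estimated by weighted least squares. The
function name and the numbers are hypothetical placeholders, not taken
from any real set of studies.

```python
def meta_regression(effects, std_errors, moderator):
    """Weighted least squares of study effects on one moderator.

    Each study is weighted by 1 / std_error**2 (inverse-variance weights).
    Returns (intercept, slope).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    m_bar = sum(w * m for w, m in zip(weights, moderator)) / total
    e_bar = sum(w * e for w, e in zip(weights, effects)) / total
    sxx = sum(w * (m - m_bar) ** 2 for w, m in zip(weights, moderator))
    sxy = sum(
        w * (m - m_bar) * (e - e_bar)
        for w, m, e in zip(weights, moderator, effects)
    )
    slope = sxy / sxx
    intercept = e_bar - slope * m_bar
    return intercept, slope


# Placeholder inputs: one estimated coefficient and standard error per study,
# plus a dummy that is 1 if the (hypothetical) study used fixed effects.
effects = [0.30, 0.10, -0.20, -0.05]
std_errors = [0.10, 0.20, 0.10, 0.25]
used_fixed_effects = [0, 0, 1, 1]

intercept, slope = meta_regression(effects, std_errors, used_fixed_effects)
print(f"average effect without FE: {intercept:.3f}")
print(f"shift in effect when FE are used: {slope:.3f}")
```

With a 0/1 moderator, the intercept is the precision-weighted average
effect among studies where the dummy is 0 and the slope is the
difference for studies where it is 1. With the database in place, the
inputs would come from a query rather than from a hand-built list.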

As a researcher, I am quite certain that this tool would be very
useful. Unfortunately, I do not have the technical knowledge to
program such a website (I have an ASP 3.0 background but will not be
able to use all the new technologies). Just wanted to check if anyone
is interested in investing some time in this? I could also try to
mobilize some resources for this - this is a HUGE gap and filling it
could really change the way macroeconometrics is done.

Guo



