[open-economics] Content Mine launches

Pierre-Carl Langlais pierrecarl.langlais at gmail.com
Thu Apr 23 13:36:04 UTC 2015


Hi everyone,

I'm currently mining French newspapers and scientific economics
journals from the 19th century (such as the /Journal des économistes/).
You have to account for OCR mistakes (which are much more frequent here,
since the archives may be dirty and OCR tools are not designed for
150-200-year-old typefaces). Yet, given that the corpora can be very
large, easily spanning several decades to a century of publication, text
mining still yields interesting results (much more so for textual content
than for statistical tables).
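
To illustrate what accounting for OCR mistakes can look like, here is a
minimal sketch using approximate string matching from Python's standard
difflib module. The function name fuzzy_count, the sample tokens and the
0.8 cutoff are illustrative assumptions, not the actual pipeline; a real
corpus would need the cutoff tuned against a hand-checked sample.

```python
import difflib

def fuzzy_count(tokens, term, cutoff=0.8):
    """Count tokens equal to `term` or close enough to be a plausible OCR variant.

    `cutoff` is an assumed similarity threshold in [0, 1]; 1.0 means exact match only.
    """
    term = term.lower()
    count = 0
    for tok in tokens:
        # ratio() returns 2*M/T, where M is the number of matched characters
        # and T is the combined length of both strings.
        ratio = difflib.SequenceMatcher(None, tok.lower(), term).ratio()
        if ratio >= cutoff:
            count += 1
    return count

# Hypothetical tokens: two clean words, two OCR-damaged variants, one unrelated word.
tokens = ["économie", "écononiie", "politique", "politiquc", "journal"]

print(fuzzy_count(tokens, "économie"))   # counts the clean form and its noisy variant
print(fuzzy_count(tokens, "politique"))
```

On a large corpus the counts stay noisy, but summed over decades of
publication the trends tend to survive, which is why approximate matching
is often good enough for running text and much less so for tables of figures.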

PCL

On 23/04/15 15:19, John Levin wrote:
> hi all, on both the open humanities and open economics lists
>
> The Content Mine has launched:
> contentmine.org
> @TheContentMine
> For a good intro to it:
> http://blogs.ch.cam.ac.uk/pmr/2015/04/16/thecontentmine-is-ready-for-business-and-will-make-scientific-and-medical-facts-available-to-everyone-on-a-massive-scale/
>
> As stated in the title of that post, the aim is to make "scientific and
> medical facts available to everyone on a massive scale." This will be
> done through automated text mining of scientific literature for facts,
> and then connecting and organizing these facts.
>
> The obvious question is: what about applications to other disciplines,
> such as economics, history, etc.?
> My immediate thought is simply that the scraping aspect can be used to
> mine publications for data, especially numeric. There's many a table
> in economic and economic history journals; fewer, but still some, in
> other historical journals.
>
> (I spoke briefly with Peter Murray-Rust about this; it appears that
> the publications to be mined need to be fairly recent, i.e. the last 50
> years or so. Mining historical material, in the Internet Archive for
> example, would therefore be problematic.)
>
> Any other ideas?
>
> John
>
