[open-economics] Content Mine launches

Thu Apr 23 13:42:05 UTC 2015

I am very excited to see Content Mine kick off. Best of luck!

At ResourceContracts we have used DocumentCloud for a few years as a
combination PDF viewer, image-tagging tool, and lightweight metadata schema
(e.g., year of contract, country, commodity).

For example:
http://www.resourcecontracts.org/#documents?countries=liberia&mining_title=exploration-permit-license

We are currently exploring whether DocumentCloud or another text-mining tool
is the best way forward.

Also, Hypothesis is worth keeping an eye on: https://hypothes.is/

Best,
Anders

On Thu, Apr 23, 2015 at 9:36 AM, Pierre-Carl Langlais <pierrecarl.langlais at gmail.com> wrote:

> Hi everyone,
>
> I'm currently trying to mine 19th-century French newspapers and academic
> economics journals (such as the *Journal des économistes*). You have to
> account for OCR errors, which are much more frequent here: the archival
> copies may be dirty, and OCR tools are not designed for fonts that are 150
> to 200 years old. Even so, because such corpora can be very large (easily
> several decades to a century of publication), text mining still yields
> interesting results, far more so for textual content than for statistical
> tables.
>
> PCL
>
> On 23/04/15 15:19, John Levin wrote:
>
> Hi all (posting to both the open humanities and open economics lists),
>
> The Content Mine has launched:
> contentmine.org
> @TheContentMine
> For a good intro to it:
>
> http://blogs.ch.cam.ac.uk/pmr/2015/04/16/thecontentmine-is-ready-for-business-and-will-make-scientific-and-medical-facts-available-to-everyone-on-a-massive-scale/
> As stated in the title of that post, the aim is to make "scientific and
> medical facts available to everyone on a massive scale." This will be done
> through automated text mining of the scientific literature for facts, which
> are then connected and organized.
>
> The obvious question is: what about applications to other disciplines, such
> as economics, history, etc.?
> My immediate thought is simply that the scraping component can be used to
> mine publications for data, especially numeric data. There are many tables
> in economics and economic history journals, and fewer, but still some, in
> other history journals.
>
> (I spoke briefly with Peter Murray-Rust about this; it appears that the
> publications to be mined need to be fairly recent, i.e., from the last 50
> years or so. Mining historical material, in the Internet Archive for
> example, would therefore be problematic.)
>
> Any other ideas?
>
> John
>
>
>
> _______________________________________________
> open-economics mailing list
> open-economics at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/open-economics
> Unsubscribe: https://lists.okfn.org/mailman/options/open-economics
>
>