[open-science] Text mining..Stanford NLP class

Verhelst, Lieke lieke.verhelst at wur.nl
Fri Mar 9 08:51:25 UTC 2012


Hi all

In light of this: do people know that a free online Natural Language Processing class from Stanford University starts next Monday?
I believe you can still sign up.
For this course and others, see https://www.coursera.org/landing/hub.php

Best, Lieke

-----Original Message-----
From: open-science-bounces at lists.okfn.org [mailto:open-science-bounces at lists.okfn.org] On Behalf Of open-science-request at lists.okfn.org
Sent: Friday, 9 March 2012 9:44
To: open-science at lists.okfn.org
Subject: open-science Digest, Vol 41, Issue 10

Send open-science mailing list submissions to
	open-science at lists.okfn.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.okfn.org/mailman/listinfo/open-science
or, via email, send a message with subject or body 'help' to
	open-science-request at lists.okfn.org

You can reach the person managing the list at
	open-science-owner at lists.okfn.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of open-science digest..."


Today's Topics:

   1. Re: Text mining, PDF to text conversion, and permissions on
      abstracts (Maximilian Haeussler)
   2. Re: Text mining, PDF to text conversion, and permissions on
      abstracts (Peter Murray-Rust)
   3. Re: Text mining, PDF to text conversion, and permissions on
      abstracts (Jessy Kate Schingler)
   4. Re: Text mining, PDF to text conversion, and permissions on
      abstracts (Jack Park)
   5. publishing requires an export permit: new means for
      censorship in The Netherlands (Egon Willighagen)
   6. Re: publishing requires an export permit: new means for
      censorship in The Netherlands (Diane Cabell)
   7. Re: publishing requires an export permit: new means for
      censorship in The Netherlands (Egon Willighagen)
   8. Re: publishing requires an export permit: new means for
      censorship in The Netherlands (Peter Murray-Rust)


----------------------------------------------------------------------

Message: 1
Date: Thu, 8 Mar 2012 10:06:25 -0800
From: Maximilian Haeussler <maximilianh at gmail.com>
Subject: Re: [open-science] Text mining, PDF to text conversion, and
	permissions on abstracts
To: Finn Årup Nielsen <fn at imm.dtu.dk>
Cc: "open-science at lists.okfn.org" <open-science at lists.okfn.org>
Message-ID:
	<CAPR-z7ngokFgdNQDSaWVDmxN0MUJPraDZneg-FuRooF6szw3og at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Two years ago, I had the impression that pdfBox is the most mature software package in this area.

hope this helps
cheers
Max



2012/3/8 Finn Årup Nielsen <fn at imm.dtu.dk>

> In relation to text mining:
>
>
> What do people use for converting PDF to text? My default was/is 
> 'pdftotext' but it has some issues, e.g., ligatures, greek characters, 
> whitespaces. I have looked at pyPdf which might be promising as it is 
> easier (for me) to modify the extractText method. A-PDF GUI program 
> didn't work on my Ubuntu Wine. Adobe Acrobat had the same issues as 
> pdftotext and also there is a two-column issue and it is not a CLI 
> program. I have some notes here: 
> http://neuro.imm.dtu.dk/wiki/PDF
>
>
> Following Todd Vision's "text-mining restrictions redux" email:
>
> What about abstracts from full text papers? Does anyone know how 
> publishers feel about their abstracts? Can we republish them? Is that 
> fair use? Are they CC-BY-NC or perhaps even CC-BY? I cannot find any 
> explicit remark about that from the publishers.
>
> Joe Dunckley
> http://journalology.blogspot.com/2010/05/why-you-cant-copy-abstracts-into.html
>
> http://friendfeed.com/yokofakun/0795d1b5/abstract-of-article-is-it-in-public-domain-true
>
> http://www.sciencedirect.com/science/article/pii/S1053811909005990
>
>
> Finn Årup Nielsen
>
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-science
>
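
On the ligature problem with pdftotext output raised above: one possible
post-processing step, sketched here in Python under the assumption that the
extracted text is already a Unicode string, is compatibility normalization
(NFKC), which expands ligature code points such as U+FB01 into their
component letters. The function name expand_ligatures is made up for
illustration, and NFKC also rewrites other compatibility characters
(superscripts, full-width forms), so it may be too aggressive for some corpora.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Expand typographic ligatures into plain letters.

    Hypothetical helper, not part of pdftotext or any PDF library.
    NFKC compatibility normalization maps ligature code points
    (e.g. U+FB01 LATIN SMALL LIGATURE FI) to their component letters.
    It also normalizes other compatibility characters, which may or
    may not be wanted for a given text-mining task.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Sample string containing the "fi" and "fl" ligature code points.
    sample = "ef\ufb01cient work\ufb02ow"
    print(expand_ligatures(sample))
```

This does not address the whitespace or two-column issues, which depend on
page layout rather than on character encoding.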

------------------------------

Message: 2
Date: Thu, 8 Mar 2012 18:15:11 +0000
From: Peter Murray-Rust <pm286 at cam.ac.uk>
Subject: Re: [open-science] Text mining, PDF to text conversion, and
	permissions on abstracts
To: Maximilian Haeussler <maximilianh at gmail.com>
Cc: "open-science at lists.okfn.org" <open-science at lists.okfn.org>
Message-ID:
	<CAD2k14PK_=x-ZJWzjw0rmJzUurg-Few_k9jxWZe2HfDCXAZNfA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

2012/3/8 Maximilian Haeussler <maximilianh at gmail.com>

> Two years ago, I had the impression that pdfBox is the most mature
> software package in this area.

I have used PDFBox extensively but not for about 18 months. It's good and
I also use it for graphics.

It would be Wonderful to get a science-based OKF group for PDFing

P.




--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

------------------------------

Message: 3
Date: Thu, 8 Mar 2012 12:22:23 -0600
From: Jessy Kate Schingler <jessy at jessykate.com>
Subject: Re: [open-science] Text mining, PDF to text conversion, and
	permissions on abstracts
To: "open-science at lists.okfn.org" <open-science at lists.okfn.org>
Message-ID:
	<CA+bBsE=-=T+6taa2RUyO96fO_V12sLi0iH+9rN+OY39MKi3-sg at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

(oops, meant to reply-all)

awesome, i didn't know about PDFbox.

imagine how great it would be to combine good content extraction with the annotator tool

On Thu, Mar 8, 2012 at 12:15 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:

>
>
> 2012/3/8 Maximilian Haeussler <maximilianh at gmail.com>
>
>> Two years ago, I had the impression that pdfBox is the most mature
>> software package in this area.
>
> I have used PDFBox extensively but not for about 18 months. It's good and
> I also use it for graphics.
>
> It would be Wonderful to get a science-based OKF group for PDFing
>
> P.
>
>
>>
>>
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-science
>
>


--
Jessy
http://jessykate.com

------------------------------

Message: 4
Date: Thu, 8 Mar 2012 17:47:49 -0800
From: Jack Park <jackpark at gmail.com>
Subject: Re: [open-science] Text mining, PDF to text conversion, and
	permissions on abstracts
To: "open-science at lists.okfn.org" <open-science at lists.okfn.org>
Message-ID:
	<CACeHAVCK3OJsU+hx8Yb+6s6g4td3Q1fes=VwaXB2Tu=SHoEa5Q at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Perhaps sooner rather than later we will not be limited to abstracts.
There is work being done to extend what it is that publishers will make available.  A paper to which I contributed is found at

http://oro.open.ac.uk/18563/

Jack

2012/3/8 Finn Årup Nielsen <fn at imm.dtu.dk>:
> In relation to text mining:
>
>
> What do people use for converting PDF to text? My default was/is 'pdftotext'
> but it has some issues, e.g., ligatures, greek characters, 
> whitespaces. I have looked at pyPdf which might be promising as it is 
> easier (for me) to modify the extractText method. A-PDF GUI program 
> didn't work on my Ubuntu Wine. Adobe Acrobat had the same issues as 
> pdftotext and also there is a two-column issue and it is not a CLI program. I have some notes here:
> http://neuro.imm.dtu.dk/wiki/PDF
>
>
> Following Todd Vision's "text-mining restrictions redux" email:
>
> What about abstracts from full text papers? Does anyone know how 
> publishers feel about their abstracts? Can we republish them? Is that 
> fair use? Are they CC-BY-NC or perhaps even CC-BY? I cannot find any 
> explicit remark about that from the publishers.
>
> Joe Dunckley
> http://journalology.blogspot.com/2010/05/why-you-cant-copy-abstracts-i
> nto.html
>
> http://friendfeed.com/yokofakun/0795d1b5/abstract-of-article-is-it-in-
> public-domain-true
>
> http://www.sciencedirect.com/science/article/pii/S1053811909005990
>
>
> Finn Årup Nielsen
>
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-science



------------------------------

Message: 5
Date: Fri, 9 Mar 2012 08:58:44 +0100
From: Egon Willighagen <egon.willighagen at gmail.com>
Subject: [open-science] publishing requires an export permit: new
	means for censorship in The Netherlands
To: open-science at lists.okfn.org
Message-ID:
	<CAMPqvY9Zpu5j9Rb+CMdUe=ykU_fE3S9c=WTqh1x8z2m-Pfft0A at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi all,

bad news from the Netherlands:
http://chem-bla-ics.blogspot.com/2012/03/dutch-government-threatens-with.html

Egon


--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



------------------------------

Message: 6
Date: Fri, 9 Mar 2012 08:18:44 +0000
From: Diane Cabell <dc at icommons.org>
Subject: Re: [open-science] publishing requires an export permit: new
	means for censorship in The Netherlands
To: open-science list <open-science at lists.okfn.org>
Message-ID: <597A8243-C891-4218-BF5C-ED446F8D9DE4 at icommons.org>
Content-Type: text/plain; charset=us-ascii

What is the nature of the authority that the Minister has in this regard?  Is he claiming power under some munitions regulation or from something else?  Is there some specific research that led to this reaction?
dc

Diane Cabell
OeRC
Creative Commons
iCommons Ltd



On Mar 9, 2012, at 7:58 AM, Egon Willighagen wrote:

> Hi all,
> 
> bad news from the Netherlands:
> http://chem-bla-ics.blogspot.com/2012/03/dutch-government-threatens-with.html
> 
> Egon
> 
> 
> -- 
> Dr E.L. Willighagen
> Postdoctoral Researcher
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers
> 
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-science










------------------------------

Message: 7
Date: Fri, 9 Mar 2012 09:25:46 +0100
From: Egon Willighagen <egon.willighagen at gmail.com>
Subject: Re: [open-science] publishing requires an export permit: new
	means for censorship in The Netherlands
To: Diane Cabell <dc at icommons.org>
Cc: open-science list <open-science at lists.okfn.org>
Message-ID:
	<CAMPqvY8gPQ9AgQ99QdSx37FRh6F6HwY52GxF9kUoNnfsvAqtvw at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Mar 9, 2012 at 9:18 AM, Diane Cabell <dc at icommons.org> wrote:
> What is the nature of the authority that the Minister has in this regard?  Is he claiming power under some munitions regulation or from something else?  Is there some specific research that led to this reaction?

It originates from the research at Rotterdam where they mutated a flu
virus, showing how little is needed for a flu to mutate to become
dangerous. The international scientific community is already dealing with
this properly, but that is apparently not enough for underminister
Bleker.

But the problem is the means the Dutch underminister picks:
"publishing is export and requires a permit".

*That* applies to all science. And that threat is really badly chosen
by the underminister...

Egon


-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



------------------------------

Message: 8
Date: Fri, 9 Mar 2012 08:44:14 +0000
From: Peter Murray-Rust <pm286 at cam.ac.uk>
Subject: Re: [open-science] publishing requires an export permit: new
	means for censorship in The Netherlands
To: Egon Willighagen <egon.willighagen at gmail.com>
Cc: open-science list <open-science at lists.okfn.org>
Message-ID:
	<CAD2k14PbR=JxB_tS2dPHwvGrhLiAs51H69+bD=b5RZ7Cz-SfaQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

On Fri, Mar 9, 2012 at 8:25 AM, Egon Willighagen <egon.willighagen at gmail.com> wrote:

> On Fri, Mar 9, 2012 at 9:18 AM, Diane Cabell <dc at icommons.org> wrote:
> > What is the nature of the authority that the Minister has in this
> regard?  Is he claiming power under some munitions regulation or from
> something else?  Is there some specific research that led to this reaction?
>
> It originates from the research at Rotterdam where they mutated a flu
> virus, showing how little is needed for a flu to mutate to become
> dangerous. The international scientific community is already dealing with
> this properly, but that is apparently not enough for underminister
> Bleker.
>
> But the problem is the means the Dutch underminister picks:
> "publishing is export and requires a permit".
>
> *That* applies to all science. And that threat is really badly chosen
> by the underminister...
>
>
Unbelievable.

This is similar to the US restriction on software. You may not sell
computational chemistry software to the axis-of-evil countries
(http://en.wikipedia.org/wiki/Axis_of_evil : Iran, North Korea, Cuba etc. -
the list changes). The American Chemical Society does not accept papers
from authors in these countries (I think this is a legal restriction and
may be true of all US publishers).

And the Netherlands has also produced Neelie Kroes, who is fighting for
knowledge liberation in Europe.

If Bleker meets Kroes then we might have a knowledge - anti-knowledge
explosion.

P.

> Egon
>
>
> --
> Dr E.L. Willighagen
> Postdoctoral Researcher
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers
>
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-science
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

------------------------------

_______________________________________________
open-science mailing list
open-science at lists.okfn.org
http://lists.okfn.org/mailman/listinfo/open-science


End of open-science Digest, Vol 41, Issue 10
********************************************





