[open-science] Text mining, PDF to text conversion, and permissions on abstracts

Jack Park jackpark at gmail.com
Fri Mar 9 01:47:49 UTC 2012


Perhaps sooner rather than later we will not be limited to abstracts.
Work is under way to extend what publishers will make available.
A paper to which I contributed is at

http://oro.open.ac.uk/18563/

Jack

2012/3/8 Finn Årup Nielsen <fn at imm.dtu.dk>:
> In relation to text mining:
>
>
> What do people use for converting PDF to text? My default was/is 'pdftotext'
> but it has some issues, e.g., ligatures, Greek characters, whitespace. I
> have looked at pyPdf, which might be promising as it is easier (for me) to
> modify the extractText method. The A-PDF GUI program didn't work under Wine
> on my Ubuntu machine. Adobe Acrobat had the same issues as pdftotext; it
> also has a two-column issue and is not a CLI program. I have some notes here:
> http://neuro.imm.dtu.dk/wiki/PDF
>
>
> Following Todd Vision's "text-mining restrictions redux" email:
>
> What about abstracts from full text papers? Does anyone know how publishers
> feel about their abstracts? Can we republish them? Is that fair use? Are
> they CC-BY-NC or perhaps even CC-BY? I cannot find any explicit remark about
> that from the publishers.
>
> Joe Dunckley
> http://journalology.blogspot.com/2010/05/why-you-cant-copy-abstracts-into.html
>
> http://friendfeed.com/yokofakun/0795d1b5/abstract-of-article-is-it-in-public-domain-true
>
> http://www.sciencedirect.com/science/article/pii/S1053811909005990
>
>
> Finn Årup Nielsen
>
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-science
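
On the ligature and whitespace issues mentioned in the quoted message: one
possible workaround is to post-process the extracted text rather than change
the extractor. The following is only a minimal sketch in Python, not something
taken from pdftotext or pyPdf. The function name clean_extracted_text and the
choice of clean-up rules are my own assumptions. It expands the Latin ligature
code points U+FB00 to U+FB06, turns non-breaking spaces into ordinary spaces,
and collapses runs of blanks and empty lines. Greek letters are left untouched,
so it does not help if the extractor has already dropped or mangled them.

```python
import re

# Latin ligature code points (U+FB00..U+FB06) mapped to plain letters.
# Assumption: these are the only ligatures worth expanding here.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t
    "\ufb06": "st",
}
_LIGATURE_TABLE = str.maketrans(LIGATURES)


def clean_extracted_text(text):
    """Hypothetical helper: tidy text that came out of a PDF extractor."""
    # Expand ligature characters into their component letters.
    text = text.translate(_LIGATURE_TABLE)
    # Treat non-breaking spaces as ordinary spaces.
    text = text.replace("\u00a0", " ")
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces left at the end of a line.
    text = re.sub(r" *\n", "\n", text)
    # Keep at most one empty line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

The input could be, for example, the output of "pdftotext paper.pdf -" read
from standard input. An explicit table is used instead of Unicode NFKC
normalisation because NFKC also rewrites other characters (superscripts, the
micro sign), which may not be wanted in scientific text.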



