[Open-access] [GOAL] Re: Re: Fight Publishing Lobby's Latest "FIRST" Act to Delay OA - Nth Successor to PRISM, RWA etc.
mark at cottagelabs.com
Fri Nov 29 23:30:54 UTC 2013
The technology to do all of this already exists. Most of the STEM metadata
you describe is actually directly available in Medline, and the core parts
can be used as per the open biblio principles. Crawling the websites is
already possible using pubcrawler and other tools, and finding out what
their stated licence status is can be done with howopenisit (although more
often than not the answer is "not properly defined").
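
As a rough sketch only: the fail-closed tagging rule proposed in the quoted
message below (anything not unambiguously open stays hidden apart from its
metadata, and anything not unambiguously minable is kept from miners) might
look something like this. The class, the function names and the licence
labels are made up for illustration; this is not the output format of
howopenisit or any other existing tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels treated as "unambiguously open"; illustrative only.
OPEN_LICENCES = {"cc-by", "cc0", "public-domain"}


@dataclass
class Article:
    title: str
    licence: Optional[str] = None          # None = "not properly defined"
    mining_allowed: Optional[bool] = None  # None = could not be determined


def is_open(article: Article) -> bool:
    """Open only if the licence is known and on the open list."""
    return article.licence is not None and article.licence.lower() in OPEN_LICENCES


def is_minable(article: Article) -> bool:
    """Minable only if mining is explicitly known to be allowed."""
    return article.mining_allowed is True


def public_view(article: Article) -> dict:
    """Metadata is always visible; full text only when unambiguously open."""
    return {"title": article.title, "full_text_visible": is_open(article)}


def mining_corpus(articles: list) -> list:
    """Miners only ever see the articles that pass the mining check."""
    return [a for a in articles if is_minable(a)]
```

The point of defaulting every unknown to "closed" is that an undefined
licence never leaks full text by accident.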
However, the hard part is not building or running these things or collecting
all the data, but sustaining it and imbuing it with credibility.
For example I can run a server with all this on it at not too much personal
expense, but who would treat it as a serious source? Scaling up to handle a
large number of users and providing a good service does cost money, which I
(we) could probably find a way to fund - but even then, we still have to
solve that credibility problem. It has to be known by those in or entering
the field that "this is where you go to find this stuff" - as opposed to
the current "go to the library and follow all the rules" approach.
On Fri, Nov 29, 2013 at 8:48 PM, Bjoern Brembs <b.brembs at gmail.com> wrote:
> Thanks Daniel for chiming in, this was really helpful. I hope you don't
> mind a few more comments/questions?
> On Friday, November 29, 2013, 4:27:38 PM, you wrote:
> > (a) finding a publication on a site other than the publisher's does
> > not necessarily mean that file is legally there, or even that it's
> > easy to determine (let alone algorithmically) whether that is the case
> At least for the STEM fields, the consolidation of publishers is really
> convenient. What I would intend to do is to crawl the publishers' sites
> (not that many due to consolidation) slowly as if the crawler were a person
> (or many students) from essentially all participating libraries. The
> metadata is with the publishers and easy to read. The crawler then strips
> all the tags that could identify the source of the article (i.e.,
> everything except content), such that each article looks as if it was
> submitted by the author(s). Then, new tags are added to all the
> articles to create a database of all the articles libraries have access to.
> One tag would denote if the article is unambiguously open access or not.
> Every article that is not unambiguously open access, public domain,
> whatever, is not accessible, not even visible from the outside (except, of
> course, its meta-data).
> > (b) mining those files may be prohibited by the copyright holder's
> > terms and conditions.
> Another tag would take care of this: miners would only see articles where
> mining is unambiguously legal. If it can't be determined, no mining.
> > Instead of crawling for the publications themselves, it may be less
> > problematic from a legal point of view to just have a platform that
> > aggregates metadata of publications, along with a link to a legal copy
> > (green or gold).
> Without standardized mark-up of the articles, their usefulness would be
> severely curtailed. This would only be a last resort option, IMHO.
> > (f) the official repositories are far from interoperable
> That is one of the things that need to be remedied ASAP!
> > For these reasons, I think it is best to develop crawling
> > infrastructure around the clearly licensed literature first (which is
> > a rather small subset at present),
> Doesn't copyright expire? Don't the green mandates work retroactively? So
> everything covered by green mandates (essentially everything with a public
> funder acknowledged), or where the publisher is not opposed to green
> deposition, or where copyright has expired or never existed (e.g. US
> government work, etc.) should be fair game.
> All of this seems like a lot - is it really not that much?
> Björn Brembs
> Universität Regensburg
> open-access mailing list
> open-access at lists.okfn.org
> Unsubscribe: http://lists.okfn.org/mailman/options/open-access