[open-science] query regarding language in permission to mine - with clause

Wed Jul 25 16:54:21 UTC 2012

Thanks Peter and Cameron.  Can you bear to educate me even further?

Sec 4.2 of the JISC report on the Value and Benefits of Text Mining gives an example of 2930 full-text articles in which the word "malaria" appears.  Am I wrong to think that a miner would have to cite all 2930?  Or, if the mining finds a pattern that appears in only 900 of them, must the miner cite all 900?  What is the 'proximate' source in such cases: an Elsevier database of journal articles?
dc

On Jul 19, 2012, at 4:18 PM, cameronneylon.net wrote:

> Yes, this bothers me for the reason you state. It doesn't feel like a good fit for what I feel is "the right thing to do".
> 
> It seems to me that the expected community norm would be to cite the proximate source (i.e. it is reasonable to cite the immediate source of data in some form, but there isn't an expectation to cite "deeper" sources). Where this is a large set, that is more challenging, but it would still be good practice to maintain a list of sources, even if not with the primary distribution. This is just good provenance information for the derivative work.
> 
> Could something work along the lines of:
> 
> "Where a licenced work is used as input for the purposes of text, data or other information mining processes, attribution may be complex or challenging. The licensee shall make all reasonable efforts to clearly attribute the immediate (proximate? is there a term for this?) source or sources of data. It is reasonable for the purposes of this licence for such attribution to be made available as a separate document or work from the distributed product of said mining. Where systems exist to track attribution and citation, the licensee shall make all reasonable efforts to provide notification to such systems when their means of satisfying the attribution requirement differs from that for single derivative works."
> 
> IANAL!
> 
> 
> On 19 Jul 2012, at 19:44, Diane Cabell wrote:
> 
>> Apologies!  The relevant clause is:
>> 
>> "The [requirement of attribution] shall not apply to the products of text, data and other information mining processes where the Work or portions thereof that appear in the mined product are not identifiable by their original source."
>> 
>> 
>> 
>> On Jul 19, 2012, at 10:37 AM, Diane Cabell wrote:
>> 
>>> This is the draft of a clause that would waive attribution when mining.  
>>> 
>>> Do any of you have any thoughts on whether this language is reasonable for the purpose?  Is attribution stacking a problem that truly needs to be addressed?
>>> 
>>> I am a bit concerned that it leaves a possible loophole allowing the miner to intentionally omit any identifiable characteristics simply to avoid attribution.  Is that an inescapable problem?
>>> 
>>> Any advice would be appreciated.
>>> 
>>> Diane Cabell
>>> iCommons Ltd
>>> Creative Commons
>>> OeRC
>>> _______________________________________________
>>> open-science mailing list
>>> open-science at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/open-science
>> 
> 
