[Open-access] Reminder: @ccess Initiative meeting, TODAY Thursday June 21 1:30 pm UTC+1 (English summer time)
Mark MacGillivray
mark at cottagelabs.com
Thu Jun 21 11:54:15 UTC 2012
Very sorry, but I have to cancel at the last minute too. I am in Amsterdam
next week, though, so anyone who wants to meet up there, please contact me
directly (e.g. Tom, perhaps).
Here is an update on further checks of the deduplication process:
Tom has pointed out that it is odd that there were 4980 records in medline,
3825 in malariaworld and 2608 in wikipedia, and yet when we deduplicated
against medline we only found 1641 of the malariaworld records and 1258 of
the wikipedia records. Where do the other roughly 2200 and 1400 records
come from?
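The gap Tom describes is simply each collection's size minus the number of its records found in medline. A quick sketch of that arithmetic, using only the counts quoted above (the variable names are illustrative, not taken from our scripts):

```python
# Record counts quoted in this thread.
collection_sizes = {"malariaworld": 3825, "wikipedia": 2608}
matched_in_medline = {"malariaworld": 1641, "wikipedia": 1258}

# Records in each collection that were NOT found in medline.
unmatched = {
    name: collection_sizes[name] - matched_in_medline[name]
    for name in collection_sizes
}

for name, count in unmatched.items():
    print(f"{name}: {count} records not matched in medline")
```

That gives 2184 for malariaworld and 1350 for wikipedia, which is where the rounded figures of 2200 and 1400 come from.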
We know that the wikipedia collection is based on the open access subset,
which despite the name is NOT a subset of Medline but a subset of PMC
(which is not the same as PubMed). So it can contain articles that are
not in Medline. Here is an example:
http://malaria.bibsoup.net/test/wikipedia/anewantimalarialdrugagainstmurinemalaria
It has a PMCID of 3019440, but there is no PMID registered against it
(confirmed with the PMID to PMCID converter). Searching for it in the full
Medline does not return a result. It is in PMC, but not in PubMed. Medline
is a representation of PubMed, whereas the open access subset is a
representation of PMC.
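To make the point concrete, here is a minimal sketch of how such PMC-only records could be picked out of a collection. It assumes each record is a dict with "pmid" and "pmcid" keys; those field names and the second record are hypothetical placeholders, and only the first record reflects the example above.

```python
def is_pmc_only(record):
    """True if the record has a PMCID but no PMID registered against it."""
    return bool(record.get("pmcid")) and not record.get("pmid")


records = [
    # The example above: PMCID 3019440, no PMID.
    {"slug": "anewantimalarialdrugagainstmurinemalaria",
     "pmcid": "3019440", "pmid": None},
    # Hypothetical placeholder record with both identifiers.
    {"slug": "placeholder-record",
     "pmcid": "PMCID-PLACEHOLDER", "pmid": "PMID-PLACEHOLDER"},
]

# Records like these cannot be matched against medline by PMID.
pmc_only = [r for r in records if is_pmc_only(r)]

for r in pmc_only:
    print(r["slug"], "is in PMC but has no PMID")
```

Counting the records flagged this way in the wikipedia collection would show how much of the unmatched remainder is explained by PMC-only articles.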
Also, the Malariaworld collection contains 582 articles that do not appear
to be from 2010 or 2011, despite the dump being for 2010 and 2011. This
goes some way towards explaining why they were not found in the Medline
collection, which we filtered for those years. As with the wikipedia
collection above, it may also contain articles that do not appear in
Medline at all.
So within our own collections we do not appear to have duplicates any more,
despite the above situation. Any manual checking that people on the list
could perform would be great; please let me know if you find any more
evidence of failure to deduplicate.
If we want to take these checks further, the only thing I can think of as a
next step would be to run a comparison of every record in our malaria index
against pubmed and pmc searches - this could be done by some scripting
against the query interface at http://www.ncbi.nlm.nih.gov/sites/gquery
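A rough sketch of what such a script might look like follows. Note the assumption: instead of the gquery page named above, it uses the NCBI E-utilities `esearch` endpoint, which returns machine-readable results for a single database such as `pubmed` or `pmc`. The function names and the title-based search are illustrative only; nothing like this has been run against our index.

```python
import json
import urllib.parse
import urllib.request

# Assumption: E-utilities esearch is used in place of the gquery page.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, title):
    """Build an esearch URL that looks for `title` in the Title field of `db`."""
    params = {"db": db, "term": f"{title}[Title]", "retmode": "json"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def hit_count(response_text):
    """Extract the number of hits from an esearch JSON response."""
    return int(json.loads(response_text)["esearchresult"]["count"])


def count_hits(db, title):
    """Query NCBI over the network and return the number of hits."""
    with urllib.request.urlopen(build_esearch_url(db, title)) as response:
        return hit_count(response.read().decode("utf-8"))
```

Calling `count_hits("pubmed", title)` and `count_hits("pmc", title)` for each record in the index would show which records are found in only one of the two. Title matching is fuzzy, so a zero-hit result would still need a manual check, and a real run should pause between requests to stay within NCBI's usage limits.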
Let me know what you think.
Mark
On Thu, Jun 21, 2012 at 10:18 AM, Laura Newman <laura.newman at okfn.org>wrote:
> I'm also away in London I'm afraid.
>
>
>
>
> On Thu, Jun 21, 2012 at 10:18 AM, Peter Murray-Rust <pm286 at cam.ac.uk>wrote:
>
>>
>>
>> On Thu, Jun 21, 2012 at 8:53 AM, Tom Olijhoek <tom.olijhoek at gmail.com>wrote:
>>
>>> Hi all,
>>>
>>> OA meeting #19, Thursday June 21 1:30 pm UTC+1 (english summer time)
>>>
>>> meeting pad at http://open-access.okfnpad.org/19
>>>
>>> I am afraid I am giving a lecture in Nijmegen.
>>
>> P.
>>
>>
>>>
>>> last week only 2-3 participants, we have to do better than that!
>>>
>>> cheers
>>>
>>> TOM
>>>
>>> _______________________________________________
>>> open-access mailing list
>>> open-access at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/open-access
>>>
>>>
>>
>>
>> --
>> Peter Murray-Rust
>> Reader in Molecular Informatics
>> Unilever Centre, Dep. Of Chemistry
>> University of Cambridge
>> CB2 1EW, UK
>> +44-1223-763069
>>
>
>
> --
> Laura Newman
> Community Coordinator
> Open Knowledge Foundation
> http://okfn.org/
> Skype: lauranewmanonskype
> Twitter: @Newmanlk
>
>