[open-bibliography] PubMed
Daniel Mietchen
daniel.mietchen at googlemail.com
Mon Nov 9 16:15:52 UTC 2015
Agreed - contact PubMed right away. On
http://www.ncbi.nlm.nih.gov/pubmed
and all the other PubMed pages there is a "Write to the Help Desk" link.
They have usually gotten back to me in good time with useful replies.
Cheers,
Daniel
On Mon, Nov 9, 2015 at 5:11 PM, Thad Guidry <thadguidry at gmail.com> wrote:
> Thomas,
>
> Just reach out to Pubmed themselves and ask them those same questions.
>
>
> Thad
> +ThadGuidry
>
> On Mon, Nov 9, 2015 at 8:47 AM, Thomas Krichel <krichel at openlib.org> wrote:
>>
>> Daniel Mietchen writes
>>
>> > Why not go ahead and post it here?
>>
>> OK. But PubMed is not open data, as far as I know.
>>
>> > If it's indeed too technical to be discussed here, someone might
>> > forward it to a more appropriate venue.
>>
>> The issue is in fact simple: how do I get a complete copy of the
>> PubMed data? I still have to understand what the difference between
>> Entrez, MEDLINE and PubMed is, but by "complete copy" I mean all the
>> records that one can find on the website.
>>
>> I am a PubMed vendor, so I have access to the FTP site and the
>> data therein. From
>>
>> https://www.nlm.nih.gov/databases/journal.html
>>
>> I know that
>>
>> | The approximately 2% of the records not exported to MEDLINE/PubMed
>> | licensees are those tagged [PubMed - as supplied by publisher] in
>> | PubMed.
>>
>> I suspect that a lot of the most recent additions are temporarily in
>> this category. These are the ones that I am keen on getting. Waiting
>> is not an option.
>>
>> I assume they are included in the API described at
>>
>> http://www.ncbi.nlm.nih.gov/books/NBK25498/
>>
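>> From that page it looks like individual records can be pulled with
>> the efetch utility. Something like this is what I have in mind (a
>> rough sketch in Python; the db/id/retmode parameters are just my
>> reading of that documentation, so treat them as assumptions):
>>
>>   import urllib.parse
>>   import urllib.request
>>
>>   # efetch endpoint described in the E-utilities documentation
>>   EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
>>
>>   def fetch_pmids(pmids):
>>       """Fetch a batch of PubMed records as XML for the given PMIDs."""
>>       params = urllib.parse.urlencode({
>>           "db": "pubmed",
>>           "id": ",".join(str(p) for p in pmids),
>>           "retmode": "xml",
>>       })
>>       with urllib.request.urlopen(EFETCH + "?" + params) as response:
>>           return response.read()
>>
>>   # e.g. fetch_pmids([26544013]) should return a PubmedArticleSet
>>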
>> How do I get access to all of those records, and only those? One
>> way that I can come up with (sketched in code below) is to
>>
>> 1. generate a list of suspected PMIDs,
>> 2. check that I don't already have data for them,
>> 3. submit them to the API,
>> 4. check the responses to see which ones I got nothing back for,
>> and queue those for resubmission.
>>
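>> In code, the loop would look roughly like this (again only a
>> sketch: have_record() and save_record() stand in for whatever local
>> storage I end up using, fetch_pmids() is the efetch call sketched
>> above, and the XML paths are what I expect a PubmedArticleSet to
>> look like):
>>
>>   import xml.etree.ElementTree as ET
>>
>>   def harvest(candidate_pmids, have_record, save_record, batch_size=200):
>>       # steps 1 and 2: only query PMIDs I do not already hold
>>       missing = [p for p in candidate_pmids if not have_record(p)]
>>       retry_queue = []
>>       for i in range(0, len(missing), batch_size):
>>           batch = missing[i:i + batch_size]
>>           # step 3: submit the batch to efetch
>>           root = ET.fromstring(fetch_pmids(batch))
>>           returned = set()
>>           for article in root.findall("PubmedArticle"):
>>               pmid = int(article.findtext("MedlineCitation/PMID"))
>>               save_record(pmid, ET.tostring(article))
>>               returned.add(pmid)
>>           # step 4: queue anything that did not come back
>>           retry_queue.extend(p for p in batch if p not in returned)
>>       return retry_queue
>>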
>> It's an approach more in tune with the Vikings, the Huns etc. than
>> with the supposedly civilized 21st century. Is there any smarter way?
>> I wrote to the NLM last week, but have had no response yet.
>>
>> Step 1 is particularly problematic. Last night's data shows that I
>> have 24997267 records and that the maximum number is 26544013.
>> Presumably I could first try to harvest that interval, then, in
>> later runs, start a little lower and go a little higher. For step 4
>> I could use a queue rule saying that I will not re-query a record if
>> the current wait would be smaller than the sum of the previous
>> waits. But that would involve keeping historic harvesting data and
>> periodically processing it. It is probably best to work in ascending
>> order, even though this may introduce a periodicity in the harvested
>> numbers.
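>>
>> For what it is worth, the queue rule could look something like this
>> (a sketch only; the history sits in a plain dict here where it would
>> really need a persistent store, and the initial wait is an assumed
>> seed value so that the waits actually grow):
>>
>>   import time
>>
>>   INITIAL_WAIT = 3600.0  # assumed seed wait in seconds
>>
>>   # attempts maps pmid -> (sum_of_previous_waits, time_of_last_attempt)
>>   def due_for_retry(pmid, attempts, now=None):
>>       """Re-query only once the current wait exceeds the previous waits."""
>>       now = time.time() if now is None else now
>>       if pmid not in attempts:
>>           return True
>>       waited_total, last_attempt = attempts[pmid]
>>       return (now - last_attempt) >= waited_total
>>
>>   def record_attempt(pmid, attempts, now=None):
>>       """Update the historic waiting data after querying a pmid."""
>>       now = time.time() if now is None else now
>>       if pmid in attempts:
>>           waited_total, last_attempt = attempts[pmid]
>>           attempts[pmid] = (waited_total + (now - last_attempt), now)
>>       else:
>>           attempts[pmid] = (INITIAL_WAIT, now)
>>
>>   # candidate interval, using last night's figures:
>>   # candidates = range(24997267, 26544013 + 1)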
>>
>>
>> --
>>
>> Cheers,
>>
>> Thomas Krichel http://openlib.org/home/krichel
>> skype:thomaskrichel
>> _______________________________________________
>> open-bibliography mailing list
>> open-bibliography at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/open-bibliography
>> Unsubscribe: https://lists.okfn.org/mailman/options/open-bibliography
>
>
>
> _______________________________________________
> open-bibliography mailing list
> open-bibliography at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/open-bibliography
> Unsubscribe: https://lists.okfn.org/mailman/options/open-bibliography
>