thadguidry at gmail.com
Mon Nov 9 16:11:53 UTC 2015
Just reach out to PubMed themselves and ask them those same questions.
On Mon, Nov 9, 2015 at 8:47 AM, Thomas Krichel <krichel at openlib.org> wrote:
> Daniel Mietchen writes
> > Why not go ahead and post it here?
> ok. But pubmed is not open data, as far as I know.
> > If it's indeed too technical to be discussed here, someone might
> > forward it to a more appropriate venue.
> The issue is in fact simple. How to get a complete copy of
> pubmed data? I still have to understand what the difference
> between entrez, medline and pubmed is, but by a
> complete copy I mean all the records that one can find on the
> web site.
> I am a pubmed vendor, so I have access to the ftp site and the
> data therein. From
> I know that
> | The approximately 2% of the records not exported to MEDLINE/PubMed
> | licensees are those tagged [PubMed - as supplied by publisher] in
> | PubMed.
> I suspect that a lot of the most recent additions are temporarily in
> this category. These are the ones that I am keen on getting. Waiting
> is not an option.
> I assume they are included in the API described at
> How do I get access to all of those records, and only those? One
> way that I can come up with is to
> 1. generate a list of suspected pmids
> 2. check I don't have data for them
> 3. submit them to the API
> 4. check the responses to see which ones I did not get a response
> for, and queue those for resubmission.
> It's an approach more in tune with the Vikings, the Huns etc. than
> the supposedly civilized 21st century. Is there any smarter way? I
> wrote to the NLM last week, no response yet.
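For what it's worth, the four steps above can be sketched as a single
pass. This is only a rough outline under assumptions: `fetch` is a
hypothetical stand-in for whatever call you make to the API (it takes a
list of pmids and returns a dict of the records it got back), and
`batch_size` is an arbitrary value, not a documented limit.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def harvest_once(
    candidates: Iterable[int],
    have: Set[int],
    fetch: Callable[[List[int]], Dict[int, str]],
    batch_size: int = 200,
) -> Tuple[Dict[int, str], List[int]]:
    """One pass over steps 1-4.

    candidates: suspected pmids (step 1).
    have:       pmids already held locally.
    fetch:      hypothetical API wrapper, pmid list -> {pmid: record}.
    Returns the records obtained and the pmids to queue for resubmission.
    """
    # Step 2: drop everything we already have data for.
    missing = [pmid for pmid in candidates if pmid not in have]

    got: Dict[int, str] = {}
    retry: List[int] = []
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        # Step 3: submit the batch to the API.
        response = fetch(batch)
        got.update(response)
        # Step 4: anything without a response is queued for resubmission.
        retry.extend(pmid for pmid in batch if pmid not in response)
    return got, retry
```

Passing the API call in as a function keeps the bookkeeping separate
from the network code, so the pass can be tried against a fake `fetch`
before it is pointed at the real service.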
> 1 is particularly problematic. Last night's data shows I have
> 24997267 records and the maximum number is 26544013. Presumably I
> could first try to harvest that interval, then, in later runs, start
> a little lower and go a little higher. For 4) I could use a queue
> rule saying I will not query a record if the current wait would be
> smaller than the sum of previous waits. But that would involve
> keeping historic harvesting data and periodically processing it. It
> is probably best to work in ascending order even though this may
> introduce a periodicity in the harvested numbers.
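On the queue rule: if I read it correctly, the sum of the previous
waits for a record is just the time between its first and its last
attempt, so two timestamps per pmid may be enough and no full history
is needed. A small sketch under that reading follows; the names
`is_due` and `candidate_window` and the `step` value are my own
assumptions, not anything from NLM.

```python
from typing import Optional


def is_due(first_try: Optional[float], last_try: Optional[float], now: float) -> bool:
    """Re-query only once the current wait is at least the sum of previous waits.

    The gaps between successive attempts add up to last_try - first_try,
    so only the first and last attempt times need to be stored.
    """
    if first_try is None or last_try is None:
        return True  # never queried before
    return (now - last_try) >= (last_try - first_try)


def candidate_window(low: int, high: int, run: int, step: int = 1000) -> range:
    """Ascending pmid interval, widened by `step` at each end per later run.

    `step` is an assumed value; run 0 is the plain [low, high] interval.
    """
    return range(max(1, low - run * step), high + run * step + 1)
```

With runs at a fixed interval this makes the gaps between retries of a
stubborn pmid roughly double each time, e.g. attempts at run 0, 1, 2,
4, 8 and so on.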
> Thomas Krichel http://openlib.org/home/krichel