[open-bibliography] PubMed

Mon Nov 9 16:11:53 UTC 2015

Thomas,

Just reach out to PubMed themselves and ask them those same questions.

Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>

On Mon, Nov 9, 2015 at 8:47 AM, Thomas Krichel <krichel at openlib.org> wrote:

>   Daniel Mietchen writes
>
> > Why not go ahead and post it here?
>
>   ok. But pubmed is not open data, as far as I know.
>
> > If it's indeed too technical to be discussed here, someone might
> > forward it to a more appropriate venue.
>
>   The issue is in fact simple. How to get a complete copy of
>   pubmed data? I still have to understand what the difference
>   between entrez, medline and pubmed is, but by a complete copy
>   I mean all the records that one can find on the web site.
>
>   I am a pubmed vendor, so I have access to the ftp site and the
>   data therein.  From
>
> https://www.nlm.nih.gov/databases/journal.html
>
>   I know that
>
> | The approximately 2% of the records not exported to MEDLINE/PubMed
> | licensees are those tagged [PubMed - as supplied by publisher] in
> | PubMed.
>
>   I suspect that a lot of the most recent additions are temporarily in
>   this category. These are the ones that I am keen on getting. Waiting
>   is not an option.
>
>   I assume they are included in the API described at
>
> http://www.ncbi.nlm.nih.gov/books/NBK25498/
>
>   How do I get access to all of those records, and only those? One
>   way that I can come up with is to
>
>   1. generate a list of suspected pmids
>   2. check that I don't have data for them
>   3. submit them to the API
>   4. check the response to see which ones I did not get a record
>      for, and queue those for resubmission.
>
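The four steps above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: it assumes the EFetch endpoint of the E-utilities with db=pubmed and retmode=xml, and it assumes each returned record carries its identifier in a PMID element directly under MedlineCitation. The function names, the batch size and the pause between requests are illustrative choices, not anything NLM prescribes.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: EFetch from the NCBI E-utilities.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def candidate_pmids(low, high, have):
    """Steps 1 and 2: every PMID in [low, high] that is not already held."""
    return [p for p in range(low, high + 1) if p not in have]


def fetch_batch(pmids):
    """Step 3: ask EFetch for a batch of records and return the raw XML."""
    query = urlencode({"db": "pubmed", "retmode": "xml",
                       "id": ",".join(str(p) for p in pmids)})
    with urlopen(f"{EFETCH}?{query}") as response:
        return response.read().decode("utf-8")


def returned_pmids(xml_text):
    """Collect the PMIDs of the records actually present in a response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return set()  # treat an unparsable reply as "nothing returned"
    found = set()
    for citation in root.iter("MedlineCitation"):
        pmid = citation.find("PMID")  # assumed location of the record id
        if pmid is not None and pmid.text:
            found.add(int(pmid.text))
    return found


def harvest(low, high, have, fetch=fetch_batch, batch_size=200, pause=0.5):
    """Step 4: split the candidates into those answered and those to retry."""
    found, missing = set(), []
    todo = candidate_pmids(low, high, have)
    for start in range(0, len(todo), batch_size):
        batch = todo[start:start + batch_size]
        got = returned_pmids(fetch(batch)) & set(batch)
        found |= got
        missing.extend(p for p in batch if p not in got)
        time.sleep(pause)  # stay well below the request rate limit
    return found, missing
```

A call such as harvest(24997267, 26544013, have=my_pmids), where my_pmids is a hypothetical set of identifiers already on disk, would take several thousand requests at this batch size. The fetch argument is there so the logic can be tried against a stub before touching the real service.
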
>   It's an approach more in tune with the Vikings, the Huns etc. than
>   the supposedly civilized 21st century. Is there any smarter way?  I
>   wrote to the NLM last week, no response yet.
>
>   Step 1 is particularly problematic. Last night's data shows I have
>   24997267 records and the maximum number is 26544013. Presumably I
>   could first try to harvest that interval, then, in later runs, start
>   a little lower and go a little higher. For step 4 I could use a queue
>   rule saying I will not query a record if the current wait would be
>   smaller than the sum of previous waits.  But that would involve
>   keeping historic harvesting data and periodically processing it.  It
>   is probably best to work in ascending order even though this may
>   introduce a periodicity in the harvested numbers.
>
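One possible reading of the queue rule above, as a minimal sketch. It assumes the "waits" are the gaps between successive attempts on the same PMID, in which case their sum is simply the time from the first attempt to the last one, so only two timestamps per PMID would need to be kept. The min_wait floor is an added assumption, since without it the first retry would be allowed immediately; all names here are hypothetical.

```python
def is_due(attempts, now, min_wait=3600.0):
    """Return True if a PMID may be queried again at time `now`.

    `attempts` is the ascending list of earlier attempt times in seconds.
    The rule: the current wait must be at least the sum of previous waits.
    """
    if not attempts:
        return True  # never tried, so nothing to wait for
    # The gaps between successive attempts sum to last minus first.
    previous_waits = attempts[-1] - attempts[0]
    current_wait = now - attempts[-1]
    # min_wait is an assumed floor so that the first retry is not instant.
    return current_wait >= max(min_wait, previous_waits)


def due_pmids(history, now, min_wait=3600.0):
    """PMIDs that are due for another attempt, in ascending order."""
    return sorted(p for p, attempts in history.items()
                  if is_due(attempts, now, min_wait))
```

Under this reading the gap between attempts doubles each time a record fails to turn up (one hour, two hours, four hours with the default floor), so records that never appear are retried less and less often.
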
>
> --
>
>   Cheers,
>
>   Thomas Krichel                  http://openlib.org/home/krichel
>                                               skype:thomaskrichel