[open-bibliography] Place of Publication data from the BL dataset

Karen Coyle kcoyle at kcoyle.net
Sat Nov 27 13:59:23 UTC 2010


Quoting Tom Morris <tfmorris at gmail.com>:


>
> That's assuming that the place of publication is stored on a
> per-publisher basis, not a per-book basis in the British Library
> database.  That type of knowledge about the schema used in the
> internal database will be key to making informed decisions about the
> data.  Where is that schema documented?

Assuming that BL data follows the MARC format, the schema is "documented" at:
   http://loc.gov/marc/bibliographic/ecbdhome.html

All of the fields whose tags begin with "00" (zero zero) are coded data with
fixed-value vocabularies.

On this page, along the left-hand side, you will see a heading for MARC
Code Lists, with several of the lists, including the geographic codes,
linked under it:
    http://www.loc.gov/marc/

If you want to grab the MARC field, subfield and code lists, this page
(about two-thirds of the way down) has information about, and links to,
two CSV files that contain the MARC data elements:
   http://futurelib.pbworks.com/w/page/13686649/Data-and-Studies

There is not an official machine-usable schema for the format, unfortunately.
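Lacking such a schema, here is a minimal Python sketch of the structure the MARC documentation describes. It uses no real MARC library: the `DataField` class, the `Record` alias and the `first_subfield` helper are hypothetical, and the sample record values are made up. It assumes that tags beginning "00" are control fields holding a single positional string with no indicators or subfields, while every other tag carries two indicators and coded subfields, so that the "260$a" quoted below means subfield a of field 260.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class DataField:
    """Hypothetical stand-in for a MARC data field (tags 010 and up)."""
    indicators: Tuple[str, str]
    subfields: List[Tuple[str, str]]  # (subfield code, value) pairs, in order


# Each tag maps to a list because many fields may repeat.
# Control fields (00X) are plain positional strings; data fields are DataFields.
Record = Dict[str, List[Union[str, DataField]]]


def is_control_tag(tag: str) -> bool:
    """Tags beginning with "00" are control fields: no indicators, no subfields."""
    return tag.startswith("00")


def first_subfield(record: Record, tag: str, code: str) -> Optional[str]:
    """Return the first value of subfield `code` in field `tag`, e.g. 260$a."""
    if is_control_tag(tag):
        raise ValueError(f"{tag} is a control field and has no subfields")
    for fld in record.get(tag, []):
        if not isinstance(fld, DataField):
            continue
        for sub_code, value in fld.subfields:
            if sub_code == code:
                return value
    return None


# Made-up example record: an 008 control field and a 260 imprint field.
example: Record = {
    "008": ["850716s1985    enk" + " " * 22],
    "260": [DataField((" ", " "), [("a", "London :"), ("b", "Example Press,"), ("c", "1985.")])],
}

print(first_subfield(example, "260", "a"))  # London :
```

Note that the 260$a value comes back with its trailing " :" punctuation attached, which is one reason raw place-of-publication strings need cleaning before they can be compared.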


>> The conversion currently takes the place of publication, distribution,
>> etc. from the 260$a. We're considering including the 008/15-17 in future
>> releases.
>
> What does that mean in English?

:-) That they would include the two- or three-letter geographic code from
field 008, positions 15-17. Those are the codes listed at:
    http://www.loc.gov/marc/countries/cou_home.html
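As a rough illustration, pulling that code out of an 008 string might look like the following Python sketch (the function name is hypothetical and the sample 008 values are made up). Positions are counted from zero, so 008/15-17 is the slice [15:18]. It assumes, as the MARC documentation describes, that two-letter codes are left-justified with a trailing blank and that fill characters (|||) mean no attempt was made to code the position.

```python
from typing import Optional


def place_code_from_008(field_008: str) -> Optional[str]:
    """Return the MARC country code stored in 008/15-17, or None if absent.

    Positions are zero-based, so 15-17 is the slice [15:18].
    """
    if len(field_008) < 18:
        return None  # too short to contain positions 15-17
    code = field_008[15:18].rstrip(" ")  # two-letter codes carry a trailing blank
    if not code or set(code) == {"|"}:
        return None  # blank, or fill characters (no attempt to code)
    # Special values such as "xx" (unknown) or "vp" (various places) are
    # returned unchanged; callers decide how to treat them.
    return code


# Made-up 008 values (40 characters each; padding after position 17 is arbitrary).
book_008 = "850716s1985    enk" + " " * 22
french_008 = "850716s1985    fr " + " " * 22

print(place_code_from_008(book_008))    # enk
print(place_code_from_008(french_008))  # fr
```

The returned code can then be looked up in the country list linked above.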

The Library of Congress, the owner of these vocabularies, has not  
provided an RDF expression for the country codes. There is one,  
however, at:
    http://marccodes.heroku.com/countries/

It's "unofficial", but there are no other RDF options that I know of.

kc


>
> Is there a listing available someplace of what fields in the dump came
> from free form text fields vs database records which guarantee that
> anything linked to that record always has the same text value?  For
> example, book titles and edition statements are almost certainly
> free-form text fields, but I'd expect authors to have individual
> records, so that every book linked to the same record would have the
> same author name in the dump.
>
> Is there a comprehensive list of which fields are free-form vs
> database record backed?  Knowing the internal schema would be very
> helpful in making use of the data.
>
> On Fri, Nov 26, 2010 at 2:45 AM, Ben O'Steen <bosteen at gmail.com> wrote:
>> (And as Karen has just pointed out, the reason why I am exploring this field
>> is to aid disambiguation of publishers. Having created the overview that I
>> know I need, I thought to share it here.)
>
> That makes sense, although I'd have thought that publisher data is
> noisy enough and low value enough that it'd be pretty far down on the
> priority list to clean up.
>
> More interesting I think is whether these text strings represent one
> author or three or ...:
>
>   Wilson, Angus, 1913-1991
>   Wilson, Angus, 1913-1991,
>   Wilson, Angus, 1913-1991.
>   Wilson, Angus.
>   Willson, Angus.
>
> My gut tells me that at least the first three text strings probably
> represent a single author, but that's not what the database seems to
> think.
>
> Tom
>
> p.s. Can some librarian type tell me what the trailing period (full
> stop) means?  It's not used consistently, but it appears much, MUCH
> more frequently in library data than anywhere else I've seen.
>
> _______________________________________________
> open-bibliography mailing list
> open-bibliography at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-bibliography
>
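On the "Wilson, Angus" question: one cheap first pass, given here as a minimal Python sketch (the function names are hypothetical), is to strip trailing commas, periods and spaces before comparing headings. That collapses the first three strings into a single key, but it leaves "Wilson, Angus" and "Willson, Angus" as separate keys; deciding whether those are the same person needs authority data or fuzzy matching, not punctuation cleanup.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def heading_key(heading: str) -> str:
    """Grouping key: the heading with trailing spaces, commas and periods removed.

    Lossy on purpose (e.g. "Smith, J." becomes "Smith, J"), so use it only
    for comparison, never as the displayed form.
    """
    return heading.rstrip(" .,")


def group_headings(headings: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw heading strings that share the same key, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for h in headings:
        groups[heading_key(h)].append(h)
    return dict(groups)


headings = [
    "Wilson, Angus, 1913-1991",
    "Wilson, Angus, 1913-1991,",
    "Wilson, Angus, 1913-1991.",
    "Wilson, Angus.",
    "Willson, Angus.",
]

for key, variants in group_headings(headings).items():
    print(f"{key!r}: {len(variants)} variant(s)")
# 'Wilson, Angus, 1913-1991': 3 variant(s)
# 'Wilson, Angus': 1 variant(s)
# 'Willson, Angus': 1 variant(s)
```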



-- 
Karen Coyle
kcoyle at kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet




