[open-bibliography] Place of Publication data from the BL dataset
Karen Coyle
kcoyle at kcoyle.net
Sat Nov 27 13:59:23 UTC 2010
Quoting Tom Morris <tfmorris at gmail.com>:
>
> That's assuming that the place of publication is stored on a
> per-publisher basis, not a per-book basis in the British Library
> database. That type of knowledge about the schema used in the
> internal database will be key to making informed decisions about the
> data. Where is that schema documented?
Assuming that BL data follows the MARC format, the schema is "documented" at:
http://loc.gov/marc/bibliographic/ecbdhome.html
All of the fields that begin with "00" (zero zero) are coded data with
fixed value vocabularies.
If you look at this page, along the left-hand side you will see a
heading for MARC Code lists, with some lists, including geographic
codes, there:
http://www.loc.gov/marc/
If you want to grab the lists of MARC fields, subfields and code
lists, this page (about 2/3 of the way down) has information about and
links to two CSV files that contain the MARC data elements:
http://futurelib.pbworks.com/w/page/13686649/Data-and-Studies
There is no official machine-usable schema for the format, unfortunately.
>> The conversion currently takes the place of publication, distribution,
>> etc. from the 260$a. We're considering including the 008/15-17 in future
>> releases.
>
> What does that mean in English?
:-) that they would include the two- or three-letter geographic code from
field 008, character positions 15-17. Those are the codes listed at:
http://www.loc.gov/marc/countries/cou_home.html
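
For anyone who wants to see what "008/15-17" looks like in practice, here
is a minimal Python sketch. It assumes you already have the 008 field as a
plain 40-character string (positions are counted from zero, so 15-17 is the
slice [15:18]). The function name, the sample record and the three-entry
lookup table are made up for illustration; the real code list is at the
URL above, and I don't know how the BL conversion itself is implemented.

```python
# Illustrative subset of the MARC country code list (assumption: a real
# application would load the complete list from the LC page cited above).
SAMPLE_COUNTRY_CODES = {
    "enk": "England",
    "fr": "France",
    "nyu": "New York (State)",
}


def place_code_from_008(field_008: str) -> str:
    """Return the place-of-publication code held in 008 positions 15-17.

    Two-letter codes are padded with a blank in the record, so the
    padding is stripped before the code is returned.
    """
    if len(field_008) < 18:
        raise ValueError("008 field is too short to contain positions 15-17")
    return field_008[15:18].strip()


# Hypothetical 008 field for a book published in England in 1956,
# assembled piece by piece so the character positions are easy to check.
sample_008 = (
    "101127"    # 00-05: date entered on file
    + "s"       # 06:    type of date (single known date)
    + "1956"    # 07-10: date 1
    + "    "    # 11-14: date 2 (blank)
    + "enk"     # 15-17: place of publication
    + " " * 17  # 18-34: material-specific coded elements (left blank here)
    + "eng"     # 35-37: language
    + " "       # 38:    modified record
    + "d"       # 39:    cataloging source
)

code = place_code_from_008(sample_008)
print(code, "=", SAMPLE_COUNTRY_CODES.get(code, "unknown code"))
```

Unlike the free-text 260$a, the value you get back comes from a controlled
list, which is why it should be easier to match across records.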
The Library of Congress, the owner of these vocabularies, has not
provided an RDF expression for the country codes. There is one,
however, at:
http://marccodes.heroku.com/countries/
It's "unofficial" but there are no other RDF options that I know of.
kc
>
> Is there a listing available someplace of what fields in the dump came
> from free form text fields vs database records which guarantee that
> anything linked to that record always has the same text value? For
> example, book titles and edition statements are almost certainly
> free-form text fields, but I'd expect authors to have individual
> records where every book linked to the same record has the same
> author name in the dump.
>
> Is there a comprehensive list of which fields are free-form vs
> database record backed? Knowing the internal schema would be very
> helpful in making use of the data.
>
> On Fri, Nov 26, 2010 at 2:45 AM, Ben O'Steen <bosteen at gmail.com> wrote:
>> (And as Karen has just pointed out, the reason why I am exploring this field
>> is to aid disambiguation of publishers. Having created the overview that I
>> know I need, I thought to share it here.)
>
> That makes sense, although I'd have thought that publisher data is
> noisy enough and low value enough that it'd be pretty far down on the
> priority list to clean up.
>
> More interesting I think is whether these text strings represent one
> author or three or ...:
>
> Wilson, Angus, 1913-1991
> Wilson, Angus, 1913-1991,
> Wilson, Angus, 1913-1991.
> Wilson, Angus.
> Willson, Angus.
>
> My gut tells me that at least the first three text strings probably
> represent a single author, but that's not what the database seems to
> think.
>
> Tom
>
> p.s. Can some librarian type tell me what the trailing period (full
> stop) means? It's not used consistently, but it appears much, MUCH
> more frequently in library data than anywhere else I've seen.
>
--
Karen Coyle
kcoyle at kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet