[open-bibliography] New BNB sample data available

Deliot, Corine Corine.Deliot at bl.uk
Fri Feb 4 14:50:44 UTC 2011


Hi Antoine, 

Many thanks for that.  

Just to clarify (in case the folks at LC are wondering!), I was making a
general point about the permanence of linked data sets. I'm not worried
about id.loc.gov being put offline. [but you knew that really ;-)]

Best wishes

Corine

-----Original Message-----
From: Antoine Isaac [mailto:aisaac at few.vu.nl] 
Sent: 04 February 2011 14:36
To: Deliot, Corine
Cc: List for Working Group on Open Bibliographic Data; public-lld
Subject: Re: [open-bibliography] New BNB sample data available

Hello Corine,

Re. 1 and 2, in fact your decision not to include language tags is what
saves you from the inconsistency Andrew has warned about. If you were
using the same language tag as id.loc.gov but a different literal (and
adding one dot to a literal makes it an entirely different literal),
then your data would be inconsistent with id.loc.gov's.
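The literal point can be checked mechanically. Below is a minimal, hypothetical Python sketch, not anything BL or LC actually runs. It models RDF literals as plain (lexical form, language tag) values instead of using an RDF library such as rdflib. It approximates the SKOS integrity condition that a concept has at most one skos:prefLabel per language tag. It also assumes, as the thread suggests, that id.loc.gov serves this label without the final full stop and tagged "en". The helper names `s14_violations` and `strip_trailing_period` are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
PREF_LABEL = SKOS + "prefLabel"
CONCEPT = "http://id.loc.gov/authorities/sh2008107012#concept"


@dataclass(frozen=True)
class Literal:
    """Simplified RDF literal: lexical form plus optional language tag (datatypes ignored)."""
    lexical: str
    lang: Optional[str] = None


Triple = Tuple[str, str, Literal]


def s14_violations(triples: List[Triple]) -> Dict[Tuple[str, Optional[str]], Set[Literal]]:
    """Return (subject, language tag) pairs that carry more than one distinct prefLabel.

    Approximates SKOS integrity condition S14; untagged labels are grouped under None.
    """
    labels: Dict[Tuple[str, Optional[str]], Set[Literal]] = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == PREF_LABEL:
            labels[(subject, obj.lang)].add(obj)
    return {key: found for key, found in labels.items() if len(found) > 1}


def strip_trailing_period(heading: str) -> str:
    """Hypothetical clean-up step: drop one final full stop carried over from the bib record.

    Deliberately naive: headings that really end in an abbreviation would need extra rules.
    """
    return heading[:-1] if heading.endswith(".") else heading


# Assumed to match what id.loc.gov serves: no final full stop, tagged "en".
lc = (CONCEPT, PREF_LABEL, Literal("Literary landmarks--England--London", "en"))

bib_heading = "Literary landmarks--England--London."
bl_untagged = (CONCEPT, PREF_LABEL, Literal(bib_heading))
bl_tagged = (CONCEPT, PREF_LABEL, Literal(bib_heading, "en"))
bl_fixed = (CONCEPT, PREF_LABEL, Literal(strip_trailing_period(bib_heading), "en"))

print(len(s14_violations([lc, bl_untagged])))  # 0: the labels fall in different tag groups
print(len(s14_violations([lc, bl_tagged])))    # 1: two distinct "en" labels for one concept
print(len(s14_violations([lc, bl_fixed])))     # 0: identical literals collapse into one
```

Merging the untagged BL label with LC's passes the check. An "en" label with one extra dot fails it. Stripping the dot yields a literal identical to LC's, so the merged data holds a single label.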

Now, on having a language tag or not, I see your issue, but personally
I'm OK with labels that are originally Spanish being treated as English
ones, if there's no English translation for them.
Anyway, the core issue to me here is that this language tag dilemma also
applies to LC, which made the opposite choice. Ideally, if you publish
data on LC concepts, it should be compatible with what LC has, and I mean
"compatible" in the formal but also the informal sense: whether or not
there is a formal inconsistency, a data consumer may still be extremely
puzzled as to why LC and BL can't agree on their concepts' prefLabels!

Re. 3, getting data for indexing is a very valid concern. But it could
also be done just before the indexing step, rather than in the data you
publish. That said, you are perhaps in the best position to judge: as you
have put it, this is about what you feel you should provide to your
typical data consumers. Note, however, that including the labels
reintroduces the risk of getting out of sync with a central repository,
which you correctly identified in your first move.

About the danger of a target source being put offline, that is also a
valid point. But for id.loc.gov I wouldn't worry so much. In fact, BL
starting to rely on it for its data would be a key motivation for LC not
to put it offline :-)


Re. your last question, I guess I can only repeat what I've written
above. My gut feeling would be to replicate as little as possible:
ideally, the URI should be the only thing present in your data! But if
you have clear ideas about how much effort your data consumers would be
willing to make, you should adapt your data to make their life easier.
Note that the data consumers who'd be interested in such caching might
be the ones interested in accessing large dumps of data at once. So the
"true linked data version" (what you get when following your nose over
HTTP) could include only the URIs, but a fit-for-purpose dump of your
entire catalogue may include a bit more.
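The split between a URI-only version and a richer dump could look roughly like the following. This is a hypothetical Python sketch that assumes N-Triples output, a made-up example.org record URI, and a label cache already copied from id.loc.gov. `serialize` and `nt_literal` are illustrative names, not part of any BL tooling. By default the function emits only the subject links, and it adds one cached skos:prefLabel per subject when a cache is supplied.

```python
from typing import Dict, List, Optional

DCT_SUBJECT = "http://purl.org/dc/terms/subject"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def nt_literal(text: str, lang: Optional[str] = None) -> str:
    """Format an N-Triples literal with minimal escaping (backslash, quote, CR, LF)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def serialize(record_uri: str, subjects: List[str],
              labels: Optional[Dict[str, str]] = None,
              lang: Optional[str] = "en") -> List[str]:
    """Emit N-Triples lines linking a record to its subject concepts.

    labels=None gives the URI-only version; a label cache adds one cached
    skos:prefLabel triple per subject it covers (the "dump" version).
    """
    lines = []
    for subject in subjects:
        lines.append(f"<{record_uri}> <{DCT_SUBJECT}> <{subject}> .")
        if labels and subject in labels:
            lines.append(f"<{subject}> <{SKOS_PREF_LABEL}> {nt_literal(labels[subject], lang)} .")
    return lines


record = "http://example.org/bnb/record/1"  # hypothetical record URI
concept = "http://id.loc.gov/authorities/sh2008107012#concept"
cache = {concept: "Literary landmarks--England--London"}  # assumed copy of LC's label

linked_data = serialize(record, [concept])          # follow-your-nose version
dump = serialize(record, [concept], labels=cache)   # bulk dump with cached label

print("\n".join(linked_data))
print("\n".join(dump))
```

Generating both outputs from one function means the URI links are produced only once. That is one way to avoid maintaining two separately built files. The cached labels in the dump can still drift from id.loc.gov and would need periodic refreshing.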

Best,

Antoine



> Hi Antoine and all,
>
> Many thanks for the feedback, and apologies for the length of this
> email.
>
> In answer to the questions about
>
> <dcterms:subject>
>   <rdf:Description rdf:about="http://id.loc.gov/authorities/sh2008107012#concept">
>     <skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme" />
>     <skos:prefLabel>Literary landmarks--England--London.</skos:prefLabel>
>     <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept" />
>   </rdf:Description>
> </dcterms:subject>
>
> And
>
> 1. Why does the literal value contained in
> <skos:prefLabel>Literary landmarks--England--London.</skos:prefLabel>
> not exactly match the one served by LC at id.loc.gov for
> http://id.loc.gov/authorities/sh2008107012#concept?
>
> The answer is that it should. We've matched the LCSH heading contained
> in the bibliographic record to the LCSH heading in the authority file.
> The issue is to do with punctuation, which is input at the end of the
> heading in the bib record but is not part of the heading in the
> authority file. We'll address this in the conversion; this is an issue
> in the LCSH headings and, I believe, in other parts of our output. [So no,
> we "are *not* essentially trying to say which of the SKOS preflabels the
> BL prefers", as one post tried to second-guess.]
>
> 2. Why does our output not include xml:lang="en" in <skos:prefLabel>?
>
> This is because in some cases xml:lang="en", whilst true to the data
> served up by id.loc.gov, is actually not correct. For example, if you
> look at <http://id.loc.gov/authorities/sh94003128#concept> for Parque
> Nacional Torotoro (Bolivia), we have
>
> <skos:prefLabel xml:lang="en">Parque Nacional Torotoro (Bolivia)</skos:prefLabel>
>
> that is, the label is tagged as English instead of Spanish.
>
> I assume the reason for that is that MARC 21, where these headings
> originate, doesn't have the granularity to code the language of each
> data element. So when LC expressed LCSH in SKOS, they couldn't specify
> the language of each heading and went for the language of the majority
> of the headings, which is English.
>
> So we (OK, I ;-)) thought we could do "without" the xml:lang attribute,
> since it wasn't "correct" in all cases. I didn't realise the
> implications.
>
> 3. Why are we outputting both the literal value and the resource URI?
> In a very first attempt, we'd only included the resource URI, as you
> suggest. There were concerns about the two getting out of sync, e.g. when
> an LCSH heading is updated. In fact, this is one of the uses of those
> URIs: enabling easier updating of bibliographic data.
>
> But we got some advice to the contrary. Some linked data platforms
> index the literal values to improve searching; it was also pointed out
> that there may be a risk of the linked datasets we link to
> "disappearing".
>
> There are other considerations: we are putting our data out for people
> to use and re-use, and we are not too sure what they want to do with it
> yet, so, as you suggest, some of them may not want to, or be able to, go
> and fetch data from id.loc.gov or any other data sets we link to. A
> related question is to do with the time and resources needed to produce
> these files. At the moment, we are concentrating on the BNB, but the
> intention is to work on other data sets. We are currently working on two
> versions of the file, a "non-URI" and a "with added-URI" version of the
> data. Ideally, it would be good to have only one version to maintain and
> produce (the "with added-URI" one), if it meets the needs of all or most
> people.
>
> Now it's my turn for a question ;-)
>
> In your feedback, you highlight the risk "that your data is less
> complete than the one of other services"[1], e.g., if you don't have the
> skos:broader statements that id.loc.gov has for LCSH concepts.
>
> So, to take the example of LCSH at id.loc.gov, how much of the data
> included there should I replicate in my instance data? Aren't
> <skos:prefLabel> and the resource URI sufficient? If you need other
> information, like <skos:altLabel> or <skos:broader>, won't you be able
> to fetch it via the resource URI?
>
> That's it for now ;-)
>
> I would also like to say that from later today I shall be offline for
> the next two weeks, so that people don't think we don't want to engage
> if there are no posts from me. I really appreciate the feedback.
>
> Cheers
>
> Corine



