[open-bibliography] New BNB sample data available
Antoine Isaac
aisaac at few.vu.nl
Fri Feb 4 14:35:58 UTC 2011
Hello Corine,
Re. 1 and 2, in fact your decision to leave out the language tags is what saves you from the inconsistency Andrew has warned about. If you were using the same language tag as id.loc.gov but a different literal (and adding one dot to a literal makes it an entirely different literal), then your data would be inconsistent with the id.loc.gov data.
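To make the point concrete, here is a minimal Python sketch. It is not a real SKOS validator. It models a literal as a (string, language tag) pair and applies the SKOS rule that a concept has at most one skos:prefLabel per language tag, with untagged labels counted separately. The names Literal and pref_label_clashes are made up for illustration, and the LC label is assumed to be the heading without the trailing full stop, tagged "en", as described in this thread.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    lexical: str                # the character string itself
    lang: Optional[str] = None  # language tag, or None for an untagged literal


def pref_label_clashes(statements):
    """Group prefLabels by (concept, language tag) and return the groups
    that hold more than one distinct lexical form."""
    seen = defaultdict(set)
    for concept, label in statements:
        seen[(concept, label.lang)].add(label.lexical)
    return {key: forms for key, forms in seen.items() if len(forms) > 1}


CONCEPT = "http://id.loc.gov/authorities/sh2008107012#concept"

# Assumed LC statement: tagged "en", no trailing full stop.
lc = [(CONCEPT, Literal("Literary landmarks--England--London", "en"))]
# BL sample data as posted: untagged, with a trailing full stop.
bl_untagged = [(CONCEPT, Literal("Literary landmarks--England--London.", None))]
# Hypothetical variant: same string, but tagged "en" like LC.
bl_tagged = [(CONCEPT, Literal("Literary landmarks--England--London.", "en"))]

no_clash = pref_label_clashes(lc + bl_untagged)  # different tags, no conflict
clash = pref_label_clashes(lc + bl_tagged)       # two "en" prefLabels for one concept
```

Under these assumptions, merging the LC data with the untagged BL label reports nothing, while the tagged variant reports two competing "en" labels for the same concept.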
Now, on having a language tag or not, I see your issue, but personally I'm ok with originally Spanish labels being considered as English ones, if there's no English translation for them.
Anyway, the core issue to me here is that this language tag dilemma also applies to LC, which made the opposite choice. Ideally, if you publish data on LC concepts, it should be compatible with what LC has, "compatible" in the formal but also the informal sense: whether there is an inconsistency or not, a data consumer may still be extremely puzzled as to why LC and BL can't agree on their concepts' prefLabels!
Re. 3, getting data for indexing is a very valid concern. But it could also be done just before the indexing step, not in the data you publish. But well, you are perhaps in the best position to judge: as you have put it, this is about what you feel you should provide to your typical data consumers. Note, however, that including the labels re-introduces the risk of being out of sync with a central repository, which you correctly identified in your first move.
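As a rough illustration of fetching labels just before the indexing step, here is a small Python sketch. The published record holds only subject URIs, and the indexer resolves each one through a lookup function with a local cache. Everything here is hypothetical: index_terms and stub_fetch are invented names, and the stub stands in for whatever HTTP dereferencing a real indexer would do against id.loc.gov.

```python
from typing import Callable, Dict, Iterable, List, Optional


def index_terms(subject_uris: Iterable[str],
                fetch_label: Callable[[str], str],
                cache: Optional[Dict[str, str]] = None) -> List[str]:
    """Resolve each subject URI to a label at indexing time,
    asking the remote source at most once per URI."""
    cache = {} if cache is None else cache
    terms = []
    for uri in subject_uris:
        if uri not in cache:
            cache[uri] = fetch_label(uri)
        terms.append(cache[uri])
    return terms


# Stub standing in for an HTTP lookup; the label value is an assumption.
CONCEPT = "http://id.loc.gov/authorities/sh2008107012#concept"
REMOTE = {CONCEPT: "Literary landmarks--England--London"}
calls = []


def stub_fetch(uri: str) -> str:
    calls.append(uri)  # record each simulated remote request
    return REMOTE[uri]


# Two records pointing at the same concept trigger a single lookup.
terms = index_terms([CONCEPT, CONCEPT], stub_fetch)
```

The label then lives only in the index, so a change at the central repository is picked up on the next indexing run instead of being frozen into the published data.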
About the danger of a target source being put offline, that is also a valid point. But for id.loc.gov I wouldn't be so worried. In fact, BL starting to rely on it for its data would be a key motivation for LC not to put it offline :-)
Re. your last question, I guess I can only repeat what I've written above. My gut feeling would be to replicate as little as possible: ideally, the URI should be the only thing present in your data! But if you have clear ideas about the amount of effort your data consumers would be willing to make, you should adapt your data to make their lives easier.
Note that the data consumers who'd be interested in such caching might be the ones interested in accessing large dumps of data at once. So the "true linked data version" (what you get when following your nose over HTTP) could include only the URIs, but a fit-for-purpose dump of your entire catalogue may include a bit more.
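A minimal sketch of that split, in Python and with invented names (subject_view, for_dump), might look as follows. It assumes the label has already been cached from the central source and only decides whether to emit it.

```python
from typing import Dict, Optional


def subject_view(uri: str,
                 cached_label: Optional[str] = None,
                 for_dump: bool = False) -> Dict[str, str]:
    """Build the subject entry for one record.

    The linked data view carries the URI only; the dump view may add
    a cached copy of the label for consumers who cannot dereference."""
    view = {"uri": uri}
    if for_dump and cached_label is not None:
        view["prefLabel"] = cached_label
    return view


CONCEPT = "http://id.loc.gov/authorities/sh2008107012#concept"
LABEL = "Literary landmarks--England--London"  # assumed cached value

linked_data_view = subject_view(CONCEPT, LABEL)            # URI only
dump_view = subject_view(CONCEPT, LABEL, for_dump=True)    # URI plus label
```

Both views come from the same source record, so only one version of the underlying data needs to be maintained.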
Best,
Antoine
> Hi Antoine and all,
>
> Many thanks for the feedback and apologies for the length of this email.
>
> In answer to the questions about
> <dcterms:subject>
>>> <rdf:Description
>>> rdf:about="http://id.loc.gov/authorities/sh2008107012#concept">
>>> <skos:inScheme
>>> rdf:resource="http://id.loc.gov/authorities#conceptScheme" />
>>> <skos:prefLabel>Literary landmarks--England--
>>> London.</skos:prefLabel>
>>> <rdf:type
>>> rdf:resource="http://www.w3.org/2004/02/skos/core#Concept" />
>>> </rdf:Description>
>>> </dcterms:subject>
>
> And
>
> 1. Why does the literal value contained in <skos:prefLabel>Literary landmarks--England--London.</skos:prefLabel>
> not exactly match the one served by LC at id.loc.gov for http://id.loc.gov/authorities/sh2008107012#concept?
>
> The answer is that it should. We've matched the LCSH heading contained in the bibliographic record to the LCSH heading in the authority file. The issue is to do with punctuation (which is input at the end of the heading in the bib record but is not part of the heading in the authority file). We'll address this in the conversion - this is an issue in the LCSH headings and, I believe, in other parts of our output. [So no, we "are *not* essentially trying to say which of the SKOS preflabels the BL prefers" as one post tried to second-guess]
>
> 2. Why does our output not include xml:lang="en" in <skos:prefLabel>?
> This is because in some cases this xml:lang="en", whilst true to the data served up by id.loc.gov, is actually not correct. For example, if you look at
> <http://id.loc.gov/authorities/sh94003128#concept> for Parque Nacional Torotoro (Bolivia), we have
> <skos:prefLabel xml:lang="en">Parque Nacional Torotoro (Bolivia)</skos:prefLabel>
>
> instead of Spanish.
>
> I assume the reason for that is that there isn't the granularity in MARC 21 - where these headings originate - to code the language of each data element. So when LC expressed LCSH in SKOS, they couldn't specify the language per heading and went for the language of the majority of the headings, which is English.
>
> So we - ok, I ;-) thought we could do "without" the xml:lang attribute since it wasn't "correct" in all cases. I didn't realise the implications.
>
> 3. Why are we outputting both the literal value and the resource URI?
> In a very first attempt, we'd only included the resource URI as you suggest. There were concerns about the two being out of sync, e.g. when an LCSH heading is updated. In fact, this is one of the uses of those URIs - enabling easier updating of bibliographic data.
>
> But we got some advice to the contrary. Some linked data platforms index the literal values to improve searching; it was also pointed out that there may be a risk of the linked dataset we link to "disappearing".
>
> There are other considerations: we are putting our data out for people to use and re-use, and we are not too sure yet what they want to do with it - so, as you suggest, some of them may not want or be able to go and fetch data from id.loc.gov or any other data sets we link to. A related question is to do with the time and resources needed to produce these files. At the moment, we are concentrating on the BNB but the intention is to work on other data sets. We are currently working on two versions of the file, a "non-URI" and a "with added-URI" version of the data, and ideally it would be good to have only one version - the "with added-URI" one - to maintain/produce if it meets the needs of all/most people.
>
> Now it's my turn for a question ;-)
>
> In your feedback, you highlight the risk "that your data is less complete than the one of other services"[1], e.g. if you don't have the skos:broader that id.loc.gov has for LCSH concepts.
>
> So to take the example of LCSH at id.loc.gov, how much of the data included there should I replicate in my instance data? Aren't the <skos:prefLabel> and the resource URI sufficient? If you need other info, like <skos:altLabel> or <skos:broader>, won't you be able to fetch it via the resource URI?
>
> That's it for now ;-)
>
> I would also like to say that from later today I shall be offline for the next two weeks. I mention it so that people don't think we don't want to engage if there is no post from me. I really appreciate the feedback.
>
> Cheers
>
> Corine