[openbiblio-dev] [bibserver] changing handling of collections from frontend to get them working properly again now that collections are stored as separate objects (937d3e2)

Karen Coyle kcoyle at kcoyle.net
Sun Oct 9 15:49:28 UTC 2011

Quoting Rufus Pollock <rufus.pollock at okfn.org>:

> I wonder if we are misusing "collection" here. For me a collection is
> just like a bibliography, or even more simply, a bunch of works /
> records I want to collect together.
> Collection in the sense of "all issues of this journal or all articles
> in this journal", while they could be represented as a collection, might
> better be represented in their own right.

I, too, am somewhat flummoxed by the emphasis on collections here. If a
collection is just any group of metadata from a single source, then  
it's not a terribly meaningful or useful grouping. If a collection is  
a set of metadata that has been chosen for some purpose, then it is  
closer to my concept of collection. In the end, however, there will be  
metadata records for resources, and those records may be found in  
multiple collections. Will the collections be "undone" in the  
database, or will the records retain their existence in a collection?  
How will the database handle items that are in multiple collections?  
And, as Rufus asks, what is the use of collections to users?
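
One possible answer to the multiple-collections question, as a minimal sketch only: keep each record once and treat collection membership as a separate many-to-many link, so a collection can be dropped without touching the records. All names here (`Store`, `add_to_collection`, the collection labels) are hypothetical and are not taken from bibserver.

```python
from collections import defaultdict


class Store:
    """Toy in-memory store: records exist once, membership is a separate link."""

    def __init__(self):
        self.records = {}                        # record id -> metadata dict
        self.members = defaultdict(set)          # collection name -> record ids
        self.collections_of = defaultdict(set)   # record id -> collection names

    def add_record(self, record_id, metadata):
        # Re-adding an existing id keeps the first copy (a real system would merge).
        self.records.setdefault(record_id, metadata)

    def add_to_collection(self, collection, record_id):
        if record_id not in self.records:
            raise KeyError(record_id)
        self.members[collection].add(record_id)
        self.collections_of[record_id].add(collection)

    def remove_collection(self, collection):
        # Dropping a collection removes links only; the records themselves stay.
        for record_id in self.members.pop(collection, set()):
            self.collections_of[record_id].discard(collection)


store = Store()
store.add_record("rec1", {"title": "An example article"})
store.add_to_collection("karens-reading-list", "rec1")
store.add_to_collection("imported-from-source-x", "rec1")
print(sorted(store.collections_of["rec1"]))

store.remove_collection("imported-from-source-x")
print(sorted(store.collections_of["rec1"]), "rec1" in store.records)
```

In this arrangement a "collection from a single source" and a "collection chosen for a purpose" are the same kind of link, and the record survives either one being removed.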

>> publisher/journal/volume/issue

This may be one logical way to store some data, but hierarchy tends to  
constrain potential services. You do NOT want to have to know the  
publisher in order to find the journal, obviously.

In addition, there are publishers and there are publishers. Some are  
professional organizations like ACM, but others are mere corporations,  
like Elsevier or Nature. There will be folks who have published in  
Time Magazine or the New Yorker, and you don't want to exclude that.

I would tend to record this information, probably with a different  
data element for professional or governmental organizations as  
publishers or sponsors, but not use it as a way to organize the data.  
Across different communities there are just too many different  
relationships of bodies to publications to make something like this  
work. Think broadly about the world of publication.
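
To illustrate recording publisher information without organizing by it, here is a rough sketch that assumes flat records where `publisher` and `sponsor` are optional attributes. The field names and sample values are invented for the example, not an agreed BibJSON vocabulary.

```python
from collections import defaultdict

# Hypothetical flat records: publisher and sponsor are plain attributes,
# and either may be missing.
records = [
    {"id": "a1", "title": "Article one", "journal": "Journal A",
     "publisher": "Publisher X"},
    {"id": "a2", "title": "Article two", "journal": "Journal A"},
    {"id": "a3", "title": "Magazine piece", "journal": "Magazine B",
     "sponsor": "Society Y"},
]


def build_index(records, field):
    """Map each value of `field` to the ids of the records that carry it."""
    index = defaultdict(list)
    for record in records:
        value = record.get(field)
        if value is not None:
            index[value].append(record["id"])
    return dict(index)


by_journal = build_index(records, "journal")
by_publisher = build_index(records, "publisher")

# The journal is found directly, whether or not a publisher was recorded.
print(by_journal["Journal A"])
print(by_publisher.get("Publisher X"))
```

Any attribute can be indexed this way, so no single path such as publisher/journal/volume/issue has to be privileged, and a record with no publisher is still findable.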

>> Each level in the publisher/journal/volume/issue hierarchy demands  
>> a metadata record, with attributes depending on level in the  
>> hierarchy.  It should be
>> required that each such metadata record either contains the list of  
>> at least identifiers and human-readable titles of its children, or
>> provides a pointer to such a list as a  BibJSON dataset, or both.
> Right, this sounds pretty complex :-)

Oy! Let's not make requirements that will discourage or even prevent  
input. What information do people actually have on hand when they are  
creating the metadata? (And how accurate is it? Probably not even 95%)

BTW, if you want to create records for each level, there are library  
records that contain only the publishing pattern for each journal that  
has been cataloged. Those pattern records can be used to create a full  
set of journal/volume/issue, but I have to warn you that there are  
more levels than volume/issue -- the library data allows for 6 (!)  
such levels, but they are fully defined, with their display components  
(part, number, season, date, whatever) if you want. (I'll try to find  
where these records are... they're kind of background data for the  
issue predictor systems that allow libraries to know if they've missed  
an issue.)
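
As a rough idea of how such a pattern could be expanded into a full set of units, here is a sketch under strong assumptions: the `(caption, count)` list is a made-up stand-in for a pattern record, not the actual library record layout, and numbering is assumed to restart at every level, which real journals do not always do.

```python
from itertools import product

MAX_LEVELS = 6  # the limit mentioned above for library pattern data


def expand_pattern(levels):
    """Yield one display label per lowest-level unit.

    `levels` is a list of (caption, count) pairs, outermost first; this
    shape is hypothetical, not an actual library record layout.
    """
    if not 1 <= len(levels) <= MAX_LEVELS:
        raise ValueError("expected between 1 and %d levels" % MAX_LEVELS)
    ranges = [range(1, count + 1) for _, count in levels]
    for numbers in product(*ranges):
        yield " ".join(f"{caption} {n}" for (caption, _), n in zip(levels, numbers))


# Example: 2 volumes, 3 numbers each, 2 parts per number.
labels = list(expand_pattern([("v.", 2), ("no.", 3), ("pt.", 2)]))
print(len(labels))
print(labels[0], "|", labels[-1])
```

The point of the sketch is only that the number of levels and their captions come from the pattern data rather than being fixed at volume/issue.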


>> JSTOR provides a major example of a secondary collection which  
>> cross-cuts primary publishers, and for each journal may only contain
>> parts of the journal, but typically contains whole volumes and issues.

JSTOR digitized whole journal runs, using the archives of the US  
libraries. There shouldn't be many gaps in the journals they did.

Also, did you note that JSTOR has announced that it is giving open  
access to all of its public domain materials?

> I really think we want to focus on the user stories first and then
> decide whether one concept / domain object (e.g. "Collection") is
> sufficient for the domain we are trying to cover.

Agreed. I also think we need a wide variety of user stories from  
different fields. This group tends toward math and science, and other  
disciplines will have different needs. The social sciences and liberal  
arts should be included, no?


Karen Coyle
kcoyle at kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
