[openbiblio-dev] [bibserver] changing handling of collections from frontend to get them working properly again now that collections are stored as separate objects (937d3e2)

Mon Oct 10 07:30:10 UTC 2011

On 9 October 2011 17:13, Jim Pitman <pitman at stat.berkeley.edu> wrote:
[...]
>> > publisher/journal/volume/issue
>> > Especially if the publisher is a small professional society, the publisher-level collection is very important: it establishes
>> > the identity of the society, and is a collection the society might be proud to host on its own website in association with BibSoup.
>> > Respect for such collections must be provided to accommodate the needs of publishers for recognition, and to engage them in contributing
>> > data to BibSoup.
>> > Each level in the publisher/journal/volume/issue hierarchy demands a metadata record, with attributes depending on its level in the hierarchy. It should be
>> > required that each such metadata record either contains a list of at least the identifiers and human-readable titles of its children, or
>> > provides a pointer to such a list as a BibJSON dataset, or both.
>>
>> Right, this sounds pretty complex :-)
>
> Actually, no, it is very simple, and already accommodated by standard A&I service records.

I don't think it is -- as evidenced by some of this discussion below.
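To make the requirement quoted above concrete, here is a minimal Python sketch of one metadata record per level of the publisher/journal/volume/issue hierarchy. Under this sketch a record is acceptable only if it lists its children's identifiers and titles inline, points to a BibJSON dataset listing them, or both. The class names, the `children_dataset_url` field and the validity check are hypothetical illustrations, not part of BibServer or the BibJSON specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Levels of the publisher/journal/volume/issue hierarchy discussed in the thread.
LEVELS = ("publisher", "journal", "volume", "issue")


@dataclass
class ChildRef:
    """Minimal entry for a child: an identifier plus a human-readable title."""
    identifier: str
    title: str


@dataclass
class CollectionRecord:
    """Hypothetical metadata record for one level of the hierarchy."""
    identifier: str
    title: str
    level: str
    # Option 1: children listed inline in the record.
    children: List[ChildRef] = field(default_factory=list)
    # Option 2 (hypothetical field): a pointer to a BibJSON dataset listing them.
    children_dataset_url: Optional[str] = None

    def is_valid(self) -> bool:
        # Known level, and at least one of the two ways of naming the children.
        has_children = bool(self.children) or self.children_dataset_url is not None
        return self.level in LEVELS and has_children


# Hypothetical example records, not real data.
society = CollectionRecord(
    identifier="example-society",
    title="Example Society",
    level="publisher",
    children=[ChildRef("example-journal", "Example Journal of Statistics")],
)
bare_issue = CollectionRecord("example-issue", "Example Issue", level="issue")
```

Here `society.is_valid()` is `True`, while `bare_issue.is_valid()` is `False` until it gains a child list or a `children_dataset_url`. Level-specific attributes (e.g. an ISSN at journal level) are deliberately left out of this sketch.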

>> I strongly suggest sitting down and writing out in detail the user stories here using our existing
>> spreadsheet (or creating a new document). Doing user stories would also focus us on what people actually want to do rather than on
>> the details of the modelling (which would come out of the user stories rather than the other way round).
>
> No problem there, and I can provide 3 basic user stories immediately.
>
> 1) Provide and maintain a decent index of the IMS/Bernoulli open access electronic journals. There are 5 of them, split 2/3 by technical
> structure of currently available metadata. I have made tentative starts at work on this.
>
> 2) Same for all IMS/Bernoulli journals, which are semi-open, meaning the content is open if you know where to look for it, but otherwise
> somewhat hidden.
>
> 3) For these journals, provide a citation index, meaning a complete list of items cited in the reference lists of these journals, with some
> effort at deduplication of items.
>
> Each of these exercises provides us with multiple examples of collections. Among the more interesting and novel collections are e.g.
> "All articles ever cited in the Annals of Probability", "All articles ever cited in IMS Journals" and the same for books or other types.
> Some of these collections may be defined programmatically, e.g. by queries in some QL, but they still deserve a metadata record saying what
> the queries mean and enabling a user to browse around such collections and know what they are looking at.
> This is very close to the issue of providing metadata records for every query to BibSoup. I think this is required, and that some queries may
> be dignified by a higher standing than others. Those with higher standing should acquire more attributes as collections.

I think these actually provide good examples of things I would not
designate as collections but rather as the results of queries
(QuerySets if you like). Does one really need to create a dedicated
collection for "All articles ever cited in the Annals of Probability"
as opposed to having a query for this, especially as such a collection
changes over time as new things get cited?
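The distinction between a stored collection and a query result can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: records are plain dicts, and the `cited_by` field and both class names are hypothetical, not BibServer's data model. A collection frozen as a set of record ids misses records cited later, while a query-defined collection (a QuerySet, in the terms used above) is re-evaluated each time it is read.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set

Record = Dict[str, Any]


@dataclass
class StoredCollection:
    """A fixed set of record ids, captured when the collection was created."""
    record_ids: Set[str]

    def members(self, records: List[Record]) -> Set[str]:
        # Ignores the current records: membership never changes.
        return set(self.record_ids)


@dataclass
class QueryCollection:
    """A collection defined by a predicate, re-evaluated on every read."""
    description: str
    predicate: Callable[[Record], bool]

    def members(self, records: List[Record]) -> Set[str]:
        return {r["id"] for r in records if self.predicate(r)}


def cited_in(journal: str) -> Callable[[Record], bool]:
    # Hypothetical field: the journals whose reference lists cite this record.
    return lambda r: journal in r.get("cited_by", [])


records: List[Record] = [
    {"id": "a1", "cited_by": ["Annals of Probability"]},
    {"id": "a2", "cited_by": []},
]
query = QueryCollection(
    "All articles ever cited in the Annals of Probability",
    cited_in("Annals of Probability"),
)
snapshot = StoredCollection(query.members(records))

# A new citation arrives after the snapshot was taken.
records.append({"id": "a3", "cited_by": ["Annals of Probability"]})
```

After the append, `query.members(records)` is `{"a1", "a3"}` while `snapshot.members(records)` is still `{"a1"}`. The `description` field is one place where a metadata record saying what the query means could live.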

We *really* are going to have to think hard about this, as well as about what is
important. Implementing all possible requirements here could be very
expensive, so we should focus on the simplest / highest-value items
first. I think it is important we have a tool that does a few things
well rather than trying to do lots and lots of stuff (and doing it
badly).

[...]

>> I really think we want to focus on the user stories first and then
>> decide whether one concept / domain object (e.g. "Collection") is
>> sufficient for the domain we are trying to cover.
>
> OK. I am already convinced we need to go with a very general, flexible concept of "Collection" which is capable of adapting to
> whatever use case we can throw at it. Mathematically, collections are nothing more or less than sets of records. We need a way to allow
> users to simply specify whatever sets they care about. If there are simple boolean relations between sets, especially A subset of B and
> A disjoint from B, there must be simple ways to indicate this in the collection metadata. It's as simple as that. We do not need any more use

You realize this is really pretty complicated, right?

> cases to commit to respecting those structures in the data model. I think it is mostly a matter of *where* these structures are kept, and I think
> the answer is simple, you either put this info directly in the meta record for a collection, or you link out to this info (e.g. as a query response)
> from the collection record.
> I think we should just try installing simple collection metadata support along the above lines, and then start exercising it with actual
> use cases, for which the publisher collection and departmental collections are exemplary.

From what you are saying, I think we are going to have to think about
this really quite hard -- this isn't something we can just "hack" in
quickly. I may be wrong but this really does not sound trivial.
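One way to see where the cost lies is to sketch what "indicating" subset and disjointness relations in collection metadata would involve. The following minimal Python sketch assumes, hypothetically, that each declared relation is stored as an `(a, kind, b)` triple alongside the collection records and that current memberships are available as sets of record ids. Declaring a relation is trivial; keeping it true as memberships change means re-checking it, which is what the hypothetical `check_relations` helper does.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: (collection_a, kind, collection_b),
# where kind is "subset_of" or "disjoint_from".
Relation = Tuple[str, str, str]


def check_relations(members: Dict[str, Set[str]],
                    relations: List[Relation]) -> List[Relation]:
    """Return the declared relations that current memberships violate."""
    violated: List[Relation] = []
    for a, kind, b in relations:
        set_a, set_b = members[a], members[b]
        if kind == "subset_of":
            holds = set_a <= set_b
        elif kind == "disjoint_from":
            holds = set_a.isdisjoint(set_b)
        else:
            raise ValueError(f"unknown relation kind: {kind}")
        if not holds:
            violated.append((a, kind, b))
    return violated


# Hypothetical memberships for two issues of one volume.
members = {
    "volume-1": {"r1", "r2", "r3"},
    "issue-1": {"r1", "r2"},
    "issue-2": {"r3"},
}
relations: List[Relation] = [
    ("issue-1", "subset_of", "volume-1"),
    ("issue-2", "subset_of", "volume-1"),
    ("issue-1", "disjoint_from", "issue-2"),
]
```

`check_relations(members, relations)` returns `[]` here; adding `"r3"` to `members["issue-1"]` makes it return `[("issue-1", "disjoint_from", "issue-2")]`. For query-defined collections the memberships themselves change over time, so such a check would have to be repeated, or the relation established from the query definitions, which is generally harder.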

Rufus