[openbiblio-dev] [bibserver] changing handling of collections from frontend to get them working properly again now that collections are stored as separate objects (937d3e2)

Sun Oct 9 20:26:51 UTC 2011

Mark and I definitely think alike on this; I'm nodding yes to what's  
here. In particular I agree that "collection" is a set that is the  
result of a query, not a data structure. With Flickr as the analogy,  
tags create sets/collections, and there can be any number of  
collections with overlapping members. Although I'd lean more toward  
the library approach of having at least some "special" tags, like  
publisher, journal, etc. (Some other day I'll mention how hard it is  
to define "this journal". :-))
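
To make the "collection is the result of a query" idea concrete, here is a
minimal sketch. The records, the field names ("title", "journal", "tags")
and the collection() helper are all made up for illustration; none of them
come from the bibjson draft.

```python
# Hypothetical records; field names are illustrative, not from the bibjson spec.
records = [
    {"title": "Paper A", "journal": "Journal X", "year": 2010, "tags": ["review"]},
    {"title": "Paper B", "journal": "Journal X", "year": 2011, "tags": ["obituary"]},
    {"title": "Paper C", "journal": "Journal Y", "year": 2011, "tags": ["review"]},
]


def collection(records, **criteria):
    """Return the titles of records that match every key/value criterion."""
    def matches(record):
        for key, wanted in criteria.items():
            value = record.get(key)
            if isinstance(value, list):
                # List-valued fields (such as tags) match if they contain the value.
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return [r["title"] for r in records if matches(r)]


# Two collections, each defined only by a query; nothing is stored per collection.
reviews = collection(records, tags="review")
journal_x = collection(records, journal="Journal X")
```

"Paper A" lands in both sets, which is the overlapping-membership behaviour
of Flickr-style tags. A "special" tag like journal is just another key here.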

I have a question, though, about bibjson. Well, maybe more than one.

- Is bibjson what some of us would call a "transmission format"? That  
is, is it intended for sending bib data from one source to another?
- Conversely, is bibjson intended as a storage/database format?

I ask this because the concept of collection appears to be hard-wired
into bibjson, which doesn't fit what I read below: that "collections
are the result of queries." It also looks like records are
subordinate to the collection, meaning that a collection must always
be present.

Another question is about update. Are updates effected by re-sending  
the entire collection with new and old information? Basically, a  
replacement for the collection? Will there be updating taking place in  
Bibsoup? (It sounds like it from Mark's list, but I'm not sure.)  
Related: will one update a collection, or a bib record for a resource?

That's a start.

kc

Quoting Mark MacGillivray <mark.macgillivray at okfn.org>:

> Hi All,
>
> Clearly, the bibjson spec is the most important thing to work on at
> the moment, and we should focus on it for now. Development has gone
> well over the summer, and apart from some display points, phase 3 is
> complete. Before doing any further development we must decide what to
> do with bibjson.
>
> Here is a pad to work on, with a start at a JSON schema:
>
> http://openbiblio.okfnpad.org/bibjson
>
> Here is a review of key features:
>
> * We are building a document oriented system
> * We are not building a relational model, we are building a searchable index
> * We are focussing on the needs of individuals / small groups
>
> * a record is a document - it should contain data relevant to it, such
> as the collection to which it belongs
> * collections are any set of records that someone cares enough about  
> to collect
> * those records have properties, some of which meet a convention
> * a search across an index of those records will provide more useful  
> information
>
> * Our data model does not need to store journal/publisher/year/... as
> an understood hierarchy.
> * Instead, we just have records that may or may not have some of those keys.
> * "all articles from journal x in year y" is something that one could
> search for.
> * If the outcome of this search was something that someone cared
> about, they could make that into a collection too.
>
> * if a document stores the information related to it, a document can
> be in multiple collections
> * it just identifies as being in multiple collections
> * collections can therefore contain other collections - in whole or in part
>
> * records can be recreated at any time by a user, so system-allocated
> uuids will have no persistence
> * record identifiers should consist of information in the record
> * actions on records should be achieved by finding those records
> rather than by identifying them
>
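
One possible reading of "record identifiers should consist of information
in the record" is a key computed from the record's own fields, so that a
re-created record gets the same key again. The sketch below is only an
assumption about how that might work: the choice of fields, the
normalisation and the use of a SHA-1 digest are mine, not something
proposed in the thread.

```python
import hashlib
import json


def record_key(record, fields=("title", "year", "author")):
    """Derive a repeatable key from selected record fields.

    The field list is an assumed convention, not part of bibjson.
    """
    basis = {}
    for field in fields:
        value = record.get(field)
        if isinstance(value, str):
            # Normalise case and whitespace so trivial differences do not matter.
            value = " ".join(value.lower().split())
        basis[field] = value
    # sort_keys gives a canonical serialisation regardless of input key order.
    canonical = json.dumps(basis, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


first = {"title": "An  Example Title", "year": 2011, "author": "A. Writer"}
# The same record re-created later, with different key order and an extra field.
again = {"author": "a. writer", "year": 2011, "title": "an example title",
         "note": "re-uploaded"}
other = {"title": "An Example Title", "year": 2012, "author": "A. Writer"}
```

Here record_key(first) and record_key(again) are equal, while
record_key(other) differs because the year differs. Which fields are
reliable enough to use this way is exactly the open question.
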
> Whilst I understand the points re. the difference between storage and
> presentation layer, or points about how the frontend should present a
> particular record, this is drawing away from our simple goal.
>
> As much as possible, our frontend should be a simple presentation of
> documents stored in an index. Presenting collections should consist of
> searches across that index. Those searches should wherever possible be
> for values that exist naturally in the records. Where search results
> present a collection a person is interested in, they are free to
> instantiate them as a collection.
>
> We will lose the power of specificity for the sake of general
> flexibility. Which is good. So we just need to agree on the most basic
> structure of bibjson, and a convention on what would most usefully be
> found under a particular key. Then write parsers from other common
> formats.
>
>
> Mark
>
>
> On Sun, Oct 9, 2011 at 6:08 PM, William Waites <ww at styx.org> wrote:
>> On Sun, 09 Oct 2011 09:49:06 -0700, pitman at stat.Berkeley.EDU (Jim  
>> Pitman) said:
>>
>>    JP> Or "All
>>    JP> conversations in Statistical Science" or "All obituaries in
>>    JP> IMS Journals" or "All review articles in IMS Journals", these
>>    JP> are all interesting collections which we should make an effort
>>    JP> to accommodate.
>>
>> These kinds of collections would be best modelled as the results of
>> queries, right? Taking that approach when possible would help
>> eliminate a lot of (de)duplication problems down the road.
>>
>> So generally,
>>
>>  1 A collection can be explicitly enumerated
>>
>>  2 A collection can be expressed as a list of queries that returns
>>    its elements
>>
>>    2.1 A special kind of query that might be a good candidate for
>>        optimisation is the "include everything from other collection,
>>        recursively"
>>
>>    2.2 (1) is also expressible as a list of queries that just
>>        fetch elements by their identifiers
>>
>> ?
>>
>> -w
>>
>
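
William's outline in the quoted message (explicit members, a list of
queries, and "include everything from other collection, recursively") can
be sketched roughly as follows. The dictionary layout with "ids",
"queries" and "include" keys, and the resolve() function, are hypothetical
names chosen for this example only.

```python
def resolve(name, collections, records, _seen=None):
    """Return the set of record ids in a collection, following includes."""
    seen = set() if _seen is None else _seen
    if name in seen:
        # Guard against collections that include each other.
        return set()
    seen.add(name)
    spec = collections[name]
    # (1) explicitly enumerated members; per (2.2) these could equally be
    # written as queries that fetch records by identifier.
    members = set(spec.get("ids", []))
    # (2) a list of queries, each a predicate over a record.
    for query in spec.get("queries", []):
        members |= {rid for rid, rec in records.items() if query(rec)}
    # (2.1) include everything from another collection, recursively.
    for other in spec.get("include", []):
        members |= resolve(other, collections, records, seen)
    return members


# Hypothetical data for illustration.
records = {
    "r1": {"journal": "Journal X", "type": "review"},
    "r2": {"journal": "Journal X", "type": "obituary"},
    "r3": {"journal": "Journal Y", "type": "review"},
}
collections = {
    "reviews": {"queries": [lambda rec: rec["type"] == "review"]},
    "favourites": {"ids": ["r2"], "include": ["reviews"]},
}
```

With this data, resolve("reviews", collections, records) gives r1 and r3,
and resolve("favourites", collections, records) gives all three records.
Because members are computed at query time, a record that matches later
joins the collection without anything being copied or de-duplicated.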

-- 
Karen Coyle
kcoyle at kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet