[openbiblio-dev] [bibserver] changing handling of collections from frontend to get them working properly again now that collections are stored as separate objects (937d3e2)

Sun Oct 9 18:39:39 UTC 2011

Hi All,

Clearly, the bibjson spec is the most important thing to work on at
the moment, and we should focus on it for now. Development has gone
well over the summer, and apart from some display points, phase 3 is
complete. Before doing any further development we must decide what to
do with bibjson.

Here is a pad to work on, with a start at a JSON schema:

http://openbiblio.okfnpad.org/bibjson

Here is a review of key features:

* We are building a document oriented system
* We are not building a relational model, we are building a searchable index
* We are focussing on the needs of individuals / small groups

* a record is a document - it should contain data relevant to it, such
as the collection to which it belongs
* collections are any set of records that someone cares enough about to collect
* those records have properties, some of which meet a convention
* a search across an index of those records will provide more useful information

* Our data model does not need to store journal/publisher/year/... as
an understood hierarchy.
* Instead, we just have records that may or may not have some of those keys.
* "all articles from journal x in year y" is something that one could
search for.
* If the outcome of this search was something that someone cared
about, they could make that into a collection too.
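
As a rough illustration of the points above, here is a minimal sketch in
Python. It assumes records are plain dicts held in a list that stands in
for the index; the `search` and `save_as_collection` helpers and the
`collection` key are hypothetical names for illustration, not part of
any agreed bibjson spec.

```python
# Hypothetical records: plain dicts whose keys follow convention only.
records = [
    {"title": "A", "journal": "Statistical Science", "year": "2010"},
    {"title": "B", "journal": "Statistical Science", "year": "2011"},
    {"title": "C", "year": "2010"},  # no journal key at all
]


def search(index, **criteria):
    """Return the records whose values match every given key exactly."""
    return [r for r in index
            if all(r.get(k) == v for k, v in criteria.items())]


def save_as_collection(results, name):
    """Record the collection name on each matching record.

    The 'collection' key is an assumed convention for this sketch.
    """
    for r in results:
        names = r.setdefault("collection", [])
        if name not in names:
            names.append(name)


# "All articles from journal x in year y" is just a search ...
hits = search(records, journal="Statistical Science", year="2010")
# ... and if someone cares about the result, it becomes a collection.
save_as_collection(hits, "statsci-2010")
```

Records that lack a key simply never match a search on that key, so no
hierarchy of journal/publisher/year has to be stored or understood.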

* if a document stores the information related to it, a document can
be in multiple collections
* it just identifies as being in multiple collections
* collections can therefore contain other collections - in whole or in part
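
A minimal sketch of that idea, assuming each record carries a list of
collection names under a `collection` key (an assumed convention, as are
the `members` and `overlap` helper names):

```python
# Hypothetical records that each state which collections they are in.
records = [
    {"title": "A", "collection": ["reading-list", "obituaries"]},
    {"title": "B", "collection": ["obituaries"]},
    {"title": "C", "collection": ["reading-list"]},
]


def members(index, name):
    """A collection is just the set of records that name it."""
    return [r["title"] for r in index if name in r.get("collection", [])]


def overlap(index, a, b):
    """Titles of the records that sit in both collections."""
    return sorted(set(members(index, a)) & set(members(index, b)))
```

Here `members(records, "obituaries")` finds A and B, and
`overlap(records, "reading-list", "obituaries")` shows that the reading
list contains part of the obituaries collection, without any separate
structure describing the relationship between the two.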

* records can be recreated at any time by a user, so system-allocated
uuids will have no persistence
* record identifiers should consist of information in the record
* actions on records should be achieved by finding those records
rather than by identifying them
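
One possible way to build an identifier out of information in the
record, sketched in Python. Hashing a subset of fields is my assumption
for illustration only; the `record_key` name and the choice of `title`,
`author` and `year` are hypothetical and would need agreeing.

```python
import hashlib
import json


def record_key(record, fields=("title", "author", "year")):
    """Derive an identifier from the record's own content.

    Only the listed fields are used, so re-creating the same record
    later yields the same key. The field list is an assumption.
    """
    subset = {f: record[f] for f in fields if f in record}
    # Sorted keys and fixed separators give a stable serialisation.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```

Two uploads of the same record then produce the same key regardless of
key order or of which collections the record names, which a
system-allocated uuid would not.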

Whilst I understand the points about the difference between the storage
and presentation layers, and about how the frontend should present a
particular record, that discussion is drawing us away from our simple
goal.

As much as possible, our frontend should be a simple presentation of
documents stored in an index. Presenting collections should consist of
searches across that index. Those searches should wherever possible be
for values that exist naturally in the records. Where search results
present a collection a person is interested in, they are free to
instantiate them as a collection.

We will lose the power of specificity for the sake of general
flexibility, which is good. So we just need to agree on the most basic
structure of bibjson, and a convention on what would most usefully be
found under a particular key, and then write parsers from other common
formats.

Mark

On Sun, Oct 9, 2011 at 6:08 PM, William Waites <ww at styx.org> wrote:
> On Sun, 09 Oct 2011 09:49:06 -0700, pitman at stat.Berkeley.EDU (Jim Pitman) said:
>
>    JP> Or "All
>    JP> conversations in Statistical Science" or "All obituaries in
>    JP> IMS Journals" or "All review articles in IMS Journals", these
>    JP> are all interesting collections which we should make an effort
>    JP> to accommodate.
>
> These kinds of collections would be best modelled as the results of
> queries, right? Taking that approach when possible would help
> eliminate a lot of (de)duplication problems down the road.
>
> So generally,
>
>  1 A collection can be explicitly enumerated
>
>  2 A collection can be expressed as a list of queries that returns
>    its elements
>
>    2.1 A special kind of query that might be a good candidate for
>        optimisation is the "include everything from other collection,
>        recursively"
>
>    2.2 The (1) is also expressible as a list of queries that just
>        fetch elements by their identifiers
>
> ?
>
> -w
>