[openbiblio-dev] Bundle creation for similar and dissimilar URIs

Tue Oct 26 12:43:44 UTC 2010

Creation:

  The act is relatively straightforward, I think:

:bundleA a Bundle ;
  <bundled_with> :A ;
  <bundled_with> :B ;
  # etc
  opmv:wasGeneratedBy [
        rdf:type opmv:Process ;
        opmv:used :A ;    # or should this be a DESCRIBE :A query?
        opmv:used :B ;    # ditto
        opmv:used <uri of algorithm report, fellegi-sunter vector report> ;
        opmv:wasPerformedBy <uri to versioned code> ;
        opmv:wasControlledBy <#me>
        # ... timestamp, etc
  ] .

I could've attempted a congruent closure, but this isn't the meaning I
am attempting to show. A isn't B here - I am asserting that A and B are
facets of the same underlying thing. This is why I haven't specified a
predicate for the <bundled_with> above. A small but, I think, crucial
difference.

  A bundle is created when a decision is made that 2 or more URIs are
really different URI views of the same entity. Reaching that decision is
typically costly, so the data store for these has to be robust.
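
As a rough sketch of what one bundle record would need to carry, assuming
Python and nothing beyond the standard library. The names (Bundle, Process,
make_bundle) and the field layout are illustrative only, not existing
openbiblio code:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Process:
    """Provenance of one bundling decision (mirrors the opmv:Process above)."""
    used: tuple          # URIs the decision was based on, incl. the report
    performed_by: str    # URI of the versioned code
    controlled_by: str   # URI of the responsible agent
    timestamp: str       # ISO 8601 string


@dataclass(frozen=True)
class Bundle:
    uri: str
    members: frozenset   # the URIs that are <bundled_with> this bundle
    generated_by: Process


def make_bundle(uri, members, report, code, agent, when=None):
    """Build a bundle record; a bundle needs 2 or more distinct URIs."""
    members = frozenset(members)
    if len(members) < 2:
        raise ValueError("a bundle needs 2 or more distinct URIs")
    when = when or datetime.now(timezone.utc).isoformat()
    process = Process(
        used=tuple(sorted(members)) + (report,),
        performed_by=code,
        controlled_by=agent,
        timestamp=when,
    )
    return Bundle(uri=uri, members=members, generated_by=process)
```

The records are frozen on purpose: a costly decision is never edited in
place, only superseded by a later record.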

 - backups?
  Individual files (one per bundle operation) are expensive but easily
allow for a timeline of operations.
  Compiling a file per 'process', including OPMV triples, looks
potentially useful and practical. However, this increases the reliance
on an index/cache/triplestore for a running system.

Leaving scaling/distribution to worry about later, a compiled TriG
document containing a set of bundles looks to be a better way than the
manner I am currently using (MDOFS from ofs: "bundle:0001.xml", etc).
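
A minimal sketch of the compiled-document idea, one named graph per bundle.
It assumes the bundle vocabulary lives under a placeholder namespace
(http://example.org/bundle#, made up for illustration), that the URIs need no
escaping, and it builds the text by hand rather than through an RDF library:

```python
PREFIXES = (
    "@prefix opmv: <http://purl.org/net/opmv/ns#> .\n"
    "@prefix ex: <http://example.org/bundle#> .\n"  # placeholder namespace
    "\n"
)


def bundle_to_trig(graph, bundle, members, used, code, agent):
    """Render one bundle as a TriG named graph (all arguments are URIs)."""
    lines = ["<%s> {" % graph, "  <%s> a ex:Bundle ;" % bundle]
    for member in sorted(members):
        lines.append("    ex:bundled_with <%s> ;" % member)
    lines.append("    opmv:wasGeneratedBy [")
    lines.append("      a opmv:Process ;")
    for uri in used:
        lines.append("      opmv:used <%s> ;" % uri)
    lines.append("      opmv:wasPerformedBy <%s> ;" % code)
    lines.append("      opmv:wasControlledBy <%s>" % agent)
    lines.append("    ] .")
    lines.append("}")
    return "\n".join(lines)


def compile_trig(bundles):
    """Compile a list of bundle dicts into a single TriG document."""
    graphs = [bundle_to_trig(**b) for b in bundles]
    return PREFIXES + "\n\n".join(graphs) + "\n"
```

Calling compile_trig with a list of dicts keyed graph, bundle, members, used,
code and agent gives one file per 'process' that can be backed up as a unit
and reloaded into whatever index or triplestore the running system uses.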

Destruction of a bundle:

NB: this is typically not about deleting a bundle that was created by
accident. It happens when another costly operation, human or otherwise,
states that A is NOT based on the same thing as B.

A new bundle stating exactly the opposite is made, showing that A is
dissimilar to B. Decisions have to be made as to which process overrules
the other, e.g. the author trumps the algorithm.
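
One way the overruling could work, sketched in Python. The precedence ranks,
the tuple layout and the "latest decision wins a tie" rule are all
assumptions made for illustration; only "author trumps algorithm" comes from
the text above:

```python
# Hypothetical ranks: a higher number overrules a lower one.
PRECEDENCE = {"author": 2, "algorithm": 1}


def resolve(assertions):
    """Decide, per pair of URIs, whether they are currently bundled.

    Each assertion is (uri_a, uri_b, same, source, when), where `same` is
    True for a bundle and False for a 'dissimilar' bundle, and `when` is an
    ISO 8601 timestamp string (all in the same format and timezone).
    """
    decided = {}
    for uri_a, uri_b, same, source, when in assertions:
        pair = frozenset((uri_a, uri_b))
        # Rank by source first; between equal sources the later one wins.
        rank = (PRECEDENCE.get(source, 0), when)
        if pair not in decided or rank > decided[pair][0]:
            decided[pair] = (rank, same)
    return {pair: same for pair, (rank, same) in decided.items()}
```

Nothing is deleted here: both the original bundle and its contradiction stay
in the store, and the current answer is worked out from the full history.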

Ben