[okfn-help] ORDF feedback and questions

William Waites william.waites at okfn.org
Mon Jun 7 14:43:22 BST 2010


On 10-06-07 14:30, Alistair Turnbull wrote:
> This class seems to really be two separate types, depending on whether
> it is frozen or not. The distinction between the two types is
> analogous to the distinction between a `str` (frozen) and a `StringIO`
> (unfrozen). The two types have different methods; e.g. the unfrozen
> version has `diff()`, `commit()` and `rollback()` while the frozen
> version has `changeDate`. There might be some benefit from refactoring
> this class into two classes?

Quite possibly.

> Trivial: Is there a reason why `reify()` takes a quadruple but
> `disembody()` returns a triple and a context?

This should be made symmetric. The form of disembody() is more in line with
how rdflib tends to implement this pattern, but reify() is more "progressive"
in that it thinks in terms of n-tuples instead of triples and quads. By the
way, if you have a better word than "disembody" as the opposite of "reify",
I'm all ears.
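
For illustration only, a symmetric pair might look roughly like the sketch
below. It is plain Python rather than rdflib, and the `Quad` type, the dict
layout and both function bodies are hypothetical stand-ins, not the actual
ORDF signatures. The point is only that both directions use the same 4-tuple.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A statement plus its context (hypothetical helper type)."""
    s: str
    p: str
    o: str
    c: str  # context, i.e. the identifier of the graph the triple lives in


def reify(quad: Quad) -> dict:
    # Describe the statement as data. The dict layout is an assumption.
    return {"subject": quad.s, "predicate": quad.p,
            "object": quad.o, "context": quad.c}


def disembody(statement: dict) -> Quad:
    # Inverse of reify(): hands back a quad, not a (triple, context) pair.
    return Quad(statement["subject"], statement["predicate"],
                statement["object"], statement["context"])


q = Quad("ex:a", "ex:b", "ex:c", "ex:g")
roundtrip = disembody(reify(q))
```

With this shape, `disembody(reify(q))` hands back the quad it started from.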

> Trivial: Did you intend to cache `metadata`? And why does it return a
> Graph?

It returns a subset of the changeset graph without the actual changes. It
could quite easily be cached. It will be an empty graph if the changeset
hasn't been frozen, and will only contain anything once it has been
commit()ed.
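
A rough sketch of what caching could look like, assuming the metadata only
needs to be computed once the changeset is frozen. The class name, the
`frozen` flag, the `change_nodes` argument and the set-of-tuples
representation are all made up for the example and are not the ORDF
implementation.

```python
class ChangeSetSketch:
    """Toy changeset: a set of (s, p, o) tuples plus a frozen flag."""

    def __init__(self, triples, change_nodes):
        self.triples = set(triples)
        # Subjects of the reified change statements (assumed to be known).
        self.change_nodes = set(change_nodes)
        self.frozen = False
        self._metadata = None
        self.computed = 0  # counts how often the subset is actually built

    @property
    def metadata(self):
        if not self.frozen:
            # Nothing to report yet, and nothing worth caching.
            return set()
        if self._metadata is None:
            self.computed += 1
            self._metadata = {t for t in self.triples
                              if t[0] not in self.change_nodes}
        return self._metadata


cs = ChangeSetSketch(
    triples=[("ex:cs1", "ex:creator", "ex:ww"),
             ("ex:change1", "ex:subject", "ex:a")],
    change_nodes=["ex:change1"],
)
```

Caching only after freezing avoids handing out a stale empty result for a
changeset that is still being built.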

> The implementation of `undo()` appears to be correct only if (1) no
> ChangeSet can both add and remove the same triple, or (2) a Graph can
> hold more than one copy of a given triple. Which of those is the case?
> I'm guessing the former. If so, I think it should say so in a
> docstring somewhere, as it's not an easy invariant to deduce from the
> code.

Interesting corner case. There is no ordering to the changes within a
changeset, so adding and removing the same triple in one changeset has
behaviour that is not well defined.
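
The corner case can be reproduced with plain Python sets standing in for
graphs. This is only a sketch: `apply_changes` and `undo_changes` are
hypothetical helpers, and the "removals first, then additions" order is an
arbitrary assumption, since the changeset itself defines none.

```python
def apply_changes(graph, additions, removals):
    # Assumed order: removals first, then additions.
    return (graph - removals) | additions


def undo_changes(graph, additions, removals):
    # Undo swaps the roles: take out what was added, put back what was removed.
    return (graph - additions) | removals


t = ("ex:s", "ex:p", "ex:o")
u = ("ex:s", "ex:p", "ex:other")

# Disjoint additions and removals: undo restores the original graph.
before = frozenset({u})
after = apply_changes(before, {t}, {u})
restored_ok = undo_changes(after, {t}, {u}) == before

# The same triple both added and removed: the graph did not contain t
# to begin with, but it does after apply followed by undo.
empty = frozenset()
after_clash = apply_changes(empty, {t}, {t})
restored_clash = undo_changes(after_clash, {t}, {t})
```

Since a set holds at most one copy of a triple, the round trip only works
when a changeset never adds and removes the same triple.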

> What is the difference between a CS.changeSet and an ORDF.changeSet?
> I'm missing something here.

Everything should be ORDF.changeSet. CS is Talis' namespace and since they
seem completely unresponsive w.r.t. adding things to their vocabulary we put
it in our own. You've found an incomplete search-and-replace on my part.

> Have I understood correctly?:
>
>  - A version-controlled graph `g` has zero or more triples `(g,
> ORDF.changeSet, cs)` where `cs` is (the identifier of) a most recent
> ChangeSet in the construction of `g`. These triples are analogous to
> the "droppings" that a filesystem VCS leaves lying around, e.g.
> directories called "CVS" or ".hg".

More like the practice of putting $RCS Id: $ in version controlled files so
that you can look at the file and see what version it is from its content.

>   - When `cs` is applied to `g`, it explicitly removes the old
> droppings and explicitly creates the new droppings.

Correct.

>   - `cs` contains roughly five triples for every triple added to or
> removed from `g`.

Correct.

>   - Consequently, despite being referred to from `g`, the triples that
> comprise `cs` must not be stored in `g`!

Correct. The changeset contains reified versions of (some of) the triples
that are in g.

>   - Each ChangeSet `cs` has zero or more triples `(cs,
> CS.precedingChangeSet, pcs)` where `pcs` is (the identifier of) an
> immediately preceding ChangeSet in the construction of `g`.

Correct.

>   - The triples that comprise `pcs` are not stored in `cs`.

Correct.

> I realise I'm diving into detail that the documentation says I should
> be ignoring, but I am finding that I am unable to proceed without some
> of the above facts, especially:

I didn't mean for the documentation to imply that you shouldn't be looking
at these details!

>  - ChangeSets are Graphs.

Yes!

>   - Version-controlled Graphs have droppings, which refer to the most
> recent ChangeSet. ChangeSets in turn refer to each other to form a DAG
> (or is it just a linked list?).

It can be a DAG, especially since a changeset that contains changes for two
or more graphs may have several ancestors.

>   - The droppings do not quote the ChangeSet, but instead refer to it
> as a separate Graph. Indeed, it *must* be a separate Graph.

Correct.
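
The points above can be pictured with a toy store in which every graph,
changesets included, is kept under its own identifier. This is a sketch in
plain Python, not rdflib: the `store` dict, the string property names and
the `history` helper are assumptions made for the example.

```python
ORDF_CHANGESET = "ordf:changeSet"           # marker triple on a graph
PRECEDING = "ordf:precedingChangeSet"       # placeholder property name

# Each entry is a separate graph: a set of (s, p, o) tuples.
store = {
    "ex:g1": {("ex:g1", ORDF_CHANGESET, "ex:cs3"),
              ("ex:a", "ex:knows", "ex:b")},
    "ex:g2": {("ex:g2", ORDF_CHANGESET, "ex:cs3")},
    "ex:cs1": set(),
    "ex:cs2": set(),
    # cs3 touched both graphs, so it has two ancestors: a DAG, not a list.
    "ex:cs3": {("ex:cs3", PRECEDING, "ex:cs1"),
               ("ex:cs3", PRECEDING, "ex:cs2")},
}


def objects(graph_id, subject, predicate):
    # Objects of matching triples inside one graph, sorted for stable output.
    return sorted(o for s, p, o in store[graph_id]
                  if s == subject and p == predicate)


def history(graph_id):
    """Changesets reachable from a graph's marker, breadth first."""
    seen = []
    queue = objects(graph_id, graph_id, ORDF_CHANGESET)
    while queue:
        cs = queue.pop(0)
        if cs in seen:
            continue
        seen.append(cs)
        # Follow the links stored in the changeset's own graph.
        queue.extend(objects(cs, cs, PRECEDING))
    return seen


g1_history = history("ex:g1")
```

The graph `ex:g1` only names its latest changeset; the changeset triples
themselves live under `ex:cs3` and are looked up separately.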

> I suggest these might be candidates for clarification in the docs. It
> all makes sense, and indeed the design is probably quite well
> constrained, but it is not obvious!

Correct again monsieur! ;)

>
> Bug report: ChangeSet.diff() removes the "droppings" twice.
>
> Bug report: ChangeSet.diff() modifies `new` (removes its droppings).

Shall have a look at this. Indeed, this may have something to do with why
the .construct method on the handler is failing its unit tests at the
moment.
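
For reference, one way to avoid both reported symptoms is to strip the
marker triples from copies rather than from the inputs. The sketch below
uses plain Python sets and is not the actual ChangeSet.diff() code; `MARKER`,
`without_markers` and `diff` are hypothetical names.

```python
MARKER = "ordf:changeSet"


def without_markers(graph):
    # Build a new set; the graph passed in is left untouched.
    return {t for t in graph if t[1] != MARKER}


def diff(old, new):
    # Strip the markers exactly once per input, on copies.
    old_clean = without_markers(old)
    new_clean = without_markers(new)
    added = new_clean - old_clean
    removed = old_clean - new_clean
    return added, removed


old = {("ex:g", MARKER, "ex:cs1"), ("ex:a", "ex:p", "ex:b")}
new = {("ex:g", MARKER, "ex:cs1"), ("ex:a", "ex:p", "ex:c")}
snapshot = set(new)
added, removed = diff(old, new)
```

After the call, `new` still carries its marker triple, and the marker shows
up in neither `added` nor `removed`.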

Cheers,
-w

-- 
William Waites           <william.waites at okfn.org>
Mob: +44 789 798 9965    Open Knowledge Foundation
Fax: +44 131 464 4948                Edinburgh, UK


