[okfn-help] vdm: strange behavior

Rufus Pollock rufus.pollock at okfn.org
Sat Aug 21 08:44:50 UTC 2010


On 20 August 2010 13:58, Martijn Faassen <faassen at startifact.com> wrote:
[...]
> This is the meat of  get_as_of():
>
>            revision_class = self.__revision_class__
>            # TODO: when dealing with multi-col pks will need to update this
>            # (or just use continuity)
>            out = sess.query(revision_class).join('revision').\
>                filter(
>                    Revision.timestamp <= revision.timestamp
>                ).\
>                filter(
>                    revision_class.id == self.id
>                ).\
>                order_by(
>                    Revision.timestamp.desc()
>                )
>            return out.first()
>
> revision_class is the class of the object representing the revision of
> my class. Both my class and the revision class are ORM mapped. I'll
> call that the 'revision' here, even though it's distinct from the
> Revision that can be joined to it.
>
> This line:
>
>                     Revision.timestamp <= revision.timestamp
>
> Makes sure we get only revisions that happened before (or at the same
> time) as the revision we're checking.
>
> This then sorts the revisions so that the most recent revision comes first.
>
>                order_by(
>                    Revision.timestamp.desc()
>                )
>
> and this then takes the most recent revision.
>
>            return out.first()
>
> This:
>
>                filter(
>                    revision_class.id == self.id
>                ).\
>
> makes sure we actually get the revision of the right object,
> identifying it with the 'id' column. (which in fact makes vdm require
> the use of a 'id' for the primary key if I'm not mistaken).

Yes it does. To support multi-column primary keys we'd need to generalise this.

> But in my tests, the timestamp of the revisions is always identical;
> the test runs so fast multiple revisions get made in the same second,
> and MySQL at least only stores timestamps with second granularity. As
> a result, it can easily happen that instead of
> the correct previous revision, we get the revision we just committed,
> as they both happened in the same second.

Great debugging Martijn! I'm surprised about this feature in MySQL --
I thought most systems did at least microseconds.

One way to resolve this would be to introduce a revision 'number'
attribute which is an auto-incrementing integer. In fact up until v0.6
revision id was an auto-incrementing number so you had this for free.
However we switched to uuids to better support shares revisions
between different repositories. However we could reintroduce this and
it would resolve any ambiguity issues in the time stamps.

[...]

> Is instead this not possible? After all, we know the revision with that
> particular timestamp must be in the database.
>
>                Revision.timestamp == revision.timestamp
>
> of course that wouldn't solve our problem, where there are two
> revisions in the exact same timestamp.
>
> Instead, I'm seriously confused why the code does all this timestamp
> stuff and ordering anyway, while we have a perfectly good revision id.
> When I add this extra filter:
>
>   filter(
>      revision_class.revision_id == revision.id)
>   ).
>
> everything seems to work reliably. But why isn't the query simply this?
>
> revision_class = self.__revision_class__
>
> return sess.query(revision_class).filter(revision_class.id ==
> self.id).filter(revision_class.revision_id == revision_id).first()
>
> this appears to me more reliable and more efficient to boot (assuming
> indexes exist),
> and a join isn't even necessary! And trying it out it appears to work.
>
> There must be a reason, so please enlighten me.

the answer is simple: you are not guaranteed this specific object you
are looking at was changed in the revision you are using for
get_as_of.

Consider the simple case:
  * Domain object: Book, with two instances Book A, Book B
  * In revision 1 you create Book A and Book B
  * In revision 2 you edit Book A
  * In revision 3 you edit Book B
  * Now do get_as_of with revision 2 for Book B
  * What should you get?

Answer: Book B in revision 1. Book B was not changed in revision 2 so
it won't have an associated BookRevision with revision_id =
revision2.id. Instead we search for the last revision at which Book B
was changed prior to revision 2: i.e. we look for: first revision <=
revision 2 which is Book B's case is revision 1.

The point is that each you new revision we only create ObjectRevisions
for objects that changed in that revision. Thus, there is no guarantee
for a given object that it has an ObjectRevision with a revision id
preciesely equal to that revision.

Does this make sense?

Rufus




More information about the okfn-help mailing list