[ckan-dev] moderated edits c(r)ep. http://trac.ckan.org/ticket/1129

Wed May 11 09:57:00 UTC 2011

On Mon, May 9, 2011 at 9:35 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:

> On 9 May 2011 00:44, David Raznick <kindly at gmail.com> wrote:
> >> I'm personally of the view that this fits very naturally with the
> >> proposed new 'changeset' vdm model. In that model it is natural and
> >> relatively easy to have changesets that have been 'created' but not
> >> 'applied' to the 'working copy' (i.e. continuity objects).
> >
> > I do not see how this would make any difference in this case.   See below
> > for my explanation of this.
> >>
> >> This is more along the lines of your option 1.
> >>
> >> You don't seem to favour this :-)
> >
> >
> > I am on the fence, to be honest. Proposal 1 is quick and dirty and will do
> > mostly what we need, if we are prepared to throw away all our 'pending'
> > changes whenever we change our schema and if we do not care about looking
> > at historical changes. I do think that proposal 1 should be kept separate
> > from vdm entirely, as they do different things.
>
> Why does it prevent us from looking at historical changes? I don't
> understand ...
>

It prevents us from looking at historical changes in the dictized format. We
could still look at historical changes in vdm, of course.
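To make the distinction concrete, here is a minimal sketch of rebuilding dictized views for any revision from per-table revision rows, using the anna2 package/resource example quoted below. It assumes a simplified layout: the table names, the integer `revision_id` and the helpers `as_of`, `snapshot`, `package_dict` and `resource_dict` are hypothetical, not vdm's actual schema or API, and deletions are ignored.

```python
# Hypothetical per-table revision rows: one full row snapshot per
# (primary key, revision). Only the rows a revision touched are stored.
package_revision = [
    {"id": "afafaff", "name": "anna2", "revision_id": 1},
]
resource_revision = [
    {"id": "fafafaff", "url": "http://www.annakarenina.com/", "revision_id": 1},
    {"id": "fafafafa", "url": "http://www.annakarenina.com/index.json", "revision_id": 1},
    # Revision 2 changed *just* this resource; no package row was written.
    {"id": "fafafaff", "url": "http://differenturl", "revision_id": 2},
]
# Hypothetical many-to-many link table between packages and resources.
package_resource_revision = [
    {"package_id": "afafaff", "resource_id": "fafafaff", "revision_id": 1},
    {"package_id": "afafaff", "resource_id": "fafafafa", "revision_id": 1},
]


def as_of(rows, key, revision):
    """Latest version of each row at or before `revision` (deletions ignored)."""
    latest = {}
    for row in sorted(rows, key=lambda r: r["revision_id"]):
        if row["revision_id"] <= revision:
            latest[key(row)] = row  # later revisions overwrite earlier ones
    return latest


def snapshot(revision):
    """Working copy of every table as it stood at `revision`."""
    return (
        as_of(package_revision, lambda r: r["id"], revision),
        as_of(resource_revision, lambda r: r["id"], revision),
        as_of(package_resource_revision,
              lambda r: (r["package_id"], r["resource_id"]), revision),
    )


def package_dict(package_id, revision):
    """Package-centric view: the package with its resources nested."""
    packages, resources, links = snapshot(revision)
    pkg = packages[package_id]
    return {
        "name": pkg["name"],
        "id": pkg["id"],
        "resources": [
            {"id": rid, "url": resources[rid]["url"]}
            for (pid, rid) in links
            if pid == package_id
        ],
    }


def resource_dict(resource_id, revision):
    """Resource-centric view built from the *same* per-table rows."""
    packages, resources, links = snapshot(revision)
    res = resources[resource_id]
    return {
        "id": res["id"],
        "url": res["url"],
        "packages": [
            {"name": packages[pid]["name"], "id": pid}
            for (pid, rid) in links
            if rid == resource_id
        ],
    }


print(package_dict("afafaff", 1))
print(package_dict("afafaff", 2))
print(resource_dict("fafafaff", 2))
```

No whole package dict is ever stored here; each view is derived on demand from per-table rows, so a new way of looking at the data does not require having stored it pre-emptively.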

> > I will try to explain where I am coming from with an example. Say we have
> > a simplified package dict with a many-to-many relationship with resources:
> >
> > {'name': u'anna2',
> > 'id': u'afafaff',
> > 'resources': [{'id': u'fafafaff',
> >                'url': u'http://www.annakarenina.com/'},
> >               {'id': u'fafafafa',
> >                'url': u'http://www.annakarenina.com/index.json'}]
> > }
> >
> > Say somebody changes a resource but leaves the package intact, so the dict
> > changes to:
> >
> > {'name': u'anna2',
> > 'id': u'afafaff',
> > 'resources': [{'id': u'fafafaff',
> >                'url': u'http://differenturl'},
> >               {'id': u'fafafafa',
> >                'url': u'http://www.annakarenina.com/index.json'}]
> > }
> >
> > Both the new and the old vdm will store changes to *just* the resource.
> > There is no getting round that unless the resource has a way of signalling
> > the package to make a whole new package dict to store. This is my proposal
> > 1: to do this signalling manually and selectively in the logic layer. vdm
> > cannot guess what to signal, nor should it.
> > In my opinion our revisioning system (vdm) should not store package dicts
> > like that, as they are fragile to change. It should only store changes to
> > individual tables, as we currently do.
>
> Right, that's agreed :-) (I wasn't proposing any different ...)
>
> > To demonstrate this, say that at some point you try to look at the data
> > the other way round, treating resources as the primary object. You want
> > the resource dict to look like this:
> >
> > {'id': u'fafafaff',
> >  'url': u'http://differenturl',
> >  'packages': [{'name': u'anna2',
> >                'id': u'afafaff'}
> >               ]
> > }
> > If we have not pre-emptively stored the resource dict like this, then
> > reproducing it will be very hard if all we have are the above package
> > dicts. It will be a lot easier to reproduce if we make sure we store each
> > table separately.
> >
> > Proposal 2 gives us a way of producing these dicts historically for any
> > way we decide to look at the data. The new changeset model in vdm makes
> > that hard, as we will need to join on keys contained in the
> > change_object_dicts; it is much nicer and faster if the data is in an
> > indexable table, as it currently is.
>
> No, that's not true. The new changeset model has a changeobject table
> which has, as a column (and hence indexable), a 'primary key' for that
> object (to be absolutely correct, it is a munge of object_type (e.g.
> package) and the original primary key for that object -- we could split
> those out in our implementation if we wished). As such, looking up
> changeobjects is no different from looking up revision tables as we
> currently would.
>

The table does not have an index on the foreign keys contained in the
change_object_dict, though. In my tests of the different query strategies I
had to add an index on the package_id foreign key in the
package_extras_revision table, otherwise the query took much longer.
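One way to see what such an index buys is to compare SQLite's EXPLAIN QUERY PLAN output before and after creating it. This is only an illustrative sketch: the package_extras_revision columns, the generated rows and the index name idx_per_package_id are assumptions, not CKAN's actual schema, and the original tests were not necessarily run on SQLite. The exact wording of the plan varies between SQLite versions.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified revision table: package_id is the foreign key.
conn.execute("""
    CREATE TABLE package_extras_revision (
        id TEXT,
        revision_id TEXT,
        package_id TEXT,
        key TEXT,
        value TEXT,
        PRIMARY KEY (id, revision_id)
    )
""")
conn.executemany(
    "INSERT INTO package_extras_revision VALUES (?, ?, ?, ?, ?)",
    [(f"extra{i}", f"rev{i}", f"pkg{i % 100}", "k", "v") for i in range(1000)],
)

query = "SELECT key, value FROM package_extras_revision WHERE package_id = ?"

# Without an index on package_id, SQLite has to scan the whole table.
before = plan(conn, query, ("pkg7",))

conn.execute(
    "CREATE INDEX idx_per_package_id ON package_extras_revision (package_id)"
)
# With the index, the same lookup becomes an index search.
after = plan(conn, query, ("pkg7",))

rows = conn.execute(query, ("pkg7",)).fetchall()
print(before)
print(after)
print(len(rows))
```

The result set is identical either way; only the access path changes, which is what makes the difference on a large revision table.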

The changeset model would work with the following provisos:

* No foreign key can change.
* We are willing to do an extra join for each relationship we have, i.e. the
join will first be to the continuity object and then back to the
change_object table.
* No continuity object can get deleted.

These, I admit, are not too unreasonable. However, querying is more
cumbersome because of the extra join. It would most likely be slower too,
both because of the extra join and because you would always be joining back
to the *very large* change_object table.
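The extra join in the second proviso can be sketched as follows. This assumes a changeset-style change_object table whose key is split into (object_type, object_id) columns, as suggested in the quoted reply, while foreign keys such as package_id live only inside a serialized data column. The schema, the JSON encoding and the record() helper are hypothetical and are not vdm's actual changeset implementation; SQLite is used purely for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Continuity ("working copy") table: package_id is a real, indexed column.
    CREATE TABLE resource (
        id TEXT PRIMARY KEY,
        package_id TEXT,
        url TEXT
    );
    CREATE INDEX idx_resource_package_id ON resource (package_id);

    -- Changeset-style store: one row per changed object per changeset.
    -- The object key is indexable, but package_id is only inside `data`.
    CREATE TABLE change_object (
        changeset_id INTEGER,
        object_type TEXT,
        object_id TEXT,
        data TEXT
    );
    CREATE INDEX idx_change_object_key ON change_object (object_type, object_id);
""")


def record(changeset_id, res):
    """Hypothetical helper: log a change_object row, update the continuity row."""
    conn.execute(
        "INSERT INTO change_object VALUES (?, 'resource', ?, ?)",
        (changeset_id, res["id"], json.dumps(res)),
    )
    conn.execute(
        "INSERT OR REPLACE INTO resource VALUES (?, ?, ?)",
        (res["id"], res["package_id"], res["url"]),
    )


record(1, {"id": "fafafaff", "package_id": "afafaff", "url": "http://www.annakarenina.com/"})
record(1, {"id": "fafafafa", "package_id": "afafaff", "url": "http://www.annakarenina.com/index.json"})
record(2, {"id": "fafafaff", "package_id": "afafaff", "url": "http://differenturl"})

# change_object cannot be filtered on package_id directly, so filter the
# continuity table on its indexed package_id and join back to change_object.
# This is only correct if package_id never changes and no resource row is
# ever deleted from the continuity table.
rows = conn.execute(
    """
    SELECT co.changeset_id, co.object_id, co.data
    FROM resource AS r
    JOIN change_object AS co
      ON co.object_type = 'resource' AND co.object_id = r.id
    WHERE r.package_id = ?
    ORDER BY co.changeset_id, co.object_id
    """,
    ("afafaff",),
).fetchall()

history = [(cs, oid, json.loads(data)["url"]) for cs, oid, data in rows]
for entry in history:
    print(entry)
```

With per-table revision tables the same question is a single indexed lookup on resource_revision.package_id; here every relationship needs the detour through the continuity table, and in a real system change_object would also hold rows for every other object type.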

>
> > If you have a new proposal other than these two that uses the changeset
> > model, then please add it to the c(r)ep :)
>
> Will do but basically I'm just trying to clarify the pros and cons
> around 1 vs 2 and their relationship to vdm :-)
>
> Rufus
>