[ckan-discuss] initial considerations for distribution of CKAN model changes

Mon Feb 22 21:07:50 GMT 2010

David Read wrote:
> John,
> 
> All sounds good.
> 
> As far as I can tell, the point of recording the tree of patches is to
> ensure patches aren't forgotten or applied twice to a branch.
> 
> Example:
> 
> Imagine three nodes in our distributed VCS: A, B and C. Let's say that
> A has a change to a package that is synced to B. Then B and C each
> make changes to the same package. Then C syncs to B and then to A. If
> C doesn't find out that B already has the patch which originated from
> A, then he will try and merge it to B's version (which B already did).

Thanks for the illustrative example.

> If you distribute the parents of all the patches then C will avoid
> this problem.

I think you're describing the Mercurial pull model. In the (proposed) 
CKAN pull model, after C pulls unseen patches from B, and before it 
pulls unseen patches from A, it will have received the patch from B 
which originated from A. So when C pulls from A, it will ignore the 
patch which B pulled from A, and will therefore not attempt to merge it 
twice.
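
To make the pull order concrete, here is a minimal Python sketch of what I 
have in mind. It assumes every patch carries a unique ID and that each node 
remembers the IDs it has already seen. The Node class, its methods and the 
patch IDs are all made up for illustration and are not existing CKAN code, 
and the sketch only shows the duplicate detection, not the merging itself.

```python
class Node:
    """A hypothetical CKAN-like node that remembers every patch ID it has seen."""

    def __init__(self, name):
        self.name = name
        self.patches = {}   # patch ID -> patch payload
        self.applied = []   # patch IDs in the order they were applied here

    def commit(self, patch_id, payload):
        """Record a patch as applied on this node."""
        self.patches[patch_id] = payload
        self.applied.append(patch_id)

    def pull(self, other):
        """Apply only the patches from `other` that this node has not seen.

        Returns the list of patch IDs that were newly applied.
        """
        unseen = [pid for pid in other.applied if pid not in self.patches]
        for pid in unseen:
            self.commit(pid, other.patches[pid])
        return unseen


# The scenario from the example: A's change is synced to B, then B and C
# each change the same package, then C pulls from B and afterwards from A.
a, b, c = Node("A"), Node("B"), Node("C")
a.commit("a1", {"title": "changed on A"})
b.pull(a)
b.commit("b1", {"notes": "changed on B"})
c.commit("c1", {"tags": "changed on C"})

from_b = c.pull(b)   # C receives A's patch via B, plus B's own patch
from_a = c.pull(a)   # A's patch is already known to C, so nothing is applied
```

With these assumptions the second pull finds nothing new, which is the 
behaviour described above: the patch that originated on A reaches C once, 
by way of B.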

The Mercurial situation and the CKAN situation are somewhat converse: 
Mercurial puts in its repository a series of differences (albeit with 
full copies stored every few revisions to speed construction of the 
working directory) and on request can construct a full working copy in the 
working directory; whereas CKAN has in its repository a full working 
copy (albeit with a copy of every revision) and on request can 
construct differences between the revisions.
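
As a rough illustration of the CKAN side of that comparison, the Python 
sketch below keeps a full copy of every revision and works out the 
differences only when asked. It assumes a revision can be treated as a flat 
dictionary of fields. The diff_revisions function and the sample package 
data are hypothetical and are not taken from the CKAN code.

```python
def diff_revisions(old, new):
    """Return {field: (old_value, new_value)} for every field that differs.

    A field missing from one revision is reported with None on that side.
    """
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed


# Hypothetical stored history: each entry is a complete copy of the package.
revisions = [
    {"name": "example-package", "title": "Example", "license": "OKD"},
    {"name": "example-package", "title": "Example data", "license": "OKD"},
]

# The difference is constructed on request from two full copies.
delta = diff_revisions(revisions[0], revisions[1])
```

Here delta holds only the title field with its old and new values, which is 
the kind of difference that could be sent to another node as a patch.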

The Mercurial-CKAN comparison breaks down because CKAN's working copy is 
identically its repository, and so it appears to make sense either to 
imagine that we are patching a Mercurial working directory from (a queue 
of) patches and then immediately committing the changes to an unbranched 
repository, or to imagine that we are a central Mercurial repository 
without a working directory, and simply receive changes and apply them 
to the (unbranched) repository.

Tracing paths of Mercurial usage that don't create branches has been 
illuminating. However, following the branch-creating pull mechanism of 
Mercurial is misleading. Also, merging appears much more central to 
distributed version control than branching. I'd venture that just 
because there are no branches doesn't mean it isn't distributed version 
control.

> 
> Of course you don't need this if you stick to a centralised VCS
> model or star network. Maybe this is a good simplification for what we
> need.

I reckon my proposal would work in other configurations too.

> 
> Good proposal for license service. I wonder if it would be better
> providing this data as RDF?

The data format of the entity message could vary?

J.

PS Please read all this as speculation - I'm just thinking about it 
rather than trying necessarily to persuade anybody to my adopted position.

> 
> David