Our approach was based heavily on the Mercurial/Git conceptual model and used, as its data structure, the natural one implied by the domain model (roughly database rows, but not quite). In essence, we dump each field to JSON and then diff the JSON.

Git (and Hg) require the data to fit in memory (and usually need more memory than that), which can be a problem with large datasets.

Benoit
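
For illustration only, the "dump each field to JSON, then diff the JSON" idea could look roughly like the Python sketch below. The names `field_to_json` and `diff_record`, the sample rows, and the use of `difflib` are assumptions made for this example, not the actual implementation discussed in the thread.

```python
import difflib
import json


def field_to_json(value):
    """Serialize one field value to stable, line-oriented JSON text."""
    return json.dumps(value, indent=2, sort_keys=True)


def diff_record(old, new):
    """Return {field_name: unified diff lines} for every field that changed."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        # A field missing on one side is treated as empty text.
        old_text = field_to_json(old[name]) if name in old else ""
        new_text = field_to_json(new[name]) if name in new else ""
        if old_text == new_text:
            continue
        changes[name] = list(
            difflib.unified_diff(
                old_text.splitlines(),
                new_text.splitlines(),
                fromfile=f"a/{name}",
                tofile=f"b/{name}",
                lineterm="",
            )
        )
    return changes


# Hypothetical rows, invented purely for the example.
old_row = {"id": 1, "title": "Budget 2011", "tags": ["finance"]}
new_row = {"id": 1, "title": "Budget 2012", "tags": ["finance", "uk"]}
changes = diff_record(old_row, new_row)

for field, lines in changes.items():
    print("\n".join(lines))
```

Note that this sketch holds both versions of the record in memory at once, so it has the same limitation raised above for large datasets.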