[okfn-discuss] Collaborative development of text documents

Fri Aug 25 14:18:16 UTC 2006

Dear Mako (and everyone else!),

Recently I worked on a lengthy essay with some other (non-tech) people. 
The size of the piece meant that it wasn't suitable for a wiki. We chose 
to keep the work in plain ascii format (markdown) and then convert to 
html. The underlying source was kept in subversion. I would post a 
version of the text regularly; people would then make changes and post 
them back to me via email, and I would then consolidate the changes and 
repost. As a consequence of the problems encountered I did some googling 
and came across your essay on a CODEX (Collaborative Online 
Documentation (D)ifference Extractor)[1]. Having looked at this I have 
various questions which I detail below.

Regards,

Rufus

## The Problem ##

This system worked just about ok. However, there were several problems:

1. Significant energy was expended by the editor to merge changes. 
(In theory people could have merged their changes independently, but some 
of the users weren't very technically sophisticated and my prior 
experience of document development had shown that it was better to have 
the merging done centrally.)

2. Problems arose when several of the participants ended up working off 
slightly different versions (i.e. not the latest version).

3. It was hard to track changes and, in particular, to do easy diffing. At 
least one of the participants reverted to pasting text into Microsoft 
Word, editing and then posting Word files (of plain text) so as to be 
able to access track changes visualization.

Thinking about these issues I went to look on the web to see what had 
already been done. Googling around for subversion, text, diffs etc I 
came across this piece of yours from 2002 on a CODEX RFC/Proposal 
(Collaborative Online Documentation (D)ifference Extractor)[1] in which 
you distilled the problem as follows:

 > There is no good, free, version control system for documents or 
documentation. The creation of any piece of literature, especially one 
that involves multiple authors working on the document concurrently, 
involves adding, removing, and changing text constantly. There is no 
good way to keep track of these changes in the way that there is for 
source code. Existing solutions are proprietary and/or kludgey.

I think you are absolutely right here. My only addition would be to 
suggest that we can see at least four clear use cases for this kind of 
tool (plus a possible fifth):

1. Edit the underlying document (with the editor of choice)

2. Basic versioning (with logging)

3. Visualization of changes (diffs). Nice if this was integrated with 1.

4. Ability to branch, tag and merge

5. [not sure about this one] Annotation, for example to suggest or 
indicate changes but without altering the source document (as one does 
by inserting stuff like [[TODO: remove this paragraph]])
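
To make use case 5 a little more concrete, here is a minimal Python sketch of how inline annotations might be pulled out of a plain-text source so that the clean text and the notes can be viewed separately. The `[[KIND: note]]` convention is just the one used informally above, and the function name `extract_annotations` is hypothetical; nothing here comes from the CODEX proposal.

```python
import re

# Assumed convention (not a standard): annotations look like [[TODO: some note]].
ANNOTATION = re.compile(r"\[\[(\w+):\s*(.*?)\]\]", re.DOTALL)


def extract_annotations(text):
    """Return (clean_text, notes), where notes is a list of (kind, note) pairs."""
    # Collapse internal whitespace so notes wrapped across lines read as one line.
    notes = [
        (m.group(1), " ".join(m.group(2).split()))
        for m in ANNOTATION.finditer(text)
    ]
    # Remove the annotations themselves, leaving the surrounding text untouched.
    clean_text = ANNOTATION.sub("", text)
    return clean_text, notes
```

Calling `extract_annotations("Intro. [[TODO: remove this paragraph]] Rest.")` would give the text without the marker plus a single `("TODO", "remove this paragraph")` note. Whether annotations should live inside the source at all is of course part of the open question.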

## Possible Solutions ##

You discuss the two current solutions:

1. Source-level VCS (with some diff utility, perhaps wdiff)

   * Use case 1: in general editor independent. However, support for 
binary formats may be sketchy and there is no obvious method to integrate 
with most WYSIWYG editors (compare this to source code IDEs)
   * Use case 2 and 4: good
   * Use case 3: left to external tools (poor)
   * Use case 5: requires an external tool

2. Microsoft Word or Open Office (track changes)

   * Use case 1: locked to particular tool
   * Use case 2: no logging (doubts about scalability)
   * Use case 3: good
   * Use case 4: not supported
   * Use case 5: may be possible as part of track changes functionality
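
To illustrate what use case 3 needs from a source-level VCS, here is a minimal Python sketch of a word-level diff, as opposed to the line-level diff that CVS or subversion produce by default. It uses the standard library's `difflib.SequenceMatcher` and marks changes with `[-...-]` and `{+...+}` in the style of wdiff's output. The function name `word_diff` is hypothetical, and this is an illustration under those assumptions rather than a reimplementation of wdiff.

```python
import difflib


def word_diff(old, new):
    """Return a wdiff-style rendering of the word-level changes from old to new."""
    # Splitting on whitespace means re-wrapped paragraphs compare as equal.
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("The quick brown fox", "The quick red fox"))
# The quick [-brown-] {+red+} fox
```

Because the comparison is on words rather than lines, a paragraph that has merely been re-wrapped shows no changes at all, which is exactly where a plain line diff of prose becomes noisy.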

*QU:* Have I summarized the situation accurately?

*QU:* Has anyone else had any experience trying to deal with this problem?

*QU:* What about wikis?

The CODEX RFC goes on to suggest a solution:

 > I propose a robust, free version control system specifically designed 
for working with documents--especially in a asynchronous collaborative 
environment. I'll refer to this (non-existent) system as CODEX, or the 
Collaborative Online Documentation (D)ifference Extractor. The software 
will be free software and will be distributed under the terms of the GNU 
GPL. The core engine will be written in either Perl or Python.

 > Since my software will be free software, I will seek to not duplicate 
effort where-ever possible. I think that building off a system like CVS 
or subversion will be the logical first step. Since a diff will show 
every changed line, it will by default show every changed word and piece 
of white space. A contextual diff (which both CVS and subversion can 
provide) will include even more information. Either of these programs 
will be able to provide information useful for resolving conflicts and 
will provide the ability to commit, checkout, release or watch a 
project. They also both provide servers with several methods to use 
interface over a LAN or the Internet. A future version of subversion 
will allow for different client-side diff programs.

 > I do NOT want this project to involve creating a new word processor. 
There are more than enough of them, most of them bad. I would almost 
certainly create another bad one. I want my software to be able to work 
with many other word processors so that it might be picked up an 
incorporated as a back-end to existing pieces of software.

*QU:* Was any work ever done on this 'solution'?

*QU:* Are there any other projects out there that address this problem?

*QU:* What other possible solutions could be used?

[1]: http://mako.cc/projects/collablit/proposal/rfc.html