[okfn-discuss] Re: Collaborative development of text documents

Benj. Mako Hill mako at atdot.cc
Thu Feb 1 23:18:52 UTC 2007

<quote who="Rufus Pollock" date="Fri, Aug 25, 2006 at 03:18:16PM +0100">
> Recently I worked on a lengthy essay with some other (non-tech) people. 
> The size of the piece meant that it wasn't suitable for a wiki. We chose 
> to keep the work in plain ASCII format (markdown) and then convert to 
> html. The underlying source was kept in subversion. I would post a 
> version of the text regularly; people would then make changes, post 
> them back to me via email, and I would then consolidate the changes and 
> repost. As a consequence of the problems encountered I did some googling 
> and came across your essay on a CODEX (Collaborative Online 
> Documentation (D)ifference Extractor)[1]. Having looked at this I have 
> various questions which I detail below.

Sorry for the (very) late response to this. I've been settling on an MIT
Media Lab thesis topic and have basically decided to run with a
project that builds on my ideas in CODEX (probably with a different

I'll answer your questions below.

> > There is no good, free, version control system for documents or 
> > documentation. The creation of any piece of literature, especially one 
> > that involves multiple authors working on the document concurrently, 
> > involves adding, removing, and changing text constantly. There is no 
> > good way to keep track of these changes in the way that there is for 
> > source code. Existing solutions are proprietary and/or kludgey.
> I think you are absolutely right here. My only addition would be to 
> suggest that we can see at least four clear use cases for this kind of 
> tool:
> 1. Edit the underlying document (with the editor of choice)
> 2. Basic versioning (with logging)
> 3. Visualization of changes (diffs). Nice if this was integrated with 1.
> 4. Ability to branch, tag and merge

When I wrote the original CODEX proposal, my focus was on the first
three. With time, I think those problems have been addressed in
(sometimes) reasonable ways. The fourth has, almost entirely, not
been. It's a difficult problem from a variety of perspectives.

> 5. [not sure about this one] Annotation, for example to suggest or 
> indicate changes but without altering the source document (as one does 
> by inserting stuff like [[TODO: remove this paragraph]])

I'm not dealing with this problem but it's clearly something people

> You discuss the two current solutions:
> 1. Source-level VCS (with some diff utility, perhaps wdiff)

I'm pretty convinced that it needs to be a separate, application-specific
tool (although probably one that calls existing libraries,
storage formats, and intermediary diff representation formats).

>   * Use case 1: generally editor-independent. However, support for 
> binary formats may be sketchy, and there is no obvious way to integrate 
> with most WYSIWYG editors (compare this to source code IDEs)
>   * Use case 2 and 4: good
>   * Use case 3: left to external tools (poor)
>   * Use case 5: requires external tool
> 2. Microsoft Word or Open Office (track changes)
>   * Use case 1: locked to particular tool
>   * Use case 2: no logging (doubts about scalability)
>   * Use case 3: good
>   * Use case 4: not supported
>   * Use case 5: may be possible as part of track changes functionality
> *QU:* Have I summarized the situation accurately?
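
To make use case 3 a little more concrete: a word-level diff, of the
kind wdiff produces, can be sketched in a few lines of Python with the
standard difflib module. This is only an illustration (the name
word_diff is made up here, and whitespace is collapsed to single
spaces), not something CODEX implemented. Deletions are marked [-...-]
and insertions {+...+}, the same markers GNU wdiff uses by default.

```python
from difflib import SequenceMatcher


def word_diff(old, new):
    """Hypothetical helper: diff two texts word by word.

    Returns the merged text with deleted words wrapped in [-...-] and
    inserted words wrapped in {+...+}. Whitespace is normalized, so
    changes that only affect spacing or line breaks are not shown.
    """
    a, b = old.split(), new.split()
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)
```

For example, word_diff("the quick brown fox", "the quick red fox")
gives "the quick [-brown-] {+red+} fox". Working on words rather than
lines is the point: a one-word edit in a long reflowed paragraph shows
up as one change instead of a whole changed line.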


> *QU:* Has anyone else had any experience trying to deal with this
> problem?

Everyone has AFAICT. Maintaining multiple copies of a resumé or CV
introduces this issue.

> *QU:* What about wikis?

Sure. Wikis are famously bad at handling diverged documents. At the
very least, a tool like this could be used to help resolve edit
conflicts -- something that any serious wiki editor knows about and
hates. Ideally, it will allow you to do much, *much* more.
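
As a rough illustration of what resolving such an edit conflict
involves, here is a simplified three-way merge in the spirit of diff3,
working on words instead of lines. It is a hypothetical sketch on top
of Python's difflib (merge3 and the git-style conflict markers are my
choices, not part of CODEX or any wiki engine): edits that two authors
made to different parts of a common base are combined automatically,
and only regions that both changed in different ways are flagged.

```python
from difflib import SequenceMatcher


def _matches(base, other):
    """Map each base index to the index of the token in `other` that
    SequenceMatcher aligned it with; unaligned indices are absent."""
    m = {}
    for blk in SequenceMatcher(None, base, other, autojunk=False).get_matching_blocks():
        for k in range(blk.size):
            m[blk.a + k] = blk.b + k
    return m


def merge3(base, a, b):
    """Hypothetical diff3-style merge of token lists.

    Returns (merged_tokens, conflict_count). Conflicting regions are
    wrapped in <<<<<<< / ======= / >>>>>>> markers (A first, then B).
    """
    ma, mb = _matches(base, a), _matches(base, b)
    out, conflicts = [], 0
    lo = la = lb = 0  # tokens consumed so far from base, a and b
    while True:
        # Stable run: tokens that both sides kept unchanged, in place.
        k = 0
        while (lo + k < len(base)
               and ma.get(lo + k) == la + k
               and mb.get(lo + k) == lb + k):
            k += 1
        if k:
            out.extend(base[lo:lo + k])
            lo, la, lb = lo + k, la + k, lb + k
            continue
        # Unstable chunk: runs up to the next base token both sides kept.
        o = next((i for i in range(lo, len(base)) if i in ma and i in mb), None)
        if o is None:
            chunk_o, chunk_a, chunk_b = base[lo:], a[la:], b[lb:]
        else:
            chunk_o, chunk_a, chunk_b = base[lo:o], a[la:ma[o]], b[lb:mb[o]]
        if chunk_a == chunk_o:
            out.extend(chunk_b)        # only B changed this region
        elif chunk_b == chunk_o or chunk_a == chunk_b:
            out.extend(chunk_a)        # only A changed, or both made the same edit
        else:
            conflicts += 1             # both changed it differently
            out += ["<<<<<<<"] + chunk_a + ["======="] + chunk_b + [">>>>>>>"]
        if o is None:
            return out, conflicts
        lo, la, lb = o, ma[o], mb[o]
```

Usage: " ".join(merge3(base.split(), a.split(), b.split())[0]). With
base "the quick brown fox jumps", one editor changing "brown" to "red"
and another changing "jumps" to "leaps", the result is "the quick red
fox leaps" with no conflict, where a line-based merge of a single-line
paragraph would have reported one.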

> The CODEX RFC goes on to suggest a solution:
> > I propose a robust, free version control system specifically
> > designed for working with documents--especially in an asynchronous
> > collaborative environment. I'll refer to this (non-existent)
> > system as CODEX, or the Collaborative Online Documentation
> > (D)ifference Extractor. The software will be free software and
> > will be distributed under the terms of the GNU GPL. The core
> > engine will be written in either Perl or Python.
> >
> > Since my software will be free software, I will seek to not
> > duplicate effort wherever possible. I think that building off a
> > system like CVS or Subversion will be the logical first step.
> > Since a diff will show every changed line, it will by default show
> > every changed word and piece of white space. A contextual diff
> > (which both CVS and Subversion can provide) will include even more
> > information. Either of these programs will be able to provide
> > information useful for resolving conflicts and will provide the
> > ability to commit, check out, release or watch a project. They
> > also both provide servers with several methods to interface over a
> > LAN or the Internet. A future version of Subversion will allow for
> > different client-side diff programs.
> >
> > I do NOT want this project to involve creating a new word
> > processor. There are more than enough of them, most of them bad. I
> > would almost certainly create another bad one. I want my software
> > to be able to work with many other word processors so that it might
> > be picked up and incorporated as a back-end to existing pieces of
> > software.
> *QU:* Was any work ever done on this 'solution'?

Some, but it was never usable. With time, I think other projects have
addressed many of these problems.

> *QU:* Are there any other projects out there that address this problem?

There is a whole set of proprietary applications that are designed to
address this issue for documentation.

There are also quite a few merge tools that are designed for code and
that only a coder could love. Eclipse, for example, has a good
one. There are none, AFAIK, that deal with merging branched documents.

I'm going to be working on that for my thesis at MIT over the next six
months. I'll be sure to keep you all informed and will let you know
what I end up with. :)


Benjamin Mako Hill
mako at atdot.cc

Creativity can be a social contribution, but only in so
far as society is free to use the results. --RMS