[okfn-discuss] Collaborative development of text documents
Rufus Pollock
rufus.pollock at okfn.org
Fri Aug 25 14:18:16 UTC 2006
Dear Mako (and everyone else!),
Recently I worked on a lengthy essay with some other (non-tech) people.
The size of the piece meant that it wasn't suitable for a wiki. We chose
to keep the work in plain ascii format (markdown) and then convert to
html. The underlying source was kept in subversion. I would post a
version of the text regularly; people would then make changes and post
them back to me via email, and I would consolidate the changes and
repost. As a consequence of the problems encountered I did some googling
and came across your essay on a CODEX (Collaborative Online
Documentation (D)ifference Extractor)[1]. Having looked at this I have
various questions which I detail below.
Regards,
Rufus
## The Problem ##
This system worked just about ok. However, there were several problems:
1. Significant energy was expended by the editor to merge changes.
(in theory people could have merged their changes independently but some
of the users weren't very technically sophisticated and my prior
experience of document development had shown that it was better to have
the merging done centrally).
2. Problems arose when several of the participants ended up working off
slightly different versions (i.e. not the latest version).
3. It was hard to track changes and, in particular, to do easy diffing. At
least one of the participants reverted to pasting text into Microsoft
Word, editing and then posting Word files (of plain text) so as to be
able to access track changes visualization.
Thinking about these issues I went to look on the web to see what had
already been done. Googling around for subversion, text, diffs etc I
came across this piece of yours from 2002 on a CODEX RFC/Proposal
(Collaborative Online Documentation (D)ifference Extractor)[1] in which
you distilled the problem as follows:
> There is no good, free, version control system for documents or
documentation. The creation of any piece of literature, especially one
that involves multiple authors working on the document concurrently,
involves adding, removing, and changing text constantly. There is no
good way to keep track of these changes in the way that there is for
source code. Existing solutions are proprietary and/or kludgey.
I think you are absolutely right here. My only addition would be to
suggest that we can see several clear use cases for this kind of
tool:
1. Edit the underlying document (with the editor of choice)
2. Basic versioning (with logging)
3. Visualization of changes (diffs). Nice if this was integrated with 1.
4. Ability to branch, tag and merge
5. [not sure about this one] Annotation, for example to suggest or
indicate changes but without altering the source document (as one does
by inserting stuff like [[TODO: remove this paragraph]])
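To make use case 5 a little more concrete: if annotations are kept inline as `[[TODO: ...]]` markers, one stopgap is to separate them from the text before converting to html. Below is a minimal sketch in Python, not part of any existing tool. The function name `split_annotations` and the `[[LABEL: note]]` syntax are my own assumptions, generalised from the TODO example above.

```python
import re

# Hypothetical inline annotation syntax: [[LABEL: free text]]
ANNOTATION = re.compile(r"\[\[(\w+):\s*(.*?)\]\]", re.DOTALL)

def split_annotations(text):
    """Return (clean_text, notes), where notes is a list of (label, note) pairs.

    The clean text is the input with every annotation marker removed.
    """
    notes = [
        # Collapse line breaks inside a note so each one reads as a single line.
        (m.group(1), " ".join(m.group(2).split()))
        for m in ANNOTATION.finditer(text)
    ]
    clean_text = ANNOTATION.sub("", text)
    return clean_text, notes
```

This still alters the source file, so it only works around the problem; a proper solution would store annotations alongside the document rather than in it.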
## Possible Solutions ##
You discuss the two current solutions:
1. Source-level VCS (with some diff utility, perhaps wdiff)
* Use case 1: in general editor independent. However, support for
binary formats may be sketchy, and there is no obvious way to integrate
with most WYSIWYG editors (compare this to source code IDEs)
* Use cases 2 and 4: good
* Use case 3: left to external tools (poor)
* Use case 5: requires an external tool
2. Microsoft Word or Open Office (track changes)
* Use case 1: locked to particular tool
* Use case 2: no logging (doubts about scalability)
* Use case 3: good
* Use case 4: not supported
* Use case 5: may be possible as part of track changes functionality
*QU:* Have I summarized the situation accurately?
*QU:* Has anyone else had any experience trying to deal with this problem?
*QU:* What about wikis?
The CODEX RFC goes on to suggest a solution:
> I propose a robust, free version control system specifically designed
for working with documents--especially in a asynchronous collaborative
environment. I'll refer to this (non-existent) system as CODEX, or the
Collaborative Online Documentation (D)ifference Extractor. The software
will be free software and will be distributed under the terms of the GNU
GPL. The core engine will be written in either Perl or Python.
> Since my software will be free software, I will seek to not duplicate
effort where-ever possible. I think that building off a system like CVS
or subversion will be the logical first step. Since a diff will show
every changed line, it will by default show every changed word and piece
of white space. A contextual diff (which both CVS and subversion can
provide) will include even more information. Either of these programs
will be able to provide information useful for resolving conflicts and
will provide the ability to commit, checkout, release or watch a
project. They also both provide servers with several methods to use
interface over a LAN or the Internet. A future version of subversion
will allow for different client-side diff programs.
> I do NOT want this project to involve creating a new word processor.
There are more than enough of them, most of them bad. I would almost
certainly create another bad one. I want my software to be able to work
with many other word processors so that it might be picked up an
incorporated as a back-end to existing pieces of software.
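The line-versus-word distinction in the quoted proposal matters a lot for prose: re-wrapping a paragraph changes every line even when no word has changed. As an illustration only, here is a minimal word-level diff sketch in Python using the standard library's `difflib`. It is not CODEX and not wdiff; the function name `word_diff` is my own, and the `[-...-]` / `{+...+}` markers are borrowed from wdiff's default output style.

```python
import difflib

def word_diff(old, new):
    """Return a word-level diff of two texts.

    Deleted words are wrapped in [-...-] and inserted words in {+...+}.
    Whitespace (including line breaks) is ignored, so re-wrapped text
    with the same words produces no change markers.
    """
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)

print(word_diff("the quick brown fox", "the quick red fox"))
# prints: the quick [-brown-] {+red+} fox
```

The trade-off in this sketch is that the original line breaks are discarded in the output, so it is suited to reviewing changes rather than to producing patches.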
*QU:* Was any work ever done on this 'solution'?
*QU:* Are there any other projects out there that address this problem?
*QU:* What other possible solutions could be used?
[1]: http://mako.cc/projects/collablit/proposal/rfc.html