[okfn-discuss] Collaborative Development of Data
Rufus Pollock
rufus.pollock at okfn.org
Mon Feb 19 16:03:27 UTC 2007
Benj. Mako Hill wrote:
> <quote who="Rufus Pollock" date="Thu, Feb 15, 2007 at 01:03:22PM +0000">
>> We already have some fairly good working processes for collaborative
>> development of unstructured text: the two most prominent examples being
>> source code of computer programs and wikis for general purpose content
>> (encyclopedias etc). However these tools perform poorly (or not at all)
>> when we come to structured data.
>
> If you know how to massage the system, both MediaWiki (especially with a
> series of plugins) and MoinMoin are pretty good at this as well.
Hmmm. It depends on how you are using them. Of course, /you can/ start
embedding SQL queries or forms into them and have a backend database,
but then you are essentially using them as a 'web framework' to help
with theming, user management, etc. (and they are no different from, and
perhaps worse than, all the other web frameworks out there, such as
Ruby on Rails, Django, Pylons, TurboGears ...).
Once you have structured data, you want to start from a system
and an underlying domain model that respect that structure; you don't
want to start from something that is designed to handle 'content' (that
is, unstructured human-readable text).
The key point that lay behind my original post is that:
The crucial feature for collaborative development of *anything* is
versioning.
I then discussed how one would support this feature in relation to
structured data. One possibility is just to store the data as plain text
(perhaps with some agreed formatting) in a wiki. This idea comes to mind
because versioning is a major feature of wikis and is part of what makes
them good. However, versioning is also implemented (and implementable) in
lots of other areas in which a wiki would not be a good fit. I then suggested
that if you want to version data then you would indeed want something
different.
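As a rough illustration of what versioning structured data (rather than wiki pages) might involve, here is a minimal Python sketch. It assumes an in-memory store in which each commit is atomic: it records an author, a log message and the full new values of every record it touches, under a single revision number. The names (`VersionedStore`, `commit`, `get`, `history`) are hypothetical and are not taken from any existing library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    """One atomic change set: who made it, why, and the new record values."""
    number: int
    author: str
    message: str
    changes: Dict[str, Dict[str, Any]]  # record id -> full new field values


@dataclass
class VersionedStore:
    """Hypothetical store where every write is a numbered, atomic revision."""
    revisions: List[Revision] = field(default_factory=list)

    def commit(self, author: str, message: str,
               changes: Dict[str, Dict[str, Any]]) -> int:
        # All changes in one commit land together under one revision number.
        rev = Revision(len(self.revisions) + 1, author, message,
                       {rid: dict(vals) for rid, vals in changes.items()})
        self.revisions.append(rev)
        return rev.number

    def get(self, record_id: str,
            as_of: Optional[int] = None) -> Optional[Dict[str, Any]]:
        # Walk back from the requested revision to the record's latest change.
        upto = len(self.revisions) if as_of is None else as_of
        for rev in reversed(self.revisions[:upto]):
            if record_id in rev.changes:
                return dict(rev.changes[record_id])
        return None

    def history(self, record_id: str) -> List[int]:
        # Revision numbers that touched this record, oldest first.
        return [r.number for r in self.revisions if record_id in r.changes]


store = VersionedStore()
store.commit("alice", "add dataset", {"ds-1": {"name": "example", "size_mb": 500}})
store.commit("bob", "fix size", {"ds-1": {"name": "example", "size_mb": 512}})
print(store.get("ds-1"))           # {'name': 'example', 'size_mb': 512}
print(store.get("ds-1", as_of=1))  # {'name': 'example', 'size_mb': 500}
```

Storing full snapshots keeps the sketch simple; a real system would more likely store diffs and persist to a database.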
Perhaps a summary 'table' will help:
Wiki:
* Content type: intended for processing by humans (usually text)
* Versioning: simple (per page) with associated features (recent
changes, history)
* Web interface: yes (often only interface)
* Write access: open (often anyone can write)
Code:
* Content type: human readable *and* machine readable
* Versioning: sophisticated (atomic, centralized and decentralized)
* Web interface: maybe (most development is done locally and then
committed)
* Write access: closed (limited commit access as code is fragile)
Structured Data:
* Content type: typed data with possibility of references
* Versioning: sophisticated would be nice
* Web interface: would be nice but not essential
* Write access: ?
> My biggest problem is that the syntax necessary to show where the data
> is (labeling in your description) is not something people always get
> right -- in fact, they very frequently get it wrong.
Exactly. This is the validation point. There is a reason that most
development of systems that handle structured data (and, hey, this is most
of enterprise software) starts from a domain model which implements this
kind of core logic (this is an integer, that is a string which should be
a valid email address, etc.).
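A minimal Python sketch of a domain model that carries this kind of core logic itself, so bad values are rejected when an object is created rather than discovered later. The `Contact` class and its fields are hypothetical, and the email pattern is deliberately simplistic (real address validation is far more involved).

```python
import re
from dataclasses import dataclass

# Deliberately simplistic pattern; real email validation is far more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class Contact:
    """Hypothetical domain object: the model itself enforces field types."""
    name: str
    age: int
    email: str

    def __post_init__(self) -> None:
        if not isinstance(self.name, str) or not self.name.strip():
            raise ValueError("name must be a non-empty string")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if (not isinstance(self.age, int) or isinstance(self.age, bool)
                or self.age < 0):
            raise ValueError("age must be a non-negative integer")
        if not isinstance(self.email, str) or not EMAIL_RE.match(self.email):
            raise ValueError(f"not a valid email address: {self.email!r}")


ok = Contact("Ada", 36, "ada@example.org")
try:
    Contact("Bob", "forty", "bob@example.org")  # wrong type is rejected
except ValueError as err:
    print(err)  # age must be a non-negative integer
```

The point is where the check lives: any interface (a web form, a bulk import, a script) that builds a `Contact` gets the same validation for free.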
> Now, there is syntax in wikis for documents too of course. But when you
> get it wrong, it's not usually so bad. While getting syntax wrong in a
> document may make your document a bit ugly, it's frequently noticeable
> and only infrequently impacts your meaning.
Absolutely. Wikis were designed to handle text that would be processed
by humans, for whom it doesn't matter much if the markup is a bit wrong.
(This also explains why most people don't develop software in 'wikis':
errors in the code lead to stuff not compiling/running.)
> But when you have a wide-open text box for data, screw-ups can be both
> much more difficult to detect (both for the computer, and for a human
> reading the page) and the impact is often that the data is unseen in
> other parts of the system.
Bugs, both of the basic and of the more sophisticated kind, become much
more costly ... That is why you want to start by developing a proper domain model.
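To make the "silent failure" point concrete, here is a small Python sketch. It assumes a hypothetical wiki-style labelling syntax, `[[field::value]]`, embedded in page text (not any particular wiki's exact grammar). A one-character slip means the parser simply never sees the value, and nothing complains; only an explicit schema check reports that data has gone missing.

```python
import re
from typing import Dict, List

# Hypothetical labelling syntax: [[field::value]] embedded in free text.
LABEL_RE = re.compile(r"\[\[(\w+)::([^\]]+)\]\]")


def extract_labels(page_text: str) -> Dict[str, str]:
    """Pull every well-formed label out of a page; malformed ones are skipped."""
    return {key: value.strip() for key, value in LABEL_RE.findall(page_text)}


def missing_fields(record: Dict[str, str], required: List[str]) -> List[str]:
    """A minimal schema check: report required fields that never arrived."""
    return [name for name in required if name not in record]


# The second label has a typo (single colon), so the parser never sees it.
page = """
Exampleville is a small town.
[[country::England]]
[[population:12000]]
"""

record = extract_labels(page)
print(record)                                             # {'country': 'England'}
print(missing_fields(record, ["country", "population"]))  # ['population']
```

Without the `missing_fields` step the page still renders and still "works"; the population figure just never reaches any other part of the system.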
> The comparison between Microsoft Word and Microsoft Access with its form
> wizards is a useful analogy perhaps. Wikis, as they exist currently, do
> a pretty good job of addressing the first class of problems but do a
> pretty poor job (as of right now and as I understand it) of addressing
> the types of problems that Access does.
>
> What's great about Access is the interface is flexible and, once set up,
> you can make it difficult for users to add bad or invalid data by
> accident. It does most of this, of course, through interface rather than
> validation; this is the reason people find such systems so usable. I've
> always been sad that I've never seen a great piece of free software that
> did that same thing as well. Of course, this piece of free software
> should be collaborative as well, and that introduces lots of other
> problems.
I think this is a nice analogy, but we should be careful here.
The core thing is what the underlying domain model supports, not what the
front-end looks like. Most web applications are backed by a proper
domain model (and behind that a database) but can have very simple and
intuitive interfaces (e.g. del.icio.us). In fact, one can implement a proper
versioned domain model so that its interface looks 'like' a wiki. You
can also start putting proper object-oriented structures into a wiki (as
I think OmegaWiki are doing), but in so doing you aren't really a wiki
any more (or, if you are, what one means by a wiki has become so broad as
to encompass pretty much anything).
> Perhaps, I'll write one some day. I think that such a project could
> learn a whole lot from wikis. However, I think there's a danger that we
> could "learn" a bit too much and not diverge in ways that will be
> essential to the success of such a project.
I completely agree. What I personally take from wikis is that:
* open write access (so v. low barrier to entry) can work really well
* you can successfully port existing ideas such as versioning to
other areas (in this case from code to content)
> Maybe such tools exist and I just don't know about them. That would be
> very exciting indeed!
I too would love to hear about them. As a complement to this discussion
I'm sketching out a generic versioned domain model implementation for
use in ckan (http://www.ckan.net). You can find some demo code at:
http://www.rufuspollock.org/code/vdm/
http://www.rufuspollock.org/code/vdm/README.txt
Regards,
Rufus