[okfn-discuss] Fw: [Geodata] discoverability and the wiki

Tue Oct 9 11:34:06 BST 2007

Aaron Straup Cope wrote:
> One of the principal motivations behind using SMW for the wine site 
> (grape.spum.org) was laziness.
> 
> At the time (I was washing the dishes) I briefly considered it as an 
> opportunity to play with Rails and/or Django and then quickly decided I 
> couldn't be bothered with setting up databases and managing 
> dependencies; both of which quickly descend in to the tedium of managing 
> relationships and input validation. Or :
> 
> http://www.aaronland.info/weblog/2006/12/17/meat/#papernet

Well said -- though the danger with such mods of wikis (and I speak with 
a little experience of messing around with MoinMoin -- and MW to a much 
lesser extent -- when thinking about e.g. CKAN) is that eventually you 
are using them as a web-app development toolkit, which is *not* what 
they were really designed for. However, the point is taken that one wants 
to get moving quickly.

> The decision was also influenced by an ongoing struggle about how to 
> bridge the gap (read : chasm) between people and machines for "storing" 
> recipes; a problem that is fantastically harder than it seems. Or :
> 
> http://www.aaronland.info/weblog/2007/08/21/address/#doom

Yes! There is a fundamental trade-off as you put it:

"walking the line between making it easy enough for people to bother 
putting data in to a system and still useful enough to make it worth the 
trouble of getting it out."

> When I finally started poking around how to do stuff in MW the one thing 
> that stunned me was, in fact, how complicated many of the articles were.
> 
> Like anything else, it had developed its own language of specialization 
> in the same way that people have adapted their practice (and 
> expectation) for things like tagging in delicious. Or any wiki, for that 
> matter. Or :
> 
> http://www.aaronland.info/weblog/2007/02/17/platform/#wall

To repeat my point earlier in more lapidary form:

"When you start using a Swiss Army knife to build a house, both the 
house and the Swiss Army knife suffer."

> Whether or not a registry of geodata lends itself to that kind of 
> practice remains, of course, an open question.
> 
> At this point, it is probably also worth pointing out that I am 
> intimately involved with the "machine tags" work at Flickr so if my 
> biases aren't already clear let there be no doubt :-)
> 
> http://www.flickr.com/groups/api/discuss/72157594497877875/
> 
> The thing about machine tags is that they are RDF by any measure. The 
> key difference being : You don't worry about namespaces unless you want 
> to. In a controlled environment, like the SMW, though you can simply set 
> up a registry of known prefixes and let the computrons sort it out.

Sure. But I am not sure that 'namespace' issues are the big one. 
Ultimately, mapping from a well-defined domain object in code to RDF or 
to anything else (json/xml ...) isn't that hard. What is usually hard 
(or perhaps time-consuming) is getting a good domain model and a good 
user interface, including good performance. E.g. because of the 
versioned nature of the domain model in CKAN, loading certain pages 
(fortunately not very important ones at present) takes a while, and 
I've also noticed that del.icio.us has started to get quite 
unresponsive. These sorts of things mean people 'just leave'.

> So, perhaps one approach would be to simply update the CKAN to (I am 
> happy to submit patches once I've looked at the code and my mother isn't 
> visiting... ;-) store machine tags to allow for chunks of arbitrary 
> domain-specific metadata, per Rufus' comment.

This I think *is* indeed a neat way to go.
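
As a rough illustration of what "store machine tags" could mean in 
practice, here is a minimal sketch, not CKAN code. It assumes the 
Flickr-style `namespace:predicate=value` syntax; the names 
`MachineTag`, `parse_machine_tag` and `split_tags` are made up for the 
example. Anything that doesn't parse is kept as a plain tag rather than 
rejected, so getting data in stays easy.

```python
import re
from typing import NamedTuple, Optional

# Assumed syntax (Flickr-style machine tag): namespace:predicate=value
_MACHINE_TAG = re.compile(
    r"^([a-z][a-z0-9_]*):([a-z][a-z0-9_]*)=(.*)$", re.IGNORECASE
)


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Return a MachineTag, or None if `raw` is just a plain tag."""
    match = _MACHINE_TAG.match(raw.strip())
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    # Namespace and predicate are normalised; the value is kept verbatim.
    return MachineTag(namespace.lower(), predicate.lower(), value)


def split_tags(raw_tags):
    """Separate plain tags from machine tags without rejecting either."""
    plain, machine = [], []
    for raw in raw_tags:
        tag = parse_machine_tag(raw)
        if tag is None:
            plain.append(raw.strip())
        else:
            machine.append(tag)
    return plain, machine
```

Calling `split_tags(["geodata", "geo:lat=51.5", "horse=yes"])` would 
leave "geodata" and "horse=yes" as plain tags and turn the middle one 
into a (namespace, predicate, value) triple.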

> This is, in fact, really easy until you get to the search part. Or :
> 
> http://www.aaronland.info/weblog/2007/08/24/aware/#mtdb
> 
> And there's the rub. The search part -- not only finding, but finding 
> relevant answers -- is always going to be the hard part because implicit 
> in the "problem statement" (or solution) is that someone has managed to 
> write the Do What I Mean engine.

But that leads back to the fundamental trade-off:

"More structure means harder for people to enter (so less of it) but 
easier to find stuff and join it together in interesting ways"

Conversely

"Less structure (dare I mention 'horse=yes'!) means easier to enter data 
but harder to find and join it together"

Depending on where your constraints are you go one way or the other 
(e.g. if you have a bunch of librarians who will religiously use all the 
metadata fields then go for structure, but if you are hoping people will 
just drop in off the 'net and do it you had better make it damn easy to 
get stuff in there).

> The RDF weirdos like to believe that TBL's magic layer cake of 
> trust+proof+logic is the answer, which is madness. The Google people 
> like to believe that their special "We're smarter than you" sauce is the 
> answer, which is hubris. Social networking sites like to believe that 
> your contacts have the answer, which is wishful thinking.

Indeed.

> Meanwhile the CPAN is probably the only tool that has ever managed to 
> gracefully ("gracefully") dance around the problem; although often at 
> the expense of needing to install half the Internet just to add support 
> for plain text sprockets...

:)

> Which is a very long way of saying : I don't think that there's really a 
> need to worry about "random" yet.
> 
> It will be messy, for sure, but I tend to think it is more important to 
> let people add data quickly and easily than it is to try to imagine how 
> it will sort itself out in the end.

That's my feeling, though the kicker here is that one might want some 
structure in order to have nice interfaces that let people add stuff 
more easily, e.g. you might want to only show the geodata-related stuff 
on geodata package pages rather than the other 3000 tags people have 
used for other types of material. But maybe even this is too much!
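
To make that concrete, here is a small sketch of namespace-based 
filtering. It is hypothetical, not how CKAN works: it assumes tags are 
plain strings, that structured ones look like 
`namespace:predicate=value`, and the function names and the "geo" 
namespace are invented for the example.

```python
from collections import defaultdict


def group_by_namespace(tags):
    """Group 'namespace:predicate=value' tags by namespace.

    Tags without that shape (e.g. 'horse=yes') are grouped under ''.
    """
    groups = defaultdict(list)
    for tag in tags:
        head, equals, _ = tag.partition("=")
        namespace, colon, _ = head.partition(":")
        key = namespace if (equals and colon) else ""
        groups[key].append(tag)
    return dict(groups)


def tags_for_page(tags, namespaces):
    """Keep only the tags whose namespace is relevant to this kind of page."""
    groups = group_by_namespace(tags)
    return [tag for ns in namespaces for tag in groups.get(ns, [])]
```

A geodata package page could then call `tags_for_page(all_tags, ["geo"])` 
and ignore everything else, while unstructured tags remain available 
under the empty namespace.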

> The sorting out is important but that's always going to be subject to 
> both the magic (computers) and conventions (humans) of the day.
> 
> By which I mean to say : horse=yes!

By which you mean that, for this kind of stuff, getting people to enter 
data is the constraining factor, and we can work on getting info back 
out later. I basically agree, and that is to some extent why CKAN is 
the way it is (no 'text in RDF in a wiki' stuff, which you so 
poetically described as stabbing yourself in the eyeballs ...).

~rufus