[okfn-discuss] Fw: The equivalent of free software for online services

Sat Jul 29 11:59:48 BST 2006

Kragen does consistently interesting work[^1] and this is no exception. 
He's making a lot of important points:

## The Possibility of Vendor Lock-in ##

The possibilities for vendor lock-in with web services where the service 
operator controls your data are even greater than anything we've seen in 
the traditional software market.

This is Microsoft Office on steroids, and I'd guesstimate that the 
'lock-in strength' of data exceeds that of a user interface (or even an 
API) by an order of magnitude (compare the effort of converting all your 
documents from Word to OpenDocument with the effort of learning a new 
interface). As Ross Anderson has repeatedly emphasised, the value that a 
vendor can extract from users is roughly equal to the total switching 
cost, and the switching cost of moving data is much higher than that of 
retraining.

## Open APIs != Open knowledge ##

This is a point I've discussed frequently in the past with various 
people (Jo Walsh in particular). Kragen doesn't focus heavily on this 
but it is implicit in much of what he says. For me this is perhaps the 
most important issue -- particularly given the touting of open APIs in 
Web 2.0 PR. What exactly are the problems?

   1. The 'open' in open API is very narrow. Usually all it means is 
that the API is publicly documented and that access is free (though even 
this doesn't seem to be necessary -- many of Amazon's 'open' APIs are 
pay per use)
   2. The API cannot be freely changed or adapted by someone other than 
the service provider
   3. There is no guarantee that the API will remain free (as in cost) 
to use or even that it will remain fully documented
   4. Freedom to reuse or redistribute the information you obtain from 
the API is often limited (e.g. Google/Yahoo/... maps). As a consequence 
the reuse chain is extremely limited (usually to one step).
   5. Because the data is not openly available the ability for the 
community to find bugs, provide 'patches' etc is greatly curtailed

To summarize: of the 3 freedoms that make *open* knowledge[^okd] -- the 
freedom to access, to reuse, and to redistribute -- an open API 
guarantees none and at most promises to deliver on the first (though 
even here there are various limitations on full 'open' access ranging 
from charging to the imposition of usage quotas).
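
The summary above can be sketched as a simple check. This is only an illustration: the `Terms` record, its field names and the two example entries are hypothetical, not taken from the Open Knowledge Definition or from any real provider's terms of use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terms:
    """Hypothetical summary of what a provider's terms allow."""
    access: bool        # can anyone obtain the data without fee or quota?
    reuse: bool         # may the data be modified and built upon?
    redistribute: bool  # may the data be passed on to others?


def missing_freedoms(terms: Terms) -> list[str]:
    """Return the names of the freedoms the terms fail to grant."""
    checks = {
        "access": terms.access,
        "reuse": terms.reuse,
        "redistribute": terms.redistribute,
    }
    return [name for name, granted in checks.items() if not granted]


def is_open(terms: Terms) -> bool:
    """Knowledge counts as open only if all three freedoms are granted."""
    return not missing_freedoms(terms)


# Assumed example: an 'open API' that offers access at best and nothing else.
typical_open_api = Terms(access=True, reuse=False, redistribute=False)
# Assumed example: an openly licensed dataset granting all three freedoms.
open_dataset = Terms(access=True, reuse=True, redistribute=True)
```

Calling `missing_freedoms(typical_open_api)` lists the freedoms that such an API leaves out, which is the gap between an open API and open knowledge.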

## The Solutions ##

Kragen suggests how we could address these problems by 'build[ing] these 
services as decentralized free-software peer-to-peer applications'. I 
certainly think this is an interesting suggestion and would concur with 
many of the items on his wish-list of tools and systems, but I think we 
need to a) distinguish between different kinds of service (and the 
associated data) and b) distinguish between services and data.

For example, it seems to me that a to-do list service (highly 
personalized data, little reuse, little sharing, privacy issues etc) is 
substantially different from a map-tile provider (common data, massive 
reuse, large datasets etc), and that this must influence the approach 
taken in providing that service in a truly free/open manner.

Second, and relatedly, I believe it is primarily the *data* and *not* 
the *service* that we need to manage in the highly-decentralized, 
collaborative (versioned), componentized[^2] fashion which Kragen describes.

In essence we need a *knowledge-centric* approach rather than a 
*service-centric* one.
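
One possible reading of 'versioned, componentized' data is sketched below. It assumes content addressing with SHA-256, so that any peer holding a copy can verify it without trusting a central service. The `DataStore` class, its method names and the sample data are hypothetical, and this is a toy rather than a proposal for a real system.

```python
import hashlib
import json


def content_id(payload: bytes) -> str:
    """Identify a piece of data by the SHA-256 hash of its content."""
    return hashlib.sha256(payload).hexdigest()


class DataStore:
    """Toy store: immutable components addressed by hash, plus versions."""

    def __init__(self):
        self.blobs = {}     # content id -> bytes
        self.versions = []  # manifest ids, oldest first

    def put(self, payload: bytes) -> str:
        cid = content_id(payload)
        self.blobs[cid] = payload
        return cid

    def commit(self, components: dict[str, bytes]) -> str:
        """Record a version as a manifest mapping component name -> id."""
        manifest = {name: self.put(data)
                    for name, data in sorted(components.items())}
        manifest_bytes = json.dumps(manifest, sort_keys=True).encode("utf-8")
        version_id = self.put(manifest_bytes)
        self.versions.append(version_id)
        return version_id

    def checkout(self, version_id: str) -> dict[str, bytes]:
        """Rebuild the components that made up a given version."""
        manifest = json.loads(self.blobs[version_id])
        return {name: self.blobs[cid] for name, cid in manifest.items()}

    def verify(self) -> bool:
        """Check that every stored blob still matches its identifier."""
        return all(content_id(data) == cid
                   for cid, data in self.blobs.items())


store = DataStore()
v1 = store.commit({"roads": b"A1,A2", "rivers": b"Thames"})
v2 = store.commit({"roads": b"A1,A2,A3", "rivers": b"Thames"})
```

Unchanged components (here `rivers`) are stored once and shared between versions, and old versions remain retrievable with `store.checkout(v1)`.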

Regards,

Rufus

[^1]: For example his work on an open version of the OED about which I 
posted back in March:
   <http://blog.okfn.org/2006/03/17/open-version-of-the-oed/>

[^okd]: See the Open Knowledge Definition <http://www.okfn.org/okd/>

[^2]: <http://blog.okfn.org/2006/05/09/the-four-principles-of-open-knowledge-development/>

Jo Walsh wrote:
> connected to the "open API not truly open" discussion...
> 
> ----- Forwarded message from Kragen Sitaker <kragen at pobox.com> -----
> Date: Fri, 28 Jul 2006 16:50:10 -0400
> From: Kragen Sitaker <kragen at pobox.com>
> To: kragen-tol at lists.canonical.org
> 
> (First draft; thanks for your understanding.)
> 
> There's a lot of movement toward replacing software with services
> delivered over the internet --- to look at it another way, toward
> running the software you depend on on someone else's computer.  You
> might access your mail through Hotmail or Gmail, correspond with your
> friends via LiveJournal or Xanga, look at maps through Google Maps or
> WMS.jpl.nasa.gov, write your to-do lists in Ta-Da List or Sproutliner,
> edit your documents with PBWiki or Writely, store your browser bookmarks
> in Delicious or Furl (or DeepLeap, R.I.P.), and so on.  When I first
> wrote about this trend several years ago [0], I thought this was
> primarily vendor-driven --- it's easier to charge for services provided
> by software running on a machine you control, and it makes it harder for
> other people to compete with you.
> 
> But there are real advantages to delivering software in this way, some
> of which accrue to the users of the software as well as the providers;
> Paul Graham [1] and Philip Greenspun [7] have written eloquently about
> them.  The advantages, which I won't describe in great depth, include
> quicker software updates, easy social-network features (del.icio.us is a
> good example), less system administration, access to enormous databases
> without downloading them, and access to a lot of CPUs at once without
> buying them.
> 
> Unfortunately, all the disadvantages of proprietary software also accrue
> to web services [9], which was what occasioned my original essay,
> and figures prominently in the rationale for the GPLv3 process[2].
> 
> Jamie McCarthy just wrote a Slashdot article [3] on the subject, largely
> focusing on one particular problem: once data goes in, it can be hard,
> even impractical, to get it out.  This isn't a new issue; it can exist
> with proprietary software, too, where the lock-in is usually
> accomplished by secret data formats, rather than by storing the data
> outside of the physical control of its owner.  Jamie writes:
> 
> 	Today, while some companies are trying to build goodwill
> 	with that community, there is nothing like a GPL for web
> 	services. No one's discovered a legal foundation that would
> 	establish open services, openly shared web services, with the
> 	same kinds of rights that we insist on in open-source code. No
> 	one's even sure what "open services" might mean, indeed, there's
> 	no consensus that we even need such a thing.
> 
> There's been a lot of discussion about this particular problem,
> particularly in the wake of the discussion about Flickr and Zooomr [4],
> in which Zooomr made it easy for Flickr users to import all their images
> in order to switch to Zooomr, but for a while, Flickr made it hard for
> Zooomr to do this.
> 
> So far, all this echoes the "open standards" and "open formats"
> discussion from the days when we had to take proprietary software for
> granted.  In those days, we spent enormous amounts of effort trying to
> make sure our software kept our data in well-documented formats that
> were supported by other programs, and choosing proprietary software that
> conformed to well-documented interfaces (POSIX, SQL, SMTP, whatever)
> rather than the proprietary software that worked best for our purposes.
> 
> Ultimately, it was a losing game, because of the inherent conflict of
> interest between software author and software user.
> 
> Julian Cash just started MoveMyData [5], an open-source software project
> similar in concept to catdoc [6] --- the idea is to write little
> programs to extract your data from each web service that might try to
> hold it hostage, and more little programs to upload the data to some
> other comparable web service.
> 
> If MoveMyData becomes highly successful, then the users of web-based
> apps in 2008 will be only slightly worse off than users of proprietary
> desktop software in 1988.  The software they depend on can be changed or
> removed at any time, by parties outside their control, and they have no
> recourse; the owners of that software can monitor every mouse movement
> they make, should they so desire, and may be required to do so by local
> law enforcement officers in jurisdictions unknown to the user; and they
> still can't change it or study how it works internally.  But at least
> they will be able to make backups of their data, just like the 1988 guy,
> which they might in theory be able to load into some other similar
> program.
> 
> Of course, they have access to much greater software capabilities than
> the 1988 guy.  They have instant photo sharing and annotation, bookmark
> metadata, instant access to satellite photos of any place in the world,
> gigabytes of full-text-searchable email online from any random place
> they visit, and so on.  Good stuff.
> 
> But I want to talk about why people originally switched from proprietary
> software to free software.
> 
> A major reason, of course, was price.  Free software is not just free
> --- it's also, generally, free.  It's a lot easier to try out a piece of
> software if you don't have to pay for it first, and if you don't have to
> worry about license audits.
> 
> Also, free software tends to do what its users want it to, because those
> users can write patches for it, and share them.  So it tends to be a lot
> more pleasant to use than its proprietary equivalents, although it often
> suffers from feature overload, and it tends to be less buggy.  It tends
> to support open formats and open standards wholeheartedly, because that
> benefits its users, who have the authority to reject changes they don't
> like.
> 
> The most important reason, for me, was trustworthiness.  I knew it
> wasn't going to disappear next year, as proprietary software often does.
> I could keep running it as long as I wanted to.  I didn't have to worry
> that it might contain some secret piece of code to make it not work if
> it thought, rightly or wrongly, that I had copied it onto somebody
> else's machine, or was trying to let someone else use it over the
> network.  The file formats weren't going to become undocumented
> mysteries in a version upgrade (Mark Pilgrim lost his mail to Mail.app
> this way earlier this year [8]); if it started crashing on startup, I
> could find the bug and fix it.
> 
> These web services generally provide a low price already, although this
> varies at the whim of the service provider.  So you can't depend on low
> cost unless you can trust that the service will remain available to you.
> 
> So people use free software because of its guaranteed low cost, because
> it does what its users want, and because it's trustworthy.  And they use
> web services because they get low system administration costs, they can
> use huge databases without downloading them first, they can get software
> updates quickly, they can do very-CPU-intensive things, and they can
> collaborate with their friends easily.  How can we get both of these
> sets of advantages at once?
> 
> I think there is only one solution: build these services as
> decentralized free-software peer-to-peer applications, pieces of which
> run on the computers of each user.  As long as there's a single point of
> failure in the system somewhere outside your control, its owner is in a
> position to deny service to you; such systems are not trustworthy in the
> way that free software is.
> 
> Imagine, for example, that the pieces of the Global Mosaic and other
> large GIS data are distributed around the world among people who look at
> maps, and the system contains some way to prevent any pieces from
> getting lost.  (Automated speculation in an online futures market for
> map chunks, for example, as in [10]; or perhaps you could form a pact
> with 100 other people on your continent to ensure that there are at
> least three copies of each chunk among the 100 of you --- that way,
> you're only depending on those 100 people; or perhaps certain clubs,
> universities, or countries could maintain depositories that kept copies
> of all the pieces, but only served them up when they were otherwise
> unavailable.)
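
(An aside on the pact idea above: a minimal sketch of keeping at least three copies of each chunk among 100 people. The round-robin placement, the function names and the peer and tile labels are all assumptions made for illustration. A real system would also have to cope with peers that lie or disappear.)

```python
from collections import Counter


def assign_replicas(chunks, peers, copies=3):
    """Assign each chunk to `copies` distinct peers, round-robin."""
    if copies > len(peers):
        raise ValueError("need at least as many peers as copies")
    placement = {}
    for i, chunk in enumerate(chunks):
        start = (i * copies) % len(peers)
        # Consecutive peers modulo len(peers) are distinct because
        # copies <= len(peers).
        placement[chunk] = [peers[(start + k) % len(peers)]
                            for k in range(copies)]
    return placement


def under_replicated(placement, online, copies=3):
    """Return the chunks with fewer than `copies` holders online."""
    return [chunk for chunk, holders in placement.items()
            if sum(1 for peer in holders if peer in online) < copies]


# Hypothetical pact: 100 peers sharing 1000 map chunks.
peers = [f"peer{n}" for n in range(100)]
chunks = [f"tile{n}" for n in range(1000)]
placement = assign_replicas(chunks, peers)

# How many chunks each peer has promised to keep.
load = Counter(peer for holders in placement.values() for peer in holders)

# If one peer drops out, these chunks need a new third copy.
at_risk = under_replicated(placement, online=set(peers) - {"peer0"})
```
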
> 
> It's going to be a lot of work.  We'll need reputation systems to track
> which peers will keep their promises and return trustworthy data, new
> algorithms and data structures for decentralized computation (although
> secure hashes, public key cryptography, MapReduce, MIXes,
> capability-oriented security, and distributed hash tables seem to
> represent a lot of progress), new heuristics for systems design
> (Postel's law, for example), sandboxes that allow flexible
> interoperation of mobile code from different trust domains (E has one
> approach [13], GreaseMonkey has another [12]), a data store model that
> allows conflicting updates to coexist until a human feels like resolving
> them [14], and so on.
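
(An aside on the data store model above: a minimal sketch, assuming a single in-memory store where each write names the versions its author had seen. The `ConflictStore` class and its methods are hypothetical. It is not how Lotus Notes or CVS work internally, and a distributed version would need something like version vectors.)

```python
import itertools


class ConflictStore:
    """Keeps every conflicting value for a key until someone resolves it."""

    def __init__(self):
        self._siblings = {}  # key -> {version: value}
        self._ids = itertools.count(1)

    def write(self, key, value, based_on=()):
        """Store a value, superseding only the versions the writer saw."""
        current = self._siblings.setdefault(key, {})
        for version in based_on:
            current.pop(version, None)
        version = next(self._ids)
        current[version] = value
        return version

    def read(self, key):
        """Return all live versions; more than one means a conflict."""
        return dict(self._siblings.get(key, {}))

    def resolve(self, key, value):
        """A human picks or merges a value, superseding every sibling."""
        return self.write(key, value, based_on=tuple(self.read(key)))


store = ConflictStore()
v1 = store.write("todo", "buy milk")
# Two replicas edit the same starting version independently.
store.write("todo", "buy milk and eggs", based_on=[v1])
store.write("todo", "buy oat milk", based_on=[v1])
conflicts = store.read("todo")   # both edits coexist here

# Later, a person merges them by hand.
store.resolve("todo", "buy oat milk and eggs")
final = store.read("todo")
```
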
> 
> A lot of this is research that's already been done, that just needs a
> good platform built around it.
> 
> But building applications in this decentralized environment needs to be
> nearly as easy as building them on the web, at both ends of the spectrum
> from rich user interfaces to simple apps.  If everybody who wants to
> build a P2P photo-sharing app has to figure out how to handle
> distributing version updates, distributed storage of photos, NAT
> traversal, decentralized naming, peer reputation management, and so on,
> most of them will just give up, and the other five will build web sites
> instead.
> 
> So we need a platform, something like a web browser, that supports a
> universe of constantly-changing code written by a multitude of authors,
> which migrates to where it's being used, and simultaneously supports
> individual control over what version of the code is running on your
> system and no-hassle updating when someone else has a change you want;
> that replicates your data transparently to other machines so that you
> don't have a single point of failure, but without allowing the owners of
> those other machines to spy on you or corrupt your data; that runs
> programs in a high-level language; that supports conflicting updates to
> different replicas of the data and allows a human being to resolve the
> conflicts; and that makes it easy for you to share particular bits of
> your code or data with anyone, everyone, or no one.  Maybe we could even
> start with a web browser and add the other stuff to it.
> 
> If we don't build such a platform, we will eventually lose the
> advantages of free software, because we will use web services instead.
> 
> Groove [11] is pretty far along this path already, I think.
> 
> [0] "People, places, things, and ideas", by Kragen Sitaker, January 1999,
> http://lists.canonical.org/pipermail/kragen-tol/1999-January/000322.html or
> http://www.gnu.org/philosophy/kragen-software.html
> [1] "The Other Road Ahead", by Paul Graham, September 2001,
> http://www.paulgraham.com/road.html
> [2] See the rationale for the GPLv3, version 2, subsection 4 of section
> 7b, "Additional Requirements" --- available online at
> http://gplv3.fsf.org/rationale at the moment.  This is intended to allow
> license compatibility with licenses such as the Affero GPL.
> [3] "Web Services and Open Source at OSCON", by Jamie McCarthy,
> July 2006, at
> http://developers.slashdot.org/article.pl?sid=06/07/26/1537213
> [4] "Flickr, Zooomr and API Parity", by Kevin Yank, July 2006,
> http://www.sitepoint.com/blogs/2006/06/21/flickr-zooomr-and-api-parity/
> [5] http://www.movemydata.org/
> [6] catdoc is a Microsoft Word decoder by Victor Wagner:
> http://www.45.free.net/~vitus/software/catdoc/
> [7] Some of Greenspun's words along these lines are in "A Future
> So Bright You'll Need to Wear Sunglasses", one of the chapters
> of his book, "Philip & Alex's Guide to Web Publishing", online at 
> http://philip.greenspun.com/panda/future --- warning,
> this page contains photographs of naked women.
> [8] "Juggling Oranges", by Mark Pilgrim, June 2006, at
> http://diveintomark.org/archives/2006/06/16/juggling-oranges
> [9] I know Microsoft and a gajillion "enterprise architects" have
> appropriated the term "web services" to mean programs accessible through
> XML-RPC and SOAP.  That is stupid, because XML-RPC and SOAP servers
> aren't part of the web (you can't talk to them with a web browser or
> link to things stored in them with URLs) and don't provide anything a
> normal person would recognize as a "service".  In this essay, I'm
> continuing to use the term in the sense in which Philip Greenspun meant
> it when he coined it: web servers that do things somebody wants.
> (See http://philip.greenspun.com/wtr/application-servers.html
> for some example uses from 1998.)
> [10] "Grid Economics", by Kragen Sitaker, at 
> http://wiki.commerce.net/wiki/Grid_Economics --- and about a
> zillion academic papers, going back to Ivan Sutherland's 1968
> paper, "A futures market in computer time".  Some of the recent
> real work in this vein includes MojoNation and Padala's OCEAN;
> see "A Survey of Market-Based Approaches to Computation", by
> Shashank Shetty, P. Padala, and M.P. Frank, August 2003, at
> http://www.cise.ufl.edu/~ppadala/publications/tr/survey.pdf
> [11] The best current description I've found of Groove is the
> Wikipedia article, "Microsoft Office Groove", at
> http://en.wikipedia.org/wiki/Microsoft_Office_Groove ---
> although there are other materials available online.  Todd R.
> Weiss wrote a PC World article, "Groove Updates Its Virtual
> Office", in July 2004, online at
> http://www.pcworld.com/article/116839-1/article.html --- this
> article summarizes what Groove does and how it's interesting.
> Some brief allusions to Groove's handling of conflicts are in
> "James Governor's Misjudgement of Ray Ozzie", at
> http://www.rhs.com/web/blog/PowerOfTheSchwartz.nsf/d6plinks/RSCZ-6J8JD5,
> and some more descriptions (focused on connection with
> SharePoint) in
> http://www.offlinesharepoint.com/topic/groove-2007/ --- and
> there's a "Groove Virtual Office FAQ" at
> http://www.groove.net/index.cfm?pagename=VO_FAQ that roughly
> describes the synchronization and sharing facilities of Groove.
> [12] Some details (with example code) on GreaseMonkey's use
> of XPCNativeWrappers, including how it affects user scripts, can
> be found in the article "Avoid Common Pitfalls in Greasemonkey",
> by Mark Pilgrim, November 2005, also a chapter in his book
> "Greasemonkey Hacks",
> http://www.oreillynet.com/pub/a/network/2005/11/01/avoid-common-greasemonkey-pitfalls.html
> [13] The E language, child of Joule, largely by Mark Miller and Marc
> Stiegler: http://www.erights.org/
> [14] Like Lotus Notes, CVS, or various weakly-consistent databases from
> academia.
> 
> ----- End forwarded message -----
>