[okfn-discuss] Fw: The equivalent of free software for online services

Sat Jul 29 01:30:19 UTC 2006

connected to the "open API not truly open" discussion...

----- Forwarded message from Kragen Sitaker <kragen at pobox.com> -----
Date: Fri, 28 Jul 2006 16:50:10 -0400
From: Kragen Sitaker <kragen at pobox.com>
To: kragen-tol at lists.canonical.org

(First draft; thanks for your understanding.)

There's a lot of movement toward replacing software with services
delivered over the internet --- to look at it another way, toward
running the software you depend on on someone else's computer.  You
might access your mail through Hotmail or Gmail, correspond with your
friends via LiveJournal or Xanga, look at maps through Google Maps or
WMS.jpl.nasa.gov, write your to-do lists in Ta-Da List or Sproutliner,
edit your documents with PBWiki or Writely, store your browser bookmarks
in Delicious or Furl (or DeepLeap, R.I.P.), and so on.  When I first
wrote about this trend several years ago [0], I thought this was
primarily vendor-driven --- it's easier to charge for services provided
by software running on a machine you control, and it makes it harder for
other people to compete with you.

But there are real advantages to delivering software in this way, some
of which accrue to the users of the software as well as the providers;
Paul Graham [1] and Philip Greenspun [7] have written eloquently about
them.  The advantages, which I won't describe in great depth, include
quicker software updates, easy social-network features (del.icio.us is a
good example), less system administration, access to enormous databases
without downloading them, and access to a lot of CPUs at once without
buying them.

Unfortunately, all the disadvantages of proprietary software also accrue
to web services [9].  That problem occasioned my original essay, and it
figures prominently in the rationale for the GPLv3 process [2].

Jamie McCarthy just wrote a Slashdot article [3] on the subject, largely
focusing on one particular problem: once data goes in, it can be hard,
even impractical, to get it out.  This isn't a new issue; it can exist
with proprietary software, too, where the lock-in is usually
accomplished by secret data formats, rather than by storing the data
outside of the physical control of its owner.  Jamie writes:

	Today, while some companies are trying to build goodwill
	with that community, there is nothing like a GPL for web
	services. No one's discovered a legal foundation that would
	establish open services, openly shared web services, with the
	same kinds of rights that we insist on in open-source code. No
	one's even sure what "open services" might mean, indeed, there's
	no consensus that we even need such a thing.

This particular problem has been discussed a lot, especially in the
wake of the Flickr and Zooomr episode [4]: Zooomr made it easy for
Flickr users to import all their images in order to switch to Zooomr,
but for a while Flickr made it hard for Zooomr to do this.

So far, all this echoes the "open standards" and "open formats"
discussion from the days when we had to take proprietary software for
granted.  In those days, we spent enormous amounts of effort trying to
make sure our software kept our data in well-documented formats that
were supported by other programs, and choosing proprietary software that
conformed to well-documented interfaces (POSIX, SQL, SMTP, whatever)
rather than the proprietary software that worked best for our purposes.

Ultimately, it was a losing game, because of the inherent conflict of
interest between software author and software user.

Julian Cash just started MoveMyData [5], an open-source software project
similar in concept to catdoc [6] --- the idea is to write little
programs to extract your data from each web service that might try to
hold it hostage, and more little programs to upload the data to some
other comparable web service.
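
A minimal sketch of that "little programs" idea, in Python.  Everything
named here is hypothetical: the Exporter and Importer interfaces, the
two fake services, and the record fields are assumptions made for
illustration, not MoveMyData's actual design.  A real exporter would
talk to a service's API or scrape its pages instead of reading a list
held in memory.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple

# Neutral interchange record; the field names are an assumption.
Record = Dict[str, str]


class Exporter(ABC):
    """One little program per service that pulls your data out."""

    @abstractmethod
    def export(self) -> Iterable[Record]:
        ...


class Importer(ABC):
    """One little program per service that pushes your data in."""

    @abstractmethod
    def upload(self, records: Iterable[Record]) -> int:
        ...


class FakeBookmarkServiceA(Exporter):
    """Stand-in for the service you are leaving (hypothetical)."""

    def __init__(self, rows: List[Tuple[str, str]]) -> None:
        self._rows = rows

    def export(self) -> Iterable[Record]:
        for url, title in self._rows:
            yield {"url": url, "title": title}


class FakeBookmarkServiceB(Importer):
    """Stand-in for the service you are moving to (hypothetical)."""

    def __init__(self) -> None:
        self.stored: List[Record] = []

    def upload(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.stored.append(dict(record))
            count += 1
        return count


def move(src: Exporter, dst: Importer) -> int:
    """Copy everything from one service to another; returns the count."""
    return dst.upload(src.export())
```

Because every exporter produces the same neutral records, N services
need N exporters and N importers rather than a converter for every
pair.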

If MoveMyData becomes highly successful, then the users of web-based
apps in 2008 will be only slightly worse off than users of proprietary
desktop software in 1988.  The software they depend on can be changed or
removed at any time, by parties outside their control, and they have no
recourse; the owners of that software can monitor every mouse movement
they make, should they so desire, and may be required to do so by local
law enforcement officers in jurisdictions unknown to the user; and they
still can't change it or study how it works internally.  But at least
they will be able to make backups of their data, just like the 1988 guy,
which they might in theory be able to load into some other similar
program.

Of course, they have access to much greater software capabilities than
the 1988 guy.  They have instant photo sharing and annotation, bookmark
metadata, instant access to satellite photos of any place in the world,
gigabytes of full-text-searchable email online from any random place
they visit, and so on.  Good stuff.

But I want to talk about why people originally switched from proprietary
software to free software.

A major reason, of course, was price.  Free software is not just free
as in freedom --- it's also, generally, free of charge.  It's a lot
easier to try out a piece of software if you don't have to pay for it
first, and if you don't have to worry about license audits.

Also, free software tends to do what its users want it to, because those
users can write patches for it, and share them.  So it tends to be less
buggy and a lot more pleasant to use than its proprietary equivalents,
although it often suffers from feature overload.  It tends
to support open formats and open standards wholeheartedly, because that
benefits its users, who have the authority to reject changes they don't
like.

The most important reason, for me, was trustworthiness.  I knew it
wasn't going to disappear next year, as proprietary software often does.
I could keep running it as long as I wanted to.  I didn't have to worry
that it might contain some secret piece of code to make it not work if
it thought, rightly or wrongly, that I had copied it onto somebody
else's machine, or was trying to let someone else use it over the
network.  The file formats weren't going to become undocumented
mysteries in a version upgrade (Mark Pilgrim lost his mail to Mail.app
this way earlier this year [8]); if it started crashing on startup, I
could find the bug and fix it.

These web services generally provide a low price already, although this
varies at the whim of the service provider.  So you can't depend on low
cost unless you can trust that the service will remain available to you.

So people use free software because of its guaranteed low cost, because
it does what its users want, and because it's trustworthy.  And they use
web services because they get low system administration costs, they can
use huge databases without downloading them first, they can get software
updates quickly, they can do very-CPU-intensive things, and they can
collaborate with their friends easily.  How can we get both of these
sets of advantages at once?

I think there is only one solution: build these services as
decentralized free-software peer-to-peer applications, pieces of which
run on the computers of each user.  As long as there's a single point of
failure in the system somewhere outside your control, its owner is in a
position to deny service to you; such systems are not trustworthy in the
way that free software is.

Imagine, for example, that the pieces of the Global Mosaic and other
large GIS data are distributed around the world among people who look at
maps, and the system contains some way to prevent any pieces from
getting lost.  (Automated speculation in an online futures market for
map chunks, for example, as in [10]; or perhaps you could form a pact
with 100 other people on your continent to ensure that there are at
least three copies of each chunk among the 100 of you --- that way,
you're only depending on those 100 people; or perhaps certain clubs,
universities, or countries could maintain depositories that kept copies
of all the pieces, but only served them up when they were otherwise
unavailable.)
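
The 100-person pact could be sketched roughly as follows.  This is only
one possible scheme, and the technique (rendezvous hashing, also called
highest-random-weight hashing) is my substitution, not something the
proposal above specifies.  The peer names and chunk identifiers are
made up.  Each member can compute, without asking anyone, which three
members are responsible for a given chunk, and can check which chunks
are short of copies when some members are offline.

```python
import hashlib
from typing import Dict, List, Set

REPLICAS = 3  # "at least three copies of each chunk"


def holders(chunk_id: str, peers: List[str], replicas: int = REPLICAS) -> List[str]:
    """Pick the peers responsible for a chunk by rendezvous hashing.

    Every peer gets a pseudo-random score for this chunk; the peers with
    the lowest scores hold it.  The result does not depend on the order
    of the peer list, so all members agree on it.
    """
    def score(peer: str) -> bytes:
        return hashlib.sha256(f"{chunk_id}|{peer}".encode("utf-8")).digest()

    return sorted(peers, key=score)[:replicas]


def under_replicated(chunks: List[str], peers: List[str],
                     online: Set[str]) -> Dict[str, int]:
    """Map each chunk with fewer than REPLICAS live holders to its live count."""
    short: Dict[str, int] = {}
    for chunk in chunks:
        live = sum(1 for peer in holders(chunk, peers) if peer in online)
        if live < REPLICAS:
            short[chunk] = live
    return short
```

One reason to choose this scheme: when a member leaves the pact for
good, only the chunks that member held get a new holder; every other
assignment stays where it was.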

It's going to be a lot of work.  We'll need reputation systems to track
which peers will keep their promises and return trustworthy data, new
algorithms and data structures for decentralized computation (although
secure hashes, public key cryptography, MapReduce, MIXes,
capability-oriented security, and distributed hash tables seem to
represent a lot of progress), new heuristics for systems design
(Postel's law, for example), sandboxes that allow flexible
interoperation of mobile code from different trust domains (E has one
approach [13], GreaseMonkey has another [12]), a data store model that
allows conflicting updates to coexist until a human feels like resolving
them [14], and so on.
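
To make the last item concrete, here is a minimal sketch of a data
store that lets conflicting updates coexist.  It is an illustration
under my own assumptions, not how Lotus Notes, CVS, or Groove actually
work: the class name, the "seen" parameter, and the truncated version
ids are all hypothetical.  Each write names the versions its author had
seen; any version nobody has superseded stays visible, so two writes
made from the same starting point both survive until someone writes a
merge that names both.

```python
import hashlib
from typing import Dict, FrozenSet, Set


class SiblingStore:
    """Single-item store that keeps conflicting versions side by side."""

    def __init__(self) -> None:
        self.values: Dict[str, str] = {}  # version id -> value
        self.heads: Set[str] = set()      # versions not yet superseded

    def put(self, value: str, seen: FrozenSet[str] = frozenset()) -> str:
        """Record a write made by someone who had seen the given versions."""
        # The id is a secure hash of the value and its ancestry, so the
        # same write arriving twice gets the same id.
        material = "\n".join(sorted(seen)) + "\n" + value
        vid = hashlib.sha256(material.encode("utf-8")).hexdigest()[:12]
        self.values[vid] = value
        self.heads -= seen   # versions the writer saw are superseded
        self.heads.add(vid)  # the new version is current
        return vid

    def conflicts(self) -> Dict[str, str]:
        """All current versions; more than one means a human should merge."""
        return {vid: self.values[vid] for vid in self.heads}
```

For example, if two people each edit a to-do item starting from the
same version, conflicts() returns both edits; once a person writes a
merged value with seen set to both ids, only the merge remains.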

A lot of this is research that's already been done, that just needs a
good platform built around it.

But building applications in this decentralized environment needs to be
nearly as easy as building them on the web, at both ends of the spectrum
from rich user interfaces to simple apps.  If everybody who wants to
build a P2P photo-sharing app has to figure out how to handle
distributing version updates, distributed storage of photos, NAT
traversal, decentralized naming, peer reputation management, and so on,
most of them will just give up, and the other five will build web sites
instead.

So we need a platform, something like a web browser, that supports a
universe of constantly-changing code written by a multitude of authors,
which migrates to where it's being used, and simultaneously supports
individual control over what version of the code is running on your
system and no-hassle updating when someone else has a change you want;
that replicates your data transparently to other machines so that you
don't have a single point of failure, but without allowing the owners of
those other machines to spy on you or corrupt your data; that runs
programs in a high-level language; that supports conflicting updates to
different replicas of the data and allows a human being to resolve the
conflicts; and that makes it easy for you to share particular bits of
your code or data with anyone, everyone, or no one.  Maybe we could even
start with a web browser and add the other stuff to it.

If we don't build such a platform, we will eventually lose the
advantages of free software, because we will use web services instead.

Groove [11] is pretty far along this path already, I think.

[0] "People, places, things, and ideas", by Kragen Sitaker, January 1999,
http://lists.canonical.org/pipermail/kragen-tol/1999-January/000322.html or
http://www.gnu.org/philosophy/kragen-software.html
[1] "The Other Road Ahead", by Paul Graham, September 2001,
http://www.paulgraham.com/road.html
[2] See the rationale for the GPLv3, version 2, subsection 4 of section
7b, "Additional Requirements" --- available online at
http://gplv3.fsf.org/rationale at the moment.  This is intended to allow
license compatibility with licenses such as the Affero GPL.
[3] "Web Services and Open Source at OSCON", by Jamie McCarthy,
July 2006, at
http://developers.slashdot.org/article.pl?sid=06/07/26/1537213
[4] "Flickr, Zooomr and API Parity", by Kevin Yank, June 2006,
http://www.sitepoint.com/blogs/2006/06/21/flickr-zooomr-and-api-parity/
[5] http://www.movemydata.org/
[6] catdoc is a Microsoft Word decoder by Victor Wagner:
http://www.45.free.net/~vitus/software/catdoc/
[7] Some of Greenspun's words along these lines are in "A Future
So Bright You'll Need to Wear Sunglasses", one of the chapters
of his book, "Philip & Alex's Guide to Web Publishing", online at 
http://philip.greenspun.com/panda/future --- warning,
this page contains photographs of naked women.
[8] "Juggling Oranges", by Mark Pilgrim, June 2006, at
http://diveintomark.org/archives/2006/06/16/juggling-oranges
[9] I know Microsoft and a gajillion "enterprise architects" have
appropriated the term "web services" to mean programs accessible through
XML-RPC and SOAP.  That is stupid, because XML-RPC and SOAP servers
aren't part of the web (you can't talk to them with a web browser or
link to things stored in them with URLs) and don't provide anything a
normal person would recognize as a "service".  In this essay, I'm
continuing to use the term in the sense in which Philip Greenspun meant
it when he coined it: web servers that do things somebody wants.
(See http://philip.greenspun.com/wtr/application-servers.html
for some example uses from 1998.)
[10] "Grid Economics", by Kragen Sitaker, at 
http://wiki.commerce.net/wiki/Grid_Economics --- and about a
zillion academic papers, going back to Ivan Sutherland's 1968
paper, "A futures market in computer time".  Some of the recent
real work in this vein includes MojoNation and Padala's OCEAN;
see "A Survey of Market-Based Approaches to Computation", by
Shashank Shetty, P. Padala, and M.P. Frank, August 2003, at
http://www.cise.ufl.edu/~ppadala/publications/tr/survey.pdf
[11] The best current description I've found of Groove is the
Wikipedia article, "Microsoft Office Groove", at
http://en.wikipedia.org/wiki/Microsoft_Office_Groove ---
although there are other materials available online.  Todd R.
Weiss wrote a PC World article, "Groove Updates Its Virtual
Office", in July 2004, online at
http://www.pcworld.com/article/116839-1/article.html --- this
article summarizes what Groove does and how it's interesting.
Some brief allusions to Groove's handling of conflicts are in
"James Governor's Misjudgement of Ray Ozzie", at
http://www.rhs.com/web/blog/PowerOfTheSchwartz.nsf/d6plinks/RSCZ-6J8JD5,
and some more descriptions (focused on connection with
SharePoint) in
http://www.offlinesharepoint.com/topic/groove-2007/ --- and
there's a "Groove Virtual Office FAQ" at
http://www.groove.net/index.cfm?pagename=VO_FAQ that roughly
describes the synchronization and sharing facilities of Groove.
[12] Some details (with example code) on GreaseMonkey's use
of XPCNativeWrappers, including how it affects user scripts, can
be found in the article "Avoid Common Pitfalls in Greasemonkey",
by Mark Pilgrim, November 2005, also a chapter in his book
"Greasemonkey Hacks",
http://www.oreillynet.com/pub/a/network/2005/11/01/avoid-common-greasemonkey-pitfalls.html
[13] The E language, child of Joule, largely by Mark Miller and Marc
Stiegler: http://www.erights.org/
[14] Like Lotus Notes, CVS, or various weakly-consistent databases from
academia.

----- End forwarded message -----

-- 
ghug is my email archiving bot. if you see it cc'd on this email,
please leave it cc'd, that will help me a lot. http://frot.org/ghug