[okfn-help] Post mortem: Is It Open launch

Rufus Pollock rufus.pollock at okfn.org
Sun Feb 21 13:34:52 GMT 2010


On 21 February 2010 11:22, John Bywater
<john.bywater at appropriatesoftware.net> wrote:
> Having spent some slightly uncomfortable time yesterday discussing errors in
> the recently launched Is It Open System with Peter, and as I have just a
> minute, I feel it shouldn't be left uncommented that the errors which Peter
> experienced, and which he raised on the main public discussion list (with
> some implications for reputation of the various parties involved) arose from
> an *untested* section of code (that which receives response emails).
>
> This aspect of the isitopen code is basically a piece of user interface
> ("data handler sends an enquiry response to the system by email") and could
> have been tested for a modest range of input cases, by asserting each input
> case is presentable. It would not have been more than a few minutes before
> the "broken HTML in email" case showed up.
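The kind of input-range test described above could be sketched as follows. `render_response_email` and the sample inputs are hypothetical stand-ins, not the actual isitopen code; the point is only to show how a modest set of input cases, including a malformed-HTML one, can each be asserted presentable.

```python
import html


def render_response_email(body):
    # Hypothetical handler standing in for the isitopen code that
    # turns an incoming reply email into displayable page content.
    # Escaping the raw body means malformed HTML in a reply cannot
    # break the page it is shown on.
    return "<pre>%s</pre>" % html.escape(body)


SAMPLE_INPUTS = [
    "plain text reply",
    "<p>well-formed HTML</p>",
    "<p>broken HTML with an unclosed tag",  # the case that showed up in use
    "",                                     # empty body
    "non-ASCII: caf\u00e9",
]


def test_each_input_is_presentable():
    # Assert every sample input renders to *some* string output,
    # rather than raising or producing markup that breaks the page.
    for body in SAMPLE_INPUTS:
        rendered = render_response_email(body)
        assert isinstance(rendered, str)
        assert rendered.startswith("<pre>")
```

Running such a loop over even a handful of cases takes minutes to write, which is John's point; whether the right cases are in the list in advance is the open question.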

Yes and no :) -- this code was under test, but had not been presented
with the particular input that caused it to break. My experience so
far is that, especially with input from outside your system, there is
nothing like actual use to throw up bugs -- even when you have
extensive testing.

> Therefore I don't think this was just a case of bad luck, but rather it was
> an error of process ("deviation from test-driven development"): it was
> decided (collectively) that this section of code would not be tested any
> more as it was considered to be working already.

It is of course clear *now* what test would have identified the
brokenness, but would that necessarily have been a test we added in
advance? Maybe, maybe not. I'm not so convinced this is a deficiency
in the development process here as in the deployment process: we
should have done more "testing by use" before the big public launch
... (but then again, we are working with very limited resources, and
deployment and migration work continued until Thursday evening ...)

> It matters only that we understand this was not some stochastic event (a
> piece of "bad luck") but determined by not cooking up the system so that it
> is bounded to work only in ways that are intended (through a test-first
> approach).

Possibly, though I'm dubious as to how feasible it is to test all
relevant inputs to the system -- particularly given the extremely
limited resources we possess.

> Of course, this isn't to demand the impossible ("all releases must always be
> totally error free") but that in this case a little more time spent testing
> something that was known to be untested would have avoided much more time

It wasn't untested. We did have, or at least used to have, tests for
rendering messages ... (the issue was that they didn't cover enough
possible input messages ...)

What was untested was the migration process, which was responsible for
1 of the 2 issues that arose: the lack of enquiry owners for enquiries
migrated from the pre-most-recent release. This is something it would
be good to have under test and wasn't. (That said, in my experience
migrations are relatively very costly to test; and if we'd gone over
the site thoroughly by hand -- mea culpa here -- we'd have found the
problem very quickly, and once found it was fixed in 10 minutes ...)
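A post-migration check for this particular issue could be as small as the sketch below. `Enquiry` and `migrate_enquiries` are hypothetical stand-ins (the real migration operated on persisted database records); the sketch only shows the shape of a test asserting that enquiries carried over from the previous release, which predate the owner field, all come out with an owner set.

```python
class Enquiry:
    # Hypothetical minimal model of an isitopen enquiry.
    def __init__(self, subject, owner=None):
        self.subject = subject
        self.owner = owner


def migrate_enquiries(old_enquiries, default_owner):
    # Hypothetical migration step: enquiries from the previous
    # release have no owner, so assign a default where one is missing.
    migrated = []
    for e in old_enquiries:
        owner = e.owner if e.owner is not None else default_owner
        migrated.append(Enquiry(e.subject, owner))
    return migrated


def test_migrated_enquiries_have_owners():
    old = [Enquiry("pre-release, no owner"),
           Enquiry("already owned", owner="admin")]
    new = migrate_enquiries(old, default_owner="system")
    # The invariant that was silently broken at launch:
    assert all(e.owner is not None for e in new)
```

Such a check is cheap once written, though as noted above, getting a realistic pre-migration dataset in front of it is where the real testing cost lies.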

> being spent discussing a broken service with users. In other words,
> producing a higher quality system would have involved less total time
> overall.

We can all agree more testing is good. The question is how much more
and at what cost ...

Rufus
-- 
Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/
