This Test Was an Ordeal to Write

13 07 2012

I had an interesting time at work today, writing what should have been a simple nothing test.  I thought somebody might be interested in the story.

My client’s application has an entity called a Dashboard.  Before my pass through this particular section of the code, a Dashboard was simply a Name and a set of Modules.

For one reason or another, there is a business-equals operator for Dashboard that returns true if the two Dashboards it’s comparing have the same Name and the same set of Modules, false otherwise.  Simple, standard stuff.

Part of my change involved adding a preview/thumbnail Icon to Dashboard, so that the user could see, in a listing of Dashboards, approximately what the Dashboard looked like before examining it in detail.  I looked at the business-equals method and decided that the new Icon field was irrelevant to it, so I left it alone.
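
For illustration, here is a minimal Qt-free sketch of that kind of business-equals. The Dashboard, Icon, and ModuleSet types below are stand-ins invented for this example (the real classes aren't shown in this post, and they use Qt types such as QString and QPixmap); the only point is that operator== looks at Name and Modules and deliberately skips Icon.

```cpp
#include <set>
#include <string>

// Hypothetical stand-ins for the real, Qt-based types.
using ModuleSet = std::set<std::string>;

struct Icon {
  int width = 0;
  int height = 0;
};

struct Dashboard {
  std::string name;
  Icon icon;
  ModuleSet modules;
};

// Business equality: same Name and same set of Modules.
// Icon is left out on purpose: it is only a preview of the Dashboard.
inline bool operator==(const Dashboard& a, const Dashboard& b) {
  return a.name == b.name && a.modules == b.modules;
}

inline bool operator!=(const Dashboard& a, const Dashboard& b) {
  return !(a == b);
}
```

With this shape, two Dashboards that differ only in their Icon compare equal, which is the behavior the rest of the story tries to pin down with a test.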

After completing my change, I submitted my code for review, and one of the reviewers flagged the business-equals method.  “Doesn’t check for equality of Icon,” he said.

I explained to him the point above–that Icon is irrelevant to business-equals–and he made a very good argument that I hadn’t considered.

“That makes sense,” he said, “but I didn’t notice it at first, and maybe I’m not unique.  What if somebody comes through this code someday and notices what I noticed, and in the spirit of leaving the code better than he found it, he goes ahead and puts Icon in the business-equals method?  Then we’ll start getting false negatives in production.”

I could see myself in just that role, so I understood exactly where he was coming from.

“Better put a comment in there,” he suggested, “explaining that Icon’s been left out for a reason.”

A comment? I thought to myself.  Comments suck.  I’ll write a test, that’s what I’ll do.

I’ll write a test that uses business equals to compare two Dashboards with exactly the same Name and Module set, but with different Icons, and assert that the two are declared equal.  That way, if somebody adds an Icon equality check later, that test will fail, and upon examining the failure he’ll understand that Icon was left out on purpose.  Maybe somebody has now determined that Icons are relevant to business equals, and the failing test will be deleted, but at least the change won’t be an accidental one that results in unexpected false negatives.

So I did write a test; one that looked a lot like this (C++ code, under the Qt framework).

QString name ("Name");
QPixmap p (5, 10); // tall
QPixmap q (10, 5); // wide
ModuleSet modules;
Dashboard a (name, p, modules);
Dashboard b (name, q, modules);

bool result = (a == b);

EXPECT_TRUE (result); // Google Test assertion

Elementary, right?

Well, not really.  When I ran the test, Qt segfaulted on the QPixmap constructor.  I made various minor changes and scoured Google, but nothing I tried worked: at least in the testing environment, I was unable to construct a QPixmap with dimensions.

I discovered, however, that I could construct a QPixmap with no constructor parameters just fine; that didn’t cause a segfault.  But I couldn’t make p and q both blank QPixmaps like that, because then they’d be the same and the test wouldn’t prove anything: they have to be different.

So I thought, well, why not just mock the QPixmaps?  I have Google Mock here; I’ll just write a mock with an operator==() that always returns false; then I don’t have to worry about creating real QPixmaps.

Problem is, QPixmap doesn’t have an operator==() to mock.  If that notional future developer is going to add code to compare QPixmaps, he’s going to have to do it by comparing properties manually.  Which properties?  Dunno: I’m not him.

Scratch mocking.

Well, I looked over the constructor list for QPixmap and discovered that I could create a QPixmap from a QImage, and I could create a QImage from a pair of dimensions, some binary image data, and a Format specifier (for example, bilevel, grayscale, color).  So I wrote code to create a couple of bilevel 1×1 QImages, one consisting of a single white pixel and the other consisting of a single black pixel, and used those QImages to construct a couple of unequal QPixmaps.

Cool, right?

Nope: the constructor for the first QImage segfaulted.

Cue another round of code shuffling and Googling: no joy.

Well, I’m not licked yet, I figured.  If you create .png files, and put them in a suitable place, and write an XML file describing them and their location, and refer to the XML file from the proper point in the build process, then you can create QPixmaps with strings that designate the handles of those .png files, and they’ll be loaded for you.  I had done this in a number of places already in the production code, so I knew it worked.

It was a long, arduous process, though, to set all that up.

But I’m no slacker, so I put it all together, corrected a few oversights, and ran the test.


Success?  Really?  I was suspicious.  I ran it in the debugger and stepped through that QPixmap creation.

Turns out all that machinery didn’t work; but instead of segfaulting or throwing an exception, the QPixmap constructors created blank QPixmaps–just like the parameterless version of the constructor would have–that were identical to each other; so once more, the test didn’t prove anything.

More Googling and spiking and whining to my neighbor.

This time, my neighbor discovered that there’s an object called QApplication that does a lot of global initialization in its constructor.  You create a QApplication object somewhere (doesn’t seem to matter where), and suddenly a bunch of things work that didn’t work before.

Okay, fine, so my neighbor wrote a little spike program that created a QPixmap without a QApplication.  Bam: segfault.  He modified the spike to create a QApplication before creating the QPixmap, and presto: worked just fine.

So that was my problem, I figured.  I put code to create a QApplication in the test method and ran the test.

Segfault on the constructor for QApplication.

No big deal: I moved the creation of the QApplication to the SetUp() method of the test class.

Segfault on the constructor for QApplication.

Darn.  I moved it to the initializer list for the test class’s constructor.

Segfault on the constructor for QApplication.

I moved it out of the file, into the main() function that ran the tests.

Segfault on the constructor for QApplication.

By now it was after lunch, and the day was on its way to being spent.

But I wasn’t ready to give up.  This is C++, I said to myself: it has to be testable.

The business equals doesn’t ever touch Icon at all, and any code that does touch Icon should cause trouble.  That gave me an idea.

So I created two Dashboards with blank (that is, equal) Icons; then I did this to each of them:

memset (&dashboard.Icon, 0xFF, sizeof (dashboard.Icon)); // memset uses only the low byte of its fill value

See that?  The area of memory that contains the Icon field of dashboard, I’m overwriting with all 1 bits.  Then I verified that calling just about any method on that corrupted Icon field would produce a segfault.

There’s my test, right?  Those corrupted Icons don’t bother me, since I’m not bothering them, but if anybody changes Dashboard::operator==() to access the Icon field, that test will segfault.  Segfault isn’t nearly as nice as a formal assertion failure, but it’s a whole heck of a lot better than false negatives in production.

Well, it turned out to be close to working.  Problem was, once I was done with those corrupted QPixmaps and terminated the test, their destructors were called, and–of course, since they were completely corrupt–the destructors segfaulted.

Okay, fine, so I put a couple of calls at the end of my test to an outboard method that did this:

QPixmap exemplar;
memcpy (&dashboard.Icon, &exemplar, sizeof (dashboard.Icon));

Write the guts of a real Icon back over all the 1 bits when we’re finished with the corrupted Icon, and the destructor works just fine.

My reviewer didn’t like it, though: he pointed out that I was restoring the QPixmap from a completely different object, and while that might work okay right now on my machine with this version of Qt and this compiler, modifying any of those variables might change that.

So, okay, fine, I modified things so that corrupting each dashboard copied its Icon contents to a buffer first, from which they were then restored just before calling the destructor.  Still works great.
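
The save, corrupt, restore sequence can be sketched as a small RAII helper. This is a sketch under assumptions, not the code from the project: ScopedPoison and FakeIcon are names made up here, and FakeIcon is a trivially copyable stand-in so that the byte-level copying is well defined. Doing the same thing to a real QPixmap steps outside what the language guarantees, which is exactly the reviewer's worry.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the Icon field; a real QPixmap is not trivially copyable.
struct FakeIcon {
  std::uint32_t width;
  std::uint32_t height;
};

// Saves the target's bytes, overwrites them with all 1 bits, and puts the
// saved bytes back when the guard goes out of scope.
template <typename T>
class ScopedPoison {
  static_assert(std::is_trivially_copyable<T>::value,
                "byte-wise save/restore is only well defined for trivially copyable types");

 public:
  explicit ScopedPoison(T& target) : target_(&target) {
    std::memcpy(saved_.data(), target_, sizeof(T));
    std::memset(target_, 0xFF, sizeof(T));
  }
  ~ScopedPoison() { std::memcpy(target_, saved_.data(), sizeof(T)); }

  ScopedPoison(const ScopedPoison&) = delete;
  ScopedPoison& operator=(const ScopedPoison&) = delete;

 private:
  T* target_;
  std::array<unsigned char, sizeof(T)> saved_;
};
```

Because the restore happens in the guard's destructor, it runs before the poisoned object's own destructor, provided the guard is declared after the object it poisons.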

So: this is obviously not the right way to write the test, because Qt shouldn’t be segfaulting in my face at every turn.  Something’s wrong.

But it works!  The code is covered, and if somebody reflexively adds Icon to Dashboard’s business equals, a test will show him that it doesn’t belong there.

I felt like Qt had been flipping me the bird all day, receding tauntingly before me as I pursued it; but finally I was able to grab it by the throat and beat it into bloody submission anyway.  (Does that sound a little hostile?  Great: it felt a little hostile when I did it.)  I’m ashamed of the test, but proud of the victory.


So I’ve Got This AVR Emulator Partway Done…

2 07 2012

I decided to go with writing the emulator, for several reasons:

  1. It puts off the point at which I’ll start having to get serious about hardware.  Yes, I want to get into hardware, really I do, but it scares me a little: screw up software and you get an exception or a segfault; screw up hardware and you get smoke, and have to replace parts and maybe recharge the fire extinguisher.
  2. It’s going to teach me an awful lot about the microcontroller I’ll be using, if I have to write software to simulate it.
  3. It’s an excuse to write code in Scala rather than in C.

Whoa–what was that last point?  Scala?  Why would I want to write a low-level bit-twiddler in Scala?  Why not write it in a low-level language like C or C++?

If that’s your question, go read some stuff about Scala.  I can’t imagine anything that I’d rather write in C than Scala.

That said, I should point out that Scala’s support for unsigned values is–well, nonexistent.  Everything’s signed in Scala, and treating 0x94E3 as a positive number can get a little hairy if you put it in a Short.  So I wrote an UnsignedShort class to deal with that, including a couple of implicits to convert back and forth from Int, and then also an UnsignedByte class that worked out also to have a bunch of status-flag stuff in it for addition, subtraction, and exclusive-or.  (Maybe that should be somewhere else, somehow.)
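
To make the status-flag business concrete, here is a rough sketch of 8-bit addition with the flags the AVR ADD instruction affects. It is written in C++ rather than Scala only to match the other snippets on this page, and AddResult, add8, and asUnsigned16 are names invented for the example, not names from the emulator.

```cpp
#include <cstdint>

// Hypothetical result type: the 8-bit sum plus the status flags ADD touches.
struct AddResult {
  std::uint8_t value;
  bool carry;      // C: carry out of bit 7
  bool zero;       // Z: result is zero
  bool negative;   // N: bit 7 of the result
  bool overflow;   // V: two's-complement overflow
  bool sign;       // S: N xor V
  bool halfCarry;  // H: carry out of bit 3
};

inline AddResult add8(std::uint8_t a, std::uint8_t b) {
  const unsigned wide = static_cast<unsigned>(a) + static_cast<unsigned>(b);
  const std::uint8_t r = static_cast<std::uint8_t>(wide & 0xFFu);
  AddResult out{};
  out.value = r;
  out.carry = wide > 0xFFu;
  out.zero = (r == 0);
  out.negative = (r & 0x80u) != 0;
  // Overflow: the operands agree in sign, but the result's sign differs.
  out.overflow = ((~(a ^ b)) & (a ^ r) & 0x80u) != 0;
  out.sign = out.negative != out.overflow;
  out.halfCarry = (((a & 0x0Fu) + (b & 0x0Fu)) & 0x10u) != 0;
  return out;
}

// Widens a 16-bit pattern held in a signed type without sign extension.
inline int asUnsigned16(std::int16_t raw) {
  return static_cast<int>(raw) & 0xFFFF;
}
```

The last helper is the 0x94E3 problem in miniature: stored in a signed 16-bit type the pattern reads as negative, and masking after widening recovers the positive value.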

Addition, subtraction, and exclusive-or?  Why just those?  Why no others?

Well, because of the way I’m proceeding.

The most important thing about an emulator, of course, is that it emulates: that is, that it acts exactly as the real part does, at least down to some small epsilon that is unavoidable because it is not in fact hardware.

So the first thing I did was to write a very simple C program in the trashy Arduino IDE that did nothing but increment a well-known memory location (0x500, if I remember correctly) by 1. I used the IDE to compile and upload the file to my ATmega2560 board, which thereupon–I assume–executed it. (Hard to tell: all it did was increment a memory location.)

Then I located the Intel .HEX format file that the IDE had sent over the wire, copied it into my project, and wrote a test to load that file into my as-yet-nonexistent emulator, set location 0x0500 to 42, run the loaded code in the emulator, and check that location 0x0500 now contained 43.  Simple, right?
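
Loading such a file comes down to parsing Intel HEX records, each of which is a colon, a byte count, a 16-bit address, a record type, the data bytes, and a checksum. A minimal sketch of parsing one record follows; HexRecord and parseHexRecord are hypothetical names, the code is C++ rather than the emulator's Scala, and it assumes the line has already had any trailing newline stripped.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct HexRecord {
  std::uint16_t address = 0;
  std::uint8_t type = 0;  // 0x00 = data, 0x01 = end of file
  std::vector<std::uint8_t> data;
};

inline int hexDigit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Parses one line such as ":00000001FF" into out.
// Returns false if the line is malformed or its checksum does not match.
inline bool parseHexRecord(const std::string& line, HexRecord& out) {
  if (line.size() < 11 || line[0] != ':' || (line.size() - 1) % 2 != 0) return false;
  std::vector<std::uint8_t> bytes;
  for (std::size_t i = 1; i + 1 < line.size(); i += 2) {
    const int hi = hexDigit(line[i]);
    const int lo = hexDigit(line[i + 1]);
    if (hi < 0 || lo < 0) return false;
    bytes.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
  }
  // Layout: count, address high, address low, type, data..., checksum.
  const std::size_t count = bytes[0];
  if (bytes.size() != count + 5) return false;
  unsigned sum = 0;
  for (std::uint8_t b : bytes) sum += b;
  if ((sum & 0xFFu) != 0) return false;  // all bytes, checksum included, sum to 0 mod 256
  out.address = static_cast<std::uint16_t>((bytes[1] << 8) | bytes[2]);
  out.type = bytes[3];
  out.data.assign(bytes.begin() + 4, bytes.end() - 1);
  return true;
}
```

A loader would call this once per line and copy each data record's bytes into emulated flash at the record's address until it reaches the end-of-file record.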

Well, that test has been failing for a couple of weeks straight, now.  My mode of operation has been to run the test, look at the message (“Illegal instruction 0x940C at 0x0000FD”), use the AVR instruction set manual to figure out what instruction 0x940C is (it’s a JMP instruction), and implement that instruction.  Then I run the test again, and it works for every instruction it understands and blows up at the first one it doesn’t.  So I implement that one, and so forth.
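
As an example of what implementing one instruction involves, here is a sketch of recognizing and decoding JMP. The bit layout comes from the AVR instruction set manual (1001 010k kkkk 110k, followed by a second word holding the low 16 bits of the target); the function names are made up for this example, and the code is C++ rather than the emulator's Scala.

```cpp
#include <cstdint>

// JMP is a two-word instruction: 1001 010k kkkk 110k, then 16 more k bits.
// Masking off the k bits in the first word leaves the fixed pattern 0x940C.
inline bool isJmp(std::uint16_t firstWord) {
  return (firstWord & 0xFE0Eu) == 0x940Cu;
}

// Reassembles the 22-bit target (a word address) from both instruction words.
inline std::uint32_t jmpTarget(std::uint16_t firstWord, std::uint16_t secondWord) {
  const std::uint32_t high = (((firstWord >> 4) & 0x1Fu) << 1) | (firstWord & 0x1u);
  return (high << 16) | secondWord;
}
```

CALL differs from JMP only in bits 3 to 1 of the first word (111 instead of 110), so 0x940E is a CALL, not a JMP.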

Along the way, of course, there’s opportunity for all sorts of emergent design and refactoring.

At the moment, for instance, I’m stuck on the CALL instruction.  I have the CALL instruction tested and working just fine, but the story test fails because the stack pointer is zero–so pushing something onto the stack decrements it into negative numbers, which is a problem.  Why is the stack pointer zero?  Well, because the AVR architecture doesn’t have a specific instruction to load the stack pointer with an initial value, but it does expose the SP register as a sequence of input/output ports.  Write a byte to the correct output port, and it’ll appear as part of the two-byte stack pointer.

So I have the OUT instruction implemented, but so far (or at least until this morning), all the available input and output ports were just abstract concepts, unless you specifically connected InputRecordings or OutputRecorders to them.  The instructions to initialize the stack pointer are there in the .HEX file, but the numbers they’re writing to the I/O ports that refer to the stack pointer are vanishing into the bit bucket.  That connection between the I/O ports and the stack pointer (and an earlier one between the I/O ports (specifically 0x3F) and the status register) is more than an abstract concept, so I’m working on the idea of a Peripheral, something with special functionality that can be listed in a configuration file and hooked up during initialization to provide services like that.
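
One way to picture the Peripheral idea is as a set of hooks attached to I/O port addresses. The sketch below is an assumption about how that might look, not the emulator's design: IoBus, Cpu, and attachStackPointer are invented names, and the code is C++ rather than Scala. On the ATmega2560 the stack pointer's low and high bytes sit at I/O addresses 0x3D and 0x3E, just below the status register at 0x3F.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Routes OUT writes to whatever is attached to a port; unattached ports
// swallow the value (the "bit bucket").
class IoBus {
 public:
  using WriteHook = std::function<void(std::uint8_t)>;

  void attach(std::uint8_t port, WriteHook hook) { hooks_[port] = std::move(hook); }

  void out(std::uint8_t port, std::uint8_t value) {
    auto it = hooks_.find(port);
    if (it != hooks_.end()) it->second(value);
  }

 private:
  std::map<std::uint8_t, WriteHook> hooks_;
};

struct Cpu {
  std::uint16_t sp = 0;
};

// A "peripheral" that ties I/O ports 0x3D (SPL) and 0x3E (SPH) to the stack pointer.
inline void attachStackPointer(IoBus& bus, Cpu& cpu) {
  bus.attach(0x3D, [&cpu](std::uint8_t v) {
    cpu.sp = static_cast<std::uint16_t>((cpu.sp & 0xFF00u) | v);
  });
  bus.attach(0x3E, [&cpu](std::uint8_t v) {
    cpu.sp = static_cast<std::uint16_t>((cpu.sp & 0x00FFu) | (static_cast<unsigned>(v) << 8));
  });
}
```

With the hooks attached, the two OUT instructions in the startup code leave the stack pointer at the top of SRAM instead of zero.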

Eventually, though, I’ll get the simple add application running in the emulator, by implementing all the instructions it needs and thereby all the infrastructure elements they need.

Then I plan a period for refactoring and projectization (right now the only way to build the project is in Eclipse, and you’ll find yourself missing the ScalaTest jars if you try).

After that, I want to write a fairly simple framework in C, to run on the Arduino, that will communicate with running tests on my desktop machine over the USB cable.  This framework will allow me to do two things from my tests: submit a few bytes of code to be executed and execute them; and ask questions about registers and I/O ports and SRAM and flash, and get answers to them.

Once I get that working on the Arduino (it’s going to be interesting without being able to unit-test it), I’ll grab the .HEX file for it, load it into my emulator, and go back to implementing more instructions until it runs.

After that, I’ll be able to write comparison tests for all the instructions I’ve implemented that far: use an instruction in several ways on the Arduino and make sure it operates as expected, then connect to the emulator instead of the Arduino and run the same test again.  With judiciously chosen tests, that ought to ensure to whatever small value of epsilon I’m willing to drive it to, that the emulator matches the part.

Anyway, that’s what I’ve been doing with my spare time recently, in the absence of insufficiently-Agile people to pester.

Travails of a Newly-Embedded TDDer

17 06 2012

So, as someone who has done a lot of building and repair of things strictly electrical, but who has done practically nothing having to do with analog or digital electronic hardware, I’ve taken delivery of an Arduino microcontroller board and have been playing around with it.  I’ve got a fairly decent breadboard, a set of wire jumpers, and a mess of resistors, capacitors, and LEDs.   I’ve put together several meaningless toy “devices,” ranging from a program that just repeatedly blinked “Shave And a Hair-cut, Two Bits” on an on-board LED to one that sequenced green outboard LEDs until you pulled a particular pin low, at which point it would sequence red outboard LEDs until you let the pin go, at which point it would go back to sequencing the green LEDs.

But as a TDDer, I’m beginning to feel really, really crowded by the difficulty of test-driving embedded microcontroller code.  I recently stumbled on a bug in one of my programs that crept in because I was refactoring without tests; if I hadn’t stumbled on it, there’s no reliable way I could have found it without tests.

So I’m thinking: If I could wish into existence whatever I wanted to support my Arduino development, what would I wish?

At first I thought, hey, this isn’t so hard.  Arduino provides me with a C library full of API calls like pinMode(), which sets a pin to input or output mode, digitalWrite(), which sets an output pin high or low, analogRead(), which reads a value from one of the ADC pins, and so on.  All I have to do is write an instrumented static-mock version of that library.  When I want to run tests, I can compile the microcontroller code for my development machine instead of for the microcontroller, link in my static mock instead of the Arduino library, and run tests with Google Test.  I can set up the static mock so that I can attach prerecorded “analog” signals (essentially sequences of floating-point numbers) to particular input pins, and attach “recorder” objects to particular output pins.  In the Assert portion of the tests, I can examine the recordings and make sure they recorded what should have happened.
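
A very small sketch of such a static mock might look like the following. Everything in the fake_arduino namespace is invented for illustration; the real Arduino library declares these functions globally, and a real mock would need many more of them. The samples here are plain integers, as analogRead() returns, rather than floating-point numbers, and lightIfBright() is a made-up piece of code under test.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

namespace fake_arduino {

constexpr std::uint8_t LOW = 0;
constexpr std::uint8_t HIGH = 1;
constexpr std::uint8_t INPUT = 0;
constexpr std::uint8_t OUTPUT = 1;

struct Board {
  std::map<std::uint8_t, std::uint8_t> modes;                 // last mode set per pin
  std::map<std::uint8_t, std::vector<std::uint8_t>> written;  // recorder per output pin
  std::map<std::uint8_t, std::vector<int>> analogInputs;      // prerecorded samples per pin
  std::map<std::uint8_t, std::size_t> nextSample;             // playback position per pin
};

inline Board& board() {
  static Board instance;
  return instance;
}

inline void reset() { board() = Board{}; }

inline void pinMode(std::uint8_t pin, std::uint8_t mode) { board().modes[pin] = mode; }

inline void digitalWrite(std::uint8_t pin, std::uint8_t value) {
  board().written[pin].push_back(value);
}

// Plays back the next prerecorded sample, or 0 once the recording runs out.
inline int analogRead(std::uint8_t pin) {
  const std::vector<int>& samples = board().analogInputs[pin];
  std::size_t& index = board().nextSample[pin];
  if (index >= samples.size()) return 0;
  return samples[index++];
}

}  // namespace fake_arduino

// Hypothetical code under test: turn an LED on when a sensor reads above mid-scale.
inline void lightIfBright(std::uint8_t sensorPin, std::uint8_t ledPin) {
  using namespace fake_arduino;
  pinMode(ledPin, OUTPUT);
  digitalWrite(ledPin, analogRead(sensorPin) > 512 ? HIGH : LOW);
}
```

A test would load a recording into analogInputs, run the code under test, and then assert on the contents of written.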

But then I thought about timing.  Timing’s real important to microcontrollers.  For example, suppose I wanted to test-drive handling of the situation when an interrupt is received while a value is being written to EEPROM.  How am I going to arrange the input signal on the interrupt pin so that it changes state exactly at the right time during the run of the code under test?  Without access to the hardware on which that code is running, how do I even simulate an interrupt at all?

Now I’m thinking I need to write a complete software emulator of the microcontroller–keeping the idea of prerecording input signals and recording output signals during the test, and having asserts to compare them, but with debugger-like awareness of just what’s executing in the test: for example, you could have prerecorded signals where the numbers weren’t just numbers, but could be tagged with clock cycle counts or time delays, or even with something like “Fire this sample the third time execution passes address XXXX with memory location XXXX set to XXXX.”

There is at least one software emulator in existence for my microcontroller–the AVR ATmega2560–but it doesn’t offer the kind of control that a TDDer (this one, anyway) would want.

But I’m thinking there’s got to be an easier way.

How Do You Measure Position on a Fingerboard?

17 06 2012

My first idea was just to put long ribbon controllers (like laptop touchpads, but long and skinny and sensitive only to one dimension) under the strings.  But there are two problems with that.  First, nobody seems to make ribbon controllers any longer than about 100mm.  Second, my bass strings can, over time, wear grooves in even ultra-hard ebony fingerboards.  I’d expect plastic ribbon controllers to last maybe ten minutes.

My second idea was to put a little piezoelectric transducer on the bridge end of each string and have them fire periodic compression pulses down the strings, listen for the reflection from the point where the string is held down, and compute position from the delay.  But there are three problems with that.  First, it’s going to be difficult to damp those pulses out at the bridge end; most likely, they’ll reflect off each end a couple of times before they disappear into the noise floor.  How am I going to tell the reflection of the pulse I just sent from the continuing echoes of its predecessors?  Second, holding a string against a fingerboard definitely stops transverse waves; but I have no reason for believing that it would cause a clean echo of a compression wave.  My instinct is that there’d be a very weak reflection from the fingered position followed by a nice sharp one from the other end of the string, so I’d have to find the weak, fuzzy one among all the sharp, clear ones.  Third, a compression pulse travels at about 20,000 feet per second down a steel string.  If I want, say, 1mm resolution in positioning, then even if the other two problems evaporate I’m going to have to be able to measure the delay with a granularity of 164 nanoseconds, which isn’t particularly realistic for the sort of microcontroller hardware one expects to find in a MIDI controller.
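
The 164-nanosecond figure is just distance divided by speed, with a unit conversion along the way. As a quick sketch (the function name is made up for this example):

```cpp
constexpr double kMetersPerFoot = 0.3048;

// Time, in nanoseconds, for a pulse to cover distanceMm at the given speed.
constexpr double delayForDistanceNs(double distanceMm, double speedFeetPerSecond) {
  return (distanceMm / 1000.0) / (speedFeetPerSecond * kMetersPerFoot) * 1e9;
}
```

Calling delayForDistanceNs(1.0, 20000.0) gives roughly 164. That is the one-way figure; a pulse that travels out and back covers each millimeter twice, so the round-trip delay would change by about twice that per millimeter, which still leaves the timing requirement in the hundreds of nanoseconds.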

My third idea was to make the fingerboard of some hard, conductive metal and the strings of resistance wire–that is, wire that has a significant resistance per unit length, unlike copper or aluminum.  Then you could use an analog-to-digital converter to measure the resistance between the fingerboard and the bridge end of each string, and you’d get either infinity–if the string wasn’t being fingered at all–or a value roughly proportional to the distance of the fingered point from the bridge.

There are a whole host of problems with this third idea.

Resistance is affected by more than where the string is fingered.  For example, a small contact patch (light pressure) will have more resistance than a large contact patch (heavy pressure).  A place on the string or fingerboard with more oxidation will have more resistance than a place with less oxidation.  But…stainless steel doesn’t oxidize (much), and neither does nichrome resistance wire, if you’re not using it for a heating element.

Nichrome resistance wire doesn’t have as much resistance as one might imagine.  In one diameter that’s roughly the size of a middling guitar string, its resistance is about one third of an ohm per foot.  I can crank my ADC’s reference voltage down as far as 1.0V, but if I put 1.0V across about eight inches of wire (what I expect to be left over at the end of the fingerboard once a string is fingered in the highest position), Ohm’s Law says I’ll get 4.5 amps of current, which will burn my poor little ADC to a smoking crisp, as well as frying the battery and even heating up the strings significantly.  But…I could limit the current to one milliamp and use an operational amplifier with a gain of 1000 or so to get a low-current variable voltage between 0 and 1V for the ADC.
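
The arithmetic behind both numbers is short enough to write down. This is only a sketch of the calculation with made-up function names; the one-third-ohm-per-foot figure, the one-milliamp limit, and the gain of 1000 are the values from the paragraph above.

```cpp
// Resistance of a length of wire, given its resistance per foot.
constexpr double wireResistanceOhms(double lengthInches, double ohmsPerFoot) {
  return lengthInches / 12.0 * ohmsPerFoot;
}

// Ohm's Law: I = V / R.
constexpr double currentAmps(double volts, double ohms) {
  return volts / ohms;
}

// Voltage seen by the ADC when a fixed current flows through the wire
// and the resulting drop is amplified.
constexpr double amplifiedVolts(double amps, double ohms, double gain) {
  return amps * ohms * gain;
}
```

Eight inches at one third of an ohm per foot is about 0.22 ohms, so 1.0V across it drives about 4.5 amps. With the current held to one milliamp, a full three feet of wire (about one ohm) drops about one millivolt, which a gain of 1000 brings up to about 1V.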

A/D converters don’t measure resistance, they measure voltage.  Sure, it’s easy to convert resistance into voltage; but it’s easy to convert all sorts of things into voltage, and much harder than you would think to avoid converting them–things like electromagnetic interference from elevators, power lines, air conditioners, and so on.  The signal arriving at the ADC is liable to be pretty noisy, especially given the required amplification.  But…I could add another string on the neck and put it somewhere where it couldn’t be played.  That string would be subject to all the same noise and interference as the one being played.  The only difference would be that it wouldn’t be being played; so you could use the differential mode of the op amp to subtract one from the other and cancel out the common stuff.

One idea I had to get rid of the amplifier, and all the noise it would amplify, would be to ditch the 16ga nichrome wire (about the size of a guitar string) and use 40ga instead: very skinny, with a significantly higher resistance per foot.  Then coat it with insulating enamel, like magnet wire, and wrap it around a central support wire made of steel or bronze, so that it looked very much like a real guitar string.  Then sand off the enamel from the outside of the string so that the nichrome can contact the steel fingerboard.  In this case, our nichrome wire is much thinner, with more resistance per foot, and much longer, because it’s wound in a helix instead of being straight.  I ran the numbers and discovered that I could expect about 10,000 ohms per foot for a string like that: almost perfect for the ADC.

Then I remembered that the resistance of the human body is about 10,000 ohms: the resistance of the system would be significantly modified if anyone touched it without insulating gloves.  So…ditch that idea.  (A 10,000-ohm parallel resistance will have no perceptible influence on the 1-ohm resistor you get from a three-foot 16ga nichrome wire.)

So there are problems with this third approach too, but all of them at least appear to be surmountable.  It might be worth a try.

Looks Like a New Project Coming Up

17 06 2012

So this is a little different subject.

In my life away from Agile software development, one of the things I do is play the electric bass.  Not particularly well, but it’s something I enjoy.  My favorite bass has six strings and no frets.

Since I’m a computer geek, though, I’ve been playing around with MIDI for years.

So for quite some time, I’ve been developing a blue-sky fantasy about a fretless stringed MIDI controller–that is, something that would have a user interface similar to that of a fretless bass (but better, because it wouldn’t have to tolerate certain physical realities that actual fretless basses do) and would allow the player to control a synthesizer.

One thing that any MIDI controller must be able to do is determine what pitch the user wishes it to direct the synth to play.  I’ve been able to find two different methods by which this is done by existing guitar-shaped MIDI controllers (EGSMCs).

The most common way is to use a real electric guitar with real guitar strings and a real pickup feeding a real note into an analog-to-digital converter and running a pitch-extraction algorithm on the resulting stream of samples. I don’t like this way because of the unavoidable inherent latency.  You have to identify the attack transient and wait for it to pass; then you have to identify at least two clean cycles of the actual note after it has begun to ring; then you have to do a Fourier transform to find the fundamental frequency; then you have to convert it into (at least) a three-byte MIDI message; then you have to transmit it over a serial link at just over 3,000 bytes per second (MIDI runs at 31,250 bits per second, with ten bits on the wire per byte); then the synthesizer has to initiate the note.  Even with a lot of processing power, that can be 50-60ms of latency or even more, especially with low-frequency notes.  It’s hard to play decent music when you have to wait that long after playing a note before hearing it.
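
Two of those steps have hard lower bounds that are easy to compute: waiting for a given number of cycles of the note, and pushing the bytes of the MIDI message down the wire. A rough sketch, with function names invented for the example:

```cpp
constexpr double kMidiBitsPerSecond = 31250.0;
constexpr double kBitsPerMidiByte = 10.0;  // start bit + 8 data bits + stop bit

// Time, in milliseconds, to transmit a message of the given size over MIDI.
constexpr double midiTransmitMs(int bytes) {
  return bytes * kBitsPerMidiByte / kMidiBitsPerSecond * 1000.0;
}

// Time, in milliseconds, for the given number of full cycles of a note to pass.
constexpr double cyclesMs(double frequencyHz, int cycles) {
  return cycles / frequencyHz * 1000.0;
}
```

Two cycles of a bass guitar's low E (about 41.2 Hz) take about 48.5ms, and two cycles of a six-string's low B (about 30.9 Hz) take about 65ms, before the attack transient, the transform, or the synthesizer are counted at all. The three-byte MIDI message adds just under a millisecond.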

The other way I’ve seen is to use a series of switches, one for each fret/string combination, that the player depresses in much the same way he’d press a string against a fret on a real instrument.  This provides considerably less latency, because the entire pitch-extraction computation is eliminated; but it doesn’t lend itself to fretlessness.

I’d like to see a MIDI controller that determined which MIDI note number it would send exclusively by position rather than pitch, like the discrete-switch solution, but in which the position measurement would be continuous rather than discrete, so that if you wanted to you could send MIDI pitch-wheel messages as well as note-on messages.

If you could do this, one of the coolnesses that would result would be that note spacing would be entirely soft.

On a real stringed instrument, the note positions, whether fretted or not, are significantly further apart near the head (the small end of the guitar that usually has the tuning keys on it) than they are near the bridge.  This has two tradeoff effects, at least for a fretless instrument.  First, playing quickly is more difficult near the head, because one’s fingers have to span a much larger physical distance, meaning more hand movement.  Second, playing with good intonation (that is, in tune) is more difficult near the bridge, where a half-step interval might be less than half an inch wide, making the positional tolerances very small.

Also on a real stringed instrument, the available pitch range is determined in a minor way by the number of strings, but in a major way by the length of the fingerboard in relation to the strings.  Is the fingerboard half the length of the strings?  Then you can play an octave on each string.  Three-quarters the length of the strings?  Then you can play two octaves.  Want three octaves?  Then you need seven-eighths of the string length underlaid by fingerboard, meaning that the notes are extremely close together for that last octave, and there is very little room between the end of the fingerboard and the bridge for pickup electronics, and usually the body of the instrument has to be mutilated to allow the player’s hand to get to that last octave.  Four octaves?  Forget it.
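
Both effects fall out of the same relationship: each octave halves the vibrating length of the string, so a note n semitones above the open string sits at a fraction 1 - 2^(-n/12) of the string length from the head end. A small sketch (function names are made up, and equal temperament is assumed):

```cpp
#include <cmath>

// Fraction of the string length the fingerboard must cover to reach
// the given number of octaves above the open string.
inline double fingerboardFraction(double octaves) {
  return 1.0 - std::pow(2.0, -octaves);
}

// Distance from the head end of the string to the note that lies
// the given number of semitones above the open string.
inline double notePosition(double scaleLength, int semitones) {
  return scaleLength * (1.0 - std::pow(2.0, -semitones / 12.0));
}
```

fingerboardFraction gives 0.5, 0.75, and 0.875 for one, two, and three octaves. On a 34-inch string, taken here as an example length, the first half step is about 1.9 inches wide, while the last half step of the third octave is only about a quarter of an inch.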

On a MIDI controller that depended on position rather than pitch to determine note number, you could:

  • define whatever length of fingerboard you had to hold whatever number of notes you wanted it to;
  • space those notes however you wanted, with no regard for the physics of a vibrating string; and
  • map position to note number in whatever way you liked–specifically, you could round a fingered position up to the next note, the way a fret does, if you wanted to simulate frets.

You could make different definitions for each string, if you wanted, enabling simulation of a five-string banjo with its fifth short string.  You could do diatonic quantization, like a mountain dulcimer, where the frets are irregularly spaced to prevent playing non-scale notes.  You could space the notes uniformly from one end to the other, so finger spacing was always the same.  You could make some of the strings “fretted” and not the others, or you could make some or all of the strings “fretted” in some places and not in others.
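
To show how soft the mapping could be, here is a sketch of the simplest arrangement: notes spaced uniformly along the fingerboard, with one function that rounds up the way a fret does and one that picks the nearest note and reports the remainder as a pitch-wheel value. All the names are invented for this example, and the bend range of two semitones either way is an assumption about how the synthesizer is set up.

```cpp
#include <cmath>

struct NoteEvent {
  int note;       // MIDI note number
  int pitchBend;  // 14-bit pitch-wheel value; 8192 means no bend
};

constexpr int kBendCenter = 8192;
constexpr double kBendRangeSemitones = 2.0;  // assumed synthesizer setting

// position runs from 0.0 at the head end of the fingerboard to 1.0 at the
// bridge end; noteCount notes are spread uniformly along that length.

// "Fretted": round the fingered position up to the next note, as a fret does.
inline NoteEvent frettedNote(double position, int baseNote, int noteCount) {
  const double semitones = position * noteCount;
  return {baseNote + static_cast<int>(std::ceil(semitones)), kBendCenter};
}

// "Fretless": choose the nearest note and express the remainder as pitch bend.
inline NoteEvent fretlessNote(double position, int baseNote, int noteCount) {
  const double semitones = position * noteCount;
  const long nearest = std::lround(semitones);
  const double offset = semitones - static_cast<double>(nearest);
  const int bend = kBendCenter +
      static_cast<int>(std::lround(offset / kBendRangeSemitones * kBendCenter));
  return {baseNote + static_cast<int>(nearest), bend};
}
```

Swapping in a different spacing, a per-string base note, or a diatonic table would only change how semitones is computed from position; the sensing hardware would not need to know.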

You could set up several of these different arrangements and impose them successively in the course of a single song, if you wanted to, with simple MIDI program-change messages.

So being able to sense continuous position admits a host of wonderful possibilities; but how do we sense it?

Well, I’ve been thinking about that.

Riding Herd on Developers

28 03 2012

When I was a kid, I learned a song in Sunday school about motivation.  It was called The Hornet Song*, and part of it went like this:

If a nest of live hornets were brought to this room
And the creatures allowed to go free,
You would not need urging to make yourself scarce:
You’d want to get out; don’t you see?

They would not take hold and by force of their strength
Throw you out of the window; oh, no!
They would not compel you to go ‘gainst your will;
They’d just make you willing to go.

A client architect once told me, “We’ve been trying really hard to get on the software-craftsmanship bandwagon.  We’ve been having lunch-and-learns, and offering training, and everything we can think of; but maybe two or three people show up.”

When you’re a developer, and there are three or four layers of inspection and signoff between you and production (as there were in this case), there’s no real reason to care about things like software craftsmanship, unless you just naturally have a fascination with such things, in which case you’re certainly not going to be working in a place where there are three or four layers of inspection and signoff between you and production.

What makes you care about software craftsmanship is being right up against production so that the first person who sees what leaves your fingers is a customer, and the first person who gets a call when something goes blooie is you.

My architect friend said this as part of a conversation about whether IT should control the dev teams (and through the dev teams the business interests they serve) or whether the business interests should control the dev teams (and through the dev teams IT).

He objected that unless IT rides very close herd on the dev teams, they all do different things, and build engineering and operations becomes an intractable nightmare.

This struck me as a perfect case of trying to hang onto the soap by squeezing it tighter.

He was describing a classic case of moral hazard.  In his world, it was the developers’ job to make a horrible mess, and the job of build engineering and operations to clean it up; and the only way that job would ever be small enough to be kept under control by build engineering and operations was to bear down ever harder on developers.

As always in a situation like this, the real solution will turn out not to be figuring out how to cram one party into an ever tinier, more inescapable prison, but how to eliminate the moral hazard that creates the problem in the first place.

First, let’s talk about build engineering.  What is build engineering?  In this case, it was one huge farm of CI servers that knew about every project in the company, and a predetermined set of builds and test environments that every checkin from everywhere automatically went through, and a group of people to oversee and maintain this system and explain rather impatiently to dev teams why (or at least that) no, they couldn’t do that either, because the build system wasn’t set up that way.

I watched the reaction in that company to the news that a dev team was planning to set up its own team-local CI server, with a bunch of VMs to simulate the application environment for testing, so that it could run certain tests after every checkin that the company-wide system could only run once a day.  There was a moment of slack-jawed horror, followed by absolute, nonnegotiable, and emphatic prohibition–even though a team-local CI server would not require the least change or modification to any code or configuration or practice having to do with the main system.

If the build engineering team reacted this way to a suggestion that didn’t involve them doing anything different, imagine their reaction to a suggestion that they should change something!

In their defense, though, it’s not hard to understand.  As any developer knows, build is a terrifically delicate process, especially when it involves enough configuration to get a nontrivial application to run successfully in a variety of different environments (on the order of a dozen, as I remember) with different levels of mocking each time.  Getting that all working once is already something to write home about: imagine getting it working—and keeping it working—for 150 completely unrelated applications whose natures, characteristics, and requirements you know next to nothing about!

It strikes me as being about as difficult as playing all the positions on a baseball team simultaneously.

Which is why, in the real world, we have baseball teams.  The job of playing all the positions at once doesn’t exist, because there’s no need for it, once we’ve got the first baseman playing first base and the pitcher pitching and the catcher catching and the outfielders outfielding and so on.

In my opinion, that whole huge company-wide build system should have been thrown out—or at least broken up into tiny pieces—and the individual teams should have been responsible for their own CI and testing processes.  Those processes undoubtedly would have been completely different from team to team, but that would have been a good thing, not a bad thing, because it would mean each team was doing what was best for its project.

I suspect my architect friend would have protested that a huge build system like that was the only way they could be sure that every team put their code through all the proper testing environments before it went into production.

My response would have been that the most important thing was not that all code go through every environment envisaged by the build engineers, but that all code worked properly in production.  Put the dev teams right up against production, as suggested above, and they’ll find a way to make sure their code gets whatever testing it needs, or else they’ll be doing 24/7 production support instead of software development.  (That’ll get ’em to your lunch-and-learns, for dang sure.)

But what about operations?  Do we solve that problem by putting the dev teams in charge of operations as well?

I don’t think so—not in most cases.  It’s an issue of boundaries.  Just as it’s a moral hazard for the build team to be compelling all sorts of behavior inside the dev teams, it’s a moral hazard for the dev teams to be specifying behavior inside the operations team.

The operations team should say, “Based on our skillset, expertise, and practical constraints, here’s a short list of services we can provide to you, along with the technologies we’re willing to use to provide them.  This list is subject to change, but at our behest, not yours.”  The dev teams should design their CI processes to spit out something compatible with the services provided by the operations team, if at all possible.

When a dev team can’t do without something operations can’t provide—or could theoretically provide but not fast enough—that’s when the dev team needs to think about doing its own operations temporarily; but that’s a sign of a sick organization, and one way or another, that situation probably won’t be allowed to persist long.

To sum this all up, morale is very important on a dev team.  You don’t want developers who lie in bed staring at the brightening ceiling thinking to themselves, “I wonder what insufferable crap they’re going to make me do today.”  You want developers who say, “If my pair and I get our story done early today, I’ll get to spend some time playing with that new framework the guy tweeted about last night.”

To be in a state of high morale, developers need to be constantly in an innovative state of mind, slapping down challenges right and left and reaching for ever more velocity multiplication in the form of technology, skill, and experience.  (Notice how I didn’t mention super-high salaries or super-low hours or team-building trips.)

You don’t make or attract developers like this by insulating them from the real world with signoffs and approvals and global build systems and stupid, counterproductive rules, and then imposing a bunch of stuff on them that—even if it’s really great stuff, which is rare—they’ll resent and regard very suspiciously and cynically.

You make them by tossing them into the deep end—preferably with several senior developers who already know how to navigate the deep end comfortably—and making sure that when they grab out for something to stay afloat, it’s there for them.  (If they do the grabbing for it, they’re not going to see it as an imposition.  See?)

The term “riding herd” comes from the great cattle drives when America was young, where several dozen cowboys would drive a herd of thousands of dumb cattle a distance of thousands of miles over several months to be slaughtered.

Do you want developers like those cattle?

Then ride herd on them, like the cowboys did.

Otherwise don’t.

Update 3/30/2012:

My architect friend saw this article and got in touch with me about it.  We discussed what he said was the major reason for all the heavyweight ceremony and process in his company: Sarbanes-Oxley.  SarbOx means that the company CFO is accountable for everything individual developers do, and if they do something the government doesn’t like, he goes to prison: hence, he is pretty much required to ride herd on them.

SarbOx is an area that I haven’t yet been seriously involved in.

I understand that it’s a government’s job to kill people, devastate lives, destroy liberties, and generally cock things up as far as it can without its politicians ending up swinging from lampposts in droves; but it’s also true that for decades now software developers have been wading into cocked-up situations with the help of domain experts and making them convenient, fast, and smooth.

Is SarbOx really such a competently-conceived atrocity that even Agile developers can find no way around, through, or over it?  Somehow it seems unlikely to me that politicians could be that smart.

*The complete Hornet Song used the quoted analogy to explain the behavior of the Hivites, Canaanites, and Hittites in Exodus 23:28, and the behavior of Jonah in Jonah 3:3.

Moral Hazard: The Implacable Enemy of Agile

25 03 2012

This is from Wikipedia:

In economic theory, moral hazard is a tendency to take undue risks because the costs are not borne by the party taking the risk. The term defines a situation where the behavior of one party may change to the detriment of another after a transaction has taken place. For example, a person with insurance against automobile theft may be less cautious about locking their car, because the negative consequences of vehicle theft are now (partially) the responsibility of the insurance company. A party makes a decision about how much risk to take, while another party bears the costs if things go badly, and the party insulated from risk behaves differently from how it would if it were fully exposed to the risk.

In more general terms, moral hazard is a situation where it is the responsibility of one person or group to make a mess, and the responsibility of another person or group to clean it up.

For example, for a limited time while I was a kid, it was my job to play with the toys in my room, and my mother’s job to keep the floor uncluttered enough that she wouldn’t fall and break her neck when she had to walk across it in the dark.

So…who won, do you think?  Was the floor always clean, or was the floor always messy?

Of course the floor was always messy.  It was always messy until a rule was made–and enforced–that if my room wasn’t cleaned by a specified time, then I either wouldn’t get something I wanted or would get something I didn’t want.  (Decades later, I honestly don’t remember which.)  The trick was to take the incentive out of the wrong place and put it into the right place.

But this post isn’t about parenting, it’s about company culture.  Here are some more relevant examples of moral hazard:

  • A company has a maintenance team–or a group of them–whose job it is to jump on production problems and fix them when they crop up.  That is, it’s the development team’s job to create production problems and the maintenance team’s job to fix them.
  • Architects come up with designs, which developers then implement.  That is, it’s the architects’ job to create dozens of small but annoying and confusing incompatibilities and discrepancies between their overarching concept of how the system should work and the reality of how the system actually does work, and the development team’s job to make that design work anyway (in the process creating even more discrepancies between the architects’ picture of the system and reality).
  • Some uber-architect in the IT organization writes a Checkstyle (or other) coding standard and mandates that all code must pass the standard before it’s allowed into production.  That is, it’s the uber-architect’s job to create thousands and thousands of build errors, and the development teams’ job to clean them up.
  • Technical people outside the development team must sign off on all code before it goes into production.  That is, it’s IT’s job to create deployment delays at the last moment based on various criteria that have little or nothing to do with actual business functionality, and the product owner’s job to keep customers from leaving because of those delays.
  • Agile coaches explain all the things that each part of the Agile process is supposed to accomplish, and their clients take them seriously.  That is, it’s management’s job to spend a large portion of the development team’s time on meaningless meetings, and development’s job to deliver anyway.
  • Ideas from the dev team for making things better or easier or faster or more efficient are submitted in writing to a special committee, which at its next meeting–or the one after that, or at the very latest the one after that–will gravely consider them and all their associated consequences.  If an idea is judged to be worthy, the machinery for imposing it on the entire organization will be set in motion as soon as the proper approvals can be applied for and awarded.  That is, it’s the committee’s job to impose incomprehensible, irrelevant, and counterproductive regulations on distant teams that already had an environment all set up and working for them, and the job of those teams to deal with the unintended consequences.

Moral hazard puts incentives in the wrong places for getting anything done.  It destroys morale, it dulls the wits, and it creates endless loops of excuses.  For example:

Product owner: “We didn’t go into production over the weekend?  Why not?”

Development team: “It was the operations team’s fault: they rejected our package.”

Product owner: “Why did you reject their package?”

Operations team: “It wasn’t our fault: the Checkstyle analysis blew up with over 700 coding-standard violations.”

Product owner: “Why did you submit a package with over 700 coding-standard violations?”

Development team: “It’s the architect’s fault: there’s no way we can track down and fix that many violations and still get your functionality into production on time.”

Product owner: “Why did you create a Checkstyle configuration that was so out of whack with the way our team codes?”

Architect: “It’s the development team’s fault: we have to have some standards, otherwise we’ll end up with a whole base of impenetrable code that nobody dares change!”

Product owner: “Why don’t you have any standards?”

Development team: “Uh…”

The real answer to that one, of course, is one that the development team probably won’t give: coding standards are somebody else’s responsibility, not theirs, so they aren’t used to thinking about it and don’t really care.  Likewise things like simple design.  They’re not responsible for the design, and making official changes to the design is so painful and time-consuming that it’s much easier not to think about design at all, and just hack in fixes where the official design doesn’t fit reality.  Testing?  Sure, they’ll have to make a token effort in that direction, but as long as the Emma coverage numbers are such that the code passes the externally-imposed checks, it doesn’t really matter how good or bad the tests are, because if there are problems the maintenance team will catch them.

In a situation like this, the drive to excellence inside the team is largely extinguished, for two complementary reasons.  First, the culture is designed so that excellence is defined by people outside the team, so that any ideas the team has to make things even better are therefore by definition not excellence (or else said people outside the team would already have had them), so they are resisted or even punished.  Second, it’s just not the way things are done.  Mediocrity is a way of life, and anything completely intolerable will be taken care of by somebody else in some other layer of company bureaucracy.

Any team member who has a hunger for excellence must surmount both the why-bother and the you’re-gonna-get-in-trouble obstacles before he can accomplish anything.

The thing that has made Agile so successful, in the places where it has been successful, is the fact that it puts the incentives in the right places, not the wrong places.  To the greatest extent possible, the rewards for your successes and the consequences for your failures come directly to you, not to somebody else.  You are given not only the power but also the incentive to better yourself, the project, and the organization, and whatever stands in your way is made big and visible, and then removed with alacrity and enthusiasm.

Many organizations believe that they have adopted Agile by instituting certain of the practices that Agile coaches tell them are part of Agile, but they retain a company culture shot through with moral hazard where many of the incentives are counterproductive.

For example, in a truly Agile environment:

  • There is no separate maintenance team.  The development team is responsible for a project from conception to end-of-life.  If there’s a production problem that means somebody rolls out of bed at 3:00 in the morning, then it’s somebody from the development team responsible for that project.  If the project is so buggy that its dev team spends 80% of its time fixing production issues, then until the project is much healthier, that dev team isn’t going to get many new projects.  Hence, the dev team is going to do everything it possibly can by way of testing to make sure the project is ready for production, and by way of logging so that if there are issues forensics will be quick and easy, and by way of simple and open design so that any problems can be fixed fast.  And they’ll do this without any scowling remonstrances from a stern-faced architect outside the team, because the built-in reward for success and consequence for failure is worth much more to them than anything he could offer anyway.
  • Developers come up with designs, which they then vet–should they decide it’s necessary to avoid becoming a maintenance team–by consulting with architects who know the whole system better than they do.  The architects have no power or responsibility to approve or reject; their job is merely to act as information resources.  Hence, the designs will be better and simpler, because they spring up from reality, rather than down from an ideal.  The architects will know more about what’s really going on, because they’ll be doing more listening and less telling–and because if they don’t, development teams will prefer to interact directly with other development teams to find out what they need to know and make the highly-paid, expensive architects superfluous.
  • If Checkstyle or one of its competitors is used at all, it’ll be used because the team has decided it will be helpful, and the standard it enforces will be tailored to the team by the team, not imposed on the entire organization by somebody with a Vision who actually writes very little code.
  • The product owner, not anyone in IT, decides when the code goes into production and is responsible for the result.  If the product owner puts code in too soon, he’ll be responsible for the negative experience the customers have: but that’s appropriate, because they’re his customers and he knows what’s important to them.  At least he’s not likely to put the code in too late, because as the product owner he knows which barriers to production are important to his customers and which they don’t care about.  Wanna place any bets on where he’ll stand on Checkstyle violations?
  • The team’s cadence–iteration length, end/start day, standard meetings, impromptu meetings, etc.–is determined by the team based on the requirements of the project and the needs of the product owner.  Fly-by-night Agile coaches may pass through and provide consultation and advice, but it’s the team–not management, not IT–and the product owner that decide what works best for the project.
  • Finally, anyone on the team who has an idea to make things better or easier or faster or more efficient can directly implement that idea for the team immediately, with no submission process or lengthy approvals more involved than bringing it up after standup one day and getting thumbs-up or thumbs-down.  Whatever it is, it doesn’t have to affect or be imposed upon anyone else in the company.  If it turns out to be a good idea, other teams will notice, inquire, and imitate: voluntarily, on their own, because they’re attracted by it.  If it turns out to be a bad idea–and let’s face it, nobody can say for sure whether a newly suggested practice will have unforeseen and unintended consequences–then it will slow down only one team for an iteration or two before being abandoned, and the company will cheaply learn something valuable about what doesn’t work.

If a company labors to adopt Agile, but insists on keeping the moral hazard in its culture, the change will be limited to a few modifications in practice, but no significant increase in velocity, efficiency, quality, or morale.  Furthermore, people in an environment like this will look around and say, “So this is Agile, huh?  What a crock!  This sucks!”

But what if the company is also willing to put the incentives where they belong, as well as adopting practices that those incentives have shown to be useful in other organizations?  What if IT is told that its purpose is to serve and support development teams as they direct, and the development teams are told that if IT doesn’t support them properly, they’re to bypass it and do whatever’s necessary to get into production?  What if development teams are told that their purpose is to serve and support the product owner, and that if they don’t satisfy their product owner, he’ll go find another team?  What if the product owner is told that his purpose is to serve the business, and any demand made on him by development or IT that doesn’t move toward satisfying the needs of the business can be rejected out of hand?

In a situation like that, moral hazard will quickly become Big And Visible, and it can be dealt with promptly.

As a matter of fact, one might say that an Agile transformation should consist chiefly of a quest to expose and eliminate moral hazard, and that the various practices for which Agile is so well known will automatically trail along behind because they’re the best way anyone’s found so far to operate in a low-moral-hazard environment.

If you adopt economical driving habits, you’ll end up putting less gasoline in your tank.  But if you skip past the economical driving habits and just put less gas in your tank, you’ll end up muttering grim imprecations as you trudge down the highway with a gas can.