Massive Downside?

14 05 2016

A friend of mine sent me a link to this article in Inc. magazine by Adam Fridman: The Massive Downside of Agile Software Development.  Since I’ve been doing Agile software development now for twelve or thirteen years, I was eager to learn about this massive downside.

Here’s what he has to say, in five points.

1. Less predictability.
For some software deliverables, developers cannot quantify the full extent of required efforts. This is especially true in the beginning of the development life cycle on larger products. Teams new to the agile methodology fear these unknowns. This fear drives frustration, poor practices, and often poor decisions. The more regimented, waterfall process makes it easy to quantify the effort, time, and cost of delivering the final product.

For all software deliverables, developers cannot quantify the full extent of required efforts.  This is because every time a developer builds something, it’s the first time he’s ever built it.  (If it weren’t the first time, he’d just use what he built the first time instead of doing it over.)  If he is very experienced with similar things, he might have an idea how long it will take him; but he’ll still run into unfamiliar issues that will require unanticipated effort.

What the more regimented, waterfall process makes it easy to do is lie about the effort, time, and cost of delivering the final product, and maintain the illusion until nearly the end of the project, which is where all the make-or-break emergencies are.  Anyone who estimates that a software project will reach a specified scope in eighteen months is just making stuff up, whether he realizes it or not.  Heck, the team I just rolled off made capacity estimates every two weeks for almost a year, and hit it right on the nose only once.  And that time it was probably accidental.

If a collection of actual boots-in-the-trenches developers in the middle of a project can’t give accurate estimates for two weeks in the future, then a project manager isn’t going to be able to give accurate estimates for eighteen months in the future before anybody really knows what the project will involve.

However, we were able to give data–real historical data about the past, not blithe fantasies about the future–on those discrepancies to our product owners every two weeks.  Agile teams are no more able to make long-range predictions than waterfall teams are: but at least they’re honest about it.

2. More time and commitment.

Testers, customers, and developers must constantly interact with each other. This involves numerous face-to-face conversations, as they are the best form of communication. All involved in the project must have close cooperation. Daily users need to be available for prompt testing and sign off on each phase so developers can mark it off as complete before moving on to the next feature. This might ensure the product meets user expectations, but is onerous and time-consuming. This demands more time and energy of everyone involved.

3. Greater demands on developers and clients.

These principles require close collaboration and extensive user involvement. Though it is an engaging and rewarding system, it demands a big commitment for the entirety of the project to ensure success. Clients must go through training to aid in product development. Any lack of client participation will impact software quality and success. It also reflects poorly on the development company.

I think these are both good points.  You should only expend real effort on software projects you want to succeed.  The ones you don’t care about, you shouldn’t waste the testers’ or customers’ time on.

Or the developers’, either.

4. Lack of necessary documentation.

Because requirements for software are clarified just in time for development, documentation is less detailed. This means that when new members join the team, they do not know the details about certain features or how they need to perform. This creates misunderstandings and difficulties.

Have you ever been a new member joining a development team?  Me too.  Have you been a new member joining a development team that has its codebase documented?  Me too.  Have you ever gotten any information out of that documentation that you were confident enough in to use without having to ask somebody else on the team whether it was obsolete or not?  Me either.

Comprehensively documenting an emerging system on paper is a losing proposition that turns into a money pit and a useless effort.  Comprehensively documenting a nonexistent system on paper is even worse.

You know what kind of documentation of an emerging system isn’t useless?  Properly written automated tests, that’s what kind.  First, they’re written not in prose that has to be translated in an error-prone operation to technical concepts in the reader’s head, but in the same code that’s used to represent those technical concepts in the codebase the reader will be dealing with.  Second, they’re always up to date, never obsolete: they have to be, or they’ll fail.

And if you want new members to come up to speed quickly, don’t give them technical documentation–even clearly written, up-to-the-minute technical documentation.  Instead, pair them with experienced team members who will let them drive.  That’s the fastest way for them to learn what’s going on: much, much faster than reading technical documentation–or even automated tests, for that matter.

Can’t spare the time to pair?  Deadline too close?  Need everyone on his own computer to improve velocity?  Well, first, you don’t understand pairing; but that’s a separate issue.  Here’s the point: your new guy is going to be pestering your old guys one way or another, whether he’s trying to find out which parts of the technical docs are obsolete or whether he’s officially pairing with them.  Pairing is much faster.

5. Project easily falls off track.

This method requires very little planning to get started, and assumes the consumer’s needs are ever changing. With so little to go on, you can see how this could limit the agile model. Then, if a consumer’s feedback or communications are not clear, a developer might focus on the wrong areas of development. It also has the potential for scope creep, and an ever-changing product becomes an ever-lasting one.

Is the implication here that the waterfall model handles this situation better?


In a properly-run Agile project, there is no predetermined track to fall off.  The project goes where the customers take it, and the most valuable work is always done first.  If the communications are not clear, the discrepancy shows up immediately and is instantly corrected.  There is no scope creep in an Agile project, by definition: we call it “customer satisfaction” instead.

Since the most valuable things are done first, the product is finished when either A) the money runs out, or B) the business value of the next most valuable feature is less than what the developers would cost to develop it.

On the other hand, if the customers continue to point out more business value that can be exploited by further development, that’s a good thing, not a bad thing.  The customers are happier, the company’s market share increases, and the developers continue to have interesting work to do.

Now, the fact that I disagree with much of what Mr. Fridman says in his article should not be taken to mean that I think Agile has no downside.  I think it has at least two major problems; but Mr. Fridman’s article didn’t mention either of them.

Agility Isn’t For Everyone

29 04 2016

My post of less than 24 hours ago, The Only Legitimate Measure of Agility, has drawn some interesting comments.  (I mean real, face-to-face, vocal audio comments, not website or social media comments.)

For example, one fellow said to me, “What about a project to develop a new passenger jet?  All the safety concerns and government regulations and mountains of approvals such a thing has to go through, in addition to the fact that there’s no use putting a passenger jet into production before it can fly, mean that you might not be able to release for ten years. However, you can still bring practices like TDD and demos and retrospectives to bear on such a project.  Your 1/w formula needs some kind of a scaling factor for projects like that.”

But no.

TDD and demos and retrospectives are practices, not agility.  Agility is frequently releasing to paying customers so as to get fast feedback to quickly fold back into the product and keep it relevant so that the money keeps rolling in–or to identify it rapidly as a bad idea and abandon it.

And you can’t do that with a passenger jet.  You can’t.  There’s no way.  Can’t be done. (…but see update below.)

There are plenty of projects in the industry today that could be made agile, either easily or with a bit of skull sweat, if the companies weren’t so huge and sluggish and shot through with enterprise corporate politics and perversity.  But the development of a new passenger jet isn’t one of them.

Therefore, the development of a new passenger jet can’t be agile.  (Or, to be more precise, if it takes ten years to develop, it can be at most 0.19% agile.)  That’s not a condemnation of the company or the business team or the developers; it’s a simple statement of fact.  If you want to be agile, then find (or start) a project that’s not developing a new passenger jet.

(Of course, once you have the jet, you might well be able to mount agile efforts to enhance and improve it.)

But while those practices aren’t the same as agility, they still bear talking about.  Where did those practices come from?  They came from agile teams who were desperately searching for ways to sustain their agility in the face of a high-volume cascade of production problems such as their waterfall predecessors never dreamed about.  They were invented and survived because they work.  They can produce fast, close-knit product teams who churn out high-quality, dependable code very quickly.

And any project, agile or not, can benefit from a product team like that.  Their practices are good practices, and (when used correctly) should be commended wherever they appear, and encouraged wherever they don’t.

Agility isn’t for everyone, but good practices are…or should be.


UPDATE: I just thought of a way the development of a new passenger jet might be agilified.

Manufacturers frequently (always?) accept orders for newly-designed airplanes years before they go into production, and such orders come with a significant amount of deposit money attached. This is real money from real customers.

Perhaps simulators could be devised, well before any aluminum was extruded from any furnaces anywhere, to demonstrate the anticipated experiences of the passengers and the crew and the mechanics and the support personnel and so on, such that the real code under development running in these simulators would give a reasonably faithful rendition of anticipated reality.

Releasing to these simulators, then, might qualify as releasing to a kind of production, since good experiences would lead to more orders with deposits, and changes to the simulated experiences would produce definite feedback from real customers with real money at real risk.  You could come up with a pretty tight feedback loop if you did something like that…and probably put a serious competitive hurtin’ on sluggish corporate government contractors like Boeing or Lockheed-Martin who will dismiss it as video-game nonsense.

Maybe a stupid thought, but…a thought, at least.


The Only Legitimate Measure of Agility

29 04 2016

“Oh no you di’unt.  I know you didn’t just say there’s only one legitimate measure of agility!”

Oh yes I did.  That’s exactly what I did.  Not only is there just one legitimate measure of agility, but it’s a very simple measure, requiring only one metric, and a completely objective one at that.

“Obviously, you don’t understand.  Agility is a tremendously complex proposition, requiring many moving pieces, and any attempt to measure it is going to have to take into account at least dozens if not hundreds of different metrics, which will be different from methodology to methodology, and most of those metrics will be untidily subjective!”


An agile culture is indeed tremendously complex, but I’m not going to measure the culture. I’m going to measure the success of the culture, which is a simpler task.


Agility is a measure of how frequently new functionality is released to customers customers customers in production production production.  Significantly, this is not how frequently it’s released to the business in the UAT region.

Why customers customers customers?  Why not the business?

Those of us who are developers are familiar with the phenomenon, discovered only after wading through thundering torrents of human misery, that it’s difficult-bordering-on-impossible for developers to understand what the business wants from them without an iterative process of stepwise refinement.  That’s why we have the business in the room with us.  That’s why we’re constantly pestering them with questions.  That’s why trying to cope without a product owner is such a disastrous idea.

But as hard as it is for us developers to understand what the business wants, it’s at least as hard for the business to understand what the market wants without an iterative process of stepwise refinement.  Harder, because they not only have to know what the market wants now; they have to predict what it will want when the software is finished.  The business is undoubtedly better than we developers are at predicting the market, but that’s not saying enough.  Without iterative stepwise refinement of the project’s goals, it will end up being much less responsive to customers’ needs–and therefore much less valuable–when it is finally released. If it is finally released.

Of course, that iterative process of stepwise refinement for the business is frequent releases to production. This is so that the customers, who pay the bills, can see and react to the trajectory of the product, and the business can either A) adjust that trajectory for better effect, or B) abandon the project, if it turns out to have been a bad idea, early before millions of dollars are irretrievably gone.

That’s what the word “agility” means: the ability to respond quickly to unexpected stimuli and change direction rapidly.

The only legitimate measure of agility is this simple formula:

    agility = 1/w

where w is the number of weeks between releases of new functionality to production.

If you release every week, you’re 100% agile.

If you release every month, you’re 23% agile.

If you release every year, you’re 1.9% agile.

If you release twice a day, or ten times a week, you’re 1000% agile!
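
If you like, you can mechanize the arithmetic; here’s a trivial sketch (the class and method names are mine, made up for illustration):

```java
public class Agility {
    // The 1/w measure, expressed as a percentage: w is the number of weeks
    // between releases of new functionality to production.
    public static double agilityPercent(double weeksBetweenReleases) {
        return 100.0 / weeksBetweenReleases;
    }

    public static void main(String[] args) {
        System.out.println(agilityPercent(1.0));         // weekly: 100.0
        System.out.println(agilityPercent(52.0 / 12.0)); // monthly: ~23.1
        System.out.println(agilityPercent(52.0));        // yearly: ~1.9
        System.out.println(agilityPercent(0.1));         // ten per week: 1000.0
    }
}
```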

I know what you’re thinking.  You’re thinking, “What about the single startup developer with execrable engineering practices whose production server is running on his development machine, and who ‘releases’ every time he commits a code change to source control?  Are you going to try to claim that he’s agile?”


Yes I am.

He may not be very good at agile, but he definitely is agile, responding constantly to customer needs.  He’s certainly more agile than a big company that releases only once a year but has dutifully imposed all the latest engineering best practices on its developers.

And if he doesn’t go out of business, it will be his very agility that will force him to develop and adopt better engineering practices, purely in self-defense.  Which is to say, enthusiastically and wholeheartedly, as opposed to the way he’d have adopted them if a manager forced them on him.

You see, that’s the way it works.  Demos aren’t agility. Retrospectives aren’t agility. Pair programming isn’t agility. Not even test-driven development is agility. Releasing to production is agility, and all those other things are supportive practices that have been empirically demonstrated to enable agility to be sustained and increased.

A client says, “If everything goes right, we release about once a year.  We’d like to be more agile, but we have a lot of inertia, and we’d like to take it slow and gradual.  So we put up this card wall, and now we’re considerably more agile.”


No they’re not.

They were 1.9% (max) agile before they put up the card wall, and they’re 1.9% (max) agile now. The card wall may or may not be a good idea, depending on how they use it, but it has zero effect on their agility.

I think agile transformations ought to start on the other end.

Do whatever it takes to develop the ability to release to production every week, and then start doing it.  Every week.

If you have new code since last week to release, then release it.  If you have no new code, then re-release exactly the same code you released last week, and complain bitterly to everyone within earshot.

Presently, perhaps with the help of a powerful director or vice president or executive, you’ll start getting some code.  Probably it won’t be very good code, and minutes after you release it you’ll have to roll it back out and return it for repair; but you’ll be agile, and your agility will drive the development team to adopt the engineering practices necessary to support that agility.

That’s the way we do things in an agile culture, isn’t it? We try something for a short time to see if it works, and if it fails we figure out what to do to improve the process and try again.

Maybe the dev team will choose to adopt card walls and demos and retrospectives and TDD and continuous integration and all the rest of the standard “agile” practices, and one or more of us will get rich teaching them how.

Or…maybe…they’ll come up with something better.


TDD: What About Code You Need But Have No Test For Yet?

26 07 2015

I spend a lot of time teaching TDD to other developers.  Here’s a situation I run into a lot.

Given an opening test of this sort:

public void shouldReturnChangeIfPaymentIsSufficient () {
    int price = 1999;
    int payment = 2000;

    int result = subject.conductTransaction (price, payment);

    assertEquals (1, result);
}

my mentee will begin supporting it like this:

public int conductTransaction (int price, int payment) {
    if (payment < price) {

and I’ll stop him immediately.

“Whoa,” I’ll observe, “we don’t have a test for that yet.”

“Huh?” he says.

“We never write a line of production code that’s not demanded into existence by a failing test,” I’ll say. “That if statement you’re writing will have two sides, and we only have a test for one of those sides. Before we have that other test, we can’t write any code that steps outside the tests we do have.”

So I’ll erase his if statement and proceed like this:

public int conductTransaction (int price, int payment) {
    return payment - price;
}

We run the test, the test passes, and now we can write another test:

public void shouldThrowExceptionIfPaymentIsInsufficient () {
    int price = 1999;
    int payment = 1998;

    try {
        subject.conductTransaction (price, payment);
        fail ();
    } catch (IllegalArgumentException e) {
        assertEquals ("Payment of $19.98 is insufficient to cover $19.99 charge", e.getMessage ());
    }
}

Now we have justification to put that if statement in the production code.

Frequently, though, my mentee will be unsatisfied with this. “What if we get distracted and forget to add that second test?” he’ll ask. “We’ll have code that passes all its tests, but that is still incorrect and will probably fail silently.”

Until recently, the only response I could come up with was, “Well, that’s discipline, isn’t it? Are you a professional developer, or are you a hobbyist who forgets things?”

But that’s not really acceptable, because I’ve been a developer for four and a half decades now, and while I started out doing a pretty good job of forgetting things, I’m getting better and better at it as more of my hair turns gray.

So I started teaching a different path. Instead of leaving out the if statement until you (hopefully) get around to putting in a test for it, put it in right at the beginning, but instrument it so that it complains if you use it in an untested way:

public int conductTransaction (int price, int payment) {
    if (payment < price) {
        throw new UnsupportedOperationException ("Test-drive me!");
    }
    return payment - price;
}

Now you can wait as long as you like to put in that missing test, and the production code will remember for you that it’s missing, and will throw you an exception if you exercise it in a way that doesn’t have a test.

That’s better.  Much better.

But there are still problems.

First, “throw new UnsupportedOperationException ("Test-drive me!");” takes longer to type than I’d like.

Second, unless you want to string-search your code for UnsupportedOperationExceptions, and carefully ignore the ones that are legitimate but not the ones that are calling for tests (error-prone), there’s no easy way to make sure you’ve remembered to write all the tests you need.

So now I go a step further.

Somewhere in most projects, in the production tree, is at least one Utils class that’s essentially just an uninstantiable bag of class methods.  In that class (or in one I create, if that class doesn’t exist yet) I put a method like this:

    public static void TEST_DRIVE_ME () {
        throw new UnsupportedOperationException ("Test-drive me!");
    }

Now, whenever I put in an if statement that has only one branch tested, I put a TEST_DRIVE_ME() call in the other branch.  Whenever I have to create an empty method so that a test will compile, instead of having it return null or 0 or false, I put a TEST_DRIVE_ME() in it.

Of course, in Java you still have to return something from a non-void method, but it’s just compiler candy, because it’ll never execute after a TEST_DRIVE_ME(). Some languages are different; for example, in Scala you can have TEST_DRIVE_ME() return Nothing—which is a subclass of every type—instead of being void, which makes things even easier.
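
Putting the pieces together, a minimal sketch of the pattern looks like this; Utils matches the description above, but OrderService and totalInCents are names I’m inventing purely for illustration:

```java
public class Utils {
    private Utils () {}  // uninstantiable bag of class methods

    public static void TEST_DRIVE_ME () {
        throw new UnsupportedOperationException ("Test-drive me!");
    }
}

class OrderService {
    // Stubbed out just so a caller's test will compile; any execution
    // without a test lands on the exception, never on the return.
    public int totalInCents () {
        Utils.TEST_DRIVE_ME ();
        return 0;  // compiler candy: unreachable
    }
}
```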

Are you about to complain that I’m putting test code in the production tree, and that that’s forbidden?

Okay, fine, but wait to complain just a little longer, until after the next paragraph.

The really cool part is that when you think you’re done with your application, and you’re ready to release it, you go back to that bag-of-methods Utils class and just delete the TEST_DRIVE_ME() method. Now the compiler will unerringly find all your untested code and complain about it–whereupon you can Ctrl-Z the TEST_DRIVE_ME() back in and go finish your application, and then delete it again when you’re really done.

See? No test code in the production tree!

Spluttering With Indignation

26 03 2015

I got so angry at the late Sun Microsystems today that I could barely find words.  I’ll tell you why in a minute.

So…I’ve been working on some spike code for a system to generate webservice mocks (actually combination stub/spies, if I have my terminology correct) to use in testing clients.

Here’s the idea.

If you have the source code for an existing web service, or if you’re developing a new web service, you use @MockableService and @MockableOperation annotations to mark the classes and methods that you want mocks for.  Then you run a command-line utility that searches through your web service code for those annotations and generates source code for a mock version of your web service that has all the marked services and operations.

Consider, for example, one of these @MockableOperations that responds to an HTTP POST request.  The example I used for my spike was hanging ornaments on a Christmas tree.  The URL of the POST request designates a limb on a tree, and the body of the request describes in JSON the ornaments to be added to that limb.  The body of the response contains JSON describing the resulting total ornamentation of the tree. The generated mock version of this operation can accept four different kinds of POST requests:

  1. X-Mockability: clear
  2. X-Mockability: prepare
  3. X-Mockability: report
  4. [real]

“X-Mockability” is a custom HTTP header I invented just for the purpose of clandestine out-of-band communications between the automated test on the front end and the generated mock on the back end, such communications to be completely unbeknownst to the code under test.

If the incoming POST request has X-Mockability of “prepare” (type 2 above), the generated mock knows it’s from the setup portion of the test, and the body of the request contains (encapsulated first in Base64 and then in JSON) one or more HTTP responses that the mock should remember and then respond with when it receives real requests with no X-Mockability header (type 4 above) for that operation.

If the incoming POST request has X-Mockability of “report” (type 3 above), the generated mock knows it’s from the assert portion of the test, and it will send back an HTTP response whose body contains (again, encapsulated in Base64 and JSON) a list of all the real (type 4) HTTP requests that operation has recently received from the code under test.

If the incoming POST request has X-Mockability of “clear” (type 1 above), the generated mock will forget all about the client sending the request: it will throw away all pending prepared responses and all pending recorded requests.  In general, this is the first request a test will make.

And, of course, as has been mentioned, if there is no X-Mockability header at all, the generated mock knows that the request is genuine, having originated from code under test, and it responds with the prepared response that is next in line, or–and this is key for what follows–a 499 response (I made up that status code) saying, “Hey, I’m just a mock and I’m not prepared for that request!” if there is no prepared response.
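
For concreteness, here’s a rough sketch of just the payload encoding described above; the class and method names are illustrative, not taken from the actual generator:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of the "prepare" payload: each canned HTTP response is
// Base64-encoded and then listed in a JSON body, so the mock can
// replay it verbatim on the next real (no-X-Mockability) request.
public class MockabilityPayload {
    public static String wrap(String cannedHttpResponse) {
        return Base64.getEncoder()
                     .encodeToString(cannedHttpResponse.getBytes(StandardCharsets.UTF_8));
    }

    public static String unwrap(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String canned = "HTTP/1.1 200 OK\r\n\r\n{\"ornaments\":[]}";
        // The test's setup phase would POST something like this, with an
        // "X-Mockability: prepare" header attached:
        String prepareBody = "{\"responses\":[\"" + wrap(canned) + "\"]}";
        System.out.println(prepareBody);
    }
}
```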

Pretty cool, right?  Can you see the problem yet? No?  Don’t feel bad; I hadn’t by this point either.  Let me go a little further.

I wrote a sample client that looked at a pre-decorated Christmas tree (using GET on the tree limb and expecting exactly the same kind of response that comes from a POST) and decided whether the ornamentation was Good, TopHeavy, BottomHeavy, or Uneven.  Then I wrote an automated test for the client, meant to run on a mock service. The test used “X-Mockability: prepare” to set up an ornamentation response from the GET operation that the code under test would judge to be BottomHeavy; then it triggered the code under test; then it asserted that the code under test had indeed judged the ornamentation to be BottomHeavy.

How about now?  Do you see it?  I didn’t either.

When I ran the test, it failed with my made-up 499 status code: the generated mock thought it wasn’t prepared for the GET request. Well, that was weird.  Not too surprising, though: my generated mock has to deal with the facts that A) it may be called from a variety of different network clients, and it has to prepare for and record from each of them separately; B) the webserver in which it runs may decide to create several instances of the mock to use in thread pooling, but it still has to be able to keep track of everything; and C) each of the @MockableOperations has to be administered separately: you don’t want to send a GET and have the response you’d prepared for a POST come back, and you don’t want to send a GET to one URL and receive the response you’d prepared for a GET to a different URL.
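
Roughly, that keying amounts to something like this; all the names here are illustrative, not from my actual spike code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Prepared responses are administered per client, per HTTP method, per URL,
// and access is synchronized because the webserver may spread requests
// across pooled mock instances sharing this state.
public class MockState {
    private static final Map<String, Deque<String>> PREPARED = new HashMap<>();

    private static String key(String client, String method, String url) {
        return client + " " + method + " " + url;
    }

    public static synchronized void prepare(String client, String method,
                                            String url, String response) {
        PREPARED.computeIfAbsent(key(client, method, url),
                                 k -> new ArrayDeque<>()).addLast(response);
    }

    public static synchronized String nextResponse(String client, String method,
                                                   String url) {
        Deque<String> queue = PREPARED.get(key(client, method, url));
        if (queue == null || queue.isEmpty()) {
            return "499 Unprepared";  // the made-up "I'm just a mock" status
        }
        return queue.removeFirst();
    }
}
```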

That’s a fair amount of complexity, and I figured I’d gotten the keying logic wrong somewhere, so that my preparations were ending up somewhere that the real request couldn’t find them.

So I put in a raftload of logging statements–which I probably should have had in from the beginning: after all, it’s for testing, right?–and tried it again.

Turns out that when the real request came in, the generated mock really truly honestly wasn’t prepared for a GET: instead, it was prepared for a POST to that URL instead.

Huh?  A POST?  But I prepared it for a GET, not a POST.  Honest.

I went and looked at the code, and logged the request method several times, from the initial preparation call in the test right up to Apache’s HttpClient, which is the library I was using to contact the server.  The HTTP method was GET all the way.

Me, I was just getting confuseder and confuseder.  How about you?  Have you beaten me to the conclusion?

The problem is that while this system works just fine for POST and PUT requests, there’s a little issue with GET, HEAD, and DELETE requests.

What’s the issue?  Well, GET, HEAD, and DELETE aren’t supposed to have bodies–just headers.  It’s part of the HTTP standard.  So sending an “X-Mockability: prepare” version of any of these three kinds of requests, with a list of canned responses in the body, involves stepping outside the standard a bit.

If you try using curl to send a GET with a body, it’ll be very cross with you.  If you tell SoapUI that you’re preparing to send a GET, it’ll gray out the place where you put in the body data.  If you already have data in there, it’ll disappear.  So I figured it was fair to anticipate some recalcitrance from Apache HttpClient, but this was more than recalcitrance: somehow, somewhere, my GET was turning into a POST.

I did some packet sniffing.  Sure enough, the server was calling the POST operation because it was getting a POST request over the wire.  Everything about that POST request was exactly like the GET request I wanted to send except for the HTTP method itself.

I tried tracing into HttpClient, but there’s a lot of complexity in there, and TDD has pretty much destroyed my skill with a debugger.

I didn’t need all the capabilities of HttpClient anyway, so I tossed it out and tried a naked HttpURLConnection instead. It took me a few minutes to get everything reassembled and the unit tests passing, but once I did, the integration test displayed exactly the same behavior again: my HttpURLConnection.getMethod () was showing GET right up until the call to HttpURLConnection.getOutputStream () (which includes a call to actually send the request to the server), but the packet sniffer showed POST instead of GET.

HttpURLConnection is a little easier to step into than HttpClient, so I stepped in, and finally I found it.  Here is the offending code, in a private method called from getOutputStream ():

    private synchronized OutputStream getOutputStream0() throws IOException {
        // ...
        if (this.method.equals("GET")) {
            this.method = "POST";
        }
        // ...
    }

See that?

Sun Microsystems has decided that if it looks like you want to add a body to a GET request, you probably meant to say POST instead of GET, so it helpfully corrects you behind the scenes.
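
If you’d rather watch it happen than take my word (or a packet sniffer’s) for it, here’s a from-scratch sketch–not my spike code–that stands up a throwaway JDK HttpServer and reports the method that actually arrives over the wire:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class GetBecomesPost {
    // Spins up a tiny local server, sends a "GET" with a body through
    // HttpURLConnection, and returns the method the server actually saw.
    static String methodSeenByServerForGetWithBody() throws Exception {
        final String[] seen = new String[1];
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/tree", exchange -> {
            seen[0] = exchange.getRequestMethod();
            exchange.sendResponseHeaders(200, -1);  // no response body
            exchange.close();
        });
        server.start();
        try {
            URL url = new URL("http://localhost:"
                    + server.getAddress().getPort() + "/tree");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");  // we say GET...
            conn.setDoOutput(true);        // ...but attach a body
            try (OutputStream out = conn.getOutputStream()) {
                out.write("{\"ornaments\":[]}".getBytes());
            }
            conn.getResponseCode();        // force the exchange to complete
        } finally {
            server.stop(0);
        }
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Server saw: " + methodSeenByServerForGetWithBody());
    }
}
```

Run it, and despite the explicit setRequestMethod("GET"), the server reports POST.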


Now, if it doesn’t want you putting a body in a GET request, it could throw an exception.

Does it throw an exception?

No.

It could simply fail to send the request at all.

Does it simply fail to send the request at all?

No.

It could refuse to accept a body for the GET, but give you access to a lower level of operation so that you can put the body in less conveniently and take more direct responsibility for the consequences.

Does it do that?

No.

You can’t even reasonably subclass HttpURLConnection, because the instance is constructed by URL.openConnection () through a complicated service-provider interface of some sort.

This sort of hubris fills me with such rage that I can barely speak.  Sun presumes to decide that it can correct me?!  Even Apple isn’t this arrogant.

So I pulled out HttpURLConnection and used Socket directly, and I’ve got it working, sort of: very inconvenient, but the tests are green.

Unfortunately, Sun isn’t the only place we see practices like this.  JavaScript’s == operator frequently does things you didn’t ask for, and we’ve all experienced Apple’s and Android’s autocorrect mechanism putting words in our…thumbs…for us.

But at least JavaScript has a === operator that behaves, and you can either turn autocorrect off, or double-check your text or tweet to make sure it says what you want before you send it. Sun doesn’t consider it necessary to give you a choice; it simply pre-empts your decision on a whim.

I guess the lesson for me–and perhaps for you too, if you haven’t run into something like this already–is: don’t correct your clients.  Tell them they’re wrong, or refuse to do what they ask, or let them choose lower-level access and increased responsibility; but don’t assume you know better than they do and do something they didn’t ask you to do instead of what they did ask you to do.

They might be me, and I might know where you live.

This Test Was an Ordeal to Write

13 07 2012

I had an interesting time at work today, writing what should have been a simple nothing test.  I thought somebody might be interested in the story.

My client’s application has an entity called a Dashboard.  Before my pass through this particular section of the code, a Dashboard was simply a Name and a set of Modules.

For one reason or another, there is a business-equals operator for Dashboard that returns true if the two Dashboards it’s comparing have the same Name and the same set of Modules, false otherwise.  Simple, standard stuff.

Part of my change involved adding a preview/thumbnail Icon to Dashboard, so that the user could see, in a listing of Dashboards, approximately what the Dashboard looked like before examining it in detail.  I looked at the business-equals method and decided that the new Icon field was irrelevant to it, so I left it alone.

After completing my change, I submitted my code for review, and one of the reviewers flagged the business-equals method.  “Doesn’t check for equality of Icon,” he said.

I explained to him the point above–that Icon is irrelevant to business-equals–and he made a very good argument that I hadn’t considered.

“That makes sense,” he said, “but I didn’t notice it at first, and maybe I’m not unique.  What if somebody comes through this code someday and notices what I noticed, and in the spirit of leaving the code better than he found it, he goes ahead and puts Icon in the business-equals method?  Then we’ll start getting false negatives in production.”

I could see myself in just that role, so I understood exactly where he was coming from.

“Better put a comment in there,” he suggested, “explaining that Icon’s been left out for a reason.”

A comment? I thought to myself.  Comments suck.  I’ll write a test, that’s what I’ll do.

I’ll write a test that uses business equals to compare two Dashboards with exactly the same Name and Module set, but with different Icons, and assert that the two are declared equal.  That way, if somebody adds an Icon equality check later, that test will fail, and upon examining the failure he’ll understand that Icon was left out on purpose.  Maybe somebody has now determined that Icons are relevant to business equals, and the failing test will be deleted, but at least the change won’t be an accidental one that results in unexpected false negatives.

So I did write a test; one that looked a lot like this (C++ code, under the Qt framework).

QString name ("Name");
QPixmap p (5, 10); // tall
QPixmap q (10, 5); // wide
ModuleSet modules;
Dashboard a (name, p, modules);
Dashboard b (name, q, modules);

bool result = (a == b);

EXPECT_TRUE (result); // Google Test assertion

Elementary, right?

Well, not really.  When I ran the test, Qt segfaulted on the QPixmap constructor.  I made various minor changes and scoured Google, but nothing I tried worked: at least in the testing environment, I was unable to construct a QPixmap with dimensions.

I discovered, however, that I could construct a QPixmap with no constructor parameters just fine; that didn’t cause a segfault.  But I couldn’t make p and q both blank QPixmaps like that, because then they’d be the same and the test wouldn’t prove anything: they have to be different.

So I thought, well, why not just mock the QPixmaps?  I have Google Mock here; I’ll just write a mock with an operator==() that always returns false; then I don’t have to worry about creating real QPixmaps.

Problem is, QPixmap doesn’t have an operator==() to mock.  If that notional future developer is going to add code to compare QPixmaps, he’s going to have to do it by comparing properties manually.  Which properties?  Dunno: I’m not him.

Scratch mocking.

Well, I looked over the constructor list for QPixmap and discovered that I could create a QPixmap from a QImage, and I could create a QImage from a pair of dimensions, some binary image data, and a Format specifier (for example, bilevel, grayscale, color).  So I wrote code to create a couple of bilevel 1×1 QImages, one consisting of a single white pixel and the other consisting of a single black pixel, and used those QImages to construct a couple of unequal QPixmaps.

Cool, right?

Nope: the constructor for the first QImage segfaulted.

Cue another round of code shuffling and Googling: no joy.

Well, I’m not licked yet, I figured.  If you create .png files, and put them in a suitable place, and write an XML file describing them and their location, and refer to the XML file from the proper point in the build process, then you can create QPixmaps with strings that designate the handles of those .png files, and they’ll be loaded for you.  I had done this in a number of places already in the production code, so I knew it worked.

It was a long, arduous process, though, to set all that up.

But I’m no slacker, so I put it all together, corrected a few oversights, and ran the test.


Success?  Really?  I was suspicious.  I ran it in the debugger and stepped through that QPixmap creation.

Turns out all that machinery didn’t work; but instead of segfaulting or throwing an exception, the QPixmap constructors created blank QPixmaps–just like the parameterless version of the constructor would have–that were identical to each other; so once more, the test didn’t prove anything.

More Googling and spiking and whining to my neighbor.

This time, my neighbor discovered that there’s an object called QApplication that does a lot of global initialization in its constructor.  You create a QApplication object somewhere (doesn’t seem to matter where), and suddenly a bunch of things work that didn’t work before.

Okay, fine, so my neighbor wrote a little spike program that created a QPixmap without a QApplication.  Bam: segfault.  He modified the spike to create a QApplication before creating the QPixmap, and presto: worked just fine.

So that was my problem, I figured.  I put code to create a QApplication in the test method and ran the test.

Segfault on the constructor for QApplication.

No big deal: I moved the creation of the QApplication to the SetUp() method of the test class.

Segfault on the constructor for QApplication.

Darn.  I moved it to the initializer list for the test class’s constructor.

Segfault on the constructor for QApplication.

I moved it out of the file, into the main() function that ran the tests.

Segfault on the constructor for QApplication.

By now it was after lunch, and the day was on its way to being spent.

But I wasn’t ready to give up.  This is C++, I said to myself: it has to be testable.

The business equals never touches Icon at all; what I needed was for any code that did touch Icon to cause obvious trouble.  That gave me an idea.

So I created two Dashboards with blank (that is, equal) Icons; then I did this to each of them:

memset (&dashboard.Icon, 0xFFFFFFFF, sizeof (dashboard.Icon));

See that?  The area of memory that contains the Icon field of dashboard, I’m overwriting with all 1 bits.  Then I verified that calling just about any method on that corrupted Icon field would produce a segfault.

There’s my test, right?  Those corrupted Icons don’t bother me, since I’m not bothering them, but if anybody changes Dashboard::operator==() to access the Icon field, that test will segfault.  Segfault isn’t nearly as nice as a formal assertion failure, but it’s a whole heck of a lot better than false negatives in production.

Well, it turned out to be close to working.  Problem was, once I was done with those corrupted QPixmaps and terminated the test, their destructors were called, and–of course, since they were completely corrupt–the destructors segfaulted.

Okay, fine, so I put a couple of calls at the end of my test to an outboard method that did this:

QPixmap exemplar;
memcpy (&dashboard.Icon, &exemplar, sizeof (dashboard.Icon));

Write the guts of a real Icon back over all the 1 bits when we’re finished with the corrupted Icon, and the destructor works just fine.

My reviewer didn’t like it, though: he pointed out that I was restoring the QPixmap from a completely different object, and while that might work okay right now on my machine with this version of Qt and this compiler, modifying any of those variables might change that.

So, okay, fine, I modified things so that corrupting each dashboard copied its Icon contents to a buffer first, from which they were then restored just before calling the destructor.  Still works great.

So: this is obviously not the right way to write the test, because Qt shouldn’t be segfaulting in my face at every turn.  Something’s wrong.

But it works!  The code is covered, and if somebody reflexively adds Icon to Dashboard’s business equals, a test will show him that it doesn’t belong there.

I felt like Qt had been flipping me the bird all day, receding tauntingly before me as I pursued it; but finally I was able to grab it by the throat and beat it into bloody submission anyway.  (Does that sound a little hostile?  Great: it felt a little hostile when I did it.)  I’m ashamed of the test, but proud of the victory.

So I’ve Got This AVR Emulator Partway Done…

2 07 2012

I decided to go with writing the emulator, for several reasons:

  1. It puts off the point at which I’ll start having to get serious about hardware.  Yes, I want to get into hardware, really I do, but it scares me a little: screw up software and you get an exception or a segfault; screw up hardware and you get smoke, and have to replace parts and maybe recharge the fire extinguisher.
  2. It’s going to teach me an awful lot about the microcontroller I’ll be using, if I have to write software to simulate it.
  3. It’s an excuse to write code in Scala rather than in C.

Whoa–what was that last point?  Scala?  Why would I want to write a low-level bit-twiddler in Scala?  Why not write it in a low-level language like C or C++?

If that’s your question, go read some stuff about Scala.  I can’t imagine anything that I’d rather write in C than in Scala.

That said, I should point out that Scala’s support for unsigned values is–well, nonexistent.  Everything’s signed in Scala, and treating 0x94E3 as a positive number can get a little hairy if you put it in a Short.  So I wrote an UnsignedShort class to deal with that, including a couple of implicits to convert back and forth from Int, and then an UnsignedByte class that also ended up holding a bunch of status-flag logic for addition, subtraction, and exclusive-or.  (Maybe that should live somewhere else, somehow.)
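The underlying problem is the JVM’s, not just Scala’s, so here’s the core idea in Java terms–a hedged sketch of what my UnsignedShort and UnsignedByte are papering over, with names and details that are illustrative rather than my actual classes.  The trick is always the same: mask up into a wider type before treating the bits as a magnitude.

```java
public class UnsignedDemo {
    // Treat a (signed) byte as 0..255 by masking into an int.
    static int unsigned(byte b) { return b & 0xFF; }

    // Treat a (signed) short as 0..65535 by masking into an int.
    static int unsigned(short s) { return s & 0xFFFF; }

    public static void main(String[] args) {
        short opcode = (short) 0x94E3;         // an AVR opcode word
        System.out.println(opcode);            // the signed trap: -27421
        System.out.println(unsigned(opcode));  // what we meant: 38115
    }
}
```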

Addition, subtraction, and exclusive-or?  Why just those?  Why no others?

Well, because of the way I’m proceeding.

The most important thing about an emulator, of course, is that it emulates: that is, that it acts exactly as the real part does, at least down to some small epsilon that is unavoidable because it is not in fact hardware.

So the first thing I did was to write a very simple C program in the trashy Arduino IDE that did nothing but increment a well-known memory location (0x500, if I remember correctly) by 1. I used the IDE to compile and upload the file to my ATmega2560 board, which thereupon–I assume–executed it. (Hard to tell: all it did was increment a memory location.)

Then I located the Intel .HEX format file that the IDE had sent over the wire, copied it into my project, and wrote a test to load that file into my as-yet-nonexistent emulator, set location 0x0500 to 42, run the loaded code in the emulator, and check that location 0x0500 now contained 43.  Simple, right?
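Loading that .HEX file is mostly hex-pair bookkeeping.  Each record is a colon, a byte count, a 16-bit address, a record type, the data bytes, and a two’s-complement checksum.  Here’s a hedged sketch of parsing one data record–the record below is fabricated for illustration (two data bytes, 42 and 43, at address 0x0500, in honor of the increment test), not a line from my actual file:

```java
public class HexRecord {
    // Parse one Intel HEX record like ":020500002A2BA4" into its data bytes.
    static int[] parse(String line) {
        if (line.charAt(0) != ':') throw new IllegalArgumentException("no colon");
        int count = Integer.parseInt(line.substring(1, 3), 16);   // byte count
        int address = Integer.parseInt(line.substring(3, 7), 16); // load address
        int type = Integer.parseInt(line.substring(7, 9), 16);    // 00 = data
        int[] data = new int[count];
        int sum = count + (address >> 8) + (address & 0xFF) + type;
        for (int i = 0; i < count; i++) {
            data[i] = Integer.parseInt(line.substring(9 + 2 * i, 11 + 2 * i), 16);
            sum += data[i];
        }
        int checksum = Integer.parseInt(
                line.substring(9 + 2 * count, 11 + 2 * count), 16);
        if (((sum + checksum) & 0xFF) != 0)       // all bytes must sum to 0 mod 256
            throw new IllegalArgumentException("bad checksum");
        return data; // the loader copies these into flash at 'address'
    }

    public static void main(String[] args) {
        int[] data = parse(":020500002A2BA4");
        System.out.println(data.length + " bytes: " + data[0] + ", " + data[1]);
    }
}
```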

Well, that test has been failing for a couple of weeks straight, now.  My mode of operation has been to run the test, look at the message (“Illegal instruction 0x940C at 0x0000FD”), use the AVR instruction set manual to figure out what instruction 0x940C is (it’s a JMP instruction), and implement that instruction.  Then I run the test again, and it works for every instruction it understands and blows up at the first one it doesn’t.  So I implement that one, and so forth.
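The decode step that produces those “Illegal instruction” messages is just mask-and-compare on the 16-bit opcode word.  Per the AVR instruction set manual, JMP’s first word is 1001 010k kkkk 110k and CALL’s is 1001 010k kkkk 111k, so both match under mask 0xFE0E.  A hedged sketch of the dispatch shape–my actual emulator’s table is in Scala, and knows more than two instructions:

```java
public class Decode {
    // Identify an AVR opcode word by mask-and-compare.
    // Only two patterns here, purely to show the shape of the dispatch.
    static String decode(int word) {
        if ((word & 0xFE0E) == 0x940C) return "JMP";
        if ((word & 0xFE0E) == 0x940E) return "CALL";
        return String.format("Illegal instruction 0x%04X", word);
    }

    public static void main(String[] args) {
        System.out.println(decode(0x940C)); // the first one the test tripped over
        System.out.println(decode(0x940E));
        System.out.println(decode(0xFFFF)); // unknown to this toy decoder
    }
}
```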

Along the way, of course, there’s opportunity for all sorts of emergent design and refactoring.

At the moment, for instance, I’m stuck on the CALL instruction.  I have the CALL instruction tested and working just fine, but the story test fails because the stack pointer is zero–so pushing something onto the stack decrements it into negative numbers, which is a problem.  Why is the stack pointer zero?  Well, because the AVR architecture doesn’t have a specific instruction to load the stack pointer with an initial value, but it does expose the SP register as a sequence of input/output ports.  Write a byte to the correct output port, and it’ll appear as part of the two-byte stack pointer.

So I have the OUT instruction implemented; but until this morning, all the available input and output ports were just abstract concepts, unless you specifically connected InputRecordings or OutputRecorders to them.  The instructions to initialize the stack pointer are there in the .HEX file, but the numbers they write to the stack-pointer ports are vanishing into the bit bucket.  That connection between the I/O ports and the stack pointer (and an earlier one, between I/O port 0x3F and the status register) is more than an abstract concept, so I’m working on the idea of a Peripheral: something with special functionality that can be listed in a configuration file and hooked up during initialization to provide services like that.
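Reduced to a sketch, the Peripheral idea looks like this.  The port numbers are real (on the ATmega2560, SPL lives at I/O address 0x3D and SPH at 0x3E, and RAMEND is 0x21FF); everything else–names, the map-of-handlers wiring–is illustrative Java, not my Scala implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class StackPointerPeripheral {
    static final int SPL = 0x3D, SPH = 0x3E; // ATmega2560 I/O addresses
    int stackPointer;                        // the two-byte SP register

    // Handlers keyed by port number; initialization hooks these up.
    final Map<Integer, java.util.function.IntConsumer> ports = new HashMap<>();

    StackPointerPeripheral() {
        ports.put(SPL, b -> stackPointer = (stackPointer & 0xFF00) | (b & 0xFF));
        ports.put(SPH, b -> stackPointer = (stackPointer & 0x00FF) | ((b & 0xFF) << 8));
    }

    // What the OUT instruction calls: route the byte to whatever is listening,
    // or let it fall into the bit bucket if nothing is.
    void out(int port, int value) {
        ports.getOrDefault(port, v -> { /* bit bucket */ }).accept(value);
    }

    public static void main(String[] args) {
        StackPointerPeripheral p = new StackPointerPeripheral();
        p.out(SPH, 0x21);  // the usual startup sequence: SP = RAMEND = 0x21FF
        p.out(SPL, 0xFF);
        System.out.printf("SP = 0x%04X%n", p.stackPointer);
    }
}
```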

Eventually, though, I’ll get the simple add application running in the emulator, by implementing all the instructions it needs and thereby all the infrastructure elements they need.

Then I plan a period for refactoring and projectization (right now the only way to build the project is in Eclipse, and you’ll find yourself missing the ScalaTest jars if you try).

After that, I want to write a fairly simple framework in C, to run on the Arduino, that will communicate with running tests on my desktop machine over the USB cable.  This framework will allow me to do two things from my tests: submit a few bytes of code to the Arduino and have it execute them; and ask questions about registers and I/O ports and SRAM and flash, and get answers to them.

Once I get that working on the Arduino (it’s going to be interesting without being able to unit-test it), I’ll grab the .HEX file for it, load it into my emulator, and go back to implementing more instructions until it runs.

After that, I’ll be able to write comparison tests for all the instructions I’ve implemented that far: use an instruction in several ways on the Arduino and make sure it operates as expected, then connect to the emulator instead of the Arduino and run the same test again.  With judiciously chosen tests, that ought to ensure to whatever small value of epsilon I’m willing to drive it to, that the emulator matches the part.

Anyway, that’s what I’ve been doing with my spare time recently, in the absence of insufficiently-Agile people to pester.