On Design for Testability

At almost every conference, event, training, or consulting engagement, someone asks for my opinion on the whole design for testability thing. I’m not quite sure why I haven’t blogged on this topic, especially back when a lot of the other bloggers were weighing in, but better late than never.

Before getting into that, I want to start with a slightly broader scope of discussion.

You see, I get asked about “best practices” on all sorts of things. I try not to be the kind of consultant who responds with “it depends”, but the context of the question often makes the answer irrelevant. And the unspoken context of a best-practice question is:

Given infinite time and budget

The biggest problem I see with well-intentioned, best-practice-following developers and architects is that they don’t ask the question “is this the right thing for us to be focusing on right now?” Understandably, that is a difficult question to answer, but it needs to be asked, since you don’t have infinite time or budget to do everything according to best practices (assuming those even exist).

About testing

The biggest issue I have with the “design for testability” topic is the extremely narrow view it takes of the word “testability”, usually in the form of additional code, written by a developer, that invokes the production code of the system, also known as “unit tests”.

There are many different kinds of testing (unit, integration, functional, load, performance, exploratory, etc.), some of which may be automated and others not. Should we not discuss what “design for testability” means for more than just unit testing?

And what’s the point of testing anyway?

It’s not to find bugs.

Research has shown that testing (of all kinds) is not the most effective way of finding bugs. I don’t have the reference handy, but I’m pretty sure it’s from Alistair Cockburn’s work. Code reviews are (on average) about 60% more effective.

Don’t get me wrong – testing can provide indications that the software has bugs in it, but not necessarily where in the code those bugs are.

The purpose of testing is to provide quantitative and qualitative information about the system that can help various stakeholders in their decision-making processes. The relevance of that information indicates the quality of the testing. Here are some examples:

- The system supports 100 concurrent users, with the expected user-type distribution (X% role A, Y% role B, etc.), performing expected use-case distributions and collaboration scenarios.
- Time to proficiency for new users in role A is expected to be 3 days.
- Alternate #2 of use case #12 fails on step #3.

As you can see, the relevance of the above information depends on what decisions the various stakeholders need to make. The bullet on load can help us decide whether more machines are needed or whether developers need to tune the system’s performance. The bullet on time to proficiency can help us decide whether a larger investment in usability is required. Information like the last bullet can be used in conjunction with the first two to decide on the timing and type of a release.

The timeliness of this relevant information is critical to the success of a project.

Choosing which and how much of the various testing activities to perform when is something that needs to be revisited several times throughout the lifetime of a project, taking into account the current risks (threats and probabilities) and time and resource investment to mitigate them.

Let me reiterate – we’re not going to have enough time to do everything.

On iterations

If the only people in your organization doing iterations are your developers, you’re not agile.

In order to capitalize on the information that testers are providing, you need them in your iterations.

The same goes for the other roles involved in the project – business analysts, DBAs, sysadmins, etc.

I know that 99% of organizations aren’t structured in a way to do this.

I never said doing this would be easy.

On design

Figuring out what kind of design to do, how much, and when is just as important, and just as hard. Design for testability is one part of that, but not the only one, nor necessarily the most important one at any point in time.

Within that design for testability topic is the “design for unit-testing” sub-topic which seems to be the popular one. Before getting into the design aspects of it, let’s take a closer look at the unit-testing side of things.

On unit-testing

The assumption is that having more unit tests will lead to a code base with fewer bugs, thus requiring less time to get the system into production, which will pay back the time it took to write those unit tests in the first place.

In practice, what tends to happen is that as development progresses, testing code breaks as the structure of the production code changes. Now one of two things happens: the testing code is either removed or rewritten. In either case, we didn’t get the return on investment we expected from the original testing code. Unfortunately, it is rare for the relevant people in the organization to understand why, so the same situation repeats itself over and over again.

Those projects would have been better off without unit testing, though the organization as a whole might have used those experiences to learn and improve. It’s been my experience that if the organization wasn’t conscious enough in the context of the project to notice the situation, it is unlikely to do so at higher levels.

On fragile unit tests

The reason that a unit test ends up being rewritten (or removed) is that its code was coupled to the production code in such a way that it broke when the production code changed. This tendency to break (fragility) is a critical property of a unit test. A fragile unit test will slow down a developer doing work on some existing code – it actually makes the system less maintainable.

For unit-test code to be stable (not fragile), it needs to be coupled to stable properties of the production code. Whether the production code is designed in such a way that it has stable properties is a design question. Is it a unit? If not, you will not be able to write a unit test against it.
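To make the fragile/stable distinction concrete, here is a minimal Python sketch. The `ShoppingCart` and `RefactoredCart` classes and both test functions are hypothetical illustrations, not taken from any real project. The fragile test inspects the cart’s internal storage, while the stable test only checks the total the cart reports. After an internal refactoring that keeps behavior identical, only the fragile test breaks.

```python
class ShoppingCart:
    """Hypothetical production code: the total is its stable, observable property."""

    def __init__(self):
        self._lines = {}  # sku -> (unit_price, quantity); an internal detail

    def add(self, sku, unit_price, quantity=1):
        price, qty = self._lines.get(sku, (unit_price, 0))
        self._lines[sku] = (price, qty + quantity)

    def total(self):
        return sum(price * qty for price, qty in self._lines.values())


class RefactoredCart:
    """Same observable behavior, different internal structure."""

    def __init__(self):
        self._lines = []  # one (sku, unit_price) entry per unit added

    def add(self, sku, unit_price, quantity=1):
        self._lines.extend([(sku, unit_price)] * quantity)

    def total(self):
        return sum(price for _, price in self._lines)


def fragile_test(cart_cls):
    """Coupled to internal structure: fails as soon as _lines changes shape."""
    cart = cart_cls()
    cart.add("apple", 3, 2)
    return cart._lines == {"apple": (3, 2)}


def stable_test(cart_cls):
    """Coupled only to observable behavior: survives internal refactoring."""
    cart = cart_cls()
    cart.add("apple", 3, 2)
    cart.add("apple", 3, 1)
    return cart.total() == 9


for cls in (ShoppingCart, RefactoredCart):
    print(cls.__name__, fragile_test(cls), stable_test(cls))
# prints:
# ShoppingCart True True
# RefactoredCart False True
```

The refactoring changed nothing a caller could observe, yet the fragile test would now have to be rewritten or deleted, which is exactly the lost return on investment described above.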

And anyway, who said that every class is a unit, or should be a unit? Domain models (when done right) are good examples of a unit, yet the classes that make them up may not be units. Unit testing should only be attempted with things that are units.
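One way to read the domain-model point is sketched below in Python. `Order`, `_OrderLine`, and the per-SKU quantity limit are all hypothetical, invented for illustration. The model as a whole is treated as the unit: the test drives it only through `Order`’s public methods, and `_OrderLine` gets no test of its own, so it can be reshaped or removed without touching the test.

```python
from dataclasses import dataclass


@dataclass
class _OrderLine:
    """Internal part of the model; deliberately not unit-tested on its own."""
    sku: str
    quantity: int
    unit_price: int


class Order:
    """Hypothetical domain model whose public methods form the unit's boundary."""

    MAX_QUANTITY_PER_SKU = 10  # hypothetical business rule

    def __init__(self):
        self._lines = []

    def add_item(self, sku, quantity, unit_price):
        existing = next((line for line in self._lines if line.sku == sku), None)
        current = existing.quantity if existing else 0
        if current + quantity > self.MAX_QUANTITY_PER_SKU:
            raise ValueError(f"at most {self.MAX_QUANTITY_PER_SKU} of {sku}")
        if existing:
            existing.quantity += quantity
        else:
            self._lines.append(_OrderLine(sku, quantity, unit_price))

    def total(self):
        return sum(line.quantity * line.unit_price for line in self._lines)


def test_order_as_unit():
    """Exercises the business rule and the total through Order only."""
    order = Order()
    order.add_item("pen", 4, 2)
    order.add_item("pen", 6, 2)
    assert order.total() == 20
    try:
        order.add_item("pen", 1, 2)  # would make 11 pens, over the limit
    except ValueError:
        pass
    else:
        raise AssertionError("quantity limit was not enforced")


test_order_as_unit()
```

Whether `_OrderLine` is a separate class, a tuple, or folded into `Order` is invisible to this test; only the behavior of the model as a whole is pinned down.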

I think too much weight is put on whether a dependency of a class is a concrete or interface type, and not nearly enough on the nature of the dependency. I wouldn’t blame the hammer for pounding my thumb, and by the same token I think that blame should not be directed towards tools like those from TypeMock.

On tools

There is so much more depth to both design and testability that needs to be more broadly understood. No tool has yet been created to handle either design or testing in such a way that humans can give up responsibility for the outcome.

Over the years I’ve noticed that tools are most significant when used by skilled practitioners, which makes sense in retrospect. Giving a novice carpenter a laser-guided saw probably won’t significantly change the outcome of their work. Ultimately, the skilled practitioners are the ones that create tools – not the novices. And no tool, no matter how advanced, will make a novice perform at levels like the skilled practitioner.

In the case of a project too big for a single skilled practitioner to complete in the time required (or at all), the balance of importance shifts away from tools to the project management topics described above.

In summary

I hope that this post has shed some light on the context in which decisions about testing need to be made. Design is one activity that can support certain kinds of testing, but not the only one, or even the most important one for the type of testing needed at that point in the project.

Design is hard. Project management is hard. Testing is hard.

Getting the right mix of people that together have enough experience and skills in these activities isn’t easy.

Don’t expect that sprinkling some interfaces in your code base will be enough. That doesn’t count for much in the way of design, just as writing code in a testing namespace doesn’t count for much in the way of testability.

Looking forward to hearing your comments.