Know your system

26 Oct

Is your system good enough?  Are the tests good enough?  Both of these questions can be answered quantitatively, and the first step is to understand the system. The process starts with listing the main functions of the system, determining how often each function is called, and estimating how severe a failure of each function would be.

Consider a simple web site that enables event photographers to upload and sell downloads of photographs to customers.  This system will need to set up an account for a new photographer, allow the photographer to upload photos, show thumbnails and proofs to customers, and allow the customer to purchase and download individual photos.

The next step is to determine the natural units for the system. In this case, the objective is to take orders for pictures. The system does other things – it receives pictures, displays pictures, and registers photographers, but it does all of these things in support of the sale. Because the objective of the site is to sell pictures, the natural units are orders.

With natural units established, we can determine the frequencies of various calls. The easiest to determine is checkout. It will be called once per order. For other functions, we must use the resources at our disposal to make reasonable estimates.

Suppose the business analyst forecasts an average of 3 images per order. Then, we know that there will be 3 downloads per order, and there will be at least 3 occurrences of “add to cart” for each order. We guess 4 occurrences for “add to cart” because the typical user picks an extra picture and later discards it.

Carrying on through the rest of the list in this manner, we make assumptions: the photographer takes 500 pictures per event and sells 6 orders after 30 people browse the pictures. 20 pictures show on a page of thumbnails. A typical photographer will sell 1000 orders through the system during his photography stint.

With this list, we can now compute the relative frequency of use of each function. Normalize the “calls per order” column: sum it, then divide each value by that sum.

Function                Calls per order   Frequency %
Register photographer   1/1000            0.002
Upload images           1/6               0.37
Show thumbnails         25                55
Show proof              12                27
Add to cart             4                 9
Checkout                1                 2
Download photo          3                 7
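
To check the arithmetic, here is a minimal sketch of the normalization in Python (the language choice is mine; the calls-per-order values come straight from the table above):

```python
# Estimated calls per order for each function, from the table above.
calls_per_order = {
    "Register photographer": 1 / 1000,
    "Upload images": 1 / 6,
    "Show thumbnails": 25,
    "Show proof": 12,
    "Add to cart": 4,
    "Checkout": 1,
    "Download photo": 3,
}

# Normalize: divide each value by the column total to get a relative frequency.
total = sum(calls_per_order.values())
for function, calls in calls_per_order.items():
    print(f"{function:22} {calls / total:8.3%}")
```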

We have one more round of estimating and arithmetic. How severe is a failure? Again, these will be relative numbers. Suppose a failure to take an order is of severity 1. We estimate the severity of other failures based on that.

We could hem and haw about the mode of failure – taking an order could charge the customer but not deliver the goods, or it might just get stuck leaving the customer unable to complete the purchase. For this purpose, it is better just to say “it fails” and consider the overall consequences of a failure.

A failure to register a photographer is particularly severe because it causes many lost orders over time: a registration failure that drives the photographer away could cost his entire stint of 1000 orders. Not every photographer who has a problem will abandon the site, though, so the true severity is probably lower. Still, we will use the higher number, 1000, to be conservative, because photographer registration gives the user a critical first impression.

Function                Calls per order   Frequency %   Severity   Freq × Severity   Relative impact %
Register photographer   1/1000            0.002         1000       0.02              6
Upload images           1/6               0.37          0.5        0.002             0.5
Show thumbnails         25                55            0.25       0.14              35
Show proof              12                27            0.5        0.13              34
Add to cart             4                 9             0.5        0.04              11
Checkout                1                 2             1          0.02              6
Download photo          3                 7             0.5        0.03              8

(You may notice that the percent columns don’t add up to exactly 100. That’s rounding error. With this sort of number, we really only have 1 significant figure anyway, so try to relax.)
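
Here is the same computation as a Python sketch, with the severities from the table. Note that the frequency enters the product as a fraction, not a percentage:

```python
# Calls per order and failure severity (relative to checkout = 1),
# both taken from the tables above.
functions = {
    # name                    (calls/order, severity)
    "Register photographer": (1 / 1000, 1000),
    "Upload images":         (1 / 6,    0.5),
    "Show thumbnails":       (25,       0.25),
    "Show proof":            (12,       0.5),
    "Add to cart":           (4,        0.5),
    "Checkout":              (1,        1),
    "Download photo":        (3,        0.5),
}

total_calls = sum(calls for calls, _ in functions.values())

# Weight each function's frequency (as a fraction) by its severity...
weights = {name: (calls / total_calls) * severity
           for name, (calls, severity) in functions.items()}

# ...then renormalize so the relative impacts sum to 100%.
total_weight = sum(weights.values())
for name, weight in weights.items():
    print(f"{name:22} {weight / total_weight:6.1%}")
```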

The next task is to write tests for the functions in proportion to the last column, with at least one test for each function.
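
For example, with a budget of 100 tests (the budget and the rounding rule here are my assumptions, not part of the method), the allocation might be sketched like this:

```python
# Relative impact of each function, from the last column of the table.
impact = {
    "Register photographer": 0.06,
    "Upload images": 0.005,
    "Show thumbnails": 0.35,
    "Show proof": 0.34,
    "Add to cart": 0.11,
    "Checkout": 0.06,
    "Download photo": 0.08,
}

budget = 100  # total number of tests we can afford (an assumed figure)

# Allocate tests in proportion to impact, but never fewer than one per function.
allocation = {name: max(1, round(share * budget)) for name, share in impact.items()}
print(allocation)  # "Show thumbnails" gets 35 tests; "Upload images" still gets 1
```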

Suppose you already have some tests. See how they stack up against this measure. Are you testing the most important parts? Are you testing in proportion to the severity of failures?

This table gives us a start at understanding how much to test, and it will help us to characterize the severity of failures we encounter during testing.

Reference: Musa, John D. Software Reliability Engineering: More Reliable Software Faster and Cheaper, 2nd ed. Bloomington, IN: AuthorHouse, 2004.

Types of software testing

24 Oct

A physician, a civil engineer, and a computer scientist were arguing about what was the oldest profession in the world.

The physician remarked, “Well, in the Bible, it says that God created Eve from a rib taken out of Adam. This clearly required surgery, and so I can rightly claim that mine is the oldest profession in the world.”

The civil engineer interrupted, and said, “But even earlier in the book of Genesis, it states that God created the order of the heavens and the earth from out of the chaos. This was the first and certainly the most spectacular application of civil engineering. Therefore, fair doctor, you are wrong: mine is the oldest profession in the world.”

The computer scientist leaned back in her chair, smiled, and then said confidently, “Ah, but who do you think created the chaos?” [1]

My wife left me this joke in my lunchbox one day recently.  It makes me laugh and cringe in equal parts.  Writing software is the art of describing, to the most pedantic of machines, exactly what we want done, so we programmers are sure to make errors.  Errors are manageable, though, if we know how to look for them before they become problems.

Tests can help us to validate the code we’ve written.  By writing good, automated tests and using them appropriately, we can quantify how good our code is and stop testing when the code is good enough.  In this blog, we’ll cover all aspects of software testing.

Tests come in many different shapes and sizes.  While there are many test strategies, let’s begin by focusing on two with dramatically different aims: unit tests and acceptance tests.

Unit testing is low-level white box testing.

Unit tests have been popularized by the excellent JUnit library and its variants in several other languages.  A unit test focuses on a single, isolatable portion of code and attempts to verify its functionality.  Unit tests are typically “white box” tests, written with knowledge of the inner workings of the software.  A failure at the unit test level can expose the exact cause of the problem.  Unit tests can be cumbersome to maintain because they are so closely tied to the code under test, but they also provide excellent validation that a refactoring succeeded.  Powerful though it is, unit testing is just the beginning of the arsenal of tools for uncovering software faults.  Each unit test adds only a little code coverage because it focuses on one small piece of the system.
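
For example, a unit test for the site’s cart logic might look like the following sketch. The Cart class and its methods are hypothetical, invented here for illustration; Python’s unittest module is one of the JUnit variants mentioned above:

```python
import unittest


class Cart:
    """Minimal shopping cart, stubbed out for the sake of the example."""

    def __init__(self):
        self._photo_ids = []

    def add(self, photo_id):
        if photo_id in self._photo_ids:
            raise ValueError("photo already in cart")
        self._photo_ids.append(photo_id)

    def count(self):
        return len(self._photo_ids)


class CartTest(unittest.TestCase):
    """White-box tests for one small, isolatable unit: the Cart."""

    def test_add_increases_count(self):
        cart = Cart()
        cart.add("photo-42")
        self.assertEqual(cart.count(), 1)

    def test_duplicate_add_is_rejected(self):
        cart = Cart()
        cart.add("photo-42")
        with self.assertRaises(ValueError):
            cart.add("photo-42")


if __name__ == "__main__":
    unittest.main()
```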

Acceptance testing is high-level, black-box testing.

On the opposite corner of the testing spectrum is acceptance testing.  Acceptance tests are written by the customer to verify that everything works as intended before accepting delivery of the code. Acceptance tests are black-box tests, written without knowledge of the underlying code.  Because acceptance tests are written at a high level, each one covers a lot of code, but they are not very useful for isolating a problem.  If a customer gives you a failing acceptance test, odds are good that you will want to write a lower-level test to help identify the root cause of the failure.

Software developers should never rely on the customer to find their bugs.  Acceptance tests can be thorough, but in practice they are often cursory and not necessarily repeatable.

System testing is the in-house analogue of acceptance testing. The software producer writes system tests to validate that the software performs adequately for release. For a web application, that might mean using Selenium to drive a web browser in a repeatable fashion.  System tests exercise the typical flow of the program as well as exceptional flows, and they are typically black-box tests.
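
A Selenium-based system test might look something like this sketch; the URL and element IDs are placeholders, not a real site:

```python
import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By


class CheckoutSystemTest(unittest.TestCase):
    """Black-box test driving the site through a real browser."""

    def setUp(self):
        self.driver = webdriver.Firefox()

    def tearDown(self):
        self.driver.quit()

    def test_add_to_cart_shows_item(self):
        driver = self.driver
        # The URL and element IDs below are hypothetical placeholders.
        driver.get("https://example.com/event/1234/thumbnails")
        driver.find_element(By.ID, "photo-42-add-to-cart").click()
        cart_count = driver.find_element(By.ID, "cart-count").text
        self.assertEqual(cart_count, "1")


if __name__ == "__main__":
    unittest.main()
```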

A good test suite does not rely on only one kind of testing.  While it would be ideal for each defect in the code to cause exactly one test failure, that goal is, for most purposes, unrealistic.  It is more important for a suite of tests to detect all the failures than to count each failure precisely.  A good test suite uses a variety of test types at different levels to expose possible problems and to exercise the entire system.