May 21, 2008 at
2:17 pm —
Uncategorized
A code coverage tool watches your program executing and reports which lines of code were executed and which were not. Testers are sometimes tempted to use code coverage tools to assess test coverage. And some testers are tempted to set code coverage goals. If you feel these temptations, be careful how you interpret the code coverage tool’s reports.
You can be sure that if a line of code was not executed during a test run, then it certainly was not tested by that run.
But what of a line of code that was executed by the tests? Unfortunately, you can’t tell, just from the fact that it was executed, whether the line was tested.
Elisabeth Hendrickson and I developed a workshop on unit testing. The work of the workshop centered on a small application we had written, a rudimentary HTTP server. Our initial code had exactly thirteen tests, just enough to illustrate a few basic tools and techniques that we’d be teaching in the workshop.
When we ran a code coverage tool called NCover while our test suite ran, it reported that our thirteen tests executed 65 percent of the server’s code. Does that mean that we achieved 65 percent test coverage? Not on your life. Our thirteen tests barely scratched the surface of the responsibilities of even our very simple HTTP server.
If our tests tested so little, why was code coverage so high? Because though our suite tested little of the code, it executed a lot of it.
For example, one of our tests sent a GET request to the server and evaluated the response. As the server executed the request, it called a logging function to log information about the request and its response to a file. The logging function was minimal, and did not deal with any of the zillions of possible file system errors it might encounter. It expected the happy path, and nothing but the happy path. So this one test, which did not in any way assess the logging feature, executed all of the logging code. The logging code was 100 percent executed and zero percent tested.
Code coverage does not imply test coverage. If you use code coverage tools to help assess your test coverage, keep that in mind.
March 15, 2006 at
6:00 pm —
Uncategorized
information, n. Data that reduces uncertainty.
I once defined information as “that which informs.” I was unsatisfied with this definition, because I didn’t have a good definition for “inform”. The dictionary definitions didn’t help (e.g. “to impart information” or “to impart facts”), so I turned to the web to seek other people’s ideas.
The definition I found most useful is Claude Shannon’s: Information is that which reduces uncertainty. I’ve refined Shannon’s definition by replacing the fuzzy word “that” with the sharper word “data,” which I define as descriptions of events or conditions.
For a while I was worried that this definition, focused so specifically on reducing uncertainty, was too limiting. Why uncertainty? And why reducing uncertainty? What about data that increases uncertainty? Suppose I discover a datum that invalidates a key “fact” that I thought I knew, and therefore leaves me uncertain about many other “facts.” It seems to me that that would be information, too.
I was tempted to substitute the more general word “alters,” and to find some variable more general than “uncertainty” as the central variable that is altered by information. But I’ve found that the limitation—the intense focus on reducing uncertainty—turns out to be helpful when I’m seeking information. In most cases where I’m gathering information my goal is to reduce my uncertainty. Of course, I may end up being informed by data I wasn’t seeking, and I may be informed in ways that increase my uncertainty. But when I’m seeking information, I’m almost always trying to reduce my uncertainty about something. This definition reminds me to ask myself which uncertainties are most important to me, and which I’m most uncertain about. Then I can focus more productively on gathering the data that may reduce my uncertainty.
I’ve made good use of this definition in a number of contexts. I’ve found it especially helpful in talking about testing, because a central purpose of testing is to deliver information, specifically information that reduces stakeholders’ uncertainty about quality.
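Shannon made “uncertainty” measurable as entropy, counted in bits. A small Python sketch shows how a single test result can be read as information in exactly this sense. The scenario (a defect that could be in any of four modules, and a test result that rules out two of them) is made up for illustration, as is the helper name entropy_bits.

```python
import math

def entropy_bits(probabilities):
    # Shannon entropy in bits: the sum of p * log2(1/p) over outcomes with p > 0.
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# Hypothetical scenario: before testing, a known defect is equally likely
# to be in any one of four modules.
before = [0.25, 0.25, 0.25, 0.25]

# Hypothetical test result that rules out two modules; the other two remain
# equally likely.
after = [0.5, 0.5]

print(entropy_bits(before))                         # 2.0
print(entropy_bits(after))                          # 1.0
print(entropy_bits(before) - entropy_bits(after))   # 1.0 bit of information
```

The test result halves the set of places the defect could be, which reduces the uncertainty by one bit.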
The definition also helps me when I’m estimating. It invites me to assess my uncertainty about the variables that affect the thing I’m estimating. That assessment helps me to focus my search for data.
Another definition I like is Peter Drucker’s: Information is data endowed with relevance or purpose. Drucker’s definition emphasizes purpose and the relevance of data to our purposes. I’ve seen numerous metrics programs flounder because they started by collecting data rather than by clarifying their purpose for measuring. They end up collecting lots of data, and then not knowing how to make sense of it.
April 29, 2004 at
12:40 am —
Uncategorized
A few days ago I was poking around the web for ideas about how to test software, and I saw Scott Ambler’s article about “Full Life Cycle Object-Oriented Testing (FLOOT).” The article includes a list of common testing techniques. As I looked over the list, I noticed that there is a small set of key dimensions that distinguish one testing technique from another. For example, unit testing and system testing differ in the kind of component they test. Stress testing and usability testing differ in the quality attribute that they test for. Unit testing and acceptance testing differ in the nature of the decisions that are made based on the test results.
I love looking for patterns like that, so I spent an hour analyzing Scott’s list to identify the dimensions. Here are the thirteen dimensions I found, with a few examples that show how different testing techniques vary along each.
Unit Under Test. What type of component is being tested?
- In Class Testing or Unit Testing, the unit under test is a class.
- In Method Testing, the unit under test is a method of a class.
- In System Testing, the unit under test is the system.
Test Case Scope. What is the scope of the interaction tested by each test case?
- In Use-Case Scenario Testing, the scope of the interaction tested by each test case is a user goal.
- In Unit Testing, the scope of each test case is a method invocation.
- In Integration Testing, the scope is a transaction.
Unit Coverage. What subset of the unit under test is exercised by the test suite?
- In Coverage Testing, the subset being exercised by the test suite is code statements.
- In Path Testing, the coverage is logic paths.
- In Regression Testing, the coverage is code changes.
- In Boundary-value Testing, the coverage is limits.
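The gap between the first two kinds of unit coverage is easy to see in a small example. In this hedged Python sketch, classify is a hypothetical function with two independent if statements, and hand-placed probes (an assumption, standing in for a coverage tool) mark each block of statements. A single test reaches every probe, yet follows only one of the four possible paths.

```python
def classify(x, y, hits):
    # Hypothetical function with two independent decisions. Each probe
    # appended to `hits` marks a block of statements that just ran.
    hits.append("entry")
    labels = []
    if x > 0:
        hits.append("x_branch")
        labels.append("x positive")
    if y > 0:
        hits.append("y_branch")
        labels.append("y positive")
    return labels

ALL_PROBES = {"entry", "x_branch", "y_branch"}
TOTAL_PATHS = 4  # two independent if statements: 2 * 2 paths

def coverage(test_inputs):
    # Returns (statement coverage, path coverage) as fractions.
    probes_hit = set()
    paths_hit = set()
    for x, y in test_inputs:
        hits = []
        classify(x, y, hits)
        probes_hit.update(hits)      # which blocks ran at all
        paths_hit.add(tuple(hits))   # the ordered sequence of probes is the path
    return len(probes_hit) / len(ALL_PROBES), len(paths_hit) / TOTAL_PATHS

print(coverage([(1, 1)]))                          # (1.0, 0.25)
print(coverage([(1, 1), (0, 0)]))                  # (1.0, 0.5)
print(coverage([(1, 1), (0, 0), (1, 0), (0, 1)]))  # (1.0, 1.0)
```

Each probe marks a block rather than an individual statement, which is enough here because every statement in a block runs whenever its probe does.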
Behavioral Scope. What subset of the unit-under-test’s behavior is being tested?
- Installation Testing tests the system’s installation procedure.
- Functional Testing tests the system’s business functionality.
- Integration Testing tests interactions among subsystems.
Unit Relationships. What are the relationships among the units whose interactions are being tested?
- In Inheritance-regression Testing, the relationship between units is inheritance.
- In Integration Testing, the relationship is collaboration or peers.
Quality Attribute. What type of quality attribute is being tested?
- In Stress Testing or Volume Testing, the quality attribute being tested is throughput or latency or capacity.
- In Usability Testing, the quality attribute being tested is usability.
Stakeholder. Whose interests are the focus of the testing?
- Acceptance Testing focuses on the interests of users.
- Operations Testing focuses on the interests of operators.
- Support Testing focuses on the interests of support staff.
Liveness. How closely does the test environment mimic the operational environment? Or perhaps this dimension is better characterized as Safety: To what extent are the testers using the system to do the real work for which the system was intended?
- In a Pilot, the test environment is the actual operational environment, perhaps limited in scope (e.g. a small subset of users, or a limited time).
- In Beta Testing, the environment is a fully operational environment, but perhaps used only for non-critical functions.
- In Acceptance Testing, the environment is a non-operational environment that is similar to the operational one.
- Unit Testing is done in the development environment.
Visibility into Unit Under Test. To what extent does the tester exploit knowledge about the internals of the unit under test?
- In Black-box Testing, the tester exploits no knowledge of the internals of the unit under test.
- In White-box Testing, the tester exploits full knowledge of internals.
- In Grey-box Testing, the tester exploits some knowledge of internals.
Tester. What is the relationship of the tester to the software under test?
- For Acceptance Testing or User Testing, the tester is a user of the software.
- For Unit Testing or Developer Testing, the tester is a developer of the software.
Processor. What type of “processor” will “execute” the “software” during the tests?
- In most kinds of testing, a computer executes the software.
- In Code Inspections and Design Reviews, developers “execute” the software.
- In Prototype Walkthroughs, users “execute” the “software.”
Pre-Test Confidence. How confident are we about the software before we begin the testing?
- Before Alpha Testing, our confidence in the software is lower (compared with Beta Testing).
- Before Beta Testing, our confidence in the software is higher (compared with Alpha Testing).
Decision Scope. What kinds of decisions will we make based on the outcome of the test?
- For Acceptance Testing, the key decision is whether to release the product.
- For Integration Testing, the decision may be whether to begin system testing.
- For Unit Testing, the decision is whether the current coding task is complete.
This list is based on only an hour’s work, and on my analysis of only a single list of testing techniques (Scott’s), so I don’t claim that it is anywhere near complete or correct. It might be useful, though, for people who want to expand their repertoire of testing techniques, or to locate a technique that fits a given purpose or context.
I wonder what would happen if we created a thirteen-dimensional matrix. What parts of the matrix would be crowded with testing techniques? What parts would be empty?
Thirteen dimensions is more than I can handle. So what would happen if we took two or three dimensions at a time and explored all of the values along those dimensions? Would that be interesting? Would it be useful? Would it help us to identify testing techniques that fit our specific situations? Might we notice holes in the matrix for which we want to invent useful techniques?
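Taking two dimensions at a time can be sketched in a few lines of Python. This is a hypothetical illustration: DIMENSIONS, CATALOG, and explore are made-up names, the Liveness values are collapsed into three coarse labels, and the only techniques tagged are the two whose Tester and Liveness values both appear in the list above (Acceptance Testing and Unit Testing).

```python
from itertools import product

# Hypothetical, deliberately tiny model. The values come from the Tester and
# Liveness entries above; Liveness is collapsed into three coarse labels.
DIMENSIONS = {
    "Tester": ["user", "developer"],
    "Liveness": ["operational", "non-operational", "development"],
}

# Only techniques with a stated value for both dimensions are tagged here.
CATALOG = {
    "Acceptance Testing": {"Tester": "user", "Liveness": "non-operational"},
    "Unit Testing": {"Tester": "developer", "Liveness": "development"},
}

def explore(dim_a, dim_b):
    # Map every combination of values along two dimensions to the
    # techniques that occupy it; an empty list is a hole in the matrix.
    cells = {}
    for a, b in product(DIMENSIONS[dim_a], DIMENSIONS[dim_b]):
        cells[(a, b)] = [name for name, tags in CATALOG.items()
                         if tags.get(dim_a) == a and tags.get(dim_b) == b]
    return cells

for (tester, liveness), names in explore("Tester", "Liveness").items():
    print(f"{tester:<10}{liveness:<16}{', '.join(names) or '(hole)'}")
```

An empty cell means only that this tiny catalog has nothing there. It may be a genuine hole worth inventing a technique for, or just a technique nobody has tagged yet.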