
History Report

Okay, my recent post was about my Changes Report. In this post I’m writing about my History Report, which is a spreadsheet.

(If your history report is a spreadsheet, too, you may want to skip the first three paragraphs below, and resume reading at “Each verdict cell.”)


The left headers are in the first few columns at the left; their job is to identify each row as belonging to a single test verdict. I’m using the Ruby gem MiniTest::Unit, so the identifying information is: suite name, test name, method name, verdict identifier.

The top headers are in the first few rows at the top; their job is to identify the build and summarize the results. Each column’s headers include the date and time, the build identifier, and the count of each possible outcome: passed, missed, failed. The leftmost of these build columns is for the most recent build. Older builds are represented in columns to the right.


Each column (other than those I’ve just mentioned) shows the verdicts for a single test run; the most recent run is just to the right of the identifying information, and the older runs are to the right of that.

Each verdict cell shows the outcome for the verdict in that row: passed, missed, or failed. These outcome cells are colored according to their values. (See my post on colors.)

Beyond that, there’s one other important bit of data involved: if the verdict or its underlying data changed since the previous test run, the verdict is rendered in bold and italic, and is in fact a link. The link takes me to the very verdict in the Changes Report, and there I find the full information about the verdict: its expected and actual values for the current and previous test runs.

The bold italic link is present only when there was a change in the verdict. That means that for an old (unchanged) verdict, I can look to the right to find the most recent bold italic link, and that tells me when the most recent change occurred.

The remaining item I’ll be adding (soon) is a column for defects. Each cell will be a link to the defect item (in Rally), if there is one.

Oh, and did I say? Both my Changes Report and my History Report are generated automatically from the test logs (the only exception being the defect information, which must be updated manually).


Changes Report

My automated tests produce two reports:

  • History report.
  • Changes report.

In my test logs, each verdict is one of: passed, failed, missed (the verification point was not reached).

Now what the managers want to know is: How many of each there were. That’s what’s in the history report: today’s results, along with historical results. I’ll write about the history report in my next post.

What I want to know is: What’s different from the previous test run. That’s what’s in the changes report: all the differences between the current test run and the previous one.

The changes report groups verdicts as follows:

  • New failed.
  • New missed.
  • New passed.
  • Changed failed.
  • Changed passed.
  • Old failed.
  • Old missed.
  • Old passed.

The last three — old failed, old missed, and old passed — are of no immediate interest to me. The current result is exactly the same as the previous result. There’s no action I need to take, because all these were dealt with after some previous test run: defect reports opened, closed, updated, etc.

The first three — new failed, new missed, and new passed — obviously need my attention. Defect reports will need to be opened, closed, updated, etc.

The middle two — changed failed and changed passed — also need my attention:

  • Changed failed: A changed failed verdict is one that failed in the previous test run, then failed in the current test run, but in a different way. This occurs when the actual value changes from one wrong value to another. Investigation is required.
  • Changed passed: A changed passed verdict is one that passed in the previous test run, then passed in the current test run, but in a different way. This occurs when both the expected value in the test and the actual value delivered by the application have changed, and they still agree. Usually this is because the developer gave advance notice of a change, which the tester accommodated by pre-adapting the test.
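The grouping rules above can be sketched as one small Ruby function. As before, I’m assuming each verdict is a hash with an :outcome ("passed", "failed", or "missed") plus :expected and :actual values; the field names are invented.

```ruby
def group(current, previous)
  if previous.nil? || current[:outcome] != previous[:outcome]
    "new #{current[:outcome]}"
  elsif current[:expected] != previous[:expected] ||
        current[:actual] != previous[:actual]
    # Same outcome, but the underlying values differ.
    # (A missed verdict has no values, so "changed missed" never arises.)
    "changed #{current[:outcome]}"
  else
    "old #{current[:outcome]}"
  end
end

puts group({ outcome: "failed", expected: "42", actual: "40" },
           { outcome: "failed", expected: "42", actual: "41" })
# => changed failed
```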

So what of the changes report itself? Well, it has nine sections: a summary, plus a section for each of the eight verdict groups listed above.

The summary lists the other sections, linking to each, and showing me the count of verdicts in each. The links allow me to navigate quickly to whichever section I want.

Each of the other sections begins with a list of the verdict ids for the verdicts it contains; each verdict id in that list links to the data for the verdict. Again, the links facilitate navigation.

At the links, each verdict’s data is presented in a small table that gives the verdict id, along with the expected and actual values for both the previous test run and the current one. The table is “over-and-under,” showing the corresponding values one above the other; this makes it easy for me to spot differences, even between similar values. The values in the table are displayed in a monofont, which also makes spotting differences easier.
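One way such an over-and-under table might be rendered is with ERB from Ruby’s standard library. This is only a sketch of the idea: the field names are invented, and the `<code>` tags stand in for whatever monofont styling the real report uses.

```ruby
require "erb"

# Over-and-under: corresponding previous/current values sit one above
# the other, which makes differences easy to spot.
TEMPLATE = ERB.new(<<~HTML)
  <table>
    <tr><th colspan="2">Verdict <%= v[:id] %></th></tr>
    <tr><td>Previous expected</td><td><code><%= v[:prev_expected] %></code></td></tr>
    <tr><td>Current expected</td><td><code><%= v[:curr_expected] %></code></td></tr>
    <tr><td>Previous actual</td><td><code><%= v[:prev_actual] %></code></td></tr>
    <tr><td>Current actual</td><td><code><%= v[:curr_actual] %></code></td></tr>
  </table>
HTML

v = { id: "v1", prev_expected: "42", curr_expected: "42",
      prev_actual: "41", curr_actual: "40" }
html = TEMPLATE.result(binding)
puts html
```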

And of course, my reports are kinder and gentler than some others.

Tiered Testing [Socratic Dialog]

Socrates: Let’s begin at the beginning. Now, tell me, Tester, what is the purpose of the build verification test?

Tester: Its purpose, Socrates, is to determine whether full regression testing should proceed.

Socrates: I see. And what is the alternative, if the full regression testing should not proceed?

Tester: The alternative is that the build is considered failed, and repairs must be made before the build is tried again.

Socrates: What sorts of test failures, bugs, would fail the build?

Tester: Well, bugs that block important parts of the testing, certainly.

Socrates: The full regression testing should not proceed unless the tests can actually do their work?

Tester: That’s correct, I think.

Socrates: Are there other failures that would fail the build?

Tester: Yes. I think so: failures of important functionality.

Socrates: Regression testing should not proceed unless the major functionality works?

Tester: That’s right.

Socrates: Any others besides blocking failures and major functionality failures?

Tester: No, Socrates, I think that’s it.

Socrates: Very well. Then let’s think about just those two types of failures.

Tester: As you say.

Socrates: Of the two, is each type of failure sufficient, by itself, to fail the build?

Tester: Yes, Socrates, certainly.

Socrates: All right, then. Suppose that there are failures in major functionality, but there are not any blocking bugs. In that case, the regression testing should not proceed?

Tester: I think that’s right.

Socrates: That must mean, then, that the information gathered by the regression testing would not help in diagnosing the failures in major functionality, and therefore is not needed.

Tester: Well, the information might be helpful. Let me think. Yes, it would be helpful. Very much so, now that I think about it.

Socrates: So a failure in major functionality should not, by itself, be sufficient to fail the build. The regression testing should begin, and would gather helpful information.

Tester: Yes, I do now think that’s so.

Socrates: And a blocking bug alone would be sufficient, regardless of whether there are major functionality failures.

Tester: Yes, it would be sufficient.

Socrates: I see. Therefore the major functionality testing on the one hand does increase the duration of the build verification test, but on the other hand does not contribute to determining whether to fail the build.

Tester: Again, true.

Socrates: Why, then, is major functionality testing included in the build verification?

Tester: I’m not sure, Socrates. Perhaps it should be included because we need to identify important failures sooner rather than later.

Socrates: Indeed, that is important.

Tester: Well, Socrates, at least I get some agreement from you today.

Socrates: I’m glad for that. But, according to what we’ve said, would it not be better to separate the testing into three tiers: build verification, major functionality, and full regression testing? That way, the build verification can complete sooner; ideally, the major functionality testing would be started at the same time, but if not, then immediately after the build verification test.

Tester: Yes, Socrates, you’re right.

Socrates: Thanks for that.

Tester: Therefore I see, finally, that it would be good to have three-tiered testing:

  1. Build verification test: Find disqualifying bugs first.
  2. Major functionality test: Find important bugs fast.
  3. Full regression test: Find as many bugs as possible.

Socrates: As you say, Tester.

Tester: And if possible, all three should begin at the same time, to get the results soonest. In case the build is failed, diagnosis and repair can begin immediately.

Socrates: Again, true.

Tester: Thanks, Socrates. I’ll begin working on this.

Socrates: You’re very welcome, Tester.

[Ed: Modern thinking is that the BVT should fail the build for a single failed verification. Note, however, that a single verification may be, under the hood, compound and complex. For example, if there are two ways to register a user on a website, the verification might be that at least one of those ways succeeds. The verification would fail only if there’s no effective way to register a user, because that would block testing.]
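The compound verification the note describes can be sketched as a single check over per-path results. The path results here are hypothetical stand-ins; real code would exercise each registration path against the site.

```ruby
# path_results: one boolean per registration path (true = path succeeded).
def build_blocked?(path_results)
  # The verification fails, failing the build, only when every
  # registration path failed, because only that blocks further testing.
  path_results.none?
end

puts build_blocked?([false, true])   # => false: the second path still works
puts build_blocked?([false, false])  # => true: no way to register; fail the build
```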

Going Through Some Changes

Quite a while ago I ran across this post (now a PDF periodical article). I’d been doing what I call “change testing” for some time, and I thought this was a good introduction to the concept. Please see the section “Regression Testing” that begins on the second page. (But it’s okay — informative, even — to read the whole article.) I especially like the metaphor that sees change testing as a “quality ratchet.”

But I think it’s unfortunate that the authors characterize change testing as regression testing, which I think invites confusion.

I’ve done change testing on a large scale, and will have much more to say about it in forthcoming posts.