Successful bug triage
Written sometime between 2001 and 2009. The original publication date is lost. This post has moved across three blogging platforms during its life. I preserve it here as a snapshot of my thinking about testing at the time I wrote it. Talk to me before taking anything you read here as gospel!
Triage must be performed whenever there are too few developer resources to fix every known bug.
The term is borrowed from the medical profession:
Triage is a process of prioritizing patients based on the severity of their condition. This [process] rations patient treatment efficiently when resources are insufficient for all to be treated immediately.
To see how this applies to software engineering, simply replace the word ‘patient’ with ‘bug’ in the text above.
This document explains why it’s necessary to perform regular triage and provides guidelines and tips on how to triage successfully.
A note on terminology
Throughout this document I use the colloquial term ‘bug’ to refer to a software defect, mainly because that’s the term used by Bugzilla and because it’s easier to say ‘bug’ than ‘software defect’.
I use the terms ‘bug’ and ‘bug report’ interchangeably, but of course they are different entities. Whether I’m referring to the software defect itself or the document which describes it should be apparent from the context.
The purpose of triage is two-fold:
- To review and agree the current severity and priority level of existing bug reports
- To select and agree a subset of existing bugs to be fixed in this release cycle - the ‘fix list’
Quality, features, deadline - pick any two
When the customer insists that the deadline cannot slip and all features must be present, they are choosing to accept lower quality software. It’s a poor choice, but it’s one that good triage can help to mitigate. Remember that as a QA engineer you are the customer’s advocate; and the customer is always right, even when they’re wrong.
Regular triage of your bug reports means that at any time you know which bugs should be fixed to maximise the quality of the software.
Bugs never die
The term ‘triage’ applied to software engineering is really just an analogy. Real patients can die or get well of their own accord. This doesn’t happen with bugs. They never die and they never fix themselves. I call them zombie bugs - the living dead. More on this below.
Decide on a triage partnership
Triage is best accomplished as a partnership between QA and development. For a given project, select a QA engineer and a developer who both have a wide knowledge of the product and can work co-operatively together. Choose individuals of equivalent technical ability and rank.
Arrange regular triage sessions. These may be triggered either at a set time interval or when bug counts reach a pre-determined level.
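The two triggers can be written down as a simple rule. The following is a minimal sketch in Python; the function name and both thresholds are made up for illustration and should be tuned to your own project.

```python
from datetime import date, timedelta

# Hypothetical thresholds for illustration only; tune them to your project.
MAX_DAYS_BETWEEN_SESSIONS = 7
MAX_UNTRIAGED_BUGS = 25


def triage_is_due(last_session: date, today: date, untriaged_count: int) -> bool:
    """Return True when either trigger fires: elapsed time or bug count."""
    overdue = today - last_session >= timedelta(days=MAX_DAYS_BETWEEN_SESSIONS)
    backlog_full = untriaged_count >= MAX_UNTRIAGED_BUGS
    return overdue or backlog_full
```

Checking both conditions means a quiet week still gets a session, and a sudden flood of reports gets one early.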
Review existing bug reports
The severity and priority level originally assigned to a bug should be considered provisional. Each project imposes standards for how bugs should be rated and it’s normal for these to change over time. During triage, each bug will be rated for severity and priority in relation to the other bugs in the triage list. The same bug will rise and fall in the rankings over several triage iterations.
If a bug report has insufficient information, refer it back to the reporter, detailing the information they ought to provide.
If a bug needs to be retested, refer it back to the QA lead. This individual will delegate the retesting task.
Create the ‘fix list’
I recommend using the bug severity level as the indicator that a bug is on the ‘fix list’. This still allows the bugs on the list to be prioritised. I discourage the use of keywords or other arbitrary flags or strings as markers for the ‘fix list’ - they’re too easy to mix up, mistype or delete, and they fall out of use too easily.
For a small, well-controlled project, it may well be possible to fix all bugs. For a crisis-driven project, only blocker, critical and major issues may make it onto the list of bugs to be fixed. Your issue tracking system has an advanced search feature; learn how to use it well.
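As a rough illustration of using severity as the ‘fix list’ marker, here is a small Python sketch. It assumes bug records have already been exported from the tracker; the Bug class, the fix_list function and the convention that priority 1 is the most urgent are all hypothetical. The severity names are modelled on Bugzilla’s defaults, with ‘enhancement’ left out.

```python
from dataclasses import dataclass

# Severity names modelled on Bugzilla's defaults, most severe first
# (the 'enhancement' level is deliberately omitted from this sketch).
SEVERITY_ORDER = ["blocker", "critical", "major", "normal", "minor", "trivial"]


@dataclass
class Bug:
    bug_id: int
    severity: str
    priority: int  # assumed convention: 1 is the most urgent


def fix_list(bugs, cutoff="major"):
    """Select bugs at or above the cutoff severity, ordered by severity, then priority."""
    limit = SEVERITY_ORDER.index(cutoff)
    chosen = [b for b in bugs if SEVERITY_ORDER.index(b.severity) <= limit]
    return sorted(chosen, key=lambda b: (SEVERITY_ORDER.index(b.severity), b.priority))
```

A crisis-driven project would keep the default cutoff of ‘major’; a small, well-controlled project could pass cutoff="trivial" to take everything. In practice the same selection is usually a saved search in the issue tracker itself.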
Most likely the list of bugs for triage will be the newest bugs logged against a given component. However, bugs which were dropped from the ‘fix list’ during the last triage should be evaluated again. See the section on zombie bugs.
Good reasons for including a bug in the ‘fix list’
- The bug has a high likelihood of occurring in production
- The bug is damaging to your company’s brand
- The bug causes loss of data or instability
Good reasons for excluding a bug
- The bug is only triggered in rare or unlikely edge cases
- The high relative cost (time, risk, effort) of fixing the bug outweighs the benefit, given the low likelihood of the bug occurring in production
- The bug report is poorly written
Bad reasons for including a bug
- The QA engineer has an emotional investment in the bug report
- The bug occurs in functionality which the customer doesn’t use
- The bug is actually a feature request
Bad reasons for excluding a bug
- The bug is difficult to reproduce
- The developer considers the bug too boring or too difficult to fix
This list is not exhaustive; feel free to add your own.
Persistence, persistence, persistence
Let’s face it, bug triage is unglamorous. On long-running projects, or for software that’s in sustaining mode, it can seem like a daunting, never-ending task. Don’t let it get you down - a few months of solid effort a couple of times a week will make a substantial difference to the state of your product.
The push for regular triage must begin with QA, but has to be a collaborative effort between QA and development. You may face inertia (or even passive-aggressive behaviour) from development. Don’t let this get you down; keep plugging away and eventually the culture will be one where everyone accepts that bug triage is a normal part of daily life. It’s essential that your manager understands the value that the triage effort brings.
Regular and steady triage is far better than monumental spurts of triage.
Be prepared to compromise; nasty bugs will have to drop from this week’s ‘fix list’ at times when the developer’s workload is very heavy. Just triage the bug again the following week.
Being persistent is something you’ll have to learn by practice. Be polite, be technical and be relentless. Bullying or emotional persuasion techniques are counter-productive.
Some words of warning
As I described above, it’s necessary to re-evaluate bugs which have been in the NEW state for many months or even years. It can happen that for a given bug report, at every review, everyone has agreed that yes indeed, this bug report describes a genuine issue, but the issue has never been deemed to be severe enough to merit fixing in the current release.
The process of a genuine bug report turning into a zombie is slow and subtle. It’s not possible to say exactly how long it takes. One day you’ll be using your issue tracking system to do triage and you’ll realize you’re looking at a zombie bug report. Make one last effort to promote the bug to the ‘fix in this release’ list. (You may choose to label the bug report as a zombie.) If the bug remains unfixed, close it - it’ll never get fixed.
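There is no exact test for a zombie, but a query can at least surface candidates for a human to judge. The Python sketch below is one possible approach, not a rule: the BugReport class, the times_dropped counter and both thresholds are hypothetical, and it assumes your tracker can export how long a report has been open and how often triage has passed it over.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BugReport:
    bug_id: int
    status: str
    opened: date
    times_dropped: int  # hypothetical counter: triage sessions that left it off the fix list


def zombie_candidates(reports, today, min_age_days=365, min_drops=3):
    """Flag long-lived NEW reports that triage keeps passing over, oldest first."""
    flagged = [
        r for r in reports
        if r.status == "NEW"
        and (today - r.opened).days >= min_age_days
        and r.times_dropped >= min_drops
    ]
    return sorted(flagged, key=lambda r: r.opened)
```

The output is only a shortlist. Whether a report gets one last push onto the ‘fix list’ or is closed remains a decision for the triage partnership.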
Leave your ego at the door
It can happen that you’re triaging a list of bugs with a developer and you come across a bug report that you’ve logged yourself. You remember the effort that you put into it, tracking down the precise cause of the issue and documenting it carefully. Now the developer is telling you it’s not a bug or that it shouldn’t be fixed in this release. ‘What about all the work I put into this bug report?’ you think to yourself. That’s your ego talking. What your ego wants is not material to the decision as to whether this is or is not a bug that should be fixed. If you find yourself falling into this trap, delegate the triage of bugs you’ve logged yourself to another QA engineer. Most importantly, respect the choice they have made about the relative importance of your bugs!
Equally, you may find yourself battling with a developer over a bug in their code. The developer takes the bug report as a personal criticism (perhaps the bug report is poorly phrased - see my post on writing better bug reports) and refuses to accept the bug is a real issue. If this is happening to you, skip that issue and get a second opinion from another developer. In any case, the specification should contain sufficient detail to help you sort out the problem. If it doesn’t, read my post on writing solid test plans.
I used the following documents while researching this essay:
- Fedora Bugzilla Triage: Overview
- GNOME Bugsquad Triage Guide
- Call for assistance: glibc bugzilla triage