Successful bug triage

Written sometime between 2001 and 2009. The original publication date is lost. This post has moved across three blogging platforms during its life. I preserve it here as a snapshot of my thinking about testing at the time I wrote it. Talk to me before taking anything you read here as gospel!

Introduction

Triage must be performed when there is a shortage of developer resources for fixing all existing known bugs.

The term is borrowed from the medical profession:

Triage is a process of prioritizing patients based on the severity of their condition. This [process] rations patient treatment efficiently when resources are insufficient for all to be treated immediately.

To see how this applies to software engineering, simply replace the word ‘patient’ with ‘bug’ in the text above.

This document explains why it’s necessary to perform regular triage and provides guidelines and tips on how to triage successfully.

A note on terminology

Throughout this document I use the colloquial term ‘bug’ to refer to a software defect - mainly because that’s the term used by Bugzilla, and because it’s easier to say ‘bug’ than ‘software defect’.

I use the terms ‘bug’ and ‘bug report’ interchangeably, but of course they are different entities. Whether I’m referring to the software defect itself or the document which describes it should be apparent from the context.

Why triage?

The purpose of triage is two-fold:

  • To review and agree the current severity and priority level of existing bug reports
  • To select and agree a subset of existing bugs to be fixed in this release cycle - the ‘fix list’

Quality, features, deadline - pick any two

When the customer insists that the deadline cannot slip and all features must be present, they are choosing to accept lower quality software. It’s a poor choice, but it’s one that good triage can help to mitigate. Remember that as a QA engineer you are the customer’s advocate; and the customer is always right, even when they’re wrong.

Regular triage of your bug reports means that at any time you know which bugs should be fixed to maximise the quality of the software.

Bugs never die

The term ‘triage’ applied to software engineering is really just an analogy. Real patients can die or get well of their own accord. This doesn’t happen with bugs. They never die and they never fix themselves. I call them zombie bugs - the living dead. More on this below.

Effective triage

Decide on a triage partnership

Triage is best accomplished as a partnership between QA and development. For a given project, select a QA engineer and a developer who both have a wide knowledge of the product and can work co-operatively. Choose individuals of equivalent technical ability and rank.

Arrange regular triage sessions. These may be triggered either at a set time interval or when bug counts reach a pre-determined level.

Review existing bug reports

The severity and priority level originally assigned to a bug should be considered provisional. Each project imposes standards for how bugs should be rated and it’s normal for these to change over time. During triage, each bug will be rated for severity and priority in relation to the other bugs in the triage list. The same bug will rise and fall in the rankings over several triage iterations.

If a bug report has insufficient information, refer it back to the reporter, detailing the information they ought to provide.

If a bug needs to be retested, refer it back to the QA lead. This individual will delegate the retesting task.

Create the ‘fix list’

I recommend using the bug severity level as the indicator that a bug is on the ‘fix list’. This still allows bugs to be prioritised. I discourage the use of keywords or other arbitrary flags or strings as markers for the ‘fix list’ - they’re too easy to mix up, mistype or delete, and they fall out of use too easily.

For a small, well-controlled project, it may well be possible to fix all bugs. For a crisis-driven project, only blocker, critical and major issues may make it onto the list of bugs to be fixed. Your issue tracking system has an advanced search feature; learn how to use it well.
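If your issue tracker can export search results (most can), a few lines of scripting will turn a severity-based query into the week’s ‘fix list’. The sketch below is illustrative only - the bug records and field names are invented, not any particular tracker’s schema:

# A minimal sketch of deriving a 'fix list' from severity alone.
# The records and field names are invented for illustration; substitute
# whatever your issue tracker's search or export actually returns.

FIX_LIST_SEVERITIES = {"blocker", "critical", "major"}

bugs = [
    {"id": 101, "severity": "critical", "summary": "Crash on login"},
    {"id": 102, "severity": "minor", "summary": "Typo in tooltip"},
    {"id": 103, "severity": "major", "summary": "Report totals wrong"},
]

def fix_list(bugs, severities=FIX_LIST_SEVERITIES):
    """Return the bugs whose severity puts them on the fix list."""
    return [bug for bug in bugs if bug["severity"] in severities]

for bug in fix_list(bugs):
    print(f"#{bug['id']} [{bug['severity']}] {bug['summary']}")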

Most likely the list of bugs for triage will be the newest bugs logged against a given component. However, bugs which were dropped from the ‘fix list’ during the last triage should be evaluated again. See the section on zombie bugs.

Good reasons for including a bug in the ‘fix list’

  • The bug has a high likelihood of occurring in production
  • The bug is damaging to your company’s brand
  • The bug causes loss of data or instability

Good reasons for excluding a bug

  • The bug is only triggered in rare or unlikely edge cases
  • The high relative cost (time, risk, effort) of fixing the bug outweighs the benefit, given the low likelihood of the bug occurring in production
  • The bug report is poorly written

Bad reasons for including a bug

  • The QA engineer has an emotional investment in the bug report
  • The bug occurs in functionality which the customer doesn’t use
  • The bug is actually a feature request

Bad reasons for excluding a bug

  • The bug is difficult to reproduce
  • The developer considers the bug too boring or too difficult to fix

This list is not exhaustive; feel free to add your own.

Persistence, persistence, persistence

Let’s face it, bug triage is unglamorous. On long-running projects, or for software that’s in sustaining mode, it can seem like a daunting, never-ending task. Don’t let it get you down - a few months of solid effort, a couple of times a week, will make a substantial difference to the state of your product.

The push for regular triage must begin with QA, but has to be a collaborative effort between QA and development. You may face inertia (or even passive aggressive behaviour) from development. Don’t let this get you down; keep plugging away and eventually the culture will be one where everyone accepts that bug triage is a normal part of daily life. It’s essential that your manager understands the value that the triage effort brings.

Regular and steady triage is far better than monumental spurts of triage.

Be prepared to compromise; nasty bugs will have to drop from this week’s ‘fix list’ at times when the developer’s workload is very heavy. Just triage the bug again the following week.

Being persistent is something you’ll have to learn by practice. Be polite, be technical and be relentless. Bullying or emotional persuasion techniques are counter-productive.

Some words of warning

Zombie bugs

As I described above, it’s necessary to re-evaluate bugs which have been in the NEW state for many months or even years. It can happen that for a given bug report, at every review, everyone has agreed that yes indeed, this bug report describes a genuine issue, but the issue has never been deemed to be severe enough to merit fixing in the current release.

The process of a genuine bug report turning into a zombie is slow and subtle. It’s not possible to say exactly how long it takes. One day you’ll be using your issue tracking system to do triage and you’ll realize you’re looking at a zombie bug report. Make one last effort to promote the bug to the ‘fix in this release’ list. (You may choose to label the bug report as a zombie.) If the bug remains unfixed, close it - it’ll never get fixed.

Leave your ego at the door

It can happen that you’re triaging a list of bugs with a developer and you come across a bug report that you’ve logged yourself. You remember the effort that you put into it, tracking down the precise cause of the issue and documenting it carefully. Now the developer is telling you it’s not a bug, or that it shouldn’t be fixed in this release. ‘What about all the work I put into this bug report?’ you think to yourself. That’s your ego talking. What your ego wants is not material to the decision as to whether this is or is not a bug that should be fixed. If you find yourself falling into this trap, delegate the triage of bugs you’ve logged yourself to another QA engineer. Most importantly, respect the choice they have made about the relative importance of your bugs!

Equally, you may find yourself battling with a developer over a bug in their code. The developer takes the bug report as a personal criticism (perhaps the bug report is poorly phrased - see my post on writing better bug reports) and refuses to accept the bug is a real issue. If this is happening to you, skip that issue and get a second opinion from another developer. In any case, the specification should contain sufficient detail to help you sort out the problem. If it doesn’t, read my post on writing solid test plans.

Further reading

I used the following documents while researching this essay:

May 31, 2007

Writing solid test plans

Written sometime between 2001 and 2009. The original publication date is lost. This post has moved across three blogging platforms during its life. I preserve it here as a snapshot of my thinking about testing at the time I wrote it.

Introduction

This short document provides you with the background context you need in order to be able to write good test plans. It explains that there are a number of perspectives from which you must view a piece of software in order to test it properly.

You might like to print this out and read it on paper. Your feedback is welcome!

Understand the business requirements

A tester is a proxy for the most important stakeholder in the entire project: the customer. It’s important that the tester has a good understanding of why a customer wants a new feature to work in a particular way. Knowing this helps you to predict what the feature should and should not do, which in turn feeds into your test plan.

If the Business Requirement Document is made available to you, read it. Ask questions. Print it out. Scribble in the margins. Don’t assume that the customer has actually thought things out properly for themselves. Try to find holes - situations that the author has missed. Talk to the person in your company who is the direct customer liaison.

Understand the technical specification

The Business Requirement Document outlines why the customer needs a given feature and what they want it to do. The Technical Specification describes how the feature will be implemented. Look out for mismatches between the two documents - assume that the person who translated the business requirement into a technical specification doesn’t have perfect knowledge. Expect to find mismatches, ambiguities and contradictions.

You can assume the document author has imperfect knowledge, but if you find an ambiguity in the document, don’t assume that anyone else will resolve it the way you would; bring it to the attention of the document author and make sure the document is amended to resolve the ambiguity. Check that the resolution to the ambiguity had input from the customer. This is testing.

Even without the Business Requirements Document (or Functional Specification) you can still make yourself very useful (and annoying) by asking penetrating questions about the Technical Specification. Ask why the particular architecture has been chosen - is it a fast and cheap solution, is it a compromise enforced by the existing architecture? Will this solution result in a solid, maintainable, modular solution with minimal interdependencies?

Know your domain

Domain knowledge: sounds great, but it just means “knowing your stuff”. This is something that comes with practice; everyone starts off knowing virtually nothing about the product they’re testing. Make sure you have an experienced mentor, who can sketch out the product architecture at various levels of detail for you as your knowledge grows. Challenge yourself to understand as much as you can about the inter-relationships between the various parts of the system you’re testing. Talk to the developers and get them to explain how components work internally. Don’t believe anybody who tells you that testing is best done by black box testing. That’s just one tool out of many in the toolkit of an experienced tester.

Scoping out a test plan

Now that you’ve got an understanding of what the customer wants, how the development team are going to build it and the way the current system works, it’s time to start sketching out a test plan.

Speak with your team lead and decide on major test headings and what the goals of those test headings are. Some example test headings follow. They start with functional tests, which look at the system through a microscope, then move to larger and larger views until you are testing the entire system as a whole - taking into account not only the software under test but also the foreign systems it has to communicate with, and the hardware and network on which it runs.

Functional tests

These tests are designed to verify that the system behaves as intended. The simplest functional tests will verify individual components and the tasks you can perform with them: for example, creating, reading, updating and deleting a user account. These are known as the CRUD operations - the most elementary operations you can perform on a database-backed system. Write separate tests for each of these activities.

Remember that the ‘read’ part of CRUD includes searching; this can quickly multiply out into a large number of tests if you have several search parameters. In general, the correct approach is to write a separate test for each individual search parameter and then write tests for combinations of parameters. Paging through results in a UI is also a class of ‘read’ test.

The same CRUD approach can be taken with API tests; APIs are used to create, read, update and delete entities in the system.

Each test only needs to have one ‘expected outcome’. If your test has several expected outcomes interspersed between a chain of activities, break these out into separate tests and instead set up the necessary preceding steps as prerequisites.
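As an illustration, here is a minimal sketch of separate CRUD tests for a user account, written with pytest. The UserStore class is an invented stand-in for whatever interface your system actually exposes; the point is that each test verifies exactly one expected outcome and sets up its own prerequisites:

import pytest

class UserStore:
    """Invented in-memory stand-in for the system under test."""
    def __init__(self):
        self._users = {}
        self._next_id = 1

    def create(self, name):
        user_id = self._next_id
        self._next_id += 1
        self._users[user_id] = {"id": user_id, "name": name}
        return user_id

    def read(self, user_id):
        return self._users[user_id]

    def update(self, user_id, name):
        self._users[user_id]["name"] = name

    def delete(self, user_id):
        del self._users[user_id]

@pytest.fixture
def store():
    return UserStore()

def test_create_user(store):
    # one expected outcome: a user is created and gets an id
    assert store.create("alice") is not None

def test_read_user(store):
    user_id = store.create("alice")  # prerequisite, not the thing being verified
    assert store.read(user_id)["name"] == "alice"

def test_update_user(store):
    user_id = store.create("alice")
    store.update(user_id, "alicia")
    assert store.read(user_id)["name"] == "alicia"

def test_delete_user(store):
    user_id = store.create("alice")
    store.delete(user_id)
    with pytest.raises(KeyError):
        store.read(user_id)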

Other types of functional tests which need to be covered are state change tests: for example, moving a user from the NEW to the ACTIVE and then to the SUSPENDED state. State change tests are all subsets of the CRUD tests - after all, moving a user from NEW to ACTIVE is a type of update.

Data validation is the final type of basic functional test. These types of test are concerned with verifying that the system can cope with input in expected and unexpected formats - for example, try to create a user with no name, or a name with one million characters. Does the system cope properly with these types of input? Does it swallow the input and then break when you try to read the data back? See the TROLL page for more information on the different types of input checking that need to be tested.

Remember that not just UIs or APIs need data validation: if the system has to read data from a configuration file, check how it copes with bad or missing data in the file.
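Here is a minimal sketch of how a few data validation cases might be written with pytest. The create_user function and its 1-255 character rule are purely illustrative - substitute your system’s real constraints:

import pytest

def create_user(name):
    # Invented validation rule for illustration: names must be 1-255 characters.
    if not isinstance(name, str) or not (1 <= len(name) <= 255):
        raise ValueError("invalid user name")
    return {"name": name}

@pytest.mark.parametrize("bad_name", [
    "",               # no name at all
    "x" * 1_000_000,  # a name with one million characters
    None,             # missing value entirely
])
def test_create_user_rejects_bad_input(bad_name):
    with pytest.raises(ValueError):
        create_user(bad_name)

def test_create_user_accepts_reasonable_input():
    assert create_user("alice")["name"] == "alice"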

Integration tests

Integration testing is where the focus is on how the system’s components work together. The components of a typical system include remote APIs, databases, mail servers, message queues and gateways to third-party systems. These interfaces may use clearly defined protocols (HTTP, SOAP, SMTP…) or they may use proprietary protocols (Oracle’s network interface, WebLogic’s T3 protocol…)

Integration testing has two main focuses - validating that the component parts of the system work together faultlessly under normal operation and verifying that the system copes properly with failures in remote systems. The latter type of testing is called negative or failure testing.

The failure or unavailability of some remote systems is fatal to the system; it cannot function at all without them. A typical example is the database. Failures in other remote systems may be recoverable - you need to understand how the system should cope with these failures before you begin writing tests.

Integration testing requires a very subtle approach; what if the database is clustered? The system should be able to fail over to the still-functioning database with no loss of data.

Any remote system which relies on the network must be carefully integration tested: network problems can mean that an expected response from a remote system never arrives. How does the system behave? Does it wait forever for the lost response? Note how these tests are similar to but subtly different from data validation tests at the functional test level.

It’s possible to create a rock-solid system which is very tolerant of badly-behaved remote systems if we are able to build simulators of those remote systems which can be triggered to produce bad input or no input at all, simulating errors at various levels throughout the network stack: application, transport, internet and link.
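A minimal sketch of that idea: a stub ‘remote system’ that can be told to reply normally, reply with garbage, or stay silent, plus a check that the client times out cleanly instead of waiting forever. The socket details, port number and timeouts are all illustrative:

import socket
import threading
import time

def stub_remote_system(port, mode="normal"):
    """Tiny stand-in for a remote system. mode is 'normal', 'garbage' or 'silent'."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen(1)
    conn, _ = server.accept()
    conn.recv(1024)
    if mode == "normal":
        conn.sendall(b"OK\n")
    elif mode == "garbage":
        conn.sendall(b"\x00\xff garbled")
    else:
        time.sleep(5)  # 'silent': hold the connection open without ever replying
    conn.close()
    server.close()

def call_remote(port, timeout=1.0):
    """The behaviour under test: the client must give up cleanly, not hang."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
        s.settimeout(timeout)
        s.sendall(b"PING\n")
        return s.recv(1024)

def test_client_times_out_when_remote_is_silent():
    port = 9901  # illustrative port number
    threading.Thread(target=stub_remote_system, args=(port, "silent"), daemon=True).start()
    time.sleep(0.5)  # crude wait for the stub to start listening; fine for a sketch
    try:
        call_remote(port)
    except socket.timeout:
        return  # the lost response was handled gracefully
    raise AssertionError("expected the client to time out")

if __name__ == "__main__":
    test_client_times_out_when_remote_is_silent()
    print("timeout handled cleanly")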

A key idea to remember when performing integration testing is to ensure that your system is very conservative in what it sends out, and very liberal in what it receives. In practice, that means we should validate (error-check) our data before we send it and we should validate any data we receive before we attempt to do anything else with it.

System tests

You read about state change tests at the functional test level. The focus of those tests is to verify that the GUI (or API) can be used to move an entity through various states. State change tests at the system testing level depend on the functional tests and integration tests working properly and are focused on what happens when an entity tries to interact with other entities in the system while in various states. Another way of looking at it is to think of the actions which are associated with certain entities. For example, if a user is in the suspended state, what happens when they try to make a purchase? These types of state-change test are also called lifecycle tests, and belong at the system test level, not at the functional test level.

If two (or more) entities are interacting, draw up a matrix of each of the states that these entities can be in and write tests for each of those state combinations. For example, a user and their account can be in several different states at the same time. Only some are valid combinations. Find out which combinations are possible on the system and test these, then verify that the invalid combinations are disallowed. Here’s a sample matrix for an imaginary payments system:

User and Account state matrix (one entry per account state, showing each possible user state)

  • Account NEW
      • User NEW: valid - a user may be in the NEW state and have a NEW account
      • User ACTIVE: valid - a user may be in the ACTIVE state and have a NEW account
      • User SUSPENDED: valid - a user may be in the SUSPENDED state and have a NEW account
      • User TERMINATED: valid - a user may be in the TERMINATED state and have a NEW account
  • Account ACTIVE
      • User NEW: invalid combination - an account can’t be made ACTIVE before the user is made ACTIVE
      • User ACTIVE: valid - a user may be in the ACTIVE state and have an ACTIVE account
      • User SUSPENDED: invalid combination - the account must move to SUSPENDED at the same time that the user moves to SUSPENDED
      • User TERMINATED: invalid combination - if a user is terminated, all of their accounts must be terminated first
  • Account SUSPENDED
      • User NEW: invalid combination - an account can’t be SUSPENDED before the user is made ACTIVE
      • User ACTIVE: valid - a user may be ACTIVE while their account is SUSPENDED
      • User SUSPENDED: valid - a SUSPENDED user’s accounts are also SUSPENDED
      • User TERMINATED: invalid combination - if a user is terminated, all of their accounts must be terminated first
  • Account TERMINATED
      • User NEW: invalid combination - an account can’t be TERMINATED before the user is made ACTIVE
      • User ACTIVE: valid - a user may be ACTIVE while their account is TERMINATED
      • User SUSPENDED: valid - a SUSPENDED user may have an account which is TERMINATED
      • User TERMINATED: valid - a TERMINATED user’s accounts are also TERMINATED
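One way to turn such a matrix into tests is to enumerate every combination programmatically and check that valid combinations are accepted and invalid ones rejected. In the sketch below, combine() is an invented stand-in for whatever operation your real system uses to put a user and an account into the given states:

from itertools import product

USER_STATES = ["NEW", "ACTIVE", "SUSPENDED", "TERMINATED"]
ACCOUNT_STATES = ["NEW", "ACTIVE", "SUSPENDED", "TERMINATED"]

# The valid (user state, account state) pairs, transcribed from the matrix above.
VALID = {
    ("NEW", "NEW"), ("ACTIVE", "NEW"), ("SUSPENDED", "NEW"), ("TERMINATED", "NEW"),
    ("ACTIVE", "ACTIVE"),
    ("ACTIVE", "SUSPENDED"), ("SUSPENDED", "SUSPENDED"),
    ("ACTIVE", "TERMINATED"), ("SUSPENDED", "TERMINATED"), ("TERMINATED", "TERMINATED"),
}

def combine(user_state, account_state):
    """Invented stand-in: return True if the system accepts this combination."""
    return (user_state, account_state) in VALID

def test_state_matrix():
    for user_state, account_state in product(USER_STATES, ACCOUNT_STATES):
        expected = (user_state, account_state) in VALID
        assert combine(user_state, account_state) == expected, (user_state, account_state)

if __name__ == "__main__":
    test_state_matrix()
    print("all 16 combinations checked")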

Performance tests

For the time being, see my post on performance testing guidelines, which includes tips on how to write tests plans for performance testing.

Other types of test

Usability testing

Usability testing is another perspective that you have to keep in mind while testing. Its job is to verify that human users of the software can carry out the tasks for which the software was designed, with minimum effort and confusion. Usability testing is one of the few testing functions which is impossible to automate!

The software delivered to you might match the specification perfectly, but if you have to scroll around a UI endlessly or make several clicks where one would do, or a GUI component isn’t chosen well, then the software has usability issues.

Most often usability issues come to light at the specification stage. The job of finding these issues at specification time is made easier if you’ve been provided with mockups or ‘wireframes’ of how the UIs will look. Unfortunately, usability issues are often treated as low priority issues when they’re found after the implementation has been done, so it’s important to find them as early as possible.

Test design

Be clear in what you’re testing

State in your test plan whether the test or set of tests are functional, integration or any other type. Make sure you keep to that perspective while writing your tests - don’t get sucked into checking what happens in the database when writing a functional test, for example.

For every test case, state the reason for your test: this sounds inane, but sometimes it’s just not obvious why something is being tested. For example, indicate if the test is part of a series of CRUD tests. Refer to use cases in the source documents - the business requirements document, functional specification and technical specification.

As well as having a reason for being executed, every test should have just one expected outcome. You should know in advance what the result will be - even long before you get your fingers on running code. If you find yourself dithering over an expected outcome, congratulate yourself for having found a hole in the specification and alert the person responsible for maintaining the specification - you’ve just found a specification bug! Finding bugs early is one of the best prizes of Quality Assurance.

Separating tests and data

A test case specifies a set of actions which must be performed and the data which must be supplied as part of those actions. Manual tests typically specify the data in the test case.

Deciding whether to separate tests and data for automated testing purposes depends on the complexity of your functionality. If you have simple functionality (for example, Google search) then it makes sense to separate out the tests from the data: a series of simple tests can iterate over vast data sets.

If you have complex functionality (for example, complex entities within your system which go through state transitions and can perform actions on other entities in your system), separating tests and data is not recommended.

Some of the ‘data’ are entities which must be created in the system. Even if you separate the data from the tests, you would have to create these entities in the system anyway. So ultimately you have to do the same amount of work, but in two different places, and it’s difficult to keep the two coordinated.

For example, if a test needs a suspended user, you have to add code to the ‘test’ module to create a suspended user for this test. Then the other properties of the suspended user have to be listed in a ‘data’ file, which will be read by the test. This doesn’t really make sense! It is much better to create the user and set their state and other properties as and when the test requires it.
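A minimal sketch of the recommended approach: the test (here, via a pytest fixture) builds exactly the entity it needs, in the state it needs, at the point it needs it. The User class is an invented stand-in for your system’s real setup interface:

import pytest

class User:
    """Invented entity, purely for illustration."""
    def __init__(self, name):
        self.name = name
        self.state = "NEW"

    def activate(self):
        self.state = "ACTIVE"

    def suspend(self):
        self.state = "SUSPENDED"

    def can_log_in(self):
        return self.state == "ACTIVE"

@pytest.fixture
def suspended_user():
    # All of the setup lives with the test - no separate data file to keep in sync.
    user = User("alice")
    user.activate()
    user.suspend()
    return user

def test_suspended_user_cannot_log_in(suspended_user):
    assert not suspended_user.can_log_in()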

Avoiding inter-dependencies

It’s essential to ensure that you don’t use one test case to set up data or state for a following test case. This introduces interdependencies between tests and makes the tests impossible to run in isolation. It’s a difficult habit to break if you are used to manual testing, but must be avoided once you move on to building automated tests.

Grouping your tests logically

It should be clear by now that a test is an item that can be independently validated (for example: “Enter the service URL; the home page appears”).

Tests can be grouped into a test case, most often a logical series of steps. In turn, test cases can be grouped into a test scenario, which is typically a complete business transaction.

Be aware that this approach can lead you into chaining tests together, setting up interdependencies between them. As stated earlier, this is not good practice if the tests are to be automated. When automating tests, don’t just assume that you can string together a bunch of existing functional automated tests to produce a valid test for a complete business transaction. Instead, write the business transaction test as a separate test. Taking that approach means that the functional tests can be run standalone and so can the business transaction test.

Conclusion

After reading this document, I hope you’ll have a feel for the fact that testing requires you to look at a system from a variety of distances and angles.

You can get right in and close-up, verifying the behaviour of components on an individual GUI page, or you can stand back to see the system interacting with other systems around it.

You can view the same component from different directions: checking how its state changes and how it interacts with other components. The design of a component affects its usability.

No matter which angle you’re looking at a system from, it’s very important to be clear in your own mind where you are standing, and not to confuse one perspective of the system with another. When writing a test, always ask yourself what you are trying to test. Don’t try to make each test a complex, all-singing, all-dancing test which does fifteen different things. It may feel efficient when you’re running the test, but this approach makes it very hard to diagnose what’s happening when your test triggers the inevitable bug.

Acknowledgements

Thanks to Ciara for suggesting the entity state interaction matrix, to Chaminda for a very illuminating discussion on the merits (or otherwise) of separating tests and data and Sisira for feedback on logical test structure.

May 30, 2007

Performance testing guidelines

Written sometime between 2001 and 2009. The original publication date is lost. This post has moved across three blogging platforms during its life. I preserve it here as a snapshot of my thinking about testing at the time I wrote it.

Introduction

This page exists to help you design, run and interpret useful performance tests. These are high-level tips to help you avoid common pitfalls. It’s still largely a draft; feel free to contact me if you have any questions about any of this!

More than other forms of testing, performance testing is prone to the bike shed syndrome: everyone has an opinion on what colour to paint the bike shed. In fact, building a meaningful performance test is hard. The bike shed syndrome makes it harder.

Good reasons for executing performance tests

There are two main reasons for performance testing:

  • Executing the same performance test over several builds to measure the effect of deliberate performance improvements
  • Executing the same performance test regularly to verify that performance remains stable as the software evolves

Deciding what to measure

In short, pick one or two things to measure, but monitor several system statistics.

Let’s say you’ve got to build a test which will measure the time to create a number of entities (let’s call them ‘transactions’, but they could be anything) in your test system. From this you’ll be able to generate two figures: the average number of transactions created per second and the average time to create a single transaction. (See the difference?) In order for these figures to be meaningful, you’ll need to monitor the load on the machine (or machines) during the test. It’s the system statistics that give context to your primary measurements. For example, you may achieve a throughput of 20 transactions per second, but this only becomes meaningful when you also report that the database CPU spends 20% of its time waiting for the disk (iowait) and virtually none of its time in user time. In this case, you’re disk-bound at the database tier; the most likely culprit is inefficient query design or costly database write operations.
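The two figures are related but not interchangeable: with several client threads running in parallel, throughput is not simply the reciprocal of the average transaction time. A minimal sketch of computing both from recorded timings (the numbers are invented for illustration):

# Per-transaction timings recorded during a test run (seconds).
transaction_times = [0.21, 0.25, 0.19, 0.30, 0.22, 0.27, 0.24, 0.26]

# Wall-clock duration of the whole run, recorded separately - with
# concurrent clients, throughput is not 1 / average_time.
wall_clock_duration = 1.1  # seconds (invented)

average_time = sum(transaction_times) / len(transaction_times)
throughput = len(transaction_times) / wall_clock_duration

print(f"average time per transaction: {average_time:.3f} s")
print(f"throughput: {throughput:.1f} transactions per second")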

The test complexity trap

Performance tests are difficult to design well; you’ll fall into the test complexity trap if you try to make the test be all things to all people. Design your test to do one thing well. Don’t expect it to be an all-singing, all-dancing performance tool. If you do, most likely the output it produces will be worthless and you’ll have wasted a lot of time.

Another factor that increases complexity is shifting goalposts. The goalposts shift when a stakeholder asks you to measure one more parameter, or carry out the test in a slightly different way. Tell them that this is actually a different set of tests. If you change direction after a period of days or weeks, any data you’ve already gathered is rendered meaningless.

Test design

Keep it simple

As far as possible, limit confounding factors and over-complex test scenarios. These are variables that upset the accuracy of your test results.

Confounding factors include poorly tuned Java VM heap, poorly tuned connection pool configuration, database misconfiguration or other system components taking up CPU time. If your database has a single disk (or even a pair of disks in a RAID0 configuration) you can expect performance to drop off slowly if your test runs for several hours. This is caused by ever-increasing disk activity on the database server as the volume of data grows. It seems to be an artifact of Oracle housekeeping, but may apply to other DBMSs as well.

When designing your test, you need to think of the effects of each of the elements that make up the entire stack:

The software stack

  • The software under test
  • The Virtual Machine on which the application software runs
  • Third party libraries (database connectors, connection pools, message queues, protocol stacks…)
  • The operating system

The hardware stack

  • Hardware: CPU, memory, disk, network
  • Networked components such as databases and load balancers

Use the same hardware for all your tests. Guard it with your life. If halfway through your testing you’re forced to switch to different hardware, all of your previous test results are meaningless. The weeks of work you’ve spent gathering and analyzing them are wasted. Make sure everyone understands this.

It’s tempting to try to get ‘maximum performance’ from your system by kicking off several threads and hammering the system repeatedly to get as much throughput as possible; this doesn’t actually give you an accurate reflection of how your application performs because you’re forcing it up against a bottleneck somewhere.

Understanding your data

Complex test scenarios include starting with a pre-populated database. To begin with, do your performance testing on an empty database, with just enough schema and data to start your application. As the data population grows and grows, your data or database configuration will become a confounding factor. Incorrectly designed schema, missing or wrong indexes and inefficient queries can affect the performance of your application, especially as the database population grows. Poor database disk tuning can also affect performance - this may only become apparent as the data volume grows.

Be aware that Oracle will cause smoothly degrading performance in your application as the size of your database population grows. Avoid executing performance tests on a single-spindle ext2/3 Oracle database. Instead, use a database backed by an OCFS disk array. Multi-disk OCFS databases appear to be immune to this performance degradation.

Note also that apparently simple things like the automatically generated names you give to entities can adversely affect performance. Take for example an entity name, generated from a UNIX timestamp, which is indexed for fast searching. If all of the entity names are almost the same, then the index b-tree will be hugely lop-sided. This will badly affect the efficiency of the index, giving you skewed results for search performance. We’ve found it’s better to hash and then base64-encode entity names in order to generate truly unique names. No, they won’t be very human-readable, but they’ll be printable and will index like real names.
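A minimal sketch of that kind of name generation, using Python’s standard hashlib and base64 modules; the particular hash and encoding chosen here are illustrative, not a record of what we actually used:

import base64
import hashlib
import time

def generate_entity_name(seed=None):
    """Derive a printable, well-distributed name from a timestamp-like seed."""
    if seed is None:
        seed = repr(time.time())
    digest = hashlib.sha1(seed.encode("utf-8")).digest()
    # URL-safe base64 keeps the name printable; strip the '=' padding for tidiness.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

# Two almost identical seeds produce completely different names,
# so a b-tree index over the names stays well balanced.
print(generate_entity_name("1180569600.000001"))
print(generate_entity_name("1180569600.000002"))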

Choosing a tool

You have complete freedom on this one; choose what works for you and allows you to get your results in the shortest time. I favour the Grinder because of its built-in ability to gather statistics. JMeter is another tool which I’m planning to investigate.

Running the test

Here’s something you can be certain of: you’ll have to run your test multiple times. Make sure you include enough time in your project schedule for this.

  • The first few test runs will be necessary to iron out kinks in the test design.
  • The second set of runs will highlight the most obvious confounding factors.
  • The third set of runs will start to give you a meaningful performance baseline. You can start to record statistics from this point.
  • During the fourth set of runs, you’ll experiment to see what effect tweaking certain variables has on the outcome (and beware of falling into the rabbit warren on this one!)
  • Finally, as you test new builds, you’ll be able to show what effects attempts to improve performance are having.

Fixing performance issues is hard, so expect several iterations of new builds, test runs and slow, incremental improvements. Schedule accordingly. Expect to find and remove a series of bottlenecks.

Thread tuning

Tuning incoming request handling threads and database connection pool size is a bit of a black art. I’ll attempt to clarify the topic here. There are some specifics in this post on tuning Tomcat incoming connection request threads.

Working in conjunction with developers, I use rules of thumb derived from testing experience to converge on what I consider a sensible default configuration for incoming thread connection tuning and for database connection pool tuning. Performance testing is an integral part of this process. These settings are subsequently further tuned in production.

Note that even if you bump up the number of threads handling incoming connections, you’re still constrained by the number of database connection pool threads. If all of the database connection pool threads are busy, then an incoming request will be accepted by an incoming request thread, but that thread may have to wait for a database connection to come free. Annoyingly, there isn’t a one-to-one mapping between incoming request threads and database connection pool threads.

Diminishing returns

Adding more client threads (or even server threads) beyond a certain point will not improve performance - in fact it’s likely to negatively impact performance. The graph below illustrates this.

The graph plots results for a series of test runs. Two parameters are measured for each test run: response time (average transaction time) and throughput (average transactions per second). The first test run uses just one client thread; client threads are progressively added in each subsequent run, finishing with nine client threads in the last test run.

When the data is graphed, the data points in each set are joined by a line to highlight the differences between test runs. The scales for response time and throughput are both on the left.

As the number of clients increases from 1 to 5, throughput (transactions per second) increases steadily, and response time (transaction time) is barely affected. However, as the number of client threads increases beyond six, response time gets longer and longer, and the number of transactions processed per second barely increases. The system is saturated.

Although this data is faked up to show how a typical system will behave, let’s imagine for a moment that this graph shows the true behaviour of a real system. Let’s say, having never performance tested this system before, that you put together a test that triggers eight client threads. Your result will show that you can push through 70 transactions per second, but at a cost of 40 seconds per transaction. This isn’t the optimal load for the server. The optimal load, which is the best trade-off between the highest throughput and the lowest transaction time, is actually a five-client load.

Monitoring the system

After much experimentation, I’ve found that vmstat provides the most useful information. It’s a one-stop-shop for recording CPU and virtual memory behaviour before, during and after your test run.
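For example, a small wrapper that records vmstat output for the duration of a run might look like this sketch (assuming a Unix-like host with vmstat on the path; the five-second sample interval is just an example):

import subprocess

def record_vmstat(output_path, interval_seconds=5):
    """Start vmstat sampling every interval_seconds, writing to output_path.
    Returns the process handle; call .terminate() when the test run ends."""
    outfile = open(output_path, "w")
    # 'vmstat 5' prints a line of CPU, memory and I/O statistics every 5 seconds.
    return subprocess.Popen(["vmstat", str(interval_seconds)], stdout=outfile)

if __name__ == "__main__":
    monitor = record_vmstat("vmstat.log")
    try:
        pass  # ... kick off the performance test run here ...
    finally:
        monitor.terminate()  # stop sampling once the run is complete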

For diagnosing heap problems, visualgc is essential.

You may like to use top in batch mode to monitor CPU and memory usage. I’ve found that nine times out of ten, unexpected test results are down to resource contention - specifically, contention for either CPU or disk on the database box. You can quickly discover what’s causing the problem by using top, first on your application-tier hosts and then on the database-tier hosts:

top - 12:16:37 up 116 days, 18:03,  2 users,  load average: 14.12, 4.73, 1.67
Tasks: 319 total,  15 running, 304 sleeping,   0 stopped,   0 zombie
Cpu0  : 79.9% us, 19.4% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.6% hi,  0.0% si
Cpu1  : 75.5% us, 24.2% sy,  0.0% ni,  0.0% id,  0.3% wa,  0.0% hi,  0.0% si


Press ‘1’ on your keyboard to see the load on individual CPUs.

The first line shows the load average over the last one, five and fifteen minutes. As a rule of thumb, a fully loaded machine has a load average roughly equal to its number of CPUs; anything above that and your host is struggling.

In the example above, the load average over the last minute is 14.12. That’s off the scale. Now look at the two CPUs: the most important metrics are us (user time), sy (system time), id (idle time) and wa (wait time). First of all, the CPUs are spending zero time in the idle state. The rest of the time is spent in user time (which is the time the CPU spends running user processes) and system time (which is the time the CPU spends in operating system calls; running the task scheduler, memory management, managing I/O and so on). The other useful CPU metric is wait time, which is the time spent waiting for I/O devices - literally waiting for the disk to spin around to read or write the required block.

If the system wait time is high (5% or more), then the machine is disk-bound; if the idle time is low or zero but user and system time are high, then the machine is CPU-bound.
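That rule of thumb is easy to encode; in this sketch the 5% wait-time threshold is the one given above, and the arguments mirror top’s us/sy/id/wa columns:

def classify_cpu_sample(us, sy, id_, wa):
    """Rough classification of one CPU sample (percentages from top or vmstat)."""
    if wa >= 5.0:
        return "disk-bound (high iowait)"
    if id_ <= 1.0 and (us + sy) >= 95.0:
        return "CPU-bound (little or no idle time)"
    return "not obviously saturated"

# The two CPUs from the example top output above: both are CPU-bound.
print(classify_cpu_sample(us=79.9, sy=19.4, id_=0.0, wa=0.6))
print(classify_cpu_sample(us=75.5, sy=24.2, id_=0.0, wa=0.3))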

Interpreting performance test results

“It’s almost never the network.” If it is, your network is misconfigured and your test run is invalidated.

Are my results even valid?

Well, on this particular hardware, yes, they probably are. Attempting to extrapolate your results to other (perhaps similar) hardware will quickly bring you into the realms of speculation and wishful thinking, no matter how rigorous you think you’re being with your calculations. If your software crawls along on a single disk spindle, it’s pure fantasy to expect it to run twelve times faster on twelve spindles. You can’t even say it’ll run twice as fast. You just don’t know. If you want to know, get a disk array with twelve spindles and try it.

If any part of your hardware stack has changed, all of your previous results are invalid.

If any CPU in your hardware stack is spending more than 20% of its time in the iowait state, then that component is disk-bound and you’re not measuring true throughput.

Tips on writing up your results

Most often, you’ll be writing bug reports. Sometimes you’ll be expected to write up a formal document for internal consumption, or, rarer yet, external consumption. Scott Barber has written an excellent presentation on the correct way to present your results.

Report the CPU, memory and disk details of the hardware you’re using. Note any changes to default configurations that you’ve made on the software under test, the Java VM, the operating system, the database server or any other component.

Further reading

May 30, 2006
