The major challenge facing performance testing projects is that they’ve become a “tick in a box” task for most IT projects. They also tend to exceed project budget and effort and to overrun projected timelines. The main reason for this is that performance assurance is not considered worth the cost: most of the time, it doesn’t achieve its goals or deliver the intended outcome. Secondly, when IT project budgets are optimized, performance testing isn’t considered a priority and, thus, falls by the wayside. Lastly, there’s a perception that the new-age infrastructure, platforms, and solutions of the Digital Revolution make applications performance-proof and render performance assurance obsolete.
The truth is that performance assurance is a make-or-break criterion for any successful IT application. Even with access to new-age technology and the necessary infrastructure, performance issues can affect every IT application. Your business needs adequate performance assurance to be confident about the output of any application.
Frequent advancements in technology, readily available and scalable infrastructure, and better platforms for IT applications have led to the neglect of performance assurance. The belief that better support and advanced software solutions guarantee an application’s performance leads to performance negligence. Thus, performance assurance projects have been brushed to the side and deemed not worth the budget, effort, or time.
However, this notion was challenged by a Prime incident in the middle of last year, which made businesses reconsider performance assurance as one of the keys to a successful IT application. The real secret lies in how the whole performance assurance process is shaped and used to unlock the door to IT application success.
In spite of having access to the best possible infrastructure, platforms, and solutions, Prime still faced an outage on its planned “Biggest Day.” Although it recovered and made significant profits, the outage left people wondering about the thousands of big companies that rely on its services. We’re also left wondering about other XaaS providers that run their own IT applications, and about the stability and performance of moving onto the cloud, digital platforms, and other solutions.
Performance assurance activity has to justify its worth. It needs to provide accurate, quality results that can benefit IT projects by predicting performance correctly, finding bottlenecks, and supporting capacity planning for production. The significant challenges a performance assurance project faces in proving its effectiveness and relevance include:
- Performance testing environment size in comparison to the production environment
- Test design for accurate representation and emulation of production load
- Commercial tool cost for testing and monitoring
- Quality of results
- Precise monitoring and analysis for the tests
- Extrapolation of results
- Time to market
All of the above are influenced by how vital the performance assurance activity is considered to be and, accordingly, by whether it is deemed worth the budget, timeline, and effort.
Performance assurance activity is only as good as its ability to emulate real-life conditions and test the application with real user loads. The size of the testing environment (in comparison to production) and the extrapolation of results also affect the quality and relevance of performance assurance solutions. This is based on the following factors:
1) Test Design to Emulate Real User Behavior
A performance test can achieve its objective only if it’s capable of imitating real user behavior. The biggest challenge in recreating real user behavior is the lack of clarity about actual application usage. The test design depends on how well the performance testing requirements have been gathered and understood, and also on the capability of the performance testing tool to model a realistic situation.
Performance Testing Scenarios:
Identifying the business scenarios to be simulated is very important for the success of the performance assurance process. Though the load on any IT application comes from all the business scenarios defined for the application, it’s difficult to replicate every one of them due to time and budget constraints. The main challenge is to identify the scenarios that will affect application performance more than the others. The selection of scenarios can be made using the 80:20 principle (i.e., 80% of performance problems occur due to 20% of scenarios). You should also consider negative and abandoned scenarios.
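The 80:20 selection can be sketched in a few lines. This is only an illustration under stated assumptions: it uses hourly transaction volume as a stand-in for "impact on performance," and the scenario names and volumes are hypothetical. Negative and abandoned scenarios would still need to be added by hand.

```python
def select_scenarios(hourly_volumes, coverage=0.8):
    """Pick the highest-volume scenarios until they cover `coverage` of total load.

    Assumption: volume is used as a proxy for performance impact.
    """
    total = sum(hourly_volumes.values())
    selected, covered = [], 0.0
    ranked = sorted(hourly_volumes.items(), key=lambda kv: kv[1], reverse=True)
    for name, volume in ranked:
        if covered / total >= coverage:
            break  # enough of the load is already represented
        selected.append(name)
        covered += volume
    return selected


# Hypothetical usage figures (transactions per hour).
volumes = {
    "search": 5000,
    "browse": 3000,
    "checkout": 1200,
    "login": 500,
    "profile": 200,
    "returns": 100,
}
print(select_scenarios(volumes))
```

In practice, a low-volume scenario that is known to be heavy (a large report, for instance) may deserve a place in the test even if it falls outside the volume cut-off.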
Ramp up and Ramp down of Virtual Users:
While executing any performance test, it’s advisable not to start all virtual users at the same time or to start them randomly. Doing so will not generate a realistic load and may cause an artificial bottleneck. The ramp-up of users should build toward peak load in a way that resembles the real world as closely as possible. Follow the same principle for ramping down (i.e., moving the users out of the application).
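A gradual ramp can be expressed as a simple schedule of active users over time. The sketch below assumes a linear ramp-up, a steady state, and a linear ramp-down; the function name, parameters, and figures are hypothetical, and a real load tool would have its own way of configuring this.

```python
def ramp_schedule(target_users, ramp_up_s, steady_s, ramp_down_s, step_s=60):
    """Return (elapsed_seconds, active_users) points for a linear ramp profile."""
    total = ramp_up_s + steady_s + ramp_down_s
    points = []
    for t in range(0, total + 1, step_s):
        if t < ramp_up_s:
            # Ramp-up: add users in proportion to elapsed time.
            users = round(target_users * t / ramp_up_s)
        elif t <= ramp_up_s + steady_s:
            # Steady state: hold the target load.
            users = target_users
        else:
            # Ramp-down: remove users in proportion to remaining time.
            users = round(target_users * (total - t) / ramp_down_s)
        points.append((t, users))
    return points


# 100 users: 5-minute ramp-up, 10-minute steady state, 5-minute ramp-down.
for elapsed, users in ramp_schedule(100, 300, 600, 300, step_s=100):
    print(elapsed, users)
```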
Steady Load Duration:
The steady load duration should be decided based on the application’s peak usage and how long that peak lasts, which depends on the domain and usage of the application. During this period, each virtual user executes a scenario and then starts over again, repeating it for the whole duration. This helps reach the targeted TPS (Transactions per Second) requirement.
Pacing:
Pacing is the time between two iterations of execution for a virtual user. It is set to achieve the required TPS and emulate real-life conditions.
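The link between the time between iterations (often called pacing), the number of virtual users, and the target TPS can be worked out with simple arithmetic. The sketch below assumes pacing is measured from the start of one iteration to the start of the next, and that each iteration performs a fixed number of transactions; both function names are hypothetical.

```python
import math


def required_pacing(virtual_users, target_tps, tx_per_iteration=1):
    """Seconds between iteration starts so that all users together hit target_tps."""
    return virtual_users * tx_per_iteration / target_tps


def required_users(target_tps, pacing_s, tx_per_iteration=1):
    """Virtual users needed to reach target_tps at a given pacing."""
    return math.ceil(target_tps * pacing_s / tx_per_iteration)


# 200 users, 5 transactions per iteration, 50 TPS target -> 20 s pacing.
print(required_pacing(200, 50, tx_per_iteration=5))
print(required_users(50, 20, tx_per_iteration=5))
```

The computed pacing only holds if one iteration (response times plus think times) actually finishes within it; otherwise the achieved TPS will fall short of the target.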
Think Time:
Real users spend different amounts of time on different pages. “Think time” is the time that a real user takes to read the page, look at the images, read the text, and so on. Real users can be categorized based on their familiarity with the application and their way of using it. Some users are new, some are experienced, some want to finish their tasks quickly and exit the application, and some stay logged in. Think time should therefore be emulated differently for new versus experienced users.
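One way to vary think time by user type is to draw it at random from a range per profile. This is a minimal sketch: the two profiles and their ranges are hypothetical placeholders, and real values should come from analytics or production logs.

```python
import random

# Hypothetical think-time ranges in seconds per user profile.
THINK_TIME_RANGES_S = {
    "new": (8.0, 20.0),         # new users read and explore more
    "experienced": (2.0, 6.0),  # experienced users move quickly
}


def sample_think_time(user_type, rng=random):
    """Draw a think time (seconds) uniformly from the profile's range."""
    low, high = THINK_TIME_RANGES_S[user_type]
    return rng.uniform(low, high)


rng = random.Random(42)  # seeded so a test run is repeatable
for user_type in ("new", "experienced"):
    print(user_type, round(sample_think_time(user_type, rng), 1))
```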
Real users generate load on an application from different browsers. Simulate this by mixing different browsers and their various versions in the test.
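A browser mix can be turned into virtual-user counts with a proportional split. The sketch below assumes the mix is given in percentages that sum to 100; the browser shares shown are hypothetical and should be replaced with figures from your own analytics.

```python
def split_users(total_users, mix_percent):
    """Split total_users across browsers in proportion to mix_percent.

    Uses the largest-remainder method so the counts always sum to total_users.
    Assumption: the percentages in mix_percent sum to 100.
    """
    exact = {name: total_users * pct / 100 for name, pct in mix_percent.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = total_users - sum(counts.values())
    # Give the remaining users to the browsers with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical browser shares.
mix = {"Chrome": 65, "Safari": 20, "Edge": 10, "Firefox": 5}
print(split_users(250, mix))
```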
To emulate the users’ internet connection, specify different network speeds for virtual users based on different locations, geographies, and devices. Network emulation helps us identify server behavior during network congestion or packet loss. Even in the case of cloud load generators, there is very little control over bandwidth, packet losses, etc.
There should be a mechanism for IP spoofing to mimic loads coming from different geographies. The load should not be restricted to a specific IP range, because that could skew the load balancing among servers.
User Cache Behavior:
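The user’s cache behavior should also be simulated, for example by mixing first-time visitors who arrive with an empty browser cache and returning visitors who already have resources cached.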
Sufficient and Realistic Test Data:
Data should be parameterized to depict real user behavior. In addition, there should be sufficient data available in the databases (an empty database adds no value to the testing process).
It’s essential to include the actual background and batch processes as a part of the test.
Tests should consider load burst situations. Though mostly seasonal, a burst means handling higher-than-expected demand, and it can affect revenue within a short period.
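A burst can be modeled as a baseline load with a short spike on top of it. The sketch below builds a per-minute target of active users; the function name, the multiplier, and the timings are hypothetical and only illustrate the shape of such a profile.

```python
def burst_profile(baseline_users, burst_multiplier, total_min,
                  burst_start_min, burst_len_min):
    """Return the target number of active users for each minute of the test."""
    burst_end_min = burst_start_min + burst_len_min
    profile = []
    for minute in range(total_min):
        if burst_start_min <= minute < burst_end_min:
            # During the burst, load jumps to a multiple of the baseline.
            profile.append(round(baseline_users * burst_multiplier))
        else:
            profile.append(baseline_users)
    return profile


# 200 baseline users with a 3x burst for 2 minutes, starting at minute 4.
print(burst_profile(200, 3, total_min=10, burst_start_min=4, burst_len_min=2))
```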
2) Performance Testing Environment Size and Extrapolation of Results
Another major constraint of performance testing is that load tests are run on scaled-down versions of the production environment with much lower user loads. Using the production environment for load tests carries many risks and limitations, as it could affect real users and fill the database with junk data. So load tests are usually not conducted on production.
The performance testing environment is sized based on the budget available, and most of the time it’s the smallest scaled-down version possible. The real challenge lies in predicting behavior in production based on tests conducted on the scaled-down environment. The general belief is that it’s a matter of simple multiplication, but simple multiplication, or any linear approach, can overestimate or underestimate performance.
The components of IT infrastructure behave differently depending on their configuration, so the real solution is extrapolation. Extrapolating the results for an IT application is complex and time-consuming, given the many components, their corresponding variables, and their relationships with each other, and it is still not a foolproof solution.
However, the process can start with something simple that leads to an approximate prediction of performance in production. A rough approximation can be made by comparing the performance of a single user across environments of different scale (such as production, SIT, UAT, and PT). Keeping that single-user difference in mind, and combining it with the results of load tests run at different user loads in the scaled-down environment, can help us make predictions.
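The single-user comparison can be sketched as a simple ratio applied to load-test results. This is a deliberately rough first approximation, not a validated extrapolation model: it only adjusts for the per-request speed difference between environments and ignores non-linear effects such as contention and capacity limits. The function name and all figures are hypothetical.

```python
def extrapolate_response_times(pt_results_s, pt_single_user_s, prod_single_user_s):
    """Scale PT load-test response times by the single-user ratio between environments.

    pt_results_s maps user load -> average response time (seconds) measured in PT.
    """
    ratio = prod_single_user_s / pt_single_user_s
    return {users: round(rt * ratio, 3) for users, rt in pt_results_s.items()}


# Hypothetical measurements: a single user takes 0.8 s in PT and 0.4 s in production.
pt_results = {50: 1.2, 100: 1.8, 200: 3.0}
print(extrapolate_response_times(pt_results, pt_single_user_s=0.8, prod_single_user_s=0.4))
```

Treat the output as a starting estimate to be checked against monitoring data, since the text above notes that linear approaches can overestimate or underestimate performance.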
To conclude, there should be a cost-benefit analysis of performance assurance. Based on that, if there is a need for further performance assurance, it should be planned effectively to make the project successful and worthwhile. The solution lies in identifying performance requirements and goals, simulating real user behavior during load testing, correctly sizing the performance testing environment, and monitoring and analyzing the tests properly. Once that is done, testers can extrapolate the results of a test conducted on a (mostly scaled-down) performance test environment to predict performance behavior in production.
The four main aspects that impact the effectiveness of any performance assurance solution are as follows:
- Performance test environment sizing
- Test design to emulate real user behavior
- Accurate monitoring and analysis
- Extrapolation of test results to predict performance in production
Until testers correctly factor in all of these, performance assurance doesn’t have much value.