Why do online testing, and what kinds of testing tools can you use?

The digital environment gives businesses a convenient way to evaluate ideas quickly with controlled experiments. The aim is to apply the scientific method: we randomly assign users to different variations (or combinations of variations) and observe the effect of the changes on their behavior or opinion.

Why do online testing?

Unlike other methodologies, such as post-hoc analysis or analysis of time series (quasi-experimentation), controlled experiments evaluate cause-effect relationships.

The simplest experiment randomly assigns the users who visit a page to one of 2 variants: (A) the control, with the original version, and (B) the alternative variation, the new version that is evaluated against the control.

The evaluation uses performance metrics that are directly or indirectly related to data on user behavior and opinion. Statistical tests estimate how likely it is to observe a difference at least as large as the one measured between the alternative variation and the original if the change really had no effect. Secondary metrics and user segmentation within each group can help us understand the results, refine ideas, and explore new ones.
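As a rough illustration of such a statistical test, the following Python sketch applies a two-proportion z-test to conversion counts. The function name and the counts are made up for the example, and it assumes samples large enough for the normal approximation to hold; it is not tied to any particular testing tool.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 10,000 users per variant.
z, p_value = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value means that a difference this large would be unlikely if the two variants really performed the same.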

Web teams that take into account the behavior and opinion of users are more likely to conduct successful experiments. With a data-driven approach we can report on the objective results of our experiments and go beyond the opinion of the expert or the Highest Paid Person’s Opinion (HiPPO).

When a company already has a testing tool, the cost of doing one more test is very low. In this sense, putting a testing tool at your disposal will enhance the innovation capabilities of your company and optimize business results.

A losing test is more interesting than an inconclusive test. A variation that turns out to be harmful still gives you results to analyze, so you can refine the idea and make it work.

“The best way to have a good idea, is to have many ideas.”

Important concepts in testing

Whatever type of testing tool we use, we need to be clear about the following concepts when we run tests:

Main objective: It is the main metric or key performance indicator against which the achievement of the objectives of the experiment will be evaluated. It should not be based on clicks but on a result or an “outcome”.
Variable: It is the factor that is believed to influence the outcome of the main objective.
Variations: They are the different versions of the variable, including the original version.
Experimental unit: Refers to the entity for which the metrics are calculated in each of the variants. On the web, the experimental unit is usually a user, identified by a userId stored in a browser cookie. It is important that the user has a consistent experience throughout the experiment, according to the variation assigned.
Null hypothesis (H0): The assumption that the different variants do not produce differences in the main objective beyond random differences.
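Confidence level: The probability of not rejecting the null hypothesis when it is in fact true. Its complement, the significance level, is the probability of rejecting the null hypothesis when it is in fact true.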
Statistical power: The probability of detecting a difference when one really exists.
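A/A test: An exercise to validate the method of random assignment by splitting users between two variations that are both identical to the original version. Significant differences should appear only about as often as the chosen significance level allows.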
Standard deviation (Std-Dev): It is a measure that indicates how scattered the data are with respect to the average.
Standard error (Std-Err): It is the standard deviation of an estimate, such as a mean, calculated from a sample. For a mean, it is the sample standard deviation divided by the square root of the sample size.
Effect: It is the difference in the main objective between the variations. It is useful to give confidence intervals for the differences between the means of the variations, because they give an idea of the plausible range of the effect.
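To make the last three concepts more concrete, here is a minimal Python sketch that computes the effect as a difference in means, together with an approximate 95% confidence interval built from the standard errors. The function name and the sample values are hypothetical, and the fixed z value of 1.96 assumes reasonably large samples; with small samples a t-distribution would be more appropriate.

```python
import math
import statistics


def mean_difference_ci(control, variation, z=1.96):
    """Return (effect, low, high): difference in means with an approximate 95% CI."""
    effect = statistics.mean(variation) - statistics.mean(control)
    # Standard error of each mean: sample standard deviation / sqrt(sample size).
    se_control = statistics.stdev(control) / math.sqrt(len(control))
    se_variation = statistics.stdev(variation) / math.sqrt(len(variation))
    # Standard error of the difference between two independent means.
    se_diff = math.sqrt(se_control ** 2 + se_variation ** 2)
    return effect, effect - z * se_diff, effect + z * se_diff


# Hypothetical order values per user in each group.
control = [10, 12, 11, 13, 9, 10, 12, 11]
variation = [12, 14, 13, 15, 11, 12, 14, 13]
effect, low, high = mean_difference_ci(control, variation)
print(f"effect = {effect:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

If the interval excludes zero, the data are hard to reconcile with the null hypothesis at the corresponding confidence level.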

Types of testing tools

The testing tools can be classified according to their method of random assignment.

Every testing tool must run code that randomly assigns users to one variation or another. It must then be able to change whatever the variation requires, from the content to the behavior of the platform’s modules.
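One common way to implement this kind of assignment is to hash the user identifier, so that the split is effectively random across users but stable for each user. The sketch below is an assumption for illustration, not the method of any specific tool; the function name, the identifiers, and the variation names are hypothetical.

```python
import hashlib


def assign_variation(user_id, experiment_id, variations=("control", "variation_b")):
    """Deterministically map a user to one variation of an experiment."""
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Interpret the hash as a large integer and pick a bucket from it.
    bucket = int(digest, 16) % len(variations)
    return variations[bucket]


# The same user always gets the same variation for a given experiment.
print(assign_variation("user-42", "homepage_headline"))
print(assign_variation("user-42", "homepage_headline"))
```

Including the experiment identifier in the hash keeps assignments in different experiments independent of each other.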

Each method has its advantages and disadvantages.

“Client-side” method

The “Client-side” method is the most common method in testing tools.

The developer adds a JavaScript tag to the website.

When the user visits the page, the user’s browser makes a request to the servers of the testing tool, which assign the user to a variation if they have not been assigned already and return the JavaScript code of the assigned variation.

The JavaScript code of the variation is then executed and makes the changes while the page loads.
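The request handled by the testing tool’s servers can be pictured roughly as follows. This Python sketch is only an assumption about how such a service might behave: the cookie name, the variation names, and the JavaScript snippet are all hypothetical.

```python
import random

# Hypothetical experiment definition: variation name -> JavaScript sent to the browser.
VARIATION_SCRIPTS = {
    "control": "",
    "variation_b": "document.querySelector('h1').textContent = 'New headline';",
}
COOKIE_NAME = "exp_homepage_headline"  # assumed cookie name


def handle_tag_request(cookies):
    """Return (variation, script, cookies_to_set) for one request from the tag."""
    variation = cookies.get(COOKIE_NAME)
    cookies_to_set = {}
    if variation not in VARIATION_SCRIPTS:
        # First visit (or stale cookie): assign at random and remember the choice.
        variation = random.choice(list(VARIATION_SCRIPTS))
        cookies_to_set[COOKIE_NAME] = variation
    return variation, VARIATION_SCRIPTS[variation], cookies_to_set


# First request: no cookie yet, so the user gets assigned.
variation, script, new_cookies = handle_tag_request({})
# Later requests carry the cookie and keep the same variation.
same_variation, _, _ = handle_tag_request(new_cookies)
print(variation, same_variation)
```

Storing the assignment in a cookie is what gives the user a consistent experience across page views.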

It is an intrusive method, but it is easy to implement.

As a disadvantage, the user is assigned while the page itself is loading, which may delay the page load.

It does not make it easy to test sites with complex dynamic implementations.

The end user can easily notice that they are part of a test run with the testing tool.

The “Client-side” method is recommended for design experiments and static content.

Examples: VWO, Convert, Google Optimize, Optimizely X, Oracle Maxymiser, Monetate, etc.

“Server-side” method

The “Server-side” method refers to the inclusion of code in the application itself, so that it can produce different experiences for users assigned to one variation or another.

The code will use an API to determine the user’s assignment at those points where the application should work differently for one variation or another.
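A minimal sketch of what such a branching point might look like in application code is shown below. `ExperimentClient` is a hypothetical stand-in for a testing tool’s server-side API, not a real library, and the experiment and variation names are invented for the example.

```python
import hashlib


class ExperimentClient:
    """Hypothetical stand-in for a testing tool's server-side API."""

    def __init__(self, experiments):
        # experiments: experiment name -> list of variation names
        self.experiments = experiments

    def variation_for(self, user_id, experiment):
        """Return a stable variation for this user in this experiment."""
        variations = self.experiments[experiment]
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
        return variations[int(digest, 16) % len(variations)]


client = ExperimentClient({"checkout_flow": ["control", "one_page"]})


def checkout_steps(user_id):
    """Application code that behaves differently depending on the assignment."""
    if client.variation_for(user_id, "checkout_flow") == "one_page":
        return ["cart_and_payment"]
    return ["cart", "shipping", "payment"]


print(checkout_steps("user-1"))
```

The branch lives inside the application, which is why each test needs a code change and why the test code has to be removed once a winner is rolled out.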

It is an intrusive method, and may require significant changes in the application.

As an advantage, anything can be tested at the exact point where it is needed, and the test is transparent to the user.

Initial implementation requires development effort and will be complex if the application is complex.

Each test requires code changes by the development team, which introduces a risk of affecting parts of the application not related to the test itself.

Rolling out the final changes also requires development work to remove the test code.

The “Server-side” method is recommended to integrate the experimentation in your “content management system” using a data model for it.

Once a first test is done, the cost of doing the next one is reduced to a minimum.

A common experiment is to test different blocks of content in different locations on the same page.

Examples: VWO, AB Tasty, Optimizely X, Adobe Target, Google Content Experiments, etc.

“Rewrite page” method (Proxy)

The “page rewrite” method incorporates a proxy that modifies the HTML before returning it to the user.

The user sends the request to the proxy; the proxy requests the page from the server, modifies the response to include the necessary change, and sends it back to the user.
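The rewriting step can be sketched as a set of rules applied to the HTML that comes back from the origin server. The rules, variation names, and markup below are hypothetical, and a regular expression is used only to keep the example short; real proxies may rewrite pages in other ways.

```python
import re

# Hypothetical rewrite rules per variation: (regex pattern, replacement).
REWRITE_RULES = {
    "control": [],
    "variation_b": [
        (re.compile(r"<h1>.*?</h1>", re.DOTALL), "<h1>Free shipping on all orders</h1>"),
    ],
}


def rewrite_response(html, variation):
    """Apply the variation's rewrite rules to the HTML returned by the origin server."""
    for pattern, replacement in REWRITE_RULES[variation]:
        html = pattern.sub(replacement, html)
    return html


origin_html = "<html><body><h1>Welcome</h1><p>Shop now</p></body></html>"
print(rewrite_response(origin_html, "variation_b"))
```

Because the rules operate on the generated HTML rather than on the application code, a small change in the page markup can silently break a rule.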

It is a non-intrusive method since you do not need to make changes in the application itself.

As a disadvantage, the load time of the website is affected, since the proxy must forward each request to the server and then modify the HTML on the way back.

The proxy server should be able to handle all website traffic, which will require one or more servers with sufficient capacity.

It usually carries a higher risk of technical errors, since instead of writing the code directly you write rules for rewriting the HTML.

It does not make it easy to test the functionality of modules, since the modification is made to HTML that has already been generated.

The “page rewrite” method is recommended to minimize the effort of the development team in tests of design or content changes, but it would not be the most appropriate method to test functionalities, major changes or migrations.

Example: SiteSpect, etc.

“Split test” method

The “split test” method consists of distributing the traffic between the variations of the experiment, each of which is hosted on a different server.

The website will use a proxy to direct each user to the server assigned to them.
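The routing decision made by such a proxy could look roughly like the following sketch, which sends each user to the same server on every request while respecting a traffic share per variation. The server URLs, the weights, and the function name are assumptions made for the example.

```python
import hashlib

# Hypothetical upstream servers, each hosting one variation, with traffic shares.
UPSTREAMS = [
    ("https://control.example.com", 0.9),
    ("https://pilot.example.com", 0.1),
]


def pick_upstream(user_id, upstreams=UPSTREAMS):
    """Route a user to the same upstream on every request, respecting the weights."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a point in [0, 1).
    point = int(digest[:8], 16) / 0x100000000
    cumulative = 0.0
    for url, weight in upstreams:
        cumulative += weight
        if point < cumulative:
            return url
    return upstreams[-1][0]  # guard against floating-point rounding


print(pick_upstream("user-42"))
```

Uneven weights like these are typical for a pilot, where only a small share of the traffic is exposed to the new version.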

It is a non-intrusive method since you do not need to make changes in the application itself.

As a disadvantage, for each variation you need to replicate the application on a different server and it may not make sense in situations with many very small variations.

The technical infrastructure is usually expensive and the differences in performance can also be the cause of the difference in results.

The “Split test” method is recommended for testing pilot projects, migrations, or very significant changes to the website.

Examples: Google Cloud Engine, nginx, etc.

Conclusions

What kind of testing tool is the most appropriate for you? What requirements should your testing tool meet? Want to talk with us? We can help you design, develop, and implement the tool that you need.