A/B Testing with CloudBees Rollout and Google Analytics (Part I)

Feature flags are a key tool for continuous delivery. Software developers utilize feature flagging to iterate quickly and release safely, ultimately improving product development. This continuous improvement is a beneficial byproduct of continuous delivery, made possible through repeating scientific methods. Experimentation like A/B testing is a popular way to test variations of an application. Feature flag management solutions have further simplified creating multiple versions (by varying one element) and then routing the versions to end user groups.

But how do these teams know which variation is performing better? As Sherlock Holmes once said, “It is a capital mistake to theorize before one has data.” Sir Arthur Conan Doyle was ahead of his time in warning against hasty conclusions, and the warning applies doubly to modern data analytics. Incorrect assumptions can erode a company’s understanding of its customers, and key decisions based on misrepresented data can carry serious strategic and economic consequences.

For a simple A/B test, a team would want to monitor statistical significance (to know when the experiment is done) and perform a chi-squared analysis. They would also look at each variation’s specific metrics, such as website traffic, page load time, conversion rate, bounce rate or number of clicks. As teams scale and experimentation grows, it becomes imperative to ensure the metrics and data are tailored to the process being measured.
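To make the chi-squared step concrete, the test statistic for a two-variant conversion experiment can be computed from a 2x2 contingency table. The helper below is a hypothetical illustration, not part of any SDK discussed in this post:

```javascript
// Chi-squared statistic for a two-variant conversion test.
// Inputs: conversions and total visitors for each variant.
function chiSquared(convA, totalA, convB, totalB) {
  const table = [
    [convA, totalA - convA],  // variant A: converted, not converted
    [convB, totalB - convB]   // variant B: converted, not converted
  ]
  const rowTotals = table.map(r => r[0] + r[1])
  const colTotals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
  const grand = totalA + totalB
  let chi2 = 0
  for (let i = 0; i < 2; i++) {
    for (let j = 0; j < 2; j++) {
      // Expected count under the null hypothesis of no difference
      const expected = rowTotals[i] * colTotals[j] / grand
      chi2 += (table[i][j] - expected) ** 2 / expected
    }
  }
  return chi2
}
```

With one degree of freedom, a statistic above roughly 3.84 indicates significance at the 95 percent level.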

Dedicated analytics platforms like Google Analytics make it easy to wire in the appropriate metrics and monitor the results. More importantly, these tools provide open libraries for communicating with and pushing data to their systems. So in addition to the out-of-the-box metrics, a company can define and collect data that is relevant to the needs of a given test.

In this blog post, we will set up a simple A/B test, integrating two platforms that handle the setup and data analysis for the experiment. Along the way, we’ll outline how to:

  • Use the CloudBees Rollout SDK to set up variations in an A/B test
  • Route the user traffic through the CloudBees Rollout Dashboard
  • Integrate with Google Analytics and push data for further analysis

Test Overview

Our example will focus on our sample Hacker News application built in Vue. As you may see (or may find hard to see), the “Log In” button in the right-hand corner of the header lacks contrast, which may be contributing to a low user engagement rate.

A/B testing can help confirm (or reject) this hypothesis. In our A/B test, we’ll vary the color of this call-to-action button between our two test options: “button is-black” and “button is-primary.” The first thing we’ll need to do is install the Rollout SDK using the JavaScript installation. After cloning the repository locally, run the following at the root of the directory:

$ yarn install
$ yarn add rox-browser

The flag.js file (src/utils/flag.js) initializes the feature flags and defines their default behaviors. It also establishes a connection to the Rollout dashboard via the VUE_APP_ROLLOUT_KEY (which must be set in the ‘.env.local.sample’ file), permitting remote configuration. Within the Flags constant, loginButtonColor is defined with its two variant values, with ‘button is-black’ as the default color. An outline of the flag file is shown below:

import Rox from 'rox-browser'

export const Flags = {
  loginButtonColor: new Rox.Variant('button is-black', ['button is-black', 'button is-primary'])
}

export const configurationFetchedHandler = fetcherResults => {
  if (fetcherResults.hasChanges && fetcherResults.fetcherStatus === 'APPLIED_FROM_NETWORK') {
    window.location.reload(false)
  }
}

const options = {
  configurationFetchedHandler: configurationFetchedHandler
}

Rox.register('default', Flags)
Rox.setup(process.env.VUE_APP_ROLLOUT_KEY, options)

Creating a Split Experiment in Rollout

Now that we have the variations of our application set up for A/B testing, we can rely on a Rollout split experiment to route traffic to our two user groups. Within the dashboard, we can create a new experiment and select the loginButtonColor flag. By setting the split percentages to weight 50 percent of the traffic towards our “button is-black” control variable, and the other 50 percent to the “button is-primary” option, we ensure the entire user base is split into two equally sized groups. (For more information on what makes this possible, I recommend reading the article on Rollout’s Stickiness Property.) We can now “Update Audience” with our finished experiment. With the setup for our test complete, it’s time to push the data to an analytics platform.

Gating the Feature Behind a Feature Flag

To actually use the flag, we have to gate the component (i.e. the color of the log-in button) and reference the flag’s value. The Nav.vue file (src/components/Nav.vue) implements the log-in button in the top navigation bar. We first import the library and our flags from the previous file (line 46), and then create a new computed property, loginButtonColor, that gets the value of the feature flag, either locally or through the dynamic configuration file (line 58).

loginButtonColor: Flags.loginButtonColor.getValue()

The goal here is to use the flag to assign a color value from the configuration file. Because we leveraged the Variant flag and provided the CSS class choices as its options, we don’t need any if-then or switch statement logic here.

<div class="buttons">
  <a :class="loginButtonColor" v-if="loggedIn" @click="logout">
    Log out
  </a>
  <a :class="loginButtonColor" v-else href="/login">
    Log in
  </a>
</div>

Sending Data to Google Analytics

Google Analytics is one of the most popular data analytics platforms. Its extensibility allows users to examine various metrics (e.g. pageviews, page load time, user location, engagement rates) and empowers teams with a massive public development library.

We can take advantage of this by installing the gtag.js snippet in our index.html file to allow subsequent gtag('event') calls in our code base. Generally, we want to send data through a gtag('event') whenever a flag is evaluated. To start, we can paste the snippet into the index.html file to link our application to the data monitoring platform. Note: make sure to replace GA_MEASUREMENT_ID with your personal ID where appropriate.

<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=GA_MEASUREMENT_ID"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'GA_MEASUREMENT_ID');
</script>

Now that the application is linked to the Google Analytics platform, we’ll need to transmit information about the feature flag (i.e. its local value). This will help segment our metrics in the dashboard so we can easily slice the views and see which log-in button color yields a higher engagement rate. To do this, we will make use of Rollout’s Impression Handler.

What is the Impression Handler?

The Rollout SDK allows its users to configure a function that is called every time a feature flag is evaluated. If you are familiar with JavaScript, this should sound similar to an EventHandler.
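As a plain-JavaScript analogy (hypothetical names, not the Rollout API), the pattern is the same as registering an event listener: you hand over a callback, and it is invoked on every evaluation:

```javascript
// Minimal event-handler pattern, analogous to how the SDK invokes the
// impression handler on every flag evaluation (illustrative only).
const handlers = []

// Register a callback, like configuring an impression handler
function onFlagEvaluated(handler) {
  handlers.push(handler)
}

// Evaluate a flag and notify every registered handler
function evaluateFlag(name, value) {
  handlers.forEach(h => h({ name, value }))
  return value
}
```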

So in our example, we can insert a gtag('event') call within the impression handler. This will relay each user’s login button color value via reporting.value. We can further restrict the gtag('event') call to fire only when certain criteria are met (e.g. only if a Rollout experiment is configured for the flag), or to wait on a click event or some other user action. This way, we can filter out any superfluous data transmission. With our new impressionHandler, the finished flag.js file will look similar to this:

import Rox from 'rox-browser'

export const Flags = {
  loginButtonColor: new Rox.Variant('button is-black', ['button is-black', 'button is-primary'])
}

export const configurationFetchedHandler = fetcherResults => {
  if (fetcherResults.hasChanges && fetcherResults.fetcherStatus === 'APPLIED_FROM_NETWORK') {
    window.location.reload(false)
  }
}

export const impressionHandler = (reporting, experiment) => {
  if (experiment) {
    console.log('flag ' + reporting.name + ' value is ' + reporting.value + ', it is part of ' + experiment.name + ' experiment')
    gtag('event', experiment.name, {
      'event_category': reporting.name,
      'event_label': reporting.value
    })
  } else {
    console.log('No experiment configured for flag ' + reporting.name + '. default value ' + reporting.value + ' was used')
  }
}

const options = {
  configurationFetchedHandler: configurationFetchedHandler,
  impressionHandler: impressionHandler
}

Rox.register('default', Flags)
Rox.setup(process.env.VUE_APP_ROLLOUT_KEY, options)

The if statement within the impressionHandler ensures that we’ll only relay data for feature flags that have a corresponding experiment defined in the dashboard. Now, run the app and head over to the Google Analytics dashboard. We should see not only real-time data coming in, but also our successfully pushed flag data appearing as Events!

Summary

Experimentation is essential to continuously grow and improve a product or service. The very premise of a split or multivariate test invites the use of feature flags, allowing teams to create variations and route user traffic. For experiments to be scalable and produce reliable, actionable data, a feature flag management tool should be used in concert with a specialized data analytics platform, such as Google Analytics. This two-platform approach is advantageous because it lets developers and product managers use specialized tools, each designed to be the best in its field, while experimenting.

In a future blog, we will expand on the data manipulation done within Google Analytics to analyze results of an A/B test from the information pushed through the impressionHandler.