Part Human, Part Machine Learning, All Powerful Analyst

Lab Overview

In this lab we will show you how to take advantage of some of the analysis tools within Adobe Analytics to better understand and draw insights from your data. We will focus on Anomaly Detection, Contribution Analysis, and Intelligent Alerts to identify unexpected changes in your data, understand what may be causing those changes, and configure alerts so that you can be notified quickly about the trends you most care about.

Key Takeaways

  • How to run Anomaly Detection, which will automatically identify departures from your usual patterns
  • How to run Contribution Analysis, which can point to likely causes of anomalies in your data
  • How to configure Intelligent Alerts, which can notify you when there are significant changes in your data

Prerequisites

  • All of the tools we will be exploring are capabilities within Adobe Analytics
  • Terminology
    • Metric: a numeric value, such as Page Views, Visits, or Average Time Spent
    • Dimension: non-numeric values such as Region, Browser, or Channel. Each individual value of a Dimension (e.g., Chrome vs. Safari) is referred to as an Element, or Dimension-element. The Contribution Analysis portion of this lab will focus on how Metrics such as Visits break down across different Dimension-elements.
    • Segment: a set of rules based on Metrics and/or Dimensions used to identify a sub-group. For example, Region = US and Time Spent > 1 minute.

Lesson 1 - Anomaly Detection

Objectives

  1. Learn when Anomaly Detection is run automatically and how to access detailed results
  2. Understand the output generated by Anomaly Detection
  3. Understand how the training data (time window selection) affects Anomaly Detection results

Lesson Context

The inspiration for Anomaly Detection came from one of our product managers (John) while he was working in Adobe's consulting group. A client called and reported that the conversion rate on their website had fallen by 0.8%. John first needed to know whether that kind of change was actually unusual for the client. After many hours of pulling together data and writing code for analysis, he concluded that the change was statistically (and practically) significant. The intent with Anomaly Detection is for everything John did to now happen automatically for you whenever you're looking at data in Analysis Workspace.

When you create a new table in Analysis Workspace, it can be difficult to quickly digest the data and notice where the biggest changes occur. When you do see an uptick or downturn in the data, your first question is usually something like "Does this happen every week? Every year?" The Anomaly Detection algorithm learns the seasonal patterns in your data and will automatically account for them when identifying anomalies.

Algorithm Background

Anomaly Detection fits several time series models to the 35 days preceding your selected time window. It then compares how well each model represents the data within these 35 days and selects the model with the best fit. That model is used to produce a one-step-ahead forecast range for each point within your window. When a point falls outside its forecasted range, it is flagged as an anomaly.

  • The algorithm fits separate models for weekdays and weekends, enabling it to capture large differences in weekend behavior without triggering anomalies.
  • Each model will also use data from the previous year, if available. Annual data is shifted by one day (or two for leap years) so that Friday of this year is compared to Friday of last year.
  • The default confidence for forecasting is 99% for daily data, to reduce false positives. This can't be changed for automatic jobs, but we will see how to configure the confidence level for automated alerts later.
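
To make the band-and-flag step concrete, here is a minimal Python sketch. This is not Adobe's implementation - the production service fits and compares several time series models, splits weekdays from weekends, and folds in prior-year data - but a simplified stand-in that forecasts each point as the training-window mean and flags anything outside an approximate 99% band.

```python
import numpy as np

def flag_anomalies(training, window, z=2.576):
    """Flag points in `window` that fall outside a forecast band built
    from `training` (e.g., the 35 days preceding the window).

    Simplified stand-in: the real service selects among several fitted
    time-series models, while here the forecast for every point is just
    the training mean. z = 2.576 approximates a 99% two-sided band
    under a normality assumption.
    """
    training = np.asarray(training, dtype=float)
    center = training.mean()
    halfwidth = z * training.std(ddof=1)
    return [(i, x) for i, x in enumerate(window) if abs(x - center) > halfwidth]

# 35 days of stable traffic, then a window containing one clear spike.
rng = np.random.default_rng(0)
train = rng.normal(1000, 50, size=35)
window = [980, 1020, 1500, 990]
print(flag_anomalies(train, window))   # [(2, 1500)]
```

Because the band is built entirely from the training window, the same point can be flagged or not depending on the dates you select - exactly the effect Exercise 1.2 below demonstrates.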

Data Background

In our examples we will take the role of an analyst on our first day at a brand-new clothing company called Luma. We know the company is young so we don't expect to see huge volumes of data yet, but we want to get in and start to get familiar with what's available.

Exercise 1.1

  1. Log in to your lab machine with the username provided and password Adobe2019!

  2. Start Google Chrome and click the Experience Cloud bookmark

    Figure 1: Bookmark

  3. Click Sign In with an Adobe ID and log in with the credentials provided.

  4. Click the grid in the top right, then click Analytics

    Figure 2: Grid

    Figure 3: Analytics

  5. Click Workspace in the top left banner then Create New Project > Create

  6. You should now have a blank project with an empty Freeform Table

    Figure 4: Freeform

  7. Change the date range in the top right to December 1, 2018 through January 31, 2019.

    Figure 5: Date Range

  8. Drag the metric Visits from the left bar into the metric drop zone of the Freeform Table

    Figure 6: Visits

    Figure 7: MetricDrop

  9. In the top right, you should briefly see Searching for anomalies...

    Figure 8: Searching

    Note: Anomaly Detection will only run when the Dimension in the table is a time dimension: Hour, Day, Week, or Month

  10. If there are no anomalies, this will soon change to No anomalies found

    Figure 9: NoAnomalies

  11. If any anomalies are found, they will be marked in the corner of the cell

    Figure 10: Anomaly

  12. To see details of the Anomaly Detection results, let's add a line visualization to our project. Click the bar graph icon in the top left to bring up visualization options, then drag Line toward the top of the table.

    Figure 11: BarGraphIcon

  13. You will get a line graph with anomalies highlighted, as well as a green shaded confidence region showing where the data was expected to be during this time window.

    Figure 12: LineGraph

Exercise 1.2

It is important to understand how the date selection of your table affects the results of Anomaly Detection. The algorithm uses the 35 days before your selection (as well as the previous year, if available) to train a model. Because of this, a data point that is anomalous under one selection may not be under another.

  1. To see an extreme example of this, change the time window to November 1 through January 31. We now have a lot of anomalies in our report, but most of the data doesn't look erratic.

    Figure 13: Anomaly Detection report for Nov 2018 - Jan 2019 with many anomalies

  2. If you see anomalies that aren't intuitive to you, especially many at once, it can help to take a look at the data Anomaly Detection is using to set its expectations. Let's now include October in the time range so we can see what happened that resulted in this graph.

    Figure 14: Anomaly Detection report for Oct 2018 - Jan 2019 showing empty training data

  3. Now we can see what the problem is - there is no data for October. With a flat line of zeroes in October, the activity in November looks anomalous in comparison.

    Note: We are releasing an update to Anomaly Detection in a few weeks that will help it recover more quickly from situations like this, but the results will still be dependent on the time window selected.

  4. Note the anomalies in early January (before the big spike), and let's return the window to just December and January.

    Figure 15: Anomaly Detection report for Dec 2018 - Jan 2019 showing only one anomaly in January

  5. Now that the November data is used as the training period, a more appropriate model is created that finds only one anomaly during January.

Lesson 2 - Contribution Analysis

Objectives

  1. Learn how to run Contribution Analysis on an anomaly of interest
  2. Learn how to interpret Contribution Analysis results
  3. Learn common pitfalls of Contribution Analysis reports and how to avoid them

Lesson Context

Returning to our example with John in consulting, you can guess what the client's next question was after John reported that the drop in their conversion rate was significant: "What caused the decrease?" This sent John back to his computer for an exhaustive search through the data to determine which specific products or areas of the website were responsible for the overall decline seen in the aggregate numbers. He eventually found one product in particular whose conversion rate had plummeted after a change in its product page.

Performing the kind of analysis John did for his client can be difficult, time-consuming, or both, as there are commonly hundreds or thousands of candidate variables in your data. Contribution Analysis scans across each element of every Dimension to find which of them most deviated from expected behavior on the same day as the anomaly. The list of dimensions at the top of a Contribution Analysis report can quickly narrow down your search by highlighting the characteristics most associated with the unexpected change in your metric.

Note: Contribution Analysis is not enabled for Hourly data. We will only be using the Daily granularity for this exercise.

Algorithm Background

Contribution Analysis performs a fairly simple statistical test for each dimension-element included in the report. The power of the Contribution Analysis service is that it can perform this test across tens of thousands of dimension-elements efficiently. The exhaustive search that Contribution Analysis provides can give you confidence that there is not some aspect of the data that you've overlooked.

Without going into the details of the math performed by Contribution Analysis (see the Contribution Analysis documentation if you want them), the service creates a simple table for each dimension that compares the day of the anomaly to the 30 days prior. Any elements with large changes in their representation are likely causes of the anomaly. As a simple example, consider the following table of website visitors:

| Age | Control Period | Anomaly Day |
| --- | --- | --- |
| < 40 | 1,000 | 100 |
| 40 - 60 | 800 | 80 |
| > 60 | 200 | 120 |
| Total | 2,000 | 300 |

It may be clear from the raw numbers that we have more visitors from the > 60 group than expected on the anomaly day (or fewer from the other groups). To quantify this, let's rewrite the table in terms of proportions:

| Age | Control Period | Anomaly Day | Difference | Contribution Score |
| --- | --- | --- | --- | --- |
| < 40 | 50% | 33% | -17% | - |
| 40 - 60 | 40% | 27% | -13% | - |
| > 60 | 10% | 40% | +30% | + |
| Total | 100% | 100% | 0% | |

We can now clearly see that the percentage of the population above 60 increases dramatically on the day of the anomaly, while the other percentages decrease proportionally. This results in a positive Contribution Score for the > 60 element, while the < 40 and 40 - 60 elements receive negative Contribution Scores.
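
The proportion comparison above is easy to reproduce. The Python sketch below is illustrative only - it is not the exact statistic Adobe computes - but it shows how the shift in each element's share of the metric carries the sign of its Contribution Score.

```python
# Shares of the metric per age group in the control period vs. the anomaly
# day, using the counts from the first table above.
control = {"< 40": 1000, "40 - 60": 800, "> 60": 200}
anomaly = {"< 40": 100, "40 - 60": 80, "> 60": 120}

control_total = sum(control.values())   # 2,000
anomaly_total = sum(anomaly.values())   # 300

for age in control:
    control_share = control[age] / control_total
    anomaly_share = anomaly[age] / anomaly_total
    shift = anomaly_share - control_share   # sign matches the Contribution Score
    print(f"{age:>7}: {control_share:4.0%} -> {anomaly_share:4.0%} (shift {shift:+.0%})")
```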

What we don't see in this table is how the total of 300 on the day of the anomaly compares to what was expected. The sign of the score only indicates whether the element made up a larger or smaller percentage of the anomaly day than expected. To complete the picture and shape our understanding of what happened on the anomaly day, we need the context of the expected total.

If the expected amount was 250, for example, following the 50-40-10 distribution of the Control Period, we would expect only 25 visitors from the > 60 group, but we had 120. This seems like the main cause for our increase in traffic, even offsetting lower than expected traffic from the other age groups.

| Age | Control Period | Anomaly Day | Expected on Anomaly Day | Difference from Expected |
| --- | --- | --- | --- | --- |
| < 40 | 1,000 | 100 | 125 | -25 |
| 40 - 60 | 800 | 80 | 100 | -20 |
| > 60 | 200 | 120 | 25 | +95 |
| Total | 2,000 | 300 | 250 | +50 |

If we had expected 1,000 visitors, however, we now have a drop of 700 visitors to explain, and we'll interpret the table much differently. The > 60 group is only slightly above expectation (120 actual versus 100 expected, following the 10% control proportion), while the other two groups fall substantially below theirs. The anomaly is then due to a large drop in the < 40 and 40 - 60 categories.

| Age | Control Period | Anomaly Day | Expected on Anomaly Day | Difference from Expected |
| --- | --- | --- | --- | --- |
| < 40 | 1,000 | 100 | 500 | -400 |
| 40 - 60 | 800 | 80 | 400 | -320 |
| > 60 | 200 | 120 | 100 | +20 |
| Total | 2,000 | 300 | 1,000 | -700 |
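
The arithmetic behind both tables is the same: apply the control period's 50-40-10 split to the expected total, then compare with the actual anomaly-day counts. Here is a short sketch of that calculation, run for both hypothetical expected totals:

```python
# Expected counts under the control period's 50-40-10 split, compared with
# the actual anomaly-day counts, for both hypothetical expected totals.
control_share = {"< 40": 0.50, "40 - 60": 0.40, "> 60": 0.10}
anomaly = {"< 40": 100, "40 - 60": 80, "> 60": 120}

for expected_total in (250, 1000):
    print(f"expected total = {expected_total}")
    for age, share in control_share.items():
        expected = share * expected_total
        diff = anomaly[age] - expected
        print(f"  {age:>7}: expected {expected:5.0f}, actual {anomaly[age]:4d}, diff {diff:+5.0f}")
```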

One intent of this example is to illustrate that whether your anomaly was above expected (spike) or below expected (dip), you may see both positive and negative Contribution Scores in the output. Your interpretation of the scores may change when you are looking at a spike versus a dip, however. For spikes, the largest positive score should usually be your starting point, while for dips you will usually want to start on the negative side.

The main thing to remember from this example is that a positive score means an element was responsible for a larger percentage of your metric on the anomaly day than during the control period, while a negative score means the percentage on the anomaly day is lower.

Exercise 2.1

Now let's run a Contribution Analysis report. You're curious what happened with Luma's site on December 10 that resulted in such a relatively large spike in the number of visits.

  1. Select the anomaly on December 10 in the line chart and click Analyze, then Run Contribution Analysis.

    Figure 16: Analyze

    Note: You can also run Contribution Analysis on days not identified as an anomaly by right-clicking on any row in the table.

    Figure 17: RunCANoAnomaly

  2. You should see the Contribution Analysis panel appear above your Freeform Table. Click Run Contribution Analysis.

    Figure 18: Run CA

  3. A lot of elements are being examined in Contribution Analysis, so we need a minute or two (occasionally much longer for really large data sets) to run the job. You should see a progress bar after starting the job.

  4. When the job is complete, you will see the anomaly metric in the top left and the trend of the metric in the top right. These help us remember the context of the anomaly we're trying to understand in the report.

    Figure 19: CA Report

  5. Below the anomaly and trend, we see a list of Top Items. These are the dimension elements most associated with the anomaly, based on a comparison of their usual contribution to the metric with their contribution on the anomaly day.

    Figure 20: Top Items from the Contribution Analysis report

    Note: Contribution Analysis looks at the 30 days prior to the day being analyzed, not the date range selected when the anomaly was identified.

  6. Contribution Scores are normalized so that the highest score is always 1.00 (or -1.00). The intent of normalization is to let you quickly scan the list of top items and see how the other elements compare to the strongest relationship found. When the source of an anomaly can be clearly isolated, there should be a rapid decline in the scores. In this report, however, the top 50 scores decrease smoothly from 1.00 to 0.50. When so many entries in the table are this close to 1.00, it usually means that none of the contributions found were particularly strong: all of these variables made weak contributions to the anomaly, so their scores are roughly the same. It's rare for multiple distinct variables to make significant contributions of approximately the same magnitude.

    Note: The exception to this is when several elements are strongly correlated, or even the same. We have this to some extent in this table - there are several variables that seem to have the same data: Marketing Channel: Email and Marketing Channel: emailLaunch, for example.

  7. Since we aren't seeing much to help us understand the spike, we ask around at Luma and find out that the web traffic so far is all just testing of the new site, and the increase on December 10 was a group of engineers making sure the site could handle future expected loads.

  8. Let's look for an anomaly where we can get more interesting results. Close the current Contribution Analysis Report by clicking the 'X' in the top right corner, then go to the anomaly on January 20, click Analyze, and run Contribution Analysis again.

    Top Items for Visits Anomaly on January 20 Figure 18: January 20

  9. For this anomaly, the story is a bit clearer. We see Virginia as the top two elements, followed by four items related to the women's section of our site. After these, the scores start to decrease fairly quickly, with a steep drop-off after Countries: United States. It looks like something happened in Virginia related to our women's product line. After asking around, you find out that the company held a promotional event near Washington DC focused on women's clothing. Without knowing the schedule, Contribution Analysis pointed us right to the explanation for the spike in our data, including information about where on our site the activity occurred and where the traffic was coming from. Depending on the array of dimensions available in your data, Contribution Analysis can highlight many aspects of the drivers of an anomaly.

  10. Let's look at the table in a little more detail. Because we were looking at the metric Visits in our Freeform Table, the Top Items table automatically shows us the number of visits associated with each of the elements. Virginia, for example, was responsible for 257 visits out of the 268 total on the anomaly day. The table also includes the number of Unique Visitors, to help us understand how much duplication we had across the visits recorded. In this case there was little overlap - our 257 visits were performed by 228 unique visitors. An interesting example in this table is Marketing Channel: Direct, which has more associated visits than Virginia, yet has a lower Contribution Score because it was already a common element in our data before the anomaly.

  11. The totals at the top of the table for Visits and Unique Visitors are not the number of visits on the anomaly day (or over any other period of time). They follow the pattern of Freeform Tables and show the column sums. The percentages in each row are relative to the column total, not to the number of visits or visitors on the anomaly day.

  12. For this report, let's also look at the Generated Segments table. The report attempts to automatically create segments that are responsible for the anomaly by combining elements from the Top Items table. You can see details of each segment by hovering over the segment name (Contribution Segment 1) and clicking the info icon.

    Figure 22: CA Segments

    Figure 23: CA Segment Info

  13. Clicking the info for a segment brings up a small window showing the variables used to create the segment. In addition, you can see how many unique visitors, visits, and page views this segment accounts for from the last 90 days. For example, our first segment captures 257 unique visitors out of the total 1,337 we've had on the site over the last three months.

    Figure 24: CA Segment Details

  14. Let's make our segment public and take a look at activity in the segment over December and January. Click Make public at the bottom of the segment detail window, then scroll down to your Freeform Table. You should now see Contribution Segment 1 on the side panel at the top of your Segments.

    Figure 25: Public Segment

  15. Drag Contribution Segment 1 to the Segment drop zone of your Freeform Table to get a visualization of hits belonging to the segment in December and January.

    Figure 26: Segment Drop

  16. We can see no activity for this segment before the January 20 spike, although there are some residual hits in the few days following the event.

    Figure 27: Segment Only

Exercise 2.2

  1. There are a few reasons you may want to exclude a variable from Contribution Analysis:

    • Jobs are taking a long time and you'd like to speed them up. Especially if you have a set of the usual suspects, you can exclude all other variables to have Contribution Analysis quickly scan over those dimensions you're most interested in.
    • There are variables you know are not legitimate contributors but are still likely to have a high contribution score. Examples are time-related variables that are related to the anomaly period by definition, but don't provide any insight into the cause of the anomaly.
    • You have previously run a report that contained several variables that were essentially the same (as in one of our examples), and you'd like to see the report without duplication. In this case, you can manually exclude all but one dimension from each group you identify to get a cleaner report.

    Note: Your company may have a limit on the number of Contribution Analysis jobs you can run each month. Be sure to understand your limit, and be careful about running multiple versions of the same report if you have one.

  2. To see the list of currently excluded dimensions, start a new Contribution Analysis report and click Dimensions.

    Figure 28: Excluded List

  3. We default to an exclusion list that covers dimensions that are problematic for most customers, which should be a useful starting point. To re-include a dimension in the analysis, click the X next to its name. To exclude a new dimension, click out of the excluded list, then drag a dimension from the left side bar (make sure the Components icon is selected) into the dotted box around Excluded Dimensions. After changing the list, you can also set it as a new default.

    Figure 29: Exclude Dimension

    Note: For those with experience in statistical modeling, it is useful to understand that while Contribution Analysis is capable of looking at many dimensions, the calculations it performs are univariate. This means that excluding (or including) one dimension will have no impact on the Contribution Score of another dimension, unless the highest-scoring dimension is changed in the result. Another way of saying this is that the ratio between the scores of any two dimensions will always be the same, regardless of which other dimensions are included in the report (the sketch at the end of this exercise illustrates this).

  4. To see why these variables are excluded, let's look at a specific example. Open the list of excluded variables and re-include Day, then run the report again. You will now see January 20 at the top of the list, because that day is, by definition, unique to the anomaly on January 20. The default exclusion list covers time dimensions like this that jump unfairly to the top of the list, as well as dimensions that we assume will be uninteresting to most users when looking for the causes of anomalies (e.g., Monitor Resolution).

    Figure 30: Contribution Analysis report with the Day dimension included

    Note: One drawback of the normalized scores in Contribution Analysis is that it can be difficult to compare scores between reports. One way around this is to intentionally include Day as a dimension to give you a common benchmark across your reports. The Day element will generate the highest contribution score possible, so you can anchor your other scores to that reference point.
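
Here is a small Python sketch of the univariate point from the note above, using made-up raw scores (Adobe does not expose raw scores; the numbers and element names below are purely illustrative). Normalization divides every score by the largest magnitude, so removing a dimension rescales the rest without changing the ratios between them, unless the removed dimension was the one anchoring the score of 1.00.

```python
# Hypothetical raw scores; only their relative sizes matter here.
raw = {"Cities: Virginia": 8.0, "Site Section: Women": 6.0,
       "Marketing Channel: Direct": 2.0, "Day: Jan 20": 20.0}

def normalize(scores):
    top = max(abs(v) for v in scores.values())
    return {k: round(v / top, 2) for k, v in scores.items()}

print(normalize(raw))
# Day anchors at 1.00; Virginia scores 0.40, Site Section 0.30, Direct 0.10.

without_day = {k: v for k, v in raw.items() if not k.startswith("Day")}
print(normalize(without_day))
# Virginia now anchors at 1.00, the others rescale to 0.75 and 0.25,
# but every ratio (e.g., Virginia vs. Site Section) is unchanged.
```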

Lesson 3 - Intelligent Alerts

Objectives

  1. Learn how to create an Alert
  2. Understand the alerting options available for simple alerts
  3. Work through an example of a stacked alert

Lesson Context

Back to our consultant John: after he helped his client identify the cause of their problem, they immediately asked whether they could receive notifications when something similar happened in the future. The Alerts feature now offers flexible options for configuring notifications specific to your interests.

Exercise 3.1

  1. Hover over Components in the top rail and click Alerts.

    Figure 31: Alerts

  2. Click Create new alert. You will first have several options for configuring the alert. Make sure to enter a title for the alert, or you will not be able to save it.

    Figure 32: Alert Config

  3. The alert itself is defined below, in the Send an Alert When box. Begin by dropping the metric Visits into the Metrics drop zone. You will then have options to determine when an alert on Visits should fire.

    Figure 33: Alert Options

  4. The options are intended to be as descriptive as possible in a few words, but some additional detail for each is provided below.

    | Alert Type | Description | Option to Specify |
    | --- | --- | --- |
    | anomaly exists | alerts will fire for all anomalies | confidence level used to determine anomalies |
    | anomaly is above expected | alerts will fire only for anomalies higher than usual | confidence level used to determine anomalies |
    | anomaly is below expected | alerts will fire only for anomalies lower than usual | confidence level used to determine anomalies |
    | is above or equals | the data point is greater than or equal to a certain value | the threshold to exceed |
    | is below or equals | the data point is less than or equal to a certain value | the threshold to be below |
    | changes by % | the data point is a certain percentage above or below the previous data point | the percentage change |
  5. In addition to these options, alerts can be filtered by any number of Segments. To continue with our example from Contribution Analysis, let's say we're interested in knowing about unusual traffic from Virginia to the women's section of our site again. Drag Contribution Segment 1 to the Segment drop zone of the alert.

    Figure 34: Alert Segment

  6. After configuring your Alert, click Save and you should see your new Alert appear in a list.

    Figure 35: Alert List

Exercise 3.2

You probably noticed when creating your Alert that the Preview pane shows how many times your Alert would have triggered recently.

Figure 36: Alert Preview

The Preview pane is useful for avoiding an overload of alerts, and for setting up alerts where you hope to be notified roughly once a week or once a month. Experimenting with the confidence threshold can often get you to the alerting frequency you are aiming for.

Note: If you need a preview over a longer time horizon, go back to Analysis Workspace and visualize the time period you're interested in, as we did in the first exercise.

What do you do for Alerts that you hope only happen rarely? Often what you want to see in the Preview pane is 0 triggers, because you only want to be alerted when something is really wrong (or different). Even at 99% confidence, however, anomalies are not especially rare. Large percentage changes can also happen frequently as part of a 24-hour or weekly cycle. To find truly rare events, you can combine these two alerts.
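
The logic of stacking the two rules can be sketched outside the product. Using the same simplified anomaly test as the Lesson 1 sketch (a mean-and-band stand-in, not the production algorithm), the following Python flags a day only when it is both an anomaly at roughly 99% confidence and more than 100% away from the previous day:

```python
import numpy as np

def stacked_alert_days(series, z=2.576, min_change=1.0, lookback=35):
    """Indices where BOTH rules fire: the point sits outside a z-sigma band
    around the previous `lookback` days AND it changed by more than
    `min_change` (here 100%) versus the previous point."""
    fired = []
    for i in range(lookback, len(series)):
        hist = np.asarray(series[i - lookback:i], dtype=float)
        is_anomaly = abs(series[i] - hist.mean()) > z * hist.std(ddof=1)
        prev = series[i - 1]
        big_change = prev != 0 and abs(series[i] - prev) / abs(prev) > min_change
        if is_anomaly and big_change:
            fired.append(i)
    return fired

rng = np.random.default_rng(1)
visits = list(rng.normal(1000, 80, size=90))
visits[60] = 2600   # a genuine event: anomalous AND a >100% jump
# The anomaly rule alone fires on roughly 1% of ordinary days by chance;
# requiring a 100% day-over-day jump as well filters those out.
print(stacked_alert_days(visits))   # [60]
```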

  1. Start a new alert. If you saved the prior Alert, you can now do this with the + Add button above your Alert list. Enter a new title, then drag Visits into the Alert and select anomaly exists with a 99% confidence threshold.

  2. Click the gear just outside the alert definition box and click Add Rule.

    Figure 37: Add Rule

  3. Between the two rules, select AND.

    Figure 38: And

  4. Drag Visits into the second pane as well. This time, choose changes by % with a value of 100.

    Figure 39: Dual Alert

This Alert will only fire when both conditions are met, so when you are alerted you can be more confident that something truly unexpected has happened.

Note: For this example we intentionally selected Visits for both sides of the rule. You can instead combine different Metrics to get the exact combination of events you want to watch out for. You can also add any number of rules, but it's not possible to mix AND and OR rules.

Next Steps

Thanks so much for attending. Please go try the services that were new for you and get more familiar with them. Here are a few ideas for where you can start:

  1. If you haven't noticed it before, make sure Anomaly Detection runs for you. If you don't see it start automatically in a Freeform Table while logged in under your own account, ask your admin whether they can enable it for you.

  2. Check on your access to Contribution Analysis tokens and run some Contribution Analysis jobs, if you're able. Try to start with an event in the past you've already analyzed so you can see how the Contribution Analysis results compare to your findings.

  3. Set up an Alert on one of the metrics you follow. Create your initial alert to intentionally fire soon so you can verify that you've created it correctly and have confidence that Alerts is working for you.

In general, try all of these services on your own data and see where they're useful for you. Reach out to me directly at challis@adobe.com with any questions or with recommendations for types of analysis you wish you could do with Analytics. I would love to hear directly from you the kinds of problems you're facing and capabilities we could provide that would solve them.

Additional Resources

Experience League: a new enablement program with guided learning, one-to-one expert support, and a thriving community of fellow professionals designed to help you get the most out of Adobe Experience Cloud.

Community Forums: Get answers to questions, discuss topics with your peers, and vote on features you want to see in the future.

Appendix

The Virtual Analyst documentation provides further details on all of the services covered today.