People often ask why they should use any tool other than Google Analytics. Don’t we have all the data we need there?
It’s true, Google Analytics stores a lot of data. But sometimes it doesn’t tell the whole story.
We tend to accept the numbers displayed in GA as the absolute truth. That is a slippery slope.
We should ask questions to figure out what’s behind the number. It’s not always easy to answer, but investing the time will be worth it.
With that in mind, let’s pull some data using the GA API to give it different shapes.
We are trying to find a way to get to the answer faster and to go deeper into the visitor patterns.
With a little bit of help from tools like R and Tableau, our data analysis should go very smoothly.
Let’s dive in!
Box plots & Histograms
In GA we don’t have any way to build box plot visualizations. That’s too bad, because with box plots we can easily look at aggregations, spot outliers, see distributions, etc.
High-level executives usually love box plots. It might be an excellent way to communicate with them. But most importantly, Data Analysis should be a core discipline, not only a better way to get your point across.
The following is the most comprehensive video explanation I could find on box-and-whiskers charts (or box-plots)
We should keep in mind that a taller box means the data varies a lot, while a shorter one means it’s more stable and easier to predict.
Here is a chart we built using R showing traffic grouped by the day of the week:
We can quickly draw some conclusions from our chart:
- Tuesday and Wednesday are the days with the highest traffic
- Weekends are not very active, especially Saturdays
- There are no outliers, which means no unusual traffic spikes
- The days with high traffic have longer whiskers compared to lower traffic ones: it means that the variation is higher
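For readers who want to try a chart like the weekday one themselves, here is a minimal base R sketch. It is not the code behind the chart above: the `daily` data frame is simulated, and the column names `date` and `sessions` are assumptions standing in for a real GA export.

```r
# Synthetic stand-in for a GA export: one row per day with a session count.
set.seed(42)
daily <- data.frame(date = seq(as.Date("2018-01-01"), by = "day", length.out = 364))
daily$sessions <- rpois(nrow(daily), lambda = 1000)

# Derive the weekday from the date. Format "%u" gives 1 (Monday) to 7 (Sunday),
# which avoids depending on the locale's day names.
day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
daily$weekday <- factor(as.integer(format(daily$date, "%u")),
                        levels = 1:7, labels = day_labels)

# One box per weekday: the box spans the interquartile range, the line inside
# it is the median, and points drawn beyond the whiskers are outliers.
boxplot(sessions ~ weekday, data = daily,
        xlab = "Day of the week", ylab = "Sessions per day")
```

With real data, only the first block changes: replace the simulated data frame with your own daily export.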
Let’s have a look at another box plot, built with Tableau. It shows the daily traffic grouped by month of the year:
- it’s easy to spot the outliers in this example
- it’s more difficult to predict traffic in November, December and January compared to the rest of the year (since the boxes are taller)
- these three are also the months with the highest traffic
Now, what if we could find a way to split the traffic by gender?
We have some data on the Demographics report. Why shouldn’t we try to create a cool viz?
Now, this is what I’m talking about! Traffic for every day of the week, split by gender.
Let’s say that we want to compare this month’s conversion rate with the same period from last year.
We would probably have a look at something like this:
How about we try it another way? Daily conversion rate for November:
Or maybe using R:
Seems a lot more relevant and easier to understand!
If you want to dig deeper on how to compare conversion rates with R, I recommend this article by Justin Marciszewski.
But is this the result of something we’ve done or just a random movement?
We can’t know for sure, but we can check if it is significant from a statistics point of view.
For this purpose, we will copy our data into Evan Miller’s T-Test. In the left sample we paste the conversion rates from last year, and in the right sample, the ones from this year.
Click, paste, and … NOT significant:
Important conclusion: don’t rely on comparisons that don’t make sense!
We should enlarge our samples to the point where they become meaningful.
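A comparable check can also be run directly in R. The sketch below uses base R's `t.test()`, which performs Welch's two-sample t-test by default; whether that matches the online tool's exact settings is an assumption. The daily conversion rates are made-up numbers for illustration, not the figures from our screenshots.

```r
# Made-up daily conversion rates (in %) for the same ten days in two years.
last_year <- c(2.1, 2.4, 2.3, 2.6, 2.2, 2.5, 2.4, 2.3, 2.7, 2.2)
this_year <- c(2.3, 2.5, 2.2, 2.8, 2.4, 2.6, 2.3, 2.5, 2.9, 2.4)

# t.test() runs Welch's two-sample t-test by default (unequal variances allowed).
result <- t.test(this_year, last_year)
print(result$p.value)

# A common convention is to call the difference significant when p < 0.05.
if (result$p.value < 0.05) {
  message("Difference is statistically significant at the 5% level")
} else {
  message("Difference is NOT statistically significant at the 5% level")
}
```

Adding more days to both vectors is the code equivalent of enlarging the samples.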
We answered a question of what’s behind a hard number. Try to do one on your own!
In time you will find more confidence in your answers, and you will start asking better questions. Because the quality of the answer depends on the quality of the question.
Let’s open our eyes even more!
With histograms, we can see the frequency distribution of our chosen metrics.
For this purpose, the data is being split into intervals, called bins. Playing around with their size could prove insightful.
For example, here is the conversion rate distribution across two months (bin size ~ 0.2%):
- the most common conversion rates fall into the 2.4% – 2.6% interval
- we can easily see the outliers on the right side
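Here is a small base R sketch of how a histogram with a fixed bin size can be built. The conversion rates are simulated, and the 0.2 percentage point bin width simply mirrors the example above; both are assumptions for illustration.

```r
# Simulated daily conversion rates (in %) for roughly two months.
set.seed(1)
conv_rate <- rnorm(60, mean = 2.5, sd = 0.3)

# Build bin edges 0.2 percentage points apart that cover the whole range.
bin_width <- 0.2
edges <- seq(floor(min(conv_rate) / bin_width),
             ceiling(max(conv_rate) / bin_width)) * bin_width

# hist() counts how many days fall into each bin and draws the bars.
h <- hist(conv_rate, breaks = edges,
          main = "Daily conversion rate", xlab = "Conversion rate (%)")

# h$counts holds the number of days per bin, h$breaks the bin edges.
print(data.frame(from = head(h$breaks, -1), to = tail(h$breaks, -1), days = h$counts))
```

Changing `bin_width` is the quickest way to play around with the bin size.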
Funnily enough, Google Analytics lets you request custom histogram buckets in a query, so you don’t have to split the data on your own.
Below we are testing the example we found in the Lunametrics article linked above.
The result is a histogram with the traffic split into custom day intervals (traditional TV dayparts):
The histogram of Google Analytics
The only one I could find among the common reports is the session & page views distribution across session duration.
You can find it under Audience > Behaviour > Engagement and it looks like this:
It’s ok, but we need much more! Let’s move on to other types of reports.
Rolling averages are used to gauge the direction of the current trend. The definition sounds technical, but they are easy to implement: each point is simply the average of the last X values in the dataset.
They are instrumental if you find it challenging to recognize trends in time-series.
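As a rough illustration of "the average of the last X values", here is a base R sketch of a 7-day trailing average. The traffic is simulated, and the 7-day window is an assumption chosen to match a weekly rhythm.

```r
# Simulated daily sessions with a weekly rhythm plus noise.
set.seed(7)
sessions <- 1000 + 200 * sin(2 * pi * (1:90) / 7) + rnorm(90, sd = 80)

# Trailing moving average: each value is the mean of the current day and the
# six days before it. sides = 1 makes stats::filter() look backwards only.
window <- 7
rolling <- stats::filter(sessions, rep(1 / window, window), sides = 1)

# The first (window - 1) values are NA because a full window is not available yet.
plot(sessions, pch = 16, col = "grey60", xlab = "Day", ylab = "Sessions")
lines(rolling, lwd = 2)
```

A longer window gives a smoother line but reacts more slowly to real changes.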
For example, here is a regular line-chart with daily traffic data. It looks a bit rough, doesn’t it?
Doesn’t this one look a lot better?
Here is a version we built in R (the scatterplot represents the daily traffic, and the moving average cuts nicely through it):
Let’s take another example regarding conversions.
Here is a GA screenshot of transactions over ~ 4 months:
Can you spot a trend here? Without being subjective? Really?
Switch to weeks you say? Let’s see:
What if we don’t have full weeks at the beginning or the end?
Would we make decisions based on skewed data? We sure would!
How about this chart? Easier to see a trend here?
Visual is always better. Don’t act like you don’t need smart visualizations to understand your data.
You might think you know what’s going on, but you actually don’t!
Over to you now!
Here is a fine post on Lunametrics on how to create moving averages using Google Sheets. Super useful!
We built ours using Tableau. Here is a slightly stuffy tutorial on how to do it.
Decomposing & forecasting time-series
Looking at time series is useful, but breaking them down into pieces is even more fruitful.
Decomposing all-time traffic in R
In the first chart we have our actual traffic which is broken down into:
- trend: the big picture growth line
- seasonal: the repeating 7-day (weekly) pattern
- random: the random noise in our traffic
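A breakdown like this can be sketched with base R's `decompose()`. Whether our chart used exactly this function is not stated here, so treat it as an assumption; the traffic below is simulated with a trend, a weekly pattern and noise.

```r
# Simulated daily traffic: upward trend + weekly pattern + random noise.
set.seed(3)
n_days <- 7 * 52
weekly <- rep(c(150, 200, 180, 120, 60, -250, -200), length.out = n_days)
sessions <- 800 + 0.5 * (1:n_days) + weekly + rnorm(n_days, sd = 40)

# frequency = 7 tells R that the seasonal cycle repeats every 7 observations.
traffic <- ts(sessions, frequency = 7)

# decompose() splits the series into trend, seasonal and random components
# using moving averages (additive model by default).
parts <- decompose(traffic)

# Four stacked panels: observed, trend, seasonal, random.
plot(parts)
```

The trend component is NA for a few days at each end, because the moving average needs a full window around each point.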
Within a few strokes of the keyboard, we have a full picture of our traffic growth over time.
Below we have a traffic forecast for more than one year using the Holt-Winters algorithm in R:
With this chart, we try to answer the question of where we will be in one year if we continue doing the same stuff as before.
It’s not rocket science; it’s a mere projection that can help us see where we’re going.
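A projection of this kind can be sketched with base R's `HoltWinters()` and `predict()`. This is an assumption about tooling (the text names the algorithm, not the function), the traffic is simulated, and the 90-day horizon is chosen only to keep the example short.

```r
# Simulated daily traffic: upward trend + weekly pattern + random noise.
set.seed(3)
n_days <- 7 * 52
weekly <- rep(c(150, 200, 180, 120, 60, -250, -200), length.out = n_days)
sessions <- 800 + 0.5 * (1:n_days) + weekly + rnorm(n_days, sd = 40)
traffic <- ts(sessions, frequency = 7)

# Fit Holt-Winters exponential smoothing (level, trend and weekly seasonality).
fit <- HoltWinters(traffic)

# Project 90 days ahead, including upper and lower prediction bounds.
forecast_90 <- predict(fit, n.ahead = 90, prediction.interval = TRUE)

# Plot the fitted series together with the forecast and its bounds.
plot(fit, forecast_90)
```

The prediction bounds widen the further out we look, which is a useful reminder of how uncertain a long projection is.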
We can get a similar result in Tableau. Here’s the same forecast as above, but looking at months instead of days:
What about the growth mindset? Let’s try it below!
Growth instead of traffic
Looking at growth instead of actual traffic brings a whole new mentality to the table.
We are forgetting about volumes and focusing only on improvement.
Here is a graph of the actual traffic for each month side by side compared with last year:
I would dare to say it’s an SEO’s dream chart. The traffic volume is looking good on a year over year basis!
If instead, you were to ask a growth hacker or a very ambitious business leader, they would probably be looking for other styles of comparisons.
For example, a percentage showing the year over year monthly growth:
Or something like a month over month growth for more extended periods of time:
If in the first chart everything was rainbows and sunshine, the following two show a slowdown in growth.
It could represent a problem in the big picture.
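Both growth views are simple to compute. Here is a base R sketch with invented monthly session totals (not our real numbers), showing year-over-year and month-over-month growth as percentages.

```r
# Made-up monthly session totals for two consecutive years (Jan to Dec).
last_year <- c(8200, 8500, 9100, 9400, 9900, 10300,
               10100, 10600, 11200, 11900, 12800, 13500)
this_year <- c(11500, 11700, 12200, 12300, 12600, 12800,
               12300, 12700, 13200, 13800, 14600, 15200)

# Year over year: each month compared with the same month one year earlier.
yoy_growth <- (this_year - last_year) / last_year * 100
names(yoy_growth) <- month.abb

# Month over month: each month compared with the month right before it.
all_months <- c(last_year, this_year)
mom_growth <- diff(all_months) / head(all_months, -1) * 100

print(round(yoy_growth, 1))
barplot(yoy_growth, ylab = "Year-over-year growth (%)")
```

In this invented data the volumes keep rising while the year-over-year percentages shrink, which is the kind of slowdown the raw traffic chart hides.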
Watching the running total of a metric build up on a graph can be insightful for specific campaigns.
The slope of growth is also a relevant indicator in my opinion.
Below is a chart with the cumulative number of sessions vs. transactions over one year:
We can even try to split it into multiple lines like traffic sources:
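A cumulative series is just a running total, which base R provides through `cumsum()`. The sketch below uses simulated daily sessions and transactions; plotting each line as a share of its year-end total is a choice made here so that two metrics on very different scales fit on one chart.

```r
# Simulated daily sessions and transactions for one year.
set.seed(11)
sessions <- rpois(365, lambda = 1000)
transactions <- rbinom(365, size = sessions, prob = 0.025)

# Running totals: value on day N is the sum of days 1..N.
cum_sessions <- cumsum(sessions)
cum_transactions <- cumsum(transactions)

# Show each metric as a share of its own year-end total.
plot(cum_sessions / max(cum_sessions), type = "l",
     xlab = "Day of year", ylab = "Share of yearly total")
lines(cum_transactions / max(cum_transactions), lty = 2)
legend("topleft", legend = c("Sessions", "Transactions"), lty = c(1, 2))
```

A steeper stretch of the line means faster accumulation during that period.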
Growth in GA
I’ve managed to get to year over year growth reports in Google Analytics by asking some questions in the Intelligence box.
Not much wiggle room regarding time frames at the moment, but we can see the slowdown in growth with this graph as well.
I love using bubble charts to look at the overall picture.
Let’s say that we want a quick overview of traffic sources. We could start digging into this kind of a table:
Or we can quickly create a bubble chart to see the whole picture in a more visual way.
Here we have transactions vs. sessions sized by conversion rate:
On the X-axis we have the transaction volume, on the Y-axis we have the traffic, and the size of each bubble represents the transactions/user metric (similar to conversion rate, but calculated per user instead of per session).
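A bubble chart with the same axes can be sketched in base R with `symbols()`. The traffic sources and all numbers below are invented, and the column names are assumptions rather than the fields of a specific GA export.

```r
# Invented traffic-source summary.
sources <- data.frame(
  source       = c("google / organic", "(direct) / (none)",
                   "newsletter / email", "facebook / social"),
  sessions     = c(52000, 18000, 9000, 6000),
  transactions = c(1100, 520, 380, 60),
  users        = c(40000, 12000, 5000, 5200)
)
sources$transactions_per_user <- sources$transactions / sources$users

# symbols() takes circle radii, so the square root is used to make the bubble
# AREA proportional to transactions per user. The semi-transparent fill keeps
# overlapping bubbles readable.
symbols(x = sources$transactions, y = sources$sessions,
        circles = sqrt(sources$transactions_per_user), inches = 0.4,
        bg = adjustcolor("steelblue", alpha.f = 0.5),
        xlab = "Transactions", ylab = "Sessions")
text(sources$transactions, sources$sessions, labels = sources$source, cex = 0.7)
```

In this invented data the newsletter bubble is the largest even though its traffic is modest, which is exactly the kind of detail a table tends to hide.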
Not everyone knows this, but Google Analytics has a similar chart type called the motion chart.
Some might feel it’s even better since you can play it to show the movement for each day.
The main downside is that it’s built on Flash. There are compatibility issues, and some of them might even block the browser entirely.
My feeling is that motion charts could be discontinued shortly. They look cool though (if you manage to see some working ones).
You can obtain the motion functionality in Tableau as well. Here’s how it looks:
3 things we can’t do with Google Analytics
Add average or median lines through the graph:
Cluster the traffic sources and color them accordingly:
Play around with the bubble transparency to see the cramped ones:
80/20 analysis with Pareto charts
With Pareto charts we can quickly answer questions like:
- Which are my top landing pages? The ones that bring in 80% of the total traffic?
- Which are the landing pages that deliver 80% of the total conversions?
- If you are using enhanced ecommerce, you could ask: What are the top 20% of products which bring in 80% of sales/profit?
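The calculation behind such questions is short: sort descending, accumulate, and find where the running share crosses 80%. Here is a base R sketch with invented landing pages and transaction counts.

```r
# Invented transactions per landing page.
pages <- data.frame(
  landing_page = c("/", "/category/shoes", "/blog/sizing-guide", "/sale",
                   "/category/bags", "/about", "/contact", "/blog/care-tips"),
  transactions = c(420, 260, 130, 80, 45, 30, 20, 15)
)

# Sort from largest to smallest and compute the cumulative share in percent.
pages <- pages[order(-pages$transactions), ]
pages$cum_share <- 100 * cumsum(pages$transactions) / sum(pages$transactions)

# Keep every page up to and including the first one that reaches 80%.
top_pages <- pages[seq_len(which(pages$cum_share >= 80)[1]), ]
print(top_pages)

# Pareto chart: bars for volume, a line for the cumulative share (right axis).
peak <- max(pages$transactions)
mids <- barplot(pages$transactions, names.arg = pages$landing_page,
                las = 2, cex.names = 0.6, ylab = "Transactions")
lines(mids, pages$cum_share / 100 * peak, type = "b")
axis(4, at = seq(0, 1, 0.2) * peak, labels = paste0(seq(0, 100, 20), "%"))
abline(h = 0.8 * peak, lty = 2)
```

Swapping `transactions` for sessions or revenue answers the other questions in the list with the same code.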
Transactions per landing page on a Pareto chart:
We have to scroll down until we reach the 80% mark in transaction volume:
We have a very similar graph for organic traffic per landing page:
We colored them by conversion rate to make it even more insightful!
If you are looking for a way to build Pareto charts with Tableau, we recommend this quick guide.
Combining Bars with Lines
In my opinion, it’s one of the most effective types of reports. It’s simple, and it cuts through the clutter in a matter of seconds.
It can give us the power to make lightning-fast decisions.
Traffic vs conversion rate per landing page
Three key paths to follow on this viz:
- find out pages with high conversion rates and double down on them
- figure out what these pages have that the ones with low conversion rates lack, and implement changes on a macro level
- direct more traffic to those pages to see what happens; either from external sources or from within the website
Here are the people at TrenDemon putting a similar visualization to work in a superb way: Why Data Visualization Is Key in Content Marketing Analytics
Traffic vs conversion rate per browser resolution
This one is the classic CRO report. It’s the place where we should always start looking for conversion issues.
We can even filter down by device (mobile/desktop/tablet) to compare more relevant data:
Multi-channel funnels analysis
Understanding the multi-channel section could be challenging even for the best of us. There are some guides out there that could help.
But the article is not about using Google Analytics. We are on the hunt for different ways to look at this data.
We will work with the Multi-Channel Funnels Reporting API to obtain something easier to understand. Yes, its documentation is slightly different from the regular Analytics API documentation.
Sunbursting the Top Conversion Paths
What if there was a way to avoid looking at this boring way of reporting? Could we build a Google Analytics Data Visualization to escape the soul-crushing grey?
We managed to obtain this sunburst using a simple R integration:
Not only that, but we managed an interactive chart for a basic Channel Grouping Path. Here is the live version for you to play around with:
Notice the breadcrumbs and the live calculation from the center of the graph as well as the legend option.
Here is the R code we used to create this sunburst:
library(googleAnalyticsR)
ga_auth(new_user = FALSE)

start <- Sys.Date() - 33
end <- Sys.Date() - 2

#MCF
mcf_gadata <- google_analytics(id = 12345678, #use your own view ID
                               start, end,
                               metrics = c("totalConversions"),
                               dimensions = c("basicChannelGroupingPath"),
                               type = "mcf",
                               max_results = 1000,
                               sort = "-mcf:totalConversions")

#devtools::install_github("timelyportfolio/sunburstR")
library(sunburstR)

mcf_gadata <- mcf_gadata[!grepl("unavailable", mcf_gadata$basicChannelGroupingPath),]
mcf_gadata$basicChannelGroupingPath <- gsub("CLICK:", "", mcf_gadata$basicChannelGroupingPath)
mcf_gadata$basicChannelGroupingPath <- gsub(":CLICK", "", mcf_gadata$basicChannelGroupingPath)
mcf_gadata$basicChannelGroupingPath <- gsub("NA:", "", mcf_gadata$basicChannelGroupingPath)
mcf_gadata$basicChannelGroupingPath <- gsub(":NA", "", mcf_gadata$basicChannelGroupingPath)
mcf_gadata$basicChannelGroupingPath <- gsub(" > ", "-", mcf_gadata$basicChannelGroupingPath)
mcf_gadata <- mcf_gadata[nchar(mcf_gadata$basicChannelGroupingPath)<80,]

sunburst(mcf_gadata)
Smarter attribution using the Markov Model
My personal opinion is that the current Google Analytics attribution models are broken.
First touch, last touch, linear touch, etc. attributions are frankly not useful.
Google has a data-driven attribution product, which is available to premium users. There have been talks about making it available for free accounts, but you need a whole lot of data in your GA account.
Also, who has the time to wait for it? We need answers now!
Enter Markov Models
Here are some visuals trying to explain what Markov Chains are.
But what do these have to do with us trying to figure out which channels are converting?
The authors of the ChannelAttribution package for R explain it beautifully using a sports analogy:
We can look at Channel Attribution as a football match.
Channels can be viewed as players, paths as game actions and conversions as goals.
The Markov Model examines relationships between game actions to analyze the role of the player in scoring.
The last-touch approach rewards only players who scored the goals, while first-touch rewards only the players who originated the action.
Linear approach gives equal credit to every player who took part in the action, while time-decay grants subjective weights to players who took part in the action.
Basically, the Markov Model tries to calculate the actual importance of each channel.
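To make the idea a bit more concrete, here is a minimal base R sketch of the first building block of a first-order Markov model: turning conversion paths into transition probabilities between channels. The three paths and their conversion counts are invented, and the "(start)" and "(conversion)" states are naming choices made for this illustration. This is not the ChannelAttribution package's code, only a toy version of the same idea.

```r
# Invented conversion paths and how many conversions each one produced.
paths <- c("Organic > Email > Direct", "Email > Direct", "Organic > Direct")
conversions <- c(3, 2, 5)

# Break every path into consecutive (from, to) steps, adding an artificial
# start state in front and a conversion state at the end.
edges <- do.call(rbind, lapply(seq_along(paths), function(i) {
  steps <- c("(start)", strsplit(paths[i], " > ", fixed = TRUE)[[1]], "(conversion)")
  data.frame(from = head(steps, -1), to = tail(steps, -1), n = conversions[i])
}))

# Count how often each transition happens, weighted by conversions.
counts <- xtabs(n ~ from + to, data = edges)

# Divide each row by its total to get transition probabilities.
probs <- prop.table(counts, margin = 1)
print(round(probs, 3))
```

Each row shows where visitors go next from a given channel. A Markov attribution model builds on these probabilities to estimate how much each channel matters.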
Everything looks good so far. Let's move on to implementation!
You might be surprised, but there are a lot of articles out there trying to explain the process. The problem is that they don't offer real-world examples.
Slapping some code examples and offering imaginary use cases is not helping. And showcasing a GUI app that doesn't work is the cherry on top.
Here's us trying to create the graph for an e-commerce website using R:
Please read that for me!
Now let's take the data out of R and place it inside Tableau:
Now that's more like it!
The beauty of Tableau in this case is the grouping ability.
In our example, the newsletter channel had over 250 separate campaigns. Comparing each of these would be heartbreaking:
After working a bit more with the graphs, we got to this dashboard:
We can sort by attribution model and highlight a particular one. Here's the link to our demo dashboard on Tableau Public to inspire your analysis.
And here's the R code we used to create the calculation:
library(googleAnalyticsR)
ga_auth(new_user = FALSE)

start <- Sys.Date() - 33
end <- Sys.Date() - 2

#get the multi-channel data from GA
mcf_gadata <- google_analytics(id = 12345678, #replace with your view ID
                               start, end,
                               metrics = c("totalConversions","totalConversionValue"),
                               dimensions = c("sourceMediumPath","conversionType"),
                               type = "mcf",
                               max_results = 10000,
                               sort = "-mcf:totalConversions")

#clean up the data
datao <- mcf_gadata
datao <- datao[!grepl("unavailable", datao$sourceMediumPath),]
datao <- datao[grepl("Transaction", datao$conversionType),]
datao$sourceMediumPath <- gsub("CLICK:", "", datao$sourceMediumPath)
datao$sourceMediumPath <- gsub(":CLICK", "", datao$sourceMediumPath)
datao$sourceMediumPath <- gsub("NA:", "", datao$sourceMediumPath)
datao$sourceMediumPath <- gsub(":NA", "", datao$sourceMediumPath)
datao$sourceMediumPath <- gsub(" / ", "/", datao$sourceMediumPath)
datao$sourceMediumPath <- gsub("not set", "not_set", datao$sourceMediumPath)
colnames(datao) <- c('path','conversion_type', 'total_conversions','total_conversion_value')
datao$total_conversions <- as.numeric(datao$total_conversions)
datao$total_conversion_value <- as.numeric(datao$total_conversion_value)

library(ChannelAttribution)
library(reshape)
library(ggplot2)

#The Markov calculations
H <- heuristic_models(datao, 'path', 'total_conversions', var_value='total_conversion_value')
M <- markov_model(datao, 'path', 'total_conversions', var_value='total_conversion_value', order = 1)
R <- merge(H, M, by='channel_name')

# Selects only relevant columns
R1 <- R[, (colnames(R)%in%c('channel_name', 'first_touch_conversions', 'last_touch_conversions', 'linear_touch_conversions', 'total_conversion'))]
colnames(R1) <- c('channel_name', 'first_touch', 'last_touch', 'linear_touch', 'markov_model')

#plot the data inside R using ggplot2
R1 <- melt(R1, id='channel_name')
ggplot(R1, aes(channel_name, value, fill = variable)) +
  geom_bar(stat='identity', position='dodge') +
  ggtitle('TOTAL CONVERSIONS') +
  theme(axis.title.x = element_text(vjust = -2)) +
  theme(axis.title.y = element_text(vjust = +2)) +
  theme(title = element_text(size = 16)) +
  theme(plot.title=element_text(size = 20)) +
  ylab("")

R2 <- R[, (colnames(R)%in%c('channel_name', 'first_touch_value', 'last_touch_value', 'linear_touch_value', 'total_conversion_value'))]
colnames(R2) <- c('channel_name', 'first_touch', 'last_touch', 'linear_touch', 'markov_model')
R2 <- melt(R2, id='channel_name')
ggplot(R2, aes(channel_name, value, fill = variable)) +
  geom_bar(stat='identity', position='dodge') +
  ggtitle('TOTAL VALUE') +
  theme(axis.title.x = element_text(vjust = -2)) +
  theme(axis.title.y = element_text(vjust = +2)) +
  theme(title = element_text(size = 16)) +
  theme(plot.title=element_text(size = 20)) +
  ylab("")

#write the data to a .csv file
asistao = merge(R1,R2,by=c("channel_name","variable"))
colnames(asistao) <- c("channel_name","variable","count","value")
write.csv(asistao,file="asistao.csv")
Let's keep in touch!
Have a swing at them yourself and show your awesomeness in the comments!
If you are having difficulties understanding or building the visualizations, get in touch, and we'll help you!
We won't charge you anything <- read this if you're interested in how we roll.