Data sampling in Google Analytics is an interesting phenomenon that sooner or later affects every GA user who works with larger amounts of data. In a nutshell, instead of processing every single session, GA analyzes a subset of them and scales the results up to estimate the figures for the full dataset. This spares Google's servers from crunching huge volumes of data for every query while still returning reliable, credible numbers. Each GA report, however, has its own limits, and whether a report is sampled is shown by one key symbol. Read on to learn the secrets of how GA handles your data!
Data sampling limits
An enormous amount of data passes through Google's servers every day, so it would be impossible to process 100% of the data for every single user, especially since many reports have to be computed from scratch each time they are requested. But let's start from the very beginning: at what volume do the limits kick in? How much data is enough to trust a sampled number? Here are the thresholds.
Google Analytics (standard):
- 500,000 sessions at the property level (for ad hoc queries)
- 1,000,000 conversions in the "Multi-Channel Funnels" reports
- 100,000 sessions in the "flow visualization" reports
Google Analytics 360:
- 100,000,000 sessions at the view level
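To make the thresholds above concrete, here is a minimal Python sketch that checks whether a query's volume exceeds the listed limit. The dictionary keys ("standard", "360", "ad_hoc" and so on) and the function name are hypothetical labels chosen for this example, not identifiers used by Google, and the numbers are simply the ones listed above.

```python
from typing import Optional

# Thresholds from the list above. The keys are hypothetical labels for this sketch.
THRESHOLDS = {
    ("standard", "ad_hoc"): 500_000,                   # sessions, property level
    ("standard", "multi_channel_funnels"): 1_000_000,  # conversions
    ("standard", "flow_visualization"): 100_000,       # sessions
    ("360", "ad_hoc"): 100_000_000,                    # sessions, view level
}


def exceeds_sampling_threshold(count: int, product: str, report: str) -> Optional[bool]:
    """Return True if `count` is above the listed limit, False if not,
    or None when no limit is listed for this product/report pair."""
    limit = THRESHOLDS.get((product, report))
    if limit is None:
        return None
    return count > limit


if __name__ == "__main__":
    print(exceeds_sampling_threshold(750_000, "standard", "ad_hoc"))  # True
    print(exceeds_sampling_threshold(750_000, "360", "ad_hoc"))       # False
```

The same 750,000-session query would be sampled in the standard version but not in GA 360.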
How to check if data is sampled?
Open any report in your Google Analytics account. At the very top, next to the report's name, you will find a shield icon. Check its color: green means the report is based on 100% of the data, yellow means it is sampled. Clicking the shield lets you choose between "Faster response" and "Greater precision". Of course, this choice only matters for reports large enough to be sampled.
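If you pull data through the Google Analytics Core Reporting API (v3) instead of the web interface, the response carries the same information as the shield in the fields `containsSampledData`, `sampleSize` (sessions actually read) and `sampleSpace` (sessions in the full query). The sketch below assumes you already have the response parsed into a Python dict; the function name and the sample values are hypothetical.

```python
def sampling_summary(response: dict) -> str:
    """Describe the sampling level of a parsed Core Reporting API v3 response."""
    if not response.get("containsSampledData", False):
        return "100% of sessions (unsampled)"
    # int() accepts both numbers and numeric strings, since 64-bit counts
    # may arrive as strings in JSON.
    size = int(response["sampleSize"])    # sessions used for the report
    space = int(response["sampleSpace"])  # sessions the query would cover
    pct = 100 * size / space
    return f"{pct:.1f}% of sessions ({size:,} of {space:,})"


if __name__ == "__main__":
    example = {"containsSampledData": True, "sampleSize": "500000", "sampleSpace": "2000000"}
    print(sampling_summary(example))  # 25.0% of sessions (500,000 of 2,000,000)
```

In the same API version, the `samplingLevel` request parameter (`DEFAULT`, `FASTER`, `HIGHER_PRECISION`) plays roughly the role of the two options behind the shield.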
How to avoid data sampling?
The most important question to ask when looking for information in GA is "what do I actually need?" Often the best approach is to keep your queries as simple as possible: the default, standard reports are not sampled, while adding segments or secondary dimensions, building custom reports or choosing a long date range turns the request into an ad hoc query that can cross the threshold. When that happens, GA answers the query from a sample of sessions instead of the full dataset.
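One common way to keep each query under the threshold, assuming the date range is what pushes it over, is to split the period into shorter chunks, request each chunk separately and combine the results. The splitting step might look like the Python sketch below. The function name is hypothetical and no GA API call is made. Note that additive metrics such as sessions can be summed across chunks, but unique users cannot, because a user active in two chunks would be counted twice.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_date_range(start: date, end: date, days_per_chunk: int) -> List[Tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive chunks
    of at most `days_per_chunk` days each."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    chunks = []
    current = start
    while current <= end:
        # Last day of this chunk, clipped to the end of the full range.
        chunk_end = min(current + timedelta(days=days_per_chunk - 1), end)
        chunks.append((current, chunk_end))
        current = chunk_end + timedelta(days=1)
    return chunks


if __name__ == "__main__":
    for first, last in split_date_range(date(2023, 1, 1), date(2023, 1, 31), 7):
        print(first, last)  # five chunks; the last one is 2023-01-29 to 2023-01-31
```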
Sampling does not mean, however, that we see incomplete information about our users. A sufficiently large sample is usually representative of the whole, so the estimates it produces are reliable. In exchange, the system responds to our request faster and we do not have to wait indefinitely for each report.
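For a simplified picture of what happens to the numbers: if a metric is counted only in the sampled sessions, the estimate for the whole period is that count scaled by the ratio of all sessions to sampled sessions. Google does not publish its exact estimation method, so the proportional scaling below is an assumption for illustration, and the function name is hypothetical. The model also hints at why small segments and rare events are the least trustworthy in a sampled report: a handful of sampled occurrences gets multiplied by the same factor.

```python
def extrapolate(sampled_value: float, sample_size: int, sample_space: int) -> float:
    """Scale a metric counted in the sample up to the full session population.

    Simplified model: assumes the sample is representative, so the metric
    grows in proportion to sample_space / sample_size.
    """
    if sample_size <= 0 or sample_space < sample_size:
        raise ValueError("need 0 < sample_size <= sample_space")
    return sampled_value * sample_space / sample_size


if __name__ == "__main__":
    # 1,200 conversions seen in 500,000 sampled sessions out of 2,000,000
    print(extrapolate(1200, 500_000, 2_000_000))  # 4800.0
```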
Another common solution is Supermetrics, an add-on that lets you work with Google Analytics data in tools such as Google Sheets. It also offers numerous connectors for Google Data Studio, but it is a paid tool, priced by the number of users.
Are you as interested in the topic as I am? Last but not least, here is a very interesting video from the Loves Data channel that also discusses other methods of avoiding sampling.
See you next time!