If you’ve never heard of it, the back/forward cache (bfcache) is a browser optimization available in all major browsers, including Chrome (and Chromium-based browsers, version 96 and above), Firefox, and Safari. It allows for instant navigation when moving back and forth between pages. This feature significantly improves the browsing experience, especially for users on slower connections or with less powerful devices.
The way it works is that when you navigate to a new page, the browser, instead of immediately destroying the current page, keeps it in memory and pauses its JavaScript execution. This allows the page to load instantly if the user clicks the back button, as the page is restored immediately from memory.
This is beneficial in almost every imaginable way, but when it comes to measuring our site, it poses a challenge. Since JavaScript execution is paused and the page is restored as-is when the user navigates back (or forward), our scripts won’t run again, which means our tags won’t trigger or re-evaluate. That means, for example, that the page_view won’t fire. Oh boy, that’s not good; let’s see what happens on a site when the bfcache is doing its thing.
I think this is really important because it means you could be missing crucial page_view tracking on your site, or certain pixels might not be firing as expected. This could lead to gaps in your data collection, making it difficult to accurately measure user interactions and performance. Without proper tracking in place, you may miss key insights, such as user behavior on different pages or the effectiveness of your marketing campaigns. Ensuring that your ‘page_view‘ events are consistently fired across all page types—whether regular page loads, virtual pageviews, SPA sites, or bfcache-restored pages—will help guarantee that you’re capturing all the necessary data for effective analysis and optimization.
As shown below, users navigating through the site won’t trigger any page_view events (or any other events), meaning that I may be missing views of my blog content :(.
There are some ways of checking whether this is happening to us. The easiest is just checking the console, because Chrome, at least, will let us know when a page is being restored from the bfcache.
How BFCache Works
Let’s take a look at how the cache works—it’s a bit different from other caching systems we’re typically used to. The bfcache is a full snapshot of the entire page, including the JS heap, and it gets fully restored on user navigation.
You might wonder what happens to the code running on the site, such as promises or setTimeouts. It’s important to know that these will also be paused, along with most other tasks in queues.
It’s also important to keep in mind that iframes contained within the current page are not eligible for the bfcache.
Testing BFCache Behaviour
We can also use Developer Tools to force this behavior by going to the Application tab, then navigating to Background Services > Back/forward Cache.
I want my pages to be fully tracked
We need to understand that the Page Lifecycle API internally tracks the current page state, allowing us to know when a page is being frozen (Freeze Event) or resumed (Resume Event), as shown in the Page Lifecycle API State Flow below:
These events are emitted to the document, so we can listen to them by adding a simple listener, as shown in the code snippets below.
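A minimal listener setup could look like the sketch below. The describeLifecycleEvent helper is my own (not part of any API), used here just to keep the handlers readable:

```javascript
// Hedged sketch: wiring up the Page Lifecycle freeze/resume events.
// describeLifecycleEvent is a hypothetical helper for logging only.
function describeLifecycleEvent(type) {
  if (type === 'freeze') return 'page frozen (it may now enter the bfcache)';
  if (type === 'resume') return 'page resumed (unfrozen or restored from the bfcache)';
  return 'unknown lifecycle event: ' + type;
}

// Guarded so the snippet is harmless outside a browser.
if (typeof document !== 'undefined' && document.addEventListener) {
  document.addEventListener('freeze', function () {
    console.log(describeLifecycleEvent('freeze'));
  });
  document.addEventListener('resume', function () {
    console.log(describeLifecycleEvent('resume'));
  });
}
```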
Also, since Chrome 68 (according to the docs), we can even know whether the current page was discarded from memory while hidden by checking the wasDiscarded flag on the current document.
if (document.wasDiscarded) {
  // The page was discarded while in a hidden state
}
Tracking my pages when they’re restored from bfcache
Luckily for us, the Page Lifecycle API provides the pageshow event, which triggers when a page is loaded, either for the first time or when it is restored from the back/forward cache (bfcache).
Key points about the pageshow event
It is triggered when a page is shown, which can occur when navigating to a page for the first time or when returning to a cached page.
The event includes a persisted property that indicates whether the page was restored from the bfcache (back/forward cache). If persisted is true, the page was restored from the bfcache.
Based on this, we can easily add a piece of code to monitor whether the page is being restored from the bfcache. For example, we can push a page_view event to our dataLayer or trigger a gtag page_view event directly. Of course, you’ll be able to customize this for your specific needs.
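A minimal sketch of this idea is shown below. buildPageViewEvent is a hypothetical helper; adapt the pushed payload to whatever your dataLayer schema expects:

```javascript
// Hedged sketch: push a page_view on every pageshow, flagging bfcache restores.
// buildPageViewEvent is our own helper, not part of any API.
function buildPageViewEvent(persisted) {
  return {
    event: 'page_view',
    bfcache_restored: persisted === true // true only on bfcache restores
  };
}

if (typeof window !== 'undefined' && window.addEventListener) {
  window.dataLayer = window.dataLayer || [];
  window.addEventListener('pageshow', function (event) {
    // Fires on first loads (persisted === false) and bfcache restores (persisted === true)
    window.dataLayer.push(buildPageViewEvent(event.persisted));
  });
}
```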
This is personal advice: even if you don’t have an SPA site or don’t worry about the bfcache, it’s still a great idea to set up your system to push a ‘page_view’ event every time. Doing so will not only help you establish a common, shared logic for handling regular page loads, virtual pageviews, SPA sites, and bfcache-restored pages, but it will also allow you to avoid race conditions (e.g., by not relying on triggers like ‘All Pages’, DOM Ready, or CMP callbacks). The key is that every time a ‘page_view’ event is fired, it should contain all the necessary data in the dataLayer, giving you a standard and safe way to know when to trigger things.
I’ve released a new library named “Return Of The CustomTask” which, as the name itself suggests, brings back the Universal Analytics customTask functionality to Google Analytics 4.
It’s an open-source library under the Apache 2.0 license that uses fetch interceptors to mimic the behavior of our old friend the customTask: the idea is having the chance to modify the current GA4 payload before it gets sent to the Google Analytics servers.
The library consists of a single file that accepts a list of callbacks (customTasks) that will be applied to our requests.
These tasks will be applied sequentially, meaning you can easily apply more than one action, such as checking for PII and removing duplicate purchase events.
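As a hedged sketch of what one such task could look like, here is a task that redacts email-looking values from every event parameter. The requestModel shape ({ endpoint, sharedPayload, events }) follows the library’s interceptor; the task name and regex are mine:

```javascript
// Hypothetical custom task: redact email-looking values from event parameters.
var EMAIL_PATTERN = /[^\s@&=]+@[^\s@&=]+\.[^\s@&=]+/g;

function scrubEmailsTask(requestModel) {
  requestModel.events = requestModel.events.map(function (event) {
    var cleaned = {};
    Object.keys(event).forEach(function (key) {
      cleaned[key] = String(event[key]).replace(EMAIL_PATTERN, '[redacted]');
    });
    return cleaned;
  });
  return requestModel; // tasks must always return the (possibly modified) model
}
```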
At the same time, I took some time to find all the possible custom tasks by searching on Google, and I’ve already replicated them and made them available for everyone. In many cases I’ve even made them better than the originals 🙂
I must advise that this is a very technical approach to getting things done, so use it at your own risk. If you’re not a developer, consider seeking help rather than just copying and pasting. There are many great analytics engineers and programmers out there (including myself) who will be able to help you get things set up in the best and safest way.
Note: In the coming days, I will be writing specific posts for each of the tasks to ensure that their usage is clear for everyone. In any case, each task folder on GitHub has a README with the basic details to help set things up.
First Step: Grab the GA4CustomTask code
After building the library you’ll find all the code within the dist/ folder. The code is provided in both minified and non-minified form. Since you’re not likely to need to change anything here, I would pick the dist/GA4CustomTask.js code (the minified version is the better choice). Now the only thing we need to do is add it to a Custom HTML tag in Google Tag Manager, in another TMS, or in your page source.
If you’re using this library without GTM, or with any other TMS, the logic is the same: fire it before the gtag code.
<script>
// dist/GACustomTask.js
// Use the link at the top for updated code
(function (global, factory) {
  typeof exports === 'object' && typeof module !== 'undefined'
    ? module.exports = factory()
    : typeof define === 'function' && define.amd
      ? define(factory)
      : (global = typeof globalThis !== 'undefined' ? globalThis : global || self, global.GA4CustomTask = factory());
})(this, (function () {
  'use strict';

  // Check if the URL belongs to GA4
  function isGA4Hit(url) {
    try {
      var urlObj = new URL(url);
      var params = new URLSearchParams(urlObj.search);
      var tid = params.get('tid');
      var cid = params.get('cid');
      var v = params.get('v');
      return !!tid && tid.startsWith('G-') && !!cid && v === '2';
    } catch (e) {
      console.error('Error parsing URL:', e);
      return false;
    }
  }

  var interceptors = [];

  // Interceptor function to handle fetch requests and responses
  function interceptor(fetch, args) {
    var reversedInterceptors = interceptors.reduce(function (array, interceptor) {
      return [interceptor].concat(array);
    }, []);
    var promise = Promise.resolve(args);
    // Apply request interceptors (resolve to FetchArgs)
    reversedInterceptors.forEach(function (_a) {
      var request = _a.request, requestError = _a.requestError;
      if (request || requestError) {
        promise = promise.then(function (args) {
          return (request ? request.apply(void 0, args) : args);
        }, requestError);
      }
    });
    // Proceed with the original fetch call (resolve to Response)
    var responsePromise = promise.then(function (args) {
      return fetch(args[0], args[1]);
    });
    // Apply response interceptors (resolve to Response)
    reversedInterceptors.forEach(function (_a) {
      var response = _a.response, responseError = _a.responseError;
      if (response || responseError) {
        responsePromise = responsePromise.then(response, responseError);
      }
    });
    return responsePromise;
  }

  var GA4CustomTask = function (settings) {
    if (!settings) return;
    interceptors.push({
      request: function (resource, options) {
        if (options === void 0) { options = {}; }
        try {
          if (typeof resource === 'string' && isGA4Hit(resource)) {
            var url = new URL(resource);
            var RequestModel_1 = {
              endpoint: url.origin + url.pathname,
              sharedPayload: null,
              events: [],
            };
            var payloadArray = Array.from(new URLSearchParams(url.search).entries());
            if (!options.body) {
              RequestModel_1.sharedPayload = Object.fromEntries(payloadArray.slice(0, payloadArray.findIndex(function (_a) { var key = _a[0]; return key === 'en'; })));
              RequestModel_1.events = [Object.fromEntries(payloadArray.slice(payloadArray.findIndex(function (_a) { var key = _a[0]; return key === 'en'; })))];
            } else {
              RequestModel_1.sharedPayload = Object.fromEntries(payloadArray);
              RequestModel_1.events = options.body.split('\r\n').map(function (e) {
                return Object.fromEntries(new URLSearchParams(e).entries());
              });
            }
            var payload = Object.fromEntries(new URLSearchParams(url.search));
            if (settings.allowedMeasurementIds && Array.isArray(settings.allowedMeasurementIds) && !settings.allowedMeasurementIds.includes(payload['tid'])) {
              return [resource, options];
            }
            if (Array.isArray(settings.tasks)) {
              settings.tasks.forEach(function (callback) {
                if (typeof callback === 'function') {
                  RequestModel_1 = callback.call({ originalFetch: GA4CustomTask.originalFetch }, RequestModel_1);
                } else {
                  console.warn('Callback is not a function:', callback);
                }
              });
            }
            var reBuildResource = function (model) {
              var resourceString = new URLSearchParams(model.sharedPayload || {}).toString();
              var bodyString = model.events.map(function (e) {
                return new URLSearchParams(e).toString();
              }).join('\r\n');
              return {
                endpoint: model.endpoint,
                resource: resourceString,
                body: bodyString,
              };
            };
            var newResource = reBuildResource(RequestModel_1);
            if (options.body) {
              resource = "".concat(newResource.endpoint, "?").concat(newResource.resource);
              options.body = newResource.body;
            } else {
              resource = "".concat(newResource.endpoint, "?").concat(newResource.resource, "&").concat(newResource.body);
            }
          }
        } catch (e) {
          console.error('Error in fetch interceptor:', e);
        }
        return [resource, options];
      },
      response: function (response) {
        return response;
      },
      responseError: function (error) {
        return Promise.reject(error);
      },
    });

    // Ensure fetch is available in the environment
    window.fetch = (function (fetch) {
      return function (resource, options) {
        var fetchArgs = [resource, options];
        return interceptor(fetch, fetchArgs);
      };
    })(window.fetch);

    return {
      clear: function () {
        interceptors = [];
      },
    };
  };

  // Add original fetch for TypeScript type safety
  GA4CustomTask.originalFetch = window.fetch;

  return GA4CustomTask;
}));
</script>
We are on the right path. We now have a new class, GA4CustomTask, that we can instantiate to attach the interceptors to the Fetch API.
We need to focus on the highlighted lines. This tool operates on the fetch function, but typically we only want to intercept GA4 hits. Don’t worry: the tool already detects these hits internally so that only the requests we need are intercepted. However, what happens if we’re using two Measurement IDs on our site? On line 3, we can specify which Measurement ID the custom task should apply to.
Then we can define the tasks that will be applied to our payload. With GA4CustomTask it’s possible to run several chained tasks, and they will be applied sequentially. (Each customTask receives the requestModel and returns it back after the task has finished working with it.)
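A minimal instantiation sketch, assuming the library file above has already been loaded (so GA4CustomTask exists), using a pass-through task that just logs the model before returning it untouched:

```javascript
// Hypothetical pass-through task: log the hit, change nothing.
var logHitTask = function (requestModel) {
  console.log('GA4 hit to', requestModel.endpoint, 'with', requestModel.events.length, 'event(s)');
  return requestModel; // always hand the model back to the chain
};

// Guarded so the snippet only wires up when the library is present.
if (typeof GA4CustomTask === 'function') {
  var GA4CustomTaskInstance = new GA4CustomTask({
    tasks: [logHitTask]
  });
}
```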
Custom Tasks List
I went ahead and migrated all the customTasks I found on the internet to this new library. You can find the list of them and the source code in the repository, as packages in the /tasks folder.
This task reads the clientId value and passes it back to all the events on the request, or only to the first event if the defined scope is ‘user’.
It accepts 2 parameters: the name to be used for the event parameter / user property, and the scope. If the scope is not specified, it defaults to ‘event‘.
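A hedged sketch of what this task’s logic might look like internally (the published task’s actual code may differ): copy the cid from the shared payload into the events as an event parameter (ep.) or user property (up.):

```javascript
// Hypothetical re-implementation of the clientId task's core logic.
function addClientIdSketch(requestModel, paramName, scope) {
  scope = scope || 'event'; // default scope per the task's description
  var clientId = (requestModel.sharedPayload || {}).cid;
  if (!clientId) return requestModel;
  var prefix = scope === 'user' ? 'up.' : 'ep.';
  requestModel.events.forEach(function (event, index) {
    if (scope === 'user' && index > 0) return; // 'user' scope: first event only
    event[prefix + paramName] = clientId;
  });
  return requestModel;
}
```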
This task intercepts all hits containing at least one purchase event in the payload. If the current ep.transaction_id parameter value has already been used in the current browser, that specific event will be removed from the request.
This task relies on cookies and localStorage to keep the transaction history, and it internally keeps both state stores synched, meaning that if the user removes their cookies but not localStorage, the data will be replicated back to the cookie (and likewise if they remove localStorage).
It takes the cookie name as an optional value, defaulting to __ad_trans_dedup.
This task takes the GA4 payload and sends a copy to the defined Snowplow collector endpoint. Since Snowplow expects one event per request, this task generates an individual request for each event in the payload (keeping the shared parameters intact).
You can pass the endpoint Hostname as a parameter.
What to say about this one, a classic. It replicates our requests to secondary Measurement IDs, but this time it takes 2 extra parameters: a list of whitelisted events and a list of blacklisted ones (the latter only takes effect if the whitelist is not passed or is empty).
// This will relay ALL the events
var GA4CustomTaskInstance = new GA4CustomTask({
  allowedMeasurementIds: ["G-DEBUGEMALL"],
  tasks: [
    (requestModel) => sendToSecondaryMeasurementIdTask(requestModel, ["G-SECONDID", "G-ANOTHER"], [], []),
  ]
});

// This will relay only the add_to_cart and purchase events
var GA4CustomTaskInstance = new GA4CustomTask({
  allowedMeasurementIds: ["G-DEBUGEMALL"],
  tasks: [
    (requestModel) => sendToSecondaryMeasurementIdTask(requestModel, ["G-SECONDID", "G-ANOTHER"], ["add_to_cart", "purchase"], []),
  ]
});

// This will relay all events but purchase events
var GA4CustomTaskInstance = new GA4CustomTask({
  allowedMeasurementIds: ["G-DEBUGEMALL"],
  tasks: [
    (requestModel) => sendToSecondaryMeasurementIdTask(requestModel, ["G-SECONDID", "G-ANOTHER"], [], ["purchase"]),
  ]
});
This task strips out all the parameters that are not strictly related to analytics, in case we are worried about our privacy and the data going to Google. Useful if we are tracking an intranet or some sensitive environment and we want some extra privacy.
You can find the list of current parameters on the repository
Lastly (for now), we have our Bounce Task. We can define a list of events that we want to allow through our implementation, preventing all those pesky vendors and script kiddies from pushing events to the gtag() function to mess with our data.
But not only that, we can define which parameters we want to allow (WhiteListedEventParameters), which will strip out any parameter that is not listed from the current event.
The function takes a schema definition object to work with.
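A hedged sketch of the bouncing idea under an assumed schema shape ({ eventName: [allowedParameterKeys] }); check the task’s README for the real format. Unlisted events are dropped and unlisted parameters stripped:

```javascript
// Hypothetical re-implementation of the bounce logic; schema shape is assumed.
function bounceSketch(requestModel, schema) {
  requestModel.events = requestModel.events
    .filter(function (event) {
      // Drop any event that is not declared in the schema
      return Object.prototype.hasOwnProperty.call(schema, event.en);
    })
    .map(function (event) {
      var allowed = schema[event.en];
      var cleaned = { en: event.en };
      Object.keys(event).forEach(function (key) {
        // Keep only whitelisted parameters for this event
        if (allowed.indexOf(key) !== -1) cleaned[key] = event[key];
      });
      return cleaned;
    });
  return requestModel;
}
```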
Disclaimer: I got a report that this may break some Google Ads integrations. I’m gathering details to understand the issue and try to find a workaround (if possible).
One of the most recurrent issues when trying to keep our data clean is having to fight the spammers and script kiddies who pollute our properties. This is not something new; it’s been around since the beginning of time.
This was an even bigger issue in Universal Analytics, where property IDs were sequential. This made it easy for malicious actors to systematically spam all accounts with minimal effort. With the transition to Google Analytics 4 (GA4), the new Measurement IDs are no longer sequential. This change makes it significantly more difficult to indiscriminately spam all accounts, as the previous method of programmatically targeting consecutive IDs is no longer feasible.
Still, we’re not safe from someone looking at the network requests, or even from people crawling websites and building a list of the Measurement IDs in use, leaving our properties with unwanted data.
Sadly, Google Analytics 4 (GA4) doesn’t have good controls to stop or filter out unwanted data from coming into your account.
Due to the nature of analytics tracking, which happens mainly on the client side, there’s no way to fully stop this spam. But we can take some actions to try to mitigate the issue, and we’re going to talk about the most important one from my point of view. This is:
Not Allowing spammers to know our real Measurement ID
Last month my company (Analytics Debugger) became a Stape.io partner, since we started to offer server-side services, which is allowing me to start playing more with the technology. And luckily, the spam attack reported above makes the perfect introduction for a new server-side trick.
Running GTM/GA4 via SGTM in Stealth Mode.
We’re using a Client template to make our Measurement ID invisible to attackers. This will keep away the script kiddies that crawl our sites, since they’ll be getting a fake ID, and the template will take care of forwarding the final request with the proper ID.
“But David, they could still point to our server-side endpoint and the hits would end up hitting our property.” And that’s right! But since we’re running a server-side endpoint, we can enforce some rules (I’m working on refactoring some old PHP code I have into an SGTM template) to prevent the spam. At this point we have a lot of rules available to fight spam traffic, for example:
User Agent Checking
Checking the request IP address against the associated ASN (Autonomous System Number). Most ISPs and data-center providers have one, so it’s an easy task to filter out non-residential connections using this method. There are even some open IP databases (this information is provided by RIPE).
But not only this: since we’re on the server side, we can easily keep a rolling list of the last 15 minutes of requests and build some throttling mechanism, or we could also check the IP geolocation. Let’s be honest: I’m from Spain, so getting too much traffic from certain countries would be unusual. Best of all, we could even build an internal IP score, allowing us to tag spam traffic (without even removing it), for example assigning the &tt= parameter via SGTM if the current score is > 5 (you’ll need to set up the rules).
In any case, I know some other people have talked about this in the past. I feel this could be a great topic for a hackathon with the smart people around (maybe at the Analytics DevFest it becomes a reality 🙂).
So, filtering hits has been a long-debated topic, where Server-Side can play an important role. However, if we don’t hide our Measurement ID, spammers can directly target GA4 servers and bypass any implemented checks.
Let me start by showing how your setup will look. Please note that a possible attacker will only see our GTM-DEBUGEMALL and G-DEBUGEMALL IDs. At no point will they be able to target Google’s servers using the real IDs 🙂
But our server side still sends the data to Google, with the &tid replaced by the real one:
Implementing this involves setting up a server-side Client template, which you can download from here: GTM/GA4 Stealth Mode Client Template, and then setting the real and fake IDs you want to use in your setup. Refer to the following screenshot for guidance:
The last step is slightly updating our GTM loading snippet. You may notice that there’s an extra /s/ before gtm.js; this is because SGTM doesn’t seem to allow you to claim requests to the known GTM/GA endpoints (or I was not able to do it…).
The template still needs to be improved, and I don’t consider it production-ready (despite the fact that I’m using it on this very blog).
Essentially, we can utilize all our methods to combat spam, as we have been doing, while ensuring that the ID remains concealed to prevent attackers from circumventing any measures aimed at safeguarding our data integrity.
Disclaimer: This might stem from a misunderstanding on my side about how GA4 tags function within Google Tag Manager (GTM), especially compared to how configuration tags used to work in Universal Analytics. The differences between these systems can be subtle and confusing. This issue has been quite frustrating, so I’m sharing my experience in case others encounter the same problem.
I was working on my Google Analytics 4 implementation via Google Tag Manager Server Side and I noticed that some of the hits were going through the regular endpoint (*.google-analytics.com) for some reason.
After some testing I figured out the issue. Although Google Tag Manager shows the associated Google Tag on your GA4 tags, it doesn’t take any action to make sure the config settings are applied to the related tags. This means that any tag firing before the Google Tag won’t have the needed configuration: GTM won’t apply the settings to the linked GA4 tags.
Having this message in the interface makes it somehow confusing:
So, for example, if the dataLayer event that triggers your event tag arrives in the dataLayer before the event that triggers the GA4 Configuration tag, the server_container_url will be ignored, and your first hit won’t go through your server-side endpoint.
This means that if you are firing a GA4 event tag based on a trigger that happens before the trigger that fires the GA4 config tag, those hits won’t go through the SGTM endpoint.
I prepared a test environment with the simplest replication setup: just a page_view event tag that fires on the page_view event, and the GA4 config tag that fires on the All Pages (gtm.js) event.
This is how the tags and triggers look in this container:
In my case the issue was not a big deal, since I’m just sending the data through the server-side endpoint and not doing anything else, so the biggest problem I’d have is some hits being blocked by an adblocker because of the domain.
I expect people firing pixels or doing data widening based on the SGTM GA4 Client to run into more serious inconveniences, even though the data flow to Google Analytics 4 keeps working.
Google Tag Manager knows when there’s a Google Tag that references our current Measurement IDs (check the screenshot above), meaning it should be possible to grab the configuration parameters (server_url, cookie preferences, etc.) and apply them to the linked event tags (the way Universal Analytics config tags used to work…) without even needing to wait for the Google Tag execution at all, or perhaps to queue them until the related gtag has fired.
But anyway, if you are having this issue, the workarounds are pretty straightforward:
Ask your developers to move your push after GTM Snippet
Set the GA4 Config tag trigger to be the same as your first event trigger
Manually set the server_container_url for the tags that may fire before the Google Tag
Make sure the Google Tag always fires before any GA4 event tag.
Based on a comment on LinkedIn from Roman Appeltauer, there’s an extra way of getting around this:
Use the Initialization Trigger for your Google Tag
This post talks about the server-side integration issue (because that was my problem), but the same applies to user properties or event parameters set via the Google Tag not being applied to all the tags, depending on your triggers or dataLayer configuration.
When we use Google Ad Manager, the ads are served using an iframe, which makes it somewhat challenging if we want to track the impressions in another tool, such as Google Analytics 4.
This is why, on this occasion, I’ll be sharing a library to programmatically track our ad impressions and clicks.
For this task we’ll be relying on the ad tagging library for Google Ad Manager, Google Publisher Tag, and Google Analytics 4’s ecommerce promotion tracking (the view_promotion and select_promotion events).
You can identify the Google Publisher Tag library by the global variable window.googletag. Check the following snippet for the simplest possible example.
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <meta name="description" content="Display a fixed-sized test ad." />
  <title>Display a test ad</title>
  <script async src="https://securepubads.g.doubleclick.net/tag/js/gpt.js"></script>
  <script>
    window.googletag = window.googletag || { cmd: [] };
    googletag.cmd.push(() => {
      // Define an ad slot for div with id "banner-ad".
      googletag
        .defineSlot("/123456789/Debug/Em/All", [300, 250], "banner-ad")
        .addService(googletag.pubads());
      // Enable the PubAdsService.
      googletag.enableServices();
    });
  </script>
  <style></style>
</head>
The way the Google Ad Server works is that it creates an iframe with our unit_ad_id as its name or ID, and it handles querying the Google Ad Manager server for details about the ads and the creatives to be served.
If we want to track which ads are being served on our pages, the first challenge we’ll encounter is that we only have information about the ad-unit path, which essentially describes the current slot context. Typically, we also need other values, such as the current campaign and the creatives being served. Since all the rendering happens inside the iframe, we won’t be able to know when the ads have been served, or even have access to the content, because it is loading within the iframe.
So, let’s start figuring out how we’re going to track our ads. The first thing we need to know is that despite everything working inside an iframe (I know, we all hate iframes), the GPT library offers some events that will help us in our mission. (GPT Reference)
impressionViewable
rewardedSlotClosed
rewardedSlotGranted
rewardedSlotReady
slotOnload
slotRenderEnded
slotRequested
slotResponseReceived
slotVisibilityChanged
Available events on Google Publisher Tag
We want to track the ad impressions, so we’ll be using the impressionViewable event, which triggers once the banner has been shown to the user.
This is how we can add a listener for the Google Publisher Tag event.
googletag.pubads().addEventListener("impressionViewable", function (ad) {
  // Do our stuff
});
You may have noticed that the listener receives a value (which we’re naming ad) containing some extra details about the ad being shown. This is a slot object, with some methods we can use to grab info about the ad just rendered.
The primary method we need to examine is ad.slot.getResponseInformation(). This method returns the current advertiserId, campaignId, and creativeId, which we will use to compile our Promotion Impression data.
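A small sketch of how this fits into the listener shown above. The buildPromotionItem helper is mine; the field names follow GA4’s ecommerce item conventions:

```javascript
// Hypothetical helper: build a GA4 promotion item from the slot response info.
function buildPromotionItem(slotDomId, responseInfo) {
  return {
    promotion_name: slotDomId,
    promotion_id: String(responseInfo.campaignId),
    creative_name: String(responseInfo.creativeId)
  };
}

// Guarded wiring for the browser: log the item on each viewable impression.
if (typeof googletag !== 'undefined' && googletag.cmd) {
  googletag.cmd.push(function () {
    googletag.pubads().addEventListener('impressionViewable', function (ad) {
      var info = ad.slot.getResponseInformation();
      if (info) console.log(buildPromotionItem(ad.slot.getSlotElementId(), info));
    });
  });
}
```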
But not only this: we also want to track where the ads point to, i.e., the clickURL. Sadly, none of the methods will give us that information, but we can use the ad.slot.getHtml() method, which returns the current ad’s markup, and then use some regex to extract the details.
We will follow this logic
Extract all the a href links from the HTML
Filter these href links for those that have an adurl parameter
This is how we can extract the destination URL for our banner:
var getAdUrl = function (adHtmlString) {
  // Step 1: Extract all 'a' tags with 'href' attributes
  var aTagRegex = /<a\s+[^>]*href\s*=\s*["']?([^"'>\s]+)["']?[^>]*>/gi;
  let matches;
  var hrefs = [];
  while ((matches = aTagRegex.exec(adHtmlString)) !== null) {
    hrefs.push(matches[1]); // Capture the 'href' value
  }
  // Step 2: Filter hrefs that contain the 'adurl' parameter
  var adurlHrefs = hrefs.filter(href => href.includes('adurl='));
  // Step 3: Extract the 'adurl' parameter value from these hrefs
  var adurlValues = adurlHrefs.map(href => {
    var urlParams = new URLSearchParams(href.split('?')[1]);
    return urlParams.get('adurl');
  });
  if (adurlValues.length > 0) return adurlValues[0];
};

var adHtmlString = ad.slot.getHtml();
console.log(getAdUrl(adHtmlString));
Let’s quickly recap. Now we can capture the ad-unit path, the campaignId, the creativeId, and the destination URL for the impressions of our ads, which provides us with valuable information. This means we can build a dataLayer.push({}) with a view_promotion event to track our impressions. Alternatively, we can send this information to any other tool you prefer. We’ll be using a Google Tag Manager (GTM) dataLayer.push, as it is the most commonly used tag management system (TMS). Feel free to adapt this core implementation for your needs, or reach out to me for help in customizing it to your site’s requirements.
The next challenge is: “How do we track the clicks on the iframe?“. The <iframe> element doesn’t have a click event, which doesn’t help our task, but luckily we can detect clicks within iframe.contentWindow, since the iframe isn’t hosted on a remote domain (it has no src).
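A hedged sketch of this click hook. It takes the document object as a parameter so it is easy to test; in a page you would call it with the real document once the slot has rendered, and slotElementId would be the ad slot’s container div id:

```javascript
// Hypothetical helper: attach a click handler inside the ad's iframe.
function attachAdClickListener(doc, slotElementId, onClick) {
  var iframe = doc.querySelector('#' + slotElementId + ' iframe');
  if (!iframe || !iframe.contentWindow) return false;
  // This only works because the GPT iframe has no src: it is same-origin,
  // so its inner document is accessible from the parent page.
  iframe.contentWindow.document.body.onclick = onClick;
  return true;
}
```

The full library below wires exactly this kind of onclick handler to a select_promotion dataLayer push.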
At this point, we should know how to detect when an impression has been rendered, extract all the details about the ad’s impression, and track the clicks on the ads. We have everything we need; it’s just a matter of putting everything together to create a fully functional tracking solution.
To make the process easier for everyone, I’ve started a new GitHub repository that provides a simple, ready-to-use library. This library will push the data as an Ecommerce Promotion event to the GTM dataLayer.
(function () {
  // Copyright Analytics Debugger S.L.U. All Rights Reserved. 2024
  // Author: David Vallejo
  // Date: 2024-06-21
  // Name: Google Publisher Tag Impressions and Clicks Tracker
  // This file is licensed under the MIT License.
  // License text available at https://opensource.org/licenses/MIT

  // Add ?ad_debug=1 for debugging messages
  var debug = document.location.search.includes('ad_debug=1') ? true : false;
  // Init googletag variable, jic
  window.googletag = window.googletag || { cmd: [] };
  // Init dataLayer variable, jic
  window.dataLayer = window.dataLayer || [];
  // Our variable for holding the impressions info. Used for grabbing the data on click
  window.__gpt_impressions = window.__gpt_impressions || [];

  // Helper function to extract the ad destination URL from the Ad HTML
  var getAdUrl = function (adHtmlString) {
    // Step 1: Extract all 'a' tags with 'href' attributes
    var aTagRegex = /<a\s+[^>]*href\s*=\s*["']?([^"'>\s]+)["']?[^>]*>/gi;
    let matches;
    var hrefs = [];
    while ((matches = aTagRegex.exec(adHtmlString)) !== null) {
      hrefs.push(matches[1]); // Capture the 'href' value
    }
    // Step 2: Filter hrefs that contain the 'adurl' parameter
    var adurlHrefs = hrefs.filter(href => href.includes('adurl='));
    // Step 3: Extract the 'adurl' parameter value from these hrefs
    var adurlValues = adurlHrefs.map(href => {
      var urlParams = new URLSearchParams(href.split('?')[1]);
      return urlParams.get('adurl');
    });
    if (adurlValues.length > 0) return adurlValues[0];
  };

  // Adding the impression Listener
  googletag.cmd.push(function () {
    googletag.pubads().addEventListener("impressionViewable", (event) => {
      // We have an impression, let's get the holder iframe reference and add the click event.
      document.querySelector('#' + event.slot.getSlotElementId() + ' iframe').contentWindow.document.body.onclick = function (e) {
        var impressionInfo = window.__gpt_impressions.filter(function (e) {
          if (e.promotion_name === event.slot.getSlotElementId()) return true;
        });
        window.dataLayer.push({
          event: 'select_promotion',
          ecommerce: { items: [impressionInfo] }
        });
        if (debug === true) console.log("GPT AD CLICK", impressionInfo);
      };
      var slotDetails = event.slot.getResponseInformation();
      try {
        var impressionInfo = {
          promotion_name: event.slot.getSlotId().getDomId(),
          promotion_id: slotDetails.campaignId.toString(),
          creative_name: slotDetails.creativeId.toString(),
          creative_slot: getAdUrl(event.slot.getHtml())
        };
        window.dataLayer.push({
          event: 'view_promotion',
          ecommerce: { items: [impressionInfo] }
        });
        window.__gpt_impressions.push(impressionInfo);
        console.log("GPT AD IMPRESSION", impressionInfo);
      } catch (e) {
        if (debug === true) console.log("GPT ERROR GRABBING IMPRESSION DETAILS: ", e);
      }
    });
  });
})()
The earlier code snippet is responsible for pushing the necessary data to Google Tag Manager. At this stage, you just need to add the appropriate tags and triggers in Google Tag Manager to ensure the data flows into your account.
This code has only been tested on Chromium-based browsers. While the impressions should work across all browsers, we are unsure of how Safari and Firefox will behave with regard to the clicks.
If you’ve ever worked on a project and needed to debug your company’s or client’s app, I’m sure you already know how frustrating it can be to audit the implementation. I’m sure you know what I’m talking about. Not having a public and available namespace and API, as we do in browsers, can make things really hard, even more so if you don’t have some technical skills.
Circa 2016 I presented ( well, Sir Yehoshua Coren did on my behalf ) at the SuperWeek‘s Punchcard Prize a tool for debugging apps using a small hardware device ( an rPi-like board ), which was essentially able to intercept anything connected through the Wi-Fi network this device was generating. As simple as that sounds, running on a piece of hardware made the prospect of broadly offering this to everyone somewhat difficult. At the same time, turning it into a commercial product was not an option due to my limited business and communication skills at that point.
Analysts, particularly those in data analytics and data science roles, don’t need an exhaustive understanding of how everything functions. For instance, they don’t require knowledge of JavaScript to examine how a video player is tracked, and there’s no necessity for them to be familiar with Kotlin to review the events tracked within an app. However, I firmly believe they should have the capability to check if, at the very least, the tracking is occurring. This is where my tools come into play, and this is where Analytics Debugger for Apps was born, enabling anyone to effortlessly inspect what’s happening in their analytics implementations.
My aim was clear: to democratize Apps Debugging, making the debugging task accessible to individuals of all technical backgrounds.
Back in 2019, with the upcoming release of App+Web ( now turned into Google Analytics 4 / Firebase for GA ), I began experimenting with the idea of creating a tool that would mimic the reports of the current Chrome extension, regardless of the platform being debugged. Before long, I developed “Android デバッガ”, a debugger for Firebase/Universal Analytics on Android, as the initial step; it was later expanded to include iOS debugging and renamed to “iOS/Android Debugger“.
I made a list of points that the tool should comply with:
No App Modifications Needed: The debugging should work ‘out of the box’ without the need to recompile packages or add external libraries.
Reports should be vendor-specific, not just a simple output of requests, in order to make the data understandable for any analyst or marketer.
It ought to be made available for free.
Open to the inclusion of new vendors.
At this stage of the venture, I’m happy to say that I’ve met all the objectives and am ready for the next steps. You might see a pattern here, but the first thing I did was change the name (once again). It’s quite in line with previous name changes. So, say hello to “Analytics Debugger for Apps“.
The debugger remains FREE for everyone as long as you stay subscribed to the newsletter (by the way, I’ve sent 0 emails in 10 years…). That’s the only requirement to use the tool. You can create an account and then a license at https://app.analytics-debugger.com at no cost.
You might be wondering if there’s a catch to this, but the answer is straightforward: there isn’t one. I’m simply sharing one of my internal tools to help people. Not sure how long I will be doing it, so for now let’s just enjoy the moment 🙂
You wouldn’t believe how time-consuming tool development can be, especially when going solo ( backend, frontend, testing, websites, docs, social, updates, support, backups, APIs, design ). Therefore, the tool is open to sponsorships. Sponsors will not encounter any nag screens or experience delays while using certain features, such as exporting data to the clipboard.
Due to Android’s limitations, not all vendors are supported (though this may change in the near future 😉), making iOS App debugging the more feature-rich version. Featured vendors are: Firebase Analytics, Universal Analytics (to be deprecated), Snowplow, and Adobe Analytics. As I mentioned before, I do this in my free time. While I would love to add more features, this is where sponsorships come into play, providing me with the resources to dedicate more time to working on the tools.
I’m amazed that over 1,500 fellow analysts have used the debugger so far, especially considering the minimal marketing efforts I’ve put into it. Huge thanks to everyone!
As expected, in the last months many vendors and providers have started to add Google Analytics 4 integrations, and many of them may just push events to the GTAG wrapper function, so you’ll likely end up having some unwanted events in your reports.
It’s not only about vendors: spammers have an easy way to programmatically mess with your data, just by using the global objects.
We’ll learn some implementation tricks for preventing any other tools from polluting our GA4 data, and also how we can ensure that nobody but us sends data to our dataLayer. As usual, I’ll be using examples for Google Tag Manager and Google Analytics 4, but the same logic can be applied to any other tool.
Protecting GTAG from Bot and Vendor Pollution
In order to protect our setup from unwanted events or pushes, we’ll slightly modify our GTAG calls. The first modification is adding a guard check on the GTAG wrapper.
<script async src="https://www.googletagmanager.com/gtag/js?id=G-THYNGSTER"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
// Guard, first argument is not our protection token?, skip
if (arguments[0] !== 'protectToken') return;
// This function will allow us to pass back an Arguments object to the dataLayer
function passArgumentsBack() {
dataLayer.push(arguments)
}
// Remove the first argument and pass the rest back
passArgumentsBack.apply(this, Array.prototype.slice.call(arguments, 1));
}
gtag('protectToken', 'js', new Date());
gtag('protectToken', 'config', 'G-THYNGSTER');
</script>
Now, any gtag function call where the first argument is not our “protectToken” will be blocked, so any vendor or bot that tries to push data to our namespace will just be stealthily ignored.
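As a self-contained illustration of the guard (the wrapper from above is repeated so the snippet runs on its own; the `root` indirection just lets it also run outside a browser):

```javascript
// Guarded gtag wrapper, as shown above
var root = typeof window !== 'undefined' ? window : globalThis;
root.dataLayer = root.dataLayer || [];
function gtag() {
  // Guard: first argument must be our protection token
  if (arguments[0] !== 'protectToken') return;
  function passArgumentsBack() {
    root.dataLayer.push(arguments);
  }
  // Remove the token and pass the rest back
  passArgumentsBack.apply(this, Array.prototype.slice.call(arguments, 1));
}

gtag('event', 'purchase');                 // blocked: no token, silently ignored
gtag('config', 'G-THYNGSTER');             // blocked as well
gtag('protectToken', 'event', 'purchase'); // accepted, lands on the dataLayer
```

Only the last call ends up in the dataLayer; the first two are dropped without any error, which is exactly what we want for uninvited vendors.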
It may happen that you cannot modify the gtag function on creation, maybe because it’s hardcoded on the page or because someone else has already initialized it. Don’t worry, you can run this code to override the current method.
if (window.gtag && {}.toString.call(window.gtag) === '[object Function]') {
window.gtag = function() {
if (arguments[0] !== 'protectToken') return;
function passArgumentsBack() {
dataLayer.push(arguments)
}
passArgumentsBack.apply(this, Array.prototype.slice.call(arguments, 1));
}
}
You have to remember that now you need to append your protectionToken to any call you want to push.
gtag('protectToken', ... )
Protecting our dataLayer from Bot and Vendor Pollution
I bet you may have already thought about just adding a custom event name to all your pushes and then adding a blocking condition in GTM, and that’s indeed a great idea. But this time we’re not trying to block events being pushed; we’re trying to stop our dataLayer from being polluted and messed up. We really want a clean, nice-looking dataLayer that is fully governed by us; if anyone wants to enter the play, they should ask us first :).
Protecting the dataLayer is a bit more complicated ( at least the Google Tag Manager one ), because when GTM loads, it modifies the array’s original push method. If we mess around with it we’ll end up losing the reactivity, or causing some other malfunction; for the same reason, we cannot add the modification on the dataLayer initialization, because it would be lost when GTM overrides the method.
The thing we need to do here is wait until dataLayer.push has been fully initialized and then add some method to intercept the calls being made to it.
In this example I’ll be using a simple proxy pattern, but there are some more proper ( and at the same time more difficult to implement ) workarounds, like working with setters and getters or using an ES6 Proxy. In any case, this method is pretty straightforward and has very good cross-browser support.
I tried to focus on having understandable code rather than cool-looking code. We’ll use a Promise-based solution to query the dataLayer.push method until we detect it has been initialized by Google Tag Manager, and then we’ll add our proxy.
This is our dataLayer variable name; it will default to dataLayer.
pollingTime
The polling period, by default it’s checked every 25ms
limit
We don’t really want to wait forever; limit together with pollingTime will stop the watcher. The total time in seconds that the code will keep waiting for a dataLayer is secs = (limit * pollingTime) / 1000.
protectKey
This is the key we need to add to our pushes; if it’s not present, the push won’t go through.
protectValue
And this is the expected Protect Token value
Settings Parameters Definition
If you check the code carefully, we added a special rule to let all events matching /^gtm\..*/ skip the check, so that the system pushes keep going into the dataLayer.
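Since the behaviour is easier to follow in code, here is a minimal, illustrative sketch of the whole approach using the settings described above (function names like `addDataLayerGuard` and `waitForGtm`, and the settings defaults, are my own choices; the original snippet may differ):

```javascript
// Wraps an already-GTM-initialized dataLayer so that only pushes carrying
// our protect key/value, or gtm.* system events, get through
function addDataLayerGuard(dl, settings) {
  var gtmPush = dl.push; // keep GTM's own push so the container reactivity survives
  dl.push = function (obj) {
    // System events (gtm.js, gtm.load, ...) always go through
    if (obj && typeof obj.event === 'string' && /^gtm\./.test(obj.event)) {
      return gtmPush.apply(dl, arguments);
    }
    // Pushes without the expected token are silently dropped
    if (!obj || obj[settings.protectKey] !== settings.protectValue) return;
    delete obj[settings.protectKey]; // keep the token out of the dataLayer
    return gtmPush.apply(dl, arguments);
  };
}

// Polls until GTM has replaced the native Array push, then resolves
function waitForGtm(settings) {
  return new Promise(function (resolve, reject) {
    var root = typeof window !== 'undefined' ? window : globalThis;
    var attempts = 0;
    var timer = setInterval(function () {
      var dl = root[settings.dataLayerName];
      if (dl && dl.push !== Array.prototype.push) {
        clearInterval(timer);
        resolve(dl);
      } else if (++attempts >= settings.limit) {
        clearInterval(timer);
        reject(new Error('GTM push override not detected'));
      }
    }, settings.pollingTime);
  });
}

var guardSettings = {
  dataLayerName: 'dataLayer',
  pollingTime: 25,         // ms between checks
  limit: 40,               // give up after limit * pollingTime = 1 second
  protectKey: 'protectToken',
  protectValue: 'mySecret' // illustrative value
};
waitForGtm(guardSettings)
  .then(function (dl) { addDataLayerGuard(dl, guardSettings); })
  .catch(function () { /* GTM never showed up, leave the dataLayer untouched */ });
```

Keeping a reference to GTM’s own push and delegating to it is what preserves the container’s reactivity while still letting us filter what gets in.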
So now, if someone does a push like the following, it will be intercepted and will never reach our dataLayer.
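For instance (the event names and the `protectToken`/`mySecret` pair are illustrative):

```javascript
var root = typeof window !== 'undefined' ? window : globalThis;
root.dataLayer = root.dataLayer || [];

// A tokenless vendor/bot push: with the guard in place, this is intercepted
// and never reaches the dataLayer
root.dataLayer.push({ event: 'vendor_spam_event', some_payload: 'whatever' });

// While a push carrying the expected protect key/value goes through
root.dataLayer.push({ event: 'newsletter_signup', protectToken: 'mySecret' });
```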
From this point on, the logic can be extended as much as you want; for example, you may want to define a whitelist of allowed events rather than working with a token. It’s just up to your imagination.
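As an example of such an extension, an allow-list variant could look like this sketch (`makeAllowListGuard` is an invented name; it assumes GTM has already overridden the push method):

```javascript
// Allow-list variant: instead of a token, keep a list of permitted event names
function makeAllowListGuard(dl, allowedEvents) {
  var gtmPush = dl.push; // preserve GTM's push so the container keeps reacting
  dl.push = function (obj) {
    var name = obj && obj.event;
    // gtm.* system events and whitelisted events go through, the rest is dropped
    if (typeof name === 'string' &&
        (/^gtm\./.test(name) || allowedEvents.indexOf(name) !== -1)) {
      return gtmPush.apply(dl, arguments);
    }
  };
}
```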
This proxy pattern is extensible to almost any tool, meaning you could apply the concept to any other vendor or TMS. Please keep in mind that this is not trivial stuff to add, so my advice is to rely on your dev teams, or on an agency or contractor that can take proper care of implementing this kind of solution.
Anyone who knows me is aware of how obsessive I can be when working on something from the optimization perspective.
The point of this post is measuring the current memory usage of the JavaScript in our pages. For this we’ll be using the performance.memory API. I have to start by saying it’s a currently deprecated feature that is only available in Chromium browsers; it returns the JavaScript heap details, and it may lose some accuracy depending on whether some pages share the same heap, or whether the page has iframes using separate heaps.
A new experimental API is meant to replace this one, performance.measureUserAgentSpecificMemory(), which is able to estimate the current web page memory usage including its iframes and related workers. But it needs cross-origin isolation ( window.crossOriginIsolated ), and we’d need to add the COOP/COEP headers to our site. So let’s work with the old API for now.
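For reference, using the new API would look roughly like this sketch (it resolves to null wherever the API or cross-origin isolation is unavailable; `estimatePageMemoryBytes` is my own naming):

```javascript
// Estimate page memory with the experimental API, when available
function estimatePageMemoryBytes() {
  var root = typeof window !== 'undefined' ? window : globalThis;
  if (!root.crossOriginIsolated ||
      !root.performance ||
      typeof root.performance.measureUserAgentSpecificMemory !== 'function') {
    return Promise.resolve(null); // needs COOP/COEP headers and a Chromium browser
  }
  return root.performance.measureUserAgentSpecificMemory()
    .then(function (result) { return result.bytes; });
}

estimatePageMemoryBytes().then(function (bytes) {
  if (bytes !== null) console.log('Estimated memory: ' + bytes + ' bytes');
});
```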
In any case, we’ll be working with the current performance.memory API, which may end up giving us some good tips about our pages’ current memory usage.
Let’s get on it
performance.memory
As we mentioned before, this is a read-only property that returns the current JS heap details of the current page and is only available in Chromium-based browsers. Still, having a sampling of this metric can help us measure how our sites perform.
It will return 3 different metrics.
jsHeapSizeLimit , The maximum memory available for the current context.
totalJSHeapSize , The total allocated heap size.
usedJSHeapSize , The currently active segment of the JS heap.
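For a quick look at these values, a small helper (my own naming) converting the raw byte counts to MB:

```javascript
// Converts a performance.memory-like object to MB figures
function heapSnapshotMB(mem) {
  var toMB = function (bytes) { return +(bytes / 1024 / 1024).toFixed(2); };
  return {
    limit: toMB(mem.jsHeapSizeLimit),
    total: toMB(mem.totalJSHeapSize),
    used: toMB(mem.usedJSHeapSize)
  };
}

// Chromium-only: log the current snapshot when the API is available
if (typeof performance !== 'undefined' && performance.memory) {
  console.log(heapSnapshotMB(performance.memory));
}
```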
In any case, reading the current memory usage at one specific moment doesn’t give much value, so I prepared a script that records the values during the page load and pushes the maximum, minimum, and average memory usage to our dataLayer.
You can personalize the values; by default the script will sample the current memory usage every 1/4 second ( 250ms ) for 10 seconds, or until 2.5 seconds after the page load event, whichever comes first. Just in case, the data will also be pushed if the user navigates away from the page before any of the previous rules happen, using the beforeunload page event.
You can personalize these values in the script’s top section. Please keep in mind that using a 1 millisecond polling time won’t likely give you better insights; just keep the default value.
The idea is grabbing the memory usage during the page rendering, since after the page load it may not change much ( this will depend, of course: if you have some lazy-loaded content, the memory usage will keep changing ). Or you may be using an SPA site that you want to monitor at regular intervals; in that case you may have this running in the background and push the memory usage data for each history.change or virtual pageview.
The code Snippet
Here you can find the code that takes care of monitoring the memory usage over the page load. The sooner you add this code to the page, the better: if you are using Google Tag Manager, use the gtm.init or gtm.js / All Pages events; if you have a TMS with the option of adding synchronous code, that’s your best choice; and lastly, you can add it directly to the page.
Parameter
Description
eventName
The event name to be used on the dataLayer push. string
sampleLimit
Total count of samples to take. integer
pollingPeriod
Polling time in ms. This, along with the sampleLimit, defines the maximum time it will take for the event to fire. integer
waitTimeAfterWindowLoad
Total milliseconds to wait after window load. It doesn’t make much sense to wait 10 seconds if the page load comes in 2 seconds, so we push the data at that point without waiting for the full sample limit. integer
<script>
(function() {
  // David Vallejo (@thyng)
  // Analytics Debugger S.L.U. 2023
  var settings = {
    eventName: 'memory_usage_profiler',
    sampleLimit: 40,
    pollingPeriod: 250, // in ms
    waitTimeAfterWindowLoad: 2500 // in ms
  };
  // This is only available on Chromium-based browsers, just skip if the API is not available
  if (!(window.performance && 'memory' in window.performance)) return;
  try {
    window.dataLayer = window.dataLayer || [];
    // Initialize the data with the current heap usage
    var data = {
      sent: false,
      samplesCount: 0,
      max: performance.memory.usedJSHeapSize,
      min: performance.memory.usedJSHeapSize,
      avg: performance.memory.usedJSHeapSize
    };
    var pushData = function(data) {
      if (!data.sent) {
        window.dataLayer.push({
          'event': settings.eventName || 'memory_usage_profiler',
          'event_data': {
            // Values are reported in MB
            'max_memory_usage': (data.max / 1024 / 1024).toFixed(2),
            'min_memory_usage': (data.min / 1024 / 1024).toFixed(2),
            'avg_memory_usage': (data.avg / 1024 / 1024).toFixed(2)
          }
        });
        data.sent = true;
      }
    };
    var clear = setInterval(function() {
      var used = performance.memory.usedJSHeapSize;
      if (used > data.max) data.max = used;
      if (used < data.min) data.min = used;
      data.avg = (data.avg + used) / 2;
      data.samplesCount++;
      if (data.samplesCount >= settings.sampleLimit) {
        clearInterval(clear);
        pushData(data);
      }
    }, settings.pollingPeriod);
    // Once the page has loaded, wait a bit more and push the data
    window.addEventListener("load", function(event) {
      setTimeout(function() {
        clearInterval(clear);
        pushData(data);
      }, settings.waitTimeAfterWindowLoad);
    });
    // In case the user navigates away from the page before that...
    // (a pagehide/visibilitychange mechanism would be preferable; pending)
    window.addEventListener('beforeunload', function(event) {
      clearInterval(clear);
      pushData(data);
    });
  } catch (e) {}
})()
</script>
Sending the data
At this point we have all the data coming into our dataLayer, which means we can send it wherever we want. As usual I’m using Google Analytics 4, and then making use of some custom metrics for getting the averages.
It’s going to be an easy setup: just create 3 dataLayer-type variables, a trigger to match the event name you defined for this tracking ( default: “memory_usage_profiler” ), and lastly map it all to a GA4 event tag.
Google Analytics 4 Metrics
We’d need to create some custom metrics in our account; note that we should create metrics, not dimensions.
Looker Studio Report Example
I quickly built a Looker Studio report to show how the data will look.
Deciphering User Engagement Measurement in Google Analytics 4
One of the most notorious changes in Google Analytics 4 was the removal of the Bounce Rate metric in favor of Engaged Sessions / Engagement Time ( yes, I know they ended up bringing it back with some slight differences ).
The first one defines whether the current session has been engaged by the user; the second one defines the “amount” of engagement involved in the visit. The combination of these 2 metrics provides a better view of users’ behaviour on our site than a simple “did they bounce” boolean parameter without any further context.
Engaged Session
Google Analytics 4 will consider ( and accordingly mark ) a session as engaged if any of the following conditions are met:
The session lasts longer than 10 seconds (this can be configured)
The session has at least 1 conversion event
The session has 2 or more page_view or screen_view events
Then the engagement rate will report the percentage of sessions that meaningfully interacted with your site, based on the conditions listed above.
On the other side, the “Bounce Rate” will report the sessions where there was no engagement. The session engagement is reported back to GA4 using a shared payload key ( ie: it’s shared across all the events in the current hit ).
Engagement Time Metric
The way the engagement time is tracked may confuse people. Despite including a lot of rules inside, it ends up being pretty simple: Google Analytics will only count the time the user is really “engaged” (huh!) with your site, and to decide this there are 3 page states that need to be enabled.
Page MUST be visible
Page MUST be active
Page MUST have the Focus
If you’re not a developer this may not be easy to understand, so let me try to explain.
The current document (page) has to be visible: if for some reason the current page document is not visible to the users, no time will be counted. This is detected using the visibilitychange event and checking the document.hidden value.
Also, the current page needs to be active; this is measured using the pageshow and pagehide events. If you don’t know what this means, let’s say we have an SPA page and press the Back button: the counter will be stopped on pagehide and will start counting again when the page is shown.
Lastly, the current window needs to have the focus. Let’s say you’re using my Analytics Debugger extension: while you’re interacting with the extension, the focus is on the extension, not on the page. Google Analytics 4 will stop the counter and that time won’t be counted.
In any case, some of these rules may not work all the time because of different browser behaviors and race conditions, so it uses some more signals to enable or disable the counter, for example checking whether document.hasFocus() is true, or starting the counter on each gtag event push as long as the event is not a get call ( "gtag.get" ).
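To make these rules concrete, here is a hedged sketch of how such a counter could work, listening to the same signals described above (this is my own illustration, not GA4’s actual implementation; `makeEngagementCounter` is an invented name):

```javascript
// Pure helper: accumulates time only across "engaged" periods
function makeEngagementCounter() {
  var engagedSince = null; // timestamp when the current engaged period started
  var totalMs = 0;         // time accumulated from finished engaged periods
  return {
    update: function (engaged, nowMs) {
      if (engaged && engagedSince === null) {
        engagedSince = nowMs;            // start counting
      } else if (!engaged && engagedSince !== null) {
        totalMs += nowMs - engagedSince; // pause counting
        engagedSince = null;
      }
    },
    total: function (nowMs) {
      return engagedSince === null ? totalMs : totalMs + (nowMs - engagedSince);
    }
  };
}

// Browser wiring: re-evaluate "engaged" on the signals GA4 listens to
if (typeof document !== 'undefined') {
  var counter = makeEngagementCounter();
  var refresh = function () {
    // engaged = visible AND focused; pageshow/pagehide re-trigger the check
    counter.update(!document.hidden && document.hasFocus(), performance.now());
  };
  ['visibilitychange', 'focus', 'blur', 'pageshow', 'pagehide']
    .forEach(function (ev) { window.addEventListener(ev, refresh); });
  refresh();
}
```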
The user_engagement event
There’s one extra point to keep in mind about how Google Analytics 4 measures the engagement time: each time we click on a self-domain link, or one pointing to any of the domains configured on the linker, the user_engagement event is fired. (There may be more conditions to be met, or some extra conditions that trigger it at some point.)
The main reason for this auto-fired event to exist seems to be measuring the current time on page; that’s why it gets fired just before the user leaves the current page. It will only trigger if there are no other events pending in the queue (I think; I need to test this more deeply): for example, if you have a click event at some point, that event will carry the engagement time, so this one won’t be needed.
You’ll likely need to check the “Preserve log” checkbox to see these hits ( which are also available in BigQuery ), which basically carry the engagement time value (&_et). You may see these hits showing an unknown status on the network tab; nothing to worry about. All hits in GA4 are sent using the sendBeacon API, which basically means they are fired and the browser doesn’t wait for any response from the receiving endpoint.
Are these Metrics Reliable?
I was chatting with @CharlesFarina about this, and regarding the reliability of the engagement time, it came to my mind that there were some studies mentioning issues with sendBeacon usage.
The people at Volument found some 40% discrepancies while using sendBeacon along with the onbeforeunload event: https://volument.com/blog/sendbeacon-is-broken . According to the MDN documentation it should be used along with visibilitychange ( I didn’t check on which event GA4 triggers the data; since no events are fired on my visibility changes, we may guess it’s fired on pagehide or page unload… ). In any case, this study was run 2 years ago, so as of today we should expect the gap not to be as big.
Still, even if this event sometimes ends up not being tracked, it really helps add even more value to the engagement time metric in Google Analytics 4.
Working demonstration Gif:
And that’s basically how it works. I may be missing some small details and rules, but the point of this post is not to replicate the functionality; it’s to provide some details so we know why some events don’t attach the engagement_time ( &_et ) value, or why sometimes, even if we spent more than 10 seconds, the session has not been marked as engaged.
One thing that I miss is having the option to set an event as “non-interactional“, as we used to have in the old analytics; but rather than not spawning a session, this would just skip passing the engagement_time to the current event, which may be useful for people tracking backend or automatic events without any user interaction.
In the last year I didn’t have much time to work on this blog, mostly because I’ve been focusing on my other side projects in my “free” time.
Since I started this blog many years ago, my number 1 rule has been not to write about what others have already written, so I’ve tried to provide new information in each of my posts rather than retreading others’ content, or even my own, each year. This, added to some people being ludicrously fast at covering Google products’ new features, had left me in a very difficult position to provide more stuff.
In any case, Google Analytics 4 is now in da house, which provides a whole new scope of chances to write about new stuff, or even take the time to really update all these 10+ year old blog posts. So I’m back to writing.
This is going to help me improve my, poor, English skills, but it will also let me learn some new stuff, since I’ve been feeling a bit outdated ( ie: BigQuery, Attribution, Modeling ).
All this said, this post also inaugurates the new blog design, which I think is more readable and includes a dark-mode version. Along with this, there is some news about the content on this blog.
It won’t again be solely a Google Analytics / Google Tag Manager blog; I really want to try and learn new tools and vendors, and of course “teach” all this blog’s readers about them.
This will be a ChatGPT-free blog. That may work against the blog’s growth or my writing possibilities, but I only want to provide my own created content, and I want to improve my English skills. So you’re really lucky: you’ll be able to keep enjoying my exquisite English vocabulary.
I’ve never had or asked for collaborations, but I’d love to have some guest posts, and it will be even better if we work on some solution together. Are you interested? Ping me.
I’m taking away my number 1 rule: I’ll write about anything I work with or try, since even if other people have written about the same topic, I may contribute other valid PoVs or ideas that someone may find useful.