Author: David Vallejo

  • Securing Google Analytics 4 (GA4): Mitigating Spam and Protecting Measurement IDs with Server-Side Strategies

    Disclaimer: I got a report that this may break some Google Ads integrations. I’m gathering details to understand the issue and try to find a workaround (if possible).

    One of the most recurrent issues when trying to keep our data clean is fighting the spammers and script kiddies that pollute our properties. This is nothing new; it’s been around since the beginning.

    This was an even bigger issue in Universal Analytics, where property IDs were sequential. That made it easy for malicious actors to systematically spam all accounts with minimal effort. With the transition to Google Analytics 4 (GA4), the new Measurement IDs are no longer sequential. This change makes it significantly more difficult to indiscriminately spam all accounts, since programmatically targeting consecutive IDs is no longer feasible.

    Still, we’re not safe from someone looking at our network requests, or from people crawling websites and building a list of the Measurement IDs in use, leaving our properties with unwanted data.

    Rick Dronkers has also been talking about this on LinkedIn: https://www.linkedin.com/feed/update/urn:li:activity:7210944583294177281/

    Sadly, Google Analytics 4 (GA4) doesn’t have good controls to stop or filter out unwanted data from coming into your account.

    Due to the nature of analytics tracking, which happens mainly on the client side, there’s no way to stop this spam entirely. But we can take some actions to mitigate the issue, and we’re going to talk about the most important one from my point of view. This is:

    Not Allowing spammers to know our real Measurement ID

    Last month my company ( Analytics Debugger ) became a Stape.io partner, since we started to offer server-side services, which is allowing me to play more with the technology. And luckily, this reported spam attack makes the perfect introduction for a new server-side trick.

    Running GTM/GA4 via SGTM in Stealth Mode.

    We’re using a Client Template to make our Measurement ID invisible to attackers. This will keep away the script kiddies that crawl our sites, since they’ll only ever see a fake ID, and the template will take care of forwarding the final request with the proper ID.

    But David, they could still point to our server-side endpoint directly, and those hits would end up hitting our property. That’s right! But since we’re running a server-side endpoint, we can enforce some rules ( I’m refactoring some old PHP code I have to turn it into an SGTM template ) to prevent the spam. At this point we have a lot of rules we can use to fight spam traffic.

    And not only that: we’re on the server side, meaning we can easily keep a rolling list of, say, the last 15 minutes of requests and build a throttling mechanism, or check the IP geolocation. Let’s be honest, I’m from Spain; getting too much traffic from certain countries would be unusual. Best of all, we could even build an internal IP score, allowing us to tag spam traffic ( rather than removing it ), for example assigning the &tt= parameter via SGTM if the current score is > 5 ( you’ll need to set up the rules ).
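To make the idea concrete, here is a minimal, hypothetical sketch of that kind of scoring rule. Everything here is an assumption for illustration: the thresholds, the in-memory store, and the helper names are mine, not part of any Stape or GTM template API; a real setup would keep the scores in a shared store and use far richer rules.

```javascript
// Hypothetical spam-scoring sketch (illustrative only).
var ipScores = {}; // in production this would live in a shared store (Redis, Firestore, ...)

function scoreRequest(req) {
  var score = 0;
  // Example rules: suspicious user agents and unexpected geolocations
  if (!req.userAgent || /curl|python-requests|wget/i.test(req.userAgent)) score += 3;
  if (req.country && req.country !== 'ES') score += 1; // I'm from Spain :)
  ipScores[req.ip] = (ipScores[req.ip] || 0) + score;
  return ipScores[req.ip];
}

// Tag rather than drop: append &tt=spam to the outgoing hit once the
// accumulated score for this IP goes above 5
function decorateHit(hitUrl, req) {
  return scoreRequest(req) > 5 ? hitUrl + '&tt=spam' : hitUrl;
}
```

The key design choice is tagging instead of dropping: the traffic still reaches the property, but it carries a marker you can filter on later.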

    In any case, I know other people have talked about this in the past. I feel it could be a great topic for a hackathon with the smart people around ( maybe if the Analytics DevFest becomes a reality 🙂 )

    So, filtering hits has been a long-debated topic, where Server-Side can play an important role. However, if we don’t hide our Measurement ID, spammers can directly target GA4 servers and bypass any implemented checks.

    Let me start by showing what your setup will look like. Please note that a potential attacker will only see our GTM-DEBUGEMALL and G-DEBUGEMALL IDs. At no point will they be able to target Google’s servers with the real ones 🙂

    But our server side still sends the data to Google, with the &tid replaced by the real one:
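Conceptually, the core of that swap boils down to something like the following sketch. Note that G-REAL12345 is a placeholder (the real ID stays private, which is the whole point), and the actual Client Template handles far more than a single query parameter.

```javascript
// Decoy-to-real Measurement ID mapping. The decoy matches the one used in
// this post; G-REAL12345 is a placeholder for your real ID.
var ID_MAP = {
  'G-DEBUGEMALL': 'G-REAL12345'
};

// Rewrite the &tid= parameter on an incoming hit before forwarding it to Google
function swapMeasurementId(queryString) {
  return queryString.replace(/([?&])tid=([^&]+)/, function(match, sep, tid) {
    return sep + 'tid=' + (ID_MAP[tid] || tid);
  });
}
```

Since the mapping only lives on the server, nothing in the browser ever exposes the real Measurement ID.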

    Implementing this involves setting up a server-side Client Template, which you can download here: GTM/GA4 Stealth Mode Client Template. Then set up the real and fake IDs you want to use in your setup; refer to the following screenshot for guidance:

    The last step is slightly updating our GTM loading snippet. You may notice there’s an extra /s/ before gtm.js; this is because SGTM doesn’t seem to allow you to claim requests to the known GTM/GA endpoints ( or I was not able to do it … )

    <script>
    (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
    new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
    j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
    'https://our.server.side.endpoint.com/s/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
    })(window,document,'script','dataLayer','GTM-DEBUGEMALL');
    </script>

    The template still needs improvement, and I don’t consider it production-ready ( despite the fact that I’m using it on this very blog ).

    Essentially, we can utilize all our methods to combat spam, as we have been doing, while ensuring that the ID remains concealed to prevent attackers from circumventing any measures aimed at safeguarding our data integrity.

  • Solving the GA4 Server-Side Configuration Issue: Why Your First Hit Might Bypass Your Server-Side Endpoint

    Disclaimer: This might stem from a misunderstanding on my side about how GA4 tags function within Google Tag Manager (GTM), especially compared to how configuration tags used to work in Universal Analytics. The differences between these systems can be subtle and confusing. This issue has been quite frustrating, so I’m sharing my experience in case others encounter the same problem.

    I was working on my Google Analytics 4 implementation via Google Tag Manager Server-Side and noticed that, for some reason, some of the hits were going through the regular endpoint ( *.google-analytics.com ).

    After some testing I figured out the issue. Although Google Tag Manager shows the associated Google Tag on your GA4 tags, it doesn’t take any action to ensure the config settings are applied to the related tags. This means any tag firing before the Google Tag won’t have the needed configuration: GTM won’t apply the settings to the linked GA4 tags.

    Having this message in the interface makes it somewhat confusing:

    So, for example, if the dataLayer event that triggers your event tag arrives before the event that triggers the GA4 Configuration tag, the server_container_url will be ignored, and your first hit will not go through your server-side endpoint.

    This means that if you fire a GA4 event tag on a trigger that happens before the trigger that fires the GA4 config tag, those hits won’t go through the SGTM endpoint.

    Here is an example of the issue:

    You can see the issue for yourself on the following URLs:
    https://www.thyngster.com/demos/gtag_ss_race_issue/pv_push_before_gtm.html
    https://www.thyngster.com/demos/gtag_ss_race_issue/pv_push_after_gtm.html

    I prepared a test environment with the simplest possible replication setup: just a page_view event tag that fires on the page_view event, and the GA4 Config tag that fires on the All Pages (gtm.js) event.

    This is how the tags and triggers look in this container:

    In my case the issue was not a big deal, since I’m just sending the data through the server-side endpoint and not doing anything else. The biggest problem I’d have is some hits being blocked by an ad blocker because of the domain.

    I expect people firing pixels or doing data enrichment based on the SGTM GA4 Client to run into more inconveniences, even though the data flow to Google Analytics 4 keeps working.

    Google Tag Manager knows when there’s a Google Tag that references our current Measurement IDs ( check the screenshot above ), which means it should be possible to grab the configuration parameters ( server_url, cookie preferences, etc. ) and apply them to the linked event tags ( as Universal Analytics config tags used to work … ) without needing to wait for the Google Tag to execute at all, or perhaps keep them queued until the related GTAG has fired.

    But anyway, if you are having this issue, the workarounds are pretty straightforward:

    • Ask your developers to move your push after the GTM snippet
    • Set the GA4 config tag trigger to be the same as your first event trigger
    • Manually set the server_container_url for the tags that may fire before the Google Tag
    • Make sure the Google Tag always fires before any GA4 event tag
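The first workaround can be sketched like this. The event name is illustrative; the only thing that matters is the ordering of the push relative to the GTM snippet.

```javascript
// BEFORE the GTM snippet: this push would be processed before the Google Tag
// has fired, so its configuration (server_container_url, etc.) is not applied:
// window.dataLayer.push({ event: 'page_view' });

// ... the GTM container snippet loads here ...

// AFTER the GTM snippet: by the time GTM evaluates this event, the linked
// Google Tag configuration can be applied and the hit uses the SGTM endpoint.
var w = typeof window !== 'undefined' ? window : globalThis; // shim for non-browser runs
w.dataLayer = w.dataLayer || [];
w.dataLayer.push({ event: 'page_view' });
```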

    Based on a comment on LinkedIn from Roman Appeltauer, there’s an extra way to get around this.

    • Use the Initialization Trigger for your Google Tag

    This post talks about the server-side integration issue ( because it was my problem ), but the same applies to user properties or event parameters set via the Google Tag not being applied to all tags, depending on your triggers or dataLayer configuration.

  • Guide to Tracking Google Ad Manager Impressions and Clicks

    When we use Google Ad Manager, the ads are served using an iframe, which makes it somewhat challenging if we want to track the impressions in another tool, such as Google Analytics 4.

    This is why, on this occasion, I’ll be sharing a library to programmatically track our ad impressions and clicks.

    For this task we’ll be relying on the ad tagging library for Google Ad Manager, Google Publisher Tag, and on Google Analytics ecommerce promotion tracking ( the view_promotion and select_promotion events ).

    You can identify the Google Publisher Tag library by the global variable window.googletag. Check the following snippet for the simplest possible example.

    <head>
      <meta charset="utf-8" />
      <meta name="viewport" content="width=device-width, initial-scale=1" />
      <meta name="description" content="Display a fixed-sized test ad." />
      <title>Display a test ad</title>
      <script async src="https://securepubads.g.doubleclick.net/tag/js/gpt.js"></script>
      <script>
        window.googletag = window.googletag || { cmd: [] };
    
        googletag.cmd.push(() => {
          // Define an ad slot for div with id "banner-ad".
          googletag
            .defineSlot("/123456789/Debug/Em/All", [300, 250], "banner-ad")
            .addService(googletag.pubads());
    
          // Enable the PubAdsService.
          googletag.enableServices();
        });
      </script>
      <style></style>
    </head>

    The way the Google Ad Server works is that it creates an iframe with our unit_ad_id as its name or ID, and it handles querying the Google Ad Manager server for details about the ads and the creatives to be served.

    If we want to track which ads are being served on our pages, the first challenge is that we only have information about the ad unit path, which essentially describes the current slot context. Typically we also need other values, such as the current campaign and the creatives being served. Since all the rendering happens inside the iframe, we won’t directly know when an ad has been served, nor have access to its content, because it loads within the iframe.

    So, let’s figure out how we’re going to track our ads. The first thing to know is that despite everything happening in an iframe ( I know, we all hate iframes ), the GPT library offers some events that will help us in our mission ( GPT Reference ):

    • impressionViewable
    • rewardedSlotClosed
    • rewardedSlotGranted
    • rewardedSlotReady
    • slotOnload
    • slotRenderEnded
    • slotRequested
    • slotResponseReceived
    • slotVisibilityChanged

    Available events in Google Publisher Tag

    We want to track ad impressions, so we’ll be using the impressionViewable event, which fires once the banner has been shown to the user.

    This is how we can add a listener for the Google Publisher Tag event.

    googletag.pubads().addEventListener("impressionViewable", function(ad) {
        // Do our stuff
    });

    You may have noticed that the listener receives a value ( which we’re naming ad ) containing extra details about the ad being shown. It’s a slot event object whose slot exposes some methods we can use to grab info about the ad just rendered.

    The primary method we need to examine is ad.slot.getResponseInformation(). This method returns the current advertiserId, campaignId, and creativeId, which we will use to compile our Promotion Impression data.

    > ad.slot.getResponseInformation()
    {
        "advertiserId": 5632560629,
        "campaignId": 3554088676,
        "creativeId": 138478051191,
        "lineItemId": 6737660555,
        "sourceAgnosticCreativeId": 138478051191,
        "sourceAgnosticLineItemId": 6737660555,
        "isBackfill": false,
        "yieldGroupIds": null,
        "companyIds": null,
        "creativeTemplateId": null,
        "encryptedTroubleshootingInfo": null
    }

    But that’s not all: we also want to track where the ads point to, i.e. the clickURL. Sadly, none of the methods give us that information directly, but we can use the ad.slot.getHtml() method, which returns the current ad’s markup, and then use some regex to extract the details.

    We will follow this logic:

    • Extract all the a href links from the HTML
    • Filter those href links for the ones that have an adurl parameter

    This is how we can extract the destination URL for our banner:

    var getAdUrl = function(adHtmlString) {
      // Step 1: Extract all 'a' tags with 'href' attributes
      var aTagRegex = /<a\s+[^>]*href\s*=\s*["']?([^"'>\s]+)["']?[^>]*>/gi;

      var matches;
      var hrefs = [];
      while ((matches = aTagRegex.exec(adHtmlString)) !== null) {
        hrefs.push(matches[1]); // Capture the 'href' value
      }

      // Step 2: Filter hrefs that contain the 'adurl' parameter
      var adurlHrefs = hrefs.filter(function(href) { return href.includes('adurl='); });

      // Step 3: Extract the 'adurl' parameter value from these hrefs
      var adurlValues = adurlHrefs.map(function(href) {
        var urlParams = new URLSearchParams(href.split('?')[1]);
        return urlParams.get('adurl');
      });

      // Return the extracted destination URLs
      return adurlValues;
    }
    
    var adHtmlString = ad.slot.getHtml()
    console.log(getAdUrl(adHtmlString));

    Let’s quickly recap. Now, we can capture the Ad-unit Path, the campaignId, the creativeId, and the destination URL for the impressions of our ads, which provides us with valuable information. This means we can build a dataLayer.push({}) with a view_promotion event to track our impressions. Alternatively, we can send this information to any other tool you prefer. We’ll be using a Google Tag Manager (GTM) dataLayer.push as it is the most commonly used tag management system (TMS). Feel free to adapt this core implementation for your needs or reach out to me for help in customizing it to your site’s requirements.
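For reference, such a push could look like the following. The values are taken from the getResponseInformation() output shown earlier; 'banner-ad' and the landing URL are placeholders, and the field mapping (campaignId into promotion_id, creativeId into creative_name) is a choice, not a GA4 requirement.

```javascript
var w = typeof window !== 'undefined' ? window : globalThis; // shim for non-browser runs
w.dataLayer = w.dataLayer || [];
w.dataLayer.push({
  event: 'view_promotion',
  ecommerce: {
    items: [{
      promotion_name: 'banner-ad',                 // the ad slot's DOM id (placeholder)
      promotion_id: '3554088676',                  // campaignId
      creative_name: '138478051191',               // creativeId
      creative_slot: 'https://example.com/landing' // extracted adurl (placeholder)
    }]
  }
});
```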

    The next challenge is: “How do we track clicks on the iframe?”. The <iframe> element doesn’t have a click event, which doesn’t help, but luckily we can detect clicks within iframe.contentWindow, since the iframe is not hosted on a remote domain ( it has no src ).

    document.querySelector('#div_ad_holder iframe').contentWindow.document.body.onclick = function(e) {
        alert("Ad Clicked");
    }

    At this point, we should know how to detect when an impression has been rendered, extract all the details about the ad’s impression, and track the clicks on the ads. We have everything we need; it’s just a matter of putting everything together to create a fully functional tracking solution.

    To make the process easier for everyone, I’ve started a new GitHub repository that provides a simple, ready-to-use library. This library will push the data as an Ecommerce Promotion event to the GTM dataLayer.

    Find it here: https://github.com/analytics-debugger/google-ad-manager-tracker

    (function() {
        // Copyright Analytics Debugger S.L.U. All Rights Reserved. 2024
        // Author: David Vallejo
        // Date: 2024-06-21
        // Name: Google Publisher Tag Impressions and Clicks Tracker
        // This file is licensed under the MIT License.
        // License text available at https://opensource.org/licenses/MIT  
      
        // Add ?ad_debug=1 for debugging messages
        var debug = document.location.search.includes('ad_debug=1') ? true : false;
        // Init googletag variable, jic
        window.googletag = window.googletag || {
            cmd: []
        };
        // Init dataLayer variable, jic
        window.dataLayer = window.dataLayer || [];
        // Our variable for holding the impressions info. Used for grabbing the data on click
        window.__gpt_impressions = window.__gpt_impressions || [];
    
        // Helper function to extract the ad destination URL from the Ad HTML
        var getAdUrl = function(adHtmlString) {
            // Step 1: Extract all 'a' tags with 'href' attributes
            var aTagRegex = /<a\s+[^>]*href\s*=\s*["']?([^"'>\s]+)["']?[^>]*>/gi;
    
            let matches;
            var hrefs = [];
    
            while ((matches = aTagRegex.exec(adHtmlString)) !== null) {
                hrefs.push(matches[1]); // Capture the 'href' value
            }
    
            // Step 2: Filter hrefs that contain the 'adurl' parameter
            var adurlHrefs = hrefs.filter(href => href.includes('adurl='));
    
            // Step 3: Extract the 'adurl' parameter value from these hrefs
            var adurlValues = adurlHrefs.map(href => {
                var urlParams = new URLSearchParams(href.split('?')[1]);
                return urlParams.get('adurl');
            });
            if (adurlValues.length > 0) return adurlValues[0];
        }
        // Adding the impression Listener
        googletag.cmd.push(function() {
            googletag.pubads().addEventListener("impressionViewable", (event) => {
                // We have an impression, let's get the holder iframe reference and add the click event.
                document.querySelector('#' + event.slot.getSlotElementId() + ' iframe').contentWindow.document.body.onclick = function(e) {
                    // Find the impression details stored when this slot was rendered
                    var impressionInfo = window.__gpt_impressions.filter(function(item) {
                        return item.promotion_name === event.slot.getSlotElementId();
                    });
                    window.dataLayer.push({
                        event: 'select_promotion',
                        ecommerce: {
                            items: impressionInfo
                        }
                    });
                    if (debug === true) console.log("GPT AD CLICK", impressionInfo);
                }
    
                var slotDetails = event.slot.getResponseInformation();
                try {
                    var impressionInfo = {
                        promotion_name: event.slot.getSlotId().getDomId(),
                        promotion_id: slotDetails.campaignId.toString(),
                        creative_name: slotDetails.creativeId.toString(),
                        creative_slot: getAdUrl(event.slot.getHtml())
                    }
                    window.dataLayer.push({
                        event: 'view_promotion',
                        ecommerce: {
                            items: [impressionInfo]
                        }
                    });
                    window.__gpt_impressions.push(impressionInfo);
                    if (debug === true) console.log("GPT AD IMPRESSION", impressionInfo);
                } catch (e) {
                    if (debug === true) console.log("GPT ERROR GRABBING IMPRESSION DETAILS: ", e);
                }
            });
        });
    })()

    The earlier code snippet is responsible for pushing the necessary data to Google Tag Manager. At this stage, you just need to add the appropriate tags and triggers in Google Tag Manager to ensure the data flows into your account.

    This code has only been tested on Chromium-based browsers. While the impressions should work across all browsers, we are unsure of how Safari and Firefox will behave with regard to the clicks.

  • … App Debugging For All: Making App Implementation Auditing Accessible to Everyone!

    If you’ve ever worked on a project and needed to debug your company’s or client’s app, I’m sure you already know how frustrating auditing the implementation can be. You know what I’m talking about: not having a public and available namespace and API, as we do in browsers, can make things really hard, even more so if you lack technical skills.

    Circa 2016 I presented ( well, Sir Yehoshua Coren did on my behalf ) at SuperWeek‘s Punchcard Prize a tool for debugging apps using a small hardware device ( an rPi-like board ), which was essentially able to intercept anything connected through the Wi-Fi network this device was generating. As simple as that sounds, relying on a piece of hardware made the prospect of broadly offering this to everyone somewhat difficult. At the same time, turning it into a commercial product was not an option due to my limited business and communication skills at that point.

    Analysts, particularly those in data analytics and data science roles, don’t need an exhaustive understanding of how everything functions. For instance, they don’t require knowledge of JavaScript to examine how a video player is tracked, and there’s no necessity for them to be familiar with Kotlin to review the events tracked within an app. However, I firmly believe they should have the capability to check if, at the very least, the tracking is occurring. This is where my tools come into play, and this is where Analytics Debugger for Apps was born, enabling anyone to effortlessly inspect what’s happening in their analytics implementations.

    My aim was clear: to democratize Apps Debugging, making the debugging task accessible to individuals of all technical backgrounds.

    Looking back to 2019, with the upcoming release of App+Web ( now Google Analytics 4 / Firebase for GA ), I began experimenting with the idea of creating a tool that would mimic the reports of the current Chrome extension, regardless of the platform being debugged. Before long, I developed “Android デバッガ”, a debugger for Firebase/Universal Analytics on Android, as the initial step. It was later expanded to include iOS debugging, and the name turned into “iOS/Android Debugger“.

    I made a list of points that the tool should comply with:

    • No App Modifications Needed: The debugging should work ‘out of the box’ without the need to recompile packages or add external libraries.
    • Reports should be vendor-specific, not just a simple output of requests, in order to make the data understandable for any analyst or marketer.
    • It ought to be made available for free.
    • Open to the inclusion of new vendors.

    At this stage of the venture, I’m happy to say that I’ve met all the objectives and am ready for the next steps. You might see a pattern here, but the first thing I did was change the name (once again). It’s quite in line with previous name changes. So, say hello to “Analytics Debugger for Apps“.

    The debugger remains FREE for everyone as long as you stay subscribed to the newsletter (by the way, I’ve sent 0 emails in 10 years…). That’s the only requirement to use the tool. You can create an account and then a license at https://app.analytics-debugger.com at no cost.

    You might be wondering if there’s a catch to this, but the answer is straightforward: there isn’t one. I’m simply sharing one of my internal tools to help people. Not sure how long I will be doing it, so for now let’s just enjoy the moment 🙂

    You wouldn’t believe how time-consuming tool development can be, especially when going solo ( backend, frontend, testing, websites, docs, social, updates, support, backups, APIs, design ). Therefore, the tool is open to sponsorships. Sponsors will not encounter any nag screens or experience delays while using certain features, such as exporting data to the clipboard.

    Currently, Windows 64-bit and MacOS Universal versions are available at https://www.analytics-debugger.com/tools/ios-android-debugger/, with an upcoming Linux 64-bit version (currently in beta) in the works.

    Due to Android’s limitations, not all vendors are supported (though this may change in the near future 😉), making iOS app debugging the more feature-rich version. Featured vendors are Firebase Analytics, Universal Analytics (to be deprecated), Snowplow, and Adobe Analytics. As I mentioned before, I do this in my free time. While I would love to add more features, this is where sponsorships come into play, providing me with the resources to dedicate more time to working on the tools.

    I’m amazed that over 1,500 fellow analysts have used the debugger so far, especially considering the minimal marketing efforts I’ve put into it. Huge thanks to everyone!

    Happy Debugging

  • Securing Your DataLayer: Defending Against Pollution by External Tools and Bots to Preserve Data Integrity

    As expected, in recent months many vendors and providers have started adding Google Analytics 4 integrations, and many of them just push events to the GTAG wrapper function, so you’ll likely end up with some unwanted events in your reports.

    It’s not only vendors: spammers have an easy way to programmatically mess with your data, just by using the global objects.

    We’ll learn some implementation tricks to prevent other tools from polluting our GA4 data, and also how to ensure that nobody but us sends data to our dataLayer. As usual, I’ll be using examples for Google Tag Manager and Google Analytics 4, but the same logic is applicable to any other tool.


    Protecting GTAG from Bots and Vendors pollution

    In order to protect our setup from unsolicited events or pushes, we’ll slightly modify our GTAG calls. The first modification is adding a guard check to the gtag wrapper so that unauthorized calls are silently blocked.

    <script async src="https://www.googletagmanager.com/gtag/js?id=G-THYNGSTER"></script>
    <script>
    window.dataLayer = window.dataLayer || [];
    function gtag() {
        // Guard, first argument is not our protection token?, skip
    	if (arguments[0] !== 'protectToken') return;
    
        // This function will allow us to pass an Arguments-type object back to the dataLayer
    	function passArgumentsBack() {
    		dataLayer.push(arguments)
    	}
        // Remove the first argument and pass the rest back
    	passArgumentsBack.apply(this, Array.prototype.slice.call(arguments, 1));
    }
    gtag('protectToken', 'js', new Date());
    gtag('protectToken', 'config', 'G-THYNGSTER');
    </script>

    Now, any gtag() call where the first argument is not our “protectToken” will be blocked, meaning that any vendor or bot trying to push data into our namespace will simply be silently ignored.

    It may happen that you cannot modify gtag at creation time, maybe because it’s hardcoded on the page or because someone else has already initialized it. Don’t worry: you can run this code to override the current method.

    if (window.gtag && {}.toString.call(window.gtag) === '[object Function]') {
    	// Assign explicitly so the wrapper reliably replaces the existing gtag
    	window.gtag = function gtag() {
    		if (arguments[0] !== 'protectToken') return;
    		function passArgumentsBack() {
    			dataLayer.push(arguments)
    		}
    		passArgumentsBack.apply(this, Array.prototype.slice.call(arguments, 1));
    	}
    }

    You have to remember that from now on you need to prepend your protectToken to any call you want to push.

    gtag('protectToken', ... )

    Protecting our Datalayer from Bots and Vendors pollution

    I bet you’ve already thought about just adding a custom event to all your pushes and then adding a blocking condition in GTM, and that’s indeed a great idea. But this time we’re not trying to block events from firing; we’re trying to keep our dataLayer from being polluted and messed up. We want a clean, nice-looking dataLayer that’s fully governed by us; anyone who wants to enter the game should ask us first :)

    Protecting the dataLayer is a bit more complicated ( at least Google Tag Manager’s ), because when GTM loads it replaces the array’s original push method. We could end up losing the reactivity, or causing other malfunctions, if we mess around with it. For the same reason, we cannot add the modification at dataLayer initialization, because it would be lost when GTM overrides it.

    The thing we need to do here is wait until dataLayer.push has been fully initialized by GTM and then add a method to intercept the calls being made to it.

    In this example I’ll be using a simple proxy pattern. There are more proper ( and at the same time more difficult to implement ) approaches, like working with setters and getters or using an ES6 Proxy, but this method is pretty straightforward and has very good cross-browser support.


    I tried to focus on understandable code rather than cool-looking code. We’ll use a promise-based solution to poll the dataLayer.push method until we detect it has been initialized by Google Tag Manager, and then we’ll add our proxy.

    <script>
    (function() {
    	var settings = {
    		dataLayerName: 'dataLayer',
    		pollingTime: 25,
    		limit: 1000,
    		protectKey: 'ptoken',
    		protectValue: 'thyngster'
    	}

    	// GTM replaces the dataLayer's native Array push with its own sandboxed
    	// function (whose source contains SANDBOXED_JS_SEMAPHORE). We poll until
    	// that happens, then resolve with the dataLayer reference.
    	var waitForDataLayerReady = function(settings) {
    		var count = 0;
    		return new Promise(function(resolve, reject) {
    			(function check() {
    				var dl = window[settings.dataLayerName];
    				if (dl && dl.push && dl.push.toString().includes('SANDBOXED_JS_SEMAPHORE')) {
    					return resolve(dl);
    				}
    				if (++count >= settings.limit) {
    					return reject(new Error('GTM never initialized ' + settings.dataLayerName));
    				}
    				setTimeout(check, settings.pollingTime || 1000);
    			})();
    		});
    	}

    	waitForDataLayerReady(settings).then(function(dl) {
    		var proxiedDataLayerPush = dl.push;
    		dl.push = function() {
    			var first = arguments && arguments[0];
    			// Allow GTM system events (gtm.*) and pushes carrying our protect token
    			if (first && (first[settings.protectKey] === settings.protectValue || (first.event && String(first.event).match(/^gtm\./)))) {
    				if (first[settings.protectKey]) delete first[settings.protectKey];
    				return proxiedDataLayerPush.apply(this, arguments);
    			}
    			// Any other push is silently dropped
    		}
    	}).catch(function() {
    		// Gave up waiting for GTM; leave the dataLayer untouched
    	});
    })()
    </script>
    
    • dataLayerName — our dataLayer variable name; defaults to dataLayer
    • pollingTime — the polling period; by default it’s checked every 25ms
    • limit — we don’t really want to wait forever; limit and pollingTime together stop the watcher. The total time (in seconds) the code will keep waiting for a dataLayer is secs = (limit * pollingTime) / 1000
    • protectKey — the key we need to add to our pushes; if it’s not present, the push won’t go through
    • protectValue — the expected protect token value

    Settings Parameters Definition

    If you check the code carefully, we added a special rule that lets all events starting with /^gtm\./ skip the check, so the system pushes keep going into the dataLayer.

    So now, if someone does the following, the push will be intercepted and will never reach our dataLayer.

    window.dataLayer.push({
        event: 'evilVendor',
        opted_in_groups: '1,2,3,4'
    })
    On the other hand, our pushes should contain the protect key and token so they are allowed to end up in the dataLayer.
    window.dataLayer.push({
        event: 'add_to_wishlist',
        ptoken: 'thyngster'
    })

    From this point on, the logic can be extended as much as you want; for example, you may want to define a whitelist of allowed events rather than working with a token. It's up to your imagination.
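    As a sketch of that whitelist idea (the `allowedEvents` list and the `makeGuardedPush` helper are my own illustrative names, not part of the snippet above), a guarded push could look like this:

```javascript
// Sketch: allow pushes only for whitelisted event names.
// `allowedEvents` is a hypothetical list you would maintain yourself.
var allowedEvents = ['page_view', 'add_to_wishlist', 'purchase'];

function makeGuardedPush(originalPush) {
  return function () {
    var payload = arguments[0];
    // Always let GTM system events (gtm.*) through
    var isSystemEvent = payload && payload.event && /^gtm\./.test(String(payload.event));
    var isAllowed = payload && allowedEvents.indexOf(payload.event) !== -1;
    if (isSystemEvent || isAllowed) {
      return originalPush.apply(this, arguments);
    }
    // Silently drop anything else
  };
}
```

    You would install it the same way as the token version, by swapping `window.dataLayer.push` for `makeGuardedPush(window.dataLayer.push)`.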

    This proxy pattern is extensible to almost any tool, meaning you could apply this concept to any other vendor or TMS. Please keep in mind that this is not trivial to add, so my advice is to rely on your dev team, or on an agency or contractor that can take proper care of implementing this kind of solution.

  • Step-by-Step Guide: Measuring JavaScript Memory Usage on Your Web Pages

    Anyone who knows me is aware of how obsessive I can be when working on something from the optimization perspective.

    The point of this post is measuring the current memory usage of the JavaScript in our pages; for this we'll be using the performance.memory API . I have to start by saying it's a currently deprecated feature that is only available on Chromium browsers. It returns the JavaScript heap details, and it may lose some accuracy depending on whether some pages share the same heap, or whether the page has iframes using separate heaps.

    A new experimental API is meant to replace this one, Performance.measureUserAgentSpecificMemory(), which will be able to estimate the current web page memory usage including its iframes and related workers. But it requires cross-origin isolation ( window.crossOriginIsolated ), and we'd need to add the COOP/COEP headers to our site. So let's work with the old API for now.

    In any case, we’ll be working with the current performance.memory API, which may end giving us some good tips on our current pages memory usage.

    Let's get to it.


    performance.memory

    As we mentioned before, this is a property that returns the JS heap details for the current page and only works on Chromium-based browsers. Still, having a sampling of this metric can help us measure how our sites perform.

    It returns 3 different metrics:

    • jsHeapSizeLimit , the maximum memory available for the current context.
    • totalJSHeapSize , the total allocated heap size.
    • usedJSHeapSize , the currently active segment of the JS heap.
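    A minimal sketch of reading those three values, guarded for non-Chromium environments where performance.memory doesn't exist (the `toMB` helper is my own addition for readability):

```javascript
// Hypothetical helper: convert heap bytes to MB with two decimals
function toMB(bytes) {
  return (bytes / 1024 / 1024).toFixed(2);
}

// performance.memory is Chromium-only, so guard before reading it
if (typeof performance !== 'undefined' && performance.memory) {
  var m = performance.memory;
  console.log('used ' + toMB(m.usedJSHeapSize) + ' MB of ' +
              toMB(m.jsHeapSizeLimit) + ' MB limit');
}
```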

    In any case, reporting the memory usage at one specific moment doesn't give much value, so I prepared a script that records the current values during the page load and pushes the maximum, minimum, and average memory usage to our dataLayer.

    You can personalize the values; by default the script will ask for the current memory usage every quarter second ( 250ms ) during 10 seconds, or until 2.5 seconds after the page load event, whichever comes first. Just in case, the data will also be pushed if the user navigates away from the page before any of the previous rules happen, using the beforeunload page event.

    You can personalize these values in the script's top section. Please keep in mind that using a 1 millisecond polling time won't likely give you better insights; just keep the default value.

    The idea is grabbing the memory usage during the page rendering, since after the page load it may not change much (this will depend, of course; if you have some lazy-loaded content, the memory usage will keep changing). Or you may have an SPA site that you want to monitor at regular intervals; in that case you may have this running in the background and push the memory usage data for each history change or virtual pageview.
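    For that SPA case, a minimal sketch could keep sampling in the background and flush the result on each virtual pageview (the event name and function names here are my own assumptions, not part of the script below):

```javascript
// Sketch: keep a running sample in the background (Chromium-only)
// and flush it on every virtual pageview. Names are illustrative.
var spaMemory = { max: 0, samples: 0 };

function sampleMemory() {
  if (typeof performance === 'undefined' || !performance.memory) return;
  var used = performance.memory.usedJSHeapSize;
  if (used > spaMemory.max) spaMemory.max = used;
  spaMemory.samples++;
}

function flushMemoryOnVirtualPageview(dataLayer) {
  dataLayer.push({
    event: 'memory_usage_virtual_pageview', // hypothetical event name
    max_memory_usage_mb: (spaMemory.max / 1024 / 1024).toFixed(2)
  });
  spaMemory.max = 0; // reset for the next virtual page
  spaMemory.samples = 0;
}
```

    You would call `sampleMemory` from a `setInterval` and `flushMemoryOnVirtualPageview(window.dataLayer)` from your history-change handler.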

    The code Snippet

    Here you can find the code that will take care of monitoring the memory usage over the page load. The sooner you add this code to the page the better: if you are using Google Tag Manager, use the gtm.init or gtm.js / All Pages events; if you have a TMS with the option of adding synchronous code, that's your best choice; and lastly, you can add it directly to the page.

    Parameters

    • eventName — the event name to use on the dataLayer push. string
    • sampleLimit — total count of samples to take. integer
    • pollingPeriod — polling time in ms. Together with sampleLimit, this defines the maximum time before the event fires. integer
    • waitTimeAfterWindowLoad — total milliseconds to wait after window load. It doesn't make much sense to wait 10 seconds if the page loads in 2 seconds, so we push the data at that point without waiting for the full sample limit. integer
    <script>
    (function() {
        // David Vallejo (@thyng)
        // Analytics Debugger S.L.U. 2023
    
        var settings = {
            eventName: 'memory_usage_profiler',
            sampleLimit: 40,
            pollingPeriod: 250, // in ms
            waitTimeAfterWindowLoad: 2500 // in ms
        }
    
        // This is only available on Chromium based browsers, just skip if the API is not available
        if (!(window.performance && 'memory' in window.performance))
            return;
    
        try{
        // Initialize Data
        var data = {
            sent: false,
            samplesCount: 0,
            max: performance.memory.usedJSHeapSize,
            min: performance.memory.usedJSHeapSize,
            avg: performance.memory.usedJSHeapSize
        }
        var pushData = function(data) {
            if (!data.sent) {
                window.dataLayer = window.dataLayer || [];
                window.dataLayer.push({
                    'event': settings.eventName || 'memory_usage_profiler',
                    'event_data': {
                        'max_memory_usage': (data.max / 1024 / 1024).toFixed(2),
                        'min_memory_usage': (data.min / 1024 / 1024).toFixed(2),
                        'avg_memory_usage': (data.avg / 1024 / 1024).toFixed(2),
                    }
                })
                data.sent = true
            }
        }
    
        var clear = setInterval(function() {
            if (performance.memory.usedJSHeapSize > data.max)
                data.max = performance.memory.usedJSHeapSize
            if (performance.memory.usedJSHeapSize < data.min)
                data.min = performance.memory.usedJSHeapSize
            // Running average, weighted toward recent samples
            data.avg = (data.avg + performance.memory.usedJSHeapSize) / 2
            data.samplesCount++;
            if (data.samplesCount >= settings.sampleLimit) {
                clearInterval(clear)
                pushData(data)
            }
        }, settings.pollingPeriod)
    
        // Once the page has loaded, wait waitTimeAfterWindowLoad ms and push the data
        window.addEventListener("load", function(event) {
            setTimeout(function() {
                clearInterval(clear)
                pushData(data)
            }, settings.waitTimeAfterWindowLoad);
        }
        );
        // In case the user navigates away from the page...
        // TODO: prefer the pagehide / visibilitychange mechanism instead
        window.addEventListener('beforeunload', function(event) {
            clearInterval(clear)
            pushData(data)
        });
        
        }catch(e){}
    }
    )()
    </script>

    Sending the data

    At this point we have all the data coming into our dataLayer, which means we could send it wherever we want. As usual I'm using Google Analytics 4, and then make use of some metrics for getting the averages.

    It's gonna be an easy setup: just create 3 dataLayer-type variables, a trigger matching the event name you defined for this tracking ( default: "memory_usage_profiler" ), and lastly map it all to a GA4 event tag.

    Google Analytics 4 Metrics

    We'd need to create some metrics in our account; note that these should be metrics, not dimensions.


    Looker Studio Report Example

    I quickly built a Looker Studio report to show what the data will look like.

  • Unraveling the User Engagement Measurement in Google Analytics 4


    One of the most notable changes in Google Analytics 4 was the removal of the Bounce Rate metric in favor of Engaged Sessions / Engagement Time (yes, I know they ended up bringing it back with some slight differences).

    The first one defines whether the current session has been engaged by the user; the second one defines the "amount" of engagement involved in the visit. The combination of these 2 metrics provides a better view of users' behaviour on our site than checking a simple "did they bounce" boolean parameter without any further context.

    Engaged Session

    Google Analytics 4 will consider ( and accordingly mark ) a session as engaged if any of the following conditions are met:

    • The session lasts longer than 10 seconds (this can be configured)
    • The session has at least 1 conversion event
    • The session has 2 or more page_view or screen_view events
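    The three conditions above can be modeled as a simple predicate (a sketch of the idea only; the field names are my own illustrative assumptions):

```javascript
// Sketch: the "engaged session" decision as a predicate.
// Field names (durationMs, conversions, pageViews) are illustrative.
function isEngagedSession(session) {
  return session.durationMs > 10000 ||  // lasted longer than 10 seconds
         session.conversions >= 1 ||    // at least 1 conversion event
         session.pageViews >= 2;        // 2+ page_view / screen_view events
}
```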

    Then the engagement rate will report the percentage of sessions that meaningfully interacted with your site, based on the conditions listed above.

    On the other side, the "Bounce Rate" will report the sessions where there was no engagement.
    The session engagement is reported back to GA4 using a shared payload key, seg (i.e. it's shared across all the events in the current hit).

    Engagement Time Metric

    The way the engagement time is tracked may confuse people; despite including a lot of rules internally, it ends up being pretty simple: Google Analytics will only count the time the user is really "engaged" (huh!) with your site, and to decide this there are 3 page states that need to be enabled.

    • Page MUST be visible
    • Page MUST be active
    • Page MUST have the Focus

    If you're not a developer this may not be easy to understand, so let me try to explain.

    The current document (page) has to be visible; if for some reason the current page document is not visible to the user, no time will be counted. This is detected using the visibilitychange event and checking the document.hidden value.

    Also, the current page needs to be active; this is measured using the pageshow and pagehide events. If you don't know what this means: let's say we have an SPA page and we press the Back button; the counter will be stopped on pagehide and will start counting again when the page is shown.

    Lastly, the current window needs to have the focus. Let's say you're using my Analytics Debugger extension; while you're interacting with the extension, the focus is on the extension, not on the page. Google Analytics 4 will stop the counter and that time won't be counted.

    In any case, some of these rules may not work every time because of different browser behaviors and race conditions, so it uses some more signals to enable or disable the counter, for example checking whether document.hasFocus() is true, or starting the counter on each gtag event push as long as the event is not a get call ( "gtag.get" ).
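    A rough sketch of how those three signals could be combined into a single on/off engagement clock (this is an illustration of the idea only, not GA4's actual implementation):

```javascript
// Sketch: an engagement clock that only runs while the page is
// visible, active (not in the bfcache) and focused.
function createEngagementClock(doc, win) {
  var state = { visible: !doc.hidden, active: true, focused: true };
  var accumulatedMs = 0;
  var startedAt = null;

  function update() {
    var shouldRun = state.visible && state.active && state.focused;
    if (shouldRun && startedAt === null) {
      startedAt = Date.now();                  // resume counting
    } else if (!shouldRun && startedAt !== null) {
      accumulatedMs += Date.now() - startedAt; // pause and bank the time
      startedAt = null;
    }
  }

  doc.addEventListener('visibilitychange', function () { state.visible = !doc.hidden; update(); });
  win.addEventListener('pagehide', function () { state.active = false; update(); });
  win.addEventListener('pageshow', function () { state.active = true; update(); });
  win.addEventListener('blur', function () { state.focused = false; update(); });
  win.addEventListener('focus', function () { state.focused = true; update(); });

  update(); // start the clock if all three signals are already "on"

  return {
    read: function () { // engaged time so far, in ms
      return accumulatedMs + (startedAt !== null ? Date.now() - startedAt : 0);
    }
  };
}
```

    In a browser you would pass `document` and `window`; the clock pauses when any of the three signals goes off and resumes when all are back on.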

    The user_engagement event

    There's an extra point to keep in mind about how Google Analytics 4 measures the engagement time. Each time we click on a link to our own domain, or to any of the domains configured on the linker, the user_engagement event is fired. (There may be some more conditions to be met, or some extra conditions that may trigger it at some point.)

    The main reason for this auto-fired event to exist seems to be measuring the current time on page; that's why it gets fired just before the user leaves the current page. And it will only trigger if there are no other events pending on the queue (I think, I need to test this more deeply); for example, if you have a click event at some point, that event will hold the engagement time, so this event won't be needed.

    You'll likely need to check the "Preserve log" checkbox to see these hits (which are also available in BigQuery), which basically hold the engagement time value (&_et). You may see these hits showing an unknown status on the network tab; nothing to worry about. All hits on GA4 are sent using the sendBeacon API, which basically means that they are fired and the browser doesn't wait for any response from the receiving endpoint.

    Are these Metrics Reliable?

    I was chatting with @CharlesFarina about this, and regarding the reliability of the engagement time, it came to my mind that there were some studies mentioning issues with sendBeacon usage.

    The people at Volument found some 40% discrepancies while using sendBeacon along with the beforeunload event, here: https://volument.com/blog/sendbeacon-is-broken . According to the MDN documentation, it should be used along with visibilitychange (I didn't check on which event GA4 triggers the data, since no events are fired on my visibility changes; we may guess it's fired on pagehide, or page unload … ). In any case, this study was run 2 years ago, so as of today we should expect the gap not to be as big.
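    The pattern MDN recommends, flushing on visibilitychange rather than beforeunload, could be sketched like this (the `endpoint` and `buildPayload` parameters are my own assumptions for illustration; the document/navigator objects are passed in so the sketch stays testable):

```javascript
// Sketch: flush analytics when the page becomes hidden, which is more
// reliable across browsers than beforeunload/unload handlers.
function installReliableFlush(doc, nav, endpoint, buildPayload) {
  var sent = false;
  doc.addEventListener('visibilitychange', function () {
    if (doc.visibilityState !== 'hidden' || sent) return;
    sent = true; // send at most once per page
    var body = JSON.stringify(buildPayload());
    if (nav.sendBeacon) {
      nav.sendBeacon(endpoint, body); // queued by the browser, survives unload
    }
  });
}
```

    In a real page you would call it as `installReliableFlush(document, navigator, '/collect', buildPayload)`.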

    Still, even if this event sometimes ends up not being tracked, it really helps add even more value to the engagement time metric in Google Analytics 4.

    Working demonstration Gif:

    And that's basically how it works. I may be missing some small details and rules, but the point of the post is not to replicate the functionality, but to provide some details so we know why some events don't attach the engagement time ( &_et ) value, or why sometimes, even if we spent more than 10 seconds, the session has not been marked as engaged.

    One thing that I do miss is having the option to set an event as "non-interactional", as we used to have in the old Analytics; but rather than not spawning a session, this would just skip passing the engagement time to the current event, which may be useful for people tracking some backend or automatic events without any user interaction.

  • Back to the “blogs”

    In the last year I didn't have much time to work on this blog, mostly because I've been focusing on my other side projects in my "free" time.

    Since I started this blog many years ago, my number one rule has been not to write about what others have already written, so I've been trying to provide new information in each of my posts, rather than retreading others' content, or even my own content, each year. This, added to some people being ludicrously fast at covering new features of Google products, had left me in a very difficult position to provide more stuff.

    In any case, Google Analytics 4 is now in da house, which provides a whole new scope of chances to write about new stuff, or even take the time to really update all these 10+ year old blog posts. So I'm back to writing.

    This is gonna help me improve my, poor, English skills, but it will also let me learn some new stuff, since I've been feeling a bit outdated (i.e. BigQuery, attribution, modeling).

    All this said, this post also inaugurates the new blog design, which I think is more readable and includes a dark-mode version. Along with this, there is some news about the content on this blog.

    • It won't ever again be solely a Google Analytics / Google Tag Manager blog; I really want to try and learn new tools and vendors, and of course "teach" all this blog's readers about them.
    • This will be a ChatGPT-free blog. That may work against my blog's growth or writing possibilities, but I only want to provide my own created content, and I want to improve my English skills. So you're really lucky, since you'll be able to keep enjoying my exquisite English vocabulary.
    • I've never had or asked for collaborations. I'd love to have some guest posts, and it would be even better if we worked on some solution together. Are you interested? Ping me.
    • I'm taking away my number one rule. I'll write about anything that I may work with, or that I may try, since even if other people have written about the same topic, I may contribute some other valid PoVs or ideas that someone may find useful.
  • How to track AMP / GA4 Pages using Google Tag Manager

    Hi, tomorrow, the 1st of July, is the day when Universal Analytics is being sunsetted for all free account users; 360 accounts will enjoy an extra year to deal with the full migration.

    On 26th June, Google luckily announced official Google Analytics 4 support for AMP pages. Which makes me really happy, since some time from now I won't need to worry about keeping my solution and CDN servers online (sorry, it feels like a lot of responsibility to keep something running for almost 1B unique users and 3B+ monthly hits, and there will also be some more money in my pocket).

    I've been following the comments on social networks and forums, and it seems there are a lot of people who were waiting for some AMP/GTM support. But as of now, 24h from the deadline, there's no template for GTM AMP containers.

    I worked on this last summer, but I didn't publish it, since not having a template in GTM didn't make much sense. In any case, I feel this may end up helping some people (or not…). So I'm showing you how you can do GA4 tracking on AMP using GTM.

    First of all, we need to modify our main <amp-analytics> tag to include type="googleanalytics" ; this is needed to have access to the session information variables.

    <amp-analytics config="https://www.googletagmanager.com/amp.json?id=GTM-THYNGSTER&gtm.url=SOURCE_URL" data-credentials="include" type="googleanalytics">
    </amp-analytics>


    Okey dokey, we have the first step. Now, current GTM containers for AMP don't have the predefined variables to build a proper Google Analytics 4 hit; for example, they're missing the session_id, session_count, client_hits, and some other details. And here's where we're going to make use of the current AMP variables to calculate this information accordingly, so we can later use it in our hits. For this we need to add some vars to our Google Tag Manager configuration. I took care of building everything for you, so you just need to copy and paste (just change the GTM container ID).

    <amp-analytics config="https://www.googletagmanager.com/amp.json?id=GTM-THYNGSTER&gtm.url=SOURCE_URL" data-credentials="include" type="googleanalytics">
        <script type="application/json">
            {
                "vars": {
                "sid": "$CALC(SESSION_TIMESTAMP, 1000, divide, true)",
    	        "sct": "SESSION_COUNT",
            	"seg": "$IF($EQUALS(SESSION_ENGAGED, true),1,0)",
    	        "_et": "$CALC(INCREMENTAL_ENGAGED_TIME,1000, multiply)",
    	        "gcs": "$IF($EQUALS(${GOOGLE_CONSENT_ENABLED},TRUE),G10$IF($EQUALS(CONSENT_STATE,sufficient),1,0),)",
            	"uaa": "${uach(architecture)}",
    	        "uab": "${uach(bitness)}",
            	"uafvl": "${uach(fullVersionList)}",
    	        "uamb": "$IF($EQUALS($DEFAULT(${uach(mobile)}, EMPTY), EMPTY),,$IF($EQUALS(${uach(mobile)}, false),0,1))",
            	"uam": "${uach(model)}",
    	        "uap": "${uach(platform)}",
            	"uapv": "${uach(platformVersion)}",
    	        "uaw": "$IF($EQUALS($DEFAULT(${uach(wow64)}, EMPTY), EMPTY),,$IF($EQUALS(${uach(wow64)}, false),0,1))",
                "is_first_visit": "$IF($EQUALS($CALC(SESSION_COUNT, $CALC($CALC(${timestamp}, 1000, divide, true),$CALC(SESSION_TIMESTAMP, 1000, divide, true), subtract), add),1), _fv, __nfv)",
                "is_session_start": "$IF($EQUALS($CALC($CALC(${timestamp}, 1000, divide, true),$CALC(SESSION_TIMESTAMP, 1000, divide, true), subtract),0), _ss, __nss)"
                }
            }
        </script>
    </amp-analytics>


    How are we doing for now? Easy, isn't it? At this point, we have all the necessary data points to properly build a Google Analytics 4 hit payload.

    Now we just need to create some Custom Image tags in GTM for our page_views and events . Just to keep making your life easy, I'm attaching the full main payload you should be using in ALL of your hits. You need to take this and then add your custom details based on the data you want to send to GA4.

    This is the core custom payload you need to use:

    https://region1.google-analytics.com/g/collect?v=2&tid=G-THYNGSTER&ds=GTM-AMP&_p=${pageViewId}&cid=${clientId}&ul=${browserLanguage}&sr=${screenWidth}x${screenHeight}&_s=${requestCount}&sid=${sid}&sct=${sct}&seg=${seg}&dl=${sourceUrl}&dr=${documentReferrer}&dt=${title}&uaa=${uaa}&uab=${uab}&uafvl=${uafvl}&uamb=${uamb}&uam=${uam}&uap=${uap}&uapv=${uapv}&uaw=${uaw}&${is_session_start}=1&${is_first_visit}=1

    Now you only need to add the data related to the event. If you just want to fire a page_view, add the following to the end:

    &en=page_view
    or for a custom event name
    &en=outgoing_link

    If you want to add event parameters, you need to do it this way:

    &ep.page_type=homepage // if it's a string
    &epn.page_load_time=122 // if it's a number, note the epn prefix

    If you need to add user properties, it will work this way

    &up.user_name=thyngster // if it's a string
    &upn.life_time_value=234 // if it's a number, note the upn prefix

    Just in case someone got lost, the page_view hit looks like this:

    https://region1.google-analytics.com/g/collect?v=2&tid=G-THYNGSTER&ds=GTM-AMP&_p=${pageViewId}&cid=${clientId}&ul=${browserLanguage}&sr=${screenWidth}x${screenHeight}&_s=${requestCount}&sid=${sid}&sct=${sct}&seg=${seg}&dl=${sourceUrl}&dr=${documentReferrer}&dt=${title}&uaa=${uaa}&uab=${uab}&uafvl=${uafvl}&uamb=${uamb}&uam=${uam}&uap=${uap}&uapv=${uapv}&uaw=${uaw}&${is_session_start}=1&${is_first_visit}=1&en=page_view

    At this point I still advise you to update your setup to the GTAG tag type on AMP, since it's not clear whether GTM for AMP will be supported in the future; hopefully there will be some announcements about this at some point.

    Hope this helps someone in the last minute.

  • Analytics Debugger 2.0.0 Chrome Extension Release

    It took almost 9 months, but it’s finally here. The new Analytics Debugger Extension (Formerly GTM/GA Debugger) hit the 2.0.0 milestone.

    It's been a full rewrite of the code, mainly to make it fully compliant with the new MV3, but at the same time its internal functionality has been improved. Now all the components are loaded asynchronously, making the extension use fewer resources and work faster, and it also makes it much easier to add new features.

    The UI has been kept faithful to the original one, but there are a lot of slight differences that make the reports cleaner :). The report that got the most changes is the Google Analytics 4 one.

    While debugging your Google Analytics 4 hits/events, you will be able to:

    • See the event batches grouped by the holding hit request
    • The server-side generated events (session_start, first_visit, etc.)
    • On the main report you'll see the current event engagement time, whether it's counting as a conversion, and whether it has ecommerce data
    • The currently used endpoint
    • Whether it's sent to an SGTM server
    • The SGTM response and headers ( pixels, and cookies set server-side )
    • All the data is presented in a friendly way ( not only the parameter keys )
    • Still, you can see the raw payload details
    • The current hit consent mode
    • The session ID and session count
    • Whether Google Signals is being used
    • You can filter the events by Measurement ID or event name
    • Event parameters and user properties

    And there are many more new features coming in the future; stay tuned.

    Real Time Notifications

    The new version is able to show real-time news/notifications about the currently viewed vendor. This will allow me to notify users about service outages or breaking changes in the tools. Of course, all this is not automated, and I'll need to stay up to date with the news in order to have them showing up on time.

    Amplitude Support

    Other big news: it now supports Amplitude. It supports seeing the current hit batches, as usual in a really friendly manner. Clicking the project ID will show the current tracker/project configuration.

    Matomo/Piwik Support

    I also added support for Matomo/Piwik. This report will allow you to see the hits coming in real time; it supports pings, link clicks, and purchases. And you can even parse the current hit's payload to know exactly what each payload key means.

    UI Updates

    I really took some time to make the tool configurable, just because not everyone has the same needs. Now you can define which tools to show in your debugging session, and in which order you want the tabs to be shown in the reports:

    Internationalization has finally arrived: the extension is now available in English, Spanish and Japanese (kind of), with some more languages coming in the future.

    You can choose whether the hit payloads are shown in the order they are sent, or whether you want the payload keys sorted alphabetically.

    You can choose how deep you want objects to be opened by default; some people just want the first level opened, others want to see deeper levels.

    Finally, you may like to see more data on your screen, or you may use a 4K monitor that makes everything very small; you can now set the zoom level to make your debugging sessions more comfortable.

    Support / Bug Reports / Features Proposals

    I've opened a GitHub repository to track bugs; you can find it here:

    https://github.com/analytics-debugger/analytics-debugger-browser-extension/issues/new?assignees=thyngster&labels=bug&template=bug_report.md&title=

    Final thoughts

    Please keep in mind I rewrote it from scratch, so while I feel it's more accurate than the previous version, some people may hit bugs on some sites. Just reach out to me so I can work on them.

    Some more vendors are coming in the next months, just to mention some of them:

    • Adobe Analytics
    • Adobe Launch
    • Adobe Target
    • Tealium IQ
    • Yandex Metrika
    • Chartbeat

    And some more debugging features are also coming that will make some of the most complicated debugging tasks a breeze.