If you’ve never heard of it, the back/forward cache (bfcache) is a browser optimization available in all major browsers, including Chrome (and Chromium-based browsers, version 96 and above), Firefox, and Safari. It allows for instant navigation when moving back and forth between pages. This feature significantly improves the browsing experience, especially for users on slower connections or with less powerful devices.
The way it works is that when you navigate to a new page, the browser, instead of immediately destroying the current page, keeps it in memory and pauses its JavaScript execution. This allows the page to load instantly if the user clicks the back button, as the page is restored immediately from memory.
This is beneficial in almost every imaginable way, but when it comes to measuring our site, it poses a challenge. Since JavaScript execution is paused and the page is restored as-is when the user navigates back (or forward), our scripts won’t run again, which means our tags won’t trigger or re-evaluate. That means, for example, that the page_view won’t fire. Oh boy, that’s not good; let’s see what happens on a site when the bfcache is doing its thing.
I think this is really important because it means you could be missing crucial page_view tracking on your site, or certain pixels might not be firing as expected. This could lead to gaps in your data collection, making it difficult to accurately measure user interactions and performance. Without proper tracking in place, you may miss key insights, such as user behavior on different pages or the effectiveness of your marketing campaigns. Ensuring that your ‘page_view‘ events are consistently fired across all page types—whether regular page loads, virtual pageviews, SPA sites, or bfcache-restored pages—will help guarantee that you’re capturing all the necessary data for effective analysis and optimization.
As shown below, users navigating through the site won’t trigger any page_view events (or any other events), meaning that I may be missing views of my blog content :(.
There are some ways of checking whether this is happening to us. The easiest is just checking the console, because Chrome, at least, will let us know when a page is being restored from the bfcache.
How BFCache Works
Let’s take a look at how the cache works—it’s a bit different from other caching systems we’re typically used to. The bfcache is a full snapshot of the entire page, including the JS heap, and it gets fully restored on user navigation.
You might wonder what happens to the code running on the site, such as promises or setTimeouts. It’s important to know that these will also be paused, along with most other tasks in queues.
It’s also important to keep in mind that iframes contained within the current page are not eligible for the bfcache.
Testing BFCache Behaviour
We can also use Developer Tools to force this behavior by going to the Application tab, then navigating to Background Services > Back/forward Cache.
I want my pages to be fully tracked
We need to understand that the Page Lifecycle API internally tracks the current page state, allowing us to know when a page is being frozen (Freeze Event) or resumed (Resume Event), as shown in the Page Lifecycle API State Flow below:
These events are emitted to the document, so we can listen to them by adding a simple listener, as shown in the code snippets below.
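A minimal listener setup could look like the sketch below. The describeLifecycleEvent helper is my own (not part of any API), used here just to keep the handlers readable:

```javascript
// Hedged sketch: wiring up the Page Lifecycle freeze/resume events.
// describeLifecycleEvent is a hypothetical helper for logging only.
function describeLifecycleEvent(type) {
  if (type === 'freeze') return 'page frozen (it may now enter the bfcache)';
  if (type === 'resume') return 'page resumed (unfrozen or restored from the bfcache)';
  return 'unknown lifecycle event: ' + type;
}

// Guarded so the snippet is harmless outside a browser.
if (typeof document !== 'undefined' && document.addEventListener) {
  document.addEventListener('freeze', function () {
    console.log(describeLifecycleEvent('freeze'));
  });
  document.addEventListener('resume', function () {
    console.log(describeLifecycleEvent('resume'));
  });
}
```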
Also, since Chrome 68 (according to the docs), we can even know whether the current page was discarded from memory while hidden by checking the wasDiscarded flag on the current document.
if (document.wasDiscarded) {
  // The page was discarded while in a hidden state
}
Tracking my pages when they’re restored from bfcache
Luckily for us, the Page Lifecycle API provides the pageshow event, which triggers when a page is loaded, either for the first time or when it is restored from the back/forward cache (bfcache).
Key points about the pageshow event
It is triggered when a page is shown, which can occur when navigating to a page for the first time or when returning to a cached page.
The event includes a persisted property that indicates whether the page was restored from the bfcache (back/forward cache). If persisted is true, the page was restored from the bfcache.
Based on this, we can easily add a piece of code to monitor whether the page is being restored from the bfcache. For example, we can push a page_view event to our dataLayer or trigger a gtag page_view event directly. Of course, you’ll be able to customize this for your specific needs.
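A minimal sketch of this idea is shown below. buildPageViewEvent is a hypothetical helper; adapt the pushed payload to whatever your dataLayer schema expects:

```javascript
// Hedged sketch: push a page_view on every pageshow, flagging bfcache restores.
// buildPageViewEvent is our own helper, not part of any API.
function buildPageViewEvent(persisted) {
  return {
    event: 'page_view',
    bfcache_restored: persisted === true // true only on bfcache restores
  };
}

if (typeof window !== 'undefined' && window.addEventListener) {
  window.dataLayer = window.dataLayer || [];
  window.addEventListener('pageshow', function (event) {
    // Fires on first loads (persisted === false) and bfcache restores (persisted === true)
    window.dataLayer.push(buildPageViewEvent(event.persisted));
  });
}
```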
This is personal advice: even if you don’t have an SPA site or don’t worry about the bfcache, it’s still a great idea to set up your system to push a ‘page_view’ event every time. Doing so will not only help you establish a common, shared logic for handling regular page loads, virtual pageviews, SPA sites, and bfcache-restored pages, but it will also allow you to avoid race conditions (e.g., by not relying on triggers like ‘All Pages’, DOM Ready, or CMP callbacks). The key is that every time a ‘page_view’ event is fired, it should contain all the necessary data in the dataLayer, giving you a standard and safe way to know when to trigger things.
I’ve released a new library named “Return Of The CustomTask” which, as the name itself suggests, brings back the Universal Analytics customTask functionality to Google Analytics 4.
It’s an open-source library under the Apache 2.0 license that uses fetch interceptors to mimic the behavior of our old friend the customTask: the idea is having the chance to modify the current GA4 payload before it gets sent to the Google Analytics servers.
The library consists of a single file that accepts a list of callbacks (customTasks) that will be applied to our requests.
These tasks will be applied sequentially, meaning you can easily apply more than one action, such as checking for PII and removing duplicate purchase events.
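As a hedged sketch of what one such task could look like, here is a task that redacts email-looking values from every event parameter. The requestModel shape ({ endpoint, sharedPayload, events }) follows the library’s interceptor; the task name and regex are mine:

```javascript
// Hypothetical custom task: redact email-looking values from event parameters.
var EMAIL_PATTERN = /[^\s@&=]+@[^\s@&=]+\.[^\s@&=]+/g;

function scrubEmailsTask(requestModel) {
  requestModel.events = requestModel.events.map(function (event) {
    var cleaned = {};
    Object.keys(event).forEach(function (key) {
      cleaned[key] = String(event[key]).replace(EMAIL_PATTERN, '[redacted]');
    });
    return cleaned;
  });
  return requestModel; // tasks must always return the (possibly modified) model
}
```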
At the same time, I took some time to find all the possible custom tasks by searching on Google, and I’ve already replicated them and made them available for everyone. In many cases I’ve even made them better than the originals 🙂
I must advise that this is a very technical approach to getting things done, so use it at your own risk. If you’re not a developer, consider seeking help rather than just copying and pasting. There are many great analytics engineers and programmers out there (including myself) who will be able to help you get things set up in the best and safest way.
Note: In the coming days, I will be writing specific posts for each of the tasks to ensure that their usage is clear for everyone. In any case, each task folder on GitHub has a README with the basic details to help set things up.
First Step: Grab the GA4CustomTask code
After building the library you’ll find all the code within the dist/ folder. The code is provided in both minified and non-minified form. Since you’re not likely to need to change anything here, I would pick the dist/GA4CustomTask.js code (the minified version is the better choice). Now the only thing we need to do is add it to a Custom HTML tag in Google Tag Manager, in another TMS, or in your page source.
If you’re using this library without GTM, or with any other TMS, the logic is the same: fire it before the gtag code.
<script>
// dist/GACustomTask.js
// Use the link at the top for updated code
(function (global, factory) {
  typeof exports === 'object' && typeof module !== 'undefined'
    ? module.exports = factory()
    : typeof define === 'function' && define.amd
      ? define(factory)
      : (global = typeof globalThis !== 'undefined' ? globalThis : global || self, global.GA4CustomTask = factory());
})(this, (function () {
  'use strict';

  // Check if the URL belongs to GA4
  function isGA4Hit(url) {
    try {
      var urlObj = new URL(url);
      var params = new URLSearchParams(urlObj.search);
      var tid = params.get('tid');
      var cid = params.get('cid');
      var v = params.get('v');
      return !!tid && tid.startsWith('G-') && !!cid && v === '2';
    } catch (e) {
      console.error('Error parsing URL:', e);
      return false;
    }
  }

  var interceptors = [];

  // Interceptor function to handle fetch requests and responses
  function interceptor(fetch, args) {
    var reversedInterceptors = interceptors.reduce(function (array, interceptor) {
      return [interceptor].concat(array);
    }, []);
    var promise = Promise.resolve(args);
    // Apply request interceptors (resolve to FetchArgs)
    reversedInterceptors.forEach(function (_a) {
      var request = _a.request, requestError = _a.requestError;
      if (request || requestError) {
        promise = promise.then(function (args) {
          return (request ? request.apply(void 0, args) : args);
        }, requestError);
      }
    });
    // Proceed with the original fetch call (resolve to Response)
    var responsePromise = promise.then(function (args) {
      return fetch(args[0], args[1]);
    });
    // Apply response interceptors (resolve to Response)
    reversedInterceptors.forEach(function (_a) {
      var response = _a.response, responseError = _a.responseError;
      if (response || responseError) {
        responsePromise = responsePromise.then(response, responseError);
      }
    });
    return responsePromise;
  }

  var GA4CustomTask = function (settings) {
    if (!settings) return;
    interceptors.push({
      request: function (resource, options) {
        if (options === void 0) { options = {}; }
        try {
          if (typeof resource === 'string' && isGA4Hit(resource)) {
            var url = new URL(resource);
            var RequestModel_1 = {
              endpoint: url.origin + url.pathname,
              sharedPayload: null,
              events: [],
            };
            var payloadArray = Array.from(new URLSearchParams(url.search).entries());
            if (!options.body) {
              RequestModel_1.sharedPayload = Object.fromEntries(payloadArray.slice(0, payloadArray.findIndex(function (_a) { var key = _a[0]; return key === 'en'; })));
              RequestModel_1.events = [Object.fromEntries(payloadArray.slice(payloadArray.findIndex(function (_a) { var key = _a[0]; return key === 'en'; })))];
            } else {
              RequestModel_1.sharedPayload = Object.fromEntries(payloadArray);
              RequestModel_1.events = options.body.split('\r\n').map(function (e) {
                return Object.fromEntries(new URLSearchParams(e).entries());
              });
            }
            var payload = Object.fromEntries(new URLSearchParams(url.search));
            if (settings.allowedMeasurementIds && Array.isArray(settings.allowedMeasurementIds) && !settings.allowedMeasurementIds.includes(payload['tid'])) {
              return [resource, options];
            }
            if (Array.isArray(settings.tasks)) {
              settings.tasks.forEach(function (callback) {
                if (typeof callback === 'function') {
                  RequestModel_1 = callback.call({ originalFetch: GA4CustomTask.originalFetch }, RequestModel_1);
                } else {
                  console.warn('Callback is not a function:', callback);
                }
              });
            }
            var reBuildResource = function (model) {
              var resourceString = new URLSearchParams(model.sharedPayload || {}).toString();
              var bodyString = model.events.map(function (e) {
                return new URLSearchParams(e).toString();
              }).join('\r\n');
              return {
                endpoint: model.endpoint,
                resource: resourceString,
                body: bodyString,
              };
            };
            var newResource = reBuildResource(RequestModel_1);
            if (options.body) {
              resource = "".concat(newResource.endpoint, "?").concat(newResource.resource);
              options.body = newResource.body;
            } else {
              resource = "".concat(newResource.endpoint, "?").concat(newResource.resource, "&").concat(newResource.body);
            }
          }
        } catch (e) {
          console.error('Error in fetch interceptor:', e);
        }
        return [resource, options];
      },
      response: function (response) {
        return response;
      },
      responseError: function (error) {
        return Promise.reject(error);
      },
    });

    // Ensure fetch is available in the environment
    window.fetch = (function (fetch) {
      return function (resource, options) {
        var fetchArgs = [resource, options];
        return interceptor(fetch, fetchArgs);
      };
    })(window.fetch);

    return {
      clear: function () {
        interceptors = [];
      },
    };
  };

  // Add original fetch for TypeScript type safety
  GA4CustomTask.originalFetch = window.fetch;

  return GA4CustomTask;
}));
</script>
We are on the right path. We now have a new class, GA4CustomTask, that we can instantiate to attach the interceptors to the Fetch API.
We need to focus on the highlighted lines. This tool operates on the fetch function, but typically we only want to intercept GA4 hits. Don’t worry: the tool already detects these hits internally so that only the requests we need are intercepted. However, what happens if we’re using two Measurement IDs on our site? On line 3, we can specify which Measurement ID the custom task should apply to.
Then we can define the tasks that will be applied to our payload. With GA4CustomTask it’s possible to run several chained tasks, and they will be applied sequentially. (Each customTask receives the requestModel and returns it back after the task has finished working with it.)
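A minimal instantiation sketch, assuming the library file above has already been loaded (so GA4CustomTask exists), using a pass-through task that just logs the model before returning it untouched:

```javascript
// Hypothetical pass-through task: log the hit, change nothing.
var logHitTask = function (requestModel) {
  console.log('GA4 hit to', requestModel.endpoint, 'with', requestModel.events.length, 'event(s)');
  return requestModel; // always hand the model back to the chain
};

// Guarded so the snippet only wires up when the library is present.
if (typeof GA4CustomTask === 'function') {
  var GA4CustomTaskInstance = new GA4CustomTask({
    tasks: [logHitTask]
  });
}
```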
Custom Tasks List
I went ahead and migrated all the customTasks I found on the internet to this new library. You can find the list of them and the source code in the repository, as packages in the /tasks folder.
This task reads the clientId value and passes it back to all the events on the request, or only to the first event if the defined scope is ‘user’.
It accepts 2 parameters: the name to be used for the event parameter / user property, and the scope. If the scope is not specified, it defaults to ‘event‘.
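A hedged sketch of what this task’s logic might look like internally (the published task’s actual code may differ): copy the cid from the shared payload into the events as an event parameter (ep.) or user property (up.):

```javascript
// Hypothetical re-implementation of the clientId task's core logic.
function addClientIdSketch(requestModel, paramName, scope) {
  scope = scope || 'event'; // default scope per the task's description
  var clientId = (requestModel.sharedPayload || {}).cid;
  if (!clientId) return requestModel;
  var prefix = scope === 'user' ? 'up.' : 'ep.';
  requestModel.events.forEach(function (event, index) {
    if (scope === 'user' && index > 0) return; // 'user' scope: first event only
    event[prefix + paramName] = clientId;
  });
  return requestModel;
}
```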
This task intercepts all hits containing at least one purchase event in the payload. If the current ep.transaction_id parameter value has already been used in the current browser, that specific event will be removed from the request.
This task relies on cookies and localStorage to keep the transaction history, and it internally keeps both state stores synched, meaning that if the user removes their cookies but not localStorage, the data will be replicated back to the cookie (and likewise if they remove localStorage).
It takes the cookie name as an optional value, defaulting to __ad_trans_dedup.
This task takes the GA4 payload and sends a copy to the defined Snowplow collector endpoint. Since Snowplow expects one event per request, this task generates an individual request for each event in the payload (keeping the shared parameters intact).
You can pass the endpoint Hostname as a parameter.
What to say about this one, a classic. It replicates our requests to secondary Measurement IDs, but this time it takes 2 extra parameters: a list of whitelisted events and a list of blacklisted ones (the latter only takes effect if the whitelist is not passed or is empty).
// This will relay ALL the events
var GA4CustomTaskInstance = new GA4CustomTask({
  allowedMeasurementIds: ["G-DEBUGEMALL"],
  tasks: [
    (requestModel) => sendToSecondaryMeasurementIdTask(requestModel, ["G-SECONDID", "G-ANOTHER"], [], []),
  ]
});

// This will relay only the add_to_cart and purchase events
var GA4CustomTaskInstance = new GA4CustomTask({
  allowedMeasurementIds: ["G-DEBUGEMALL"],
  tasks: [
    (requestModel) => sendToSecondaryMeasurementIdTask(requestModel, ["G-SECONDID", "G-ANOTHER"], ["add_to_cart", "purchase"], []),
  ]
});

// This will relay all events but purchase events
var GA4CustomTaskInstance = new GA4CustomTask({
  allowedMeasurementIds: ["G-DEBUGEMALL"],
  tasks: [
    (requestModel) => sendToSecondaryMeasurementIdTask(requestModel, ["G-SECONDID", "G-ANOTHER"], [], ["purchase"]),
  ]
});
This task strips out all the parameters that are not strictly related to analytics, in case we are worried about our privacy and the data going to Google. Useful if we are tracking an intranet or some sensitive environment and we want some extra privacy.
You can find the list of current parameters on the repository
Lastly (for now), we have our Bounce Task. We can define a list of events that we want to allow through our implementation, preventing all those pesky vendors and script kiddies from pushing events to the gtag() function to mess with our data.
But not only that, we can define which parameters we want to allow (WhiteListedEventParameters), which will strip out any parameter that is not listed from the current event.
The function takes a schema definition object to work with.
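A hedged sketch of the bouncing idea under an assumed schema shape ({ eventName: [allowedParameterKeys] }); check the task’s README for the real format. Unlisted events are dropped and unlisted parameters stripped:

```javascript
// Hypothetical re-implementation of the bounce logic; schema shape is assumed.
function bounceSketch(requestModel, schema) {
  requestModel.events = requestModel.events
    .filter(function (event) {
      // Drop any event that is not declared in the schema
      return Object.prototype.hasOwnProperty.call(schema, event.en);
    })
    .map(function (event) {
      var allowed = schema[event.en];
      var cleaned = { en: event.en };
      Object.keys(event).forEach(function (key) {
        // Keep only whitelisted parameters for this event
        if (allowed.indexOf(key) !== -1) cleaned[key] = event[key];
      });
      return cleaned;
    });
  return requestModel;
}
```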
Disclaimer: I got a report that this may break some Google Ads integrations. I’m gathering details to understand the issue and try to find a workaround (if possible).
One of the most recurrent issues when trying to keep our data clean is having to fight the spammers and script kiddies who pollute our properties. This is not something new; it’s been around since the beginning of time.
This was an even bigger issue in Universal Analytics, where property IDs were sequential. This made it easy for malicious actors to systematically spam all accounts with minimal effort. With the transition to Google Analytics 4 (GA4), the new Measurement IDs are no longer sequential. This change makes it significantly more difficult to indiscriminately spam all accounts, as the previous method of programmatically targeting consecutive IDs is no longer feasible.
Still, we’re not safe from someone looking at the network requests, or even from people crawling websites and building a list of the Measurement IDs in use, leaving our properties with unwanted data.
Sadly, Google Analytics 4 (GA4) doesn’t have good controls to stop or filter out unwanted data from coming into your account.
Due to the nature of analytics tracking, which happens mainly on the client side, there’s no way to fully stop this spam. But we can take some actions to try to mitigate the issue, and we’re going to talk about the most important one from my point of view. This is:
Not Allowing spammers to know our real Measurement ID
Last month my company (Analytics Debugger) became a Stape.io partner, since we started to offer server-side services, which is allowing me to start playing more with the technology. And luckily, the spam attack reported above makes the perfect introduction for a new server-side trick.
Running GTM/GA4 via SGTM in Stealth Mode.
We’re using a Client template to make our Measurement ID invisible to attackers. This will keep away the script kiddies that crawl our sites, since they’ll be getting a fake ID, and the template will take care of forwarding the final request with the proper ID.
“But David, they could still point to our server-side endpoint and the hits would end up hitting our property.” And that’s right! But since we’re running a server-side endpoint, we can enforce some rules (I’m working on refactoring some old PHP code I have into an SGTM template) to prevent the spam. At this point we have a lot of rules available to fight spam traffic, for example:
User Agent Checking
Checking the request IP address against the associated ASN (Autonomous System Number). Most ISPs and data-center providers have one, so it’s an easy task to filter out non-residential connections using this method. There are even some open IP databases (this information is provided by RIPE).
But not only this: since we’re on the server side, we can easily keep a rolling list of the last 15 minutes of requests and build some throttling mechanism, or we could also check the IP geolocation. Let’s be honest: I’m from Spain, so getting too much traffic from certain countries would be unusual. Best of all, we could even build an internal IP score, allowing us to tag spam traffic (without even removing it), for example assigning the &tt= parameter via SGTM if the current score is > 5 (you’ll need to set up the rules).
In any case, I know some other people have talked about this in the past. I feel this could be a great topic for a hackathon with the smart people around (maybe at the Analytics DevFest it becomes a reality 🙂).
So, filtering hits has been a long-debated topic, where Server-Side can play an important role. However, if we don’t hide our Measurement ID, spammers can directly target GA4 servers and bypass any implemented checks.
Let me start by showing how your setup will look. Please note that a possible attacker will only see our GTM-DEBUGEMALL and G-DEBUGEMALL IDs. At no point will they be able to target Google’s servers using the real IDs 🙂
But our server side still sends the data to Google, with the &tid replaced by the real one:
Implementing this involves setting up a server-side Client template, which you can download from here: GTM/GA4 Stealth Mode Client Template, and then setting the real and fake IDs you want to use in your setup. Refer to the following screenshot for guidance:
The last step is slightly updating our GTM loading snippet. You may notice that there’s an extra /s/ before gtm.js; this is because SGTM doesn’t seem to allow you to claim requests to the known GTM/GA endpoints (or I was not able to do it…).
The template still needs to be improved, and I don’t consider it production-ready (despite the fact that I’m using it on this very blog).
Essentially, we can utilize all our methods to combat spam, as we have been doing, while ensuring that the ID remains concealed to prevent attackers from circumventing any measures aimed at safeguarding our data integrity.
Disclaimer: This might stem from a misunderstanding on my side about how GA4 tags function within Google Tag Manager (GTM), especially compared to how configuration tags used to work in Universal Analytics. The differences between these systems can be subtle and confusing. This issue has been quite frustrating, so I’m sharing my experience in case others encounter the same problem.
I was working on my Google Analytics 4 implementation via Google Tag Manager Server Side and I noticed that some of the hits were going through the regular endpoint (*.google-analytics.com) for some reason.
After some testing I figured out the issue. Although Google Tag Manager shows the associated Google Tag on your GA4 tags, it doesn’t take any action to make sure the config settings are applied to the related tags. This means that any tag firing before the Google Tag won’t have the needed configuration: GTM won’t apply the settings to the linked GA4 tags.
Having this message in the interface makes it somehow confusing:
So, for example, if the dataLayer event that triggers your event tag arrives in the dataLayer before the event that triggers the GA4 Configuration tag, the server_container_url will be ignored, and your first hit won’t go through your server-side endpoint.
This means that if you are firing a GA4 event tag based on a trigger that happens before the trigger that fires the GA4 config tag, those hits won’t go through the SGTM endpoint.
I prepared a test environment with the simplest replication setup: just a page_view event tag that fires on the page_view event, and the GA4 config tag that fires on the All Pages (gtm.js) event.
This is how the tags and triggers look in this container:
In my case the issue was not a big deal, since I’m just sending the data through the server-side endpoint and not doing anything else, so the biggest problem I’d have is some hits being blocked by an adblocker because of the domain.
I expect people firing pixels or doing data widening based on the SGTM GA4 Client to run into more serious inconveniences, even though the data flow to Google Analytics 4 keeps working.
Google Tag Manager knows when there’s a Google Tag that references our current Measurement IDs (check the screenshot above), meaning it should be possible to grab the configuration parameters (server_url, cookie preferences, etc.) and apply them to the linked event tags (the way Universal Analytics config tags used to work…) without even needing to wait for the Google Tag execution at all, or perhaps to queue them until the related gtag has fired.
But anyway, if you are having this issue, the workarounds are pretty straightforward:
Ask your developers to move your push after GTM Snippet
Set the GA4 Config tag trigger to be the same as your first event trigger
Manually set the server_container_url for the tags that may fire before the Google Tag
Make sure the Google Tag always fires before any GA4 event tag.
Based on a comment on LinkedIn from Roman Appeltauer, there’s an extra way of getting around this:
Use the Initialization Trigger for your Google Tag
This post talks about the server-side integration issue (because that was my problem), but the same applies to user properties or event parameters set via the Google Tag not being applied to all the tags, depending on your triggers or dataLayer configuration.
When we use Google Ad Manager, the ads are served using an iframe, which makes it somewhat challenging if we want to track the impressions in another tool, such as Google Analytics 4.
This is why, on this occasion, I’ll be sharing a library to programmatically track our ad impressions and clicks.
For this task we’ll be relying on the ad tagging library for Google Ad Manager, Google Publisher Tag, and Google Analytics 4’s ecommerce promotion tracking (the view_promotion and select_promotion events).
You can identify the Google Publisher Tag library by the global variable window.googletag. Check the following snippet for the simplest possible example.
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <meta name="description" content="Display a fixed-sized test ad." />
  <title>Display a test ad</title>
  <script async src="https://securepubads.g.doubleclick.net/tag/js/gpt.js"></script>
  <script>
    window.googletag = window.googletag || { cmd: [] };
    googletag.cmd.push(() => {
      // Define an ad slot for div with id "banner-ad".
      googletag
        .defineSlot("/123456789/Debug/Em/All", [300, 250], "banner-ad")
        .addService(googletag.pubads());
      // Enable the PubAdsService.
      googletag.enableServices();
    });
  </script>
  <style></style>
</head>
The way the Google Ad Server works is that it creates an iframe with our unit_ad_id as its name or ID, and it handles querying the Google Ad Manager server for details about the ads and the creatives to be served.
If we want to track which ads are being served on our pages, the first challenge we’ll encounter is that we only have information about the ad-unit path, which essentially describes the current slot context. Typically, we also need other values, such as the current campaign and the creatives being served. Since all the rendering happens inside the iframe, we won’t be able to know when the ads have been served, or even have access to the content, because it is loading within the iframe.
So, let’s start figuring out how we’re going to track our ads. The first thing we need to know is that despite everything working inside an iframe (I know, we all hate iframes), the GPT library offers some events that will help us in our mission. (GPT Reference)
impressionViewable
rewardedSlotClosed
rewardedSlotGranted
rewardedSlotReady
slotOnload
slotRenderEnded
slotRequested
slotResponseReceived
slotVisibilityChanged
Available events on Google Publisher Tag
We want to track the ad impressions, so we’ll be using the impressionViewable event, which triggers once the banner has been shown to the user.
This is how we can add a listener for the Google Publisher Tag event.
googletag.pubads().addEventListener("impressionViewable", function (ad) {
  // Do our stuff
});
You may have noticed that the listener receives a value (which we’re naming ad) containing some extra details about the ad being shown. This is a slot object, with some methods we can use to grab info about the ad just rendered.
The primary method we need to examine is ad.slot.getResponseInformation(). This method returns the current advertiserId, campaignId, and creativeId, which we will use to compile our Promotion Impression data.
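A small sketch of how this fits into the listener shown above. The buildPromotionItem helper is mine; the field names follow GA4’s ecommerce item conventions:

```javascript
// Hypothetical helper: build a GA4 promotion item from the slot response info.
function buildPromotionItem(slotDomId, responseInfo) {
  return {
    promotion_name: slotDomId,
    promotion_id: String(responseInfo.campaignId),
    creative_name: String(responseInfo.creativeId)
  };
}

// Guarded wiring for the browser: log the item on each viewable impression.
if (typeof googletag !== 'undefined' && googletag.cmd) {
  googletag.cmd.push(function () {
    googletag.pubads().addEventListener('impressionViewable', function (ad) {
      var info = ad.slot.getResponseInformation();
      if (info) console.log(buildPromotionItem(ad.slot.getSlotElementId(), info));
    });
  });
}
```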
But not only this: we also want to track where the ads point to, i.e., the clickURL. Sadly, none of the methods will give us that information, but we can use the ad.slot.getHtml() method, which returns the current ad’s markup, and then use some regex to extract the details.
We will follow this logic
Extract all the a href links from the HTML
Filter these href links for those that have an adurl parameter
This is how we can extract the destination URL for our banner:
var getAdUrl = function (adHtmlString) {
  // Step 1: Extract all 'a' tags with 'href' attributes
  var aTagRegex = /<a\s+[^>]*href\s*=\s*["']?([^"'>\s]+)["']?[^>]*>/gi;
  let matches;
  var hrefs = [];
  while ((matches = aTagRegex.exec(adHtmlString)) !== null) {
    hrefs.push(matches[1]); // Capture the 'href' value
  }
  // Step 2: Filter hrefs that contain the 'adurl' parameter
  var adurlHrefs = hrefs.filter(href => href.includes('adurl='));
  // Step 3: Extract the 'adurl' parameter value from these hrefs
  var adurlValues = adurlHrefs.map(href => {
    var urlParams = new URLSearchParams(href.split('?')[1]);
    return urlParams.get('adurl');
  });
  if (adurlValues.length > 0) return adurlValues[0];
};

var adHtmlString = ad.slot.getHtml();
console.log(getAdUrl(adHtmlString));
Let’s quickly recap. Now we can capture the ad-unit path, the campaignId, the creativeId, and the destination URL for the impressions of our ads, which provides us with valuable information. This means we can build a dataLayer.push({}) with a view_promotion event to track our impressions. Alternatively, we can send this information to any other tool you prefer. We’ll be using a Google Tag Manager (GTM) dataLayer.push, as it is the most commonly used tag management system (TMS). Feel free to adapt this core implementation for your needs, or reach out to me for help in customizing it to your site’s requirements.
The next challenge is: “How do we track the clicks on the iframe?“. The <iframe> element doesn’t have a click event, which doesn’t help our task, but luckily we can detect clicks within iframe.contentWindow, since the iframe isn’t hosted on a remote domain (it has no src).
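A hedged sketch of this click hook. It takes the document object as a parameter so it is easy to test; in a page you would call it with the real document once the slot has rendered, and slotElementId would be the ad slot’s container div id:

```javascript
// Hypothetical helper: attach a click handler inside the ad's iframe.
function attachAdClickListener(doc, slotElementId, onClick) {
  var iframe = doc.querySelector('#' + slotElementId + ' iframe');
  if (!iframe || !iframe.contentWindow) return false;
  // This only works because the GPT iframe has no src: it is same-origin,
  // so its inner document is accessible from the parent page.
  iframe.contentWindow.document.body.onclick = onClick;
  return true;
}
```

The full library below wires exactly this kind of onclick handler to a select_promotion dataLayer push.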
At this point, we should know how to detect when an impression has been rendered, extract all the details about the ad’s impression, and track the clicks on the ads. We have everything we need; it’s just a matter of putting everything together to create a fully functional tracking solution.
To make the process easier for everyone, I’ve started a new GitHub repository that provides a simple, ready-to-use library. This library will push the data as an Ecommerce Promotion event to the GTM dataLayer.
(function () {
  // Copyright Analytics Debugger S.L.U. All Rights Reserved. 2024
  // Author: David Vallejo
  // Date: 2024-06-21
  // Name: Google Publisher Tag Impressions and Clicks Tracker
  // This file is licensed under the MIT License.
  // License text available at https://opensource.org/licenses/MIT

  // Add ?ad_debug=1 for debugging messages
  var debug = document.location.search.includes('ad_debug=1') ? true : false;
  // Init googletag variable, jic
  window.googletag = window.googletag || { cmd: [] };
  // Init dataLayer variable, jic
  window.dataLayer = window.dataLayer || [];
  // Our variable for holding the impressions info. Used for grabbing the data on click
  window.__gpt_impressions = window.__gpt_impressions || [];

  // Helper function to extract the ad destination URL from the Ad HTML
  var getAdUrl = function (adHtmlString) {
    // Step 1: Extract all 'a' tags with 'href' attributes
    var aTagRegex = /<a\s+[^>]*href\s*=\s*["']?([^"'>\s]+)["']?[^>]*>/gi;
    let matches;
    var hrefs = [];
    while ((matches = aTagRegex.exec(adHtmlString)) !== null) {
      hrefs.push(matches[1]); // Capture the 'href' value
    }
    // Step 2: Filter hrefs that contain the 'adurl' parameter
    var adurlHrefs = hrefs.filter(href => href.includes('adurl='));
    // Step 3: Extract the 'adurl' parameter value from these hrefs
    var adurlValues = adurlHrefs.map(href => {
      var urlParams = new URLSearchParams(href.split('?')[1]);
      return urlParams.get('adurl');
    });
    if (adurlValues.length > 0) return adurlValues[0];
  };

  // Adding the impression Listener
  googletag.cmd.push(function () {
    googletag.pubads().addEventListener("impressionViewable", (event) => {
      // We have an impression, let's get the holder iframe reference and add the click event.
      document.querySelector('#' + event.slot.getSlotElementId() + ' iframe').contentWindow.document.body.onclick = function (e) {
        var impressionInfo = window.__gpt_impressions.filter(function (e) {
          if (e.promotion_name === event.slot.getSlotElementId()) return true;
        });
        window.dataLayer.push({
          event: 'select_promotion',
          ecommerce: { items: [impressionInfo] }
        });
        if (debug === true) console.log("GPT AD CLICK", impressionInfo);
      };
      var slotDetails = event.slot.getResponseInformation();
      try {
        var impressionInfo = {
          promotion_name: event.slot.getSlotId().getDomId(),
          promotion_id: slotDetails.campaignId.toString(),
          creative_name: slotDetails.creativeId.toString(),
          creative_slot: getAdUrl(event.slot.getHtml())
        };
        window.dataLayer.push({
          event: 'view_promotion',
          ecommerce: { items: [impressionInfo] }
        });
        window.__gpt_impressions.push(impressionInfo);
        console.log("GPT AD IMPRESSION", impressionInfo);
      } catch (e) {
        if (debug === true) console.log("GPT ERROR GRABBING IMPRESSION DETAILS: ", e);
      }
    });
  });
})()
The earlier code snippet is responsible for pushing the necessary data to Google Tag Manager. At this stage, you just need to add the appropriate tags and triggers in Google Tag Manager to ensure the data flows into your account.
This code has only been tested on Chromium-based browsers. While the impressions should work across all browsers, we are unsure of how Safari and Firefox will behave with regard to the clicks.
If you’ve ever worked on a project and needed to debug your company’s or client’s app, I’m sure you already know how frustrating it can be to audit the implementation. I’m sure you know what I’m talking about. Not having a public and available namespace and API, as we do in browsers, can make things really hard, even more so if you don’t have some technical skills.
Circa 2016 I presented ( well, Sir Yehoshua Coren did on my behalf ) at the SuperWeek‘s Punchcard Prize a tool for debugging apps using a small hardware device ( an rPi-like board ), which was essentially able to intercept anything connected through the Wi-Fi network this device was generating. As simple as that sounds, running on a piece of hardware made the prospect of broadly offering this to everyone somewhat difficult. At the same time, turning it into a commercial product was not an option due to my limited business and communication skills at that point.
Analysts, particularly those in data analytics and data science roles, don’t need an exhaustive understanding of how everything functions. For instance, they don’t require knowledge of JavaScript to examine how a video player is tracked, and there’s no necessity for them to be familiar with Kotlin to review the events tracked within an app. However, I firmly believe they should have the capability to check if, at the very least, the tracking is occurring. This is where my tools come into play, and this is where Analytics Debugger for Apps was born, enabling anyone to effortlessly inspect what’s happening in their analytics implementations.
My aim was clear: to democratize Apps Debugging, making the debugging task accessible to individuals of all technical backgrounds.
Back in 2019, with the upcoming release of App+Web ( now turned into Google Analytics 4 / Firebase for GA ), I began experimenting with the idea of creating a tool that would mimic the reports of the current Chrome extension, regardless of the platform being debugged. Before long, I developed “Android デバッガ”, a debugger for Firebase/Universal Analytics on Android, as the initial step; it was later expanded to include iOS debugging and renamed to “iOS/Android Debugger“.
I made a list of points that the tool should comply with:
No App Modifications Needed: The debugging should work ‘out of the box’ without the need to recompile packages or add external libraries.
Reports should be vendor-specific, not just a simple output of requests, in order to make the data understandable for any analyst or marketer.
It ought to be made available for free.
Open to the inclusion of new vendors.
At this stage of the venture, I’m happy to say that I’ve met all the objectives and am ready for the next steps. You might see a pattern here, but the first thing I did was change the name (once again). It’s quite in line with previous name changes. So, say hello to “Analytics Debugger for Apps“.
The debugger remains FREE for everyone as long as you stay subscribed to the newsletter (by the way, I’ve sent 0 emails in 10 years…). That’s the only requirement to use the tool. You can create an account and then a license at https://app.analytics-debugger.com at no cost.
You might be wondering if there’s a catch to this, but the answer is straightforward: there isn’t one. I’m simply sharing one of my internal tools to help people. Not sure how long I will be doing it, so for now let’s just enjoy the moment 🙂
You wouldn’t believe how time-consuming tool development can be, especially when going solo ( backend, frontend, testing, websites, docs, social, updates, support, backups, APIs, design ). Therefore, the tool is open to sponsorships. Sponsors will not encounter any nag screens or experience delays while using certain features, such as exporting data to the clipboard.
Due to Android’s limitations, not all vendors are supported (though this may change in the near future 😉), making iOS App debugging the more feature-rich version. Featured vendors are: Firebase Analytics, Universal Analytics (to be deprecated), Snowplow, and Adobe Analytics. As I mentioned before, I do this in my free time. While I would love to add more features, this is where sponsorships come into play, providing me with the resources to dedicate more time to working on the tools.
I’m amazed that over 1,500 fellow analysts have used the debugger so far, especially considering the minimal marketing efforts I’ve put into it. Huge thanks to everyone!
As expected, in the last months many vendors and providers have started to add Google Analytics 4 integrations, and many of them may just push events to the GTAG wrapper function, so you’ll likely end up having some unwanted events in your reports.
It’s not only about vendors: spammers have an easy way to programmatically mess with your data, just by using the global objects.
We’ll learn some implementation tricks for preventing any other tools from polluting our GA4 data, and also how we can ensure that nobody but us sends data to our dataLayer. As usual, I’ll be using examples for Google Tag Manager and Google Analytics 4, but the same logic can be applied to any other tool.
Protecting GTAG from Bot and Vendor Pollution
In order to protect our setup from unwanted events or pushes, we’ll slightly modify our GTAG calls. The first modification is adding a guard check on the GTAG wrapper.
<script async src="https://www.googletagmanager.com/gtag/js?id=G-THYNGSTER"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
// Guard, first argument is not our protection token?, skip
if (arguments[0] !== 'protectToken') return;
// This function will allow us to pass back an Arguments object to the dataLayer
function passArgumentsBack() {
dataLayer.push(arguments)
}
// Remove the first argument and pass the rest back
passArgumentsBack.apply(this, Array.prototype.slice.call(arguments, 1));
}
gtag('protectToken', 'js', new Date());
gtag('protectToken', 'config', 'G-THYNGSTER');
</script>
Now, any gtag function call where the first argument is not our “protectToken” will be blocked, so any vendor or bot that tries to push data to our namespace will just be stealthily ignored.
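As a self-contained illustration of the guard (the wrapper from above is repeated so the snippet runs on its own; the `root` indirection just lets it also run outside a browser):

```javascript
// Guarded gtag wrapper, as shown above
var root = typeof window !== 'undefined' ? window : globalThis;
root.dataLayer = root.dataLayer || [];
function gtag() {
  // Guard: first argument must be our protection token
  if (arguments[0] !== 'protectToken') return;
  function passArgumentsBack() {
    root.dataLayer.push(arguments);
  }
  // Remove the token and pass the rest back
  passArgumentsBack.apply(this, Array.prototype.slice.call(arguments, 1));
}

gtag('event', 'purchase');                 // blocked: no token, silently ignored
gtag('config', 'G-THYNGSTER');             // blocked as well
gtag('protectToken', 'event', 'purchase'); // accepted, lands on the dataLayer
```

Only the last call ends up in the dataLayer; the first two are dropped without any error, which is exactly what we want for uninvited vendors.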
It may happen that you cannot modify the gtag function on creation, maybe because it’s hardcoded on the page or because someone else has already initialized it. Don’t worry, you can run this code to override the current method.
if (window.gtag && {}.toString.call(window.gtag) === '[object Function]') {
window.gtag = function() {
if (arguments[0] !== 'protectToken') return;
function passArgumentsBack() {
dataLayer.push(arguments)
}
passArgumentsBack.apply(this, Array.prototype.slice.call(arguments, 1));
}
}
You have to remember that now you need to append your protectionToken to any call you want to push.
gtag('protectToken', ... )
Protecting our dataLayer from Bot and Vendor Pollution
I bet you may have already thought about just adding a custom event name to all your pushes and then adding a blocking condition in GTM, and that’s indeed a great idea. But this time we’re not trying to block events being pushed; we’re trying to stop our dataLayer from being polluted and messed up. We really want a clean, nice-looking dataLayer that is fully governed by us; if anyone wants to enter the play, they should ask us first :).
Protecting the dataLayer is a bit more complicated ( at least the Google Tag Manager one ), because when GTM loads, it modifies the array’s original push method. If we mess around with it we’ll end up losing the reactivity, or causing some other malfunction; for the same reason, we cannot add the modification on the dataLayer initialization, because it would be lost when GTM overrides the method.
The thing we need to do here is wait until dataLayer.push has been fully initialized and then add some method to intercept the calls being made to it.
In this example I’ll be using a simple proxy pattern, but there are some more proper ( and at the same time more difficult to implement ) workarounds, like working with setters and getters or using an ES6 Proxy. In any case, this method is pretty straightforward and has very good cross-browser support.
I tried to focus on having understandable code rather than cool-looking code. We’ll use a Promise-based solution to query the dataLayer.push method until we detect it has been initialized by Google Tag Manager, and then we’ll add our proxy.
This is our dataLayer variable name; it will default to dataLayer.
pollingTime
The polling period, by default it’s checked every 25ms
limit
We don’t really want to wait forever; limit together with pollingTime will stop the watcher. The total time in seconds that the code will keep waiting for a dataLayer is secs = (limit * pollingTime) / 1000.
protectKey
This is the key we need to add to our pushes; if it’s not present, the push won’t go through.
protectValue
And this is the expected Protect Token value
Settings Parameters Definition
If you check the code carefully, we added a special rule to let all events matching /^gtm\..*/ skip the check, so that the system pushes keep going into the dataLayer.
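Since the behaviour is easier to follow in code, here is a minimal, illustrative sketch of the whole approach using the settings described above (function names like `addDataLayerGuard` and `waitForGtm`, and the settings defaults, are my own choices; the original snippet may differ):

```javascript
// Wraps an already-GTM-initialized dataLayer so that only pushes carrying
// our protect key/value, or gtm.* system events, get through
function addDataLayerGuard(dl, settings) {
  var gtmPush = dl.push; // keep GTM's own push so the container reactivity survives
  dl.push = function (obj) {
    // System events (gtm.js, gtm.load, ...) always go through
    if (obj && typeof obj.event === 'string' && /^gtm\./.test(obj.event)) {
      return gtmPush.apply(dl, arguments);
    }
    // Pushes without the expected token are silently dropped
    if (!obj || obj[settings.protectKey] !== settings.protectValue) return;
    delete obj[settings.protectKey]; // keep the token out of the dataLayer
    return gtmPush.apply(dl, arguments);
  };
}

// Polls until GTM has replaced the native Array push, then resolves
function waitForGtm(settings) {
  return new Promise(function (resolve, reject) {
    var root = typeof window !== 'undefined' ? window : globalThis;
    var attempts = 0;
    var timer = setInterval(function () {
      var dl = root[settings.dataLayerName];
      if (dl && dl.push !== Array.prototype.push) {
        clearInterval(timer);
        resolve(dl);
      } else if (++attempts >= settings.limit) {
        clearInterval(timer);
        reject(new Error('GTM push override not detected'));
      }
    }, settings.pollingTime);
  });
}

var guardSettings = {
  dataLayerName: 'dataLayer',
  pollingTime: 25,         // ms between checks
  limit: 40,               // give up after limit * pollingTime = 1 second
  protectKey: 'protectToken',
  protectValue: 'mySecret' // illustrative value
};
waitForGtm(guardSettings)
  .then(function (dl) { addDataLayerGuard(dl, guardSettings); })
  .catch(function () { /* GTM never showed up, leave the dataLayer untouched */ });
```

Keeping a reference to GTM’s own push and delegating to it is what preserves the container’s reactivity while still letting us filter what gets in.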
So now, if someone does a push like the following, it will be intercepted and will never reach our dataLayer.
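For instance (the event names and the `protectToken`/`mySecret` pair are illustrative):

```javascript
var root = typeof window !== 'undefined' ? window : globalThis;
root.dataLayer = root.dataLayer || [];

// A tokenless vendor/bot push: with the guard in place, this is intercepted
// and never reaches the dataLayer
root.dataLayer.push({ event: 'vendor_spam_event', some_payload: 'whatever' });

// While a push carrying the expected protect key/value goes through
root.dataLayer.push({ event: 'newsletter_signup', protectToken: 'mySecret' });
```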
From this point on, the logic can be extended as much as you want; for example, you may want to define a whitelist of allowed events rather than working with a token. It’s just up to your imagination.
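As an example of such an extension, an allow-list variant could look like this sketch (`makeAllowListGuard` is an invented name; it assumes GTM has already overridden the push method):

```javascript
// Allow-list variant: instead of a token, keep a list of permitted event names
function makeAllowListGuard(dl, allowedEvents) {
  var gtmPush = dl.push; // preserve GTM's push so the container keeps reacting
  dl.push = function (obj) {
    var name = obj && obj.event;
    // gtm.* system events and whitelisted events go through, the rest is dropped
    if (typeof name === 'string' &&
        (/^gtm\./.test(name) || allowedEvents.indexOf(name) !== -1)) {
      return gtmPush.apply(dl, arguments);
    }
  };
}
```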
This proxy pattern is extensible to almost any tool, meaning you could apply the concept to any other vendor or TMS. Please keep in mind that this is not trivial stuff to add, so my advice is to rely on your dev teams, or on an agency or contractor that can take proper care of implementing this kind of solution.
Anyone who knows me is aware of how obsessive I can be when working on something from the optimization perspective.
The point of this post is measuring the current memory usage of the JavaScript in our pages. For this we’ll be using the performance.memory API. I have to start by saying it’s a currently deprecated feature that is only available in Chromium browsers; it returns the JavaScript heap details, and it may lose some accuracy depending on whether some pages share the same heap, or whether the page has iframes using separate heaps.
A new experimental API is meant to replace this one, performance.measureUserAgentSpecificMemory(), which is able to estimate the current web page memory usage including its iframes and related workers. But it needs cross-origin isolation ( window.crossOriginIsolated ), and we’d need to add the COOP/COEP headers to our site. So let’s work with the old API for now.
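For reference, using the new API would look roughly like this sketch (it resolves to null wherever the API or cross-origin isolation is unavailable; `estimatePageMemoryBytes` is my own naming):

```javascript
// Estimate page memory with the experimental API, when available
function estimatePageMemoryBytes() {
  var root = typeof window !== 'undefined' ? window : globalThis;
  if (!root.crossOriginIsolated ||
      !root.performance ||
      typeof root.performance.measureUserAgentSpecificMemory !== 'function') {
    return Promise.resolve(null); // needs COOP/COEP headers and a Chromium browser
  }
  return root.performance.measureUserAgentSpecificMemory()
    .then(function (result) { return result.bytes; });
}

estimatePageMemoryBytes().then(function (bytes) {
  if (bytes !== null) console.log('Estimated memory: ' + bytes + ' bytes');
});
```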
In any case, we’ll be working with the current performance.memory API, which may end up giving us some good tips about our pages’ current memory usage.
Let’s get on it
performance.memory
As we mentioned before, this is a read-only property that returns the current JS heap details of the current page and is only available in Chromium-based browsers. Still, having a sampling of this metric can help us measure how our sites perform.
It will return 3 different metrics.
jsHeapSizeLimit , The maximum memory available for the current context.
totalJSHeapSize , The total allocated heap size.
usedJSHeapSize , The currently active segment of the JS heap.
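For a quick look at these values, a small helper (my own naming) converting the raw byte counts to MB:

```javascript
// Converts a performance.memory-like object to MB figures
function heapSnapshotMB(mem) {
  var toMB = function (bytes) { return +(bytes / 1024 / 1024).toFixed(2); };
  return {
    limit: toMB(mem.jsHeapSizeLimit),
    total: toMB(mem.totalJSHeapSize),
    used: toMB(mem.usedJSHeapSize)
  };
}

// Chromium-only: log the current snapshot when the API is available
if (typeof performance !== 'undefined' && performance.memory) {
  console.log(heapSnapshotMB(performance.memory));
}
```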
In any case, reading the current memory usage at one specific moment doesn’t give much value, so I prepared a script that records the values during the page load and pushes the maximum, minimum, and average memory usage to our dataLayer.
You can personalize the values; by default the script will sample the current memory usage every 1/4 second ( 250ms ) for 10 seconds, or until 2.5 seconds after the page load event, whichever comes first. Just in case, the data will also be pushed if the user navigates away from the page before any of the previous rules happen, using the beforeunload page event.
You can personalize these values in the script’s top section. Please keep in mind that using a 1 millisecond polling time won’t likely give you better insights; just keep the default value.
The idea is grabbing the memory usage during the page rendering, since after the page load it may not change much ( this will depend, of course: if you have some lazy-loaded content, the memory usage will keep changing ). Or you may be using an SPA site that you want to monitor at regular intervals; in that case you may have this running in the background and push the memory usage data for each history.change or virtual pageview.
The code Snippet
Here you can find the code that takes care of monitoring the memory usage over the page load. The sooner you add this code to the page, the better: if you are using Google Tag Manager, use the gtm.init or gtm.js / All Pages events; if you have a TMS with the option of adding synchronous code, that’s your best choice; and lastly, you can add it directly to the page.
Parameter
Description
eventName
The event name to be used on the dataLayer push. string
sampleLimit
Total count of samples to take. integer
pollingPeriod
Polling time in ms. This, along with the sampleLimit, defines the maximum time it will take for the event to fire. integer
waitTimeAfterWindowLoad
Total milliseconds to wait after window load. It doesn’t make much sense to wait 10 seconds if the page load comes in 2 seconds, so we push the data at that point without waiting for the full sample limit. integer
<script>
(function() {
  // David Vallejo (@thyng)
  // Analytics Debugger S.L.U. 2023
  var settings = {
    eventName: 'memory_usage_profiler',
    sampleLimit: 40,
    pollingPeriod: 250, // in ms
    waitTimeAfterWindowLoad: 2500 // in ms
  };
  // This is only available on Chromium-based browsers, just skip if the API is not available
  if (!(window.performance && 'memory' in window.performance)) return;
  try {
    window.dataLayer = window.dataLayer || [];
    // Initialize the data with the current heap usage
    var data = {
      sent: false,
      samplesCount: 0,
      max: performance.memory.usedJSHeapSize,
      min: performance.memory.usedJSHeapSize,
      avg: performance.memory.usedJSHeapSize
    };
    var pushData = function(data) {
      if (!data.sent) {
        window.dataLayer.push({
          'event': settings.eventName || 'memory_usage_profiler',
          'event_data': {
            // Values are reported in MB
            'max_memory_usage': (data.max / 1024 / 1024).toFixed(2),
            'min_memory_usage': (data.min / 1024 / 1024).toFixed(2),
            'avg_memory_usage': (data.avg / 1024 / 1024).toFixed(2)
          }
        });
        data.sent = true;
      }
    };
    var clear = setInterval(function() {
      var used = performance.memory.usedJSHeapSize;
      if (used > data.max) data.max = used;
      if (used < data.min) data.min = used;
      data.avg = (data.avg + used) / 2;
      data.samplesCount++;
      if (data.samplesCount >= settings.sampleLimit) {
        clearInterval(clear);
        pushData(data);
      }
    }, settings.pollingPeriod);
    // Once the page has loaded, wait a bit more and push the data
    window.addEventListener("load", function(event) {
      setTimeout(function() {
        clearInterval(clear);
        pushData(data);
      }, settings.waitTimeAfterWindowLoad);
    });
    // In case the user navigates away from the page before that...
    // (a pagehide/visibilitychange mechanism would be preferable; pending)
    window.addEventListener('beforeunload', function(event) {
      clearInterval(clear);
      pushData(data);
    });
  } catch (e) {}
})()
</script>
Sending the data
At this point we have all the data coming into our dataLayer, which means we can send it wherever we want. As usual I’m using Google Analytics 4, and then making use of some custom metrics for getting the averages.
It’s going to be an easy setup: just create 3 dataLayer-type variables, a trigger to match the event name you defined for this tracking ( default: “memory_usage_profiler” ), and lastly map it all to a GA4 event tag.
Google Analytics 4 Metrics
We’d need to create some custom metrics in our account; note that we should create metrics, not dimensions.
Looker Studio Report Example
I quickly built a Looker Studio report to show how the data will look.
Deciphering User Engagement Measurement in Google Analytics 4
One of the most notorious changes in Google Analytics 4 was the removal of the Bounce Rate metric in favor of Engaged Sessions / Engagement Time ( yes, I know they ended up bringing it back with some slight differences ).
The first one defines whether the current session has been engaged by the user; the second one defines the “amount” of engagement involved in the visit. The combination of these 2 metrics provides a better view of users’ behaviour on our site than a simple “did they bounce” boolean parameter without any further context.
Engaged Session
Google Analytics 4 will consider ( and accordingly mark ) a session as engaged if any of the following conditions are met:
The session lasts longer than 10 seconds (this can be configured)
The session has at least 1 conversion event
The session has 2 or more page_view or screen_view events
Then the engagement rate will report the percentage of sessions that meaningfully interacted with your site, based on the conditions listed above.
On the other side, the “Bounce Rate” will report the sessions where there was no engagement. The session engagement is reported back to GA4 using a shared payload key ( ie: it’s shared across all the events in the current hit ).
Engagement Time Metric
The way the engagement time is tracked may confuse people. Despite including a lot of rules inside, it ends up being pretty simple: Google Analytics will only count the time the user is really “engaged” (huh!) with your site, and to decide this there are 3 page states that need to be enabled.
Page MUST be visible
Page MUST be active
Page MUST have the Focus
If you’re not a developer this may not be easy to understand, so let me try to explain.
The current document (page) has to be visible: if for some reason the current page document is not visible to the users, no time will be counted. This is detected using the visibilitychange event and checking the document.hidden value.
Also, the current page needs to be active; this is measured using the pageshow and pagehide events. If you don’t know what this means, let’s say we have an SPA page and press the Back button: the counter will be stopped on pagehide and will start counting again when the page is shown.
Lastly, the current window needs to have the focus. Let’s say you’re using my Analytics Debugger extension: while you’re interacting with the extension, the focus is on the extension, not on the page. Google Analytics 4 will stop the counter and that time won’t be counted.
In any case, some of these rules may not work all the time because of different browser behaviors and race conditions, so it uses some more signals to enable or disable the counter, for example checking whether document.hasFocus() is true, or starting the counter on each gtag event push as long as the event is not a get call ( "gtag.get" ).
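To make these rules concrete, here is a hedged sketch of how such a counter could work, listening to the same signals described above (this is my own illustration, not GA4’s actual implementation; `makeEngagementCounter` is an invented name):

```javascript
// Pure helper: accumulates time only across "engaged" periods
function makeEngagementCounter() {
  var engagedSince = null; // timestamp when the current engaged period started
  var totalMs = 0;         // time accumulated from finished engaged periods
  return {
    update: function (engaged, nowMs) {
      if (engaged && engagedSince === null) {
        engagedSince = nowMs;            // start counting
      } else if (!engaged && engagedSince !== null) {
        totalMs += nowMs - engagedSince; // pause counting
        engagedSince = null;
      }
    },
    total: function (nowMs) {
      return engagedSince === null ? totalMs : totalMs + (nowMs - engagedSince);
    }
  };
}

// Browser wiring: re-evaluate "engaged" on the signals GA4 listens to
if (typeof document !== 'undefined') {
  var counter = makeEngagementCounter();
  var refresh = function () {
    // engaged = visible AND focused; pageshow/pagehide re-trigger the check
    counter.update(!document.hidden && document.hasFocus(), performance.now());
  };
  ['visibilitychange', 'focus', 'blur', 'pageshow', 'pagehide']
    .forEach(function (ev) { window.addEventListener(ev, refresh); });
  refresh();
}
```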
The user_engagement event
There’s one extra point to keep in mind about how Google Analytics 4 measures the engagement time: each time we click on a self-domain link, or one pointing to any of the domains configured on the linker, the user_engagement event is fired. (There may be more conditions to be met, or some extra conditions that trigger it at some point.)
The main reason for this auto-fired event to exist seems to be measuring the current time on page; that’s why it gets fired just before the user leaves the current page. It will only trigger if there are no other events pending in the queue (I think; I need to test this more deeply): for example, if you have a click event at some point, that event will carry the engagement time, so this one won’t be needed.
You’ll likely need to check the “Preserve log” checkbox to see these hits ( which are also available in BigQuery ), which basically carry the engagement time value (&_et). You may see these hits showing an unknown status on the network tab; nothing to worry about. All hits in GA4 are sent using the sendBeacon API, which basically means they are fired and the browser doesn’t wait for any response from the receiving endpoint.
Are these Metrics Reliable?
I was chatting with @CharlesFarina about this, and regarding the reliability of the engagement time, it came to my mind that there were some studies mentioning issues with sendBeacon usage.
The people at Volument found some 40% discrepancies while using sendBeacon along with the onbeforeunload event: https://volument.com/blog/sendbeacon-is-broken . According to the MDN documentation it should be used along with visibilitychange ( I didn’t check on which event GA4 triggers the data; since no events are fired on my visibility changes, we may guess it’s fired on pagehide or page unload… ). In any case, this study was run 2 years ago, so as of today we should expect the gap not to be as big.
Still, even if this event sometimes ends up not being tracked, it really helps add even more value to the engagement time metric in Google Analytics 4.
Working demonstration Gif:
And that’s basically how it works. I may be missing some small details and rules, but the point of this post is not to replicate the functionality; it’s to provide some details so we know why some events don’t attach the engagement_time ( &_et ) value, or why sometimes, even if we spent more than 10 seconds, the session has not been marked as engaged.
One thing that I miss is having the option to set an event as “non-interactional“, as we used to have in the old analytics; but rather than not spawning a session, this would just skip passing the engagement_time to the current event, which may be useful for people tracking backend or automatic events without any user interaction.
In the last year I didn’t have much time to work on this blog, mostly because I’ve been focusing on my other side projects in my “free” time.
Since I started this blog many years ago, my number 1 rule has been not to write about what others have already written, so I’ve tried to provide new information in each of my posts rather than retreading others’ content, or even my own, each year. This, added to some people being ludicrously fast at covering Google products’ new features, had left me in a very difficult position to provide more stuff.
In any case, Google Analytics 4 is now in da house, which provides a whole new scope of chances to write about new stuff, or even take the time to really update all these 10+ year old blog posts. So I’m back to writing.
This is going to help me improve my, poor, English skills, but it will also let me learn some new stuff, since I’ve been feeling a bit outdated ( ie: BigQuery, Attribution, Modeling ).
All this said, this post also inaugurates the new blog design, which I think is more readable and includes a dark-mode version. Along with this, there is some news about the content on this blog.
It won’t again be solely a Google Analytics / Google Tag Manager blog; I really want to try and learn new tools and vendors, and of course “teach” all this blog’s readers about them.
This will be a ChatGPT-free blog. That may work against the blog’s growth or my writing possibilities, but I only want to provide my own created content, and I want to improve my English skills. So you’re really lucky: you’ll be able to keep enjoying my exquisite English vocabulary.
I’ve never had or asked for collaborations, but I’d love to have some guest posts, and it will be even better if we work on some solution together. Are you interested? Ping me.
I’m taking away my number 1 rule: I’ll write about anything I work with or try, since even if other people have written about the same topic, I may contribute other valid PoVs or ideas that someone may find useful.