Author: admin

  • How to Track Page Loads When Restored from the Back/Forward Cache

    If you’ve never heard of it, the back/forward cache (bfcache) is a browser optimization introduced in 2019 that is available in all major browsers, including Chrome (or Chromium-based browsers, version 96 and above), Firefox, and Safari. It allows for instant navigation when moving back and forth between pages. This feature significantly improves the browsing experience, especially for users on slower connections or with less powerful devices.

    The way it works is that when you navigate to a new page, the browser, instead of immediately destroying the current page, keeps it in memory and pauses its JavaScript execution. This allows the page to load instantly if the user clicks the back button, as the page is restored immediately from memory.

    This is beneficial in almost every imaginable way, but it poses a challenge when it comes to measuring our site. Since JavaScript execution is paused and the page is simply restored when the user navigates back (or forward), our scripts won’t run again, which means our tags won’t trigger or re-evaluate. For example, the page_view won’t fire. That’s not good, so let’s see what happens on a site when the bfcache is doing its thing.

    I think this is really important because it means you could be missing crucial page_view tracking on your site, or certain pixels might not be firing as expected. This could lead to gaps in your data collection, making it difficult to accurately measure user interactions and performance. Without proper tracking in place, you may miss key insights, such as user behavior on different pages or the effectiveness of your marketing campaigns. Ensuring that your ‘page_view’ events are consistently fired across all page types (regular page loads, virtual pageviews, SPA sites, or bfcache-restored pages) will help guarantee that you’re capturing all the necessary data for effective analysis and optimization.

    As shown below, users navigating back and forth through the site won’t trigger any page_view events (or any other events), meaning that I may be undercounting how often my blog content is viewed :(.

    There are several ways to check whether this is happening to us. The easiest is to look at the console, because Chrome (at least) will let us know when the page is being restored from the bfcache.

    How BFCache Works

    Let’s take a look at how the cache works—it’s a bit different from other caching systems we’re typically used to. The bfcache is a full snapshot of the entire page, including the JS Heap and gets fully restored on user navigation.

    You might wonder what happens to the code running on the site, such as promises or setTimeouts. It’s important to know that these will also be paused, along with most other tasks in queues.

    It’s also important to keep in mind that iframes contained within the current page are not eligible for the bfcache.

    Testing BFCache Behaviour

    We can also use Developer Tools to force this behavior by going to the Application tab, then navigating to Background Services > Back/forward Cache.

    I want my pages to be fully tracked

    We need to understand that the Page Lifecycle API internally tracks the current page state, allowing us to know when a page is being frozen (Freeze Event) or resumed (Resume Event), as shown in the Page Lifecycle API State Flow below:

    These events are emitted to the document, so we can listen to them by adding a simple listener, as shown in the code snippets below.

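    document.addEventListener('freeze', (event) => {
      // The page is now frozen: timers and other queued tasks stop running
    });
    
    document.addEventListener('resume', (event) => {
      // The page has been resumed from the frozen state
    });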

    Also, according to the docs, since Chrome 68 we can even tell whether the current page was discarded from memory while it was hidden, by checking the wasDiscarded flag on the current document.

    if (document.wasDiscarded) {
      // The page was discarded while hidden and has just been loaded again
    }

    Tracking my pages when they’re restored from bfcache

    Luckily for us, the pageshow event from the Page Lifecycle API will trigger whenever a page is shown, either when it loads for the first time or when it is restored from the back/forward cache (bfcache).

    Key points about pageshow event

    • It is triggered when a page is shown, which can occur when navigating to a page for the first time or when returning to a cached page.
    • The event includes a persisted property that indicates whether the page was restored from the bfcache (back/forward cache). If persisted is true, the page was restored from the bfcache.


    Based on this, we can easily add a piece of code to monitor if the page is being restored from the bfcache. For example, we can push a page_view event to our dataLayer or trigger a gtag page_view event directly. Of course, you’ll be able to customize this for your specific needs.

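    window.addEventListener('pageshow', (event) => {
      // persisted is true only when the page was restored from the bfcache
      if (event.persisted) {
        window.dataLayer = window.dataLayer || [];
        window.dataLayer.push({
          'event': 'page_view',
          'pv_origin': 'from_bfcache'
        });
      }
    });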

    Pro Tip: page_view event

    This is personal advice: even if you don’t have an SPA site or don’t worry about the bfcache, it’s still a great idea to set up your system to push a ‘page_view’ event every time a page is displayed. Doing so will not only help you establish a common, shared logic for handling regular page loads, virtual pageviews, SPA sites, and bfcache-restored pages, but it will also allow you to avoid race conditions (e.g., by not relying on triggers like ‘All Pages’, DOM Ready, or CMP callbacks). The key is that every time a ‘page_view’ event is fired, it should carry all the necessary data in the dataLayer, giving you a standard and safe way to know when to trigger things.
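
    A minimal sketch of that shared logic could look like the following. The helper names (`buildPageView`, `pushPageView`), the `page_load` origin value and the `page_path` key are assumptions made for illustration; only the `page_view` event and the `from_bfcache` origin come from the example above.

```javascript
// Hypothetical helper: builds the page_view payload in one single place.
function buildPageView(origin, extra) {
  // Merge page-level data so every page_view carries what the tags need.
  return Object.assign({ event: 'page_view', pv_origin: origin }, extra || {});
}

// Hypothetical helper: pushes the payload to the given dataLayer array.
function pushPageView(dataLayer, origin, extra) {
  var payload = buildPageView(origin, extra);
  dataLayer.push(payload);
  return payload;
}

// Browser wiring (skipped when there is no window, e.g. in a test runner).
if (typeof window !== 'undefined') {
  window.dataLayer = window.dataLayer || [];
  window.addEventListener('pageshow', function (event) {
    // pageshow fires on the first load and on bfcache restores alike;
    // persisted === true means the page came back from the bfcache.
    var origin = event.persisted ? 'from_bfcache' : 'page_load';
    pushPageView(window.dataLayer, origin, { page_path: window.location.pathname });
  });
}
```

    With something like this in place, the tags only need a single Custom Event trigger on page_view, whatever the way the page was displayed.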

  • Return Of The Custom Task (Google Analytics 4 Redux)

    I’ve released a new library named “Return Of The CustomTask” which, as the name itself suggests, brings the Universal Analytics Custom Task functionality back to Google Analytics 4.

    It’s an Open Source library under the Apache 2.0 license that uses Fetch Interceptors to mimic the behavior of our old friend, the customTask. The idea is to have the chance to modify the current GA4 payload before it gets sent to the Google Analytics servers.

    The library consists of a single file that accepts a list of callbacks (customTasks) that will be applied to our request.

    These tasks will be applied sequentially, meaning you can easily apply more than one action, such as checking for PII and removing duplicate purchase events.

    At the same time, I took some time to find all the possible custom tasks by searching on Google, and I’ve already replicated them and made them available for everyone. In many cases I’ve even made them better than the originals 🙂

    I must advise that this is a very technical approach to getting things done, so use it at your own risk. If you’re not a developer, consider seeking help rather than just trying to copy and paste. There are many great Analytics Engineers and Programmers out there (including myself) who will be able to help you get things set up in the best and safest way.

    Note: In the coming days, I will be writing specific posts for each of the tasks to ensure that their usage is clear for everyone. In any case, each task folder on GitHub has a README with the basic details to help set things up.

    First Step: Grab the GA4CustomTask code

    After building the library you’ll find all the code within the dist/ folder. The code is provided in both minified and non-minified form. Since you’re unlikely to need to change anything here, I would pick the minified build of dist/GA4CustomTask.js. Now the only thing we need to do is add it to a Custom HTML tag in Google Tag Manager, in another TMS, or directly in your page source.

    Important Note: This code needs to run BEFORE GA4 loads. My advice is to use the Initialization trigger, or to add it as a Setup Tag on the GA4 Config Tag. We shouldn’t need to change anything at this point, so just copy and paste the code. (Since you won’t need to change anything here, just use the minified code: https://raw.githubusercontent.com/analytics-debugger/Return-Of-The-Custom-Task/refs/heads/main/dist/GA4CustomTask.min.js )

    If you are using this library without GTM, or with another TMS, the logic is the same: fire it before the gtag code.

    <script>
    // dist/GA4CustomTask.js
    // Use the link above for the most up-to-date code
    (function (global, factory) {
        typeof exports === 'object' && typeof module !== 'undefined' ? module.exports = factory() :
        typeof define === 'function' && define.amd ? define(factory) :
        (global = typeof globalThis !== 'undefined' ? globalThis : global || self, global.GA4CustomTask = factory());
    })(this, (function () { 'use strict';
    
        // Check if the URL belongs to GA4
        function isGA4Hit(url) {
            try {
                var urlObj = new URL(url);
                var params = new URLSearchParams(urlObj.search);
                var tid = params.get('tid');
                var cid = params.get('cid');
                var v = params.get('v');
                return !!tid && tid.startsWith('G-') && !!cid && v === '2';
            }
            catch (e) {
                console.error('Error parsing URL:', e);
                return false;
            }
        }
    
        var interceptors = [];
        // Interceptor function to handle fetch requests and responses
        function interceptor(fetch, args) {
            var reversedInterceptors = interceptors.reduce(function (array, interceptor) { return [interceptor].concat(array); }, []);
            var promise = Promise.resolve(args);
            // Apply request interceptors (resolve to FetchArgs)
            reversedInterceptors.forEach(function (_a) {
                var request = _a.request, requestError = _a.requestError;
                if (request || requestError) {
                    promise = promise.then(function (args) { return (request ? request.apply(void 0, args) : args); }, requestError);
                }
            });
            // Proceed with the original fetch call (resolve to Response)
            var responsePromise = promise.then(function (args) { return fetch(args[0], args[1]); });
            // Apply response interceptors (resolve to Response)
            reversedInterceptors.forEach(function (_a) {
                var response = _a.response, responseError = _a.responseError;
                if (response || responseError) {
                    responsePromise = responsePromise.then(response, responseError);
                }
            });
            return responsePromise;
        }
        var GA4CustomTask = function (settings) {
            if (!settings)
                return;
            interceptors.push({
                request: function (resource, options) {
                    if (options === void 0) { options = {}; }
                    try {
                        if (typeof resource === 'string' && isGA4Hit(resource)) {
                            var url = new URL(resource);
                            var RequestModel_1 = {
                                endpoint: url.origin + url.pathname,
                                sharedPayload: null,
                                events: [],
                            };
                            var payloadArray = Array.from(new URLSearchParams(url.search).entries());
                            if (!options.body) {
                                RequestModel_1.sharedPayload = Object.fromEntries(payloadArray.slice(0, payloadArray.findIndex(function (_a) {
                                    var key = _a[0];
                                    return key === 'en';
                                })));
                                RequestModel_1.events = [
                                    Object.fromEntries(payloadArray.slice(payloadArray.findIndex(function (_a) {
                                        var key = _a[0];
                                        return key === 'en';
                                    })))
                                ];
                            }
                            else {
                                RequestModel_1.sharedPayload = Object.fromEntries(payloadArray);
                                RequestModel_1.events = options.body
                                    .split('\r\n')
                                    .map(function (e) { return Object.fromEntries(new URLSearchParams(e).entries()); });
                            }
                            var payload = Object.fromEntries(new URLSearchParams(url.search));
                            if (settings.allowedMeasurementIds &&
                                Array.isArray(settings.allowedMeasurementIds) &&
                                !settings.allowedMeasurementIds.includes(payload['tid'])) {
                                return [resource, options];
                            }
                            if (Array.isArray(settings.tasks)) {
                                settings.tasks.forEach(function (callback) {
                                    if (typeof callback === 'function') {
                                        RequestModel_1 = callback.call({ originalFetch: GA4CustomTask.originalFetch }, RequestModel_1);
                                    }
                                    else {
                                        console.warn('Callback is not a function:', callback);
                                    }
                                });
                            }
                            var reBuildResource = function (model) {
                                var resourceString = new URLSearchParams(model.sharedPayload || {}).toString();
                                var bodyString = model.events.map(function (e) { return new URLSearchParams(e).toString(); }).join('\r\n');
                                return {
                                    endpoint: model.endpoint,
                                    resource: resourceString,
                                    body: bodyString,
                                };
                            };
                            var newResource = reBuildResource(RequestModel_1);
                            if (options.body) {
                                resource = "".concat(newResource.endpoint, "?").concat(newResource.resource);
                                options.body = newResource.body;
                            }
                            else {
                                resource = "".concat(newResource.endpoint, "?").concat(newResource.resource, "&").concat(newResource.body);
                            }
                        }
                    }
                    catch (e) {
                        console.error('Error in fetch interceptor:', e);
                    }
                    return [resource, options];
                },
                response: function (response) {
                    return response;
                },
                responseError: function (error) {
                    return Promise.reject(error);
                },
            });
            // Ensure fetch is available in the environment
            window.fetch = (function (fetch) {
                return function (resource, options) {
                    var fetchArgs = [resource, options];
                    return interceptor(fetch, fetchArgs);
                };
            })(window.fetch);
            return {
                clear: function () {
                    interceptors = [];
                },
            };
        };
        // Add original fetch for TypeScript type safety
        GA4CustomTask.originalFetch = window.fetch;
        return GA4CustomTask;
    }));
    </script>

    We are on the right path. Now we have a new GA4CustomTask class that we can instantiate to attach the interceptors to the Fetch API.

    <script>
    {{ CODE FROM GA4CustomTask.min.js }}
    var logRequestsToConsoleTask = () => {...}
    var task1= () => {...}
    var task2= () => {...}
    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
      logRequestsToConsoleTask, 
      task1,
      task2
     ]
    });
    </script>

    We need to focus on the settings object passed to the constructor. This tool operates on the Fetch function, but typically we only want to intercept GA4 hits. Don’t worry, the tool already detects these hits internally in order to intercept just the requests we need. However, what happens if we’re using two Measurement IDs on our site? With the allowedMeasurementIds setting, we can specify which Measurement IDs the Custom Task should apply to.

    Then we can define the tasks that will be applied to our payload. With GA4CustomTask it’s possible to chain several tasks, and they will be applied sequentially. (Each customTask receives the requestModel and returns it after it has finished working with it.)
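
    To make that contract concrete, here is a rough sketch of what a task of your own could look like. The task name `addStaticParamTask` and the `site_section` parameter are made up for this example and are not part of the library; the model shape (endpoint, sharedPayload, events) follows the library code shown above.

```javascript
// Hypothetical task: tags every event in the hit with an extra event parameter.
function addStaticParamTask(requestModel, name, value) {
  requestModel.events.forEach(function (event) {
    // String event parameters travel as "ep.<name>" in the GA4 payload.
    event['ep.' + name] = String(value);
  });
  // Always hand the model back so the next task in the chain receives it.
  return requestModel;
}

// Sample model with the same shape the interceptor builds for each hit.
var sampleModel = {
  endpoint: 'https://www.google-analytics.com/g/collect',
  sharedPayload: { v: '2', tid: 'G-DEBUGEMALL', cid: '123.456' },
  events: [{ en: 'page_view' }, { en: 'scroll' }]
};

var taggedModel = addStaticParamTask(sampleModel, 'site_section', 'blog');
console.log(taggedModel.events);
```

    Such a task would be registered like any other one, for example as (requestModel) => addStaticParamTask(requestModel, 'site_section', 'blog') inside the tasks array.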

    Custom Tasks List

    I went ahead and migrated all the customTasks I found on the internet to this new library. You can find the list of them and the source code in the repository, as packages in the /tasks folder.


    #    Task Name                           Description
    #1   logRequestsToConsoleTask            Logs all requests to the console, for debugging purposes
    #2   mapClientIdTask                     Grabs the clientId (&cid) and attaches the value to the specified parameter
    #3   mapPayloadSizeTask                  Attaches the current payload size to the specified parameter
    #4   preventDuplicateTransactionsTask    Prevents duplicate purchases/transactions by keeping a list of transactions in cookies/localStorage
    #5   snowPlowStreamingTask               Sends a copy of the payload to your SnowPlow Collector
    #6   sendToSecondaryMeasurementId        Sends a copy of the payload to a secondary account
    #7   piiScrubberTask                     Loops through all data in the payload, redacting the PII data
    #8   privacySweepTask                    Cleans up all non-“Analytics” related parameters/ids

    logRequestsToConsoleTask

    This task prints the current requestModel to the console, which is useful for debugging purposes. It doesn’t take any parameters.

    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
      logRequestsToConsoleTask
     ]
    });

    mapClientIdTask

    This task reads the clientId value and passes it back to all the events on the request, or only to the first event if the defined scope is ‘user’.

    It accepts 2 parameters: the name to be used for the event parameter / user property, and the scope. If the scope is not specified, it will be set to ‘event’.

    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
      (requestModel) => mapClientIdTask(requestModel, 'client_id', 'event'), 
     ]
    });

    mapPayloadSizeTask

    This task will calculate the total payload size of the current hit and map it as an event parameter (number).

    It takes the name of that event parameter as its only argument.

    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
      (requestModel) => mapPayloadSize(requestModel, 'payload_size'), 
     ]
    });

    preventDuplicateTransactionsTask

    This task will intercept all hits containing at least 1 purchase event on the payload. If the current ep.transaction_id parameter value was already used on the current browser, that specific event will be removed from the request.

    This task relies on cookies and localStorage for keeping the transaction history, and internally keeps both storage systems in sync. This means that if the user removes their cookies but not the localStorage, the data will be replicated back to the cookie (and the same happens the other way around if they remove the localStorage).

    It takes the cookie name as an optional value, and falls back to __ad_trans_dedup when none is given.

    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
         preventDuplicateTransactions
     ]
    });

    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
      (requestModel) => preventDuplicateTransactions(requestModel, '__transaction_cookie'), 
     ]
    });
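
    The filtering idea behind this task can be sketched as follows. This is not the library’s implementation: `dedupPurchases` is a hypothetical name, and a plain in-memory Set stands in for the cookie/localStorage history that the real task keeps.

```javascript
// Simplified sketch: drops purchase events whose transaction id was already seen.
// `seenIds` (a Set) stands in for the cookie/localStorage history.
function dedupPurchases(requestModel, seenIds) {
  requestModel.events = requestModel.events.filter(function (event) {
    if (event.en !== 'purchase') return true;   // only purchases are checked
    var id = event['ep.transaction_id'];
    if (!id) return true;                       // nothing to compare against
    if (seenIds.has(id)) return false;          // duplicate: remove this event
    seenIds.add(id);                            // first time: remember the id
    return true;
  });
  return requestModel;
}

var seenTransactions = new Set(['T-1000']);
var dedupedModel = dedupPurchases({
  endpoint: 'https://www.google-analytics.com/g/collect',
  sharedPayload: { v: '2', tid: 'G-DEBUGEMALL' },
  events: [
    { en: 'purchase', 'ep.transaction_id': 'T-1000' },
    { en: 'purchase', 'ep.transaction_id': 'T-1001' },
    { en: 'page_view' }
  ]
}, seenTransactions);
console.log(dedupedModel.events.map(function (e) { return e.en; }));
```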

    snowPlowStreamingTask

    This task takes the GA4 payload and sends a copy to the defined Snowplow collector endpoint. Since Snowplow expects one event per request, this task generates an individual request for each event in the payload (keeping the shared parameters intact).

    You can pass the endpoint hostname as a parameter.

    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
       (requestModel) => snowPlowStreaming(requestModel,  endpointHostname),
     ]
    });

    sendToSecondaryMeasurementId

    What to say about this one? It’s a classic. It will replicate our request to one or more secondary Measurement IDs, but this time it takes 2 extra parameters: a list of whitelisted events and a list of blacklisted ones (the blacklist only takes effect if the whitelist is not passed or is empty).

    // This will relay ALL the events 
    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
      (requestModel) => sendToSecondaryMeasurementIdTask(requestModel, ["G-SECONDID","G-ANOTHER"], [], []), 
     ]
    });
    // This will relay only the add_to_cart and purchase events
    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
      (requestModel) => sendToSecondaryMeasurementIdTask(requestModel, ["G-SECONDID","G-ANOTHER"], ["add_to_cart","purchase"], []), 
     ]
    });

    // This will relay all events but purchase events
    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
      (requestModel) => sendToSecondaryMeasurementIdTask(requestModel, ["G-SECONDID","G-ANOTHER"], [], ["purchase"]), 
     ]
    });
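
    The whitelist/blacklist precedence described above can be expressed as a tiny decision function. `shouldRelay` is a hypothetical name used for illustration and is not taken from the library source.

```javascript
// Decides whether an event should be copied to the secondary Measurement IDs.
function shouldRelay(eventName, whitelist, blacklist) {
  // A non-empty whitelist wins: only the listed events are relayed.
  if (Array.isArray(whitelist) && whitelist.length > 0) {
    return whitelist.indexOf(eventName) !== -1;
  }
  // Otherwise the blacklist (if any) removes the listed events.
  if (Array.isArray(blacklist) && blacklist.length > 0) {
    return blacklist.indexOf(eventName) === -1;
  }
  // No lists at all: relay everything.
  return true;
}

console.log(shouldRelay('purchase', [], []));
console.log(shouldRelay('page_view', ['add_to_cart', 'purchase'], []));
console.log(shouldRelay('purchase', [], ['purchase']));
```

    The three calls mirror the three configurations shown above: relay everything, relay only add_to_cart and purchase, and relay everything except purchase.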

    privacySweepTask

    This task strips out all the parameters that are not related to Analytics, in case we are worried about our privacy and the data going to Google. It is useful if we are tracking an intranet or some sensitive environment and we want to have some extra privacy added.

    You can find the current list of parameters in the repository.

    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
      privacySweepTask 
     ]
    });

    EventBouncerTask

    Lastly (for now), we have our Bounce Task. We can define a list of events that we want to allow through our implementation, preventing all those pesky vendors and script kiddies from pushing events to the gtag() function to mess with our data.

    But not only that, we can define which parameters we want to allow (WhiteListedEventParameters), which will strip out any parameter that is not listed from the current event.

    The function takes a Schema definition object to work

    var GA4CustomTaskInstance = new GA4CustomTask({
     allowedMeasurementIds: ["G-DEBUGEMALL"],
     tasks: [
        (requestModel) => eventBouncerTask(requestModel, {    
        "sharedEventParameters": ["page_type"],
        "events": {
            "page_view": {
                "wlep": []
            },
            "add_to_cart": {
                "wlep": []
            }
        } 
    }),    
     ]
    });
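
    As a rough illustration of the schema idea, the sketch below filters a request model against a schema object with the same shape as the one above. It is a simplified stand-in written for this post, not the library’s own code: the name `bounceEvents` is hypothetical, and it assumes event parameters arrive as ep.<name> or epn.<name> keys.

```javascript
// Simplified sketch: keeps only the events and parameters listed in the schema.
function bounceEvents(requestModel, schema) {
  var shared = schema.sharedEventParameters || [];
  requestModel.events = requestModel.events
    // Drop every event whose name is not declared in the schema.
    .filter(function (event) {
      return Object.prototype.hasOwnProperty.call(schema.events, event.en);
    })
    // Strip parameters that are neither shared nor allow-listed for the event.
    .map(function (event) {
      var allowed = shared.concat(schema.events[event.en].wlep || []);
      var cleaned = {};
      Object.keys(event).forEach(function (key) {
        var match = key.match(/^epn?\.(.+)$/);
        // Keys that are not event parameters (such as "en") are always kept.
        if (!match || allowed.indexOf(match[1]) !== -1) cleaned[key] = event[key];
      });
      return cleaned;
    });
  return requestModel;
}

var bouncedModel = bounceEvents({
  endpoint: 'https://www.google-analytics.com/g/collect',
  sharedPayload: { v: '2', tid: 'G-DEBUGEMALL' },
  events: [
    { en: 'page_view', 'ep.page_type': 'post', 'ep.injected': 'x' },
    { en: 'fake_event', 'ep.page_type': 'post' }
  ]
}, {
  sharedEventParameters: ['page_type'],
  events: { page_view: { wlep: [] }, add_to_cart: { wlep: [] } }
});
console.log(bouncedModel.events);
```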

    In the coming days I’ll be writing a specific post for each task, with more details about how to use each of them :)

    Enjoy.

  • Tracking user’s IP Autonomous System Number and Organization details to prevent the spam

    Around the end of 2019, Google Analytics dropped support for the Network Domain and Service Provider dimensions in its reports, making an official announcement about it in February.

    These 2 dimensions were widely used to fight spam in Google Analytics, and there have been a lot of posts about this topic in the last months. Simo Ahava wrote about how to collect the ISP data using a third-party service, in case you want to check it.

    In this post we’ll learn what an Autonomous System is and how we could use this info to try to fight spam. The coolest part is that we’ll be able to use a free database for this. Continue reading 🙂

    There are some other services and commercial databases that provide these details, but let’s be honest, they have some big handicaps:

    • If you use a free service, you will hit the quota limit quickly
    • If you have a high-traffic website, this is not going to be cheap

    There are basically 3 different types of subscriptions: SaaS (they host the app and the database), DB (you host the database and the query system), and WebService.

    I’m attaching a list of some of the available providers, in case you want to check them.

    Provider                   SaaS / DB / WebService    Updates
    MaxMind                    ✅ ✅ ✅                   Weekly/Monthly
    IP2Location                ✅ ✅ ✅                   Monthly
    IPStack                    ✅                        Hourly
    ip-api                     ✅                        ?
    ipgeolocation.io           ✅ ✅                      Weekly
    db-ip                      ✅                        Monthly/Daily
    Abstract IP Geolocation                              Daily

    In any case there are a lot of posts around this topic on the web, and I’m trying to give this issue a new solution.

    MaxMind provides their GeoLite databases for free. These databases are updated weekly (on Tuesdays, to be exact) and they provide info about:

    • Countries
    • Cities
    • ASN

    The main difference between these databases and the paid ones is how accurate they are and how often they get updated. This accuracy may be a problem when we need to target users based on their city, but that is not what we’re after this time: we’ll be looking at their ASN database.

    If you are wondering, ASN stands for Autonomous System Number. According to Wikipedia:

    An autonomous system (AS) is a collection of connected Internet Protocol (IP) routing prefixes under the control of one or more network operators on behalf of a single administrative entity or domain that presents a common, clearly defined routing policy to the internet.[1]

    https://en.wikipedia.org/wiki/Autonomous_system_(Internet)

    An AS works like a “big” router at an ISP or datacenter that is in charge of announcing the IP addresses it holds, in order to let other ASes know how to reach those IP addresses. (Sorry for this inaccurate description, I’m trying to keep this simple.)

    Each ISP usually has its own ASN (and it can have more than one). For example, one of Google’s main ASNs is AS15169, registered to Google LLC, and this Autonomous System manages 9.5 million Google IPs:

    https://ipinfo.io/AS15169

    This means that we could query any IP address and the ASN database will return the ASN it currently belongs to.

    For example we may query Google DNS’s IP address: 8.8.8.8 and the database will return the AS number and the organization name:

    Array
    (
        [autonomous_system_number] => 15169
        [autonomous_system_organization] => GOOGLE
    )

    As another example, let’s query for this Fastly CDN IP address: 151.101.134.133

    Array 
    (
       [autonomous_system_number] => 54113 
       [autonomous_system_organization] => FASTLY
    )

    Or let’s query for an IP at a dedicated server provider like LiquidWeb:

    Array
    (
        [autonomous_system_number] => 32244
        [autonomous_system_organization] => LIQUIDWEB
    )

    We could use the AS number and the organization name as a way to try to catch spam, since most spam traffic is likely to come from co-location / VPN providers that we could identify this way.
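
    As a small sketch of that idea, the check below flags a lookup result whose organization name matches a list we maintain ourselves. The function name and the contents of the list are assumptions for illustration; the record keys are the ones returned in the lookups above.

```javascript
// Hypothetical list of organization names we decided to treat as hosting/VPN traffic.
var HOSTING_ORGS = ['LIQUIDWEB'];

// Returns true when the ASN organization matches one of the listed names.
function isLikelyHostingTraffic(asnDetails, hostingOrgs) {
  var org = String(asnDetails.autonomous_system_organization || '').toUpperCase();
  return hostingOrgs.some(function (name) {
    return org.indexOf(name.toUpperCase()) !== -1;
  });
}

console.log(isLikelyHostingTraffic(
  { autonomous_system_number: 32244, autonomous_system_organization: 'LIQUIDWEB' },
  HOSTING_ORGS
));
```

    In practice it is probably safer to send the ASN and organization as dimensions and do the filtering in the reports, rather than dropping hits based on a hard-coded list.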

    Since it’s a database, we’ll need to set up a small endpoint on our domain in order to be able to query it. This implies some IT development, but on the other hand it has some big wins:

    • There will be NO query limits.
    • The cost of having this solution running is just the cost of developing the endpoint.
    • We could have our website developer query this info server-side and push the data to the dataLayer, instead of needing an extra XHR request and having to delay the hits. YAY!

    Now, on the other side of the road, there are some handicaps:

    • The data is not as accurate as the network domain data in other databases
    • Data freshness won’t be premium, but as we all know GA’s wasn’t either.

    Getting the ASN DB

    As I’ve mentioned above, the GeoLite ASN database is free and you’ll be able to get it after signing up for a free account at: https://dev.maxmind.com/geoip/geoip2/geolite2/

    PHP Example

    Another good point is that MaxMind already provides libraries for PHP/NodeJS/Perl and other languages to help with reading and querying their GeoLite databases, which helps when setting up our endpoint.

    As usual I’m providing an example in PHP, since it’s the most widely used language and the one that is available on almost any hosting around the world.

    If we don’t have composer installed yet, that’s gonna be our first step:

    curl -sS https://getcomposer.org/installer | php

    Next, we’ll install the needed dependencies:

    php composer.phar require geoip2/geoip2:~2.0

    <?php
    require_once 'vendor/autoload.php';
    // MaxMind\Db\Reader (installed as a dependency of geoip2/geoip2) exposes get(),
    // which returns the raw record as an array
    use MaxMind\Db\Reader;
    $ip_as_details = new Reader('geo/GeoLite2-ASN.mmdb');
    $asn_details = $ip_as_details->get('8.8.8.8');
    // At this point we could build a JSON and send it back to the browser.
    print_r($asn_details);

    The last step will be passing this info back to Google Analytics using a custom dimension, so we can use it in our filters or segments.
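
    If the data cannot be printed server-side, a client-side sketch of that last step could look like this. The endpoint path /asn.php, the asn_details event name and the dataLayer keys are assumptions made for this example; the endpoint is expected to return the record shown earlier as JSON.

```javascript
// Maps the endpoint response to a dataLayer push (key names are illustrative).
function asnToDataLayer(details) {
  return {
    event: 'asn_details',
    asn_number: String(details.autonomous_system_number),
    asn_organization: details.autonomous_system_organization
  };
}

// Browser wiring (skipped when there is no window, e.g. in a test runner).
if (typeof window !== 'undefined') {
  window.dataLayer = window.dataLayer || [];
  fetch('/asn.php') // hypothetical endpoint hosted on our own domain
    .then(function (response) { return response.json(); })
    .then(function (details) { window.dataLayer.push(asnToDataLayer(details)); })
    .catch(function () { /* lookup failed: skip the dimensions rather than block tracking */ });
}
```

    The two values can then be read with Data Layer Variables and mapped to custom dimensions on the hits fired after the asn_details event.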

    Extra – Grabbing the network domain

    I was about to publish the post when I decided to add a little extra: let’s also learn how to track the “network domain”.

    Google Analytics was using the IP’s PTR for the “network domain”. Again, you may wonder what “PTR” is: it stands for “Pointer record”, and it basically resolves an IP to an FQDN (fully-qualified domain name). That is, it’s the inverse of an A DNS record.

    For example, we can make a reverse IP lookup for Google’s DNS IP address and it will return “dns.google”.

    root@sd1:/# nslookup
    > set q=ptr
    > 8.8.8.8
    Server:         8.8.8.8
    Address:        8.8.8.8#53
    Non-authoritative answer:
    8.8.8.8.in-addr.arpa    name = dns.google.

    Or we may try with a Googlebot IP address, which most SEOs must be familiar with:

    > set q=ptr
    > 66.249.66.1
    Server:  dns.google
    Address:  8.8.8.8
    Non-authoritative answer:
    1.66.249.66.in-addr.arpa        name = crawl-66-249-66-1.googlebot.com

    As a last example, let’s query the google.com IP address:

    > set q=a
    > google.com
    Server:  dns.google
    Address:  8.8.8.8
    Non-authoritative answer:
    Name:    google.com
    Address:  172.217.17.14
    > set q=ptr
    > 172.217.17.14
    Server:  dns.google
    Address:  8.8.8.8
    Non-authoritative answer:
    14.17.217.172.in-addr.arpa      name = mad07s09-in-f14.1e100.net

    If we want to have the network domain info back in our GA reports, we’ll just need to parse the PTR hostname to grab only the root domain. In this last case it would be: 1e100.net .
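
    A deliberately naive sketch of that parsing step is shown below. It simply keeps the last two labels of the hostname, which works for the examples in this post but is wrong for multi-level suffixes such as co.uk; the function name is made up for this example.

```javascript
// Naive sketch: keeps the last two labels of a PTR hostname.
// It does NOT handle multi-level suffixes (e.g. "example.co.uk").
function naiveRootDomain(ptrHostname) {
  var labels = ptrHostname
    .replace(/\.$/, '')   // PTR answers may end with a trailing dot
    .toLowerCase()
    .split('.');
  return labels.slice(-2).join('.');
}

console.log(naiveRootDomain('mad07s09-in-f14.1e100.net'));
console.log(naiveRootDomain('crawl-66-249-66-1.googlebot.com'));
console.log(naiveRootDomain('dns.google.'));
```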

    I wouldn’t advise tracking the full PTR hostname, for 2 reasons: first, most hostnames are a mix of the IP address plus the ISP domain, which would go against the GDPR (we cannot record the user’s IP address); second, it will create high cardinality, which won’t help when analyzing the data.

    Now, remember that we were building an endpoint in PHP to get the ASN details? Just a few more lines of code would let us have the network domain pushed into our dataLayer! 🙂

    $ip_ptr = gethostbyaddr('8.8.8.8');

    Getting the root domain can be a painful task, due to all the new TLDs and the need to keep third-level suffixes in mind. In case you want to have this done easily, you can use the following PHP library, https://github.com/utopia-php/domains , which will let you grab the “registerable” domain name within a hostname:

    require_once '../vendor/autoload.php';
    use Utopia\Domains\Domain;
    // demo.example.co.uk
    $domain = new Domain('demo.example.co.uk');
    $domain->get(); // demo.example.co.uk
    $domain->getTLD(); // uk
    $domain->getSuffix(); // co.uk
    $domain->getRegisterable(); // example.co.uk
    $domain->getName(); // example
    $domain->getSub(); // demo
    $domain->isKnown(); // true
    $domain->isICANN(); // true
    $domain->isPrivate(); // false
    $domain->isTest(); // false

    I’m providing the example in PHP, but that doesn’t mean you have to use it; this code/idea can be developed in almost any server-side language you may be using. As a last resort, you can run a small VM or VPS to have a PHP environment where you can host your endpoint :).
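
    To show that the idea carries over to other languages, here is a rough Node.js sketch of the same two steps: a reverse lookup, then trimming the PTR hostname down to its registrable domain. The names registrableDomain and networkDomainForIp are my own, and the suffix list is a deliberately tiny, hand-picked sample. A real endpoint should rely on a full public suffix list, as the PHP library above does.

```javascript
var dns = require('dns').promises;

// Hypothetical, incomplete sample of multi-label suffixes, for illustration only.
var MULTI_LABEL_SUFFIXES = ['co.uk', 'com.au', 'co.jp', 'com.br'];

// Reduce a hostname to its registrable domain (naive sketch).
function registrableDomain(hostname) {
  var labels = String(hostname).toLowerCase().replace(/\.$/, '').split('.');
  if (labels.length < 2) {
    return null;
  }
  var lastTwo = labels.slice(-2).join('.');
  // Keep three labels when the last two form a known multi-label suffix.
  var take = MULTI_LABEL_SUFFIXES.indexOf(lastTwo) !== -1 ? 3 : 2;
  if (labels.length < take) {
    return null;
  }
  return labels.slice(-take).join('.');
}

// Resolve the PTR record for an IP and return only the network domain.
function networkDomainForIp(ip) {
  return dns.reverse(ip).then(function (hostnames) {
    return hostnames.length ? registrableDomain(hostnames[0]) : null;
  }).catch(function () {
    return null; // no PTR record, or the lookup failed
  });
}

console.log(registrableDomain('mad07s09-in-f14.1e100.net'));       // 1e100.net
console.log(registrableDomain('crawl-66-249-66-1.googlebot.com')); // googlebot.com
console.log(registrableDomain('demo.example.co.uk'));              // example.co.uk
```

    Only the registrable domain is returned, so the full PTR hostname (and the IP address it usually embeds) never reaches the dataLayer.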

  • sameSite Automated Fix and status reporting tool

    It has been a hard week, with all these vendors announcing the arrival of the Four Horsemen of the Cookie Apocalypse.

    There are a lot of changes coming when we talk about cookies (1st and 3rd party), ITP, GDPR, CCPA, etc. I understand it may be a terrible headache for anyone, but we need to keep calm.

    The latest update has come from Google Chrome, which was expected to start blocking cookies that don’t contain the sameSite attribute on 4th February. Luckily they have postponed it by around 2 weeks (for now).

    One of the main concerns about this latest Chrome update (version 80) is that it’s not up to us to fix things (or is it?). If a cookie is being set via JS, the owner of that JS is the one in charge of setting the cookie properly. So we may expect some old libraries not to be updated in time, or some vendors not to bother handling this properly.

    In order to deal with the sameSite switch for cookies, I’m releasing today a JS snippet that hooks into the document.cookie execution flow and takes care of two main tasks:

    • Reporting all the cookies that are not setting the sameSite value properly (remember they need to have both the sameSite and the Secure switches)
    • If we decide to turn it on, automatically adding the missing sameSite and Secure parameters!
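
    The reporting task boils down to inspecting the string that gets assigned to document.cookie. A minimal sketch of that check, written as a standalone function, could look like this. The name inspectCookieString is hypothetical, and the sketch assumes the usual "name=value; attr; attr=value" layout of a cookie string.

```javascript
// Report which of the two attributes a cookie string carries.
// inspectCookieString is an illustrative name, not part of the script in this post.
function inspectCookieString(cookieString) {
  var parts = String(cookieString).split(';').map(function (part) {
    return part.trim();
  });
  // The first part is always the name=value pair.
  var name = parts[0].split('=')[0];
  // Everything after it is an attribute; attribute names are case-insensitive.
  var attributes = parts.slice(1).map(function (part) {
    return part.split('=')[0].trim().toLowerCase();
  });
  return {
    cookieName: name,
    cookieSameSite: attributes.indexOf('samesite') !== -1,
    cookieSecure: attributes.indexOf('secure') !== -1
  };
}

console.log(inspectCookieString('_ga=GA1.2.3; path=/; SameSite=None; Secure'));
console.log(inspectCookieString('secure_token=abc; path=/'));
```

    Looking only at the attribute names, and skipping the first name=value pair, avoids false positives for cookies whose own name or value happens to contain "secure" or "samesite".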

    The script I’m sharing reports, on the window load event, a list of the cookies that didn’t set either the sameSite or the Secure switch. For each one it reports the cookie name, whether the cookie string has been autofixed, and the original and fixed cookie strings.

    {
    'event': 'samesite-cookies-report',
    'cookiesList': [{
                                'cookieName': {{cookie.name}},
                                'cookieSameSite': {{cookie.sameSite}},
                                'cookieSecure': {{cookie.secure}},
                                'autofixed': {{autoFix}},
                                'originalCookieString': {{cookie.string}},
                                'fixedCookieString': {{cookie.fixedString}}
    }]
    }

    Then, based on the samesite-cookies-report event in Google Tag Manager, you could push these details as an event to Google Analytics or report them anywhere else.
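
    Since cookiesList is an array of objects, it usually needs flattening before it can travel as event fields. The following is only a sketch of one way to do it, for example inside a GTM Custom JavaScript variable. The function name and the returned field names (eventCategory, eventAction, eventLabel) are my own choice, not something the script or GTM imposes.

```javascript
// Flatten the cookiesList array into simple string fields for an event.
// summarizeCookiesReport and its output field names are hypothetical.
function summarizeCookiesReport(cookiesList) {
  var names = (cookiesList || []).map(function (cookie) {
    return cookie.cookieName;
  });
  // Keep each cookie name once, preserving first-seen order.
  var uniqueNames = names.filter(function (name, index) {
    return names.indexOf(name) === index;
  });
  return {
    eventCategory: 'samesite-cookies-report',
    eventAction: uniqueNames.length + ' cookies',
    eventLabel: uniqueNames.join(',')
  };
}

var summary = summarizeCookiesReport([
  { cookieName: '_ga', cookieSameSite: false, cookieSecure: false },
  { cookieName: '_gid', cookieSameSite: false, cookieSecure: true },
  { cookieName: '_ga', cookieSameSite: false, cookieSecure: false }
]);
console.log(summary.eventAction); // 2 cookies
console.log(summary.eventLabel);  // _ga,_gid
```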

    The main point of this script is being able to monitor any cookie being set anywhere on our website, so we can contact the relevant stakeholder to have it fixed as soon as possible.

    (function() {
      try {
        // Set this to true if you want to automatically add the sameSite attribute        
        var autoFix = false;
        var cookiesReportList = [];
    
        // Detecting if the current browser is a Chrome >=80
        var browserDetails = navigator.userAgent.match(/(MSIE|(?!Gecko.+)Firefox|(?!AppleWebKit.+Chrome.+)Safari|(?!AppleWebKit.+)Chrome|AppleWebKit(?!.+Chrome|.+Safari)|Gecko(?!.+Firefox))(?: |\/)([\d\.apre]+)/);
        var browserName = browserDetails[1];
        var browserVersion = browserDetails[2];
        var browserMajorVersion = parseInt(browserDetails[2].split('.')[0]);
    
        // We only want to hook the cookie behavior if it's Chrome +80 
        if (browserName === 'Chrome' && browserMajorVersion >= 80) {
          var cookie_setter = document.__lookupSetter__('cookie');
          var cookie_getter = document.__lookupGetter__('cookie');
    
          Object.defineProperty(document, "cookie", {
            get: function() {
              return cookie_getter.apply(this, arguments);
            },
            set: function(val) {
              var cookie = {
                name: '',
                sameSite: false,
                secure: false,
                parts: val.split(';'),
                string: val
              }
              cookie.parts.forEach(function(e, i) {
                var key = e.trim();
                cookie.parts[i] = e.trim();
                if (i === 0) {
                  cookie.name = key.split('=')[0];
                }
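                // Attribute names are case-insensitive and only appear after the name=value pair
                if (i > 0 && key.match(/^samesite\s*=/i)) {
                  cookie.sameSite = true;
                }
                if (i > 0 && key.match(/^secure\s*(=|$)/i)) {
                  cookie.secure = true;
                }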
              });
              if (cookie.sameSite === false || cookie.secure === false) {
                if (autoFix === true && document.location.protocol==="https:") {
                  // Strip a trailing ";" (and only that) before appending the missing attributes
                  if (arguments[0][arguments[0].length - 1] === ';') {
                    arguments[0] = arguments[0].substring(0, arguments[0].length - 1);
                  }
                  if (cookie.sameSite === false) {
                    arguments[0] = arguments[0] + '; sameSite=None';
                  }
                  if (cookie.secure === false) {
                    arguments[0] = arguments[0] + '; secure';
                  }
                }
                cookiesReportList.push({
                  'cookieName': cookie.name,
                  'cookieSameSite': cookie.sameSite,
                  'cookieSecure': cookie.secure,
                  'autofixed': autoFix === true && document.location.protocol === "https:",
                  'originalCookieString': cookie.string,
                  'fixedCookieString': arguments[0]
                });
              }
              return cookie_setter.apply(this, arguments);
            }
          });
        }
        window.addEventListener('load', function(event) {
          dataLayer.push({
            'event': 'samesite-cookies-report',
            'cookiesList': cookiesReportList
          });
        });
      } catch (err) {}
    })();
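
    The script relies on __lookupGetter__ and __lookupSetter__, which are legacy methods. As a hedged alternative, the same hook can be sketched with Object.getOwnPropertyDescriptor. The helper name hookAccessor and the FakeDocument stand-in are mine, added only so the sketch runs outside a browser. In a browser the call would presumably be hookAccessor(document, Document.prototype, 'cookie', ...), assuming the cookie accessor lives on Document.prototype, which is worth verifying in your target browsers.

```javascript
// Wrap an inherited accessor property so every write passes through onSet first.
// hookAccessor is a hypothetical helper, not part of the original script.
function hookAccessor(target, proto, prop, onSet) {
  var descriptor = Object.getOwnPropertyDescriptor(proto, prop);
  if (!descriptor || !descriptor.get || !descriptor.set) {
    throw new Error(prop + ' is not an accessor property on the given prototype');
  }
  Object.defineProperty(target, prop, {
    configurable: true,
    get: function () {
      return descriptor.get.call(this);
    },
    set: function (value) {
      // onSet may report the value, rewrite it, or both.
      descriptor.set.call(this, onSet(value));
    }
  });
}

// Stand-in for the browser's Document, for demonstration only.
function FakeDocument() {
  this.jar = [];
}
Object.defineProperty(FakeDocument.prototype, 'cookie', {
  configurable: true,
  get: function () { return this.jar.join('; '); },
  set: function (value) { this.jar.push(value.split(';')[0]); }
});

var seen = [];
var fakeDocument = new FakeDocument();
hookAccessor(fakeDocument, FakeDocument.prototype, 'cookie', function (value) {
  seen.push(value); // report the raw cookie string
  return value;     // pass it through unchanged
});

fakeDocument.cookie = 'foo=bar; path=/';
console.log(seen.length);         // 1
console.log(fakeDocument.cookie); // foo=bar
```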

    At the top of the code you will see the following line:

    var autoFix = false;

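    If we change this to true, the script will automatically take care of adding the missing parts 🙂 Note that the fix is only applied on pages served over https, since the code checks the protocol before touching the cookie string.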
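
    To make the autofix step easier to reason about (and to test on its own), here is a sketch of it as a pure function that takes the original cookie string and returns the fixed one. The name addMissingAttributes is hypothetical. It mirrors what the script appends ("SameSite=None" and "Secure"), but it is not the script's exact code.

```javascript
// Return the cookie string with the missing SameSite / Secure attributes appended.
// addMissingAttributes is an illustrative name; existing attributes are left untouched.
function addMissingAttributes(cookieString) {
  var parts = String(cookieString).split(';').map(function (part) {
    return part.trim();
  }).filter(function (part) {
    return part.length > 0; // drops the empty piece left by a trailing ";"
  });
  var attributes = parts.slice(1).map(function (part) {
    return part.split('=')[0].trim().toLowerCase();
  });
  if (attributes.indexOf('samesite') === -1) {
    parts.push('SameSite=None');
  }
  if (attributes.indexOf('secure') === -1) {
    parts.push('Secure');
  }
  return parts.join('; ');
}

console.log(addMissingAttributes('foo=bar;'));                      // foo=bar; SameSite=None; Secure
console.log(addMissingAttributes('foo=bar; path=/; SameSite=Lax')); // foo=bar; path=/; SameSite=Lax; Secure
```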

    One thing to keep in mind is that this code needs to run as early as possible in the page execution flow. So if we’re deploying it via GTM, we’ll need to set this tag to fire on the first event on the page (most of the time that will be “All Pages”) and give it some extra priority.

    WARNING: If you only plan to use this script as a reporting tool, you can stay safe. If you plan to use the autofixing feature, please keep in mind that I have only tested it on some sites of mine, so it’s your responsibility to set it up and test it properly on your site before going live.

    If you’re really interested in knowing more about how cookies are updating their behaviour to protect users’ privacy, the best place is https://www.cookiestatus.com/ , a site where my friend Simo is collecting info about almost all the mainstream browsers out there.