Blog

  • Tracking the anchor text for the incoming links in Google Tag Manager


    Introduction

    It’s been a long time since I last took care of this blog’s “Analytics” (in the blacksmith’s house, a wooden knife). I noticed it would be cool to know which Anchor Text the sites referring to mine are using to link to me.

    So I’m sharing the solution I built today to capture the Anchor Text used on the referring URL and send it back to Google Tag Manager. From there we’ll be able to send an event to APP+WEB or anywhere else we want 🙂



    How it works


    Execution Flow Chart

    The flow chart on the right side shows how the execution flows. We’ll have 2 main pieces:

    – One GTM CUSTOM HTML Tag
    – One PHP File

    The first one is responsible for the main logic and makes an HTTP request (using fetch) to the second one, which reads the current visitor’s referrer page and scrapes it to try to find the Anchor Text the user clicked.

    We’re using extensive logic to avoid false positives/duplicate hits, for example when a user goes back on a mobile phone or swipes. We don’t want to count these “page reloads” as landings, even though they may still hold valid referrer info.
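    The landing check boils down to two questions: was this a fresh navigation, and does the referrer come from another site? Here is a minimal sketch of that decision as a pure function. It assumes the navigation type comes either from the legacy performance.navigation.type (0 navigate, 1 reload, 2 back/forward) or from the newer PerformanceNavigationTiming.type string; the name isRealLanding is hypothetical.

```javascript
// Hypothetical helper: is this page view a "real landing" worth scraping the referrer for?
//   navType    - performance.navigation.type (0 navigate, 1 reload, 2 back/forward)
//                or PerformanceNavigationTiming.type ('navigate', 'reload', 'back_forward')
//   referrer   - document.referrer
//   currentUrl - document.location.href
function isRealLanding(navType, referrer, currentUrl) {
  // Reloads and back/forward navigations (e.g. swipes) are not landings
  if (navType !== 0 && navType !== 'navigate') return false;
  var ref, cur;
  try {
    ref = new URL(referrer);
    cur = new URL(currentUrl);
  } catch (e) {
    // Empty or malformed referrer: nothing to scrape
    return false;
  }
  // Only web referrers can be scraped
  if (ref.protocol !== 'http:' && ref.protocol !== 'https:') return false;
  var strip = function (host) { return host.replace(/^www\./, ''); };
  // Mirrors the GTM tag's indexOf check: our hostname must not appear in the referrer's
  return strip(ref.hostname).indexOf(strip(cur.hostname)) === -1;
}

console.log(isRealLanding(0, 'https://example.org/post', 'https://www.thyngster.com/')); // true
console.log(isRealLanding('reload', 'https://example.org/post', 'https://www.thyngster.com/')); // false
```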

    SERVER SIDE CODE

    PHP Snippet Code

    First, we need to upload the following PHP snippet to any server supporting PHP 7.x.

    This code could be improved a lot, for example by adding a timeout in case the page is not reachable. If someone asks, I may add more sanity checks to the script.

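    <?php
    // David Vallejo (@thyngster)
    // 2020-04-14
    // Needs PHP7.X
    
    if(!isset($_GET["url"])){
            die("missing url parameter");
    }
    
    $links = [];
    $url = $_GET["url"];
    // Only allow http(s) URLs, so the script can't be abused to read local files
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if(isset($_SERVER["HTTP_REFERER"]) && in_array($scheme, ["http", "https"], true)){
            $referrer_link_html_content = @file_get_contents($url);
            // Our own domain: the page that called this script
            $current_domain = str_replace("www.","", (string) parse_url($_SERVER["HTTP_REFERER"], PHP_URL_HOST));
            if($referrer_link_html_content && $current_domain !== ""){
                    $doc = new DOMDocument();
                    // Real-world HTML is rarely valid, don't output libxml warnings
                    libxml_use_internal_errors(true);
                    $doc->loadHTML($referrer_link_html_content);
                    libxml_clear_errors();
    
                    $rows = $doc->getElementsByTagName('a');
                    foreach ($rows as $row)
                    {
                            // Keep the links whose href contains our domain (case-insensitive, no regex escaping issues)
                            if($row instanceof DOMElement && stripos($row->getAttribute('href'), $current_domain) !== false){
                                    $links[] = [
                                            "url" => $row->getAttribute('href'),
                                            "anchor_text" => trim($row->textContent)
                                    ];
                            }
                    }
            }
    }
    header('Content-type: application/json; charset=UTF-8');
    header("Access-Control-Allow-Origin: *");
    echo json_encode($links, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
    exit;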

    Python Snippet code

    I know this code isn’t the best one, since I’m not a Python coder, but it gives an overall idea of how to do the same thing in Python.

    It should be used like this:

    python anchor.py REFERRER_LINK LINKTOSEARCH

    #!/usr/bin/env python3
    # use: python anchor.py REFERRER LINKTOSEARCH
    import json
    import sys
    from urllib.parse import urlparse
    
    import requests
    from bs4 import BeautifulSoup
    
    links = []
    
    if len(sys.argv) > 1:
        url = sys.argv[1]  # the referring page we want to scrape
    else:
        print("REFERRER argument is missing")
        sys.exit(1)
    
    if len(sys.argv) > 2:
        link_to_search = sys.argv[2]  # a URL on our own site
    else:
        print("LINKTOSEARCH argument is missing")
        sys.exit(1)
    
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    
    # Our domain without "www.", matched against every <a href="..."> on the referring page
    domain = urlparse(link_to_search).netloc.replace("www.", "")
    for ahref in soup.select('a[href*="' + domain + '"]'):
        links.append({
            "url": ahref.attrs["href"],
            "anchor_text": ahref.text
        })
    
    print(json.dumps(links, sort_keys=True, indent=4, separators=(',', ': ')))

    GTM Custom HTML Code

    NOTE: Remember that the following code needs to be added to GTM wrapped between <script></script> tags!

    Also remember to update the endPointUrl value so it points to the URL where we’ve uploaded the PHP script.

      (function(){
        try{
          var endPointUrl = 'https://domain.com/getLinkInfo.php';
          // We don't want this to run on page reloads or back/forward navigations. Just on real landings
          if (window.performance && window.performance.navigation && window.performance.navigation.type === 0) {
              var referrer = document.referrer;
              var current_url = document.location.href;
    
              var grab_hostname_from_url = function(url) {
                  var a = document.createElement("a");
                  a.href = url;
                  return a.hostname.replace('www.', '');
              };
              // Only continue if the current referrer is set to a valid URL
              if (referrer.match(/^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$/)) {
                  // Only continue if the referrer domain is not our own domain
                  if (grab_hostname_from_url(referrer).indexOf(grab_hostname_from_url(current_url)) === -1) {
                      // The referrer is URL-encoded so its own query string doesn't break ours
                      fetch(endPointUrl + '?url=' + encodeURIComponent(referrer)).then(function(response) {
                          return response.json();
                      }).then(function(json) {
                          json.forEach(function(link) {
                              // Only report the links on the referring page pointing to the current URL
                              if (current_url.indexOf(link.url) > -1) {
                                  window.dataLayer.push({
                                      event: 'incoming-link',
                                      linked_url: link.url,
                                      landing_url: document.location.href,
                                      referring_url: referrer,
                                      // The PHP endpoint returns this value as "anchor_text"
                                      anchor_text: link.anchor_text
                                  });
                              }
                          });
                      }).catch(function(){});
                  }
              }
          }
        }catch(e){}
      })();

    Now we’re only one step away from having this working: we need to set up a firing trigger for our tag. Ideally this should be the All Pages trigger, so it fires as soon as possible.

    Reported Data Info

    dataLayer Key    dataLayer Value
    event            incoming-link
    linked_url       Current Link in the Referral Page
    landing_url      Current URL
    referring_url    Full Referrer Info
    anchor_text      The Anchor Text on the referrer page linking to your site

    Caveats

    Please note that this solution relies on document.referrer, so don’t expect it to work for all referrals: some sites strip the full referrer info (as Google SERPs do), and some browsers may trim the referrer down to the origin for privacy reasons.

    It may also happen that the referring URL links to us in more than one place. In this case the scraping endpoint will return all the matching links and anchor texts, and from that point on it’s up to you how to report them in Google Analytics or any other tool 😀
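    If you’d rather send a single event when the referring page links to you several times, one option is to keep only the links pointing to the landing URL and join their distinct anchor texts. This is just a sketch: the helper name summarizeIncomingLinks, the " | " separator and the extra links_count key are arbitrary choices, not part of the tag above.

```javascript
// Hypothetical helper: collapses the endpoint's JSON response
// ([{url, anchor_text}, ...]) into one dataLayer-ready object, or null.
function summarizeIncomingLinks(links, landingUrl, referrer) {
  // Same matching rule as the GTM tag: the landing URL must contain the link URL
  var matching = links.filter(function (link) {
    return landingUrl.indexOf(link.url) > -1;
  });
  if (matching.length === 0) return null;
  var anchors = [];
  matching.forEach(function (link) {
    var text = (link.anchor_text || '').trim();
    // Keep each distinct, non-empty anchor text only once
    if (text && anchors.indexOf(text) === -1) anchors.push(text);
  });
  return {
    event: 'incoming-link',
    linked_url: matching[0].url,
    landing_url: landingUrl,
    referring_url: referrer,
    anchor_text: anchors.join(' | '),
    links_count: matching.length // hypothetical extra key
  };
}

console.log(summarizeIncomingLinks(
  [{ url: 'https://www.thyngster.com/', anchor_text: 'thyngster' }],
  'https://www.thyngster.com/',
  'https://example.org/'
));
```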

    In any case, this should work for most common referral traffic.

    Working Demo Video

  • APP + WEB: Google Analytics Measurement Protocol version 2


    The Google Analytics Measurement Protocol allows users and developers to make HTTP requests directly to the Google Analytics endpoint to measure how users interact from any environment/platform.

    Since Google announced the new APP+WEB Properties back in the summer, we’ve noticed that the &v parameter, which used to hold a fixed value of 1, now holds a value of 2 in our hit requests. This implicitly means that at some point a new version of the Measurement Protocol is going to be released.

    I tried to reverse-engineer all the details I could about the parameters used in this new upcoming protocol.

    Please have in mind that the , and I’m publishing all the info I was able to gather.

    Introduction

    The new Measurement Protocol comes with some great improvements over version 1, which we’re used to seeing in our Universal Analytics hits.

    I’d think about this new protocol as an enhanced version of the previous one. They even share some parameters.

    What’s new in the version 2 protocol

    This new measurement protocol seems to have been designed with some performance optimizations in mind.

    The first thing to keep in mind is that APP+WEB no longer has “hit types”: everything we send to APP+WEB is an “event” that may (or may not) be accompanied by parameters.

    There are 2 groups of parameters in the APP+WEB Measurement Protocol. Let’s think of them as the event parameters’ “scope”:

    • Event Related Parameters ( ep.* , epn.* )
    • User Related Parameters ( up.* , upn.* )

    Also, the parameters accept 2 different value types: string (ep.*, up.*) and number (epn.*, upn.*).
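    Going by the ep/epn and up/upn prefixes listed in the parameter tables further down (the “n” variants carry numbers), here’s a minimal sketch that picks the prefix from the JavaScript type of each value. toPrefixedParams is a hypothetical helper, not part of any Google library, and the type rule is my reading of the reverse-engineered names.

```javascript
// Hypothetical helper: turns plain objects into v2-style parameter keys.
// Finite numbers get the "n" variant (epn./upn.), anything else is sent as a string (ep./up.).
function toPrefixedParams(eventParams, userParams) {
  var out = {};
  function add(scope, params) {
    Object.keys(params || {}).forEach(function (key) {
      var value = params[key];
      var isNumber = typeof value === 'number' && isFinite(value);
      out[scope + (isNumber ? 'n.' : '.') + key] = String(value);
    });
  }
  add('ep', eventParams);
  add('up', userParams);
  return out;
}

console.log(toPrefixedParams({ percent_scrolled: 90, no_bounce_time: '5sec' }, { plan: 'free' }));
```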

    Batched Events

    By default, the APP+WEB protocol now allows sending batched events, meaning that a single hit request can carry multiple events. I know this is not new at all: if we ever had to debug an APP implementation, we’d have noticed that the version 1 protocol already allowed us to send batched hits (via the /batch endpoint).

    In any case, v2 comes with some extra enhancements compared with the legacy version:

    • Events within a single hit request share parameters, so the hit payload will be smaller. For example, it wouldn’t make much sense to send &dl (document.location) for every event if that value is shared across all the events in the current hit.
    • POST is now the only accepted method. This bypasses the old GET request URL length limit.
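    The shared-parameters idea can be illustrated with a small sketch: any parameter holding the same value in every event of the batch moves to the common part of the request, and only the differences stay per event. This is my own illustration of the concept, assuming plain key/value events; it is not taken from gtag.js, and hoistSharedParams is a hypothetical name.

```javascript
// Hypothetical helper: moves parameters that have the same value in every
// event of a batch into a shared object; the event name always stays per event.
function hoistSharedParams(events) {
  var has = function (obj, key) { return Object.prototype.hasOwnProperty.call(obj, key); };
  var shared = {};
  if (events.length > 1) {
    Object.keys(events[0]).forEach(function (key) {
      if (key === 'en') return;
      var value = events[0][key];
      var inAll = events.every(function (e) { return has(e, key) && e[key] === value; });
      if (inAll) shared[key] = value;
    });
  }
  var rest = events.map(function (e) {
    var copy = {};
    Object.keys(e).forEach(function (key) {
      if (!has(shared, key)) copy[key] = e[key];
    });
    return copy;
  });
  return { shared: shared, events: rest };
}

var batch = hoistSharedParams([
  { en: 'page_view', dl: 'https://appweb.thyngster.com/', _et: '1' },
  { en: 'scroll', dl: 'https://appweb.thyngster.com/', _et: '5000' }
]);
console.log(batch.shared);
```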

    Debugging

    Debugging the new Measurement Protocol v2 has become even easier, since the new properties offer a DebugView report.

    For our hits to show up there, we need to add a _dbg=1 parameter to them:

    &_dbg=1

    Our hits will then show up in the DebugView report in real time, making debugging much easier than it currently is.

    Turning on debugging in the web-based library

    If you’re working on a website-based implementation, you can turn on the “official” debug logs just by loading the GTAG container with the &dbg={{randomNumber}} parameter:

    https://www.googletagmanager.com/gtag/js?id=G-XXXXXXX&l=dataLayer&cx=c&dbg=918

    This turns on the debug output in our browser, giving us a detailed log of what’s happening.


    Building a request

    APP+WEB hits need to go to a new endpoint located at the following URL:

    https://www.google-analytics.com/g/collect

    As we mentioned in our technical overview of the new APP+WEB Properties, hits are now built in 2 separate parts:

    • The URL QueryString holds the common parameters.
    • The Request Payload (POST) holds the event-related data.

    The Request Payload is only used when there is more than 1 event in the current hit request. If the hit contains only one event, its parameters are attached to the QueryString along with the rest of the common shared parameters.

    The following code shows how a hit should be built and how to send it to the APP+WEB endpoint using the navigator.sendBeacon function.

    // APP+WEB Endpoint
    var endPoint = 'https://www.google-analytics.com/g/collect';
    
    // Base Event Model for Web Hit
    var eventModel = {
        v: 2,
        tid: 'G-XXXXXXXX-0',
        _p: Math.round(2147483647 * Math.random()),
        sr: screen.width + 'x' + screen.height,
        _dbg: 1,
        ul: (navigator.language || "").toLowerCase(),
        cid: '1908161148.1586721292',
        dl: 'https://appweb.thyngster.com/',
        dr: '',
        dt: 'APP + WEB Measurement Protocol version2 DEMO',
        sid: new Date() * 1,
        _s: 1
    }
    
    // A queue to batch our events
    var events = [];
    
    var requestQueryString;
    var requestBody;
    
    // Let's push some events 
    events.push({
        'en': 'pageview'
    });
    // Second Event
    events.push({
        'en': 'scroll',
        '_et': '5000',
        'epn.percent_scrolled': '90'
    });
    // A third event
    events.push({
        'en': 'useless_no_bounce_event',
        '_et': '5000',
        'ep.no_bounce_time': '5sec'
    });
    
    // Is there any event in our queue?
    if (events.length > 0) {
        // If there's only one event, we won't send a body: its parameters go in the query string
        if (events.length === 1) {
            Object.assign(eventModel, events[0]);
        } else {
            requestBody = events.map(function(e) {
            return (Object.keys(e).map(key=>key + '=' + encodeURIComponent(e[key])).join('&'));
            }).join("\n");
        }
        requestQueryString = Object.keys(eventModel).map(key=>key + '=' + encodeURIComponent(eventModel[key])).join('&');
        navigator.sendBeacon(endPoint + '?' + requestQueryString, requestBody);
    }
    APP + WEB Measurement Protocol v2 hit example (just 1 event)
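    When debugging, it also helps to go the other way around and turn a captured request back into readable events. A minimal sketch, assuming one event per line in the body, key=value pairs joined by &, and URL-encoded values; parseHit and parseKeyValues are hypothetical helpers.

```javascript
// Hypothetical debugging helper: "k=v&k2=v2" -> { k: 'v', k2: 'v2' }
function parseKeyValues(str) {
  var out = {};
  str.split('&').forEach(function (pair) {
    if (!pair) return;
    var i = pair.indexOf('=');
    var key = i === -1 ? pair : pair.slice(0, i);
    var value = i === -1 ? '' : pair.slice(i + 1);
    out[decodeURIComponent(key)] = decodeURIComponent(value);
  });
  return out;
}

// Turns a v2 hit (request URL + optional POST body) back into plain objects
function parseHit(url, body) {
  var q = url.indexOf('?');
  var common = parseKeyValues(q === -1 ? '' : url.slice(q + 1));
  // No body means a single event whose parameters live in the query string
  var events = body ? body.split('\n').filter(Boolean).map(parseKeyValues) : [];
  return { common: common, events: events };
}

console.log(parseHit('https://www.google-analytics.com/g/collect?v=2&en=pageview', undefined));
```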

    Parameters Reference

    Request Parameters

    These parameters are available across all hits. They are related to the current hit.

    Parameter   Value Type   Value
    v           int          Protocol Version
    tid         string       Stream ID (G-XXXXXXXXX)
    cid         string       Client ID Value
    sid         string       Session ID (current session start timestamp)
    sr          string       Screen Resolution
    _dbg        bool         Debug Switch
    ul          string       User Language
    _fid
    _uci        bool
    _p
    gtm         string       Container Hash
    _s          integer      Session Hits Count

    Shared Parameters

    Parameter   Value Type     Value
    dl          string (url)   Document Location
    dr          string (url)   Document Referrer
    dt          string         Document Title
    sid         string         Session ID
    sct         integer        Session Count
    seg         boolean        Session Engagement
    _fv         bool           First Visit
    _nsi        bool           New Session Id
    _ss         bool           Session Start
    cu          string         Currency Code
    _c

    Event Parameters

    Parameter   Value Type   Value
    en          string       Event Name
    _et         integer      Event Time
    up.*        string       User Parameter String
    upn.*       number       User Parameter Number
    ep.*        string       Event Parameter String
    epn.*       number       Event Parameter Number

    Ecommerce

    NOTE: I want to add that this was live in the latest gtag version one week ago, and it seems it has since been removed. In any case, I wouldn’t expect it to change in the final release.

    We’re splitting the Ecommerce-related parameters into 3 categories. We need to keep in mind that APP+WEB has 2 main model groups for Enhanced Ecommerce: the Products Model and the Promotions Model.

    The Products Model is used in every single ecommerce event sent to Google Analytics, which includes product listings, product clicks, product detail views, adds to cart, removes from cart, checkouts, purchases and refunds.

    The Promotions Model is the second one, used for promotion tracking in Enhanced Ecommerce. Since promotions are not directly related to a product, this is a totally separate model in APP+WEB.

    • Product Items ( Shared Product Related data )
    • Product List Details ( Product Lists Related data , this goes along with Product Items )
    • Promotions

    Product Items

    Product Items are sent under their own incremental key: &pr1, &pr2, …, &prN. Each of these parameters holds all the product model info.

    Example:

    &pr1=idP12345~nmAndroid Warhol T-Shirt~lnSearch Results~brGoogle~caApparel/T-Shirts~vaBlack~lp1~qt2~pr2.0

    As you can see, we can split the data within this parameter by the tilde character (~) to get a proper Product Model:

    id: P12345
    nm: Android Warhol T-Shirt
    ln: Search Results
    br: Google
    ca: Apparel/T-Shirts
    va: Black
    lp: 1
    qt: 2
    pr: 2.0

    Parameter (within pr[0-9])   Value Type   Value
    id                           string       Product ID/SKU
    nm                           string       Product Name
    br                           string       Product Brand
    ca                           string       Product Category Hierarchy Level 1
    ca2                          string       Product Category Hierarchy Level 2
    ca3                          string       Product Category Hierarchy Level 3
    ca4                          string       Product Category Hierarchy Level 4
    ca5                          string       Product Category Hierarchy Level 5
    va                           string       Product Variant
    pr                           number       Product Unit Price
    qt                           integer      Product Quantity
    cp                           string       Product Coupon
    ds                           number       Product Discount
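    Here’s a minimal sketch of that splitting, assuming the &prN value has already been URL-decoded and every field is its code from the tables (two characters, or three for ca2 to ca5) followed directly by the value. Note that it becomes ambiguous if a top-level category value itself starts with a digit from 2 to 5; parseProductItem is a hypothetical helper.

```javascript
// Hypothetical helper: decodes one &prN value into an object.
function parseProductItem(value) {
  var item = {};
  value.split('~').forEach(function (field) {
    if (!field) return;
    // ca2..ca5 have a three-character key, every other key has two characters
    var keyLength = /^ca[2-5]/.test(field) ? 3 : 2;
    item[field.slice(0, keyLength)] = field.slice(keyLength);
  });
  return item;
}

console.log(parseProductItem('idP12345~nmAndroid Warhol T-Shirt~lnSearch Results~brGoogle~caApparel/T-Shirts~vaBlack~lp1~qt2~pr2.0'));
```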

    Product Impressions

    These are the Measurement Protocol parameters related to product impressions. They are complementary to the product items. Expect them in the product impression and product click events.

    Parameter   Value Type   Value
    ln          string       List Name
    li          string       List ID
    lp          string       List Position

    Transaction Related Data

    The next table shows the parameters related to the transaction info.

    Parameter           Value Type   Value
    ep.transaction_id   string       Transaction ID
    ep.affiliation      string       Transaction Affiliation
    epn.value           number       Transaction Revenue
    epn.tax             number       Transaction Tax
    epn.shipping        number       Transaction Shipping
    ep.coupon           string       Transaction Coupon
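    Putting the transaction and product tables together, a purchase could be described with a single event object like the one this sketch builds. Treat it as an illustration only: the event name 'purchase' and the buildPurchaseEvent helper are my assumptions (the tables above don’t name the event), and any value containing a ~ would break the product encoding.

```javascript
// Hypothetical helper: builds the parameters for one purchase event using the
// ep./epn. transaction keys and the ~-separated &prN product keys listed above.
function buildPurchaseEvent(transaction, items) {
  var event = {
    en: 'purchase', // assumed event name
    'ep.transaction_id': transaction.id,
    'epn.value': String(transaction.revenue)
  };
  if (transaction.affiliation) event['ep.affiliation'] = transaction.affiliation;
  if (transaction.tax !== undefined) event['epn.tax'] = String(transaction.tax);
  if (transaction.shipping !== undefined) event['epn.shipping'] = String(transaction.shipping);
  if (transaction.coupon) event['ep.coupon'] = transaction.coupon;
  items.forEach(function (item, i) {
    // Item keys are the product field codes (id, nm, br, ca, va, pr, qt, ...)
    event['pr' + (i + 1)] = Object.keys(item).map(function (key) {
      return key + item[key];
    }).join('~');
  });
  return event;
}

console.log(buildPurchaseEvent({ id: 'T-1001', revenue: 12.5 }, [{ id: 'P12345', qt: 2 }]));
```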

    Promotions

    And finally, the next table shows the parameters related to promotion tracking. We should expect these parameters to show up in the promotion view and promotion click events.

    Parameter   Value Type   Value
    pi          string       Promotion ID
    pn          string       Promotion Name
    cn          string       Creative Name
    cs          string       Creative Slot (Position)
    lo          string       Location ID

  • Tracking the Protocol version in Google Analytics via Google Tag Manager

    Whether you’re an SEO or not, I’m sure you’re aware of how important WPO (Web Performance Optimization) is, and this of course includes how fast your site loads. The faster it loads, the better for your users (and better for conversion rates, they say…).

    At this point you may have heard about HTTP/2 (2015), the replacement for the old HTTP/1.1 (1997), and you may even have heard about HTTP/3 (last draft Feb 2020), an even more modern Hypertext Transfer Protocol which runs over the QUIC transport protocol, and therefore over UDP instead of TCP.

    OK, I know this may be too many unneeded technical details, but I have some clients with different websites/servers, and they need to track how their sites perform.

    Sooo, this time we’re going to learn how to track the protocol version used to load the current page and push it to Google Analytics as a Custom Dimension.

    We’ll need to create the following Custom JavaScript Variable in Google Tag Manager; we’ll use it later in our Google Analytics Tags.

    // getProtocolVersion()
    function(){
        // Search the Performance API for the "navigation" entry and grab the info if available
        var navEntries = (window.performance && window.performance.getEntriesByType) ? window.performance.getEntriesByType("navigation") : [];
        if(navEntries.length > 0 && navEntries[0].nextHopProtocol){
            return navEntries[0].nextHopProtocol;
        // chrome.loadTimes() is deprecated in Chrome, but worth checking in case the Performance API fails :)
        }else if(window.chrome && typeof window.chrome.loadTimes === "function" && window.chrome.loadTimes().connectionInfo){
            return window.chrome.loadTimes().connectionInfo;
        }else{
            // If nothing is available, let's record the scheme (http/https)
            return document.location.protocol ? document.location.protocol.match(/[^:]*/)[0] : "(not set)";
        }
    }

    This piece of code mainly relies on the browser’s window.performance API. If it’s not available for any reason (old browsers), we check the deprecated chrome.loadTimes() API in Chrome browsers, and if that fails too, we record the page’s URL scheme (http or https), or (not set) if even that is missing.

    What we do is check for the “navigation” type entry in the Performance API, since we just need the protocol details of the main HTML request (the request that returns our HTML source).
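    The nextHopProtocol values are ALPN protocol identifiers such as http/1.1, h2 or h3, and chrome.loadTimes().connectionInfo usually returns similar strings. If you prefer friendlier labels in your reports, a small lookup like this sketch could be applied to the variable’s value; the label texts and the protocolLabel name are my own choice.

```javascript
// Hypothetical helper: maps ALPN-style protocol ids to readable labels.
// Unknown values (e.g. the "http"/"https" scheme fallback) pass through unchanged.
function protocolLabel(value) {
  var v = String(value || '').toLowerCase();
  if (v === '') return '(not set)';
  if (v === 'http/0.9' || v === 'http/1.0' || v === 'http/1.1') return v.toUpperCase();
  if (v === 'h2' || v === 'h2c') return 'HTTP/2';
  // h3 plus draft versions such as h3-29
  if (/^h3(-|$)/.test(v)) return 'HTTP/3';
  return v;
}

console.log(protocolLabel('h2')); // HTTP/2
```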

    After that, we should be able to see the info in Preview mode; check the following screenshot:

    Now we just need to create a new custom dimension (hit scope) and map its index to this newly created variable, or pass it as a parameter to the page_view event if you’re already using APP+WEB Properties.

  • Tracking Android In-App visits in Google Analytics

    This is going to be a quick post about how to track in-app visits from Android devices.

    When an Android app opens a website in a WebView (an in-app visit), the visit usually comes with a special referrer starting with the “android-app” string. Here you can see a log line showing how the referrer comes up:

    77.XXX.XXX.XXX - - [20/Mar/2020:11:20:10 +0000] "GET /in-app-test HTTP/1.0" 200 1580 "android-app://org.telegram.messenger" "Mozilla/5.0 (Linux; Android 10; GM1913) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Mobile Safari/537.36"

    Since this is a non-standard referrer format (it doesn’t start with http), we can easily detect these visits with these rules:

    • It doesn’t start with “http”
    • It isn’t an empty value

    Refer to the following trigger for Google Tag Manager:

    This will only fire on the landing page (subsequent pageviews will have a referrer starting with http), so we’ll get just one event per session. We could, for example, fire a non-interaction event to Google Analytics.


    Then the events will start showing up in the Real-Time reports.


    We could also calculate this within a Custom JavaScript Variable in Google Tag Manager and use it to force the visit attribution if needed, since Google Analytics ignores non-standard referrers and reports those visits as Direct Traffic.

    function(){
      if (!document.referrer.match(/^http.*/) && !document.referrer.match(/^$/)){
          return document.referrer;
      }  
    }
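    If you’d rather report just the app’s package name (org.telegram.messenger in the log line above) instead of the full referrer, a variation along these lines could work. It’s a sketch assuming the referrer always follows the android-app://package form; androidAppFromReferrer is a hypothetical name, written as a standalone function so it can be tested outside GTM (inside a Custom JavaScript Variable you’d pass it document.referrer).

```javascript
// Hypothetical helper: extracts the Android package name from an
// android-app:// referrer, or returns undefined for any other referrer.
function androidAppFromReferrer(referrer) {
  var match = /^android-app:\/\/([^/?#]+)/.exec(referrer || '');
  return match ? match[1] : undefined;
}

console.log(androidAppFromReferrer('android-app://org.telegram.messenger')); // org.telegram.messenger
```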

    Before someone asks: AFAIK it’s not possible to detect in-app visits from iOS, and this won’t help with apps that open links in a standalone browser (like WhatsApp does).

    Also, I’m not 100% sure that all Android versions/apps report things the same way, but it seems to be common on the latest Android versions and major apps.



  • #Discussion :: GDPR Compliance – Google Analytics Setup Proposal

    NOTE: I want to start this post with a big disclaimer: I’m not publishing it to tell anyone how they should set up their Google Analytics tracking to comply with the GDPR / CCPA.


    The goal of this post is to start an open discussion about the reliability of the method exposed here, and any final decision should be taken by site owners under their own responsibility.

    One of the biggest issues my clients (and I) are hitting when implementing a hard “cookie-consent wall” is that they would likely lose all the attribution info for at least everyone who bounces. This can be a disaster if you use Google Analytics to report on how your marketing investments are working (not to mention losing the pageview and session info for all that traffic).

    Let me show you my proposal for setting up Google Analytics for users who haven’t yet selected an option for their cookie preferences:

    So, what are we doing here?

    • If the current user hasn’t selected their preference yet, we’ll send a pageview hit to Google Analytics.
    • This is not a standard hit/tracker initialization. It’s a stateless tracker with cookie writing disabled, IP anonymization enabled, and Ads features forcibly set to false.
    if (!userConsent) {
      ga('create', 'UA-123123123-123', 'auto', {
        'storage': 'none',
        'storeGac': false,
        'anonymizeIp': true,
        'allowAdFeatures': false
      });
      // We'll save the current clientId into a variable:
      // if the user gives their consent later on, we'll use it
      // to write the cookie
      ga('set', 'customTask', function(tracker) {
        window._gacid = tracker.get('clientId');
      });
      ga('send', 'pageview');
    }

    At this point, when the user lands we’ll send a pageview to track the session start, but no cookie will be used (if the user reloads, a new clientId will be generated). If at some point the user accepts the cookies, we’ll write the user’s randomly generated clientId into the cookie and will be able to properly track the user journey.
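    The “write the clientId into the cookie” step isn’t shown in the snippet above, so here’s a minimal sketch of what it could look like. It assumes the consent-granted handler runs after the stateless pageview has stored window._gacid, and it creates a regular cookie-based tracker reusing that id through the analytics.js clientId create field. The function name, the 'consented' tracker name and passing ga in as a parameter (so it can be tested with a stub) are illustrative choices, not part of the original proposal.

```javascript
// Hypothetical consent handler: once the user accepts cookies, create a regular
// (cookie-based) tracker that reuses the clientId saved by the stateless hit.
function onConsentGranted(ga, trackingId, savedClientId) {
  // A new tracker name avoids clashing with the default tracker created earlier
  var fields = { name: 'consented' };
  // Only reuse the id if the stateless tracker managed to store one
  if (savedClientId) fields.clientId = savedClientId;
  ga('create', trackingId, 'auto', fields);
  return fields;
}

// In the page: onConsentGranted(window.ga, 'UA-123123123-123', window._gacid);
```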

    All the tracking happens (IMO) in a first-party context, and we’re respecting the user’s privacy while they make a decision. It’s just an extra “anonymized” session-start hit that lets us keep visibility of where our traffic is coming from.

    Of course, this should only be used while our “consent cookie” is not present: once the user has made a choice (including choosing not to be tracked), we should obey what that cookie states.

    I really feel this respects the GDPR, since there won’t be any cookies unless the user explicitly allows them, and we’re taking extra steps to protect the user’s privacy in every other way we can when sending the hit.

    In any case, I’m not a lawyer nor an expert on user privacy, so I’d love to have feedback from other people on this.

    DISCLAIMER: This post is NOT meant to show a law-approved way to use Google Analytics. Please get proper advice from a user-privacy expert or from your lawyer before implementing your tracking the way it is shown in this post.

  • sameSite Automated Fix and status reporting tool

    It has been a hard week, with all these vendors announcing the arrival of the Four Horsemen of the Cookie Apocalypse.

    There are a lot of changes coming when we talk about cookies (1st and 3rd party), ITP, GDPR, CCPA, etc. I understand it may be a terrible headache for anyone, but we need to keep calm.

    The latest update came from Google Chrome, which was expected to start treating cookies without the sameSite attribute as sameSite=Lax (so they are no longer sent on cross-site requests) on 4th February. Luckily, they have postponed it by around 2 weeks (for now).

    One of the main concerns about this latest Chrome 80 update is that it’s not up to us to fix things (or is it?). If a cookie is being set via JS, the owner of that JS is the one in charge of setting the cookie properly. So we may expect some old libraries not to be updated on time, or some vendors not to care about handling this properly.

    To deal with the sameSite switch for cookies, today I’m releasing a JS snippet that hooks into the document.cookie execution flow and takes care of two main tasks:

    • Reporting all the cookies that don’t set the sameSite value properly (remember they need both the sameSite and the Secure attributes)
    • If we decide to turn it on, the script can add the sameSite and Secure attributes automatically!

    On the window load event, the script I’m sharing reports a list of the cookies that didn’t set either the sameSite or the Secure attribute, including the cookie name, whether the cookie has been autofixed, and the original and fixed cookie strings.

    {
    'event': 'samesite-cookies-report',
    'cookiesList': [{
                                'cookieName': {{cookie.name}},
                                'cookieSameSite': {{cookie.sameSite}},
                                'cookieSecure': {{cookie.secure}},
                                'autofixed': {{autoFix}},
                                'originalCookieString': {{cookie.string}},
                                'fixedCookieString': {{cookie.fixedString}}
    }]
    }

    Then, based on the samesite-cookies-report event in Google Tag Manager, you could push these details as an event to Google Analytics or report them anywhere else.

    The main point of this script is to monitor any cookie being set anywhere on our website, so we can contact its stakeholder to have it fixed as soon as possible.

    (function() {
      try {
        // Set this to true if you want to automatically add the sameSite attribute
        var autoFix = false;
        var cookiesReportList = [];
    
        // Detecting if the current browser is a Chrome >=80
        var browserDetails = navigator.userAgent.match(/(MSIE|(?!Gecko.+)Firefox|(?!AppleWebKit.+Chrome.+)Safari|(?!AppleWebKit.+)Chrome|AppleWebKit(?!.+Chrome|.+Safari)|Gecko(?!.+Firefox))(?: |\/)([\d\.apre]+)/) || [];
        var browserName = browserDetails[1];
        var browserMajorVersion = parseInt((browserDetails[2] || '0').split('.')[0], 10);
    
        // We only want to hook the cookie behavior if it's Chrome 80+
        if (browserName === 'Chrome' && browserMajorVersion >= 80) {
          var cookie_setter = document.__lookupSetter__('cookie');
          var cookie_getter = document.__lookupGetter__('cookie');
    
          Object.defineProperty(document, "cookie", {
            get: function() {
              return cookie_getter.apply(this, arguments);
            },
            set: function(val) {
              var cookie = {
                name: '',
                sameSite: false,
                secure: false,
                parts: val.split(';'),
                string: val
              };
              cookie.parts.forEach(function(e, i) {
                var key = e.trim();
                cookie.parts[i] = key;
                if (i === 0) {
                  // The first part is always "name=value"
                  cookie.name = key.split('=')[0];
                } else if (/^samesite=/i.test(key)) {
                  // Attribute names are case-insensitive (SameSite, samesite...)
                  cookie.sameSite = true;
                } else if (/^secure$/i.test(key)) {
                  cookie.secure = true;
                }
              });
              if (cookie.sameSite === false || cookie.secure === false) {
                // Secure cookies can't be set from http pages, so only fix on https
                var fixed = autoFix === true && document.location.protocol === "https:";
                if (fixed) {
                  // Remove a trailing ";" before appending the missing attributes
                  val = val.replace(/;\s*$/, '');
                  if (cookie.sameSite === false) {
                    val = val + '; sameSite=None';
                  }
                  if (cookie.secure === false) {
                    val = val + '; secure';
                  }
                }
                cookiesReportList.push({
                  'cookieName': cookie.name,
                  'cookieSameSite': cookie.sameSite,
                  'cookieSecure': cookie.secure,
                  'autofixed': fixed,
                  'originalCookieString': cookie.string,
                  'fixedCookieString': val
                });
              }
              return cookie_setter.call(this, val);
            }
          });
        }
        window.addEventListener('load', function(event) {
          window.dataLayer = window.dataLayer || [];
          window.dataLayer.push({
            'event': 'samesite-cookies-report',
            'cookiesList': cookiesReportList
          });
        });
      } catch (err) {}
    })();

    At the top of the code you may see the following line:

    var autoFix = false;

    OK, if we change this to true, the script will automatically add the missing parts accordingly 🙂
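    To make the autofix behaviour easier to reason about (and to unit-test), here’s the string manipulation it is meant to perform, pulled out into a standalone function: strip a trailing semicolon, then append whichever of sameSite=None and secure is missing. This is a sketch; fixCookieString is a hypothetical name, and attribute names are matched case-insensitively, only after the first name=value part.

```javascript
// Hypothetical helper mirroring the autofix: appends the missing
// sameSite=None and/or secure attributes to a document.cookie string.
function fixCookieString(cookieString) {
  var parts = cookieString.split(';').map(function (p) { return p.trim(); });
  // Attributes start after the first "name=value" part
  var attributes = parts.slice(1);
  var hasSameSite = attributes.some(function (a) { return /^samesite=/i.test(a); });
  var hasSecure = attributes.some(function (a) { return /^secure$/i.test(a); });
  var fixed = cookieString.replace(/;\s*$/, '');
  if (!hasSameSite) fixed += '; sameSite=None';
  if (!hasSecure) fixed += '; secure';
  return fixed;
}

console.log(fixCookieString('_test=1; path=/;')); // _test=1; path=/; sameSite=None; secure
```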

    One thing to keep in mind is that this code needs to run as early as possible in the page execution flow. If we're setting it up via GTM, we'll need to fire this tag on the first event on the page (most of the time this will be "All Pages") and give it a higher tag firing priority.

    WARNING: If you only plan to use this script as a reporting tool, you're on the safe side. If you plan to use the autofixing feature, please keep in mind that I've only tested it on some of my own sites. It's your responsibility to set it up and test it properly on your site before going live.

    If you're interested in knowing more about how browsers are changing cookie behaviour to protect users' privacy, the best place is https://www.cookiestatus.com/. On that site my friend Simo collects information about almost all the mainstream browsers out there.

  • The Definitive Approach for preventing duplicate transactions on Google Analytics – Using a Universal CustomTask

    It's been a long time since I wrote my post about how to prevent duplicate transactions on Google Analytics. At that point, customTask wasn't yet part of the Google Analytics JS library. That approach consisted of writing a cookie on each transaction and then working with some blocking triggers.

    It's a working solution for sure, but the feedback I've had over the years shows that people found it hard to understand. Things got even worse with Enhanced Ecommerce, since there's no specific hit type to block (remember that in EEC, any hit can be used as a transport for the Ecommerce data).

    That's why I'm releasing a completely new approach to prevent duplicate transactions on Google Analytics. It's based on the customTask functionality, and it will work out of the box regardless of how you have set up your Enhanced Ecommerce tracking. Sounds good, right?

    If you're wondering how we're going to achieve this, take a look at the following flow chart:

    Basically, we'll check the current hit payload to find out whether it contains any transaction-related data. Only in that case, and only if that transaction has already been tracked on the current browser, will we remove the e-commerce data from the hit. We'll use a cookie to keep track of recorded transactions, just as we did in the old solution.
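    To make that decision easy to reason about, here is a sketch of it as a standalone pure function. The name processHitPayload is hypothetical. The function takes the raw Measurement Protocol string and the list of already recorded IDs, and it returns the payload to send. It reuses the same key pattern as the customTask, so treat it as an illustration of the logic rather than a drop-in replacement.

```javascript
// Hypothetical standalone version of the check done inside the sendHitTask:
// payload is a Measurement Protocol string like "v=1&t=event&pa=purchase&ti=T1",
// recordedIds holds the IDs already stored in the __transaction_ids cookie.
function processHitPayload(payload, recordedIds) {
  // Same key pattern as the customTask: transaction fields plus product-level keys
  var eecRelatedKeys = /^(pa|ti|ta|tr|ts|tt|tcc|pr(\d+)[a-z]{2}((\d+)|))$/;
  var pairs = payload.replace(/^\?/, '').split('&');
  var params = {};
  pairs.forEach(function (pair) {
    var separator = pair.indexOf('=');
    var key = separator === -1 ? pair : pair.substring(0, separator);
    params[key] = separator === -1 ? '' : pair.substring(separator + 1);
  });

  // Not a purchase, or a purchase we haven't seen before: send the hit untouched
  if (params.pa !== 'purchase' || recordedIds.indexOf(params.ti) === -1) {
    return payload;
  }
  // Duplicate purchase: keep only the non-ecommerce parameters
  return pairs.filter(function (pair) {
    return !eecRelatedKeys.test(pair.split('=')[0]);
  }).join('&');
}
```

    Because the function returns a new string instead of mutating a hit model, you can check the regex against real payloads copied from the browser's Network tab. Note that keys such as tid or dl survive, because the pattern is anchored.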

    To get this working, the only thing we need to do is create a new Custom JavaScript Variable in Google Tag Manager with the following code, our "duplicate transactions blocking customTask". Then reference that variable in the customTask field (Fields to Set) of your Google Analytics tags or Settings variable.

    *Note that I tried to add as many comments as I could to the customTask code, so please take some time to understand how it works! 🙂

    function() {
      return function(customTaskModel) {
        var originalSendHitTask = customTaskModel.get('sendHitTask');
        // Helper function to grab the root domain
        // Will help on setting the cookie to the highest domain level
        var getRootDomain = function() {
          // hostname (unlike host) never includes the port
          var domain = document.location.hostname;
          var rootDomain = null;
          if (domain.substring(0, 4) == "www.") {
            domain = domain.substring(4, domain.length);
          }
          var domParts = domain.split('.');
          for (var i = 1; i <= domParts.length; i++) {
            document.cookie = "testcookie=1; path=/; domain=" + domParts.slice(i * -1).join('.');
            if (document.cookie.indexOf("testcookie") != -1) {
              rootDomain = domParts.slice(i * -1).join('.');
              document.cookie = "testcookie=1; expires=Thu, 01 Jan 1970 00:00:01 GMT; path=/; domain=" + domParts.slice(i * -1).join('.');
              break;
            }
          }
          return rootDomain;
        };
        // The custom Task
        customTaskModel.set('sendHitTask', function(model) {
          try {
            // Let's grab the hit payload
            var rawHitPayload = model.get('hitPayload');
            // We're converting the payload string into a key => value object
            var hitPayload = {};
            rawHitPayload.replace(/(^\?)/, '').split('&').forEach(function(pair) {
              var parts = pair.split('=');
              hitPayload[parts[0]] = parts[1];
            });
    
            // Let's check if this hit contains transaction info
            // If the hit contains a &pa parameter whose value equals "purchase", this hit contains transaction info
            if ((hitPayload && hitPayload.pa && hitPayload.pa === "purchase")) {
              // Let's grab the previous transactions saved in our cookie (if any)
              var transactionIds = document.cookie.replace(/(?:(?:^|.*;\s*)__transaction_ids\s*\=\s*([^;]*).*$)|^.*$/, "$1") ? document.cookie.replace(/(?:(?:^|.*;\s*)__transaction_ids\s*\=\s*([^;]*).*$)|^.*$/, "$1").split('|') : [];
              // If the current transaction ID is already logged in our cookie, let's perform the magic
              if (transactionIds.length > 0 && transactionIds.indexOf(hitPayload.ti) > -1) {            
                // EEC hit keys magic regex. The following regex will match all the payload keys that are related to the ecommerce
                var eecRelatedKeys = /^(pa|ti|ta|tr|ts|tt|tcc|pr(\d+)[a-z]{2}((\d+)|))$/;
                // Now we'll loop through all the payload keys and we'll remove the ones that are related to the ecommerce
                for (var key in hitPayload) {
                  if (key.match(eecRelatedKeys)) {
                    delete(hitPayload[key]);
                  }
                }
                // Now let's update the payload in the hit model! :)
                model.set('hitPayload', Object.keys(hitPayload).map(function(key) {
                    return key + '=' + hitPayload[key];
                }).join('&'), true);            
              } else {
                // If the execution reaches this point, this is a NEW transaction
                // We'll leave the payload untouched and add the current transaction ID to our cookie
                transactionIds = [hitPayload.ti].concat(transactionIds);
                var _expireDate = new Date();
                // This cookie will expire in 2 years
                _expireDate.setMonth(_expireDate.getMonth() + 24);
                // Cookie expiry dates must use the UTC string format; omit the domain if no root domain was found
                var rootDomain = getRootDomain();
                document.cookie = "__transaction_ids=" + transactionIds.join('|') + ";expires=" + _expireDate.toUTCString() + (rootDomain ? ";domain=" + rootDomain : "") + ";path=/";
              }
    
            }
            // Send the hit
            originalSendHitTask(model);
          } catch (err) {
            // In case the above fails, we want to send the hit in any case!!!
            originalSendHitTask(model);
          }
        });
      };
    }

    We're done. From now on, this customTask will detect transactions traveling in the hits, write them to a cookie and remove the transaction data from the hit when needed!

    • You don’t need a blocking trigger
    • You don’t need an extra condition on your firing trigger
    • You don’t need a variable for checking for the value of the cookie
    • It doesn't matter how you've set up your e-commerce tracking. The customTask will work regardless of your current approach (sending it with the default pageview or with an event, using the dataLayer data, or using a variable that builds up the e-commerce data for GTM).
    • You won’t need to block your default pageview on the confirmation page to have the ecommerce working without duplicates.

    It will just simply work!

    Of course, you may still want to block some other tags from firing. The customTask writes all the recorded transaction IDs into a cookie, so they're available for you to use as needed. Just grab the "__transaction_ids" cookie value and search it for your already recorded transactions.
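    A minimal sketch of that lookup follows. The helper name isTransactionRecorded is hypothetical. The sketch assumes the cookie format written by the customTask above: IDs joined with "|". It also assumes the IDs are stored as they appear in the hit payload, which would make them URL-encoded.

```javascript
// Hypothetical helper: is a transaction ID already stored in the __transaction_ids cookie?
// cookieString would normally be document.cookie; it's a parameter so it can be tested.
function isTransactionRecorded(cookieString, transactionId) {
  var match = cookieString.match(/(?:^|;\s*)__transaction_ids=([^;]*)/);
  if (!match || match[1] === '') {
    return false;
  }
  var recordedIds = match[1].split('|');
  var id = String(transactionId);
  // Assumption: IDs are stored URL-encoded (as taken from the hit payload),
  // so compare against both the raw and the encoded form
  return recordedIds.indexOf(id) > -1 || recordedIds.indexOf(encodeURIComponent(id)) > -1;
}
```

    In GTM you could inline this helper inside a Custom JavaScript Variable. The variable would return the helper's result for document.cookie and whichever variable holds your transaction ID, and you could then use that variable in a blocking trigger for other tags.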

  • Test API: Unit testing for Google Tag Manager Custom Templates

    The Google Tag Manager team has quietly added a new Test API for Custom Templates. This Test API lets us define unit tests, so we can automate the testing of our templates before publishing them.

    You will find a new tab named "Tests" within the template editor. There you can run some setup code before the tests start, and then add a set of unit tests to run each time you want to test your template.

    For those who are not much into programming, this is how Wikipedia defines a unit test:

    In computer programming, unit testing is a software testing method by which individual units of source code, sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures, are tested to determine whether they are fit for use.

    https://en.wikipedia.org/wiki/Unit_testing

    In essence, they're automated tests that make sure a certain part of a program, a unit, works as expected. For example, if we were writing unit tests for a website, we might write one test to check that the contact form works, another for the add to cart feature, another for the search engine, and another to check that the current page contains some specific elements.

    According to the Google Tag Manager support pages, each test can provide sample input values, mock function calls, and assert code behaviors (more details in the GTM Tests documentation).

    Unit tests are also included when a template is exported/imported. Anyone who needs to tweak a template can then check their changes against the tests provided by the original author.

    Setting up unit tests may look like a tedious, time-wasting task the first time you deal with it. Over time, though, you'll easily recover all the time invested (NOT WASTED) in setting them up.

    This new Test API doesn't need to be initialized like the other APIs, since it runs within its own section of the editor, where it's already enabled. It provides five new APIs we can use to define and run our unit tests:

    • assertApi: Returns a matcher object that can be used to fluently make assertions about the given API.
    • assertThat: Checks whether a specific value/variable matches the one we're expecting. GTM uses Google's Truth library as a model for this. One assertion failing doesn't affect other test cases.
    • fail: Forces the test to fail, returning the given message if one is provided.
    • mock: Lets us override/replace the behavior of the current sandboxed APIs.
    • runCode: Runs the code of the current template.

    I'll try to show how these APIs work with some simple examples. If you want to dig deeper, I suggest reading the official developer documentation for the Test API: https://developers.google.com/tag-manager/templates/api#runcode

    assertApi

    assertApi(apiName)

    This API returns a matcher object that we can use to make assertions against the given API.

    For example, each tag execution must call gtmOnSuccess once. Calling it twice within a single tag execution makes no sense, and it could mean we're signaling success at a point where that may not be true. Let's write an API assertion to check that gtmOnSuccess is called just once.

    This is our current test custom tag code:

    // dummmmyyyy template
    data.gtmOnSuccess();
    const log = require('logToConsole');
    log('data =', data);
    
    // Second call on purpose, so the assertApi test fails
    data.gtmOnSuccess();
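    Outside GTM, the idea behind this check can be illustrated in plain JavaScript with a call-counting spy. This is only an analogy: createSpy, runDummyTemplate and assertCalledOnce are hypothetical names, not part of the GTM Test API. The dummy template also receives its data object as an argument instead of running in the sandbox.

```javascript
// A spy records every call made to it, much like a test harness does for sandboxed APIs
function createSpy() {
  var spy = function () {
    spy.calls.push(Array.prototype.slice.call(arguments));
  };
  spy.calls = [];
  return spy;
}

// The dummy template from above, with its data object injected (logging omitted)
function runDummyTemplate(data) {
  data.gtmOnSuccess();
  data.gtmOnSuccess(); // the deliberate second call
}

// Throws unless the spy was called exactly once
function assertCalledOnce(spy, name) {
  if (spy.calls.length !== 1) {
    throw new Error(name + ' was called ' + spy.calls.length + ' times, expected 1');
  }
}

var onSuccess = createSpy();
runDummyTemplate({ gtmOnSuccess: onSuccess });
try {
  assertCalledOnce(onSuccess, 'gtmOnSuccess');
} catch (err) {
  console.log(err.message); // → gtmOnSuccess was called 2 times, expected 1
}
```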

    There aren't many details about the currently supported subjects for these assertion tests. I'll try to keep the following list updated as I find and test them.

    Available subjects list

    wasCalled(number of calls)
    wasNotCalled()

    assertThat

    This API lets us make assertions against our variables or subjects. It was designed on the model of Google's Truth library, a library for writing assertions in Java and Android tests. You can find more info at the official URL: https://truth.dev/

    I'm not sure whether all the subjects available in Truth have been modeled into the GTM Test API. If you're curious, you can find the list of official ones in the project's GitHub repository: https://github.com/google/truth/tree/master/core/src/main/java/com/google/common/truth

    assertThat(actual, opt_message)

    Let's imagine that we have a variable we expect to return "SIMO" as a string. For this test, we'll write a small variable that returns "DAVID" instead of the expected value, so we have something to assert against.

    // assertThat API Test
    // No Needed Libraries
    // return "SIMO"
    return "DAVID";

    Now we'll write a simple test that asserts that the value returned by the variable is actually "DAVID".

    There aren't many details about the currently supported subjects for these assertion tests either. I'll try to keep the following list updated as I find and test them.

    Available subjects list

    isEqualTo(expected)
    isNotEqualTo(expected)
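    The fluent style that Truth popularized, assertThat(actual).isEqualTo(expected), is easy to picture with a tiny plain-JavaScript imitation. This is a hypothetical sketch of the pattern only, not GTM's implementation. It supports just the two subjects listed above plus the optional message, and for simplicity it uses strict equality.

```javascript
// Minimal imitation of a Truth-style fluent assertion (hypothetical, for illustration)
function assertThat(actual, optMessage) {
  function failWith(description) {
    throw new Error((optMessage ? optMessage + ': ' : '') + description);
  }
  return {
    isEqualTo: function (expected) {
      if (actual !== expected) {
        failWith('expected <' + expected + '> but was <' + actual + '>');
      }
    },
    isNotEqualTo: function (unexpected) {
      if (actual === unexpected) {
        failWith('expected not to be <' + unexpected + '>');
      }
    }
  };
}

// The dummy variable returns "DAVID", so this assertion passes
assertThat('DAVID').isEqualTo('DAVID');

// Asserting the originally expected "SIMO" throws:
// Error: Variable result: expected <SIMO> but was <DAVID>
try {
  assertThat('DAVID', 'Variable result').isEqualTo('SIMO');
} catch (err) {
  console.log(err.message);
}
```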

    fail

    fail(opt_message)

    mock

    mock(apiName, returnValue)

    This API lets us replace or override the functionality of the current sandboxed APIs. Let's use the following custom variable code:

    // Mock API Test
    // Add Needed Libraries
    const log = require('logToConsole');
    const encodeUri = require('encodeUri');
    
    const testValue = encodeUri("http://www.google.es");
    log('testValue=', testValue);
    
    return testValue;

    If we run the variable, we would expect testValue to be the encoded URL. However, we've mocked the encodeUri API in our test to return a fixed value instead.
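    Outside GTM, the idea behind mock can be illustrated by making the template's require injectable. The test then swaps in a replacement for encodeUri, and the template code never notices. In this sketch, variableTemplate, runVariable and defaultApis are hypothetical names; in GTM the Test API does this wiring for you. The built-in encodeURI stands in for the sandboxed encodeUri.

```javascript
// The custom variable from above, rewritten to receive its dependencies
// (sandboxed templates get them via require; here they are passed in)
function variableTemplate(require, log) {
  var encodeUri = require('encodeUri');
  var testValue = encodeUri('http://www.google.es');
  log('testValue=', testValue);
  return testValue;
}

// "Real" implementations; the built-in encodeURI stands in for GTM's encodeUri
var defaultApis = { encodeUri: encodeURI };

// Runs the template, letting a test replace individual APIs, similar in spirit to mock()
function runVariable(mocks) {
  var fakeRequire = function (name) {
    return Object.prototype.hasOwnProperty.call(mocks, name) ? mocks[name] : defaultApis[name];
  };
  var silentLog = function () {}; // logging is ignored in this sketch
  return variableTemplate(fakeRequire, silentLog);
}

console.log(runVariable({}));                                              // → http://www.google.es
console.log(runVariable({ encodeUri: function () { return 'MOCKED'; } })); // → MOCKED
```

    The design point is the same one the Test API relies on: the code under test only talks to its APIs through require, so replacing what require returns is enough to control its behavior.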

    runCode

    This API actually takes care of running our template code. It optionally accepts a data object as a parameter and it will return a value if we’re testing a Variable or undefined in any other case.

    runCode(data)
    runCode({
      measurementID: 'G-XXXXXXXX' 
    });

    Passing an object data will allow us to set some values for the current tag/variable without the need to manually fill the data into the template fields.

    In the example above, data.measurementID will hold the value 'G-XXXXXXXX' even though we didn't populate that field.

    Wrapping up

    Google Tag Manager is giving us the perfect tool (as a start) for building fail-proof containers and code. AFAIK, the library currently supports only some basic matchers, but I expect them to be expanded (or documented!) in the future.

  • UDO Debugger – Easiest way to work and QA Tealium implementations

    Some years ago, I published my "Tealium Debugger" extension for Chrome. After a polite email landed in my inbox, it ended up being renamed "UDO Debugger" (no hard feelings at all).

    Click the button below if you want to install the extension in your browser =)

    That version, published circa April 2017, only showed the current app state on page load, plus the current account, profile and environment details. There was no support for anything else. Sadly, I changed jobs and was no longer able to work with Tealium, despite my requests for a testing account to improve the extension.

    The good news is that I'm working with Tealium again, which makes this a nice time to update the extension. I'm pleased to say that after many hours of work, I've completely refactored the extension. I've also added lots of cool features that I hope will make life a bit easier for analysts and developers who need to work with Tealium 🙂

    Here's a short list of the debugger's current features:

    • Tealium Presence Detection and profile, account, environment reporting.
    • NEW. App State, utag.link and utag.view Reporting
    • NEW. The current fired tags are reported.
    • NEW. Clipboard Support.
    • NEW. Environments Switcher
    • NEW. Debug Enabler / Reporter

    Tealium Presence Detection

    The extension will automatically report to you if Tealium has been loaded on the current page.

    Along with this info, you'll be able to see the currently enabled environment, profile and account name. You'll also see the version currently loaded (useful for checking when the latest publication is live).

    App State, utag.link and utag.view Reporting

    In previous versions, only the App State was reported (by app state I mean the original utag_data value).

    In the new version, the whole UDO model is shown on page load. As you may know, Tealium collects some data by default (cookies, meta tags, AudienceStream audiences, query string parameters, etc.). To make debugging easier, all these data layer keys that are internal to Tealium are grouped into the _internal key. We do the same for the keys related to e-commerce.

    This way you can focus on the Universal Data Object keys that are really up to you.
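    The grouping described above can be sketched as a key-prefix split. The prefix lists are my assumption about what counts as internal and are not taken from the extension's code:

    • cp. for cookies, meta. for meta tags, qp. for query string parameters, dom. for DOM values, ut. and tealium_ for utag internals.
    • _c (Tealium's e-commerce variables such as _corder) for e-commerce keys.

    The name groupUdoKeys and the _ecommerce group name are hypothetical. Check both lists against the data your own profile collects, and extend them for AudienceStream keys.

```javascript
// Hypothetical prefix lists; adjust them to what your profile actually collects
var INTERNAL_PREFIXES = ['cp.', 'meta.', 'qp.', 'dom.', 'ut.', 'tealium_'];
var ECOMMERCE_PREFIXES = ['_c'];

function startsWithAny(key, prefixes) {
  return prefixes.some(function (prefix) {
    return key.indexOf(prefix) === 0;
  });
}

// Returns a copy of the UDO with internal and e-commerce keys moved into groups
function groupUdoKeys(udo) {
  var grouped = { _internal: {}, _ecommerce: {} };
  Object.keys(udo).forEach(function (key) {
    if (startsWithAny(key, INTERNAL_PREFIXES)) {
      grouped._internal[key] = udo[key];
    } else if (startsWithAny(key, ECOMMERCE_PREFIXES)) {
      grouped._ecommerce[key] = udo[key];
    } else {
      grouped[key] = udo[key]; // your own UDO keys stay at the top level
    }
  });
  return grouped;
}
```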

    And yes, utag.link and utag.view calls are now reported, so you'll be notified every time your site makes one.

    And the coolest thing is that you'll be able to see whether the current link/view was scoped to a certain tag or pushed along with a callback. You can even see a fully prettified and formatted version of the callback code.

    The current fired tags are reported

    This new UDO Debugger version can show you the tags being fired by the site via Tealium. They appear in the report automatically as they fire :).

    And... have you ever come across an implementation where the tags were badly named? Something vague like "marketing pixel", or a name that doesn't even match the pixel, like a Google Ads pixel named "Conversion Pixel FB"? I'm sure you have. In UDO Debugger you can click on the tag IDs to get a full report on what each pixel really is. You'll see the real template name, the description and the configurable fields for that pixel in Tealium. You'll also see the tag code, prettified and formatted, without needing to go to Tealium and open the template! (say yay!)

    Clipboard Support

    Any app state/utag.link/utag.view can be copied to your clipboard with a mouse click. It will be copied in a nicely formatted way, easy to report to your IT department or to use when writing your documentation 🙂

    Environments Switcher

    Within the main window of the extension, you'll be able to switch between the DEV / QA / PROD environments. This setting lasts for as long as the browser session is open.

    If you've ever worked with Tealium's native solution for switching environments, you'll know it has 2 big issues in my opinion:

    1. utag.sync is not affected, so it will not be redirected.
    2. If you use the switching cookie, instead of doing a redirect it loads a second utag.js on top of the first one, which can lead to odd behavior under certain circumstances. For example, let's say that your "prod" environment has an extension that sets a global variable window.thyngster. Then you load your "dev" environment, where that variable is not set. You may still see that variable being set even though you're not expecting it, which will drive you crazy!

    We've solved these 2 issues. The new environment switcher in UDO Debugger performs 307 redirects for both utag.js and utag.sync.js requests, so you can do proper QA and testing.
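    The redirect rule itself is simple URL rewriting. The sketch below assumes the usual Tealium CDN layout https://tags.tiqcdn.com/utag/{account}/{profile}/{environment}/utag.js, with the same pattern for utag.sync.js. The name switchTealiumEnvironment is hypothetical. The extension still has to answer the original request with a 307 pointing at the rewritten URL; that part is not shown here.

```javascript
// Matches .../utag/{account}/{profile}/{environment}/utag.js or utag.sync.js (assumed layout)
var UTAG_PATH = /^(https?:\/\/tags\.tiqcdn\.com\/utag\/[^\/]+\/[^\/]+\/)([^\/]+)(\/utag(?:\.sync)?\.js)(\?.*)?$/;

// Returns the URL to redirect to, or null when no redirect is needed
function switchTealiumEnvironment(url, targetEnv) {
  var match = url.match(UTAG_PATH);
  if (!match || match[2] === targetEnv) {
    return null; // not a Tealium library request, or already on the target environment
  }
  // Keep account, profile, file name and any query string; swap only the environment
  return match[1] + targetEnv + match[3] + (match[4] || '');
}
```

    Rewriting both file names with one rule is what keeps utag.js and utag.sync.js on the same environment. That is exactly the first issue the native switching cookie has.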

    Debug Enabler / Reporter

    As you probably already know, you can have Tealium push all its debug logs to the console (via the calls to utag.DB in the templates). To do so, you only need to create a cookie named 'utagdb' with a value of true, like this:

    document.cookie="utagdb=true";
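    The one-liner above sets a session cookie on the current path. The following sketch makes the toggle explicit with a hypothetical helper, utagDebugCookie. It builds the cookie string for enabling the debug output and for switching it off again by expiring the cookie. The path=/ attribute is my assumption, added so the cookie applies site-wide.

```javascript
// Builds the document.cookie string that toggles Tealium's utagdb debug output
function utagDebugCookie(enabled) {
  if (enabled) {
    return 'utagdb=true; path=/';
  }
  // An expiry date in the past makes the browser delete the cookie
  return 'utagdb=; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT';
}

// In the browser console:
// document.cookie = utagDebugCookie(true);  // enable
// document.cookie = utagDebugCookie(false); // disable
```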

    Still, a lot of stuff gets pushed to the console. That makes it hard to follow what's going on, since the debug output gets mixed up with other messages, failed requests or script errors.

    This version lets you enable the debug switch right from the interface, and it then shows you the logs in a nice table. You'll even be able to check the logs from previous pages, which makes digesting the utag.DB output a lot easier.

  • Google Tag Manager Custom Template to grab the current Root domain Name

    Many times while working with cookies, for whatever reason, I forgot to add the domain name to the settings, or I wasn't able to properly define the root domain.

    Maybe you've needed to set a cookie on the root domain to make it available across all subdomains. If so, you'll know that simply splitting the current hostname may not be enough. We need to think about TLDs (top-level domains), SLDs (second-level domains) and ccTLDs (country code top-level domains). With all these variables, you may end up not knowing where the root domain is. For example, for a co.uk domain we could check the number of characters on each level, but that would not be as reliable as it looks at first.

    Actually, AFAIK, the only reliable way to check this is to rely on a proper lookup table or database (http://www.iana.org/domains/root/db).

    The good thing about JavaScript is that we can grab the current hostname and try setting a cookie on progressively longer parts of it until the cookie actually gets set. Browsers refuse cookies set on public suffixes such as co.uk.

    For example, let's say that we have my.super.duper.domain.co.uk. We would be doing the following:
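    The walk can be sketched as follows. The code tries the shortest candidate first and stops at the first one on which the browser accepts a cookie. In this sketch the trySetCookie callback is hypothetical, so the logic can run outside a browser. In a page, that callback would write a test cookie on the candidate domain, check document.cookie, and delete the cookie again.

```javascript
// Candidate domains from the shortest suffix to the full hostname,
// e.g. "uk", "co.uk", "domain.co.uk", "duper.domain.co.uk", ...
function candidateDomains(hostname) {
  var parts = hostname.split('.');
  var candidates = [];
  for (var i = 1; i <= parts.length; i++) {
    candidates.push(parts.slice(-i).join('.'));
  }
  return candidates;
}

// Returns the first (shortest) candidate on which a cookie can be set, or null
function findRootDomain(hostname, trySetCookie) {
  var candidates = candidateDomains(hostname);
  for (var i = 0; i < candidates.length; i++) {
    if (trySetCookie(candidates[i])) {
      return candidates[i];
    }
  }
  return null;
}
```

    For my.super.duper.domain.co.uk, the attempts on uk and co.uk fail because they are public suffixes. The first accepted candidate, domain.co.uk, is the root domain.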

    The good news is that you can now have this value available without dealing with any code. Just import the following Custom Variable Template into your containers 🙂

    Available on GTM Gallery or via GitHub: https://github.com/david-vallejo-com/gtm-template-root-domain-variable/