Blog

  • Preventing duplicate transactions in Universal Analytics with Google Tag Manager

    One of the most common headaches when implementing ecommerce tracking on a site is matching the transactions recorded by the shop backend against the ones tracked in Google Analytics. As most tracking solutions are JavaScript-based, there’s a small chance of losing some transactions, and there’s nothing we can do about that without turning to the Measurement Protocol and some server-side tracking.

    Another common problem is duplicated transactions. Luckily, this is something we can prevent with some code.

    We could set up a tag that writes a cookie each time a visitor views our “thank you” page. That’s not a bad approach, but it doesn’t guarantee that the transaction has really been tracked in Google Analytics.

    Instead, we’re going to use the hitCallback feature available in Universal Analytics to set the cookie right after the data has been successfully sent to Google Analytics.

    We’ll need to set the hitCallback value in our Google Tag Manager tag to a Custom JavaScript Variable. As Simo Ahava pointed out to me on Twitter, hitCallback expects a function, so our variable is going to return a function that does the following:

    1. Grabs the current transactionId from the dataLayer
    2. Checks for the “transactions” cookie
    2.1. If it doesn’t exist, we’ll create it with the current transactionId value
    2.2. If the cookie already exists, we’ll check its current values, and if the current transaction is not there, we’ll add it.

    To avoid having a new cookie for each transaction, we’ll be using just one cookie with all the transactions joined by a pipe ( “|” ) symbol.
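
    The cookie handling described above boils down to a couple of string operations. Here is a minimal, standalone sketch of that logic. The helper names isTracked and addTransaction are made up for this illustration and are not part of the GTM variables used in this post; they only work on the cookie value, so reading and writing document.cookie is left to the caller.

```javascript
// Hypothetical helpers working on the value of the "transactions" cookie.
function isTracked(cookieValue, transactionId) {
  if (!cookieValue) return false;
  // Cookie values are strings, so compare against a string ID.
  return cookieValue.split('|').indexOf(String(transactionId)) > -1;
}

function addTransaction(cookieValue, transactionId) {
  // Nothing to do if the ID is already stored.
  if (isTracked(cookieValue, transactionId)) return cookieValue;
  var tracked = cookieValue ? cookieValue.split('|') : [];
  tracked.push(String(transactionId));
  return tracked.join('|');
}

var value = addTransaction('', 'T1001');   // first transaction
value = addTransaction(value, 'T1002');    // second transaction
value = addTransaction(value, 'T1001');    // duplicate, ignored
console.log(value);                        // T1001|T1002
```

    Casting the ID to a string matters when the dataLayer holds a numeric transactionId, because everything read back from a cookie is a string.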

    Now, every time a transaction hit is sent to Google Analytics, the current transactionId will be added to our tracking cookie.

    We’ll also need a 1st Party Cookie variable, {{transactions}}, to read the “transactions” cookie.

    You may have noticed that the first two lines of the code check for the transactionId. This is because in this example we’re using the Enhanced Ecommerce feature and populating the transaction info from the dataLayer, and we don’t want to do anything for all the pageviews on the site, just for the one on our “thank you” page. You may need to tune this for your needs.

    Let’s move on. Now we’ll need to add another Custom JavaScript variable that checks whether the current transaction is already in the cookie, and we’ll use this variable to create a blocking trigger for our tag.

    I’ve named it “Should I Track Transaction” (yeah, not the best name), but it helps to understand the trigger:

    We only need to add this blocking trigger to our pageview tag and we’ll be finished.

    Let’s recap the tracking flow:

    1. “Should I Track Transaction” will return “blockTransaction” if the current transactionId is present in our tracking cookie.
    2. The “Block Transaction” trigger will block the pageview tag from firing if #1 is true.
    3. If the first two points are not met, the pageview tag will be fired.
    4. When the pageview tag is fired, the hitCallback function will be executed right after the transaction has been sent to the Google Analytics endpoint.
    5. The hitCallback will execute the function returned by the “transactionCallback” variable, which is in charge of creating the cookie if it doesn’t exist and adding the current transactionId to it.

    I know this will not work for every case, and there are a lot of different implementations (sending the transaction based on events, sending the transaction based on a macro value with Enhanced Ecommerce), but that’s something you’ll need to figure out, as there isn’t any standard tracking solution. Hopefully you have learnt how hitCallback works in Google Tag Manager and can get this working in your environment; if not, drop a message in the post and I (or any other reader) will try to help you.

    Unsurprisingly, Simo already did something similar for the old ecommerce tracking some months ago.

    transactionCallback Code

    function()
    {
      // If there isn't a transaction ID, we don't need to do anything.
      if(!{{transactionId From DL}})
        return;

      return function(){
        // Cast to string: the IDs read back from the cookie are always strings.
        var transactionId = String({{transactionId From DL}});
        // Start from the IDs already stored in the cookie, if there are any.
        var trackedTransactions = {{transactions}} ? {{transactions}}.split("|") : [];
        if(trackedTransactions.indexOf(transactionId)==-1){
          trackedTransactions.push(transactionId);
          // Keep the cookie for 180 days.
          var d = new Date();
          d.setTime(d.getTime() + (180*24*60*60*1000));
          var expires = "expires="+d.toUTCString();
          document.cookie = "transactions=" + trackedTransactions.join('|') + "; " + expires;
        }
      }
    }
    

    Should I track transaction Code

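    function()
    {
      // Only run on pages that expose a transaction ID.
      if(!{{transactionId From DL}})
        return;

      // Cast to string: the IDs read back from the cookie are always strings.
      var transactionId = String({{transactionId From DL}});
      if({{transactions}}){
        var trackedTransactions = {{transactions}}.split("|");
        // Already tracked: return the value the blocking trigger looks for.
        if(trackedTransactions.indexOf(transactionId)>-1)
        {
          return "blockTransaction";
        }
      }
    }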
    
  • Locating legacy hardcoded events and pageviews using Google Tag Manager

    If you’re migrating from classic Google Analytics (the _gaq object) to Universal Analytics, or planning to move your old hardcoded events to Google Tag Manager, you may have run into the situation where not all of the old tracking code was really removed from the site by the client. And it’s really hard to find out where it’s left.

    Today we’ll learn how to detect this situation using just built-in Google Tag Manager features, reporting every leftover hardcoded call to Google Analytics as an event so we can fix it, without needing to crawl the website.

    We’re going to use Google Tag Manager CSS selectors to find those old calls and then fire an event, so we can easily see in GA where the old code is hiding. This helps us get a clean migration without having to deep-crawl the site.

    We’ll need to create a new variable in Google Tag Manager, as it’s shown in the screenshot below:

    hardcoded_events_tracking_01

    This variable will look through the onclick attributes available in the current page for the “_gaq” string. If there’s any onclick containing a _gaq push, the whole value of the onclick attribute will be returned; if not, the value of this variable will be “null”. Knowing this, let’s configure the trigger that is going to fire when there’s a match.
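
    For reference, a variable like this does roughly the same job as the snippet below. This is only a sketch: the attribute selector [onclick*="_gaq"] is an assumption about how such a variable would be configured, and findLegacyOnclick is a made-up helper name. It takes the document (or any object exposing querySelector) as a parameter.

```javascript
// Hypothetical helper mirroring a DOM Element variable that combines
// a CSS selector with an attribute name (the selector is an assumption).
function findLegacyOnclick(root) {
  // First element whose onclick attribute contains the "_gaq" string
  var el = root.querySelector('[onclick*="_gaq"]');
  // Return the whole onclick value, or null when nothing matches
  return el ? el.getAttribute('onclick') : null;
}

// In the browser: findLegacyOnclick(document)
```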

    hardcoded_events_tracking_02

    Now we only need to set up an Event to record the pages where the old hardcoded tracking is present.

    We’ll record the current page path, along with the full push that was found.

    hardcoded_events_tracking_03


    Note: We’ll need to set the event’s Non-Interaction switch to true, to avoid messing up the bounce rate. After this, an event will be fired if any old _gaq push is found within the onclick attributes in the current page.

    hardcoded_events_tracking_04
  • Detecting and tracking page’s printing intent with Google Tag Manager

    I was looking for something new to track among website interactions, and I decided to try tracking the user’s intent to print our pages.

    So I’m going to show you my own approach to tracking this user action using Google Tag Manager.

    We’ll need to create a new tag with the following code, which will allow us to track the user’s printing intent when they click on Print in the File menu or use the keyboard shortcut (Control-P on Windows).

    It may happen that a browser supports both the onbeforeprint event and media queries, so the code looks in the dataLayer for a previously sent printing event to avoid sending duplicates.
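
    A minimal sketch of such a tag could look like the snippet below. It is not the exact code from this post: the “print” event name and the setupPrintTracking function are assumptions made for the example, and the window object and the dataLayer are passed in as parameters so the logic can be read (and tested) on its own.

```javascript
function setupPrintTracking(win, dataLayer) {
  function pushPrintIntent() {
    // Both hooks may fire for the same print, so look for a previous push.
    // This also means only the first print intent per page load is pushed.
    for (var i = 0; i < dataLayer.length; i++) {
      if (dataLayer[i] && dataLayer[i].event === 'print') return;
    }
    dataLayer.push({ event: 'print' });
  }

  // Browsers that report printing through media queries
  if (win.matchMedia) {
    win.matchMedia('print').addListener(function (mql) {
      if (mql.matches) pushPrintIntent();
    });
  }

  // Browsers that support the onbeforeprint event
  win.onbeforeprint = pushPrintIntent;
}

// In a Custom HTML tag: setupPrintTracking(window, window.dataLayer);
```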

    Reminder: The code above needs to be wrapped in <script></script> tags, or you will end up having the code injected into your page as plain text.

    Just to be safe, we’ll run this tag on the “gtm.dom” event. Then we’ll have a push into the dataLayer telling us the user tried to print the current page, and we’ll be able to fire an event for Google Analytics or use it with any other tool.

  • Tip: Maintaining Proper Session Attribution in GA when using OAuth providers with Google Tag Manager

    If you’re using Facebook, Twitter, GitHub or any other service to authenticate your users, you may have noticed that they end up showing as referral traffic from the OAuth service.

    User lands from CPC -> Logs in -> Respawns a new visit as referral

    We could think about adding those domains to the ignored referrals within our view configuration, but this would end up hiding the real referral traffic from those networks.

    oauth_post_01

    The screenshot above is an example of the referral paths for the domain facebook.com. We only want to exclude the ones that come from certain paths, not the whole domain.

    Simo posted about how to implement referral exclusion using GTM some days ago, and it’s quite similar to what I did some time ago in one implementation to get rid of fake referral traffic from Facebook Login.

    oauth_post_02

    To do this, we only need to take a look at the referral paths for the domain, note the ones we want to ignore, and then create a new variable in Google Tag Manager.

    oauth_post_03


    Note that we’ll need to enable the built-in Referrer variable in Google Tag Manager and configure the fixFBarr array with our paths. This variable will return “null” as the referrer when the referrer matches one of the paths we want to ignore, and will keep returning the original referrer if there is no match.

    CODE

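    function(){
      // Referral paths used by the Facebook login flow that we want to ignore
      var fixFBarr = ['/dialog/oauth',
            '/v2.1/dialog/oauth',
            '/login.php',
            '/v2.1/dialog/oauth/read'
        ];

      for (var i = 0; i < fixFBarr.length; i++) {
          // A match means the visitor is coming back from the login flow
          if (document.referrer.indexOf(fixFBarr[i]) > -1)
              return null;
      }
      // No match: keep the original referrer
      return {{Referrer}};
    }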

    It would be a good idea to check a combination of domain + paths array instead of checking only the path. If anyone is interested in it, drop a message in the post and I’ll publish a full script covering this option.
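
    As a rough idea of what the domain + paths check could look like, here is a small sketch. It is not the full script mentioned above: the cleanReferrer function, the ignoredReferrers map and the www.facebook.com hostname are assumptions made for the example.

```javascript
// Hypothetical rules: referrer hostname -> paths of its login flow
var ignoredReferrers = {
  'www.facebook.com': ['/dialog/oauth', '/v2.1/dialog/oauth', '/login.php']
};

function cleanReferrer(referrer, rules) {
  // Split the referrer into hostname and path (query string is ignored)
  var match = /^https?:\/\/([^\/?#]+)([^?#]*)/.exec(referrer || '');
  if (!match) return referrer;
  var paths = rules[match[1]] || [];
  for (var i = 0; i < paths.length; i++) {
    // Drop the referrer when its path starts with one of the listed paths
    if (match[2].indexOf(paths[i]) === 0) return null;
  }
  // Any other referrer is returned untouched
  return referrer;
}
```

    Inside a Custom JavaScript variable this would be called as cleanReferrer({{Referrer}}, ignoredReferrers), so a regular link from Facebook keeps its referrer while the login flow does not.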

    Now, we just need to set the referrer value for our pageview tag as follows:

    oauth_post_04

    Ok, we’re ready. Right after publishing, the referral traffic generated by the login flow of Facebook/Twitter/GitHub or any other OAuth provider will start to disappear, slowly but relentlessly.

  • Tip: Grabbing the visitor’s first date from cookies using Google Tag Manager

    With the new Universal Analytics cookies and everything being calculated server-side, it’s more difficult to detect some attribution problems or to verify whether they have already been fixed.

    For example, after fixing a self-referral problem we can keep getting self-referrals in our reports if some users were previously tagged in Google Analytics.

    One little trick we could use is to grab the cookie creation date from the “_ga” cookie, convert it to an understandable format like YYYYMMDD, and then use it as a custom dimension in order to see whether those self-referrals were created before we fixed our site implementation.

    Ok, I know we could directly segment those sessions by the “User Type” dimension, but this way we could run some cool cohort analysis based on that data, for example days from when a user first landed on our site to when they registered or performed any other action, like buying something, signing up for our newsletter or whatever.

    We’re going to use a single Variable in Google Tag Manager to get this value, and then we can use it wherever we want, to send that date directly into a custom dimension or to calculate any other dimension/metric.

    CODE

    function(){
      var regex = new RegExp("_ga=([^;]+)");
      var value = regex.exec(document.cookie);
      var cookieCreationDate = (value != null) ? new Date(value[1].split('.')[3]*1000) : undefined;
    
      if(typeof(cookieCreationDate)!=="undefined")
      {
    
        var year = cookieCreationDate.getFullYear().toString();
        var month = (cookieCreationDate.getMonth()+1).toString();
        var day  = cookieCreationDate.getDate().toString();
        cookieCreationDate = year + (month[1]?month:"0"+month[0]) + (day[1]?day:"0"+day[0]);
      }
      
      return cookieCreationDate;
    }
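
    Building on the same idea, the “days since the first visit” value mentioned above could be calculated along these lines. This is just a sketch: daysSinceFirstVisit is a hypothetical helper, and it relies on the same assumption as the variable above, namely that the fourth dot-separated field of the “_ga” cookie is its creation time in seconds.

```javascript
// cookieString is the value of document.cookie, now is a Date
// (passed in as a parameter so the function is easy to test).
function daysSinceFirstVisit(cookieString, now) {
  var value = /(?:^|;\s*)_ga=([^;]+)/.exec(cookieString);
  if (!value) return undefined;
  // Assumed cookie layout: GA1.2.<random>.<creation time in seconds>
  var created = parseInt(value[1].split('.')[3], 10) * 1000;
  if (isNaN(created)) return undefined;
  // Whole days between the cookie creation and "now"
  return Math.floor((now.getTime() - created) / (24 * 60 * 60 * 1000));
}

console.log(daysSinceFirstVisit('_ga=GA1.2.1234567890.1428451200',
                                new Date(1428451200000 + 3 * 86400000))); // 3
```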
    
  • Track alert() pop ups automatically in Google Tag Manager

    Have you ever faced a site that still uses old-fashioned alert popups, for example to show form validation messages, and wondered how to track those alerts without changing anything on the site (as it is likely to be an old one)? And yep, they still exist. Luckily, if we are already using Google Tag Manager (or we can tell the client to add a little piece of code to their pages), we can track them easily using what is called a Proxy Pattern.

    Sadly this won’t work on IE8 for alerts, as it seems window.alert is not a function there but an object, so we’re adding a check for pre-IE9 browsers.

    In essence, we’re saving the current window[“alert”] function in a new variable and then redefining the same function so that it passes the received arguments on to our saved original function, preserving the original behaviour.

    We’ll just need to create a Custom HTML tag that will fire either on all pages or just on the ones where we want to track alerts.

    // Skip IE8 and older browsers (document.all without addEventListener)
    if(!(document.all && !document.addEventListener))
    { 
     // Just in case check, to avoid proxying the function twice (loops)
     if(!window.proxied_alert)
     {
       window.proxied_alert = window.alert; // Preserve the original function
       window.alert = function() { 
         var message = (!arguments[0]) ? 'null': arguments[0];
         window.dataLayer = window.dataLayer || [];
         window.dataLayer.push({'event': 'alert_showed','alert_message': message});
         return window.proxied_alert.apply(this, arguments);
       };
     }
    }

    In this example we’re pushing the info to the dataLayer, but we could do anything else directly, like firing an event using the native ga/_gaq functions or pushing the info to any other tool we want.

    As we’re modifying the original window.alert function, I suggest thoroughly debugging/testing the site’s functionality after implementing this code, as it may happen that another script is doing the same thing to the alert function (some modal scripts play with it to automatically convert alert calls into nicer popups).
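
    The same pattern can be written as a small generic helper, which also makes it easy to try out away from the browser. This is only a sketch: proxyMethod and the fakeWindow object are made up for the example.

```javascript
// Hypothetical helper: wraps obj[name] so onCall runs before the original.
function proxyMethod(obj, name, onCall) {
  var original = obj[name];
  // Don't wrap twice, and skip anything that isn't a real function
  if (typeof original !== 'function' || original.__proxied) return false;
  obj[name] = function () {
    onCall.apply(this, arguments);
    return original.apply(this, arguments); // keep the original behaviour
  };
  obj[name].__proxied = true;
  return true;
}

// Stand-in for window, so the example runs anywhere
var fakeWindow = { alert: function (msg) { return 'shown: ' + msg; } };
var dataLayer = [];
proxyMethod(fakeWindow, 'alert', function (msg) {
  dataLayer.push({ event: 'alert_showed', alert_message: msg });
});
```

    The typeof check plays the role of the IE8 test: if alert is reported as an object rather than a function, the helper simply leaves it alone.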

  • Tips to track an ajax based website using GTM and Universal Analytics

    I’ve been dealing with tracking some Ajax-based websites these past months (mostly AngularJS-based ones), and I’ve learned a lot of stuff, mostly some important points about how Google Tag Manager works.

    Tracking an Ajax-based website is usually a PITA, since we mostly don’t have page reloads, as all the content is loaded asynchronously. So either you ask the developers to push the loaded content info into the dataLayer, or you play with the history location.

    Still, we need to keep some points in mind while tracking those sites with Google Tag Manager (and maybe natively or using any other tag management system).

    1. Use a fixed tracker name

    The way Google Tag Manager works, every time a Google Analytics tag is fired it creates a new tracker with a name (“GTM” + current timestamp). This is not a problem for most tracking implementations, but we need to keep in mind that Ajax-based websites usually change the current URI without reloading the page, so if we’re using campaign tagging we may end up with lots of unneeded new sessions and referrals.

    Let me explain this with a simple example:

    https://www.domainlinkingtous.com has a link to our site and it’s using the campaign tagging to allow us to track their visits to us.

    http://www.ourdomain.com?utm_source=test_source&utm_medium=test_medium&utm_campaign=test_campaign

    So the visit will end in our page with the following values:

    URL: http://www.ourdomain.com?utm_source=test_source&utm_medium=test_medium&utm_campaign=test_campaign
    Referrer: http://www.domainlinkingtous.com

    Let’s map this to Universal Analytics parameters:

    dl: http://www.ourdomain.com?utm_source=test_source&utm_medium=test_medium&utm_campaign=test_campaign
    dr: http://www.domainlinkingtous.com

    Universal Analytics uses the “dl” (document location) and “dr” (document referrer) parameters to calculate the visit attribution. It will search for gclid and utm_ parameters in the “dl” value (in that order of preference), and if it doesn’t find them it will try to calculate the attribution from the “dr” string.

    As we have the utm parameters in our landing page URL, the referrer won’t be taken into account to calculate the visit attribution, so we’re ok at this point.

    source: test_source
    medium: test_medium
    campaign: test_campaign

    But if the visitor decides to go to another page, and as we said there will be no page reload in this case, Google Tag Manager will spawn a new tracker and we’ll end up with these values:

    dl: http://www.ourdomain.com/secondPage.html
    dr: http://www.domainlinkingtous.com

    As we said before, if there are no utm or gclid parameters in the document location, Google Analytics is going to look at the document referrer string, where it will find a domain different from the current one, so it will track a new visit:

    source: domainlinkingtous.com
    medium: referral
    campaign: /

    D’oh. This is wrong: we’ll lose the original visit attribution, plus we’ll have some extra ghost sessions in our data.
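
    To make the two cases easier to compare, here is a deliberately simplified model of the decision described above. It is only a sketch and by no means Google’s actual processing logic: the attribute and hostOf functions are made up for the example, and the “google / cpc” values returned for a gclid are an assumption.

```javascript
// Hostname of a URL without the leading "www."
function hostOf(url) {
  var m = /^https?:\/\/([^\/?#]+)/.exec(url || '');
  return m ? m[1].replace(/^www\./, '') : '';
}

function attribute(dl, dr) {
  // Parse the query string of the document location
  var query = (dl.split('?')[1] || '').split('#')[0];
  var params = {};
  query.split('&').forEach(function (pair) {
    var kv = pair.split('=');
    if (kv[0]) params[kv[0]] = decodeURIComponent(kv[1] || '');
  });

  // 1. gclid, 2. utm_ parameters, 3. the referrer
  if (params.gclid) return { source: 'google', medium: 'cpc' };
  if (params.utm_source) {
    return {
      source: params.utm_source,
      medium: params.utm_medium,
      campaign: params.utm_campaign
    };
  }
  var refHost = hostOf(dr);
  if (refHost && refHost !== hostOf(dl)) {
    return { source: refHost, medium: 'referral' };
  }
  return { source: '(direct)', medium: '(none)' };
}

var landing = attribute(
  'http://www.ourdomain.com?utm_source=test_source&utm_medium=test_medium&utm_campaign=test_campaign',
  'http://www.domainlinkingtous.com');
var secondPage = attribute(
  'http://www.ourdomain.com/secondPage.html',
  'http://www.domainlinkingtous.com');
```

    With these inputs, landing resolves to the campaign values while secondPage falls through to the referrer, which is exactly the unwanted second visit.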

    The best way to avoid this (even if it’s not recommended) is to set a fixed tracker name in our tags.

    ajax_sites-01

    This way, the same tracker instance will be used every time and the original dl parameter will be kept.

    2. Force the Non-Interaction Flag to false in your pageviews

    This is another important point to keep in mind. Google Tag Manager uses the set command to set values on the tracker instead of setting them directly on the hit. So if we fire a non-interaction event tag, the subsequent hits will inherit that flag, and the subsequent pageviews will have the “ni” parameter attached to them.

    Let’s see how Google Tag Manager translates a single Event tag:

    ga("create", "UA-40180485-1", {"cookieDomain": "auto","name": "myTracker"});
    ga("myTracker.set", "&gtm", "GTM-MW3DNR");
    ga("myTracker.set", "nonInteraction", true);
    ga("myTracker.send", {"eventAction": "test","eventCategory": "test", "eventLabel": "test","eventValue": undefined,"hitType": "event"});

    As you can see, nonInteraction is set globally for the tracker, and this will affect our next pageviews (remember that we don’t have any page reload, so the nonInteraction flag will be kept until we disable it).

    An easy fix for this, without needing to deal with hitCallback/eventCallback, is to automatically remove the “ni” parameter from all pageviews, as we’re not likely to need it (ever used a non-interaction pageview?).

    Let’s see how we can achieve this. We can’t type undefined into GTM input boxes, as it would be treated as a string, so we’re going to create a new Variable that returns the real undefined value.
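
    Assuming this is done with a Custom JavaScript variable, its body can be as small as the function below. In GTM the function would be anonymous; it is given a name here (returnUndefined, a made-up one) only so the snippet can run on its own.

```javascript
// Returns the real undefined value, not the "undefined" string
var returnUndefined = function () {
  return undefined;
};
```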

    ajax_sites-02


    Now we can safely force the “ni” parameter to undefined in our pageview tags this way:

    ajax_sites-03

    Did you face any other problems when tracking Ajax-based sites? Share them in a comment 🙂

    Thanks go once again to Yehoshua Coren from Analytics Ninja for reviewing these points with me.

  • Pushing custom commands to Universal Analytics before Pageview tag execution

    Since I wrote the post about how to send a local copy of Universal Analytics hits, I’ve been looking for a way to get this working within Google Tag Manager. But there’s no way to run anything between the tracker creation and the hit firing in GTM.

    Some time ago there was a discussion thread on Twitter about this problem:

    And even Simo Ahava wrote a post about it afterwards: http://www.simoahava.com/analytics/access-the-tracker-object-in-your-page-view-tag/

    Today I had some free time after lunch and decided to get this working my own way. I’m going to show a way to achieve this in a pretty standard way, with multi-tracker support, using just one single HTML tag (affecting all hits sent by GTM) and without needing to initialize the tracker within a Custom HTML tag, so you can continue using the predefined Universal Analytics tags in Google Tag Manager. You may remember that I wrote about how the Universal Analytics snippet works some time ago, and we’re going to use what we learned there in this post.
    Universal Analytics creates a global ‘ga’ object, and then a queue (the ‘q’ property) to be processed later on (when analytics.js has been fully loaded), this way:

    window['ga']=window['ga']||function(){
      (window['ga'].q=window['ga'].q||[]).push(arguments);
    }
    

    Google Tag Manager does nothing more than use the analytics.js API to track things, so we’re going to hook our code in right after the create command has been pushed to the queue.
    The following tag will allow us to set the current tracker’s clientId as custom dimension 10:

    window['ga']=window['ga']||function(){
      (window['ga'].q=window['ga'].q||[]).push(arguments);
      if(arguments[0]=="create")
      {
        // Code that is going to be added to the queue after the create command
        ga(function() {
          // The tracker that has just been created is the last one in the list
          var trackers = ga.getAll();
          var tracker = trackers[trackers.length-1];
          tracker.set('dimension10', tracker.get('clientId'));
        });
      }
    };
    

    Yeah, that’s all, as easy as that. We won’t need to deal with dataLayer pushes or wait for something to be executed, and best of all, it will work for any Universal Analytics tag fired from GTM.

    There’s just one more thing we need to do: set the firing priority for this tag to a high value (100 or so) and its firing trigger to “All Pages”, so it gets executed as soon as possible and before any other tag that Google Tag Manager fires at “All Pages” time.

    tracker_create_hook_1

    Let’s take a look at how the queue would look without our code, and how it ends up after we hook our code into it:

    tracker_create_hook_3


    When analytics.js starts to process the current commands queue, it will process the new pushes we added right after the tracker has been created and before the hit is sent.
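
    The resulting order can also be checked with a tiny simulation that runs without analytics.js. This is only a sketch: the win object is a stand-in for window, and the hooked callback is left empty because there is no real tracker to work with here.

```javascript
var win = {}; // stand-in for the browser's window object

// Same stub as above, plus the hook on the "create" command
win.ga = win.ga || function () {
  (win.ga.q = win.ga.q || []).push(arguments);
  if (arguments[0] === 'create') {
    // Queued right after "create", so it runs before the hit is sent
    win.ga(function hooked() { /* tracker.set(...) would go here */ });
  }
};

// What the Universal Analytics tag would push
win.ga('create', 'UA-XXXXXX-Y', 'auto');
win.ga('send', 'pageview');

var order = win.ga.q.map(function (args) {
  return typeof args[0] === 'function' ? '(hooked callback)' : args[0];
});
console.log(order.join(' -> ')); // create -> (hooked callback) -> send
```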

    This doesn’t just allow us to set custom dimensions in GA: we could use it to work with Universal Analytics Tasks, to grab our testing values before the pageview has been sent, or to do anything else you ever wanted to do before the first hit was fired and were never able to control due to tags being loaded asynchronously by Google Tag Manager.
    To finish, let’s take a look at how our tag would look if we wanted to send a local copy of Universal Analytics hits:

    tracker_create_hook_4

    Find the code below:

    Thanks go to @AnalyticsNinja for helping me to test it.

  • JWPlayer video tracking using Google Tag Manager

    JWPlayer is one of the best-known video players (if not the best-known), and today we’re going to learn how to track it using GTM (Google Tag Manager). This time we’re going to use tool-agnostic dataLayer pushes instead of the ones we usually use for Google Analytics.

    We’ll be using one tag, one trigger and one variable. The following list summarizes all the events our code will be able to track:

    • Video Plays
    • Video Pauses
    • Video Completions
    • Video Progress (Percentages to be tracked can be easily defined in the first line of the code)
    • Video Mute and UnMute
    • Video FullScreen On and FullScreen Off
    • Video Errors

    As we said before, we’re going to use tool-agnostic dataLayer pushes instead of focusing on the almost standard event tracking model from Google Analytics. The following object will be our video tracking data model:

    {
        "event": "video",
        "player_id": {{VIDEO_ID}},
        "interaction": {{VIDEO_ACTION}},
        "video_url": {{VIDEO_URL}},
        "duration": {{VIDEO_DURATION}},
        "width": {{PLAYER_WIDTH}},
        "height": {{PLAYER_HEIGHT}},
        "position": {{VIDEO_CURRENT_POSITION}},
        "resolutions": {{AVAILABLE_VIDEO_QUALITIES}}, // an array
        "volume": {{PLAYER_VOLUME}},
        "player_type": {{PLAYER_TYPE}} // html5 or flash
    }
    

    The most important data is the event, the interaction and the video_url (as in most video tracking scripts), but this time we’ll be adding some extra fields to the dataLayer pushes so we can track more detailed info if we want to.
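
    As an example of how pushes following this model could be produced, here is a sketch of the progress part only. It is not the tag code of this post: createProgressTracker is a made-up helper, the “progress 25%” interaction label is an assumption, and the returned function is meant to be called from whatever time-update callback the player offers.

```javascript
// percentages: milestones to track; push: function receiving each payload
function createProgressTracker(percentages, push) {
  var fired = {};
  // Call this on every playback time update
  return function onTime(position, duration, videoUrl) {
    if (!duration) return;
    var current = (position / duration) * 100;
    for (var i = 0; i < percentages.length; i++) {
      var p = percentages[i];
      if (current >= p && !fired[p]) {
        fired[p] = true; // push each milestone only once
        push({
          event: 'video',
          interaction: 'progress ' + p + '%',
          video_url: videoUrl,
          duration: duration,
          position: position
        });
      }
    }
  };
}

// Usage: collect the pushes in an array standing in for the dataLayer
var pushes = [];
var onTime = createProgressTracker([25, 50, 75], function (payload) {
  pushes.push(payload);
});
```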

    Let’s start creating the configuration we need in Google Tag Manager. The first thing we’re going to need is a variable that tells us whether there’s any JWPlayer available in the current page. We don’t want our tracking code to be injected in the page if we don’t have any video on it, do we?
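
    One simple way to write such a check is sketched below. It only tests whether the global jwplayer function has been loaded, which is an assumption about how the variable is built, and hasJwplayer is a made-up name; the window object is passed in as a parameter.

```javascript
// Hypothetical check: true when the jwplayer API is loaded on the page
function hasJwplayer(win) {
  return typeof win.jwplayer === 'function';
}

// In a Custom JavaScript variable: return hasJwplayer(window);
```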

    jwplayer_tracking_variable

    Now it’s time to configure the trigger that is going to fire our tracking tag, so we’ll be using the previously created variable, and a second condition to trigger the tracking tag after the DOMReady event:

    jwplayer_tracking_trigger

    Let’s configure the tracking tag. After the screenshot you’ll find the full code so you can copy & paste it:

    jwplayer_tracking_tag

    Tag Code

    Now all the video interaction info will be pushed to the dataLayer, so you’ll have it all available to send your events to Google Analytics or any other tool you like.
    Any suggestions, new tracking ideas or feedback are welcome; just leave a comment.

  • Yet another way/try to stop referrals/events spam on Google Analytics

    The referral spam problem in Google Analytics is turning into the new “not provided”. Almost everyone I follow has posted/retweeted/said something about it in the past month. I didn’t want to miss this opportunity to bring back some of the scripts I wrote for myself over the past year and share them with everyone. So in this post I’m going to try to address the spam problem from another perspective, one that is not based on referral lists and filters.

    As almost all of you may know, this problem comes from the Google Analytics Measurement Protocol not using any authentication mechanism. And since websites are tracked client-side, it’s almost impossible to hide any protection mechanism unless we force users to be logged in to some service, which is not an option, as we need a way to track anonymous users. They’re using the Measurement Protocol to spam us? Ok, so let’s play that game too and use the Measurement Protocol to protect our data from being spammed =)

    Some years ago I wrote some PHP scripts to track the ecommerce transactions server-side to avoid any client-side problem like: users blocking GA, JavaScript errors forcing ga to fail, users network problems, etc.

    So I updated that code to work with Universal Analytics to achieve the following goals:

    • Our real UA number will be totally hidden from others, i.e. there will be no way for anyone to guess our real property number.
    • We’ll attach a string to our hostname parameter (&dh), and we’ll use it to filter our new view, to protect against spammers looping through all UA-XXXXX-Y numbers to send hits.
    • As we’re going to proxy all our GA hits through our server-side script, other people could load that PHP file directly, so we’re using PHP sessions to prevent anyone from querying that file if there isn’t an open session for that user.

    For Google Analytics Classic or the first Measurement Protocol releases, tracking every hit server-side was a problem because we weren’t able to send the real users’ IP addresses (losing all geolocation data) or their user agents (although this was possible by forcing the request’s User-Agent header). But for some months now this is no longer a problem, as we can send those values as parameters within our hits’ payload.

    The first thing we need to do is to tell Google Analytics to send a copy of the hits to our own PHP tracking file (the one that is going to take care of proxying the hits to the real UA). This can be achieved using “Tasks”, so our Universal Analytics code will look similar to:

    ga('create', 'UA-XXXXXX-Y', 'auto');
    ga(function(tracker) {
      // Keep a reference to the original task that sends the hit
      var originalSendHitTask = tracker.get('sendHitTask');
      tracker.set('sendHitTask', function(model) {
        var payLoad = model.get('hitPayload');
        // Send the hit as usual
        originalSendHitTask(model);
        // Send a copy of the same payload to our own tracking file
        var i = new Image(1,1);
        i.src = "/collect.php" + "?" + payLoad;
        i.onload = function() { return; };
      });
    });
    ga('send', 'pageview');
    

    If you are interested in knowing how this piece of code works, you can read this other post (it’s in Spanish, sorry…)

    Now we’ll need to install our server-side tracking code. It has just two files: gaproxy.class.php and collect.php. Just take care to upload the collect.php file to an accessible path on your site and then make the code above match the right path for the file.

    You can grab the files from the following GitHub Repo: https://github.com/thyngster/ga-proxy

    So now that we have everything in place, let’s configure it. First, set your own variables in the class file; it should be pretty straightforward:

        // Configuration Start
        // Set your real Property Number where you want to redirect the data
        private $property_id = 'UA-AAAAAAA-B';
        // This will be attached to the hostname value, so we can then filter any hit not coming from this script
        private $filterHash = 'dontspamme';
        // set this to true, if you want to remove the last Ip's Octet
        private $anonymizeIp = true;
        // Configuration End
    

    Now we’ll need to add this code to all our pages (in the header), so it creates the session that will be used to prevent collect.php from being called directly.

    <?php
    include_once('gaproxy.class.php');
    $ga = new GaProxy();
    $ga->setupProxy();
    ?>

    We’re finished! Now, if collect.php is called directly without any previously loaded page, or by a script that doesn’t accept and keep cookies, the session will not be active and our file won’t forward those requests to the Google Analytics endpoint.

    Let’s recap what we did:

    1. We told analytics.js to forward a copy of the hits to our local collect.php.
    2. We added 3 lines of PHP to our pages, so a session is started and a tracking token is set.
    3. We configured the gaproxy.class.php file with our real UA number, plus a little hash to be able to filter the real hits in our property.

    Now we’ll need to configure 2 filters in our views:

    1. One excluding all hits where the field hostname does not include our filterHash.
    2. One replacing our filterHash by an empty string.
    spamfilter_1
    spamfilter_2
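
    The combined effect of both filters can be pictured with a few lines of code. This is only an illustration of the logic: the real filters are configured in the Google Analytics view settings, applyViewFilters is a made-up function, and the sample hostname (with the hash appended at the end) is an assumption.

```javascript
var filterHash = 'dontspamme'; // same value as $filterHash in the class file

function applyViewFilters(hit) {
  // Filter 1: drop hits whose hostname does not carry our hash
  if (!hit.hostname || hit.hostname.indexOf(filterHash) === -1) return null;
  // Filter 2: remove the hash so reports show the clean hostname
  return { hostname: hit.hostname.replace(filterHash, ''), page: hit.page };
}

var realHit = applyViewFilters({ hostname: 'www.ourdomain.comdontspamme', page: '/' });
var spamHit = applyViewFilters({ hostname: 'spammy-referrer.example', page: '/' });
```

    Here realHit keeps a clean www.ourdomain.com hostname, while spamHit is null because it never went through our proxy.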

    As you may see in the code, there’s a reserved function that will be used to check the requesting IPs against blacklists, maybe using some throttling mechanism, checking the referrers against a blacklist that can be automatically updated, etc. This is still in progress, as I need to think about the best way to implement these features, so any suggestions are really welcome. I’m planning to port it to other languages such as Python or Ruby too, but I’d like to have a more polished PHP version before that.

    I know this workaround will not be accessible to everyone, and it may take some extra effort to get it running compared with using a plain filter, but it has more benefits and you won’t need to keep your filters updated for each new spammer that starts to mess around with your account.