The hidden world of Internet data collection and tracking is indeed wild and woolly.
Mezzobit has observed hundreds of thousands of different tags across tens of billions of transactions. We’ve catalogued more than 200 possible attributes and actions for each tag. The average website has 30 third-party tags, so for a site of 100 million monthly pageviews, that’s 3 billion individual tag firings.
How do website operators find actionable intelligence in all of this to create positive business results? Unless you spend your waking hours thinking about this world — like we do — it’s nearly impossible. (And even then, it’s not a walk in the park.)
That is why Mezzobit created our tag indices, which distill all of this information into a series of easy-to-understand scores that range from 0 to 100. These are used in our Audience Control Module, which monitors tag activity in every web browser transaction and also scans external websites.
How do indices work?
We currently have six indices, each answering a different question:
- Data sharing: What types of data does the tag collect and transmit and what types of recipients are there?
- User tracking: What sorts of tracking activities does the tag engage in, ranging from cookies to local storage to browser fingerprinting?
- Security: Does the tag handle data in a secure fashion, using HTTPS and payload encryption?
- Tag chaining: How much other third-party technology does the tag call into the page?
- UI changes: Does the tag change the user interface and how extensive are the changes?
- Composite: What’s the total score of the key indices?
Each index is computed by a complex equation that takes in raw data about tag operation and then weights it according to business impact. For instance, the UI changes index looks at all visual calls in the tag, such as images, iframes, CSS, and UI-related HTML. The weighting reflects how significantly each call affects the UI: a call to display a 1x1 pixel would receive a lower score than a large image. A document.write call may be weighted higher than either, depending on its context. Index calculations could be thought of as Google PageRank for tags.
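To make the weighting idea concrete, here is a minimal sketch of a raw UI changes score. The call categories, the weight values, and the function name are all hypothetical, chosen only to mirror the example above (a 1x1 pixel counts for less than a large image, and document.write may count for more than either). It is not the actual equation.

```python
# Hypothetical weights per visual call type; illustrative values only.
UI_CALL_WEIGHTS = {
    "pixel_1x1": 1,
    "css": 5,
    "large_image": 10,
    "iframe": 15,
    "document_write": 20,
}


def raw_ui_score(call_counts, weights=UI_CALL_WEIGHTS):
    """Sum weight * count over every visual call the tag makes.

    Call types missing from `weights` contribute nothing.
    """
    return sum(weights.get(kind, 0) * count for kind, count in call_counts.items())


# An analytics tag that drops a single clear pixel...
analytics_tag = {"pixel_1x1": 1}
# ...versus an ad tag that loads two large images inside an iframe.
ad_tag = {"large_image": 2, "iframe": 1}

print(raw_ui_score(analytics_tag), raw_ui_score(ad_tag))
```

A real calculation would presumably also account for context, such as where in the page a document.write call lands, which this sketch ignores.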
Once a raw score for each tag is developed for each index, we then compare it against the thousands of others in our database to develop a rank score from 0 to 100. A score of 56 means that the tag’s actions are more extreme than 56% of its peers. A score of 0 means that the tag doesn’t perform any actions related to the index and a score of 100 means that the tag is the most extreme example that we have seen (or tied for most extreme). For the security index, a higher score indicates that the tag engages in more insecure transactions than lower-ranked tags. The composite score adds up the raw scores from all of the indices, with each index weighted equally, and then calculates a rank score based on that total.
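The ranking step described above can be sketched as a simple percentile rank. This is one possible reading, under stated assumptions: a raw score of 0 maps to 0, the highest raw score (or a tie for it) maps to 100, and everything else gets the percentage of tags with a strictly lower raw score, rounded down. The function names are hypothetical.

```python
def rank_score(raw, population):
    """Return a 0-100 rank for `raw` against a non-empty list of raw scores.

    `population` is assumed to hold the raw scores of every tag in the
    database for one index, including the tag being ranked.
    """
    if raw <= 0:
        return 0  # the tag performs no actions related to this index
    if raw >= max(population):
        return 100  # most extreme example seen, or tied for it
    below = sum(1 for other in population if other < raw)
    return 100 * below // len(population)


def composite_raw(raw_by_index):
    """Add up the raw scores of the individual indices, weighted equally."""
    return sum(raw_by_index.values())


population = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(rank_score(10, population))  # 5 of 10 tags score lower
```

The composite works the same way: compute `composite_raw` for every tag, then pass those totals through `rank_score`.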
High scores aren’t necessarily bad, nor are low scores necessarily good. Different tags must perform certain actions based on their function on the page. Ad tags and video players are expected to make large changes to the UI, but an analytics tag doing anything beyond using a 1x1 clear pixel would be a potential warning sign.
Each index value for a tag is also assigned a color to make visual recognition easier: blue for scores of 0-49, orange for 50-74, and red for 75-100.
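The color bands translate directly into a small lookup. The function name here is hypothetical; the ranges are the ones listed above.

```python
def index_color(score):
    """Map a 0-100 index score to its display color."""
    if not 0 <= score <= 100:
        raise ValueError("index scores range from 0 to 100")
    if score < 50:
        return "blue"  # 0-49
    if score < 75:
        return "orange"  # 50-74
    return "red"  # 75-100
```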
The indices are calculated for a given tag across the entire Mezzobit network and not for a specific site. We periodically recalculate the indices based on the volatility of the underlying technology.
How are indices used?
The indices are reported within Mezzobit’s Audience Control Module (ACM) in two locations: the table of the tag network view and the details tab for each tag. The color for the composite score is used for the tag diagram node. And soon, there will be a dashboard screen that shows index values averaged across the entire site.
Seeing indices in various reports (and being able to export values) informs customers about their business partners' activities and helps Mezzobit identify larger patterns of suspicious activity. Oftentimes, a glance at a diagram or table can trigger deeper investigation.
More importantly, index values can be used in tag control rules. These rules can shut down the operations of entire tags or components when certain conditions are met, or trigger less severe actions, such as logging the violation or sending a notification. Users can set up rules on a variety of conditions, but as indices roll up many tag attributes into a single number, they make setting up firewall rules much easier.
For instance, there may be a tag on a website that is not expected to bring in any third-party technology. A rule could be set up to trigger on a low tag chaining threshold, with the actual value calculated by looking at other tags on the site. Or there may be tags with no expected visual components that could be screened by a rule with a low UI change index.
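A threshold rule of this kind might look like the sketch below. The rule structure, field names, and action names are hypothetical and do not reflect the actual ACM rule syntax. The thresholds are placeholders that would in practice be chosen by looking at other tags on the site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRule:
    index: str      # which index the rule watches, e.g. "tag_chaining"
    threshold: int  # rule fires when the tag's score exceeds this value
    action: str     # e.g. "log", "notify", or "block"


def triggered_actions(tag_scores, rules):
    """Return the actions of every rule whose threshold the tag exceeds."""
    return [
        rule.action
        for rule in rules
        if tag_scores.get(rule.index, 0) > rule.threshold
    ]


rules = [
    IndexRule(index="tag_chaining", threshold=10, action="block"),
    IndexRule(index="ui_changes", threshold=5, action="log"),
]

# A tag expected to bring in no third-party technology, but which does.
suspect_tag = {"tag_chaining": 40, "ui_changes": 0}
print(triggered_actions(suspect_tag, rules))
```

Because each rule looks at a single number, the same two rules cover what would otherwise take many attribute-level conditions.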
What’s coming up
We are actively increasing both the depth and variety of indices, as they form an important component of translating Mezzobit data into actionable business insights.
Depth is how many data elements are used to calculate the index. As we continue to peel back the layers of technology and apply machine learning to understand what’s going on, we will include these elements in our calculations to make the indices more sensitive. The initial cut of PageRank only had a few dozen elements, whereas Google now uses thousands of signals to prioritize search results. Ditto for us as Mezzobit’s database grows larger.
As customers tell us more about which tag actions worry them, we’ll also change the weighting on the indices to better reflect this.
As for index variety, we’d like to take any data-related problem that customers may experience with tags and turn it into an index to facilitate action. Since we’re gathering a treasure trove of data related to tags, there may be concerns that publishers have that differ from those of e-commerce sites or companies in regulated industries. We have a lot of exciting projects in the works and will share them as they are launched. We already have a browser fingerprinting score that factors into the user tracking index that was recently released.
We have also spoken to customers about developing indices that better instrument their operational or compliance concerns. While the ideal solution would be a customer index builder, we feel that this requires too much knowledge of underlying tag operations to be effectively used. If you have an idea for an index, shoot us a note and we may include it in a future release.
Finally, we want to develop more detailed reporting capabilities that would include index values, both grouped and filtered by other attributes (“Show me index values for all ad tags coming in on programmatic transactions”) as well as time series reporting (“Show me a plot of average site data sharing index values by day over the past month”). In the interim, we can make this data available via custom export for customer analysis.
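For customers working from an export, the grouped and time series views described above could be approximated with a few lines of analysis code. The sketch below assumes a hypothetical export shape (one row per tag per day, with `day`, `tag_type`, and one field per index). The actual export format may differ, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean


def daily_average(rows, index, tag_type=None):
    """Average one index per day, optionally filtered to a single tag type."""
    by_day = defaultdict(list)
    for row in rows:
        if tag_type is not None and row["tag_type"] != tag_type:
            continue
        by_day[row["day"]].append(row[index])
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Made-up sample rows standing in for an exported file.
rows = [
    {"day": "day-1", "tag_type": "ad", "data_sharing": 60},
    {"day": "day-1", "tag_type": "analytics", "data_sharing": 20},
    {"day": "day-2", "tag_type": "ad", "data_sharing": 80},
]

print(daily_average(rows, "data_sharing"))
print(daily_average(rows, "data_sharing", tag_type="ad"))
```

The resulting day-to-average mapping can be handed to any plotting tool to get the time series view.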