Mobile Attribution in the Privacy-First Era

01010011 @01010011@hackers.pub

Overview

The paradigm of mobile attribution has changed significantly as privacy protection has come to the forefront. In the past, advertising identifiers such as IDFA and GAID made precise measurement possible. Now that user privacy is prioritized, deterministic user identification is no longer possible without explicit user consent.

In this article, we explain the mobile privacy frameworks represented by Apple's SKAdNetwork (SKAN) and Google's Privacy Sandbox, and explore how mobile attribution is now acquired probabilistically.

How Did You Hear About Us?

Detroit car salesman Joe Girard sold 13,001 cars over 15 years, from 1963 to 1978, earning him a place in the Guinness Book of Records as "the world's greatest salesman."

The secret to his success wasn't his car-selling technique, but rather his 'system' for continuously creating customers. He called people who would introduce potential customers to him "Bird Dogs". Barbers, restaurant owners, bankers - anyone around him could become his 'Bird Dog'.
His rule was simple: "Send customers my way. If that customer buys a car, I'll immediately send you $25."
For this system to work perfectly, there was one crucial prerequisite: knowing "who referred this customer?" with absolute precision.
When a new customer arrived, Joe Girard would first ask, "Who sent you to me?" And when a sale was completed, he meticulously recorded it in his ledger and made sure to send the promised $25 to the referrer.

Just as Joe Girard kept track of his referrals, the mobile app ecosystem needs to trace which advertising or marketing activity led to a customer acquisition or payment, and to assign credit to that source. This is called Mobile Attribution.

Deterministic Attribution

Measuring attribution accurately is extremely important. You need to know whether a customer came through a barber's referral or a banker's referral to determine how much advertising fee to pay to whom.
To answer the question 'Who referred this customer?', mobile platforms provided Ad Networks with the following information:

  • IDFA (Identifier for Advertisers): An advertising identifier provided by Apple for iOS devices that users can reset
  • GAID (Google Advertising ID): An advertising identifier provided by Google for Android devices with Google Play services installed

Both IDFA and GAID are unique values that can precisely identify a user's device. Therefore, knowing the IDFA and GAID allows for accurate Mobile Attribution.

  1. User clicks on an advertisement
  2. Ad Network captures and stores the device's IDFA/GAID
  3. User installs the app and runs it for the first time
  4. Ad Network matches the IDFA/GAID reported at first launch against the stored click, so the user's acquisition path is attributed with effectively 100% accuracy
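
The four steps above boil down to an exact-match lookup keyed on the advertising identifier. Below is a minimal sketch, assuming a last-click model and a 7-day attribution window. The `Click` and `ClickStore` names, the in-memory dictionary, and the window length are all hypothetical choices made for illustration, not any specific Ad Network's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Click:
    ad_id: str        # advertising identifier (IDFA or GAID) captured at click time
    campaign: str     # campaign that served the ad
    timestamp: float  # click time, in seconds since epoch


class ClickStore:
    """Hypothetical in-memory store an Ad Network might keep (step 2)."""

    def __init__(self) -> None:
        self._last_click: Dict[str, Click] = {}

    def record_click(self, click: Click) -> None:
        # Keep only the most recent click per device (last-click model).
        previous = self._last_click.get(click.ad_id)
        if previous is None or click.timestamp >= previous.timestamp:
            self._last_click[click.ad_id] = click

    def attribute_install(self, ad_id: str, installed_at: float,
                          window_seconds: float = 7 * 24 * 3600) -> Optional[str]:
        # Step 4: exact lookup on the identifier reported at first launch.
        click = self._last_click.get(ad_id)
        if click is None:
            return None  # no matching click, so treat the install as organic
        if not (0 <= installed_at - click.timestamp <= window_seconds):
            return None  # click falls outside the assumed attribution window
        return click.campaign
```

Because the key is a device-unique identifier, there is nothing to guess: either a stored click exists for that device or it does not.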

With this deterministic method, advertisers could obtain clear data on which advertisements brought in which users, and this was the foundation on which the mobile advertising ecosystem grew. But now the free lunch is over. As mobile platforms have taken a stronger stance on privacy, deterministic attribution is impossible without user consent. And, much like you reading this article, most users no longer agree to hand over information that can specifically identify them.

Probabilistic Attribution

Apple's ATT (AppTrackingTransparency) and SKAN (SKAdNetwork) Framework

As mentioned above, Apple no longer provides IDFA without user consent (more precisely, without "explicit" user consent).

Therefore, since iOS 14.5, explicit user permission must be requested through the ATT (AppTrackingTransparency) framework in order to obtain a device's IDFA. If the user refuses, the IDFA is returned as all zeros (00000000-0000-0000-0000-000000000000), as shown in the screenshot above.
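
On the receiving side, this means an all-zero IDFA has to be treated as "no identifier" rather than as a real device. Here is a small sketch of that check. The `usable_idfa` helper is a hypothetical name, and it assumes the identifier arrives at the server as a UUID string.

```python
import uuid
from typing import Optional

ZEROED_IDFA = uuid.UUID(int=0)  # 00000000-0000-0000-0000-000000000000


def usable_idfa(raw: Optional[str]) -> Optional[uuid.UUID]:
    """Return the IDFA as a UUID, or None when it cannot be used for matching."""
    if not raw:
        return None  # nothing was sent
    try:
        parsed = uuid.UUID(raw)
    except ValueError:
        return None  # malformed value
    if parsed == ZEROED_IDFA:
        return None  # all zeros: the identifier was withheld
    return parsed
```

Without this guard, every device that declined tracking would collapse into one and the same "user".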

This is a critical problem for both advertisers and Ad Networks. Since they cannot clearly know who was exposed to their advertisements, they cannot identify which path customers came through, and the fundamental premise of the advertising business ("How did you hear about us?") collapses.

As an alternative, Apple provides SKAN (SKAdNetwork), an attribution measurement framework that returns only limited information that cannot identify individuals.

The data flow of SKAN is as follows:

  1. Ad Network: Publishes advertisements
  2. Ad Network: Registers URL for collecting limited Attribution (Postback)
  3. User: Clicks on ad -> Installs app
  4. Apple: Delays for a certain period (up to 144 hours) to make it difficult for advertisers to track individuals
  5. Ad Network: After a certain time passes, receives campaign ID and limited user information (Conversion Value) via Postback (without IDFA)
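
To make step 5 concrete, here is a rough sketch of what an Ad Network could do with the Postbacks it receives: count installs per campaign and per Conversion Value. The key names (`campaign-id`, `conversion-value`, `transaction-id`) follow Apple's documented postback JSON for SKAN versions up to 3 as we understand it. The `tally_postbacks` function is hypothetical. A real receiver should also verify the postback's signature, which this sketch skips.

```python
import json
from collections import Counter, defaultdict
from typing import Dict, Iterable


def tally_postbacks(raw_postbacks: Iterable[str]) -> Dict[int, Counter]:
    """Group installs by campaign ID and count each conversion value."""
    per_campaign: Dict[int, Counter] = defaultdict(Counter)
    seen_transactions = set()
    for raw in raw_postbacks:
        body = json.loads(raw)
        tx = body.get("transaction-id")
        if tx is not None:
            if tx in seen_transactions:
                continue  # the same postback was delivered twice
            seen_transactions.add(tx)
        # The conversion value can be absent; such installs are counted under None.
        per_campaign[body["campaign-id"]][body.get("conversion-value")] += 1
    return dict(per_campaign)
```

Note that nothing in the result points to a person or a device. The finest unit the Ad Network can reason about is "campaign X produced N installs with Conversion Value Y".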

Here, the only means of obtaining user information is the CV (Conversion Value). The CV is a 6-bit value, an integer from 0 to 63, so advertisers can map up to 64 different meanings onto it to capture user behavior after installation. For example, CV=1 might mean tutorial completion and CV=2 might mean the first in-app purchase. The mapping is defined in advance, and the value is read from the Postback to analyze user behavior. Six bits carry very little information, so there is no way to identify a specific user from the CV alone.
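
Besides a simple "one number, one meaning" table, the 6 bits can also be split into fields. The sketch below packs three event flags into the low 3 bits and a revenue bucket into the high 3 bits. This layout, the event names, and the bucket range are purely hypothetical. Each advertiser decides its own mapping.

```python
EVENT_BITS = {"tutorial_done": 0, "registered": 1, "first_purchase": 2}  # bits 0-2
REVENUE_SHIFT = 3  # bits 3-5 hold a revenue bucket from 0 to 7


def encode_cv(events, revenue_bucket):
    """Pack event flags and a revenue bucket into one 6-bit conversion value."""
    if not 0 <= revenue_bucket <= 7:
        raise ValueError("revenue bucket must fit in 3 bits")
    value = revenue_bucket << REVENUE_SHIFT
    for name in events:
        value |= 1 << EVENT_BITS[name]
    return value  # always within 0..63


def decode_cv(value):
    """Recover the event flags and revenue bucket from a conversion value."""
    if not 0 <= value <= 63:
        raise ValueError("conversion value must be between 0 and 63")
    events = {name for name, bit in EVENT_BITS.items() if value & (1 << bit)}
    return events, value >> REVENUE_SHIFT
```

For instance, `encode_cv({"tutorial_done", "first_purchase"}, 3)` yields 29, and `decode_cv(29)` returns the same two events with revenue bucket 3. Whatever the layout, the budget is 64 states in total.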

In essence, Apple positions itself as both the referee and sole processor of data measurement. Advertisers and Ad Networks must passively receive and interpret the final results provided by Apple within the strict rules Apple has established.

Android's Privacy Sandbox and Attribution Reporting API

In contrast to Apple's approach of controlling all Attribution acquisition paths and only delivering results, Google provides building blocks that allow advertising ecosystem participants to build their own privacy protection solutions based on privacy-preserving technologies. The core building block is the Privacy Sandbox.

The Privacy Sandbox has three key objectives:

  1. Building new privacy-preserving technologies to replace existing tracking mechanisms.
  2. Supporting publishers and developers to continue providing free online content without invasive tracking.
  3. Collaborating with the industry to establish new internet privacy standards.

In summary, it aims to create industry standards that protect personal information while allowing publishers and developers to sustain ad-based businesses.

The most significant technical distinction between Google's Privacy Sandbox and traditional attribution methods is that the match between a conversion and the Ad Network's ad information is made on the user's device.

Because Attribution is acquired within the device, advertising businesses can obtain meaningful user conversion information without extracting personal information outside the device.
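
The idea of on-device matching can be pictured with a toy model. In the sketch below, the device remembers which ad led to which app, and when a conversion happens it emits a report containing only an ad-side ID and a coarse conversion label. The class and method names are hypothetical and are not the Android API, and reducing the label with `% 8` is a simplification to illustrate how little data is allowed out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Report:
    source_event_id: int  # chosen by the Ad Network when the ad was shown
    trigger_data: int     # coarse conversion label, kept small on purpose


class OnDeviceAttribution:
    """Toy stand-in for the attribution store that lives on the device."""

    def __init__(self) -> None:
        # destination app -> (source_event_id, click time); never leaves the device
        self._sources: Dict[str, Tuple[int, float]] = {}

    def register_source(self, destination_app: str, source_event_id: int,
                        clicked_at: float) -> None:
        # Called when the user clicks an ad that promotes destination_app.
        self._sources[destination_app] = (source_event_id, clicked_at)

    def register_trigger(self, app: str, trigger_data: int,
                         converted_at: float) -> Optional[Report]:
        # Called when a conversion happens inside the app; matching is local.
        source = self._sources.get(app)
        if source is None or converted_at < source[1]:
            return None  # no earlier ad interaction, so nothing to report
        return Report(source_event_id=source[0], trigger_data=trigger_data % 8)
```

The report carries no device identifier and no user identifier. The match itself, which used to happen on the Ad Network's servers, stays on the device.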

This anonymized Attribution generated on the user's device is collected through the Attribution Reporting API (abbreviated as ARA).

There are two main types of reports collected through ARA:

  • Event-Level Reports: Limited but granular information such as "which ad led to a conversion?"
  • Summary Reports: Detailed conversion data such as "what is the total revenue and ROI of the campaign?" provided in encrypted and aggregated form

Event-Level Reports carry anonymized, per-event information. Because they are anonymized, they contain little detail: they link Attribution information to individual events such as clicks and views. These reports are suitable for measuring campaign reach or aggregating Attribution.

On the other hand, Summary Reports are statistical results of aggregated user data. While they don't contain individualized information, they provide in-depth reports such as conversion value, ROI, and campaign performance analysis by user segment.

This data is delivered to Ad Tech platforms (AppsFlyer, Meta, AppLovin, etc.) in encrypted form, as encrypted aggregatable reports. The platforms then submit these reports to the Aggregation Service, which runs in a cloud Trusted Execution Environment, to obtain the summarized results.
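
What the Aggregation Service returns can be approximated as "sum the contributions per bucket, then blur the totals". The sketch below assumes the reports are already decrypted (in reality that happens only inside the TEE) and uses hypothetical bucket names. The real service also adds noise to protect individuals. The Laplace noise and its scale here only illustrate that idea and do not reflect the service's actual mechanism or parameters.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def summarize(contributions: Iterable[Tuple[str, int]],
              noise_scale: float = 0.0,
              rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Sum per-bucket contributions and optionally add Laplace noise."""
    rng = rng or random.Random()
    totals: Dict[str, float] = defaultdict(float)
    for bucket, value in contributions:
        totals[bucket] += value
    if noise_scale > 0:
        for bucket in totals:
            # The difference of two exponential samples is Laplace-distributed.
            totals[bucket] += (rng.expovariate(1 / noise_scale)
                               - rng.expovariate(1 / noise_scale))
    return dict(totals)
```

With `noise_scale=0.0`, three contributions `("campaign_a", 5)`, `("campaign_a", 7)`, and `("campaign_b", 3)` sum to 12 and 3. With noise enabled, each total shifts slightly, so no single user's contribution can be read back out of it.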

Cloud Trusted Execution Environment (TEE)?
A TEE is an isolated execution environment that runs on the infrastructure of trusted cloud providers that meet the security standards proposed by Google. As long as those standards are met, Ad Tech platform companies can build and operate the Aggregation Service in it themselves.

Executive Summary

We've explored how Mobile Attribution is acquired in the era of privacy protection. The days of acquiring Attribution deterministically are over, and every stakeholder must prepare for the methods that replace it.

End of the deterministic era: Enhanced privacy protection has made 1:1 user tracking based on IDFA and GAID impossible without explicit user consent.

Transition to the probabilistic era: Instead of clear data, we now need to 'infer' performance based on limited data.

Apple's (SKAN) approach: A 'black box' method where Apple controls the entire process.

Google's (Sandbox) approach: Provides 'building blocks' that the advertising ecosystem can utilize, with 'on-device matching' as the core.

Changing role of Ad Tech: MMPs, advertising networks, and other Ad Tech companies must receive encrypted reports and build and operate an 'Aggregation Service' in a cloud security environment (TEE) to process data themselves.
