
Start identifying "capabilities" required to address proposed use cases #1

Open

philippp opened this issue May 13, 2022 · 7 comments

@philippp (Contributor)
For the sake of efficiency, it might be prudent to vote on the use cases document before we file individual "capability" issues / requests against it. That said, it would behoove us - especially folks working on defensive teams that have relevant use cases - to start mapping out what capabilities we require in order to address our use cases.

To that end, I invite folks to use this issue as a scratch pad, describing the capabilities their use cases require. We can then factor out the distinct capabilities (I assume there will be overlap) and file them as discrete issues for targeted discussion.

@dvorak42 dvorak42 transferred this issue from antifraudcg/proposals Jun 6, 2022
@jross1012

On behalf of Google Ad Traffic Quality

Abuse Vectors for Invalid Traffic (IVT) / Ad Fraud

An Invalid Traffic taxonomy provides one framework for classifying IVT abuse vectors. Below is a sample of abuse vectors for IVT / Ad Fraud that Google’s Ad Traffic Quality team has identified as impacting ads monetization on the web. IVT detection for these abuse vectors is dynamic because the space is highly adversarial, with bad actors continuously developing new methods to bypass detection.

IVT detection is commonly segmented across three areas of interest: 1) Non-Human Traffic, 2) Incentivized Human Traffic, and 3) Misrepresented or Manipulated Human Traffic. By looking at these three areas we can determine if relevant ad events should be considered IVT, or if they are organic (genuine human interactions resulting from genuine interest).

The list below highlights a subset of common categories of IVT. It should be noted that the list is not exhaustive of all abuse vectors, and that the vectors listed in each category are not necessarily mutually exclusive from each other.

1. Non-Human Traffic

Botnets

Typically based on malware that has infected a user’s device (computer, mobile device, or other system) without their consent, rendering the device a “bot” that, combined with other bots, comprises a “botnet.” Botnets drive automated traffic that tries to mimic human behavior, often by opening hidden browser windows on infected devices. Some malware emulates user clicks using random or predetermined click patterns.

Emulators / Virtual Machines

Non-human invalid traffic often comes from virtual devices, potentially including emulated mobile devices, virtual machines running in data centers, etc. While not all ad interactions from emulated or simulated devices are non-human, it is Google’s practice, as well as an industry standard, to deem this type of ad traffic as IVT when detected.

Scripted Attacks

IVT attacks that are not distributed via botnets, but run in an automated fashion, typically include:

  • Scripts to view / click ads that are running locally on a single or small number of machines
  • Scripts used by click farms (coordinated activity aimed at generating large amounts of seemingly human clicks)
  • Scripts running in data centers
  • Web scrapers (whether or not declared according to industry standards) aimed at collecting data for commercial or research purposes

2. Incentivized Human Traffic

Self-clicking Activity

Publishers may attempt to drive revenue by interacting with ads on their own websites or apps.

User without Genuine Intent / Use of Incentives

Users who engage with ads without genuine intent or interest (e.g., via browser extensions that provide financial rewards for clicks), and ad interactions where users are offered a direct or indirect monetary incentive (in the form of currency or an equivalent) for interacting with ads without disclosure to advertisers. This does not refer to rewarded traffic, where advertisers are aware that publishers offer users non-currency (or equivalent) rewards (redeemable only within the app/site/game) in exchange for interacting with ads.

3. Misrepresented or Manipulated Human Traffic

User Geolocation Misrepresentation

Bad actors may try to swap the reported country of their users to fetch higher ad revenues (e.g., swapping the IP address or country of emerging-market users for developed markets with higher average ad prices). Users may state they are in a country that does not match their actual location as a tactic to evade anti-abuse defenses, or to be perceived as less risky by associating with a country with lower abuse rates.

Publisher Inventory Misrepresentation

Bad actors may misrepresent the ad inventory they are monetizing, in an attempt to fetch higher revenue in ads auctions (e.g., a low quality site or app may claim to be a high-value, known name brand).

Clickjacking

The use of deceptive elements (e.g., buttons or short link redirections) or interfaces on a web page or app to trick users into clicking on an ad (that they didn’t expect to click on).

Hidden Ads

Ads that are impossible to see under any normal circumstances. They are ads tucked under iframes, hidden behind content, hidden behind other ads (aka “ad stacking”), inside invisible HTML containers, or ads that are displayed but too small to be seen (aka “pixel stuffing”).
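As a rough illustration of how such geometric checks could work, here is a minimal sketch in Python. The rectangle model, thresholds, and function names are all hypothetical and purely illustrative, not Google's actual detection logic:

```python
from dataclasses import dataclass

MIN_VISIBLE_PX = 10  # illustrative threshold; real systems tune this carefully

@dataclass
class AdSlot:
    x: int
    y: int
    width: int
    height: int
    z_index: int

def is_pixel_stuffed(ad: AdSlot) -> bool:
    """Flag ads rendered too small to be seen (e.g., 1x1 'pixel stuffing')."""
    return ad.width < MIN_VISIBLE_PX or ad.height < MIN_VISIBLE_PX

def overlap_area(a: AdSlot, b: AdSlot) -> int:
    """Area of the intersection of two ad rectangles, 0 if disjoint."""
    dx = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    dy = min(a.y + a.height, b.y + b.height) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0

def stacked_ads(slots: list[AdSlot]) -> list[tuple[AdSlot, AdSlot]]:
    """Flag pairs of ads that substantially overlap ('ad stacking'):
    the lower z-index ad in each pair is hidden behind the other."""
    flagged = []
    for i, a in enumerate(slots):
        for b in slots[i + 1:]:
            smaller = min(a.width * a.height, b.width * b.height)
            if smaller and overlap_area(a, b) / smaller > 0.9:
                flagged.append((a, b))
    return flagged
```

In practice, detection also has to account for ads hidden via CSS, iframes, and off-screen positioning, which pure rectangle geometry does not capture.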

Accidental Clicks

When users inadvertently click on an ad, even though they didn’t mean to do so. Publishers are not permitted to create interfaces that may lead users to accidentally click on ads. This includes implementing the ads in a way that they might be mistaken for other site content, such as a menu, navigation, or download links.

Appearing as Multiple Users from the Same Device / Browser

Users may try to disguise their activity by removing cookies or using other tactics to hide their high ads activity in order to appear as multiple users interacting with the same ads. IVT defenses should be able to determine when an actor is attempting to appear as multiple users from the same device/browser instance.

@jross1012

jross1012 commented Jun 27, 2022

On behalf of Google Ad Traffic Quality

Anti-Abuse Needs

Invalid Traffic (IVT) and Ad Fraud detection requires additional capabilities to protect ad systems against bad actors. As seen in the common use cases observed by Google’s Ad Traffic Quality team, there is a diverse set of abuse vectors. Many of these vectors are continuously emerging and evolving to evade detection. Below is a non-exhaustive list of capabilities to assist in IVT and Ad Fraud detection that enable defenses against these vectors. It is important to note that no single mechanism or tool is a comprehensive solution in defending against IVT, and that each mechanism is part of a layered defense strategy, as bad actors develop new tactics and techniques to generate IVT and commit Ad Fraud.

Authenticity of a User

Invalid traffic defenses should be able to determine the realness and human qualities of ad interactions.

  • Real and authentic mouse/keyboard/touchscreen inputs
  • "Non-human" low latency of reaction time
  • "Non-human" low or artificially high variability in time between actions/user-events
  • Authenticity of device, environment, and event are critical inputs to effective assessment of authentic interactions.
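To make the timing signals above concrete, here is a minimal heuristic sketch. The thresholds and the function are hypothetical illustrations; production IVT detection uses tuned, adversarially tested models rather than fixed cutoffs:

```python
from statistics import mean, stdev

# Illustrative thresholds (assumptions, not production values)
MIN_HUMAN_REACTION_S = 0.10   # sub-100ms reactions are rarely human
MIN_CV = 0.05                 # near-zero variability suggests scripted timing
MAX_CV = 5.0                  # extreme jitter can indicate randomized automation

def looks_scripted(event_times: list[float]) -> bool:
    """Heuristic check of inter-event timing for non-human patterns.

    event_times: monotonically increasing timestamps (seconds) of user events.
    """
    if len(event_times) < 3:
        return False  # not enough evidence either way
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]
    if min(gaps) < MIN_HUMAN_REACTION_S:
        return True  # "non-human" low reaction latency
    cv = stdev(gaps) / mean(gaps)  # coefficient of variation of gaps
    return cv < MIN_CV or cv > MAX_CV  # too regular, or artificially jittered
```

For example, a script clicking exactly once per second produces gaps with near-zero variance and is flagged, while human interaction timing is typically irregular within a plausible band.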

Organic Ads Traffic

Invalid traffic detection requires the ability to separate “normal” or “organic” ads interactions from invalid or non-organic interactions. Today this is accomplished in part by evaluating interaction signal anomalies, as well as conversion metrics.

Coordinated Attack Detection

The detection of threats that are generated by multiple actors working in a unified, synchronized, and coordinated manner. Such attacks include coordinated clicking from a ring of publishers who all agree to click on each other's ads (e.g., co-clicking, bad actor rings).
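One simple, illustrative signal for co-clicking rings is heavy overlap between the user populations clicking different publishers' ads. The sketch below is a hypothetical toy (real detection combines many signals, not a single Jaccard score):

```python
from itertools import combinations

OVERLAP_THRESHOLD = 0.5  # illustrative; tuned in practice

def suspicious_publisher_pairs(clicks: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Flag publisher pairs whose ad-clicking user populations overlap heavily.

    clicks maps publisher_id -> set of user_ids who clicked that publisher's ads.
    High Jaccard similarity between clicker sets is one (weak, illustrative)
    hint of a co-clicking ring.
    """
    flagged = []
    for a, b in combinations(sorted(clicks), 2):
        union = clicks[a] | clicks[b]
        if not union:
            continue
        jaccard = len(clicks[a] & clicks[b]) / len(union)
        if jaccard >= OVERLAP_THRESHOLD:
            flagged.append((a, b))
    return flagged
```

Graph-based extensions (e.g., finding dense subgraphs in a publisher-user bipartite graph) generalize this idea to larger rings.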

True Traffic Origin and Destination

  • Determine instances where the user's actual location does not align with the stated location, which can distort publisher metrics related to CPC and CPM. Users may disguise their location to be in a more trusted country to bypass anti-abuse defenses.
  • Determine instances where the page being visited claims to be another, so as to manipulate ad prices.
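A basic location-consistency check might compare a declared country against an IP-derived country and the client's reported timezone offset. The sketch below is purely illustrative: the offset table is a hypothetical stand-in for a maintained IP-geolocation/timezone database, and real systems use many more signals:

```python
# Hypothetical, simplified lookup of plausible UTC offsets (minutes) per
# country; a real system would use a maintained geolocation database.
UTC_OFFSETS_MIN = {
    "US": {-300, -360, -420, -480},   # simplified; ignores DST edge cases
    "IN": {330},
    "BR": {-180, -240},
}

def location_inconsistent(declared_country: str,
                          ip_country: str,
                          tz_offset_min: int) -> bool:
    """Return True when the declared country disagrees with the IP-derived
    country, or with the client's reported UTC offset."""
    if declared_country != ip_country:
        return True
    expected = UTC_OFFSETS_MIN.get(declared_country)
    return expected is not None and tz_offset_min not in expected
```

Note that IP geolocation itself can be spoofed (VPNs, proxies, residential botnets), which is why this is a consistency check rather than a ground truth.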

Resilience

IVT defenses and attestation signals (including a browser or platform’s Privacy Preserving APIs for ads targeting and measurement) should not be easily reverse engineered or manipulated. This need has historically been addressed through obfuscation and non-public disclosure of IVT signals and defenses. Additionally, these signals must be able to be regularly updated as defense needs change, and their effectiveness should be measurable over time.
Of note, bad actors are highly motivated to evade detection and circumvent device, environment, and event attestation solutions.

@jbradleychen

jbradleychen commented Jun 28, 2022

Concern: If we try to enumerate all the varieties of devious behavior, I worry the list will be unbounded. It's much simpler to explain the normal behavior that we need to be able to recognize clearly. Counterabuse needs:

  • adversarially resistant evidence that the user is human
  • accurate software and hardware state, delivered in a fashion that is tamper-proof
  • coarse geography, for the purpose of consistency checks, content management, etc.

With this basic information, subsequent behavioral analysis of system usage can recognize incentivization, coordination, and related behavioral abuse. Counterabuse will also use account-level intelligence, but account infrastructure seems like something we take for granted and so is perhaps out of scope here.

@philippp (Contributor, Author)

I share the intuition that there may be a low number of canonical assertions (e.g. user is human, platform security model is unbroken, geo-location is correct) that will cover a wide array of use cases. The ambition of enumerating the use cases and requirements/capabilities for each is to connect the real-world motivations (e.g. preventing account takeover, social media manipulation, DoS) with the capabilities/assertions they require.

I agree with your conclusion, but we have to align on the priority of these requirements across all stakeholders, including those who are newer to the trust & safety / anti-fraud / security domains. I hope that once we have capabilities from a variety of use case owners, we can consolidate and up-level them as it makes sense (potentially arriving at a short list similar to yours).

However, once we have the mapping of use cases to capabilities, it is easier to go back and say "if we are unable to detect an attacker who is 'Appearing as Multiple Users from the Same Device / Browser,' we open the door to social media manipulation, denial of service attacks, ad fraud, etc" and include the relevant stakeholders/use-case-havers when discussing the validity and urgency of this requirement.

I'm open to more efficient ways of establishing the criticality and completeness of these requirements, if you have suggestions.

samuel-t-jackson added a commit that referenced this issue Sep 1, 2022
Initial commit of Capabilities, which reflect submitted issues from working group members. I took Yarne Habberman's proposal and copied it in, and then interleaved information from the capabilities listed in this comment: #1 (comment)

Also added submitted 'Domain Spoofing' use case.
@donivatamazondotcom

Per Philipp's request, pasting the functional and non-functional requirements detailed for the IAB Tech Lab Authenticated Devices standard here:

Functional requirements

  1. Enable on-demand attestation of device and app information in ad traffic so that recipients don’t have to blindly trust seller-provided information.
  2. Enable independent verification of attestations so that recipients can gauge the authenticity of the associated transaction.
  3. Enable aggregation of device attestation data at a seller level so that enforcement actions may be taken.
  4. Enable support for disparate device types and user agents such as streaming TV (Advanced TV) devices, audio devices, smartphones, tablets, desktops, web browsers, etc. MVP may be Advanced TV if others are not feasible to include.

Non-functional requirements

  1. Protect against attacks such as replay forgery or context transplanting. Keep context (e.g., pixel firings) tied to the device on which the ad is being shown.
  2. Except for device manufacturer attestation processes where the device manufacturer uses a persistent identifier to identify requests coming from a specific device over a period of time, no identifier should leave the device that can be used to correlate the device's content consumption activities with a specific device or consumer profile.
  3. The attestation verification process should not adversely impact the user experience such as the bidding process or the creative delivery process.
  4. Attestations should be non-repudiable so that verifiers can trust the integrity and durability of attestations across the lifecycle of the attestation.
  5. Implementers should be able to support distinct versions of the protocol so that future releases do not break existing implementations.
  6. Leverage established and well-tested cryptographic mechanisms and frameworks as much as possible to reduce the potential for exploits.
  7. The proposed mechanism should support all popular device manufacturers within the supported device type. This assumes that mechanisms may be distinct across device types: the mechanism used for desktop device attestations may be distinct from the mechanism used for streaming TV devices.
  8. The impact of a compromise of any on-device keys this mechanism relies on should not extend beyond that set of devices. Essentially, learning a secret key used by a set of devices should not result in other devices being impacted, so that the blast radius of the compromise is limited.
  9. The verification mechanism should align with existing verification models employed by marketers, such as using independent verification vendors to measure and report on the quality of their ad campaigns.
  10. Client applications that generate ad opportunities should not require changes to support this standard, since updating apps causes considerable churn.
  11. Attestations should have a limited life so that a non-repudiable cryptographic record of consumer or business activity is not generated and stored.
  12. Independent verification of signatures should not require a round trip to the device or the attester so that latency is minimized.
  13. Proposed mechanisms should not require publisher/developer support to implement, so that it is not dependent on adoption by parties that have a vested interest to not support the mechanism.
  14. Since advertising systems operate under stringent latency requirements, creating attestations and verifying attestations should not be resource-intensive operations.
  15. The attestation mechanism should be durable across device software and hardware updates so that availability of attestation data is not impacted over time.
  16. Proposals should minimize dependencies on other transparency standards (especially standards like Authenticated Delivery) since it can potentially hinder adoption of this standard.
  17. While this standard focuses on abating device and app misrepresentation, proposals should allow for flexibility so that use cases presently deemed out of scope can be supported in the future.
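To show how several of these requirements interact (on-demand attestation, replay protection, limited lifetime, and offline verification without a device round trip), here is a minimal Python sketch. It is not the IAB Tech Lab mechanism: a shared-key HMAC is used only to keep the sketch self-contained, whereas the non-repudiation requirement above would demand asymmetric signatures, and all names here are hypothetical:

```python
import hashlib
import hmac
import json
import secrets
import time

def create_attestation(device_info: dict, key: bytes, ttl_s: int = 300) -> dict:
    """Mint a short-lived attestation over device/app info.

    The expiry gives the attestation a limited life; the random nonce
    lets verifiers reject replays of a previously seen attestation.
    """
    payload = {
        "device": device_info,
        "nonce": secrets.token_hex(16),
        "expires_at": int(time.time()) + ttl_s,
    }
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def verify_attestation(att: dict, key: bytes, seen_nonces: set[str]) -> bool:
    """Verify locally, without a round trip to the device or attester."""
    body = json.dumps(att["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, att["tag"]):
        return False  # payload or tag was tampered with
    if att["payload"]["expires_at"] < time.time():
        return False  # attestation outlived its limited lifetime
    if att["payload"]["nonce"] in seen_nonces:
        return False  # replayed attestation
    seen_nonces.add(att["payload"]["nonce"])
    return True
```

With asymmetric keys, the device (or its manufacturer) would hold the signing key and verifiers only the public key, limiting the blast radius of a verifier compromise and providing the non-repudiation property the requirements call for.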

@neha-goog

To help Google ensure that we have a complete inventory of capabilities and understand the anti-fraud ecosystem's priorities, we have launched a capabilities gathering survey. This survey can be accessed at https://google.qualtrics.com/jfe/form/SV_5p3y9l2N1LYQYpU. We invite you all to participate in this survey as well.

A few FAQs on the survey:
Who can take it? Any organization with anti-fraud needs, in any region. The organization can be of any size and in any industry vertical (including payments, eCommerce, social media, etc.).
When does the survey close? The survey will be active until January 31, 2023.
Who will see my information? Consolidated results (with no organization names) will be published in this GitHub repository after the survey closes. Respondents have the option to include their organization name if they would like their individual response published in GitHub. Respondents also have the option to include their name and email address if they would be willing to discuss their response in detail with Google 1:1.

We will also ask for this survey to be added to the agenda at one of the next CG meetings.

@neha-goog

The results of the capabilities gathering survey have been posted below, along with individual responses from four organizations that wanted their results to be published as well:

Capabilities Gathering Survey Results (Publish).pdf
dstillery (Publish in AFCG) (1) (1) (1).pdf
F5 (Publish in AFCG) (1).pdf
IDWall (Publish in AFCG) (1).pdf
Socure (Publish in AFCG) (1).pdf
