Start identifying "capabilities" required to address proposed use cases #1
On behalf of Google Ad Traffic Quality

Abuse Vectors for Invalid Traffic (IVT) / Ad Fraud

An Invalid Traffic taxonomy provides one framework for classifying IVT abuse vectors. Below is a sample of abuse vectors for IVT / Ad Fraud that Google’s Ad Traffic Quality team has identified as impacting ads monetization on the web. IVT detection for these abuse vectors is dynamic because the space is highly adversarial, with bad actors continuously developing new methods to bypass detection. IVT detection is commonly segmented across three areas of interest: 1) Non-Human Traffic, 2) Incentivized Human Traffic, and 3) Misrepresented or Manipulated Human Traffic. By looking at these three areas we can determine whether relevant ad events should be considered IVT, or whether they are organic (genuine human interactions resulting from genuine interest). The list below highlights a subset of common categories of IVT. It is not exhaustive of all abuse vectors, and the vectors listed in each category are not necessarily mutually exclusive.

1. Non-Human Traffic

Botnets
Typically based on malware that has infected a user’s device (computer, mobile device, or other system) without their consent, rendering the device a “bot” that, combined with other bots, comprises a “botnet.” Botnets drive automated traffic that tries to mimic human behavior, often by opening hidden browser windows on infected devices. Some malware emulates user clicks using random or predetermined click patterns.

Emulators / Virtual Machines
Non-human invalid traffic often comes from virtual devices, potentially including emulated mobile devices, virtual machines running in data centers, etc. While not all ad interactions from emulated or simulated devices are non-human, it is Google’s practice, as well as an industry standard, to deem this type of ads traffic IVT when detected.
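The botnet vector above mentions malware that emulates clicks with "random or predetermined click patterns." One common way such predetermined patterns surface is unnaturally regular inter-click timing. The sketch below illustrates that idea only; the function name, thresholds, and the coefficient-of-variation heuristic are illustrative assumptions, not Google's actual detection logic.

```python
import statistics

def looks_automated(click_times, min_clicks=5, cv_threshold=0.1):
    """Flag a click stream whose inter-click intervals are suspiciously
    regular. Real users produce noisy timing; simple scripted clickers
    often fire on a near-fixed schedule. Thresholds are illustrative."""
    if len(click_times) < min_clicks:
        return False  # not enough evidence either way
    intervals = [b - a for a, b in zip(click_times, click_times[1:])]
    mean = statistics.mean(intervals)
    if mean == 0:
        return True  # every click at the same instant
    # Coefficient of variation: spread of intervals relative to their mean.
    cv = statistics.stdev(intervals) / mean
    return cv < cv_threshold

# A metronome-like stream is flagged; a jittery, human-looking one is not.
print(looks_automated([0, 10, 20, 30, 40, 50]))
print(looks_automated([0, 3, 19, 22, 48, 90]))
```

In practice a signal like this is only one feature among many, since sophisticated malware randomizes timing precisely to defeat such checks.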
Scripted Attacks
IVT attacks that are not distributed via botnets, but run in an automated fashion, typically include:
2. Incentivized Human Traffic

Self-clicking Activity
Publishers may attempt to drive revenue by interacting with ads on their own websites or apps.

Users without Genuine Intent / Use of Incentives
Users that engage with ads without genuine intent or interest (e.g., browser extensions that provide financial rewards for clicks), and ad interactions where users are offered a direct or indirect monetary incentive (in the form of currency or an equivalent) for interacting with ads without disclosure to advertisers. This does not refer to rewarded traffic, where advertisers are aware that publishers offer users non-currency (or equivalent) rewards (redeemable only within the app/site/game) in exchange for interacting with ads.

3. Misrepresented or Manipulated Human Traffic

User Geolocation Misrepresentation
Bad actors may try to swap the reported country of their users to fetch higher ad revenues (e.g., swap the IP address or country of emerging-market users for developed markets with higher average ad prices). Users may claim to be in a country that does not match their actual location as a tactic to evade anti-abuse defenses, or to be perceived as less risky by associating with a country with lower abuse rates.

Publisher Inventory Misrepresentation
Bad actors may misrepresent the ad inventory they are monetizing in an attempt to fetch higher revenue in ads auctions (e.g., a low-quality site or app may claim to be a high-value, known name brand).

Clickjacking
The use of deceptive elements (e.g., buttons or short-link redirections) or interfaces on a web page or app to trick users into clicking on an ad they didn’t expect to click on.

Hidden Ads
Ads that are impossible to see under any normal circumstances: ads tucked under iframes, hidden behind content, hidden behind other ads (aka “ad stacking”), placed inside invisible HTML containers, or displayed but too small to be seen (aka “pixel stuffing”).
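The hidden-ads categories above lend themselves to simple geometric checks on a rendered ad slot. The toy classifier below maps hypothetical measurements (slot size, opacity, and the fraction of the slot covered by higher-stacked elements) to the categories named in the text; the function, inputs, and cutoffs are assumptions for illustration, and real viewability measurement is far more involved.

```python
def classify_ad_slot(width, height, opacity, covered_fraction):
    """Toy heuristic mirroring the 'Hidden Ads' categories above.
    width/height are the rendered slot in CSS pixels, opacity is 0..1,
    covered_fraction is how much of the slot is hidden behind other
    elements. All thresholds are illustrative, not a real standard."""
    if width <= 1 and height <= 1:
        return "pixel stuffing"       # rendered, but too small to see
    if opacity == 0:
        return "invisible container"  # present in the DOM, never drawn
    if covered_fraction >= 0.99:
        return "ad stacking"          # fully covered by content or other ads
    return "potentially viewable"

print(classify_ad_slot(1, 1, 1.0, 0.0))
print(classify_ad_slot(300, 250, 1.0, 1.0))
```

A production system would instead measure on-screen time, intersection with the viewport, and z-order from the live page, but the decision structure is similar.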
Accidental Clicks
When users inadvertently click on an ad, even though they didn’t mean to do so. Publishers are not permitted to create interfaces that may lead users to accidentally click on ads. This includes implementing ads in a way that they might be mistaken for other site content, such as a menu, navigation, or download links.

Appearing as Multiple Users from the Same Device / Browser
Users may try to disguise their activity by removing cookies or using other tactics to hide their high ads activity, in order to appear as multiple users interacting with the same ads. IVT defenses should be able to determine when an actor is attempting to appear as multiple users from the same device/browser instance.
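The "Appearing as Multiple Users from the Same Device / Browser" vector can be sketched as a simple aggregation: group ad events by some stable device key and flag devices presenting an implausible number of distinct user identities. The device key, data model, and threshold below are hypothetical placeholders, not Google's actual signals.

```python
from collections import defaultdict

def flag_suspect_devices(events, max_ids_per_device=3):
    """events is an iterable of (device_key, user_id) pairs, where
    device_key stands in for whatever stable per-device signal a
    defense has available. Devices that present more distinct user
    IDs than the (illustrative) threshold are returned."""
    ids_per_device = defaultdict(set)
    for device_key, user_id in events:
        ids_per_device[device_key].add(user_id)
    return {device for device, ids in ids_per_device.items()
            if len(ids) > max_ids_per_device}

# A device cycling through ten "users" is flagged; two IDs on one
# device (e.g., a shared family computer) is not.
events = [("devA", f"u{i}") for i in range(10)]
events += [("devB", "u1"), ("devB", "u2")]
print(flag_suspect_devices(events))
```

The hard part in practice is the device key itself: cookie clearing removes the obvious identifier, which is exactly why this capability appears in the anti-abuse needs below.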
On behalf of Google Ad Traffic Quality

Anti-Abuse Needs

Invalid Traffic (IVT) and Ad Fraud detection requires additional capabilities to protect ad systems against bad actors. As seen in the common use cases observed by Google’s Ad Traffic Quality team, there is a diverse set of abuse vectors, many of which are continuously emerging and evolving to evade detection. Below is a non-exhaustive list of capabilities that assist in IVT and Ad Fraud detection and enable defenses against these vectors. It is important to note that no single mechanism or tool is a comprehensive solution for defending against IVT; each mechanism is part of a layered defense strategy, as bad actors develop new tactics and techniques to generate IVT and commit Ad Fraud.

Authenticity of a User
Invalid traffic defenses should be able to determine the realness and human qualities of ad interactions.
Organic Ads Traffic
Invalid traffic detection requires the ability to separate “normal” or “organic” ad interactions from invalid or non-organic interactions. Today this is accomplished in part by evaluating interaction-signal anomalies, as well as conversion metrics.

Coordinated Attack Detection
The detection of threats generated by multiple actors working in a unified, synchronized, and coordinated manner. Such attacks include coordinated clicking from a ring of publishers who all agree to click on each other’s ads (e.g., co-clicking, bad-actor rings).

True Traffic Origin and Destination
Determining instances where the user’s actual location is not aligned with the stated location, which can distort publisher metrics related to CPC and CPM. Users may disguise their location to appear to be in a more trusted country and bypass anti-abuse defenses.

Resilience
IVT defenses and attestation signals (including a browser or platform’s Privacy Preserving APIs for ads targeting and measurement) should not be easily reverse-engineered or manipulated. Historically this need has been addressed through obfuscation and non-public disclosure of IVT signals and defenses. Additionally, these signals must be regularly updatable as defense needs change, and their effectiveness should be measurable over time.
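The co-clicking rings mentioned under Coordinated Attack Detection can be pictured as reciprocal edges in a click graph: publisher A repeatedly clicks B's ads and B repeatedly clicks A's. The sketch below finds such reciprocal pairs; the data model and threshold are illustrative assumptions, and real ring detection looks for larger dense clusters, not just pairs.

```python
from collections import defaultdict

def reciprocal_click_pairs(clicks, min_each_way=5):
    """clicks is an iterable of (clicking_publisher, target_publisher)
    pairs. Returns publisher pairs that clicked each other's ads at
    least min_each_way times in both directions, a crude signal for
    the co-clicking rings described above. Threshold is illustrative."""
    counts = defaultdict(int)
    for src, dst in clicks:
        counts[(src, dst)] += 1
    pairs = set()
    for (src, dst), forward in counts.items():
        backward = counts.get((dst, src), 0)
        # src < dst ensures each unordered pair is reported once.
        if src < dst and forward >= min_each_way and backward >= min_each_way:
            pairs.add((src, dst))
    return pairs

clicks = [("p1", "p2")] * 5 + [("p2", "p1")] * 6 + [("p1", "p3")] * 5
print(reciprocal_click_pairs(clicks))
```

One-directional heavy clicking (p1 on p3 above) is deliberately not flagged here, since it lacks the reciprocity that characterizes a ring; a fuller approach would run community detection over the weighted graph.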
Concern: If we try to enumerate all the varieties of devious behavior, I worry the list will be unbounded. It's much simpler to explain the normal behavior that we need to be able to recognize clearly. Counterabuse needs:
With this basic information, subsequent behavioral analysis of system usage can recognize incentivization, coordination, and related behavioral abuse. Counterabuse will also use account-level intelligence, but account infrastructure seems like something we take for granted, and so is perhaps out of scope here.
I share the intuition that there may be a low number of canonical assertions (e.g., user is human, platform security model is unbroken, geolocation is correct) that will cover a wide array of use cases. The ambition of enumerating the use cases and the requirements/capabilities for each is to connect the real-world motivations (e.g., preventing account takeover, social media manipulation, DoS) with the capabilities/assertions they require. I agree with your conclusion, but we have to align on the priority of these requirements across all stakeholders, including those who are newer to the trust & safety / anti-fraud / security domains. I hope that once we have capabilities from a variety of use case owners, we can consolidate and up-level them as it makes sense (potentially arriving at a short list similar to yours). However, once we have the mapping of use cases to capabilities, it is easier to go back and say "if we are unable to detect an attacker who is 'Appearing as Multiple Users from the Same Device / Browser,' we open the door to social media manipulation, denial of service attacks, ad fraud, etc." and include the relevant stakeholders/use-case owners when discussing the validity and urgency of this requirement. I'm open to more efficient ways of establishing the criticality and completeness of these requirements, if you have suggestions.
Initial commit of Capabilities, which reflects submitted issues from working group members. I took Yarne Habberman's proposal and copied it in, then interleaved information from the capabilities listed in this comment: #1 (comment) Also added the submitted 'Domain Spoofing' use case.
Per Philipp's request, pasting the functional and non-functional requirements detailed for the IAB Tech Lab Authenticated Devices standard here: Functional requirements
Non-functional requirements
To help Google ensure that we have a complete inventory of capabilities and understand the anti-fraud ecosystem's priorities, we have launched a capabilities-gathering survey. The survey can be accessed at https://google.qualtrics.com/jfe/form/SV_5p3y9l2N1LYQYpU. We invite you all to participate in this survey as well. A few FAQs on the survey: We will also ask for this survey to be added to the agenda at one of the next CG meetings.
The results of the capabilities-gathering survey have been posted below, along with individual responses from four organizations that wanted their results published as well: Capabilities Gathering Survey Results (Publish).pdf
For the sake of efficiency, it might be prudent to vote on the use cases document before we file individual "capability" issues / requests against it. That said, it would behoove us - especially folks working on defensive teams that have relevant use cases - to start mapping out what capabilities we require in order to address our use cases.
To that effect, I invite folks to use this issue as a scratch pad, describing the capabilities their use cases require. We can then factor out the distinct capabilities (I assume there will be overlap) and file them as discrete issues for targeted discussion.