
Lack of user consent violates privacy requirements and makes this easier to fingerprint #366

Open
duaneking opened this issue Jan 17, 2025 · 5 comments


@duaneking

duaneking commented Jan 17, 2025

Questions:

  • If a customer or user says "I don't consent to the use of topics," how do we turn that off?
  • If a customer persona can't legally consent to being considered for a topic that the website wants them considered for, because they are underage or lack the right to consent under the law (GDPR, California's privacy law, etc.), how do we honor that?
  • How does the user say "no" and turn this off in a meaningful way?
  • How do people know what groups they are being presented as being "in"?
  • How do people protect themselves from being considered part of a group that they do not want to be in, without needing to think of every possibility in advance?
  • How do people remove themselves from a group that they were mistakenly considered part of in the past?
  • What about the legal right to be forgotten?

Your docs say that document.browsingTopics() returns an array of up to three topic objects in random order. Respectfully, the fact that the items are in random order doesn't really make any difference, since the items in question can simply be reordered on the server side; it's security theater and does not improve security at all. This randomization seems useless, so why is it included?

You say the returned array looks like [{'configVersion': String, 'modelVersion': String, 'taxonomyVersion': String, 'topic': Number, 'version': String}], but I see no consent tracking, no timestamps for when consent was collected, and no validation that consent was actually given, so I'm unsure how this can ever be compliant. If the intention is to show ads without consent, then that is a problem under the GDPR and other regulations.
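For concreteness, this is roughly what a caller sees, assuming Topics is enabled in the browser; the literal values in the comments below are placeholders for illustration, not real output:

```js
// Minimal sketch of reading topics from a page, assuming the browser
// supports the API and the user has not turned it off.
async function readTopics() {
  if (!('browsingTopics' in document)) {
    return []; // API unavailable or disabled
  }
  // Resolves to an array of up to three topic objects, in random order.
  const topics = await document.browsingTopics();
  // Each entry has the shape quoted above, e.g. (placeholder values):
  // { configVersion: '…', modelVersion: '…', taxonomyVersion: '…',
  //   topic: 123, version: '…' }
  // Note: 'topic' is only a numeric ID into the taxonomy; nothing here
  // records whether, when, or how the user consented.
  return topics;
}
```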

In addition, because the topic is just a number, it is not visible enough for people to actually know what is being sent; that number could represent anything. How would a person be able to give informed consent (as required for compliance) if they do not know what that group even is? How do people know what groups they are being presented as part of? How do people protect themselves from being considered part of a group that would make them a target?

What is the actual goal? It doesn't look like the goal is to allow customers to consent to being shown the right ads that will actually convert. Is the intention then just to make it easier to sell more ads that won't convert? Is conversion secondary? If people are supposed to feel safe enough to want to buy something, then they should actually be interested in the product first; and if they've already opted out of those advertisements, what is the point of showing them an advertisement for that topic knowing that it will not convert?

@michaelkleber
Collaborator

For questions of how a browser might present Topics-related information to the person using it, you can look at chrome://settings/adPrivacy/interests in a copy of Chrome with Topics turned on.

That page also has the toggle which turns the Topics API on or off in the browser, and a way to prevent association with high-level categories of topics — the 22 top-level categories in the taxonomy, like "Shopping" or "Autos & Vehicles".

Most of your questions, though, are about how a person's topics information is used. That is in the hands of the party who is using the information. I expect any ad tech that plans to use topics as part of ad selection has an answer to those sorts of questions, but I have not seen any share it publicly.

Your docs say that document.browsingTopics() returns an array of up to three topic objects in random order. Respectfully, the fact that the items are in random order doesn't really make any difference, since the items in question can simply be reordered on the server side; it's security theater and does not improve security at all. This randomization seems useless, so why is it included?

Suppose a person is observed once on site A and once on site B, and the three topics observed in each place have a single topic in common. Without randomization, the observer could deterministically know whether or not the matching topic comes from the same epoch. In this situation, the randomization does indeed make cross-site recognition harder (in that "guesses" would be less accurate).
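As a toy sketch of that point (made-up topic IDs, and a simplification of the real algorithm): assume each array holds one topic per epoch, with the index encoding the epoch when no shuffling is applied.

```js
// Toy illustration only: made-up topic IDs, not the real Topics algorithm.
// Without shuffling, index i would mean "the topic derived from epoch i".
const siteA = [101, 205, 309]; // topics returned on site A, epochs 1..3
const siteB = [442, 205, 817]; // topics returned on site B, epochs 1..3

// Fixed order: topic 205 appears at the same index on both sites, so an
// observer knows the match comes from the same epoch, a stronger signal
// that both arrays describe the same person.
const sameEpochMatch = siteA.some((topic, i) => siteB[i] === topic);

// Random order: each caller sees an independently shuffled copy, so the
// index no longer reveals the epoch, and a shared topic could just as
// well be a coincidence across different epochs.
const shuffle = (arr) => arr
  .map((v) => [Math.random(), v])
  .sort(([a], [b]) => a - b)
  .map(([, v]) => v);
const sharedTopics = shuffle(siteA).filter((t) => shuffle(siteB).includes(t));

console.log({ sameEpochMatch, sharedTopics });
```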

@duaneking
Author

chrome://settings/adPrivacy/interests

This does not answer my question.

Most of your questions, though, are about how a person's topics information is used.

No, they are not. My questions speak for themselves and are very direct, intentionally so, to mitigate communication ambiguity. Even the CIA blocks ads to protect its systems, so these are perfectly legitimate questions from a cybersecurity perspective, as data transferred is data you have lost control of.

I know how the data is used. I've seen the internals of large marketing operations, I know they don't respect privacy, and I'm fully aware that once the data has left the user's system, the user has lost all ability to control it or its use without enforcement action that 99.9999% of people can't afford. I'm also aware that everybody can be unmasked once marketing has collected only three pieces of information.

As such, it's more and more important to never leak information, and every intersection on a Venn diagram that can be made about a user is a possible unmasking of that user. That's why you limit it to only three things: statistically, the higher that number is, the easier it is for an individual website to unmask somebody. But simply making that value lower doesn't mitigate that risk either, from a mathematical perspective. It just takes the amount of time needed to unmask somebody and makes it longer.

So I ask again how do I protect my privacy with this?

How do I protect children who have been exposed to this?

I can't make the automatic assumption that anybody is trustworthy enough to be given this data. In America we literally have a legal mandate of zero trust for some things, and this would also violate that compliance requirement.

Suppose a person is observed once on site A and once on site B, and the three topics observed in each place have a single topic in common. Without randomization, the observer could deterministically know whether or not the matching topic comes from the same epoch. In this situation, the randomization does indeed make cross-site recognition harder (in that "guesses" would be less accurate).

The randomization does nothing to protect the user in this case. If I'm sending three items from a client to a server, it doesn't matter what order they are sent in; I'm still sending three items to the server. The order of the items on the network does not matter here, since the server can see all of them anyway, no matter what order they arrive in. Maybe the user only wants to send one? Maybe the user doesn't want to send any? Maybe the user disagrees with that category or their place in it and thinks it should be using a different category that would actually convert? Maybe the user is actually part of that category but doesn't want to be declared part of it due to privacy concerns?

Again, is the goal to simply force people to view ads that Google knows will not convert?

@appascoe

I don't work for Google, but this:

I'm also aware that everybody can be unmasked once marketing has collected only three pieces of information.

is patently false, and it's the foundational premise you use for the rest of your argument. To wit:

  • Person is in the US
  • Person owns a pet
  • Person is under the age of 65

Combined, these three pieces of information, none of which implies the other two, are simply not enough to "unmask" someone, since their intersection applies to millions of people. And this really gets to the mathematical thrust: we tend to use information-theoretic language to talk about the bits that get leaked by revealing other pieces of information.
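A rough sketch of that framing, with made-up prevalences purely for illustration:

```js
// Back-of-the-envelope "bits of information" sketch. The prevalence
// numbers are illustrative assumptions, not measured data.
const bits = (prevalence) => -Math.log2(prevalence);

const attributes = {
  livesInUS: 0.04,  // assumed share of the world population
  ownsPet: 0.6,     // assumption
  underAge65: 0.8,  // assumption
};

// If the attributes were independent, their bits would simply add up.
const revealed = Object.values(attributes)
  .map(bits)
  .reduce((sum, b) => sum + b, 0);

// Uniquely identifying one person among ~8 billion takes about 33 bits.
const needed = Math.log2(8e9);

console.log(`${revealed.toFixed(1)} bits revealed vs ${needed.toFixed(1)} bits needed`);
// Roughly 5.7 vs 32.9: nowhere near enough to unmask an individual.
```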

What's interesting about the Topics API is that the topics available, the sets that can be constructed from them, and the noise that's built in are all mathematical mechanisms to prevent the unmasking you're concerned about. It's pretty neat stuff, and I recommend engaging with the material.

@michaelkleber
Collaborator

It just takes the amount of time needed to unmask somebody and makes it longer. So I ask again how do I protect my privacy with this?

Our privacy design goals for the Topics API did indeed consider that question: how long would it take to unmask someone if you were to make guesses based on their topics? The paper https://arxiv.org/abs/2304.07210 does a good job of analyzing it. The graph in Figure 3 shows what you're getting at: the probability of correctly guessing who someone is across two different sites increases as you collect data over a longer period of time. That analysis shows that the probability is around 3% after 8 weeks of data.

If that answer to "how do I protect my privacy?" is not to your liking, then of course you can turn the API off, or not turn it on.

Maybe the user only wants to send one? Maybe the user doesn't want to send any? Maybe the user disagrees with that category or their place in it and thinks it should be using a different category that would actually convert? Maybe the user is actually part of that category but doesn't want to be declared part of that category due to privacy concerns?

I agree, we could have included many more user control options like the ones you're describing. Our guess was that very few users would choose to do that kind of hand-tailored configuration, and that a simple on/off switch was the way to be the most helpful to the most people.

Again, is the goal to simply force people to view ads that Google knows will not convert?

As I said, this is a question about how ad techs use the topics to pick ads. Our work on Chrome has involved a lot of conversations with a lot of ad tech companies about what would make this API useful to them, and personally I would be very surprised if any of them used it to pick ads that they know will not convert! Other APIs in the Privacy Sandbox offer privacy-focused ways for advertisers to know which of their ad spending is leading to conversions and which is not. So even as we make Chrome more private, the people who spend money on advertising should still be able to choose not to spend money on ads that don't convert.

@dmarti
Contributor

dmarti commented Jan 24, 2025

In a commercial context, individual identification can be less of a problem for many users than algorithmic discrimination. For example, a user can experience adverse consequences if classified as a likely member of a group that a property management firm does not rent to, or as a person unlikely to be hired by an employer after responding to a job ad, even if they are never identified individually.

According to the Topics API FAQ,

Chrome can and will take steps to avoid topics that might be sensitive (i.e. race, sexual orientation, religion, etc.). However, it is still possible that websites calling the API may combine or correlate topics with other signals to infer sensitive information, outside of intended use. Chrome will continue to investigate methods for reducing this risk.
