FLEDGE - shuttling data from generate_bid to report_win #146
Comments
We agree this would be a useful feature. Here's another use case: when a buyer (DSP) participates in an auction they're typically bidding on a CPM basis: how much would they pay for this impression? They will often be charging their advertisers, however, in a way that better matches advertiser goals. Most commonly this is charging per click (CPC), though buyers also offer products billed per active view (CPMAV) or conversion (CPA). Additionally, buyers often allow advertisers to specify a maximum amount they are willing to pay, but the buyer still attempts to spend as little as possible, passing along the savings. This makes the advertiser experience much better, but requires extra information in billing. Imagine an advertiser is willing to spend up to $1 for a click (MaxCPC), and the buyer predicts a 3% chance of clicking (pCTR). The buyer would be willing to bid up to $0.03 ($30 CPM) but instead predicts it can win with a bid of $0.02 ($20 CPM). It does, and it wins the auction. If the user doesn't click, the advertiser is charged nothing: the buyer is taking the risk that it calculated pCTR incorrectly. If they do click, the advertiser is charged bid / pCTR ($0.02 / 3% = $0.67). This means the buyer needs to know pCTR for billing and budget enforcement. In FLEDGE, the buyer will calculate pCTR in Stepping back, the
Because of the privacy implications of (2), the responses to the In many cases, buyers may be willing to restrict themselves to making decisions only on the combination of (a) contextual information and (b) the interest group name or Instead, we could allow derived bidder information to feed into We could add a new function,
Note that this receives This produces an arbitrary
For the CPC case, while this information isn't technically needed until the click, having it in Since
It would be possible for We do acknowledge that this is temporary, and is based on distinctions that will not make sense after the transition to aggregate reporting. Still, we think it provides a critical feature for buyers in migrating to FLEDGE. [1] In some cases [2] The trusted server url is currently per-user, which would additionally be a hole in the privacy model. This is already a hole for |
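The CPC billing arithmetic from the worked example in the comment above (MaxCPC, pCTR, and charging bid / pCTR on a click) can be sketched as follows. This is only an illustration of the arithmetic: the function names `maxImpressionBid` and `advertiserCharge` are made up here and are not part of FLEDGE.

```javascript
// Illustrative helpers only; these names are assumptions, not FLEDGE APIs.

// Highest per-impression bid the buyer can justify: MaxCPC * pCTR.
function maxImpressionBid(maxCpc, pCtr) {
  return maxCpc * pCtr;
}

// Amount billed to the advertiser for one won impression.
// No click: nothing is charged. Click: bid / pCTR.
function advertiserCharge(bid, pCtr, clicked) {
  if (!(pCtr > 0 && pCtr <= 1)) {
    throw new RangeError("pCtr must be in (0, 1]");
  }
  return clicked ? bid / pCtr : 0;
}

const maxCpc = 1.0; // advertiser pays at most $1 per click
const pCtr = 0.03;  // predicted click-through rate
const bid = 0.02;   // per-impression bid the buyer actually placed ($20 CPM)

console.log(maxImpressionBid(maxCpc, pCtr).toFixed(2));     // 0.03
console.log(advertiserCharge(bid, pCtr, true).toFixed(2));  // 0.67
console.log(advertiserCharge(bid, pCtr, false).toFixed(2)); // 0.00
```

The charge on a click depends on pCTR, which is why the buyer needs that value at billing time and not only at bidding time.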
@michaelkleber would a PR to integrate |
It sounds like you're not proposing any change in how data is retrieved from the key-value server, but rather treating some of the retrieved values differently from others. I think this would be a substantial change in the assumptions we're making about the K-V server. In the current design, as long as the K-V server only sends information back to the browser that asked it a question, privacy is preserved; no matter what logic the K-V server implements in picking the returned values, it can't affect FLEDGE's privacy properties, only which ad shows. But in the model you're proposing, the K-V server has a channel to return arbitrary data through the browser and to event-level logging. Aggregate logging really seems like the right solution for the problems being described here. |
@michaelkleber The idea behind the proposal above was that in any case where it is safe to report Here's an alternative proposal which gets most of the benefits of In every response the key-value server could include the version number as a header:
Then the browser would make this number available to reporting via
A key and version number are sufficient to determine what the server would have responded with. In cases where the key is already available in reporting, such as To take the example of pCTR above, since all the inputs to the calculation are either (a) already available in reporting or (b) coming from the key-value server, then the calculation can be re-run on the server after reporting. One risk with this proposal is that a key-value server could attempt to smuggle extra information back through the version number. Because in standard usage the version number would gradually increment and the version for one request would match contemporaneous requests, abusive usage would be easily identifiable externally. Additionally, whatever method is chosen for validating that the key-value server is spec-compliant in general should be able to validate that version numbers do not depend on request-specific data. |
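As a rough sketch of the idea that a key plus a version number determines the server's response, a buyer could archive each data version and look responses up after the fact. Everything named here is hypothetical: the `snapshots` archive, the `reconstructResponse` helper, the `shoes-ig` key, and the shape of the report object are assumptions made for illustration, not part of the proposal.

```javascript
// Hypothetical buyer-side archive: data version -> snapshot of key/value data
// that the key-value server was serving at that version.
const snapshots = new Map([
  [41, new Map([["shoes-ig", { pCtr: 0.025 }]])],
  [42, new Map([["shoes-ig", { pCtr: 0.03 }]])],
]);

// Look up what the key-value server would have returned for `key`
// while it was serving data version `dataVersion`.
function reconstructResponse(key, dataVersion) {
  const snapshot = snapshots.get(dataVersion);
  if (snapshot === undefined) {
    throw new Error(`no archived snapshot for data version ${dataVersion}`);
  }
  return snapshot.get(key) ?? null;
}

// A reported win (assumed shape) carrying the key, the bid, and the version.
const report = { interestGroupName: "shoes-ig", bid: 0.02, dataVersion: 42 };

// Re-run the pCTR-dependent billing calculation on the server after reporting.
const signals = reconstructResponse(report.interestGroupName, report.dataVersion);
const chargeOnClick = report.bid / signals.pCtr;
console.log(chargeOnClick.toFixed(2)); // 0.67
```

This only works when every input to the calculation is either in the report or keyed on something in the report, which is the restriction the comment describes.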
Thank you @jeffkaufman, this makes a ton of sense. Reporting a slowly-incrementing version number sent back by the KV server does seem like a great way to be useful while making any attempted abuse easily visible. Of course you're quite right that there is a risk of version skew during updates. But honestly that was a risk all along, with a single KV server response including the values for multiple keys! So including an explicit version number probably makes that risk smaller, since it forces the server to think about skew instead of sweeping it under the rug. @MattMenke2 and @brusshamilton PTAL? |
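One way an outside observer might look for the kind of abuse discussed here is to check that reports from the same time window share only a small number of data versions. The sketch below is an assumption about how such an audit could work, not a mechanism from the thread: the `suspiciousWindows` helper, the report shape, and the threshold are all hypothetical.

```javascript
// Flag time windows in which more distinct data versions appear than a
// slowly incrementing version number would plausibly produce.
// `reports` is an array of { timestampMs, dataVersion } (assumed shape).
function suspiciousWindows(reports, windowMs, maxDistinctVersions) {
  const versionsByWindow = new Map();
  for (const { timestampMs, dataVersion } of reports) {
    const windowStart = Math.floor(timestampMs / windowMs) * windowMs;
    if (!versionsByWindow.has(windowStart)) {
      versionsByWindow.set(windowStart, new Set());
    }
    versionsByWindow.get(windowStart).add(dataVersion);
  }
  // Return the start time of every window that exceeds the threshold.
  return [...versionsByWindow]
    .filter(([, versions]) => versions.size > maxDistinctVersions)
    .map(([windowStart]) => windowStart);
}

const reports = [
  { timestampMs: 1_000, dataVersion: 7 },
  { timestampMs: 2_000, dataVersion: 7 },
  { timestampMs: 59_000, dataVersion: 8 },      // normal rollover during an update
  { timestampMs: 61_000, dataVersion: 1001 },
  { timestampMs: 62_000, dataVersion: 55 },
  { timestampMs: 63_000, dataVersion: 90210 },  // unrelated versions at the same time
];

console.log(suspiciousWindows(reports, 60_000, 2)); // [ 60000 ]
```

Allowing two versions per window leaves room for the version skew that occurs while an update rolls out.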
Yup, looks good. Jeff, want to send a PR to add this to the FLEDGE explainer? Or else we'll get to it when someone has time. |
@jeffkaufman Maybe I am missing something, but I don't see how this #243 addresses my original comment that opened this issue. In particular, how does this help buyers pace spend or monitor prediction accuracy? |
@mbowiewilson let's say you're using the name of the interest group as your key. That value is available in reporting, assuming it passes a k-anonymity threshold. Combine that with data-version, and you can figure out what the contents of the key-value response would have been. Then you can, for example, re-run your production code on the server, to calculate what it would have generated on the client, letting you check its accuracy. |
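Once predictions have been recomputed on the server in this way, checking their accuracy could be as simple as comparing the mean predicted click rate with the observed one. This is a minimal sketch under assumptions: the `calibration` helper and the event shape (`pCtr` recomputed server-side, `clicked` taken from click reporting) are hypothetical.

```javascript
// Compare the mean predicted CTR with the observed click rate over a set of
// won impressions. Each event is { pCtr, clicked } (assumed shape).
function calibration(events) {
  if (events.length === 0) {
    throw new Error("no events to evaluate");
  }
  let predictedSum = 0;
  let clicks = 0;
  for (const event of events) {
    predictedSum += event.pCtr;
    if (event.clicked) clicks += 1;
  }
  return {
    meanPredicted: predictedSum / events.length,
    observed: clicks / events.length,
  };
}

const events = [
  { pCtr: 0.25, clicked: false },
  { pCtr: 0.25, clicked: true },
  { pCtr: 0.25, clicked: false },
  { pCtr: 0.25, clicked: false },
];

console.log(calibration(events)); // { meanPredicted: 0.25, observed: 0.25 }
```

A large gap between the two numbers would suggest the model feeding the bidding logic is miscalibrated.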
Gotcha, thanks for explaining @jeffkaufman. I think this is a step in the right direction, but I am not yet convinced it totally solves the issues I raised. I say this because |
@mbowiewilson that makes sense! If the buyer is willing to restrict themselves to the interest group name and information from the trusted server that is keyed on the name, then they can fully reconstruct things, but not if they want to depend on additional information. Unfortunately, this is not easy to fix: the information you're talking about is potentially specific to a single user, so I don't see how it could be made available to event level reporting. I wonder if this might be a better fit for aggregate reporting? |
Thanks for the quick responses @jeffkaufman, I understand your point about this being difficult from a privacy perspective. I think this mechanism mostly achieves the third use-case I list above ("Monitoring inputs to generate bid"). Probably the second bullet point ("monitoring prediction accuracy") can be done with aggregate reporting since that isn't time-sensitive. Regarding aggregate reporting and the first use-case ("spend pacing improvements") there have already been concerns about the utility of aggregate reporting for spend pacing (especially for smaller campaigns), and the same concerns would apply to the performance-based spend pacing approach I brought up. So, depending on the details of the aggregate reporting API, that use-case may still be an unresolved issue. |
Do you want me to modify my PR so it doesn't claim this issue is fixed? I think of it as being as fixed as it is going to get, since I don't think there's any practical way to modify the spec to supply an arbitrary pipe from generateBid to reportWin, but you're the one who filed so I'll defer to you. |
I don't mind if you want to call this issue fixed/resolved. The remaining part of this issue is mostly related to #145. |
Thanks for the discussion, both of you! I agree that handling non-personalized inputs, as in the PR, is an easier case, and general access to personalized signals is a better fit for aggregated reporting, which will need more discussion. |
Fixes #146 (As per #146 (comment) this fix only takes care of some of the use cases described in that Issue, but it addresses the ones that are readily compatible with event-level reporting.)
I'd like to comment on this issue in light of my current understanding of requirements placed on the key-value server. The idea behind the
I think that in deployments that need to support incremental updates and/or heterogeneous data, the |
To summarize my proposal presented in yesterday's call: It is difficult to meaningfully enforce that an untrusted key-value server is behaving correctly; that is, that it is not leaking information about the user (learned from the request) via the data-version field. In its current state, a malicious untrusted server could leak 64 bits of a user's identity via the data-version mechanism into event-level reporting. We heard feedback that data-version is useful for certain use cases. To address the privacy risk, we propose to reduce the resolution of data-version to ~8 bits, enforced by the browser. We propose that this restriction remain in place for as long as an untrusted ("bring your own") key-value server is used. With a trusted key-value server (where the code is open source and the server runs on suitable secure hardware that supports remote code attestation), it will be possible to verify by code inspection that the returned data-version value is independent of the request's content. With such a trusted server, restrictions on the resolution of data-version can most likely be removed. Reduction to 8 bits of entropy can be achieved in two ways:
|
In FLEDGE, as I understand it, there is no mechanism for shuttling data computed in `generate_bid` to `report_win`, but we think this functionality would be extremely useful for buyers. This is because a buyer may have computed interesting and useful intermediate results as a byproduct of computing the bid price. As a concrete example, a buyer may have computed the predicted probability of a click or conversion on their way to determining the bid price. Furthermore, the ability to log data from `generate_bid` in `report_win` would unlock a lot of useful monitoring of the bidding workflow.

Some benefits of having intermediate data from `generate_bid` logged in `report_win` would be:

- ... `generate_bid` allows buyers to compare the accuracy of the prediction used in bidding with the actual outcomes they observe. This would be a key sanity check of any machine learning pipeline that fed `generate_bid`.
- ... `generate_bid` -- Logging certain portions of the input to `generate_bid`, such as the value of the `trusted_bidding_signals`, would allow buyers to sanity check, validate, and debug their bidding pipelines.

In terms of implementation, I was imagining the object returned from `generate_bid` having a field I'll call `bidding_values_to_log` in which a buyer could put some information they wanted to log, and that this new field would be passed to the report_win function.
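The proposed shape can be sketched roughly as below. This illustrates the proposal only and is not an existing FLEDGE API: `bidding_values_to_log` is the field name suggested in this issue, the function signatures are heavily simplified stand-ins for the real worklet functions, and `runAuctionForSingleBidder`, `maxCpc`, and the `shoes-ig` data are hypothetical.

```javascript
// Simplified stand-in for the buyer's bidding function. The real function
// takes more arguments; only what the sketch needs is shown.
function generate_bid(interestGroup, trustedBiddingSignals) {
  const pCtr = trustedBiddingSignals[interestGroup.name].pCtr;
  return {
    bid: interestGroup.maxCpc * pCtr,
    // Proposed field: intermediate values the buyer wants to see at reporting.
    bidding_values_to_log: { pCtr },
  };
}

// Simplified stand-in for the buyer's reporting function, receiving the
// logged values as an extra argument, as the proposal imagines.
function report_win(bid, biddingValuesToLog) {
  return { bid, logged: biddingValuesToLog };
}

// Hypothetical glue playing the role of the browser for a single bidder
// that wins the auction.
function runAuctionForSingleBidder(interestGroup, trustedBiddingSignals) {
  const result = generate_bid(interestGroup, trustedBiddingSignals);
  return report_win(result.bid, result.bidding_values_to_log);
}

const winReport = runAuctionForSingleBidder(
  { name: "shoes-ig", maxCpc: 1.0 },
  { "shoes-ig": { pCtr: 0.03 } },
);
console.log(winReport.logged.pCtr); // 0.03
```

The privacy concern raised in the discussion is visible in the sketch: whatever `generate_bid` places in the field flows unchanged into reporting.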