Trusted Server Caching and Validation #158

Open · jeffkaufman opened this issue Mar 24, 2021 · 6 comments
Labels: Non-breaking Feature Request (feature request for functionality unlikely to break backwards compatibility)
Comments

jeffkaufman (Contributor) commented Mar 24, 2021:

The FLEDGE trusted server call shouldn't use standard HTTP caching semantics because it is logically multiple calls bundled together for efficiency. For example, say the browser wants ["https://www.kv-server.example", "publisher.com", ["key1", "key2"]] and does:

```
GET https://www.kv-server.example/?hostname=publisher.com&keys=key1,key2
```

It might receive back something like:

```
Cache-Control: max-age=3600
...
{"key1": ..., "key2": ...}
```

If a minute later, within the max-age=3600, it wants ["https://www.kv-server.example", "publisher.com", ["key2"]], it should be able to satisfy that from cache, but current HTTP caching semantics mean it won't know to look at the cached response for the earlier request. Similarly, if it wanted ["https://www.kv-server.example", "publisher.com", ["key1", "key2", "key3"]], it should be able to send a request only for key3. One way to handle this would be a FLEDGE-specific cache for these key-value pairs.
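As an illustration of what such a FLEDGE-specific cache could look like, here is a minimal sketch, assuming a per-key in-memory store; the KeyValueCache name, its methods, and the tuple-shaped cache key are hypothetical, not part of any spec or implementation.

```python
import time

class KeyValueCache:
    """Hypothetical per-key cache for trusted server responses."""

    def __init__(self):
        # (kv_server, hostname, key) -> (value, expiry_time)
        self._entries = {}

    def store(self, kv_server, hostname, values, max_age):
        # Cache each key from a batched response individually, honoring max-age.
        expiry = time.time() + max_age
        for key, value in values.items():
            self._entries[(kv_server, hostname, key)] = (value, expiry)

    def lookup(self, kv_server, hostname, keys):
        # Split requested keys into cached values and keys that still need a fetch.
        cached, missing = {}, []
        for key in keys:
            entry = self._entries.get((kv_server, hostname, key))
            if entry is not None and entry[1] > time.time():
                cached[key] = entry[0]
            else:
                missing.append(key)
        return cached, missing
```

With something like this, the max-age=3600 response above would populate entries for key1 and key2; a later request for ["key2"] would be a pure cache hit, and ["key1", "key2", "key3"] would only need a network fetch for ...&keys=key3.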

Since responses may be large, it would be good to support revalidation requests, allowing the trusted server to save bytes on keys whose cached values are already up to date. Cache state here is intended to be tracked at the key level rather than the request level, so the browser can't use the request-level If-None-Match or If-Modified-Since headers.

Could the request consist of a list of key-validator pairs, allowing the server to omit any keys whose values haven't changed? For example:

```
[
  ["key1", ["If-None-Match", "<hash of previous value1>"]],
  ["key2", ["If-None-Match", "<hash of previous value2>"]],
  ["key3", []], // no previous value
  …
]
```
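Here is a rough sketch of the browser side under that proposal, assuming SHA-256 hashes of the cached values as the validators and a response that simply omits unchanged keys; the function names and the hashing choice are illustrative assumptions, not a proposed wire format or API.

```python
import hashlib
import json

def build_revalidation_request(keys, cached_values):
    # Pair each requested key with a validator derived from its cached value, if any.
    request = []
    for key in keys:
        if key in cached_values:
            digest = hashlib.sha256(
                json.dumps(cached_values[key], sort_keys=True).encode()).hexdigest()
            request.append([key, ["If-None-Match", digest]])
        else:
            request.append([key, []])  # no previous value
    return request

def merge_revalidated_response(keys, cached_values, response_values):
    # A key omitted from the response means "not modified": reuse the cached value.
    # A key present in the response carries a new value that replaces the cached one.
    return {key: response_values.get(key, cached_values.get(key)) for key in keys}
```

The idea is that "key absent from the response" stands in for a per-key 304, which is what lets the server save bytes on values that haven't changed.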

Alternatively, it's possible that with HTTP/2+ the cost of sending one request per key (https://www.kv-server.example/?hostname=publisher.com&key=key1) would be low enough that we wouldn't need batching at all. Our guess is that this is not the case, but it's an empirical question. Dropping batching would also make the browser implementation cleaner, since these calls would no longer require special cache treatment.

jeffkaufman (Contributor, Author) commented:
Now that Chrome is farther along in the implementation, I think this may be worth revisiting, especially the first half of the proposal around a FLEDGE-specific cache. In the current implementation, the trustedScoringSignals response is never cached while the trustedBiddingSignals response is only cached if the URL matches exactly (as you would expect from the standard cache semantics). An uncacheable or poorly cached round trip that blocks the auction is quite bad from a latency perspective, so it would be nice if we could do something better here.

JensenPaul added the Non-breaking Feature Request label on Jun 23, 2023
rdgordon-index (Contributor) commented:
> In the current implementation, the trustedScoringSignals response is never cached

@MattMenke2 -- can you amend #906 to cover this as well? There's mention of a global HTTP cache, but only in the context of bidder worklets, not seller worklets -- and I think it's crucial to clarify whether this is still a privacy concern with the current implementation.

MattMenke2 (Contributor) commented:
I don't think this is worth writing up at the moment; we just use standard HTTP caching semantics. I think we may use a transient network partition for seller signals, because we can't leak the URL to any network partition without essentially leaking cross-origin cookie-equivalents to whatever partition we use. We currently use the bidder's first-party partition for bidder signals, which is also leaky.

We'll need to move over to something completely different once we use an actual trusted server for these fetches. Since OHTTP doesn't work with HTTP caching semantics, we'll need our own cache at that point (probably short-lived and in-memory, but we'll see). We may wire up that cache in place of the current HTTP caching + network partitioning scheme for our current request format, and switch to a more privacy-preserving partitioning approach as well -- e.g., we could use the network partition that was used to join the interest group in the first place, though that would potentially mean more network connections/requests.
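To make the "short-lived and in-memory" idea a bit more concrete, here is a minimal sketch, assuming a TTL-bounded cache keyed by an explicit partition key (for example, the partition used to join the interest group) plus the signals key; the class, the partition-key shape, and the 60-second default TTL are all assumptions, not how Chrome works or will work.

```python
import time

class ShortLivedSignalsCache:
    """Hypothetical in-memory cache for trusted-signals values fetched over OHTTP."""

    def __init__(self, ttl_seconds=60):
        self._ttl = ttl_seconds
        # (partition_key, signals_key) -> (value, inserted_at)
        self._entries = {}

    def put(self, partition_key, signals_key, value):
        self._entries[(partition_key, signals_key)] = (value, time.time())

    def get(self, partition_key, signals_key):
        entry = self._entries.get((partition_key, signals_key))
        if entry is None:
            return None
        value, inserted_at = entry
        if time.time() - inserted_at > self._ttl:
            # Expire quickly so cached values don't outlive an auction by much.
            del self._entries[(partition_key, signals_key)]
            return None
        return value
```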

Anyhow, there's a lot to work out here. I don't think it's worth documenting how things currently work, as it's likely to change pretty drastically at some point in (hopefully) the fairly near future, though any rollout of new behavior will likely be slow so we can compare its performance with the current behavior.

rdgordon-index (Contributor) commented:
> we just use standard HTTP caching semantics

So, just to be explicit: is the following statement from earlier in the thread no longer true, re: trustedScoringSignals being "never cached"?

> In the current implementation, the trustedScoringSignals response is never cached while the trustedBiddingSignals response is only cached if the URL matches exactly

MattMenke2 (Contributor) commented:
With network partitioning, all network requests need to be associated with a network partition. With HTTP caching, any page that shares a network partition with trusted signals requests can probe the cache for responses, to try to see what bids were made on a page. This is a pretty serious violation of both FLEDGE's user-tracking model and, more fundamentally, the cross-origin attack model of the web, since it potentially exposes cookie-equivalents to third parties. If we use the publisher's network partition, we expose ad URLs for bids to the publisher page and everything in it. If we use the seller's partition, we expose to the seller (and to any 3P scripts it runs, if the user navigates to the seller's origin) which ads bidders wanted to show to the user, which is also not great.

So I think we currently act as if seller requests came from an opaque origin. I'm not a cache expert, but I think our cache may not cache anything for network partitions associated with opaque origins at the moment, so the "never cached" behavior may well actually be true. If we did cache in those cases, auctions run at the same time from the same page could actually share cached responses (all requests for the lifetime of an internal SellerWorklet object share a globally unique key for their opaque origin, until we tear down the seller worklet).

For bidder fetches, we do use the bidder's network partition, which has the same 3P-script leak as the seller's origin, so we probably want to improve things there too, but caching is also likely more useful there, hence accepting the temporary leak until we have a better caching strategy.
