Posted to dev@httpd.apache.org by Nick Kew <ni...@webthing.com> on 2007/10/03 15:23:04 UTC

ETag and Content-Encoding

http://issues.apache.org/bugzilla/show_bug.cgi?id=39727

We have some controversy surrounding this bug, and bugzilla
has turned into a technical discussion that belongs here.

Fundamental question:  Does a weak ETag preclude (negotiated) 
changes to Content-Encoding?

Summary:

Original bug: mod_deflate may compress/decompress content
but leave an existing ETag in place.

[ various discussion followed ]

Yesterday: I committed a fix to /trunk/, assuming it would
be uncontroversial.  The fix is that any existing ETag should
be made a weak ETag if mod_deflate either inflates or
deflates the contents.  Rationale: a weak ETag promises
equivalent but not byte-by-byte identical contents, and
that's exactly what you have with mod_deflate.
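
As a rough illustration of that change (not the actual mod_deflate patch, which lives in httpd's C code), the transformation amounts to prefixing the entity tag with W/ unless it is already weak. The helper name weaken_etag is hypothetical.

```python
def weaken_etag(etag: str) -> str:
    """Return the weak form of an entity tag (RFC 2616 section 3.11).

    Hypothetical helper for illustration only: a strong tag such as
    "abc" becomes W/"abc"; an already-weak tag is returned unchanged.
    """
    etag = etag.strip()
    if etag.startswith("W/"):
        return etag
    return "W/" + etag


print(weaken_etag('"1a2b-3c"'))    # W/"1a2b-3c"
print(weaken_etag('W/"1a2b-3c"'))  # W/"1a2b-3c"
```

Note that the opaque part of the tag is untouched, which is what the rest of the thread argues about.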

Henrik Nordstrom commented:

  "Not sufficient. The two versions are not semantically equivalent, as one
  cannot be exchanged for the other without breaking the protocol. In
  the context of If-None-Match the weak comparator is used in HTTP and
  there a strong ETag is equal to a weak ETag."

Further discussion followed.  I won't repost it here in full, but
since there clearly is an issue, it needs discussing here.

Cc: folks subscribed to the bug.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: ETag and Content-Encoding

Posted by Ruediger Pluem <rp...@apache.org>.

On 10/03/2007 03:23 PM, Nick Kew wrote:
> http://issues.apache.org/bugzilla/show_bug.cgi?id=39727
> 
> We have some controversy surrounding this bug, and bugzilla
> has turned into a technical discussion that belongs here.
> 
> Fundamental question:  Does a weak ETag preclude (negotiated) 
> changes to Content-Encoding?
> 
> Summary:
> 
> Original bug: mod_deflate may compress/decompress content
> but leave an existing ETag in place.
> 
> [ various discussion followed ]
> 
> Yesterday: I committed a fix to /trunk/, assuming it would
> be uncontroversial.  The fix is that any existing ETag should
> be made a weak ETag if mod_deflate either inflates or
> deflates the contents.  Rationale: a weak ETag promises
> equivalent but not byte-by-byte identical contents, and
> that's exactly what you have with mod_deflate.
> 
> Henrik Nordstrom commented:
> 
>   "Not sufficient. The two versions are not semantically equivalent, as one
>   cannot be exchanged for the other without breaking the protocol. In
>   the context of If-None-Match the weak comparator is used in HTTP and
>   there a strong ETag is equal to a weak ETag."
> 
> Further discussion followed.  I won't repost it here in full, but
> since there clearly is an issue, it needs discussing here.

Currently I share your opinion that a weak ETag should fix the issue
(apart from the fact that ap_meets_conditions currently does not work
correctly with weak ETags, but that is another story).
OTOH, I am trying to understand why Henrik thinks it is not sufficient.

Ok, before the patch we had the following situation:

Depending on the client, httpd sent an uncompressed or a compressed
response with the *same* (possibly strong) ETag and a Vary: Accept-Encoding
header. A cache in the line stored the response and, because both responses
had the *same* (possibly strong) ETag, it stored it only *once* (either the
compressed or the uncompressed version) and in effect ignored the Vary
header. So if a client requested that resource from the cache, either
conditionally (If-None-Match) or unconditionally, the cache delivered what
it had in stock, ignoring the client's Accept-Encoding header.

Now after the patch we have the following situation:

Depending on the client, httpd sends an uncompressed or a compressed
response with the original ETag if it does not modify the response, and
with a weak version of the ETag if it does compress / uncompress the
response. In any case it sets a Vary: Accept-Encoding header.
OK, sending the original ETag if we do not alter the response might be an
error, but let's assume we do not do that and send a weak version of the
original ETag in both cases (altering the response / not altering the
response). Does this allow the cache in the line to store it only *once*,
ignoring the Vary header?
If yes, then the fix is not sufficient, but if a weak ETag forces the cache
to store each variant based on the Vary header, then it should work.
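
One way to picture the scenario in question is a toy model, assuming a cache that revalidates as described in RFC 2616 section 13.6: it sends the ETags it already holds in If-None-Match and, on a 304, reuses the stored entry carrying the returned ETag. The functions origin and fetch_via_cache, the ETag values and the distinct_etags switch are all made up for this sketch; the ETag match is a plain string comparison.

```python
def origin(accept_gzip, if_none_match, distinct_etags):
    """Toy origin server. Returns (status, etag, content_encoding)."""
    encoding = "gzip" if accept_gzip else "identity"
    # With distinct_etags the gzip variant gets its own (made-up) ETag.
    etag = '"v1-gzip"' if (distinct_etags and accept_gzip) else '"v1"'
    if etag in if_none_match:
        return 304, etag, None
    return 200, etag, encoding


def fetch_via_cache(cache, accept_gzip, distinct_etags):
    """Toy cache for a single URL: maps ETag -> stored Content-Encoding."""
    status, etag, encoding = origin(accept_gzip, set(cache), distinct_etags)
    if status == 304:
        # Reuse whichever stored variant carries the returned ETag.
        return cache[etag]
    cache[etag] = encoding
    return encoding


shared = {}
fetch_via_cache(shared, True, False)           # gzip client fills the cache
print(fetch_via_cache(shared, False, False))   # gzip (wrong for this client)

distinct = {}
fetch_via_cache(distinct, True, True)
print(fetch_via_cache(distinct, False, True))  # identity
```

In this model the shared ETag makes the 304 ambiguous, so the second client is handed the gzip body even though it did not ask for it; with one ETag per variant the cache fetches and stores the identity variant separately.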


Regards

Rüdiger

Re: ETag and Content-Encoding

Posted by Henrik Nordstrom <he...@henriknordstrom.net>.
On Wed, 2007-10-03 at 23:52 +0200, Henrik Nordstrom wrote:
> > That is not HTTP.  Don't confuse the needs of caching with the needs
> > of range requests -- only range requests need strong etags.
> 
> I am not. I am talking about If-None-Match, not If-Range. And
> specifically the use of If-None-Match in 13.6 Caching Negotiated
> Responses.

To clarify, I do not care much about strong/weak ETags. This is a
property of how the server generates the content, with no significant
relevance to caching other than that the ETags as such must be
sufficiently unique (there are some cache impacts of weak ETags, but
they are not really relevant to this discussion).

If anything I said seems to imply that I only want to see strong ETags,
then that's solely due to the use of poor language on my part and not
intentional.

All I am trying to say is that the responses

[no Content-Encoding]
and
Content-Encoding: gzip

from the same negotiated resource are two different variants in terms of
HTTP and must carry different ETag values, if any.

End.

The rest is just trying to get people to see this.

Apache mod_deflate does not do this when doing its dynamic content
negotiation driven transformations, and that is a bug (13.11 MUST) with
quite nasty implications for caching of negotiated responses (13.6).

The fact that responses with different Content-Encoding are meant to
result in the same object after decoding is pretty much irrelevant here.
They are two incompatible negotiated variants of the resource, and
that is all that matters.

I am also saying that the simple change of making mod_deflate transform
any existing ETag into a weak one is not sufficient to address this
properly, but it's quite likely to plaster over the problem for a while in
most uses, except when the original response ETag is already weak. It
will however break completely if Apache GET If-None-Match processing is
changed to use the weak comparison as mandated by the RFC (13.3.3) (to
the best of my knowledge Apache always uses the strong function, but I may
be wrong there).
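
The difference between the two comparison functions of RFC 2616 section 13.3.3 can be sketched as follows. The function names are illustrative and are not httpd APIs; the point is only that a weakened tag still matches the original under the weak comparison.

```python
def split_etag(etag):
    """Return (is_weak, opaque_tag) for an entity tag string."""
    etag = etag.strip()
    if etag.startswith("W/"):
        return True, etag[2:]
    return False, etag


def strong_compare(a, b):
    """Both tags must be strong and identical."""
    weak_a, tag_a = split_etag(a)
    weak_b, tag_b = split_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_compare(a, b):
    """Only the opaque tags are compared; weakness is ignored."""
    return split_etag(a)[1] == split_etag(b)[1]


original = '"v1"'     # ETag of the untransformed response
weakened = 'W/"v1"'   # same tag after being made weak

print(strong_compare(original, weakened))  # False
print(weak_compare(original, weakened))    # True
```

So a server that only uses the strong function treats the weakened tag as a different tag, while one using the weak function sees the two variants as the same.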

Negotiation of Content-Encoding is really not any different from
negotiation of any of the other content properties such as
Content-Language or Content-Type. The same rules apply, and each unique
outcome (variant) of the negotiation process needs to be assigned a
unique ETag with no overlaps between variants, and for strong ETags
each binary version of each variant needs to have a unique ETag with no
overlaps.

This ignores any out-of-band dynamic parameters to the negotiation
process, such as server load, which might affect responses to the same
request; it only covers negotiation based on request headers. For
out-of-band negotiation properties it's important to respect the strong
ETag binary equivalence requirements.


Note: Changed language to use the more proper term "variant" instead of
"entity". Hopefully less confusing.

Regards
Henrik

Re: ETag and Content-Encoding

Posted by Henrik Nordstrom <he...@henriknordstrom.net>.
On Wed, 2007-10-03 at 12:10 -0700, Roy T. Fielding wrote:

> > Two resource variants with different content-encoding is not
> > semantically equivalent as the recipient may not be able to understand
> > an variant sent with an incompatible encoding.
> 
> That is not true.  The weak etag is for content that has changed but
> is just as good a response content as would have been received.
> In other words, protocol equivalence is irrelevant.

By protocol semantic equivalence I mean responses being acceptable to
requests.

Example: Two negotiated responses with different Content-Encoding are not
semantically equivalent at the HTTP level, as their negotiation
properties are different, and one cannot substitute one for the other
and expect that HTTP works.

But two compressed response entities with different compression levels
depending on the CPU load are.

Note: Ignoring transfer-encoding here as it's transport and pretty much
irrelevant to the operations of the protocol other than wire message
encoding/decoding.

> > a) HTTP must be able to tell if an already cached variant is valid for a
> > new request by using If-None-Match. This means that each negotiated
> > entity needs to use a different ETag value. Accept-Encoding is no
> > different in this than any of the other inputs to content negotiation.
> 
> That is not HTTP.  Don't confuse the needs of caching with the needs
> of range requests -- only range requests need strong etags.

I am not. I am talking about If-None-Match, not If-Range. And
specifically the use of If-None-Match in 13.6 Caching Negotiated
Responses.

It's a very simple and effective mechanism, but requires servers to
properly assign ETags to each (semantically in case of weak) unique
entity of a resource (not the resource as such).

Content-Encoding is no different in this than any of the other
negotiated properties (Content-Type, Content-Language, whatever).

Regards
Henrik

Re: ETag and Content-Encoding

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Oct 3, 2007, at 7:20 AM, Henrik Nordstrom wrote:

> On Wed, 2007-10-03 at 14:23 +0100, Nick Kew wrote:
>> http://issues.apache.org/bugzilla/show_bug.cgi?id=39727
>>
>> We have some controversy surrounding this bug, and bugzilla
>> has turned into a technical discussion that belongs here.
>>
>> Fundamental question:  Does a weak ETag preclude (negotiated)
>> changes to Content-Encoding?
>
> A weak etag means the response is semantically equivalent both at
> protocol and content level, and may be exchanged freely.
>
> Two resource variants with different content-encoding is not
> semantically equivalent as the recipient may not be able to understand
> an variant sent with an incompatible encoding.

That is not true.  The weak etag is for content that has changed but
is just as good a response content as would have been received.
In other words, protocol equivalence is irrelevant.

> Sending a weak ETag does not signal that there is negotiation taking place
> (Vary does that); all it signals is that there may be multiple but fully
> compatible versions of the entity variant in circulation, or that each
> request results in a slightly different object where the difference has
> no practical meaning (e.g. embedded non-important timestamp or similar).

Yes.  Compression has no practical meaning.

>> deflates the contents.  Rationale: a weak ETag promises
>> equivalent but not byte-by-byte identical contents, and
>> that's exactly what you have with mod_deflate.
>
> I disagree. It's two very different entities.

That is irrelevant.  What matters is the resource semantics, not the
message bits.  Every bit can change randomly and still be semantically
equivalent to a resource representation of random bits.

> Note: If mod_deflate is deterministic and always returning the exact
> same encoded version then using a strong ETag is correct.
>
>
> What this boils down to in the end is
>
> a) HTTP must be able to tell if an already cached variant is valid for a
> new request by using If-None-Match. This means that each negotiated
> entity needs to use a different ETag value. Accept-Encoding is no
> different in this than any of the other inputs to content negotiation.

That is not HTTP.  Don't confuse the needs of caching with the needs
of range requests -- only range requests need strong etags.

> b) If the object undergo some transformation that is not deterministic
> then the ETag must be weak to signify that byte-equivalence can not be
> guaranteed.
>
> Note regarding a: The weak/strong property of the ETag has no
> significance here. If-None-Match uses the weak comparison function
> where only the value is compared, not the strength. See 13.3.3 paragraph
> "The weak comparison function".

As intended,

....Roy


Re: Cc: lists (Re: ETag and Content-Encoding)

Posted by Henrik Nordstrom <he...@henriknordstrom.net>.
On Wed, 2007-10-03 at 21:44 +0100, Nick Kew wrote:

> The Cc: list on this and subsequent postings is screwed:
> 
>   (1) It includes me, so I get everything twice.
>       OK, I can live with that, but it's annoying.

Use a Message-Id filter?

>   (2) It fails to include Henrik Nordstrom, the principal 
>       non-Apache protagonist in this discussion.

No problem. I am a dev@ subscriber

Regards
Henrik

Cc: lists (Re: ETag and Content-Encoding)

Posted by Nick Kew <ni...@webthing.com>.
On Wed, 3 Oct 2007 07:53:31 -0700
"Justin Erenkrantz" <ju...@erenkrantz.com> wrote:

> [chop]

The Cc: list on this and subsequent postings is screwed:

  (1) It includes me, so I get everything twice.
      OK, I can live with that, but it's annoying.
  (2) It fails to include Henrik Nordstrom, the principal 
      non-Apache protagonist in this discussion.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: ETag and Content-Encoding

Posted by Julian Reschke <ju...@gmx.de>.
Henrik Nordstrom wrote:
> On Wed, 2007-10-03 at 13:29 -0700, Justin Erenkrantz wrote:
> 
>> The issue here is that mod_dav_svn generates an ETag (based off rev
>> num and path) and that ETag can be later used to check for conditional
>> requests.  But, if mod_deflate always strips a 'special' tag from the
>> ETag (per Henrik),
> 
> That was only a suggestion on how you may work around your somewhat
> limited conditional processing capabilities wrt filters like
> mod_deflate, but I think it's probably the cleanest approach considering
> the requirements of If-Match and modifying methods (PUT, DELETE,
> PROPPATCH etc). In that construct the tag added to the ETag by
> mod_deflate (or another entity transforming filter) needs to be
> sufficiently unique that it is not likely to be seen in the original
> ETag value.
> ...

Two cents -- no three cents :-):

#1) I agree with Henrik's analysis.

#2) If Content-Encoding is implemented through a separate module, it 
will have to rewrite both outgoing and incoming etags; note that this 
includes the "If-*" headers from RFC2616 and the "If" header defined in 
RFC4918 (obsoleting RFC2518).

#3) If just appending "-gzip" doesn't provide sufficient uniqueness, the
implementation may want to *always* append a token (such as
"-identity"), even when no compression occurred.
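
A minimal sketch of points #2 and #3, assuming the token is appended inside the closing quote of the entity tag and that the module knows a fixed set of tokens. The names KNOWN_TOKENS, tag_etag and untag_etag are hypothetical; nothing here reflects an existing httpd interface.

```python
KNOWN_TOKENS = ("gzip", "identity")  # assumed set of tokens for this sketch


def tag_etag(etag, token):
    """Outgoing direction: "abc" -> "abc-gzip" (works for W/"abc" too)."""
    if not etag.endswith('"'):
        raise ValueError("not a quoted entity tag: %r" % etag)
    return etag[:-1] + "-" + token + '"'


def untag_etag(etag):
    """Incoming direction (If-* headers): return (original_etag, token).

    token is None when the tag carries none of the known tokens.
    """
    for token in KNOWN_TOKENS:
        suffix = "-" + token + '"'
        if etag.endswith(suffix):
            return etag[:-len(suffix)] + '"', token
    return etag, None


sent = tag_etag('"r42"', "gzip")
print(sent)              # "r42-gzip"
print(untag_etag(sent))  # ('"r42"', 'gzip')
```

If a token is always appended on the way out, as in #3, then every tag coming back in an If-* header is expected to carry one, and only that final token is removed before the tag reaches the module that generated it.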

Best regards, Julian



Re: ETag and Content-Encoding

Posted by Henrik Nordstrom <he...@henriknordstrom.net>.
On Wed, 2007-10-03 at 13:29 -0700, Justin Erenkrantz wrote:

> The issue here is that mod_dav_svn generates an ETag (based off rev
> num and path) and that ETag can be later used to check for conditional
> requests.  But, if mod_deflate always strips a 'special' tag from the
> ETag (per Henrik),

That was only a suggestion on how you may work around your somewhat
limited conditional processing capabilities wrt filters like
mod_deflate, but I think it's probably the cleanest approach considering
the requirements of If-Match and modifying methods (PUT, DELETE,
PROPPATCH etc). In that construct the tag added to the ETag by
mod_deflate (or another entity transforming filter) needs to be
sufficiently unique that it is not likely to be seen in the original
ETag value.

It's not easy to fulfill the needs of all components when doing dynamic
entity transformations, especially when there is negotiation involved..

Regards
Henrik

Re: ETag and Content-Encoding

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Oct 3, 2007 12:19 PM, Roy T. Fielding <fi...@gbiv.com> wrote:
> I don't see how that is possible, unless subversion is depending
> on content-encoding to twiddle between compressed and uncompressed
> transfer without changing the etag.  In that case, subversion will be
> broken, as would any poster child for misusing content-encoding as
> a transfer encoding.

I don't understand - why should Subversion care?  It doesn't know
anything related to gzip - that's purely mod_deflate's job.

The issue here is that mod_dav_svn generates an ETag (based off rev
num and path) and that ETag can be later used to check for conditional
requests.  But, if mod_deflate always strips a 'special' tag from the
ETag (per Henrik), then by the time that mod_dav_svn sees it, the tag
could be corrupt - as that special tag could have been part of a valid
ETag produced by mod_dav_svn as we've *never* placed restrictions on
the format of the ETag produced by our modules.  -- justin
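
The collision being described can be shown with a small sketch, assuming a filter that unconditionally strips a trailing ";gzip" from incoming entity tags. The function naive_strip and the sample ETag are invented for illustration and do not reflect mod_dav_svn's real ETag format.

```python
def naive_strip(etag, marker=";gzip"):
    """Strip a trailing marker from inside the quotes, if present."""
    suffix = marker + '"'
    if etag.endswith(suffix):
        return etag[:-len(suffix)] + '"'
    return etag


# Hypothetical ETag that the backend module itself issued, which happens
# to end in the same characters as the filter's marker.
backend_etag = '"42//trunk/notes;gzip"'

# What the backend sees after the filter has "cleaned" the client's
# If-None-Match value:
seen_by_backend = naive_strip(backend_etag)

print(seen_by_backend)                  # "42//trunk/notes"
print(seen_by_backend == backend_etag)  # False
```

The backend no longer recognises its own tag, so the conditional request is evaluated against the wrong value.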

Re: ETag and Content-Encoding

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Oct 3, 2007, at 7:53 AM, Justin Erenkrantz wrote:

> The problem with trying to invent new ETags is that we'll almost
> certainly break conditional requests and I find that a total
> non-starter.  Your suggestion of appending ";gzip" leaks information
> that doesn't belong in the ETag - as it is quite possible for that to
> appear in a valid ETag from another source - for example, it is
> trivial to make Subversion generate ETags containing that at the end -
> this would create nasty false positives and corrupt Subversion's
> conditional request checks.  Plus, rewriting every filter to append or
> delete a 'special' marker in the ETag is bound to make the situation
> way worse.  -- justin

I don't see how that is possible, unless subversion is depending
on content-encoding to twiddle between compressed and uncompressed
transfer without changing the etag.  In that case, subversion will be
broken, as would any poster child for misusing content-encoding as
a transfer encoding.

....Roy

Re: ETag and Content-Encoding

Posted by Henrik Nordstrom <he...@henriknordstrom.net>.
On Wed, 2007-10-03 at 07:53 -0700, Justin Erenkrantz wrote:

> As before, I still don't understand why Vary is not sufficient to
> allow real-world clients to differentiate here.  If Squid is ignoring
> Vary, then it does so at its own peril - regardless of ETags.

See RFC2616 13.6 Caching Negotiated Responses and you should understand
why returning a unique ETag on each variant is very important. (yes, the
gzip and identity content-encoded responses are two different variants of
the same resource, see earlier discussions if you don't agree on that).

But yes, thinking this over a second time, converting the ETag to a weak
ETag is sufficient to plaster over the problem, assuming the original
ETag is a strong one. Not because it's correct from a protocol
perspective, but because Apache does not use the weak compare function when
processing If-None-Match, so in Apache's world changing a strong ETag to
a weak one is about the same as assigning a new ETag.

However, if the original ETag is already weak then the problem remains
exactly as it is today.

It's also almost the same as deleting the ETag, as you also destroy
If-None-Match processing of filtered responses, which is also why it
works.

> The problem with trying to invent new ETags is that we'll almost
> certainly break conditional requests and I find that a total
> non-starter.

Only because your processing of conditional requests is broken. See
earlier discussions on the topic of this bug already covering this
aspect.

To work properly the conditionals need to (logically) be processed when
the response entity is known, that is, after mod_deflate (or another
filter) does its dance to transform the response headers. Doing
conditionals before the actual response headers are known is very
error-prone and likely to cause false matches, as you don't know that this
is the response which will be sent to the requestor.
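
The ordering problem can be sketched with a toy request pipeline. This is an illustration under stated assumptions, not how httpd is structured: handler, deflate_filter and respond are made-up names, the filter is assumed to give the gzip variant its own ETag, and ETags are matched as plain strings.

```python
def handler():
    """Content generator: headers of the untransformed response."""
    return {"ETag": '"v1"', "Content-Encoding": "identity"}


def deflate_filter(headers, accept_gzip):
    """Toy output filter: compresses and assigns a per-variant ETag."""
    if not accept_gzip:
        return headers
    out = dict(headers)
    out["ETag"] = headers["ETag"][:-1] + '-gzip"'
    out["Content-Encoding"] = "gzip"
    return out


def respond(if_none_match, accept_gzip, check_early):
    """Return the status code for a GET with the given If-None-Match set."""
    headers = handler()
    if check_early and headers["ETag"] in if_none_match:
        return 304  # decided before the final headers are known
    headers = deflate_filter(headers, accept_gzip)
    if not check_early and headers["ETag"] in if_none_match:
        return 304  # decided against the ETag that is actually sent
    return 200


# A requestor holding the identity variant asks for a gzip-capable response.
print(respond({'"v1"'}, accept_gzip=True, check_early=True))   # 304
print(respond({'"v1"'}, accept_gzip=True, check_early=False))  # 200
```

With the early check the server reports a match against a response it was never going to send; with the late check the comparison is made against the variant selected for this request.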

> Your suggestion of appending ";gzip" leaks information
> that doesn't belong in the ETag - as it is quite possible for that to
> appear in a valid ETag from another source - for example, it is
> trivial to make Subversion generate ETags containing that at the end -
> this would create nasty false positives and corrupt Subversion's
> conditional request checks.

Then use something stronger, less likely to be seen in the original
etag. Or fix the filter architecture to deal with conditionals properly,
making this question ("collisions") pretty much a non-issue.

Or, until conditionals can be processed correctly in the presence of
filters, drop the ETag on filtered responses where the filter does some
kind of negotiation.

> Plus, rewriting every filter to append or
> delete a 'special' marker in the ETag is bound to make the situation
> way worse.  -- justin

I don't see much choice if you want to comply with the RFC requirements.
The other choice is to drop the ETag header on such responses, which
is also not a nice thing, but it at least complies with the specifications,
making it better than sending out the same ETag on incompatible
responses from the same resource.

Regards
Henrik

Re: ETag and Content-Encoding

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Oct 3, 2007 7:20 AM, Henrik Nordstrom <hn...@squid-cache.org> wrote:
> > deflates the contents.  Rationale: a weak ETag promises
> > equivalent but not byte-by-byte identical contents, and
> > that's exactly what you have with mod_deflate.
>
> I disagree. It's two very different entities.

As before, I still don't understand why Vary is not sufficient to
allow real-world clients to differentiate here.  If Squid is ignoring
Vary, then it does so at its own peril - regardless of ETags.

The problem with trying to invent new ETags is that we'll almost
certainly break conditional requests and I find that a total
non-starter.  Your suggestion of appending ";gzip" leaks information
that doesn't belong in the ETag - as it is quite possible for that to
appear in a valid ETag from another source - for example, it is
trivial to make Subversion generate ETags containing that at the end -
this would create nasty false positives and corrupt Subversion's
conditional request checks.  Plus, rewriting every filter to append or
delete a 'special' marker in the ETag is bound to make the situation
way worse.  -- justin

Re: ETag and Content-Encoding

Posted by Henrik Nordstrom <hn...@squid-cache.org>.
On Wed, 2007-10-03 at 14:23 +0100, Nick Kew wrote:
> http://issues.apache.org/bugzilla/show_bug.cgi?id=39727
> 
> We have some controversy surrounding this bug, and bugzilla
> has turned into a technical discussion that belongs here.
> 
> Fundamental question:  Does a weak ETag preclude (negotiated) 
> changes to Content-Encoding?

A weak etag means the response is semantically equivalent both at
protocol and content level, and may be exchanged freely.

Two resource variants with different content-encoding are not
semantically equivalent, as the recipient may not be able to understand
a variant sent with an incompatible encoding.

Sending a weak ETag does not signal that there is negotiation taking place
(Vary does that); all it signals is that there may be multiple but fully
compatible versions of the entity variant in circulation, or that each
request results in a slightly different object where the difference has
no practical meaning (e.g. embedded non-important timestamp or similar).

> deflates the contents.  Rationale: a weak ETag promises
> equivalent but not byte-by-byte identical contents, and
> that's exactly what you have with mod_deflate.

I disagree. It's two very different entities.

Note: If mod_deflate is deterministic and always returns the exact
same encoded version, then using a strong ETag is correct.


What this boils down to in the end is

a) HTTP must be able to tell if an already cached variant is valid for a
new request by using If-None-Match. This means that each negotiated
entity needs to use a different ETag value. Accept-Encoding is no
different in this than any of the other inputs to content negotiation.

b) If the object undergoes some transformation that is not deterministic
then the ETag must be weak to signify that byte-equivalence cannot be
guaranteed.

Note regarding a: The weak/strong property of the ETag has no
significance here. If-None-Match uses the weak comparison function
where only the value is compared, not the strength. See 13.3.3 paragraph
"The weak comparison function".

Regards
Henrik