Posted to dev@httpd.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2009/08/26 20:47:28 UTC

mod_cache, mod_deflate and Vary: User-Agent

I think we blew it :)

Vary: user-agent is not practical for correcting errant browser behavior.

For example:

  User-Agent: Mozilla/5.0 Gecko/20090729 Firefox/3.5.2

produces a myriad of 'variant' flavors when Vary is tagged with
User-Agent to determine whether the deflate/gzip-compressed variant
or the uncompressed variant should be served.

What we really meant to do was to determine which Accept-Encoding values
were invalid based on known browser bugs, and -remove them- from the A-E
header *prior* to determining the cache handling (quick handler hook) or
typical content handling.

Which implies that setenvif + headers need an extra chance to run
first, ahead of the quick handler.
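
A minimal sketch of that idea, in Python rather than httpd module code: scrub
Accept-Encoding for agents with known compression bugs before any cache lookup
happens. The pattern table and the function name sanitize_accept_encoding are
hypothetical, the single pattern is a placeholder rather than a vetted list of
broken browsers, and header names are assumed to arrive in canonical case.

```python
import re

# Hypothetical table of user-agent patterns with known compression bugs.
# The entry below is a placeholder, not a real browser.
BROKEN_GZIP_AGENTS = [
    re.compile(r"ExampleBrokenAgent"),
]

# Codings to remove for a matching agent.
STRIPPED_CODINGS = ("gzip", "deflate")


def sanitize_accept_encoding(headers):
    """Return a copy of the request headers in which gzip/deflate have been
    removed from Accept-Encoding when the User-Agent matches a broken agent."""
    cleaned = dict(headers)
    user_agent = cleaned.get("User-Agent", "")
    if not any(p.search(user_agent) for p in BROKEN_GZIP_AGENTS):
        return cleaned

    codings = [c.strip() for c in cleaned.get("Accept-Encoding", "").split(",")]
    kept = [
        c for c in codings
        if c and c.split(";")[0].strip().lower() not in STRIPPED_CODINGS
    ]
    if kept:
        cleaned["Accept-Encoding"] = ", ".join(kept)
    else:
        # Nothing left to offer: drop the header entirely.
        cleaned.pop("Accept-Encoding", None)
    return cleaned
```

Run ahead of the cache decision, this leaves the cache to vary on the cleaned
Accept-Encoding value only, never on User-Agent.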

Any better suggestions?

Re: mod_cache, mod_deflate and Vary: User-Agent

Posted by Paul Querna <pa...@querna.org>.
On Wed, Aug 26, 2009 at 2:50 PM, William A. Rowe,
Jr.<wr...@rowe-clan.net> wrote:
> Paul Querna wrote:
>>
>> Yes, write a Varied header to 'hash' plugin API for mod_cache.
>>
>> I would write little lua scriptlets that map user agents to two
>> buckets: supports gzip, doesn't support gzip.  Store the thing in
>> mod_cache only twice, instead of once for every user agent.
>
> This doesn't solve the problem of each-and-every downstream proxy
> cache storing an excessively large number of copies.  Even if we
> strip down comments from the fields before choosing cache entries,
> Mozilla's many versions of Mozilla/2.0.3 and Gecko/20090731 tags
> are going to continue to proliferate copies.
>
> I'm suggesting that this might need to be 'invisibly' handled, not
> using Vary:, but by any proxy clever enough to detect the non-conforming
> browser to then strip the request to deflate/gzip.  At that point, the
> choice-of-two becomes obvious to all proxies and back end servers with
> this knowledge.  If this is unknown to an earlier proxy, the client
> could get the broken deflate/gzip content, but that seems unavoidable.
>
> Honestly, I can't see a way to honor HTTP/1.1 cache negotiation goals
> while minimizing cache pollution.

There isn't.  So, optimize your cache, strip caching headers to
downstream proxies.

Maybe Waka can fix it.

Re: mod_cache, mod_deflate and Vary: User-Agent

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Paul Querna wrote:
> 
> Yes, write a Varied header to 'hash' plugin API for mod_cache.
> 
> I would write little lua scriptlets that map user agents to two
> buckets: supports gzip, doesn't support gzip.  Store the thing in
> mod_cache only twice, instead of once for every user agent.

This doesn't solve the problem of each-and-every downstream proxy
cache storing an excessively large number of copies.  Even if we
strip down comments from the fields before choosing cache entries,
Mozilla's many versions of Mozilla/2.0.3 and Gecko/20090731 tags
are going to continue to proliferate copies.

I'm suggesting that this might need to be 'invisibly' handled, not
using Vary:, but by any proxy clever enough to detect the non-conforming
browser and strip deflate/gzip from the request.  At that point, the
choice-of-two becomes obvious to all proxies and back end servers with
this knowledge.  If this is unknown to an earlier proxy, the client
could get the broken deflate/gzip content, but that seems unavoidable.

Honestly, I can't see a way to honor HTTP/1.1 cache negotiation goals
while minimizing cache pollution.

I did consider a module (lua or otherwise) that would 'interfere' in
the initial quick handler phase just to work out broken user agents,
rather than carry the entire weight of setenvif/headers to the quick
handler phase.



Re: mod_cache, mod_deflate and Vary: User-Agent

Posted by Nick Kew <ni...@webthing.com>.
On 28 Aug 2009, at 06:13, tokiley@aol.com wrote:

>
> > Brian Akins of Turner Broadcasting, Inc. wrote...
> >
> > We are moving towards the 'if you say you support gzip,
> > then you get gzip' attitude.

The only approach that makes sense.  Good to hear that from
folks as big as you.

> There isn't a browser in the world that can 'Accept Encoding'
> successfully for ALL mime types.

Huh?  Whyever not?  Encoding is orthogonal to MIME type,
and for the ability to decode to be dependent on MIME type
would indicate tortuously over-complicated and hopelessly
broken browser design.

-- 
Nick Kew

Re: mod_cache, mod_deflate and Vary: User-Agent

Posted by to...@aol.com.

> Brian Akins of Turner Broadcasting, Inc. wrote...
>
> We are moving towards the 'if you say you support gzip,
> then you get gzip' attitude.

There isn't a browser in the world that can 'Accept Encoding'
successfully for ALL mime types.

Some are better than others but there are always certain
mime types that should never be returned with any
'Content Encoding' regardless of what the browser
is saying.

In that sense, you can never really trust the 
'Accept-encoding: gzip, deflate' header at all.

There is (currently) no mechanism in the HTTP protocol
for a client to specify WHICH mime types it can
successfully decode.

It was supposed to be an 'all or nothing' DEVCAP
indicator but that's not how things have evolved in
the real world.

There are really only 3 choices...

1. Stick with the original spec and continue to treat
'Accept-encoding: whatever' as an 'all or nothing' indicator
with regards to possible mime types and treat every 
complaint of breakage as 'it's not our problem, your 
browser is non-compliant'.

2. Change the original spec and add a way for clients 
to indicate which mime types can be successfully
decoded and then wait for all the resulting support code 
to be added to all Servers and Proxies.

3. Do nothing, and let every individual Server owner
continue to find their own solution(s) to the problem(s).
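
In practice, option 3 tends to look like a per-server exclusion list. The
sketch below is one illustration under stated assumptions: the thread names no
specific MIME types, so the NEVER_ENCODE entry is a made-up placeholder, and
should_gzip is a hypothetical helper rather than anything in httpd.

```python
# Hypothetical exclusion list. The thread does not say which MIME types
# break in which browsers, so this entry is only a placeholder.
NEVER_ENCODE = frozenset({"application/x-example-type"})


def should_gzip(accept_encoding, content_type, never_encode=NEVER_ENCODE):
    """Decide whether to send Content-Encoding: gzip for one response.

    The client's Accept-Encoding claim is honoured unless the response's
    media type is on the server owner's exclusion list.
    """
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in never_encode:
        return False
    # Simplified check: q-values are ignored in this sketch.
    codings = [c.split(";")[0].strip().lower() for c in accept_encoding.split(",")]
    return "gzip" in codings
```

The list is local policy, which is exactly the weakness of option 3: every
server owner ends up maintaining a different one.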

Yours
Kevin Kiley



 

-----Original Message-----
From: Akins, Brian <Br...@turner.com>
To: dev@httpd.apache.org <de...@httpd.apache.org>
Sent: Thu, Aug 27, 2009 9:42 am
Subject: Re: mod_cache, mod_deflate and Vary: User-Agent


On 8/26/09 3:20 PM, "Paul Querna" <pa...@querna.org> wrote:

> I would write little lua scriptlets that map user agents to two
> buckets: supports gzip, doesn't support gzip.  Store the thing in
> mod_cache only twice, instead of once for every user agent.

We do the same basic thing.  We are moving towards the "if you say you
support gzip, then you get gzip" attitude.  I think less than 1% of our
clients would be affected, and I think a lot of those are fake agents
anyway.


-- 
Brian Akins




 


Re: mod_cache, mod_deflate and Vary: User-Agent

Posted by "Akins, Brian" <Br...@turner.com>.
On 8/26/09 3:20 PM, "Paul Querna" <pa...@querna.org> wrote:

> I would write little lua scriptlets that map user agents to two
> buckets: supports gzip, doesn't support gzip.  Store the thing in
> mod_cache only twice, instead of once for every user agent.

We do the same basic thing.  We are moving towards the "if you say you
support gzip, then you get gzip" attitude.  I think less than 1% of our
clients would be affected, and I think a lot of those are fake agents
anyway.
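
Taken literally, that policy needs nothing more than a parse of
Accept-Encoding. A minimal sketch follows; the function name
client_accepts_gzip is hypothetical, and the q-value handling follows the
general HTTP rule that q=0 means "not acceptable".

```python
def client_accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value offers gzip with a non-zero
    q-value, either explicitly or through the '*' wildcard."""
    wildcard = None
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as a refusal
        if coding == "gzip":
            return q > 0  # an explicit gzip entry overrides any wildcard
        if coding == "*":
            wildcard = q > 0
    return bool(wildcard)
```

No User-Agent inspection is involved, so the response only has to vary on
Accept-Encoding.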


-- 
Brian Akins


Re: mod_cache, mod_deflate and Vary: User-Agent

Posted by Paul Querna <pa...@querna.org>.
On Wed, Aug 26, 2009 at 11:47 AM, William A. Rowe,
Jr.<wr...@rowe-clan.net> wrote:
> I think we blew it :)
>
> Vary: user-agent is not practical for correcting errant browser behavior.
>
> For example;
>
>  User-Agent: Mozilla/5.0 Gecko/20090729 Firefox/3.5.2
>
> produces a myriad number of 'variant' flavors when tagging Vary with
> the User-Agent when determining if the deflate/gzip compression should
> be served, or the uncompressed variant.
>
> What we really meant to do was to determine which Accept-Encoding values
> were invalid based on known browser bugs, and -remove them- from the A-E
> header *prior* to determining the cache handling (quick handler hook) or
> typical content handling.
>
> Which implies that setenvif + headers need an extra chance to run really
> first in front of the quick handler.
>
> Any better suggestions?

Yes, write a plugin API for mod_cache that maps a Varied header to a 'hash'.

I would write little lua scriptlets that map user agents to two
buckets: supports gzip, doesn't support gzip.  Store the thing in
mod_cache only twice, instead of once for every user agent.
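
A rough sketch of the two-bucket idea, written in Python instead of lua and not
tied to any real mod_cache API. The names gzip_bucket and cache_key and the
broken-agent tag are all hypothetical.

```python
import hashlib

# Hypothetical substrings identifying agents that mishandle gzip.
BROKEN_AGENT_TAGS = ("ExampleBrokenAgent",)


def gzip_bucket(user_agent, accept_encoding, broken_tags=BROKEN_AGENT_TAGS):
    """Map a request to one of exactly two buckets: 'gzip' or 'identity'."""
    codings = [c.split(";")[0].strip().lower() for c in accept_encoding.split(",")]
    claims_gzip = "gzip" in codings
    broken = any(tag in user_agent for tag in broken_tags)
    return "gzip" if claims_gzip and not broken else "identity"


def cache_key(url, user_agent, accept_encoding):
    """Build a cache key from the URL and the bucket, never from the raw
    User-Agent string, so each URL is stored at most twice."""
    bucket = gzip_bucket(user_agent, accept_encoding)
    return hashlib.sha256(f"{url}\n{bucket}".encode("utf-8")).hexdigest()
```

Two Firefox builds that differ only in their Gecko date land in the same
bucket and therefore share one cache entry.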

Re: mod_cache, mod_deflate and Vary: User-Agent

Posted by to...@aol.com.
> William A. Rowe, Jr.
>
> I think we blew it :)
>
> Vary: user-agent is not practical for correcting errant browser behavior.

You have not 'blown it'.

From a certain perspective, it's the only reasonable thing to do.

Everyone keeps forgetting one very important aspect of this issue
and that is the fact that the 'Browsers' themselves are 
participating in the whole 'caching' scheme and that they
are the source of the actual requests, so their behavior is
as much a part of the equation as any inline proxy cache.

There is no real solution to this problem.

The HTTP protocol itself does not have the capability
to deal with things correctly with regards to 
compressed variants.

The only decision that anyone needs to make is 'Where is
the pain factor?'.

If you VARY on ANYTHING other than 'User-Agent' then this
might show some reduction of the pain factor at the proxy
level but you have now exponentially increased the pain
factor at the infamous 'Last Mile'.

Most modern browsers will NOT 'cache' anything that has
a 'Vary:' header OTHER than 'User-Agent:'. This is as true
today as it was 10 years ago.

The following discussion involving myself and some of the 
authors of the SQUID Proxy caching Server took place just 
short of SEVEN (7) YEARS ago but, as unbelievable as it might
seem, is still just as relevant ( and unresolved )...

http://marc.info/?l=apache-modgzip&m=103958533520502&w=2

It's way too long to reproduce here but here is just 
the SUMMARY part. You would have to access the link
above to read all the gory details...

[snip]

> Hello all.
>
> This is a continuation of the thread entitled...
>
> [Mod_gzip] "mod_gzip_send_vary=Yes" disables caching on IE
>
> After several hours spent doing my own testing with MSIE and
> digging into MSIE internals with a kernel debugger I think I
> have the answers.
>
> The news is NOT GOOD.
>
> I will start with a SUMMARY first for those who don't have the
> time to read the whole, ugly story but for those who want to
> know where the following 'conclusions' are coming from I
> refer you to the rest of the message and the "detail".
>
> SUMMARY
>
> There is only 1 request header value that you can use with
> "Vary:" that will cause MSIE to cache a non-compressed
> response and that is ( drum roll please ) "User-Agent".
>
> If you use ANY other (legal) request header field name in
> a "Vary:" header then MSIE ( Versions 4, 5 and 6 ) will
> REFUSE to cache that response in the MSIE local cache.
>
> This is why Jordan is seeing a caching problem and Slava
> is not. Slava is 'accidentally' using the only possible "Vary:"
> field name that will cause MSIE to behave as it should
> and cache a non-compressed response.
>
> Jordan is seeing non-compressed responses never being
> cached by MSIE because the responses are arriving
> with something other than "Vary: User-Agent" like
> "Vary: Accept-Encoding".
>
> It should be perfectly legal and fine to send "Vary: Accept-Encoding"
> on a non-compressed response that can 'Vary' on that field
> value and that response SHOULD be 'cached' by MSIE...
> but so much for assumptions. MSIE will NOT cache this response.
>
> MSIE will treat ANY field name other than "User-Agent"
> as if "Vary: *" ( Vary + STAR ) was used and it will
> NOT cache the non-compressed response.
>
> The reason the COMPRESSED responses are, in fact,
> always getting cached no matter what "Vary:" field name
> is present is just as I suspected... it is because MSIE
> decides it MUST cache responses that arrive with
> "Content-Encoding: gzip" because it MUST have a
> disk ( cache ) file to work with in order to do the
> decompression.
>
> The problem exists in ALL versions of MSIE but it's
> even WORSE for any version earlier than 5.0. MSIE 4.x
> will not even cache responses with "Vary: User-Agent".
>
> That's it for the SUMMARY.
>
> The rest of this message contains the gory details.

[/snip]
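
The rules in that SUMMARY can be restated as a small predicate. This is only a
model of the behavior as reported above for MSIE 4, 5 and 6, not something
verified against the browsers. The function name is hypothetical, and the
"no Vary header means cacheable" branch is an assumption the summary implies
but never states.

```python
def msie_caches_response(msie_major, vary=None, content_encoding=None):
    """Model of the reported MSIE 4/5/6 local-cache decision for a response."""
    # Compressed responses are always cached, because MSIE needs a cache
    # file on disk in order to decompress them.
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return True
    # Assumption: without a Vary header, nothing here blocks caching.
    if not vary:
        return True
    fields = {f.strip().lower() for f in vary.split(",") if f.strip()}
    # "Vary: User-Agent" is the only tolerated value, and only from 5.0 on.
    if fields == {"user-agent"}:
        return msie_major >= 5
    # Any other field name is treated as if it were "Vary: *".
    return False
```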

I participated in another lengthy 'offline' discussion about
all this some 3 or 4 years ago again with the authors of 
SQUID. There was still no real resolution to the problem.

The general consensus was that if there is always going to
be a 'pain factor' then it's better to follow one of the
rules of Networking and assume the following...

"The least amount of resources will always be present
the closer you get to the last mile."

In other words... it's BETTER to live with some redundant
traffic at the proxy level, where the equipment and bandwidth
are usually more robust and closer to the backbone, than to put
the pain factor onto the 'last mile' where resources are usually
more constrained.

If anyone is going to start dropping some special code
anywhere to 'invisibly handle the problem' my suggestion
would be to look at coming up with a scheme that undoes
the damage these out-of-control redundant 'User-Agent' strings are 
causing. The only thing a proxy cache really needs to know is
whether a certain 'User-Agent' string represents a 
different level of DEVCAP than another one. If all that
is changing is a version number and there is no change
with regards to actual Device Capabilities then there's
no reason to cache a separate response for that User Agent.
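
One crude way to sketch such a scheme is to collapse each User-Agent string to
its product names and discard version numbers and comments. The function name
devcap_signature is hypothetical, and the sketch assumes capabilities never
change between versions of the same product. A real scheme would need a table
of the versions where they did.

```python
import re


def devcap_signature(user_agent):
    """Collapse a User-Agent string to its lowercase product names, dropping
    versions and parenthesised comments (nested parentheses not handled)."""
    without_comments = re.sub(r"\([^)]*\)", " ", user_agent)
    products = [token.split("/")[0].lower() for token in without_comments.split()]
    return " ".join(products)
```

A proxy could then key its variants on the signature, so that a new Gecko build
date or Firefox point release does not create another cached copy.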

That still wouldn't represent the ultimate 'fix' for this
multi-variant caching issue... but it sure would be a
step in the right direction.

Yours...
Kevin Kiley

BTW: This posting doesn't even come anywhere near the
real issue which is that even Browsers that 'appear'
to not be able to support 'Accept-Encoding: gzip, deflate'
usually CAN... but it's actually all about MIME TYPES.
The HTTP protocol does NOT provide a way for a client to 
indicate WHICH mime types it can or cannot 'decompress'.
Browsers that appear 'broken' with regards to decompression
are actually only 'broken' for certain MIME types.

That's a completely separate discussion and I'm not
going to 'go there' tonight.


-----Original Message-----
From: William A. Rowe, Jr. <wr...@rowe-clan.net>
To: dev@httpd.apache.org <de...@httpd.apache.org>
Sent: Wed, Aug 26, 2009 1:47 pm
Subject: mod_cache, mod_deflate and Vary: User-Agent


I think we blew it :)

Vary: user-agent is not practical for correcting errant browser behavior.

For example;

  User-Agent: Mozilla/5.0 Gecko/20090729 Firefox/3.5.2

produces a myriad number of 'variant' flavors when tagging Vary with
the User-Agent when determining if the deflate/gzip compression should
be served, or the uncompressed variant.

What we really meant to do was to determine which Accept-Encoding values
were invalid based on known browser bugs, and -remove them- from the A-E
header *prior* to determining the cache handling (quick handler hook) or
typical content handling.

Which implies that setenvif + headers need an extra chance to run really
first in front of the quick handler.

Any better suggestions?