Posted to modules-dev@httpd.apache.org by Joshua Marantz <jm...@google.com> on 2011/06/04 21:26:44 UTC

Vary:User-Agent, best practices, and making the web faster.

Hi,

We've been working on a lingering HTTP-compliance issue in mod_pagespeed:
respecting Vary:User-Agent.  mod_pagespeed needs to cache resources in order
to optimize them.  The economics of this make sense when the server
optimizes a resource, and saves the optimization for serving to multiple
clients.

The problem is that this is, in general, expensive to do correctly when the
site owner has put Vary:User-Agent in the response header for, say, a css or
javascript file.  There are legitimate reasons to do this, such as serving a
different version of a CSS file to IE6.  But I think most sites don't do
that.  However, there is a disturbing passage in the documentation for
mod_deflate: http://httpd.apache.org/docs/2.2/mod/mod_deflate.html:

SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary

This encourages all site owners to add Vary:User-Agent to all css and js
files, whether they actually vary in content or not.

Does anyone know the history of this recommendation?  Surely that is an
inappropriate recommendation for mod_deflate.  Vary:Accept-Encoding makes
sense in the context of that filter, but not Vary:User-Agent.
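
For contrast, here is a rough sketch of why Vary:Accept-Encoding is cheap for a cache to honor: the header value can be collapsed to "client takes gzip" or "client does not", so a resource has at most two variants.  This is illustrative Python, not mod_deflate's logic; accepts_gzip and variant_key are hypothetical names, and the q-value handling is simplified.

```python
def _qvalue(params):
    """Return the q parameter from a list like ['q=0.5'], defaulting to 1.0."""
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "q":
            try:
                return float(value.strip())
            except ValueError:
                return 0.0  # treat an unparsable q as "not acceptable"
    return 1.0


def accepts_gzip(accept_encoding):
    """True if the Accept-Encoding value allows gzip (simplified parsing)."""
    gzip_q = None
    star_q = None
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if coding in ("gzip", "x-gzip"):
            gzip_q = _qvalue(parts[1:])
        elif coding == "*":
            star_q = _qvalue(parts[1:])
    # An explicit gzip entry wins over the wildcard.
    if gzip_q is not None:
        return gzip_q > 0
    return star_q is not None and star_q > 0


def variant_key(url, accept_encoding):
    """Hypothetical cache key: one of at most two variants per URL."""
    return (url, "gzip" if accepts_gzip(accept_encoding) else "identity")
```

However many distinct Accept-Encoding strings clients send, variant_key maps them onto two cache entries per URL.  No comparable normalization exists for User-Agent unless the cache knows which browsers the origin actually distinguishes.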

The problem with defensively setting Vary:User-Agent is that any proxy cache
-- and in this respect mod_pagespeed acts like a proxy cache -- must fetch
origin content for each distinct user-agent.  It's feasible for us to
employ a level of indirection in our cache so that we only store extra
copies of a resource when it actually differs, but even so this can, I fear,
be catastrophic to the cache working-set and hit-rate.  We couldn't get around
storing an entry for each distinct user-agent.
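
To make the cost concrete, here is a minimal sketch of a cache that honors Vary and uses the level of indirection described above.  It is illustrative Python, not mod_pagespeed code; VaryCache, the two origin callbacks, and the CSS body are all hypothetical.  Bodies are stored once per content hash, but the cache still needs one index entry and one origin fetch per distinct user-agent.

```python
import hashlib


class VaryCache:
    """Toy HTTP cache keyed on URL plus the request headers named in Vary."""

    def __init__(self):
        self.vary_for_url = {}  # url -> tuple of lower-cased header names from Vary
        self.variants = {}      # (url, selected header values) -> body digest
        self.bodies = {}        # body digest -> body bytes (each distinct body stored once)
        self.origin_fetches = 0

    def _variant_key(self, url, request_headers):
        names = self.vary_for_url.get(url, ())
        return (url, tuple(request_headers.get(n, "") for n in names))

    def get(self, url, request_headers, fetch_from_origin):
        request_headers = {k.lower(): v for k, v in request_headers.items()}
        key = self._variant_key(url, request_headers)
        if url in self.vary_for_url and key in self.variants:
            return self.bodies[self.variants[key]]  # cache hit

        # Miss: go to the origin and remember which headers it varies on.
        body, response_headers = fetch_from_origin(url, request_headers)
        self.origin_fetches += 1
        vary = response_headers.get("Vary", "")
        self.vary_for_url[url] = tuple(
            sorted(h.strip().lower() for h in vary.split(",") if h.strip())
        )
        digest = hashlib.sha256(body).hexdigest()
        self.bodies.setdefault(digest, body)  # the indirection: share identical bodies
        self.variants[self._variant_key(url, request_headers)] = digest
        return body


def origin_with_vary(url, request_headers):
    # Hypothetical origin: same bytes for everyone, but defensively sets Vary.
    return b"body { margin: 0 }", {"Vary": "User-Agent"}


def origin_without_vary(url, request_headers):
    # Hypothetical origin: same bytes, no Vary header.
    return b"body { margin: 0 }", {}
```

Requesting one stylesheet from 50 different user-agent strings through origin_with_vary leaves a single stored body but 50 variant entries and 50 origin fetches; through origin_without_vary it is one entry and one fetch.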


So, there are two questions:

1. Who can I lobby to get the recommendation changed for the mod_deflate
doc?  That recommendation seems incorrect and/or obsolete.
2. Given that there are likely a huge number of sites that blindly followed
that recommendation, is there a straightforward way for mod_pagespeed to
correct the situation?  Specifically, can mod_pagespeed get access to Apache
configuration parameters that were added by other filters, looking
specifically for the pattern quoted above with SetEnvIfNoCase & Header?  Is
it OK for mod_pagespeed to register for config-parameters owned by another
module?

I think what we'd do is basically let mod_pagespeed ignore "Vary:User-Agent"
if we saw that it was inserted per this exact pattern.  This would, to be
pedantic, violate HTTP, but I think it would help make the web faster, and
in practice would help many more sites than it would hurt.  Sites that
specifically added vary:user-agent using a more specific construct, such as
identifying a particular CSS file that they want to Vary, would be treated
differently.

-Josh

Re: Vary:User-Agent, best practices, and making the web faster.

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Sun, Jun 5, 2011 at 21:37, Joshua Marantz <jm...@google.com> wrote:
> Does Magento actually vary the content of CSS & JS based on user-agent?  Or
> does it only vary the content of HTML?

I don't know. I'm by no means a Magento expert, I only run into it
from time to time. That site I broke? That was in the summer of 2009
while beefing up the security and performance of a large retailer's
web shop, mostly by putting stuff behind reverse proxies.

Re: Vary:User-Agent, best practices, and making the web faster.

Posted by Joshua Marantz <jm...@google.com>.
Thanks for the feedback, Ben!  You are omniscient in the ways of Apache.
I'll try to lobby for an update to the mod_deflate page.

Your concerns about Magento are interesting -- my impression from our forums
and Twitter is that mod_pagespeed is successfully accelerating many Magento
sites now, since releasing the mod_rewrite workarounds, which we were able
to do thanks to your help a few months ago.  The reason that we don't have
problems with vary:user-agent on HTML is that we don't ever cache HTML under
any circumstance.  We assume HTML generally varies on user-agent, cookies,
locale-of-client, and is generally updated frequently.

Does Magento actually vary the content of CSS & JS based on user-agent?  Or
does it only vary the content of HTML?

You've dissuaded me from trying to infer the intent of the site
administrator, and we will find another way to phase in vary:user-agent
compliance without falling off a performance cliff.

-Josh

On Sun, Jun 5, 2011 at 2:54 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:

> On Sun, Jun 5, 2011 at 13:42, Joshua Marantz <jm...@google.com> wrote:
> > This is a case where the content varies based on user-agent.  The
> > recommendation on the mod_deflate doc page is add vary:user-agent for any
> > non-image.  Can you think of a case where the absence of a vary:user-agent
> > header causes broken behavior when the content doesn't vary?
> >
> > I'm not objecting to setting vary:user-agent when content varies: that's
> > what it's for.  I'm objecting to setting vary:user-agent when content does
> > *not* vary.  The mod_deflate documentation unambiguously recommends setting
> > vary:user-agent, and my feeling is that this is to work around a bug that
> > exists only in IE5 or pre-2007 patch of IE6.
>
> Sorry, Joshua, we're conflating things. You raised two issues in your
> original post:
>
> 1. Updating the mod_deflate documentation. Seems reasonable. The Vary:
> UA recommendation was added in 2002 during a general clean-up of the
> mod_deflate documentation and the commit log doesn't tell why. You
> could open a bugzilla issue or raise it on the httpd-dev mailing list
> (the former is the proper channel but the bugzilla is something of a
> graveyard).
>
> 2. mod_pagespeed second-guessing the user's intent. That still seems
> like an unambiguously bad idea. To touch on Magento again, its
> documentation links (or linked) directly to that section of the
> mod_deflate docs and people are using that. If your module scans for
> and neutralizes that Header directive, you will break someone's site.
>

Re: Vary:User-Agent, best practices, and making the web faster.

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Sun, Jun 5, 2011 at 13:42, Joshua Marantz <jm...@google.com> wrote:
> This is a case where the content varies based on user-agent.  The
> recommendation on the mod_deflate doc page is add vary:user-agent for any
> non-image.  Can you think of a case where the absence of a vary:user-agent
> header causes broken behavior when the content doesn't vary?
>
> I'm not objecting to setting vary:user-agent when content varies: that's
> what it's for.  I'm objecting to setting vary:user-agent when content does
> *not* vary.  The mod_deflate documentation unambiguously recommends setting
> vary:user-agent, and my feeling is that this is to work around a bug that
> exists only in IE5 or pre-2007 patch of IE6.

Sorry, Joshua, we're conflating things. You raised two issues in your
original post:

1. Updating the mod_deflate documentation. Seems reasonable. The Vary:
UA recommendation was added in 2002 during a general clean-up of the
mod_deflate documentation and the commit log doesn't tell why. You
could open a bugzilla issue or raise it on the httpd-dev mailing list
(the former is the proper channel but the bugzilla is something of a
graveyard).

2. mod_pagespeed second-guessing the user's intent. That still seems
like an unambiguously bad idea. To touch on Magento again, its
documentation links (or linked) directly to that section of the
mod_deflate docs and people are using that. If your module scans for
and neutralizes that Header directive, you will break someone's site.

Re: Vary:User-Agent, best practices, and making the web faster.

Posted by Joshua Marantz <jm...@google.com>.
On Sun, Jun 5, 2011 at 7:32 AM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:

> On Sun, Jun 5, 2011 at 02:15, Joshua Marantz <jm...@google.com> wrote:
> > On Sat, Jun 4, 2011 at 7:58 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:
> >> Some popular OSS packages depend on Vary: User-Agent to make
> >> downstream proxies (reverse or forward) do the right thing.
> >
> > I'm pretty interested in deconstructing this further.  Can you be more
> > specific?   Which OSS packages?  Under what scenario would a proxy do the
> > wrong thing in the absence of Vary:User-Agent (other than, obviously, when
> > the content actually varies based on user-agent)?
>
> From first-hand experience (because I broke it): Magento, a popular
> PHP e-commerce framework. Magento (or one of its plug-ins) generates
> browser-tailored HTML and sets the Vary header to ensure that
> downstream proxies send the right HTML to the right client. If you
> remove or ignore the header, the layout of your site breaks.
>

This is a case where the content varies based on user-agent.  The
recommendation on the mod_deflate doc page is to add vary:user-agent for any
non-image.  Can you think of a case where the absence of a vary:user-agent
header causes broken behavior when the content doesn't vary?

I'm not objecting to setting vary:user-agent when content varies: that's
what it's for.  I'm objecting to setting vary:user-agent when content does
*not* vary.  The mod_deflate documentation unambiguously recommends setting
vary:user-agent, and my feeling is that this is to work around a bug that
exists only in IE5 or pre-2007 patch of IE6.

-Josh

Re: Vary:User-Agent, best practices, and making the web faster.

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Sun, Jun 5, 2011 at 02:15, Joshua Marantz <jm...@google.com> wrote:
> On Sat, Jun 4, 2011 at 7:58 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:
>> Some popular OSS packages depend on Vary: User-Agent to make
>> downstream proxies (reverse or forward) do the right thing.
>
> I'm pretty interested in deconstructing this further.  Can you be more
> specific?   Which OSS packages?  Under what scenario would a proxy do the
> wrong thing in the absence of Vary:User-Agent (other than, obviously, when
> the content actually varies based on user-agent)?

From first-hand experience (because I broke it): Magento, a popular
PHP e-commerce framework. Magento (or one of its plug-ins) generates
browser-tailored HTML and sets the Vary header to ensure that
downstream proxies send the right HTML to the right client. If you
remove or ignore the header, the layout of your site breaks.

There are CPAN modules and Rack middleware that do similar things and
no doubt other software too.

Re: Vary:User-Agent, best practices, and making the web faster.

Posted by Joshua Marantz <jm...@google.com>.
On Sat, Jun 4, 2011 at 7:58 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:

> > And I still don't understand how that relates to Vary:User-Agent.  What's
> > really at issue here seems more related to proxies; is that right?  That
> > proxies were not respecting Accept-Encoding, but sending gzipped content to
> > browsers that did not want it?  Is that still a problem?  Which proxies were
> > broken?  Are they still broken?
>
> Some popular OSS packages depend on Vary: User-Agent to make
> downstream proxies (reverse or forward) do the right thing.
>

I'm pretty interested in deconstructing this further.  Can you be more
specific?   Which OSS packages?  Under what scenario would a proxy do the
wrong thing in the absence of Vary:User-Agent (other than, obviously, when
the content actually varies based on user-agent)?

> > And, while I understand the reluctance to help me figure out from our module
> > what values were passed to SetEnvIfNoCase and Header, I would like to see
> > whether there's agreement that the Apache 2.2 docs for mod_deflate are no
> > longer appropriate -- and in fact harmful.
>
> I've been mulling it over for 10 minutes and I can't decide. It's
> harmful because it leads to a proliferation of cached objects (bad)
>

I think that at least some proxies would likely decide to simply *not* cache
in the presence of vary:user-agent, rather than explode their caches.  That
makes the web slower.  But Varnish, in particular, will explode its cache:
http://www.varnish-cache.org/docs/trunk/tutorial/vary.html.  I believe that
will also make the web slower, because the hit-rate will suffer and there'll
be less room in the cache for differentiated content.
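
A back-of-the-envelope way to see the hit-rate effect is to run the same request stream through a fixed-size LRU cache twice, once keyed on URL alone and once keyed on URL plus User-Agent.  The sketch below is Python with made-up numbers and a deliberately unfriendly cyclic access pattern; simulate is a hypothetical helper and does not model Varnish or any real proxy.

```python
from collections import OrderedDict


def simulate(capacity, urls, agents, rounds, vary_on_user_agent):
    """Replay a cyclic request stream through an LRU cache; return (hits, misses)."""
    cache = OrderedDict()  # insertion order doubles as recency order
    hits = misses = 0
    for _ in range(rounds):
        for ua in agents:
            for url in urls:
                # With Vary:User-Agent every (url, ua) pair is its own entry.
                key = (url, ua) if vary_on_user_agent else (url,)
                if key in cache:
                    cache.move_to_end(key)  # mark as most recently used
                    hits += 1
                else:
                    misses += 1
                    cache[key] = True
                    if len(cache) > capacity:
                        cache.popitem(last=False)  # evict least recently used
    return hits, misses


urls = ["/r%d.css" % i for i in range(20)]      # 20 hypothetical resources
agents = ["UA-%d" % i for i in range(10)]       # 10 hypothetical user-agents
without_vary = simulate(100, urls, agents, 5, vary_on_user_agent=False)
with_vary = simulate(100, urls, agents, 5, vary_on_user_agent=True)
```

With room for 100 entries, the 20 URL-only keys fit comfortably and only the first 20 of the 1000 requests miss.  Keyed on user-agent there are 200 distinct entries cycling through a 100-entry cache, so in this worst-case ordering every request misses, even though the bytes served never differ.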


> but removing it from the documentation will break things for someone
> somewhere (also bad).
>

I'm trying to get a handle on exactly what would break, and for whom :)

-Josh

Re: Vary:User-Agent, best practices, and making the web faster.

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Sun, Jun 5, 2011 at 00:34, Joshua Marantz <jm...@google.com> wrote:
> It was with some reluctance that I brought this up.  It occurs to me that
> this idea propagates the sort of spec violations that led to this issue
> (inappropriate use of Vary:User-Agent) in the first place.   However, I'm
> trying to figure out how to improve compliance to support legitimate uses of
> Vary:User-Agent without causing mod_pagespeed to become significantly less
> effective across a broad range of sites.
>
> We have found that putting complaints in Apache logs mostly causes disks to
> fill and servers to crash -- although that does get it noticed :).  The
> problem, put another way, is that mod_pagespeed cannot distinguish
> legitimate uses of Vary:User-Agent, so it really has no business complaining
> in logs.  Complaining in docs is fine; but some existing mod_pagespeed users
> that simply type "sudo yum update" will later notice a performance-drop and
> may not consult the docs to figure out why.
>
> I'm also trying to grok the first response from Eric:
>
> It's because of the other (dated) canned exceptions that set/unset
> no-gzip/gzip-only-text/html based on the User-Agent, to second-guess
> browsers that send AE:gzip but can't properly deal with it.
>
>
> Going backwards:  which browsers send AE:gzip but can't properly deal with
> it?   Does IE6 have that issue or is it only true of IE5?   I know that IE6
> has had issues with compression in the past but they appear to be addressed
> by patches issued by Microsoft four and a half years ago:
> http://support.microsoft.com/default.aspx?scid=kb;en-us;Q312496.  Moreover
> IE6 is shrinking in market share (~10%; see
> http://arstechnica.com/web/news/2011/05/web-browser-market-share-upgrade-analysis.ars)
> and IE5 does not appear in the pie-chart at all.

This was indeed a (since fixed) problem with IE6. I haven't seen the
gzip issue crop up since but that is purely anecdotal.

> And I still don't understand how that relates to Vary:User-Agent.  What's
> really at issue here seems more related to proxies; is that right?  That
> proxies were not respecting Accept-Encoding, but sending gzipped content to
> browsers that did not want it?  Is that still a problem?  Which proxies were
> broken?  Are they still broken?

Some popular OSS packages depend on Vary: User-Agent to make
downstream proxies (reverse or forward) do the right thing.

> And, while I understand the reluctance to help me figure out from our module
> what values were passed to SetEnvIfNoCase and Header, I would like to see
> whether there's agreement that the Apache 2.2 docs for mod_deflate are no
> longer appropriate -- and in fact harmful.

I've been mulling it over for 10 minutes and I can't decide. It's
harmful because it leads to a proliferation of cached objects (bad)
but removing it from the documentation will break things for someone
somewhere (also bad).

Re: Vary:User-Agent, best practices, and making the web faster.

Posted by Joshua Marantz <jm...@google.com>.
It was with some reluctance that I brought this up.  It occurs to me that
this idea propagates the sort of spec violations that led to this issue
(inappropriate use of Vary:User-Agent) in the first place.   However, I'm
trying to figure out how to improve compliance to support legitimate uses of
Vary:User-Agent without causing mod_pagespeed to become significantly less
effective across a broad range of sites.

We have found that putting complaints in Apache logs mostly causes disks to
fill and servers to crash -- although that does get it noticed :).  The
problem, put another way, is that mod_pagespeed cannot distinguish
legitimate uses of Vary:User-Agent, so it really has no business complaining
in logs.  Complaining in docs is fine; but some existing mod_pagespeed users
that simply type "sudo yum update" will later notice a performance-drop and
may not consult the docs to figure out why.

I'm also trying to grok the first response from Eric:

> It's because of the other (dated) canned exceptions that set/unset
> no-gzip/gzip-only-text/html based on the User-Agent, to second-guess
> browsers that send AE:gzip but can't properly deal with it.


Going backwards:  which browsers send AE:gzip but can't properly deal with
it?   Does IE6 have that issue or is it only true of IE5?   I know that IE6
has had issues with compression in the past but they appear to be addressed
by patches issued by Microsoft four and a half years ago:
http://support.microsoft.com/default.aspx?scid=kb;en-us;Q312496.  Moreover
IE6 is shrinking in market share (~10%; see
http://arstechnica.com/web/news/2011/05/web-browser-market-share-upgrade-analysis.ars)
and IE5 does not appear in the pie-chart at all.

And I still don't understand how that relates to Vary:User-Agent.  What's
really at issue here seems more related to proxies; is that right?  That
proxies were not respecting Accept-Encoding, but sending gzipped content to
browsers that did not want it?  Is that still a problem?  Which proxies were
broken?  Are they still broken?

And, while I understand the reluctance to help me figure out from our module
what values were passed to SetEnvIfNoCase and Header, I would like to see
whether there's agreement that the Apache 2.2 docs for mod_deflate are no
longer appropriate -- and in fact harmful.

-Josh

On Sat, Jun 4, 2011 at 5:03 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:

> On Sat, Jun 4, 2011 at 21:26, Joshua Marantz <jm...@google.com> wrote:
> > I think what we'd do is basically let mod_pagespeed ignore "Vary:User-Agent"
> > if we saw that it was inserted per this exact pattern.  This would, to be
>
> This seems like a stupendously bad idea. Warn about it in your docs,
> complain about it in the logs but don't willy-nilly override people's
> settings.
>

Re: Vary:User-Agent, best practices, and making the web faster.

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Sat, Jun 4, 2011 at 21:26, Joshua Marantz <jm...@google.com> wrote:
> I think what we'd do is basically let mod_pagespeed ignore "Vary:User-Agent"
> if we saw that it was inserted per this exact pattern.  This would, to be

This seems like a stupendously bad idea. Warn about it in your docs,
complain about it in the logs but don't willy-nilly override people's
settings.

Re: Vary:User-Agent, best practices, and making the web faster.

Posted by Eric Covener <co...@gmail.com>.
> This encourages all site owners to add Vary:User-Agent to all css and js
> files, whether they actually vary in content or not.
>
> Does anyone know the history of this recommendation?  Surely that is an
> inappropriate recommendation for mod_deflate.  Vary:Accept-Encoding makes
> sense in the context of that filter, but not Vary:User-Agent.

It's because of the other (dated) canned exceptions that set/unset
no-gzip/gzip-only-text/html based on the User-Agent, to second-guess
browsers that send AE:gzip but can't properly deal with it.