Posted to users@trafficserver.apache.org by Nick Muerdter <st...@nickm.org> on 2014/07/29 20:17:55 UTC

Maximizing cache hits when dealing with gzip

Hi,

For cacheable, gzippable requests, I'm trying to ensure that our origin
servers only get hit once, regardless of whether clients request gzip or
not. In other words, if the first client to hit an uncached resource
accepts gzip, I want all subsequent gzipped or non-gzipped responses to
be delivered from the cache, rather than hitting the origin server
again.

Traffic Server's gzip plugin does seem to support this behavior in some
situations, but not universally. So I'm not sure if this is a bug in the
gzip plugin, or if I've misconfigured things, or this simply isn't
supported by the gzip plugin and Traffic Server. Any thoughts or ideas
would be welcome.

The main issue I'm running into is when the origin server supports
gzipping responses itself (so it returns "Vary: Accept-Encoding"
headers). In that case, Traffic Server caches the gzipped and
non-gzipped versions of the response separately, incurring two separate
origin requests. Setting "remove-accept-encoding true" almost solves
this, except when the first request asks for gzip (the client sends
"Accept-Encoding: gzip") and the server response contains
"Vary: Accept-Encoding". In that case, a subsequent uncompressed request
(omitting any "Accept-Encoding" header) still results in another hit to
the origin server.
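
To make the duplicate origin request concrete, here is a rough Python
sketch of a cache that keys variants on the headers named in "Vary". It
is only a toy model of the general HTTP behavior, not how Traffic Server
or the gzip plugin is implemented; VaryCache and fake_origin are
hypothetical names made up for this illustration.

```python
class VaryCache:
    """Toy HTTP cache that stores one variant per Vary header value."""

    def __init__(self, origin):
        self.origin = origin      # callable(url, headers) -> response dict
        self.origin_hits = 0
        self.vary = {}            # url -> header names listed in Vary
        self.store = {}           # (url, variant values) -> response

    def _variant(self, url, headers):
        # Build the variant key from the request headers named in Vary.
        return tuple(headers.get(name, "") for name in self.vary.get(url, ()))

    def get(self, url, headers):
        key = (url, self._variant(url, headers))
        if url in self.vary and key in self.store:
            return self.store[key]            # cache hit for this variant
        self.origin_hits += 1                 # miss: go to the origin
        response = self.origin(url, headers)
        vary = response["headers"].get("Vary", "")
        self.vary[url] = tuple(
            name.strip().lower() for name in vary.split(",") if name.strip()
        )
        self.store[(url, self._variant(url, headers))] = response
        return response


def fake_origin(url, headers):
    """Hypothetical origin that gzips on request and always sends Vary."""
    wants_gzip = "gzip" in headers.get("accept-encoding", "")
    response_headers = {"Vary": "Accept-Encoding"}
    if wants_gzip:
        response_headers["Content-Encoding"] = "gzip"
    return {"headers": response_headers, "body": b"payload for " + url.encode()}


cache = VaryCache(fake_origin)
cache.get("/page", {"accept-encoding": "gzip"})   # first client accepts gzip
cache.get("/page", {})                            # second client does not
print(cache.origin_hits)                          # two separate origin requests
```

The second request misses because its (empty) Accept-Encoding value
selects a different variant than the one stored for the first request,
which is the situation I'd like to avoid.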

And while "remove-accept-encoding true" comes closer to solving the
issue, I'd ideally like to achieve this behavior with
"remove-accept-encoding false", since in some cases, I'd prefer to have
the gzipping handled by the origin server and the backend communication
happen with gzipped responses.

Here's an excerpt from some automated integration tests I've written to
test all the various gzip request/response combinations I could come up
with. This might more clearly define which situations are currently
working and which ones aren't, but let me know if any of this still
isn't clear:

https://gist.github.com/GUI/4ab8eacb6bd21f39d590

This was tested on Traffic Server 5.0.1. I could also abstract this part
of our test suite into some isolated test scripts if anyone wants to try
to reproduce it.

And for whatever it's worth, Varnish appears to behave the way I want,
so it seems to be within the realm of possibility, but Varnish
also seems to deal with gzip quite differently (I think it always stores
the gzipped version and un-gzips on the fly for uncompressed clients).
I've tried various tweaks to the gzip and vary settings in Traffic
Server, but I can't get rid of these duplicate requests when some
clients support gzip and others don't.
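
As a sketch of the behavior I'm after, the following Python models the
approach I believe Varnish takes: keep a single gzipped copy per URL and
decompress it on the fly for clients that don't accept gzip. This is
based only on my understanding described above, not on Varnish's actual
code; GzipStoringCache and plain_origin are hypothetical names.

```python
import gzip


class GzipStoringCache:
    """Toy cache that stores only the gzipped body for each URL."""

    def __init__(self, origin):
        self.origin = origin      # callable(url) -> uncompressed bytes
        self.origin_hits = 0
        self.store = {}           # url -> gzipped body

    def get(self, url, accept_encoding=""):
        if url not in self.store:
            self.origin_hits += 1                       # only the first request misses
            self.store[url] = gzip.compress(self.origin(url))
        stored = self.store[url]
        if "gzip" in accept_encoding:
            return stored, "gzip"                       # serve the stored copy as-is
        return gzip.decompress(stored), "identity"      # un-gzip for this client


def plain_origin(url):
    """Hypothetical origin returning an uncompressed, compressible body."""
    return b"hello world " * 100


cache = GzipStoringCache(plain_origin)
gz_body, gz_encoding = cache.get("/page", "gzip")
raw_body, raw_encoding = cache.get("/page")
print(cache.origin_hits, gz_encoding, raw_encoding)
```

With a single stored variant, both kinds of clients are served from the
cache after one origin request, at the cost of decompressing for every
client that doesn't accept gzip.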

Thanks!
Nick

Re: Maximizing cache hits when dealing with gzip

Posted by Nick Muerdter <st...@nickm.org>.
Thanks for the quick response and for offering to take a look
at it. Much appreciated! It's definitely helpful to know that
this isn't currently supported, but if an option could someday
be added to make gzip behave this way, that would be fantastic.
I'm unfortunately not very familiar with the Traffic Server
code base, but if I can help out in any way, give me a shout.

Thanks again!
Nick

On Tue, Jul 29, 2014, at 03:21 PM, Otto van der Schaaf wrote:

Regrettably, the gzip plugin currently doesn't support doing
what you want. But I think it would be really nice to (add an
option to) make it work like that.  I'll have a look, but allow
me a week or two to get back to you about this.

Otto




Re: Maximizing cache hits when dealing with gzip

Posted by Otto van der Schaaf <os...@gmail.com>.
Regrettably, the gzip plugin currently doesn't support doing what you want.
But I think it would be really nice to (add an option to) make it work like
that.  I'll have a look, but allow me a week or two to get back to you
about this.

Otto


2014-07-29 20:17 GMT+02:00 Nick Muerdter <st...@nickm.org>:

> Hi,
>
> For cacheable, gzippable requests, I'm trying to ensure that our origin
> servers only get hit once, regardless of whether clients request gzip or
> not. In other words, if the first client to hit an uncached resource
> accepts gzip, I want all subsequent gzipped or non-gzipped responses to
> be delivered from the cache, rather than hitting the origin server
> again.
>
> TrafficServer's gzip plugin does seem to support this behavior in some
> situations, but not universally. So I'm not sure if this is a bug in the
> gzip plugin, or if I've misconfigured things, or this simply isn't
> supported by the gzip plugin and Traffic Server. Any thoughts or ideas
> would be welcome.
>
> The main issue I'm running into is when the origin server supports
> gzipping responses itself (so it returns "Vary: Accept-Encoding"
> headers). In that case, TrafficServer wants to cache the gzipped and
> non-gzipped versions of the response separately, incurring two separate
> origin requests. If I set "remove-accept-encoding true" this almost
> solves things, except when the first request requests gzipping (the
> client sets "Accept-Encoding: gzip") and the server response contains
> "Vary: Accept-Encoding". In that case, a subsequent uncompressed request
> (omitting any "Accept-Encoding" header) still results in another hit to
> the origin server.
>
> And while "remove-accept-encoding true" comes closer to solving the
> issue, I'd ideally like to achieve this behavior with
> "remove-accept-encoding false", since in some cases, I'd prefer to have
> the gzipping handled by the origin server and the backend communication
> happen with gzipped responses.
>
> Here's an excerpt from some automated integration tests I've written to
> test all the various gzip request/response combinations I could come up
> with. This might more clearly define which situations are currently
> working and which ones aren't, but let me know if any of this still
> isn't clear:
>
> https://gist.github.com/GUI/4ab8eacb6bd21f39d590
>
> This was tested on Traffic Server 5.0.1. I could also abstract this part
> of our test suite into some isolated test scripts if anyone wants to try
> to reproduce it.
>
> And for whatever it's worth, Varnish appears to behave the way I want,
> so it seems like it might be in the realm of possibilities, but Varnish
> also seems to deal with gzip quite differently (I think it always stores
> the gzipped version and un-gzips on the fly for uncompressed clients).
> I've tried various tweaks to the gzip and vary settings in Traffic
> Server, but I can't seem to get rid of these duplicate requests in some
> cases when different clients support gzip or not.
>
> Thanks!
> Nick
>