You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by Konstantin Kolinko <kn...@gmail.com> on 2013/10/13 15:11:39 UTC

Serving gz files in DefaultServlet (Re: r1531115, BZ 54095)

Hi!

Regarding this commit:
2013/10/11  <ma...@apache.org>:
> Author: markt
> Date: Thu Oct 10 21:24:59 2013
> New Revision: 1531115
>
> URL: http://svn.apache.org/r1531115
> Log:
> Fix https://issues.apache.org/bugzilla/show_bug.cgi?id=54095
> Add support to the Default Servlet for serving gzipped versions of static resources directly from disk as an alternative to Tomcat compressing them on each request. Patch by Philippe Marschall.
>
> Added:
>     tomcat/trunk/test/webapp/index.html.gz   (with props)
> Modified:
>     tomcat/trunk/conf/web.xml
>     tomcat/trunk/java/org/apache/catalina/servlets/DefaultServlet.java
>     tomcat/trunk/test/org/apache/catalina/servlets/TestDefaultServlet.java
>     tomcat/trunk/webapps/docs/changelog.xml
>     tomcat/trunk/webapps/docs/default-servlet.xml
>

General:
I think this feature should be opt-in, like the listings feature of
DefaultServlet, being off by default.

Concerns:
(1) Excessive disk access to check existence of gz files.

(2) Additional access path for ".gz" files, which might be not covered
by security constraints

(3) Interoperability with filters that may preprocess or postprocess
the response,
including ISE handling in the following lines of DefaultServlet:

[[[
            try {
                ostream = response.getOutputStream();
            } catch (IllegalStateException e) {
                ...
                writer = response.getWriter();
]]]

(2) and (3) would occur only if both "foo" and "foo.gz" files are present.

As (2) requires the presence and accessibility of "foo" file (as
"resource.exists()" check is done before gzip processing), I see no
exploitable issue here, though we may mention this on the
"security-howto" page.


Technical issues with the code:
1) "If-Modified-Since" header processing is inconsistent,

The checkIfHeaders(...) call happens before gzip processing and checks
the date of original resource, not of gz one,
but "ETag" and "Last-Modified" headers will be for the gz resource.

I think it is better to respond with "ETag" and "Last-Modified" of the
original resource.

2) I think that if it is an "included" resource, the gzip feature
should be disabled.
See lines 751-752 as an example:
[[[
            boolean included = (request.getAttribute(
                    RequestDispatcher.INCLUDE_CONTEXT_PATH) != null);
]]]

3) The following line:
>+       gzipResource.setMimeType(contentType);

I think that line is not needed here, as the "Content-Type" header is
served from a local variable, not property of the resource.
Moreover I suspect that it changes mime-type of a cached resource and
a subsequent direct request for "foo.gz" will be served with wrong
mime-type.

Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Serving gz files in DefaultServlet (Re: r1531115, BZ 54095)

Posted by Rainer Jung <ra...@kippdata.de>.
On 15.10.2013 18:43, Christopher Schultz wrote:

> It's also worth pointing out that what has been implemented for Tomcat
> is a serious pain in the neck to do in httpd. Basically, to get it to
> work as expected in httpd, you need several animal sacrifices and a fair
> amount of arcane words to get it done. At this point, I have the animal
> part done but my pronunciation must be off because I still don't have it
> working properly.

It seems I sacrificed the right animals. Here's an old recipe of mine
once done for Apache 2.2. Might be a bit easier in 2.4 using the more
powerful expressions:

Explanation:

1) mod_rewrite checks, whether the browser accepts gzip encoded
responses (RewriteCond using Accept-Encoding header from the request)

2) If "yes", check whether the URL indicates that we do have pre
compressed content on disk

3) If both are "yes", then add a .gz in front of the file suffix,
rewrite URI and remember the decision in an env var. Not adding .gz at
the end of the file name is important to let Apache still set the
correct Content-Type header from the original and unchanged file ending.

4) Finally use mod_headers to set the response Content-Encoding header
to "gzip" (if we find that the env var is set).

5) Caches will be informed by the Vary response header that there was a
content "negiotiation" going on (without mod_negotiation). This is done
by mod_rewrite automatically, it adds "Accept-Encoding" to the Vary header.


Example config:

# This needs mod_rewrite and mod_headers
# loaded as modules.

# Make static content available.
# Not needed if already mounted elsewhere.

Alias /static /my/path/to/static

# Activate mod_rewrite and debug logging.
# Not needed if mod_rewrite is already
# activated for this VHost elsewhere.

RewriteEngine On
RewriteLog "|/path/to/bin/rotatelogs /oath/to/rewrite_log 43200"
# Only for debugging
RewriteLogLevel 255

# Flip in compressed content if allowed.
# Assumes all the compressed files are on disk
# having the correct names:
# something.css -> something.gz.css
# something.js -> something.gz.js

# 1) Check whether browser accepts "gzip" encoding
RewriteCond %{HTTP:Accept-Encoding} gzip
# 2) Check whether request belongs to our
#    static URLs and has the right suffix
#    If yes, add ".gz" to URL before existing suffix
#    and remember this in our custom environment variable.
RewriteRule (/static/.*)\.(css|js)$ $1.gz.$2 [E=gz:1]

# Fix returned encoding header, the file was gzipped.
Header set Content-Encoding gzip env=gz

# Notes:
#
# - Be careful when introducing loops for rewrite rules:
#   The new .gz.js etc. file would again match the rule
#   leading to unterminated recursion.
#   Make regexp more precise in that case (not allowing the .gz.)
#   to match again.
#
# - Content-Type header is OK, because file suffix hasn't changed.
#   This would not work for files without suffix, because then
#   we end up with a ".gz" suffix!
#
# - Vary header is automatically extended with "Accept-Encoding"
#   by mod_rewrite because of using the "Accept-Encoding" header
#   in the RewriteCond
#
# - Old-style "Accept-Encoding: x-gzip" in request also works.
#   The "gzip" is a sub pattern match (not anchored).
#
# Open Questions:
#
# - Is there any interoperability issue when mod_deflate is
#   activated in addition (double compress or similar).
#   If so, try to set env var "no-gzip" to deactivate mod_deflate
#   for those requests.


Finally a simple script to add the pre-compressed content to the disk:

CONTENT_DIR=/path/to/static/content
for suffix in css js
do
for file in `find $CONTENT_DIR -type f -name "*.$suffix" -a ! -name
"*.gz.*"`
do
gzfile=`echo $file | sed -e 's#\.'$suffix'#.gz.'$suffix'#'`
gzip --best -c $file > $gzfile
chmod 644 $gzfile
echo === $file $gzfile ===
ls -ld $file $gzfile
done
done

Or similar.

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Serving gz files in DefaultServlet (Re: r1531115, BZ 54095)

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Konstantin,

On 10/15/13 7:15 AM, Konstantin Kolinko wrote:
> 2013/10/13 Mark Thomas <ma...@apache.org>:
>> On 13/10/2013 14:11, Konstantin Kolinko wrote:
>>
>> <snip/>
>>
>>>> URL: http://svn.apache.org/r1531115
>>>> Log:
>>>> Fix https://issues.apache.org/bugzilla/show_bug.cgi?id=54095
>>>> Add support to the Default Servlet for serving gzipped versions of static resources directly from disk as an alternative to Tomcat compressing them on each request. Patch by Philippe Marschall.
>>
>> <snip/>
>>
>>> General:
>>> I think this feature should be opt-in, like the listings feature of
>>> DefaultServlet, being off by default.
>>
>> I disagree since:
>> - this is only in 8.0.x and we haven't had a stable release yet.
>> - the user has to create the gzip'd version which is unlikely to exist
>> be default before this feature does anything
>>
>> I agree if it is ever back-ported to earlier versions it needs to be
>> disabled by default.
>>
> 
> My concern is that this feature is not stated by Specification, and by
> default I think people should not expect this behaviour from "default"
> servlet. Thus I think it should be an opt-in feature.
> 
> 
> The files may already exist there for some other reasons. E.g. if an
> application uses a framework that implements this feature by itself
> independently of Tomcat.
> 
>>> (2) Additional access path for ".gz" files, which might be not covered
>>> by security constraints
>>>
> 
> E.g. if "foo.gz" is protected by a constraint and should not be served
> directly,  it can now be accessed by asking for "foo".  It only works
> if "foo" exists as well, so unlikely it causes anything, but this is
> an additional access path.
> 
>>> (3) Interoperability with filters that may preprocess or postprocess
>>> the response,
>>> including ISE handling in the following lines of DefaultServlet:
>>>
>>> [[[
>>>             try {
>>>                 ostream = response.getOutputStream();
>>>             } catch (IllegalStateException e) {
>>>                 ...
>>>                 writer = response.getWriter();
>>> ]]]
>>
>> I've disabled the fall back for (3).
>>
> 
> I'd be better to fallback to serving the original resource instead of
> failing. I agree that it is a rare case, though.
> 
>>
> 
> There is also a list of options for DefaultServlet just above its
> definition in conf/web.xml. It has not been updated.
> 
>>
> 
> Looking at how Apache HTTPD deals with this content negotiation feature,
> 
> First,
> this feature is present by default (mod_negation is compiled in), but
> is turned off. To enable it you have to enable it with  "Options
> MultiViews" directive on specific directory.  Note that using "Options
> All" does not enable this feature, you have to enable it explicitly.
> [1][4]
> 
> Second,
> Apache HTTPD goes to some length to prevent unwanted caching of these
> responses by proxies.
> 
> "To prevent this, httpd normally marks all responses that are returned
> after content negotiation as non-cacheable by HTTP/1.0 clients" [1]
> It is configurable with directive "CacheNegotiatedDocs".
> 
> For HTTP/1.1 clients it sends a "Vary" header, such as "Vary:
> Accept-Encoding". [1][3].
> 
> [1] http://httpd.apache.org/docs/current/content-negotiation.html
> [2] http://httpd.apache.org/docs/current/mod/mod_negotiation.html
> [3] http://httpd.apache.org/docs/current/mod/mod_deflate.html
> [4] http://httpd.apache.org/docs/current/mod/core.html#options

It's also worth pointing out that what has been implemented for Tomcat
is a serious pain in the neck to do in httpd. Basically, to get it to
work as expected in httpd, you need several animal sacrifices and a fair
amount of arcane words to get it done. At this point, I have the animal
part done but my pronunciation must be off because I still don't have it
working properly.

That said, making it easy to do in Tomcat - while nice - is also a bit
dangerous given the previous arguments. I agree with the decision to
disable this by default... just wanted to add a "me too" voice in case
you wanted some more input, Mark.

Thanks,
-chris


Re: Serving gz files in DefaultServlet (Re: r1531115, BZ 54095)

Posted by Mark Thomas <ma...@apache.org>.
I'm convinced. Default changed.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Serving gz files in DefaultServlet (Re: r1531115, BZ 54095)

Posted by Konstantin Kolinko <kn...@gmail.com>.
2013/10/13 Mark Thomas <ma...@apache.org>:
> On 13/10/2013 14:11, Konstantin Kolinko wrote:
>
> <snip/>
>
>>> URL: http://svn.apache.org/r1531115
>>> Log:
>>> Fix https://issues.apache.org/bugzilla/show_bug.cgi?id=54095
>>> Add support to the Default Servlet for serving gzipped versions of static resources directly from disk as an alternative to Tomcat compressing them on each request. Patch by Philippe Marschall.
>
> <snip/>
>
>> General:
>> I think this feature should be opt-in, like the listings feature of
>> DefaultServlet, being off by default.
>
> I disagree since:
> - this is only in 8.0.x and we haven't had a stable release yet.
> - the user has to create the gzip'd version which is unlikely to exist
> be default before this feature does anything
>
> I agree if it is ever back-ported to earlier versions it needs to be
> disabled by default.
>

My concern is that this feature is not stated by Specification, and by
default I think people should not expect this behaviour from "default"
servlet. Thus I think it should be an opt-in feature.


The files may already exist there for some other reasons. E.g. if an
application uses a framework that implements this feature by itself
independently of Tomcat.

>> (2) Additional access path for ".gz" files, which might be not covered
>> by security constraints
>>

E.g. if "foo.gz" is protected by a constraint and should not be served
directly,  it can now be accessed by asking for "foo".  It only works
if "foo" exists as well, so unlikely it causes anything, but this is
an additional access path.

>> (3) Interoperability with filters that may preprocess or postprocess
>> the response,
>> including ISE handling in the following lines of DefaultServlet:
>>
>> [[[
>>             try {
>>                 ostream = response.getOutputStream();
>>             } catch (IllegalStateException e) {
>>                 ...
>>                 writer = response.getWriter();
>> ]]]
>
> I've disabled the fall back for (3).
>

I'd be better to fallback to serving the original resource instead of
failing. I agree that it is a rare case, though.

>

There is also a list of options for DefaultServlet just above its
definition in conf/web.xml. It has not been updated.

>

Looking at how Apache HTTPD deals with this content negotiation feature,

First,
this feature is present by default (mod_negation is compiled in), but
is turned off. To enable it you have to enable it with  "Options
MultiViews" directive on specific directory.  Note that using "Options
All" does not enable this feature, you have to enable it explicitly.
[1][4]

Second,
Apache HTTPD goes to some length to prevent unwanted caching of these
responses by proxies.

"To prevent this, httpd normally marks all responses that are returned
after content negotiation as non-cacheable by HTTP/1.0 clients" [1]
It is configurable with directive "CacheNegotiatedDocs".

For HTTP/1.1 clients it sends a "Vary" header, such as "Vary:
Accept-Encoding". [1][3].

[1] http://httpd.apache.org/docs/current/content-negotiation.html
[2] http://httpd.apache.org/docs/current/mod/mod_negotiation.html
[3] http://httpd.apache.org/docs/current/mod/mod_deflate.html
[4] http://httpd.apache.org/docs/current/mod/core.html#options

Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Serving gz files in DefaultServlet (Re: r1531115, BZ 54095)

Posted by Remy Maucherat <re...@apache.org>.
On Sun, 2013-10-13 at 16:03 +0100, Mark Thomas wrote:
> I disagree since:
> - this is only in 8.0.x and we haven't had a stable release yet.
> - the user has to create the gzip'd version which is unlikely to exist
> be default before this feature does anything

I also think it should be disabled by default in all branches.

Rémy



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Serving gz files in DefaultServlet (Re: r1531115, BZ 54095)

Posted by Mark Thomas <ma...@apache.org>.
On 13/10/2013 14:11, Konstantin Kolinko wrote:

<snip/>

>> URL: http://svn.apache.org/r1531115
>> Log:
>> Fix https://issues.apache.org/bugzilla/show_bug.cgi?id=54095
>> Add support to the Default Servlet for serving gzipped versions of static resources directly from disk as an alternative to Tomcat compressing them on each request. Patch by Philippe Marschall.

<snip/>

> General:
> I think this feature should be opt-in, like the listings feature of
> DefaultServlet, being off by default.

I disagree since:
- this is only in 8.0.x and we haven't had a stable release yet.
- the user has to create the gzip'd version which is unlikely to exist
be default before this feature does anything

I agree if it is ever back-ported to earlier versions it needs to be
disabled by default.

> Concerns:
> (1) Excessive disk access to check existence of gz files.

The caching mechanism should handle this the same way it does for any
other static resource.

> (2) Additional access path for ".gz" files, which might be not covered
> by security constraints
> 
> (3) Interoperability with filters that may preprocess or postprocess
> the response,
> including ISE handling in the following lines of DefaultServlet:
> 
> [[[
>             try {
>                 ostream = response.getOutputStream();
>             } catch (IllegalStateException e) {
>                 ...
>                 writer = response.getWriter();
> ]]]
> 
> (2) and (3) would occur only if both "foo" and "foo.gz" files are present.

I've disabled the fall back for (3).

> As (2) requires the presence and accessibility of "foo" file (as
> "resource.exists()" check is done before gzip processing), I see no
> exploitable issue here, though we may mention this on the
> "security-howto" page.

I think it is worth mentioning that the .gz versions will be directly
accessible and need to be protected if the normal version is protected.

> Technical issues with the code:
> 1) "If-Modified-Since" header processing is inconsistent,

Fixed.

> 2) I think that if it is an "included" resource, the gzip feature
> should be disabled.

Fixed.

> 3) The following line:
>> +       gzipResource.setMimeType(contentType);
> 
> I think that line is not needed here, as the "Content-Type" header is
> served from a local variable, not property of the resource.

Fixed.

Thanks for the review,

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org