You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by lu...@apache.org on 2004/07/28 02:43:44 UTC

cvs commit: jakarta-tomcat-connectors/http11/src/java/org/apache/coyote/http11 Http11Processor.java

luehe       2004/07/27 17:43:44

  Modified:    http11/src/java/org/apache/coyote/http11
                        Http11Processor.java
  Log:
  Fixed Bugtraq 6152759 ("Default charset not included in Content-Type
  response header if no char encoding was specified").
  
  According to the Servlet 2.4 spec, calling:
  
    ServletResponse.setContentType("text/html");
  
  must yield these results:
  
    ServletResponse.getContentType() -> "text/html"
  
    Content-Type response header -> "text/html;charset=ISO-8859-1"
  
  Notice the absence of a charset in the result of getContentType(), but
  its presence (set to the default ISO-8859-1) in the Content-Type
  response header.
  
  Tomcat is currently not including the default charset in the
  Content-Type response header if no char encoding was specified.
  
  Revision  Changes    Path
  1.101     +1 -1      jakarta-tomcat-connectors/http11/src/java/org/apache/coyote/http11/Http11Processor.java
  
  Index: Http11Processor.java
  ===================================================================
  RCS file: /home/cvs/jakarta-tomcat-connectors/http11/src/java/org/apache/coyote/http11/Http11Processor.java,v
  retrieving revision 1.100
  retrieving revision 1.101
  diff -u -r1.100 -r1.101
  --- Http11Processor.java	1 Jun 2004 14:34:22 -0000	1.100
  +++ Http11Processor.java	28 Jul 2004 00:43:44 -0000	1.101
  @@ -1450,7 +1450,7 @@
           if (!entityBody) {
               response.setContentLength(-1);
           } else {
  -            String contentType = response.getContentType();
  +            String contentType = response.getContentTypeResponseHeader();
               if (contentType != null) {
                   headers.setValue("Content-Type").setString(contentType);
               }
  
  
  

---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org


Re: cvs commit: jakarta-tomcat-connectors/http11/src/java/org/apache/coyote/http11 Http11Processor.java

Posted by Remy Maucherat <re...@apache.org>.
Jan Luehe wrote:

> yes, sorry, I had forgotten about that case.
>
> Believe, I didn't commit yesterday's patch simply because
> I don't have anything better to do. It's come up as a compliance
> issue which had been masked by an unrelated problem that was fixed.

I thought the issue caused enough trouble last time ;) I remember we got 
a huge amount of bug reports on that.

> So what I suggest is that you send that you send that as feedback to 
> the servlet API, and that they put a small errata for the specification.
>
> ServletResponse.setCharacterEncoding/setContentType/setLocale...
>
>     * <p>Containers must communicate the character encoding used for
>     * the servlet response's writer to the client if the protocol
>     * provides a way for doing so. In the case of HTTP, the character
>     * encoding is communicated as part of the <code>Content-Type</code>
>     * header for text media types. Note that the character encoding
>     * cannot be communicated via HTTP headers if the servlet does not
>     * specify a content type; however, it is still used to encode text
>     * written via the servlet response's writer.
>
> Reason: Many browsers let the user choose which encoding to apply to
> responses that don't declare their encoding, which will result in data
> corruption if the response was encoded in ISO-8859-1 and the user
> picks an incompatible encoding.

The user can still manually override the charset on its client if it 
wants to do so. I don't see how the specified behavior fixes anything.

> ServletResponse.getContentType...:
>
>     * If no character encoding has been specified, the
>     * charset parameter is omitted.
>
> Reason: Allow setLocale() to set the response's char encoding
> (corresponding to the locale) if the char encoding has not already
> been set by setContentType() or setCharacterEncoding().
>
>
> You can find corresponding wordings in the servlet spec.
>
> Therefore, we need to add a charset to the Content-Type response 
> header if:
>
> - no response encoding has been specified (ie., the return value of
>   ServletResponse.getContentType() has no charset), and
> - the response is using a writer
>
> My patch did not check for the latter.

Well, this would be a big improvement already (the implementation is 
inefficient, also, but either Bill or me can rework it). People who 
wanted to generate PDFs from JSPs had trouble (but it made sense), now 
it'll be worse. This doesn't seem very useable overall.

Rémy


---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org


Re: cvs commit: jakarta-tomcat-connectors/http11/src/java/org/apache/coyote/http11 Http11Processor.java

Posted by Jan Luehe <Ja...@Sun.COM>.
Remy,

Remy Maucherat wrote:
> Remy Maucherat wrote:
> 
>> Cool. So after all the efforts I'm doing to optimize, you casually add 
>> GC, because the servlet API is completely stupid ?
>> So -1 for your patch: you need to rework it. I also didn't read all 
>> that funny stuff in the specification, so where does it come from ? ;)
> 
> 
> I thought about it some more, and in addition to the performance 
> problem, it would lead to adding charset even for binaries. We already 
> found out that this way very bad (in addition to being meaningless, it 
> breaks some clients such as Acrobat which simply does a check on the 
> MIME type without removing the charset, which is rather logical since 
> they're dealing with binaries) - this was some unintended behavior in 
> some 4.1.x release (if I remember well, it was introduced by you 
> already: you do like charset related issues ;) ).

yes, sorry, I had forgotten about that case.

Believe, I didn't commit yesterday's patch simply because
I don't have anything better to do. It's come up as a compliance
issue which had been masked by an unrelated problem that was fixed.

> So what I suggest is that you send that you send that as feedback to the 
> servlet API, and that they put a small errata for the specification.
> -1 for implementing the requirements of the specificatrion (although I 
> can't find them anywhere ;) ) to the letter right now, as they are 
> broken (I didn't dislike the previous behavior, so I want it to remain).

ServletResponse.setCharacterEncoding/setContentType/setLocale...

     * <p>Containers must communicate the character encoding used for
     * the servlet response's writer to the client if the protocol
     * provides a way for doing so. In the case of HTTP, the character
     * encoding is communicated as part of the <code>Content-Type</code>
     * header for text media types. Note that the character encoding
     * cannot be communicated via HTTP headers if the servlet does not
     * specify a content type; however, it is still used to encode text
     * written via the servlet response's writer.

Reason: Many browsers let the user choose which encoding to apply to
responses that don't declare their encoding, which will result in data
corruption if the response was encoded in ISO-8859-1 and the user
picks an incompatible encoding.


ServletResponse.getContentType...:

     * If no character encoding has been specified, the
     * charset parameter is omitted.

Reason: Allow setLocale() to set the response's char encoding
(corresponding to the locale) if the char encoding has not already
been set by setContentType() or setCharacterEncoding().


You can find corresponding wordings in the servlet spec.

Therefore, we need to add a charset to the Content-Type response header if:

- no response encoding has been specified (ie., the return value of
   ServletResponse.getContentType() has no charset), and
- the response is using a writer

My patch did not check for the latter.


Jan


---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org


Re: cvs commit: jakarta-tomcat-connectors/http11/src/java/org/apache/coyote/http11 Http11Processor.java

Posted by Remy Maucherat <re...@apache.org>.
Remy Maucherat wrote:

> Cool. So after all the efforts I'm doing to optimize, you casually add 
> GC, because the servlet API is completely stupid ?
> So -1 for your patch: you need to rework it. I also didn't read all 
> that funny stuff in the specification, so where does it come from ? ;)

I thought about it some more, and in addition to the performance 
problem, it would lead to adding charset even for binaries. We already 
found out that this way very bad (in addition to being meaningless, it 
breaks some clients such as Acrobat which simply does a check on the 
MIME type without removing the charset, which is rather logical since 
they're dealing with binaries) - this was some unintended behavior in 
some 4.1.x release (if I remember well, it was introduced by you 
already: you do like charset related issues ;) ).
So what I suggest is that you send that you send that as feedback to the 
servlet API, and that they put a small errata for the specification.
-1 for implementing the requirements of the specificatrion (although I 
can't find them anywhere ;) ) to the letter right now, as they are 
broken (I didn't dislike the previous behavior, so I want it to remain).

Rémy


---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org


Re: cvs commit: jakarta-tomcat-connectors/http11/src/java/org/apache/coyote/http11 Http11Processor.java

Posted by Remy Maucherat <re...@apache.org>.
luehe@apache.org wrote:

>luehe       2004/07/27 17:43:44
>
>  Modified:    http11/src/java/org/apache/coyote/http11
>                        Http11Processor.java
>  Log:
>  Fixed Bugtraq 6152759 ("Default charset not included in Content-Type
>  response header if no char encoding was specified").
>  
>  According to the Servlet 2.4 spec, calling:
>  
>    ServletResponse.setContentType("text/html");
>  
>  must yield these results:
>  
>    ServletResponse.getContentType() -> "text/html"
>  
>    Content-Type response header -> "text/html;charset=ISO-8859-1"
>  
>  Notice the absence of a charset in the result of getContentType(), but
>  its presence (set to the default ISO-8859-1) in the Content-Type
>  response header.
>  
>  Tomcat is currently not including the default charset in the
>  Content-Type response header if no char encoding was specified.
>  
>  Revision  Changes    Path
>  1.101     +1 -1      jakarta-tomcat-connectors/http11/src/java/org/apache/coyote/http11/Http11Processor.java
>  
>  Index: Http11Processor.java
>  ===================================================================
>  RCS file: /home/cvs/jakarta-tomcat-connectors/http11/src/java/org/apache/coyote/http11/Http11Processor.java,v
>  retrieving revision 1.100
>  retrieving revision 1.101
>  diff -u -r1.100 -r1.101
>  --- Http11Processor.java	1 Jun 2004 14:34:22 -0000	1.100
>  +++ Http11Processor.java	28 Jul 2004 00:43:44 -0000	1.101
>  @@ -1450,7 +1450,7 @@
>           if (!entityBody) {
>               response.setContentLength(-1);
>           } else {
>  -            String contentType = response.getContentType();
>  +            String contentType = response.getContentTypeResponseHeader();
>               if (contentType != null) {
>                   headers.setValue("Content-Type").setString(contentType);
>               }
>  
>
Cool. So after all the efforts I'm doing to optimize, you casually add 
GC, because the servlet API is completely stupid ?
So -1 for your patch: you need to rework it. I also didn't read all that 
funny stuff in the specification, so where does it come from ? ;)

Rémy


---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org