Posted to dev@tomcat.apache.org by lu...@apache.org on 2004/07/28 02:43:17 UTC

cvs commit: jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote Response.java

luehe       2004/07/27 17:43:17

  Modified:    coyote/src/java/org/apache/coyote Response.java
  Log:
  Fixed Bugtraq 6152759 ("Default charset not included in Content-Type
  response header if no char encoding was specified").
  
  According to the Servlet 2.4 spec, calling:
  
    ServletResponse.setContentType("text/html");
  
  must yield these results:
  
    ServletResponse.getContentType() -> "text/html"
  
    Content-Type response header -> "text/html;charset=ISO-8859-1"
  
  Notice the absence of a charset in the result of getContentType(), but
  its presence (set to the default ISO-8859-1) in the Content-Type
  response header.
  
  Tomcat is currently not including the default charset in the
  Content-Type response header if no char encoding was specified.
  
  Revision  Changes    Path
  1.33      +28 -0     jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote/Response.java
  
  Index: Response.java
  ===================================================================
  RCS file: /home/cvs/jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote/Response.java,v
  retrieving revision 1.32
  retrieving revision 1.33
  diff -u -r1.32 -r1.33
  --- Response.java	24 Feb 2004 08:54:29 -0000	1.32
  +++ Response.java	28 Jul 2004 00:43:17 -0000	1.33
  @@ -21,6 +21,7 @@
   
   import org.apache.tomcat.util.buf.ByteChunk;
   import org.apache.tomcat.util.http.MimeHeaders;
  +import org.apache.tomcat.util.http.ContentType;
   
   /**
    * Response object.
  @@ -524,6 +525,33 @@
           return ret;
       }
       
  +
  +    /**
  +     * Returns the value of the Content-Type response header, based on the
  +     * current return value of getContentType().
  +     *
  +     * Notice that while the charset parameter must be omitted from the
  +     * return value of ServletResponse.getContentType() if no character
  +     * encoding has been specified, the spec requires that a charset (default:
  +     * ISO-8859-1) always be included in the Content-Type response header
  +     *
  +     * @return Value of Content-Type response header
  +     */
  +    public String getContentTypeResponseHeader() {
  +
  +        String header = getContentType();
  +        if (header != null) {
  +            if (!ContentType.hasCharset(header)) {
  +                // Must communicate response encoding to client
  +                header = header + ";charset="
  +                    + Constants.DEFAULT_CHARACTER_ENCODING;
  +            }
  +        }
  +
  +        return header;
  +    }
  +
  +
       public void setContentLength(int contentLength) {
           this.contentLength = contentLength;
       }
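
  Editorial sketch, not part of the commit: the rule the patch implements
  can be modelled outside Tomcat roughly as below. The class name, the
  headerValue() method and the simplified hasCharset() helper are
  hypothetical stand-ins; the real patch relies on ContentType.hasCharset()
  and Constants.DEFAULT_CHARACTER_ENCODING, whose internals are not shown
  in this diff.

```java
import java.util.Locale;

// Illustrative model only; this is not Tomcat's Response class.
final class ContentTypeHeaderSketch {

    // Assumed to match Constants.DEFAULT_CHARACTER_ENCODING in the patch.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Hypothetical stand-in for ContentType.hasCharset(): reports whether
    // any parameter after the media type is named "charset".
    static boolean hasCharset(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim().toLowerCase(Locale.ROOT);
            if (param.startsWith("charset=")) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the shape of getContentTypeResponseHeader(): append the
    // default charset only when the content type carries none.
    static String headerValue(String contentType) {
        if (contentType != null && !hasCharset(contentType)) {
            return contentType + ";charset=" + DEFAULT_CHARSET;
        }
        return contentType;
    }

    public static void main(String[] args) {
        System.out.println(headerValue("text/html"));
        System.out.println(headerValue("text/html; charset=UTF-8"));
        // The rule is applied to every content type, binary ones included.
        System.out.println(headerValue("image/gif"));
    }
}
```

  Because the rule looks only at the content type string, a binary type
  such as image/gif also receives a charset parameter under this approach.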
  
  
  

---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org


Re: cvs commit: jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote Response.java

Posted by Costin Manolache <cm...@yahoo.com>.
Jan Luehe wrote:

> Bill,
> 
> 
>>> luehe       2004/07/27 17:43:17
>>>
>>>  Modified:    coyote/src/java/org/apache/coyote Response.java
>>>  Log:
>>>  Fixed Bugtraq 6152759 ("Default charset not included in Content-Type
>>>  response header if no char encoding was specified").
>>>
>>>  According to the Servlet 2.4 spec, calling:
>>>
>>>    ServletResponse.setContentType("text/html");
>>>
>>>  must yield these results:
>>>
>>>    ServletResponse.getContentType() -> "text/html"
>>>
>>>    Content-Type response header -> "text/html;charset=ISO-8859-1"
>>>
>>>  Notice the absence of a charset in the result of getContentType(), but
>>>  its presence (set to the default ISO-8859-1) in the Content-Type
>>>  response header.
>>>
>>>  Tomcat is currently not including the default charset in the
>>>  Content-Type response header if no char encoding was specified.
>>>
>>
>>
>> -1.  This gets us right back to the same old problem where we are sending
>> back "image/gif; charset=iso-8859-1", and nobody can read the response.
> 
> 
> yes, sorry, I had forgotten about that case.
> 
>> If we're not going to assume that the UA believes that the default 
>> encoding
>> is iso-8859-1 (which is what we are doing now),
> 
> 
> I think the reason the spec added the requirement to clearly identify
> the encoding in all cases (when using a writer) was because many
> browsers let the user choose
> which encoding to apply to responses that don't declare their encoding,
> which will result in data corruption if the response was encoded in
> ISO-8859-1 and the user picks an incompatible encoding.

AFAIK browsers let the user choose the encoding even if it is specified.

And they do that exactly because some 'smart' servers send a wrong
encoding (like 8859-1) even if the content is different.

If you are using a foreign charset, your data will be either 8859-x
(with x != 1) or UTF-8. In any case, it will never be 8859-1 (since the
foreign characters won't exist there). So the requirement basically
breaks any foreign language.
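
Editorial sketch: the point about characters that do not exist in
8859-1 can be checked with the standard java.nio.charset API. The class
and method names below are hypothetical, and the sample strings are
arbitrary illustrations rather than anything taken from this thread.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows which text survives a trip through ISO-8859-1.
final class Latin1CoverageSketch {

    // True if every character of the text can be encoded in ISO-8859-1.
    static boolean fitsLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    // Encodes to ISO-8859-1 bytes and decodes again; String.getBytes(Charset)
    // substitutes the charset's replacement byte for unmappable characters.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String french = "caf\u00e9";             // e-acute exists in ISO-8859-1
        String russian = "\u0416\u0443\u043a";   // Cyrillic does not

        System.out.println(fitsLatin1(french));
        System.out.println(fitsLatin1(russian));
        System.out.println(roundTripLatin1(russian));
    }
}
```

Text that fails the fitsLatin1() check cannot be represented faithfully
in a body labelled charset=ISO-8859-1, whatever the header claims.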



> 
>> then I'd suggest simply
>> doing:
>>    setCharacterEncoding(getCharacterEncoding());
>> in Response.getWriter (since the spec only requires that we identify the
>> charset when using a Writer, and we don't really know what it is when 
>> using
>> OutputStream).
> 
> 
> The problem with this is that if you call getWriter() (with your 
> proposed fix) followed by getContentType(), the returned content type
> will include a charset, which is against the spec of getContentType():
> 
>   * If no character encoding has been specified, the
>   * charset parameter is omitted.
> 
> This is why we need to append the default charset to the value of the
> Content-Type header, if no char encoding has been specified.


On the one hand it is required to identify the charset in all cases (to
not confuse browsers), but on the other you are not allowed to specify
the real encoding from the writer if it wasn't specified :-) ?

Costin

> 
> Jan
> 




Re: cvs commit: jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote Response.java

Posted by Jan Luehe <Ja...@Sun.COM>.
Jan Luehe wrote:
> Remy,
> 
> Remy Maucherat wrote:
> 
>> Jan Luehe wrote:
>>
>>> Bill,
>>>
>>>> then I'd suggest simply
>>>> doing:
>>>>    setCharacterEncoding(getCharacterEncoding());
>>>> in Response.getWriter (since the spec only requires that we identify 
>>>> the
>>>> charset when using a Writer, and we don't really know what it is 
>>>> when using
>>>> OutputStream).
>>>
>>>
>>>
>>>
>>> The problem with this is that if you call getWriter() (with your 
>>> proposed fix) followed by getContentType(), the returned content type
>>> will include a charset, which is against the spec of getContentType():
>>>
>>>   * If no character encoding has been specified, the
>>>   * charset parameter is omitted.
>>>
>>> This is why we need to append the default charset to the value of the
>>> Content-Type header, if no char encoding has been specified.
>>
>>
>>
>> This is not acceptable, and is not an option, so you shouldn't be 
>> using "we need", because we won't ;)
> 
> 
> This is why I said "we need" instead of "we will". ;-)
> 
> I agreed with Bill's proposed solution in principle (include charset
> only when using writer), but pointed out that his proposed patch
> would break ServletResponse.getContentType(), which is why I said that
> if we were to include the default charset (if no charset was specified
> and if a writer was being used), we'd have to do it at the time of
> generating the header.
> 
> I'll reply to your other mail shortly.

Actually, I just found that Bill's patch would fix the issue and
be compliant with the spec:

ServletResponse.getWriter:

      * If the response's character encoding has not been
      * specified as described in <code>getCharacterEncoding</code>
      * (i.e., the method just returns the default value
      * <code>ISO-8859-1</code>), <code>getWriter</code>
      * updates it to <code>ISO-8859-1</code>.

So it *is* expected for getContentType() to include a
"charset=ISO-8859-1" if no char encoding had been specified
before getWriter() was called.

This way, the charset would automatically be included in the
Content-Type response header.

I'll revert my patch and apply Bill's solution.

Thanks, Bill!

Jan



> Jan
> 
> 
>> The right solution is IMO to point out the issues to the specification 
>> people.
>>
>> Rémy
>>
>>
>>
> 
> 
> 
> 
> 





Re: cvs commit: jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote Response.java

Posted by Jan Luehe <Ja...@Sun.COM>.
Remy,

Remy Maucherat wrote:
> Jan Luehe wrote:
> 
>> Bill,
>>
>>> then I'd suggest simply
>>> doing:
>>>    setCharacterEncoding(getCharacterEncoding());
>>> in Response.getWriter (since the spec only requires that we identify the
>>> charset when using a Writer, and we don't really know what it is when 
>>> using
>>> OutputStream).
>>
>>
>>
>> The problem with this is that if you call getWriter() (with your 
>> proposed fix) followed by getContentType(), the returned content type
>> will include a charset, which is against the spec of getContentType():
>>
>>   * If no character encoding has been specified, the
>>   * charset parameter is omitted.
>>
>> This is why we need to append the default charset to the value of the
>> Content-Type header, if no char encoding has been specified.
> 
> 
> This is not acceptable, and is not an option, so you shouldn't be using 
> "we need", because we won't ;)

This is why I said "we need" instead of "we will". ;-)

I agreed with Bill's proposed solution in principle (include charset
only when using writer), but pointed out that his proposed patch
would break ServletResponse.getContentType(), which is why I said that
if we were to include the default charset (if no charset was specified
and if a writer was being used), we'd have to do it at the time of
generating the header.

I'll reply to your other mail shortly.

Jan


> The right solution is IMO to point out the issues to the specification 
> people.
> 
> Rémy
> 
> 
> 





Re: cvs commit: jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote Response.java

Posted by Remy Maucherat <re...@apache.org>.
Jan Luehe wrote:

> Bill,
>
>> then I'd suggest simply
>> doing:
>>    setCharacterEncoding(getCharacterEncoding());
>> in Response.getWriter (since the spec only requires that we identify the
>> charset when using a Writer, and we don't really know what it is when 
>> using
>> OutputStream).
>
>
> The problem with this is that if you call getWriter() (with your 
> proposed fix) followed by getContentType(), the returned content type
> will include a charset, which is against the spec of getContentType():
>
>   * If no character encoding has been specified, the
>   * charset parameter is omitted.
>
> This is why we need to append the default charset to the value of the
> Content-Type header, if no char encoding has been specified.

This is not acceptable, and is not an option, so you shouldn't be using 
"we need", because we won't ;)
The right solution is IMO to point out the issues to the specification 
people.

Rémy




Re: cvs commit: jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote Response.java

Posted by Jan Luehe <Ja...@Sun.COM>.
Bill,


>>luehe       2004/07/27 17:43:17
>>
>>  Modified:    coyote/src/java/org/apache/coyote Response.java
>>  Log:
>>  Fixed Bugtraq 6152759 ("Default charset not included in Content-Type
>>  response header if no char encoding was specified").
>>
>>  According to the Servlet 2.4 spec, calling:
>>
>>    ServletResponse.setContentType("text/html");
>>
>>  must yield these results:
>>
>>    ServletResponse.getContentType() -> "text/html"
>>
>>    Content-Type response header -> "text/html;charset=ISO-8859-1"
>>
>>  Notice the absence of a charset in the result of getContentType(), but
>>  its presence (set to the default ISO-8859-1) in the Content-Type
>>  response header.
>>
>>  Tomcat is currently not including the default charset in the
>>  Content-Type response header if no char encoding was specified.
>>
> 
> 
> -1.  This gets us right back to the same old problem where we are sending
> back "image/gif; charset=iso-8859-1", and nobody can read the response.

yes, sorry, I had forgotten about that case.

> If we're not going to assume that the UA believes that the default encoding
> is iso-8859-1 (which is what we are doing now),

I think the reason the spec added the requirement to clearly identify
the encoding in all cases (when using a writer) was because many
browsers let the user choose which encoding to apply to responses that
don't declare their encoding, which will result in data corruption if
the response was encoded in ISO-8859-1 and the user picks an
incompatible encoding.

> then I'd suggest simply
> doing:
>    setCharacterEncoding(getCharacterEncoding());
> in Response.getWriter (since the spec only requires that we identify the
> charset when using a Writer, and we don't really know what it is when using
> OutputStream).

The problem with this is that if you call getWriter() (with your 
proposed fix) followed by getContentType(), the returned content type
will include a charset, which is against the spec of getContentType():

   * If no character encoding has been specified, the
   * charset parameter is omitted.

This is why we need to append the default charset to the value of the
Content-Type header, if no char encoding has been specified.

Jan



> 
> 
> 
> ------------------------------------------------------------------------
> 
> 
> This message is intended only for the use of the person(s) listed above as the intended recipient(s), and may contain information that is PRIVILEGED and CONFIDENTIAL.  If you are not an intended recipient, you may not read, copy, or distribute this message or any attachment. If you received this communication in error, please notify us immediately by e-mail and then delete all copies of this message and any attachments.
> 
> In addition you should be aware that ordinary (unencrypted) e-mail sent through the Internet is not secure. Do not send confidential or sensitive information, such as social security numbers, account numbers, personal identification numbers and passwords, to us via ordinary (unencrypted) e-mail.
> 
> 
> 
> ------------------------------------------------------------------------
> 





Bill is a spammer

Posted by Remy Maucherat <re...@apache.org>.
Bill Barker wrote:

> This message is intended only for the use of the person(s) listed 
> above as the intended recipient(s), and may contain information that 
> is PRIVILEGED and CONFIDENTIAL. If you are not an intended recipient, 
> you may not read, copy, or distribute this message or any attachment. 
> If you received this communication in error, please notify us 
> immediately by e-mail and then delete all copies of this message and 
> any attachments.
>
>In addition you should be aware that ordinary (unencrypted) e-mail sent through the Internet is not secure. Do not send confidential or sensitive information, such as social security numbers, account numbers, personal identification numbers and passwords, to us via ordinary (unencrypted) e-mail.
>  
>
Because of that (I assume), a lot of Bill's emails were considered spam 
by Thunderbird ;)

Rémy




Re: cvs commit: jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote Response.java

Posted by Bill Barker <wb...@wilshire.com>.
----- Original Message -----
From: <lu...@apache.org>
To: <ja...@apache.org>
Sent: Tuesday, July 27, 2004 5:43 PM
Subject: cvs commit: jakarta-tomcat-connectors/coyote/src/java/org/apache/coyote Response.java


> luehe       2004/07/27 17:43:17
>
>   Modified:    coyote/src/java/org/apache/coyote Response.java
>   Log:
>   Fixed Bugtraq 6152759 ("Default charset not included in Content-Type
>   response header if no char encoding was specified").
>
>   According to the Servlet 2.4 spec, calling:
>
>     ServletResponse.setContentType("text/html");
>
>   must yield these results:
>
>     ServletResponse.getContentType() -> "text/html"
>
>     Content-Type response header -> "text/html;charset=ISO-8859-1"
>
>   Notice the absence of a charset in the result of getContentType(), but
>   its presence (set to the default ISO-8859-1) in the Content-Type
>   response header.
>
>   Tomcat is currently not including the default charset in the
>   Content-Type response header if no char encoding was specified.
>

-1.  This gets us right back to the same old problem where we are sending
back "image/gif; charset=iso-8859-1", and nobody can read the response.

If we're not going to assume that the UA believes that the default encoding
is iso-8859-1 (which is what we are doing now), then I'd suggest simply
doing:
   setCharacterEncoding(getCharacterEncoding());
in Response.getWriter (since the spec only requires that we identify the
charset when using a Writer, and we don't really know what it is when using
OutputStream).
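
Editorial sketch: the suggestion above can be modelled with a toy
response object, assuming the charset is tracked separately from the
media type. ResponseSketch and its fields are hypothetical and much
simpler than Tomcat's real Response class; only the
setCharacterEncoding(getCharacterEncoding()) call inside getWriter()
comes from the proposal itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Toy model of a servlet response; not Tomcat's implementation.
final class ResponseSketch {

    static final String DEFAULT_CHARSET = "ISO-8859-1";

    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private String mediaType;   // simplified: assumed to carry no parameters
    private String charset;     // null until an encoding has been specified

    void setContentType(String type) {
        this.mediaType = type;
    }

    void setCharacterEncoding(String encoding) {
        this.charset = encoding;
    }

    // Returns the default when nothing has been specified.
    String getCharacterEncoding() {
        return charset != null ? charset : DEFAULT_CHARSET;
    }

    // The charset parameter is omitted while no encoding is specified.
    String getContentType() {
        if (mediaType == null) {
            return null;
        }
        return charset == null ? mediaType : mediaType + ";charset=" + charset;
    }

    // The proposal: using a Writer pins the encoding, so the charset shows
    // up in the content type from this point on.
    PrintWriter getWriter() {
        setCharacterEncoding(getCharacterEncoding());
        return new PrintWriter(
                new OutputStreamWriter(body, Charset.forName(charset)));
    }

    // Binary output leaves the content type untouched.
    OutputStream getOutputStream() {
        return body;
    }

    public static void main(String[] args) {
        ResponseSketch page = new ResponseSketch();
        page.setContentType("text/html");
        System.out.println(page.getContentType());
        page.getWriter();
        System.out.println(page.getContentType());

        ResponseSketch image = new ResponseSketch();
        image.setContentType("image/gif");
        image.getOutputStream();
        System.out.println(image.getContentType());
    }
}
```

In this model a response written through getOutputStream(), such as
image/gif, never gains a charset parameter, while one written through
getWriter() always reports the encoding it actually uses.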