Posted to dev@tomcat.apache.org by re...@apache.org on 2006/01/24 02:13:21 UTC

svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Author: remm
Date: Mon Jan 23 17:13:19 2006
New Revision: 371765

URL: http://svn.apache.org/viewcvs?rev=371765&view=rev
Log:
- Remove nonsensical systematic inclusion of ISO-8859-1 charset in the content type, which is both
  useless and inefficient.

Modified:
    tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Modified: tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java
URL: http://svn.apache.org/viewcvs/tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java?rev=371765&r1=371764&r2=371765&view=diff
==============================================================================
--- tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java (original)
+++ tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java Mon Jan 23 17:13:19 2006
@@ -599,20 +599,6 @@
             throw new IllegalStateException
                 (sm.getString("coyoteResponse.getWriter.ise"));
 
-        /*
-         * If the response's character encoding has not been specified as
-         * described in <code>getCharacterEncoding</code> (i.e., the method
-         * just returns the default value <code>ISO-8859-1</code>),
-         * <code>getWriter</code> updates it to <code>ISO-8859-1</code>
-         * (with the effect that a subsequent call to getContentType() will
-         * include a charset=ISO-8859-1 component which will also be
-         * reflected in the Content-Type response header, thereby satisfying
-         * the Servlet spec requirement that containers must communicate the
-         * character encoding used for the servlet response's writer to the
-         * client).
-         */
-        setCharacterEncoding(getCharacterEncoding());
-
         usingWriter = true;
         outputBuffer.checkConverter();
         if (writer == null) {



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Posted by Remy Maucherat <re...@apache.org>.
Jan Luehe wrote:
> I don't think you are misunderstanding the spec.
> 
> See the following javadocs snippets from ServletResponse:
> 
>     public String getCharacterEncoding():
> 
>      * If no character encoding
>      * has been specified, <code>ISO-8859-1</code> is returned.
> 
> 
>     public PrintWriter getWriter() throws IOException:
> 
>      * If the response's character encoding has not been
>      * specified as described in <code>getCharacterEncoding</code>
>      * (i.e., the method just returns the default value
>      * <code>ISO-8859-1</code>), <code>getWriter</code>
>      * updates it to <code>ISO-8859-1</code>.
> 
> 
>     public void setCharacterEncoding(String charset):
> 
>      * <p>Containers *must* communicate the character encoding used for
>      * the servlet response's writer to the client if the protocol
>      * provides a way for doing so. In the case of HTTP, the character
>      * encoding is communicated as part of the <code>Content-Type</code>
>      * header for text media types.

Yes, but the strict dumb application of what ended up being 
written is definitely not what they intended, because it brings no 
benefits. I think everyone agrees that if the application is very 
careful about not specifying a charset anywhere, it shouldn't be 
forcefully added to the content-type header.

Anyway:
- Did you read the "for text media types" portion ? I find it important.
- "communicated as part of the Content-Type header": ISO-8859-1 is the 
default for HTTP, so one could consider it is communicated even if it is 
not physically present in the Content-Type header.

Rémy
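
The reading argued above can be put in code form. This is only a minimal sketch of that interpretation, which is disputed in this thread; it is not Tomcat code. The class name ContentTypeHeaderSketch and both method names are hypothetical. The charset parameter is appended only for text media types, and only when the application explicitly specified one.

```java
import java.util.Locale;

/** Hypothetical helper, not Tomcat code: chooses the Content-Type header value to send. */
final class ContentTypeHeaderSketch {

    /** True for media types whose top-level type is "text", e.g. text/html. */
    static boolean isTextMediaType(String mediaType) {
        return mediaType != null
                && mediaType.trim().toLowerCase(Locale.ENGLISH).startsWith("text/");
    }

    /**
     * Builds the header value. explicitCharset is null when the application
     * never specified a charset; in that case nothing is appended and the
     * client falls back to the protocol default.
     */
    static String headerValue(String mediaType, String explicitCharset) {
        if (mediaType == null) {
            return null;
        }
        if (explicitCharset != null && isTextMediaType(mediaType)) {
            return mediaType + ";charset=" + explicitCharset;
        }
        return mediaType;
    }
}
```

Under these assumptions, headerValue("text/html", null) leaves the header as "text/html", while headerValue("text/html", "UTF-8") appends the charset parameter.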



Re: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Posted by Jan Luehe <Ja...@Sun.COM>.

Bill Barker wrote on 02/02/06 11:32:
>>-----Original Message-----
>>From: Remy Maucherat [mailto:remm@apache.org]
>>Sent: Thursday, February 02, 2006 4:02 AM
>>To: Tomcat Developers List
>>Subject: Re: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java
>>
>>Bill Barker wrote:
>>>Yes, RFC 2616 does specify iso-latin-1 as the default for HTTP/1.1 clients.
>>>However, section 3.4.1 is also relevant for HTTP/1.0 clients (like, say, the
>>>TCK :).  In any case, it doesn't matter since section 5.4 of the servlet
>>>spec says "must".  Complaints go to the expert group;  here we just develop
>>>Tomcat.
>>
>>Ok, so I asked the expert group, and many people interpret the
>>specification as I do (and is logical to do): if the application uses a
>>writer, and never specifies the charset in any way, the container has no
>>business rewriting the content-type header to include ";charset=ISO-8859-1".
>
> Then they should make the language in the spec clearer ;-).
> 
> If I'm misunderstanding the spec, then I don't have a valid reason for my
> veto.  Consider the veto withdrawn.

I don't think you are misunderstanding the spec.

See the following javadocs snippets from ServletResponse:

    public String getCharacterEncoding():

     * If no character encoding
     * has been specified, <code>ISO-8859-1</code> is returned.


    public PrintWriter getWriter() throws IOException:

     * If the response's character encoding has not been
     * specified as described in <code>getCharacterEncoding</code>
     * (i.e., the method just returns the default value
     * <code>ISO-8859-1</code>), <code>getWriter</code>
     * updates it to <code>ISO-8859-1</code>.


    public void setCharacterEncoding(String charset):

     * <p>Containers *must* communicate the character encoding used for
     * the servlet response's writer to the client if the protocol
     * provides a way for doing so. In the case of HTTP, the character
     * encoding is communicated as part of the <code>Content-Type</code>
     * header for text media types.


Jan
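
To make the interaction of these three Javadoc passages concrete, here is a simplified model of the response state. It is a sketch and not the Tomcat class: ResponseModel and the boolean strict parameter are hypothetical. The getContentType() body follows the method quoted elsewhere in this thread, and the strict branch mirrors the line removed by r371765.

```java
/** Simplified, hypothetical model of the response state discussed in this thread. */
final class ResponseModel {
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    private String contentType;
    private String characterEncoding; // null until specified
    private boolean charsetSet;

    void setContentType(String type) {
        this.contentType = type;
    }

    void setCharacterEncoding(String charset) {
        this.characterEncoding = charset;
        this.charsetSet = true;
    }

    /** Returns the default when no encoding has been specified. */
    String getCharacterEncoding() {
        return characterEncoding != null ? characterEncoding : DEFAULT_ENCODING;
    }

    /**
     * Models only the encoding side effect of getWriter(). With strict == true
     * the default is written back, as the removed line did.
     */
    void getWriter(boolean strict) {
        if (strict) {
            setCharacterEncoding(getCharacterEncoding());
        }
    }

    String getContentType() {
        String ret = contentType;
        if (ret != null && characterEncoding != null && charsetSet) {
            ret = ret + ";charset=" + characterEncoding;
        }
        return ret;
    }
}
```

In this model, calling setContentType("text/html") followed by getWriter(true) makes getContentType() return "text/html;charset=ISO-8859-1", whereas getWriter(false) leaves it as "text/html".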



> This message is intended only for the use of the person(s) listed above as the intended recipient(s), and may contain information that is PRIVILEGED and CONFIDENTIAL.  If you are not an intended recipient, you may not read, copy, or distribute this message or any attachment. If you received this communication in error, please notify us immediately by e-mail and then delete all copies of this message and any attachments.
> 
> In addition you should be aware that ordinary (unencrypted) e-mail sent through the Internet is not secure. Do not send confidential or sensitive information, such as social security numbers, account numbers, personal identification numbers and passwords, to us via ordinary (unencrypted) e-mail.


RE: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Posted by Bill Barker <wb...@wilshire.com>.
 

> -----Original Message-----
> From: Remy Maucherat [mailto:remm@apache.org]
> Sent: Thursday, February 02, 2006 4:02 AM
> To: Tomcat Developers List
> Subject: Re: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java
>
> Bill Barker wrote:
> > Yes, RFC 2616 does specify iso-latin-1 as the default for HTTP/1.1 clients.
> > However, section 3.4.1 is also relevant for HTTP/1.0 clients (like, say, the
> > TCK :).  In any case, it doesn't matter since section 5.4 of the servlet
> > spec says "must".  Complaints go to the expert group;  here we just develop
> > Tomcat.
>
> Ok, so I asked the expert group, and many people interpret the
> specification as I do (and is logical to do): if the application uses a
> writer, and never specifies the charset in any way, the container has no
> business rewriting the content-type header to include ";charset=ISO-8859-1".
>

Then they should make the language in the spec clearer ;-).

If I'm misunderstanding the spec, then I don't have a valid reason for my
veto.  Consider the veto withdrawn.

> Rémy


Re: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Posted by Remy Maucherat <re...@apache.org>.
Bill Barker wrote:
> Yes, RFC 2616 does specify iso-latin-1 as the default for HTTP/1.1 clients. 
> However, section 3.4.1 is also relevant for HTTP/1.0 clients (like, say, the 
> TCK :).  In any case, it doesn't matter since section 5.4 of the servlet 
> spec says "must".  Complaints go to the expert group;  here we just develop 
> Tomcat.

Ok, so I asked the expert group, and many people interpret the 
specification as I do (and is logical to do): if the application uses a 
writer, and never specifies the charset in any way, the container has no 
business rewriting the content-type header to include ";charset=ISO-8859-1".

Rémy



Re: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Posted by Bill Barker <wb...@wilshire.com>.
"Remy Maucherat" <re...@apache.org> wrote in message 
news:43D69450.3020807@apache.org...
> Bill Barker wrote:
>> It's relevant to the browser trying to display the code.  If you've
>> configured your browser's default encoding to EUC-JP, without the charset
>> you'll see a big mess when you hit a latin-1 page ;-).
>
> Obviously, this would only impact the case where ;charset=ISO-8859-1 would 
> be forcefully added to the content-type header for no good reason when the 
> user didn't specify any. This is the HTTP default encoding, and will not 
> change the behavior from the user perspective.
>

Yes, RFC 2616 does specify iso-latin-1 as the default for HTTP/1.1 clients. 
However, section 3.4.1 is also relevant for HTTP/1.0 clients (like, say, the 
TCK :).  In any case, it doesn't matter since section 5.4 of the servlet 
spec says "must".  Complaints go to the expert group;  here we just develop 
Tomcat.

>> Yup, that's what it means :).  I'm sure you've played the blame-game by 
>> now,
>> and I'm not interested enough to do it myself.  It looks like it's trying 
>> to
>> avoid computing the entire header value each time the characterEncoding
>> changes.
>
> This whole thing is a huge mess right now. Hopefully, it's doing what it 
> should. You can also for example compare 
> o.a.c.connector.Response.setContentType with 
> o.a.coyote.Response.setContentType. I have to suppose substring and 
> concatenation is a very cool activity.
>

Yeah, the spec is a mess wrt characterEncoding.  Complaints to the same 
place as above :).  The problem is that we need to deal with such 
pathological cases as:
   response.setContentType("text/html; charset=EUC-JP");
   // Oops, I want French instead of Japanese
   response.setCharacterEncoding("iso-8859-1");

Since you can change your mind (according to the spec) many times before you 
actually grab the Writer, I don't really see a way around substring and 
concatenation being cool :).  Of course, I would love to be proven wrong :).

> Rémy






Re: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Posted by Remy Maucherat <re...@apache.org>.
Bill Barker wrote:
> It's relevant to the browser trying to display the code.  If you've
> configured your browser's default encoding to EUC-JP, without the charset
> you'll see a big mess when you hit a latin-1 page ;-).

Obviously, this would only impact the case where ;charset=ISO-8859-1 
would be forcefully added to the content-type header for no good reason 
when the user didn't specify any. This is the HTTP default encoding, and 
will not change the behavior from the user perspective.

> Yup, that's what it means :).  I'm sure you've played the blame-game by now,
> and I'm not interested enough to do it myself.  It looks like it's trying to
> avoid computing the entire header value each time the characterEncoding
> changes.

This whole thing is a huge mess right now. Hopefully, it's doing what it 
should. You can also for example compare 
o.a.c.connector.Response.setContentType with 
o.a.coyote.Response.setContentType. I have to suppose substring and 
concatenation is a very cool activity.

Rémy



RE: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Posted by Bill Barker <wb...@wilshire.com>.
 

> -----Original Message-----
> From: Remy Maucherat [mailto:remm@apache.org]
> Sent: Tuesday, January 24, 2006 12:52 AM
> To: Tomcat Developers List
> Subject: Re: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java
>
> Bill Barker wrote:
> >> Author: remm
> >> Date: Mon Jan 23 17:13:19 2006
> >> New Revision: 371765
> >>
> >> URL: http://svn.apache.org/viewcvs?rev=371765&view=rev
> >> Log:
> >> - Remove nonsensical systematic inclusion of ISO-8859-1 charset in the content type, which is both
> >>   useless and inefficient.
> >>
> >
> > -1
> > Sending the charset used by the Writer is very clearly required by the
> > servlet spec.
>
> Thanks, I expected no less coming from you :) I will revert my patch.
>
> A couple questions for your enjoyment:
> 1) Is this relevant or irrelevant from the HTTP specification perspective ?

It's relevant to the browser trying to display the code.  If you've
configured your browser's default encoding to EUC-JP, without the charset
you'll see a big mess when you hit a latin-1 page ;-).
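
The browser-side effect described above can be sketched as follows. This is an illustration only: ClientDecodeSketch and its decode method are hypothetical, and UTF-8 stands in for EUC-JP as the client's configured default because UTF-8 is guaranteed to be available on every JVM. The client uses the charset from the header when present and otherwise falls back to its own default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of how a client might pick the charset for decoding a body. */
final class ClientDecodeSketch {

    /**
     * Decodes the body with the charset named in the Content-Type header,
     * or with the client's configured default when the header has none.
     */
    static String decode(byte[] body, String headerCharset, Charset clientDefault) {
        Charset cs = headerCharset != null ? Charset.forName(headerCharset) : clientDefault;
        return new String(body, cs);
    }

    public static void main(String[] args) {
        byte[] latin1Body = { (byte) 0xE9 }; // e-acute in ISO-8859-1
        // Charset communicated: decoded as intended.
        System.out.println(decode(latin1Body, "ISO-8859-1", StandardCharsets.UTF_8));
        // No charset communicated and a non-Latin-1 client default: garbled.
        System.out.println(decode(latin1Body, null, StandardCharsets.UTF_8));
    }
}
```

A lone 0xE9 byte is not valid UTF-8, so the second call does not produce the e-acute character the server intended.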

> 2) Does this mean we're running the following ultra efficient code (I
> don't even know why I accepted this stuff back then. It must have been
> that this has been done gradually through many many commits) for each
> request that uses a writer ?
>

Yup, that's what it means :).  I'm sure you've played the blame-game by now,
and I'm not interested enough to do it myself.  It looks like it's trying to
avoid computing the entire header value each time the characterEncoding
changes.

>      public String getContentType() {
> 
>          String ret = contentType;
> 
>          if (ret != null
>              && characterEncoding != null
>              && charsetSet) {
>              ret = ret + ";charset=" + characterEncoding;
>          }
> 
>          return ret;
>      }
> 
> Rémy


Re: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Posted by Remy Maucherat <re...@apache.org>.
Bill Barker wrote:
>> Author: remm
>> Date: Mon Jan 23 17:13:19 2006
>> New Revision: 371765
>>
>> URL: http://svn.apache.org/viewcvs?rev=371765&view=rev
>> Log:
>> - Remove nonsensical systematic inclusion of ISO-8859-1 charset in the content type, which is both
>>   useless and inefficient.
>>
> 
> -1
> Sending the charset used by the Writer is very clearly required by the
> servlet spec. 

Thanks, I expected no less coming from you :) I will revert my patch.

A couple questions for your enjoyment:
1) Is this relevant or irrelevant from the HTTP specification perspective ?
2) Does this mean we're running the following ultra efficient code (I 
don't even know why I accepted this stuff back then. It must have been 
that this has been done gradually through many many commits) for each 
request that uses a writer ?

     public String getContentType() {

         String ret = contentType;

         if (ret != null
             && characterEncoding != null
             && charsetSet) {
             ret = ret + ";charset=" + characterEncoding;
         }

         return ret;
     }

Rémy
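
If the concern with the method above is the string concatenation on every call, one conceivable mitigation is to cache the composed value and invalidate it whenever either input changes. This is a sketch under that assumption, not a proposal taken from the thread: CachedContentType and the concatenations counter are hypothetical, and the counter exists only to make the effect observable.

```java
/** Hypothetical sketch: caches the composed Content-Type value between changes. */
final class CachedContentType {
    private String contentType;
    private String characterEncoding;
    private boolean charsetSet;
    private String cached;   // null means "recompute on next read"
    int concatenations;      // demonstration counter only

    void setContentType(String type) {
        contentType = type;
        cached = null;
    }

    void setCharacterEncoding(String charset) {
        characterEncoding = charset;
        charsetSet = true;
        cached = null;
    }

    String getContentType() {
        if (contentType == null) {
            return null;
        }
        if (cached == null) {
            if (characterEncoding != null && charsetSet) {
                cached = contentType + ";charset=" + characterEncoding;
                concatenations++;
            } else {
                cached = contentType;
            }
        }
        return cached;
    }
}
```

Repeated reads after a single setContentType/setCharacterEncoding pair then concatenate once; each further setter call costs one more concatenation on the next read.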



RE: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java

Posted by Bill Barker <wb...@wilshire.com>.
 

> -----Original Message-----
> From: remm@apache.org [mailto:remm@apache.org]
> Sent: Monday, January 23, 2006 5:13 PM
> To: tomcat-dev@jakarta.apache.org
> Subject: svn commit: r371765 - /tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Response.java
>
> Author: remm
> Date: Mon Jan 23 17:13:19 2006
> New Revision: 371765
>
> URL: http://svn.apache.org/viewcvs?rev=371765&view=rev
> Log:
> - Remove nonsensical systematic inclusion of ISO-8859-1 charset in the content type, which is both
>   useless and inefficient.
> 

-1
Sending the charset used by the Writer is very clearly required by the
servlet spec. 


