Posted to users@tomcat.apache.org by Luis Arriaga <lu...@servicenow.com> on 2020/01/14 19:50:14 UTC

Encoding properties

Hi Apache team,

We are moving our encoding handling in Tomcat and the ServiceNow platform from ISO-8859-1 to UTF-8 and have run into some concerns. We run Tomcat 8.5.47 on a CentOS VM as our Java servlet container, and we terminate SSL on our load balancer.

There is a URIEncoding property that defaults to UTF-8 if it is not specified. Is there any reason there is no BodyEncoding property, or is there a workaround you guys are aware of that does not require the source code to be modified? Looking through the Tomcat source code, the default body encoding seems to be ISO-8859-1. In the Parameters, ByteChunk, and Constants classes there are two constants, DEFAULT_BODY_CHARSET and DEFAULT_CHARSET, that determine the body charset/encoding.
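
To illustrate the symptom, here is a minimal sketch in plain Java (it is not Tomcat code, and the class name BodyCharsetDemo is made up for this example). It assumes a client that sends a form body encoded as UTF-8 without declaring a charset, and shows what the same bytes turn into when they are decoded with ISO-8859-1 instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Tomcat.
class BodyCharsetDemo {

    // Decode raw body bytes with the given charset, which is what a container
    // has to do when the request does not declare a charset itself.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // "name=caf" followed by U+00E9 (e with acute accent), encoded as UTF-8.
        // U+00E9 becomes the two bytes 0xC3 0xA9.
        byte[] body = "name=caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String asLatin1 = decode(body, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(body, StandardCharsets.UTF_8);

        // ISO-8859-1 maps every byte to one character, so the accented letter
        // is replaced by two characters (U+00C3 and U+00A9).
        System.out.println(asLatin1 + " (length " + asLatin1.length() + ")"); // length 10
        System.out.println(asUtf8 + " (length " + asUtf8.length() + ")");     // length 9
    }
}
```

The printed characters depend on the console encoding, but the lengths do not: 10 characters for the ISO-8859-1 decode versus 9 for UTF-8.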

We have forked the Tomcat source code and applied the changes below, which fixed the issue. Are you guys aware of this? It seems strange to have a URIEncoding property but not a BodyEncoding property, unless I am missing something. Maybe this is an enhancement request we can submit, unless there is a valid reason not to have such a property.

diff --git a/java/org/apache/coyote/Constants.java b/java/org/apache/coyote/Constants.java
index 9de194d55..0883904f4 100644
--- a/java/org/apache/coyote/Constants.java
+++ b/java/org/apache/coyote/Constants.java
@@ -33,7 +33,7 @@ public final class Constants {
     public static final String DEFAULT_CHARACTER_ENCODING="ISO-8859-1";
     public static final Charset DEFAULT_URI_CHARSET = StandardCharsets.ISO_8859_1;
-    public static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.ISO_8859_1;
+    public static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.UTF_8;
     public static final int MAX_NOTES = 32;
diff --git a/java/org/apache/tomcat/util/buf/ByteChunk.java b/java/org/apache/tomcat/util/buf/ByteChunk.java
index 555c0f6b8..ed9f6e5ea 100644
--- a/java/org/apache/tomcat/util/buf/ByteChunk.java
+++ b/java/org/apache/tomcat/util/buf/ByteChunk.java
@@ -123,7 +123,7 @@ public final class ByteChunk extends AbstractChunk {
      * standards seem to converge, but the servlet API requires 8859_1, and this
      * object is used mostly for servlets.
      */
-    public static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;
+    public static final Charset DEFAULT_CHARSET = StandardCharsets.UTF_8;
     private transient Charset charset;
diff --git a/java/org/apache/tomcat/util/http/Parameters.java b/java/org/apache/tomcat/util/http/Parameters.java
index 4d7d6cc1e..f59f75514 100644
--- a/java/org/apache/tomcat/util/http/Parameters.java
+++ b/java/org/apache/tomcat/util/http/Parameters.java
@@ -266,7 +266,7 @@ public final class Parameters {
      */
     @Deprecated
     public static final String DEFAULT_ENCODING = "ISO-8859-1";
-    private static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.ISO_8859_1;
+    private static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.UTF_8;
     private static final Charset DEFAULT_URI_CHARSET = StandardCharsets.UTF_8;
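
The diff above hardcodes UTF-8. To make the enhancement idea more concrete, here is a rough sketch of the fallback behaviour a configurable BodyEncoding setting might have. It is plain Java and only an illustration under my own assumptions: BodyCharsetResolver and its methods are hypothetical names, not existing Tomcat classes, and the header parsing is deliberately simplified.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not an existing Tomcat class.
class BodyCharsetResolver {

    // Stands in for the value of a configurable "BodyEncoding" setting.
    private final Charset defaultBodyCharset;

    BodyCharsetResolver(Charset defaultBodyCharset) {
        this.defaultBodyCharset = defaultBodyCharset;
    }

    // Use the charset parameter of the Content-Type header when it is present
    // and supported by the JVM; otherwise fall back to the configured default.
    Charset resolve(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Malformed or unsupported charset name: use the default.
                        break;
                    }
                }
            }
        }
        return defaultBodyCharset;
    }

    public static void main(String[] args) {
        BodyCharsetResolver resolver = new BodyCharsetResolver(StandardCharsets.UTF_8);
        // No charset declared, so the configured default applies.
        System.out.println(resolver.resolve("application/x-www-form-urlencoded"));
        // An explicit charset on the request wins over the default.
        System.out.println(resolver.resolve("text/plain; charset=ISO-8859-1"));
    }
}
```

With this shape, a request that declares its own charset is unaffected, and only requests without one pick up the configured default.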



_____________________________________________
Luis Arriaga
Software Engineer

Re: Encoding properties

Posted by Mark Thomas <ma...@apache.org>.
On 14/01/2020 19:50, Luis Arriaga wrote:
> Hi Apache team,

Please don't cross-post. This discussion can continue on the dev list
since you posted there first and it is related to changing the source code.

> We are updating our encoding implementation on Tomcat and the ServiceNow platform from ISO-8859-1 to UTF-8 and ran into some concerns. Our Tomcat setup includes 8.5.47 on a CentOS VM, where Tomcat serves as our Java Servlet Web Server and we terminate SSL on our load balancer.
> 
> There is a URIEncoding property that defaults to UTF-8 if it is not specified. Is there any reason there is no BodyEncoding property or is there a workaround you guys 

Some people read "guys" as being gender specific and, as such, it is not
appropriate to use that term when referring to the community as a whole.

Mark


> are aware of that does not require the source code to be modified? Looking through the Tomcat source code the default body encoding seems to be ISO-8859-1, looking at the Parameters, ByteChunk, and Constants classes there are two variables DEFAULT_BODY_CHARSET and DEFAULT_CHARSET that determine the body charset/encoding.
> 
> We have forked the Tomcat source code and applied the changes below which fixed the issue. Are you guys aware of this? Seems strange to have a URIEncoding property but not a BodyEncoding property unless I am missing something. Maybe this is an enhancement request we can submit unless there is a valid reason to not have such property.

