Posted to dev@tomcat.apache.org by bu...@apache.org on 2017/06/19 00:19:34 UTC

[Bug 61197] New: Breaking change in Content-Type / Character Encoding handling

https://bz.apache.org/bugzilla/show_bug.cgi?id=61197

            Bug ID: 61197
           Summary: Breaking change in Content-Type / Character Encoding
                    handling
           Product: Tomcat 8
           Version: 8.5.15
          Hardware: All
                OS: All
            Status: NEW
          Severity: regression
          Priority: P2
         Component: Catalina
          Assignee: dev@tomcat.apache.org
          Reporter: matthew@matt-shaw.co.uk
  Target Milestone: ----

I *believe* this constitutes some level of regression, based on a distinct
difference from prior behaviour, but please correct me if I'm wrong :) Also, I
couldn't find any clear mention of this change in the change log for 8.5.15.

Prior to 8.5.15 (specifically, this commit:
https://github.com/apache/tomcat/commit/b2bab804b543bfe181fe435efe35628ce0e21b39)
the behaviour of `org.apache.catalina.connector.Response` when setting the
content-type with encoding parameter included, e.g.
`setContentType("application/json;charset=MS932")`, was to simply take the
provided encoding string and set this for the output.

As long as the character set was supported by the JVM (as a specific code page,
or an alias of one of the supported code pages), requests would return with the
*exact* character set string provided.

Since the above commit / 8.5.15 release, this is now forcibly modified with no
option to disable such behaviour. For instance, if I specify "MS932" or
"windows-932" this is now replaced with "windows-31j", or "eucjis" with
"EUC-JP", "sjis" with "Shift_JIS", etc.

This may seem like a reasonable behaviour for modern systems that we would
*hope* support mapping aliased encodings, but with legacy systems unable to
handle this (and any system that, stupidly or otherwise, checks for a specific
encoding string, possibly in a case-sensitive manner), suddenly we have broken
behaviour. The client expects one encoding string and receives something
equivalent but that it just can't handle.
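
The alias replacement described above can be reproduced outside Tomcat. The
sketch below is not Tomcat code; it only assumes that the substitution comes
from resolving the supplied label to a `java.nio.charset.Charset` and then
reporting that charset's canonical name. The class name `CharsetAliasDemo` and
the helper `canonicalName` are hypothetical, and the Japanese labels resolve
only if the JVM ships the extended charsets.

```java
import java.nio.charset.Charset;

final class CharsetAliasDemo {

    // Resolves a charset label (canonical name or alias) to the canonical
    // name the JVM reports for it. Throws if the label is illegal or
    // unsupported on this JVM.
    static String canonicalName(String label) {
        return Charset.forName(label).name();
    }

    public static void main(String[] args) {
        // "latin1" and "utf8" are aliases of charsets every JVM must support;
        // the remaining labels are the ones mentioned in the report.
        String[] labels = {"latin1", "utf8", "MS932", "windows-932", "eucjis", "sjis"};
        for (String label : labels) {
            if (Charset.isSupported(label)) {
                System.out.println(label + " -> " + canonicalName(label));
            } else {
                System.out.println(label + " is not supported by this JVM");
            }
        }
    }
}
```

Any label whose printed canonical name differs from the input is one that a
strict client would see change under the new behaviour.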

Unfortunately I'm now stuck in this situation as a legacy-systems integrations
engineer. We *have* to be able to provide our output with very specific
encoding strings set or else several dozen systems we (sadly) can't change will
break. Thankfully we caught this in internal testing of the upgrade to 8.5.15
and can put it off temporarily, but we're now also stuck with either maintaining
our own patched version of Tomcat to revert this behaviour, no longer updating
(not a real option given security requirements), or possibly reviewing a
migration to an alternative servlet container (please no q_q).

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Mark Thomas <ma...@apache.org> ---
Fixed in:
- trunk for 9.0.0.M22 onwards
- 8.5.x for 8.5.16 onwards


--- Comment #1 from Mark Thomas <ma...@apache.org> ---
The change relates to this entry in the change log:

<quote>
Start to switch to using Charset rather than String to store encoding
configuration settings to reduce the number of places the associated Charset
needs to be looked up. (markt)
</quote>

The primary drivers for the change were performance (the repeated String ->
Charset calls were relatively expensive) and earlier error reporting when an
invalid value was provided.

There might be an alternative way of setting the charset that avoids this
restriction. I'll take a look. If that doesn't work, preserving the user
provided value is another option.
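
As an illustration of the second option only, and not the fix that was
eventually committed, the sketch below keeps the label exactly as the
application supplied it alongside the `Charset` resolved from it. The label
would go into the header and the `Charset` would do the encoding, so the lookup
still happens once and an invalid value still fails early. The class
`DeclaredCharset` and its methods are hypothetical names, and the parameter
parsing is deliberately simplistic.

```java
import java.nio.charset.Charset;
import java.util.Locale;

final class DeclaredCharset {

    private final String label;    // exactly as supplied by the application
    private final Charset charset; // resolved once, used for encoding output

    private DeclaredCharset(String label, Charset charset) {
        this.label = label;
        this.charset = charset;
    }

    // Extracts the charset parameter from a Content-Type value such as
    // "application/json;charset=MS932". Returns null when no charset
    // parameter is present; throws if the label is illegal or unsupported.
    static DeclaredCharset fromContentType(String contentType) {
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String label = param.substring("charset=".length()).trim();
                // Drop surrounding double quotes if the value was quoted.
                if (label.length() >= 2 && label.startsWith("\"") && label.endsWith("\"")) {
                    label = label.substring(1, label.length() - 1);
                }
                return new DeclaredCharset(label, Charset.forName(label));
            }
        }
        return null;
    }

    // The string to echo back in the response header.
    String label() {
        return label;
    }

    // The charset to use when encoding the response body.
    Charset charset() {
        return charset;
    }
}
```

With this shape, `fromContentType("text/html;charset=latin1")` would report
`latin1` from `label()` while `charset()` is the resolved ISO-8859-1 charset.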
