Posted to users@tomcat.apache.org by Thorsten Schöning <ts...@am-soft.de> on 2018/11/26 13:45:56 UTC

Character encoding problems using jsp:include with jsp:param in Tomcat 8.5 only.

Hi all,

I'm currently testing the migration of a legacy web app from Tomcat 7 to 8
to 8.5 and ran into character encoding problems in 8.5 only.
The app uses JSP pages, declares all of them as stored in UTF-8 (and
they really are :-)), and declares an HTTP Content-Type of
"text/html; charset=UTF-8" as well. Textual content at the HTML level is
properly encoded as UTF-8 and displays correctly in the browser.

In Tomcat 8.5 the following is introducing encoding problems, though:

> <jsp:include page="/WEB-INF/jsp/includes/search.jsp">
>       <jsp:param      name="chooseSearchInputTitle"
>                       value="Benutzer wählen"
>       />
> </jsp:include>

"search.jsp" simply outputs the value of the param as the "title"
attribute of an HTML link, and somewhere along the way the character "ä"
is replaced with the Unicode REPLACEMENT CHARACTER (U+FFFD). This
happens only in Tomcat 8.5, not in 8.0 and not in 7.

I can fix that problem using either "SetCharacterEncodingFilter" or
the following line, which I guess has the same effect:

> <% request.setCharacterEncoding("UTF-8"); %>

Looking at the generated Java code for the JSP I get the following:

> org.apache.jasper.runtime.JspRuntimeLibrary.include(request, response, "/WEB-INF/jsp/includes/search.jsp" + "?" + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("chooseSearchInputTitle", request.getCharacterEncoding())+ "=" + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("Benutzer wählen", request.getCharacterEncoding()), out, false);

The "ä" is properly encoded using UTF-8 in all versions of Tomcat and
the generated code seems to be the same in all versions as well,
especially regarding "request.getCharacterEncoding()".
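
A minimal sketch of how a charset mismatch between building and parsing a
query string could produce the U+FFFD described above. It uses
java.net.URLEncoder and java.net.URLDecoder as stand-ins for Jasper's
URLEncode and for Tomcat's parameter parsing, which is an assumption; the
class and method names are hypothetical, and whether this is what actually
happens inside Tomcat 8.5 is not confirmed here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class EncodingMismatchDemo {

    // Percent-encodes a value with one charset and decodes it with another,
    // imitating a query string that is built and parsed under different
    // charset assumptions.
    static String roundTrip(String value, String encodeCharset, String decodeCharset) {
        try {
            String encoded = URLEncoder.encode(value, encodeCharset);
            return URLDecoder.decode(encoded, decodeCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "ä" is written as an escape so the demo does not depend on the
        // encoding of this source file.
        String title = "Benutzer w\u00E4hlen";

        // Same charset on both sides: the value survives unchanged.
        System.out.println(title.equals(roundTrip(title, "UTF-8", "UTF-8")));

        // Encoded as ISO-8859-1, "ä" becomes the single byte %E4, which is
        // not valid UTF-8 on its own, so a UTF-8 decoder substitutes U+FFFD.
        System.out.println(roundTrip(title, "ISO-8859-1", "UTF-8").indexOf('\uFFFD') >= 0);
    }
}
```

The opposite mismatch (encode as UTF-8, decode as ISO-8859-1) would show
up as "Ã¤" rather than U+FFFD, so the symptom itself hints at the
direction of the mismatch.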

"getCharacterEncoding" in Tomcat 8.5 has changed; the former
implementation didn't take the context into account:

>    @Override
>    public String getCharacterEncoding() {
>        String characterEncoding = coyoteRequest.getCharacterEncoding();
>        if (characterEncoding != null) {
>            return characterEncoding;
>        }
>
>        Context context = getContext();
>        if (context != null) {
>            return context.getRequestCharacterEncoding();
>        }
>
>        return null;
>    }

My connector in server.xml is configured with URIEncoding="UTF-8"
in all versions of Tomcat, but that makes no difference in 8.5.
So as I understand it, calling "setCharacterEncoding" now sets the value
that the generated Java code actually uses, even though the following is
documented for the character encoding filter:

> Note that the encoding for GET requests is not set here, but on a Connector

https://tomcat.apache.org/tomcat-8.5-doc/config/filter.html#Set_Character_Encoding_Filter/Introduction

Now I'm wondering about multiple things...

1. Doesn't "getCharacterEncoding" provide the encoding of the
   HTTP-body? My JSP is called using GET and the Java quoted above
   seems to build a query string as well. So why does it depend on
   some body encoding instead of e.g. URIEncoding of the connector?

2. Is my former approach wrong or did changes in Tomcat 8.5 introduce
   some regression? There is some conversion somewhere which was not
   present in the past.

3. What is the correct fix I need now? The character encoding filter,
   even though it only applies to bodies per documentation?

Thanks!

Best regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone.............05151-  9468- 55
Fax...............05151-  9468- 88
Mobile.............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Character encoding problems using jsp:include with jsp:param in Tomcat 8.5 only.

Posted by Christopher Schultz <ch...@christopherschultz.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Thorsten,

On 11/27/18 04:48, Thorsten Schöning wrote:
> Hello Christopher Schultz, on Monday, 26 November 2018 at
> 16:07 you wrote:
> 
>> web.xml
>> - -------
>> <web-app>
>>   <request-character-encoding>UTF-8</request-character-encoding>
>> </web-app>
> 
> Tested that with Tomcat 9 and this setting fixed my problem the
> same as using SetCharacterEncodingFilter. It doesn't work in Tomcat
> 8.5, I guess because that simply doesn't implement Servlet 4.0?

Correct. Tomcat 8.0 and 8.5 implement servlet 3.1. In Tomcat 8.x,
you'll need to use the SetCharacterEncodingFilter.

> Because I still need to support Tomcat 7 and 8.0 for some time,
> I'll keep SetCharacterEncodingFilter for now and just document the
> better solution. Thanks!

Sounds good. The SetCharacterEncodingFilter should be entirely
forward-compatible.

- -chris
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlwAMN8ACgkQHPApP6U8
pFgY/w/+JyJy02PVIebDXUNYugq8rR2GR+7cQhrHiFwdR0kcf8/FySP8s/8IsJyn
JaCbQ4V/qssMRYlSaxHb2m7xpioraXJkXQE/3HGZyJFKnLykZcAwF86jTSuTesS0
I20IRMh5KJKMoCszmDfqMnY3vQSGJJ7G+Jc47myApKn7qu2igQcDHkVZSK7hEqsb
+ayfHiUIkyN24h6xvFEb7u5RDiATMli6GOverpW1t5+oWdDoUK452aQGQYfN8ojH
Nv2lI6r9OSKQoz3eA6xNkMLlfSPGCH1kzfDyY4KYqhBtxshTnxRzkEoZ3w+DjVjD
U69oOpLthm7nTiYbdGft4dMTcKW+17LczjEbRExV8ZqM3EI92a2iTPDhrva5T65E
dTcNuImv2dr9Ijgn6hvMttE1Ntubncy+UwRdfuGTAoeZ771zxrP7+6UN6BXyO14S
rwgAI1tPzwwsWHJ4emfNEERjKbKy0m5U/WivoKmVVDavGfYskCWQXkzZ64eUGxuU
QKANPJJcprELYw2bX06n+ViJ+zKRHju4SsdJuScKpiXsBgVqiE6MsilB5DKIO8vg
zypgshIpoKVjq3KevsEyHUbVNZguxv4wtSOsGhjkYpm0+e07e/MNLXaK2OnLxIV5
0OGfimo2pYNocS2iM2a2aiwi5PMfDchqjjVovyQvFSV4W3xaMIk=
=mqmG
-----END PGP SIGNATURE-----



Re: Character encoding problems using jsp:include with jsp:param in Tomcat 8.5 only.

Posted by Thorsten Schöning <ts...@am-soft.de>.
Hello Christopher Schultz,
on Monday, 26 November 2018 at 16:07 you wrote:

> web.xml
> - -------
> <web-app>
>   <request-character-encoding>UTF-8</request-character-encoding>
> </web-app>

Tested that with Tomcat 9 and this setting fixed my problem the same
as using SetCharacterEncodingFilter. It doesn't work in Tomcat 8.5, I
guess because that simply doesn't implement Servlet 4.0?

Because I still need to support Tomcat 7 and 8.0 for some time, I'll
keep SetCharacterEncodingFilter for now and just document the better
solution. Thanks!

P.S.:

I sent you a private mail some days ago, unrelated to Tomcat. Did
you get that? Just want to make sure that I'm not being spam filtered.

Best regards,

Thorsten Schöning



Re: Character encoding problems using jsp:include with jsp:param in Tomcat 8.5 only.

Posted by Christopher Schultz <ch...@christopherschultz.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Thorsten,

On 11/26/18 08:45, Thorsten Schöning wrote:
> Hi all,
> 
> I'm currently testing migration of a legacy web app from Tomcat 7
> to 8 to 8.5 and ran into problems regarding character encoding in
> 8.5 only. That app uses JSP pages and declares all of those to be
> stored in UTF-8, does really do so :-), and declares a HTTP-Content
> type of "text/html; charset=UTF-8" as well. Textual content at
> HTML-level is properly encoded using UTF-8 and looks properly in
> the browser etc.
> 
> In Tomcat 8.5 the following is introducing encoding problems,
> though:
> 
>> <jsp:include page="/WEB-INF/jsp/includes/search.jsp">
>>       <jsp:param      name="chooseSearchInputTitle"
>>                       value="Benutzer wählen"
>>       />
>> </jsp:include>
> 
> "search.jsp" simply outputs the value of the param as the "title" 
> attribute of some HTML-link and the character "ä" is replaced 
> somewhere with the Unicode character REPLACEMENT CHARACTER 0xFFFD.
> But really only in Tomcat 8.5, not in 8 and not in 7.

Have you been able to determine if the problem is on input or output?

> I can fix that problem using either "SetCharacterEncodingFilter"
> or the following line, which simply results in the same I guess:
> 
>> <% request.setCharacterEncoding("UTF-8"); %>

FYI, the SetCharacterEncodingFilter only modifies the request encoding,
not the response encoding. Also, it only changes the encoding of the
request *body* (e.g. PUT/POST), and not the encoding used to decode
the URI. That is configured via the <Connector>'s URIEncoding attribute.
There is also useBodyEncodingForURI, which makes the URI inherit the
request body's encoding if one is present. I recommend using
useBodyEncodingForURI="true".
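
The rule described in the paragraph above can be written down as a small
model. This is only a sketch of that description, not Tomcat's code: the
class and method names are hypothetical, and Tomcat's actual behaviour
when no body encoding is set may differ from this simplification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QueryCharsetModel {

    // Hypothetical helper: picks the charset used to decode query-string
    // parameters. bodyEncoding is null when the request declares none.
    static Charset queryCharset(Charset bodyEncoding, Charset uriEncoding,
                                boolean useBodyEncodingForURI) {
        if (useBodyEncodingForURI && bodyEncoding != null) {
            return bodyEncoding; // URI inherits the body's encoding
        }
        return uriEncoding;      // otherwise the connector's URIEncoding applies
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;

        // Body encoding set and inheritance enabled: body encoding is used.
        System.out.println(queryCharset(utf8, latin1, true));
        // No body encoding on the request: URIEncoding is used.
        System.out.println(queryCharset(null, utf8, true));
        // Inheritance disabled: URIEncoding is used regardless.
        System.out.println(queryCharset(latin1, utf8, false));
    }
}
```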

I recommend *always* using SetCharacterEncodingFilter, since web
browsers habitually fail to send a correct Content-Type and
often use UTF-8 in URLs in violation of the HTTP spec. The result is
essentially that everything works the way you *want* it to work,
except that you have to "hope" it works instead of being able to
prove that it will.

> Looking at the generated Java code for the JSP I get the
> following:
> 
>> org.apache.jasper.runtime.JspRuntimeLibrary.include(request,
>> response, "/WEB-INF/jsp/includes/search.jsp" + "?" +
>> org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("chooseSearchInputTitle",
>> request.getCharacterEncoding())+ "=" +
>> org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("Benutzer wählen",
>> request.getCharacterEncoding()), out, false);
> 
> The "ä" is properly encoded using UTF-8 in all versions of Tomcat
> and the generated code seems to be the same in all versions as
> well, especially regarding "request.getCharacterEncoding()".
> 
> "getCharacterEncoding" in Tomcat 8.5 has changed; the former
> implementation didn't take the context into account:
> 
>>    @Override
>>    public String getCharacterEncoding() {
>>        String characterEncoding = coyoteRequest.getCharacterEncoding();
>>        if (characterEncoding != null) {
>>            return characterEncoding;
>>        }
>>
>>        Context context = getContext();
>>        if (context != null) {
>>            return context.getRequestCharacterEncoding();
>>        }
>>
>>        return null;
>>    }

This is just a fall-back for when there is no character encoding
defined in the request (because the browser didn't send one).

> My connector in server.xml is configured to use "URIEncoding" as
> UTF-8 in all versions of Tomcat, but that doesn't make a difference
> to 8.5. So I understand that using "setCharacterEncoding", I set
> the value actually used in the generated Java now, even though the
> following is documented for character encoding filter:
> 
>> Note that the encoding for GET requests is not set here, but on a
>> Connector
> 
> https://tomcat.apache.org/tomcat-8.5-doc/config/filter.html#Set_Character_Encoding_Filter/Introduction
>
>  Now I'm wondering about multiple things...
> 
> 1. Doesn't "getCharacterEncoding" provide the encoding of the 
> HTTP-body?

Yes, but it comes directly from the browser, which often doesn't provide
it. There is no encoding detection going on, so it's often "null" or
ISO-8859-1, which is the spec-defined default.
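
To illustrate why the value is so often null: the body encoding is taken
from the charset parameter of the request's Content-Type header, and a
plain GET usually carries no such header. The helper below is a
hypothetical, simplified parser written for this illustration; it is not
Tomcat's implementation.

```java
import java.util.Locale;

class ContentTypeCharset {

    // Hypothetical helper: returns the charset parameter of a Content-Type
    // header value, or null if the header or the parameter is missing.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
        System.out.println(charsetOf(null));
    }
}
```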

> My JSP is called using GET and the Java quoted above seems to build
> a query string as well. So why does it depend on some body encoding
> instead of e.g. URIEncoding of the connector?

Good question. This might be a bug.

> 2. Is my former approach wrong or did changes in Tomcat 8.5
> introduce some regression? There is some conversion somewhere which
> was not present in the past.

Tomcat 8.5 follows the servlet spec, which in v4.0 added the
<web-app><request-character-encoding> to make things even more fun.
Actually, this can replace the use of the SetCharacterEncodingFilter.
Thanks for pointing this out; I wasn't aware of this feature of the
4.0 spec.

> 3. What is the correct fix I need now? The character encoding
> filter, even though it only applies to bodies per documentation?

Try setting <request-character-encoding> in your <web-app> like this:

web.xml
- -------
<web-app>
  <request-character-encoding>UTF-8</request-character-encoding>
</web-app>

- -chris
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlv8DEYACgkQHPApP6U8
pFjbihAAuX3vNtHpJ2qLpIofvz83wFbCxyVsgnRPGIQsqT/wxskOizwkKCmxnITc
pYEJHOEjF5U+C9QJtyC4iPz/Dj9MOfk8986NZ/9bhxFuGJsAifO1HKZ2vTvf9dYD
s5yAPJryQYaShgiDRPopYDgCOWi6a9mQMjvQeYclQjFAOa3MWMa4tlnKD2mOL4GQ
X/PuUiKA97XMmj6LZTwh9dGJwU2Fi6LlWOIXXP2qAB8RmcfIlDr20/m1OKg4l0Z3
dVzbD0rWM7tNCtDhnybclamdKv+apDJGS3NtTHzScXlqT51EdUiKup+mTJbaRncD
okL9MKlGLZYe5ankTGHaNH5P4BfhSv1BUYwiTXpUMgVpuAl5AMxEwu5ZHdoyeSJm
+B27/RLXMFue25Qtni6op06ssJGjQZyR5AxAN4qO/k3eTJUzAp5tLiJlbpJbMIzd
fEiL2kIkvIeHUE6Iz39deaWsFqu6m1hweSGcTXsvky0mEi20QZ9Pa+1E9UTvii20
HL0h/MxKlfJFc7yXmLU2SpTho4lTLUIMD57XOuYPQTkHBcW0QoHJLSCymANx/wpv
OdPjXsqGDBAKWteRTaB7caqU0Fb+Z3UHA8PUIjT4sPW88uHkRGA5XRLMWWlXe+Cx
DVwykOEkBaKXLWzZ51R+cYoWEWKtbR0pzEW+dA9JEMClWMrovkg=
=pfKy
-----END PGP SIGNATURE-----
