Posted to users@tomcat.apache.org by Tomoki Sato <nc...@gmail.com> on 2019/05/12 04:51:22 UTC

ServletContext#setRequestCharacterEncoding does not have an effect on HttpServletRequest#getReader

Hello,

The reader returned by HttpServletRequest#getReader
does not seem to decode characters using the character encoding
set by ServletContext#setRequestCharacterEncoding (available since Servlet 4.0).

My questions are:
1. Is this behavior intentional (e.g. for backward compatibility)?
2. If this behavior is intentional, is there any specification
describing how ServletContext#setRequestCharacterEncoding
and HttpServletRequest#getReader interact?


I have created a simple war application and tested
ServletContext#setRequestCharacterEncoding as follows.

[Environment]
Tomcat 9.0.19 (default configuration, nothing changed)
JDK 11
Windows 8.1

[index.html]
#####################################################################
    <button type="button" id="the_button">post</button>
    <script>
        document.getElementById('the_button').addEventListener('click', function() {
            var xhttp = new XMLHttpRequest();
            xhttp.open('POST', '/SimpleWarApp/app/simple');
            xhttp.setRequestHeader('Content-Type', 'text/plain');
            // The body content is the Japanese character '\u3042'
            xhttp.send('あ');
        });
    </script>
#####################################################################

[InitServletContextListener.java]
#####################################################################
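import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class InitServletContextListener implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Context-wide default request encoding (Servlet 4.0)
        sce.getServletContext().setRequestCharacterEncoding("UTF-8");
    }
}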
#####################################################################

[SimpleServlet.java]
#####################################################################
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/app/simple")
@SuppressWarnings("serial")
public class SimpleServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {

        System.out.println("requestCharacterEncoding : "
                + req.getServletContext().getRequestCharacterEncoding());
        System.out.println("req.getCharacterEncoding() : "
                + req.getCharacterEncoding());

        // Print the 'hello' parameter if present; otherwise read the raw body.
        String hello = req.getParameter("hello");
        if (hello != null) {
            System.out.println("hello : " + hello);
        } else {
            System.out.println("body : " + req.getReader().readLine());
        }
    }
}
#####################################################################

I don't have any servlet filters.
The above three are all the components of this war application.


Here are the war application's console logs.

Case 1:
When I submit the form with a parameter 'hello',
the value of 'hello' is successfully decoded as follows.
#####################################################################
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
hello : あ
#####################################################################

Case 2:
When I click 'post' and send text content,
the request body cannot be successfully decoded as follows.
#####################################################################
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???
#####################################################################

Case 1 is OK,
but in Case 2, I expected the request body to be decoded
using the character encoding set by ServletContext#setRequestCharacterEncoding.


I originally posted this issue on StackOverflow.
There is a bit more information there:
https://stackoverflow.com/questions/56087155/why-servletcontextsetrequestcharacterencoding-does-not-have-an-effect-on-htt


Thanks,

Tomoki

Re: ServletContext#setRequestCharacterEncoding does not have an effect on HttpServletRequest#getReader

Posted by Mark Thomas <ma...@apache.org>.

On 13/05/2019 14:07, Mark Thomas wrote:

> I'll look into a fix.

It will be in 9.0.21 onwards.
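
For anyone who cannot upgrade yet, one possible way to avoid relying on how
getReader() chooses its charset is to decode the raw body bytes explicitly.
The sketch below is only an illustration under stated assumptions: the class
ExplicitBodyDecoding and the helper readFirstLineUtf8 are hypothetical names,
and a ByteArrayInputStream stands in for what req.getInputStream() would
return inside a servlet. It has not been checked against Tomcat itself.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ExplicitBodyDecoding {

    // Hypothetical helper: reads the first line of a body, always decoding as UTF-8.
    static String readFirstLineUtf8(InputStream body) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(body, StandardCharsets.UTF_8));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for req.getInputStream(): the UTF-8 bytes of U+3042.
        byte[] raw = "\u3042".getBytes(StandardCharsets.UTF_8);
        String line = readFirstLineUtf8(new ByteArrayInputStream(raw));
        // Print the code point rather than the character, so the console
        // encoding cannot distort the result.
        System.out.println(String.format("U+%04X", line.codePointAt(0)));
    }
}
```

Note that the Servlet API does not allow getInputStream() and getReader() to
both be called on the same request, so a servlet using this approach would
have to read the body only through the input stream.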

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

Re: ServletContext#setRequestCharacterEncoding does not have an effect on HttpServletRequest#getReader

Posted by Mark Thomas <ma...@apache.org>.

This is, unfortunately, a complex topic. There is an FAQ on this topic:
https://cwiki.apache.org/confluence/display/TOMCAT/Character+Encoding

but even that could probably do with some updates to reflect the
addition of #setRequestCharacterEncoding() and
#setResponseCharacterEncoding().

I'll work through your test case and see if I can figure out where
things are going wrong.

On 12/05/2019 05:51, Tomoki Sato wrote:
> Hello,
> 
> The reader returned by HttpServletRequest#getReader
> does not seem to decode characters using the character encoding
> set by ServletContext#setRequestCharacterEncoding (available since Servlet 4.0).
> 
> My questions are:
> 1. Is this behavior intentional (e.g. for backward compatibility)?
> 2. If this behavior is intentional, is there any specification
> describing how ServletContext#setRequestCharacterEncoding
> and HttpServletRequest#getReader interact?

<snip/>

> Case 1:
> When I submit the form with a parameter 'hello',

There is no form in the example code. Ah. The StackOverflow question has
a complete example.

On to the tests...

> the value of 'hello' is successfully decoded as follows.
> #####################################################################
> requestCharacterEncoding : UTF-8
> req.getCharacterEncoding() : UTF-8
> hello : あ
> #####################################################################

Yes, this works for me too.

> Case 2:
> When I click 'post' and send text content,
> the request body cannot be successfully decoded as follows.
> #####################################################################
> requestCharacterEncoding : UTF-8
> req.getCharacterEncoding() : UTF-8
> body : ???
> #####################################################################

I see corruption too.

Before I dig into what is going on, I do want to point out that writing
to stdout can itself be problematic if your platform default encoding is
not UTF-8 (I don't believe it is for Windows). I'm testing on a platform
where UTF-8 is the default so I'm going to ignore this for now.
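
One way to take the console encoding out of the picture when debugging
this kind of problem is to log the code points of the string instead of
the characters themselves. The following is a minimal sketch; the class
CodePointDump and the method toCodePoints are hypothetical names used
only for illustration.

```java
import java.util.stream.Collectors;

class CodePointDump {

    // Renders each code point as U+XXXX, so the output is plain ASCII
    // regardless of the platform default encoding.
    static String toCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // The escape avoids depending on the source file encoding.
        System.out.println(toCodePoints("\u3042")); // prints U+3042
    }
}
```

In the servlet from the test case, something like
toCodePoints(req.getReader().readLine()) could be printed instead of the
raw line, which would show what the Reader actually produced.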

OK. The Reader used to read the request body is created using ISO-8859-1
rather than UTF-8. Hmm. Need to dig into that some more. Ah. I think we
have a bug. The way the Reader is created bypasses the check for a value
set by ServletContext#setRequestCharacterEncoding.
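
The effect of reading UTF-8 bytes through an ISO-8859-1 Reader can be
reproduced outside a container. The sketch below is not Tomcat code; the
class ReaderCharsetDemo and its decode method are hypothetical and only
show what happens to the three UTF-8 bytes of U+3042 under each charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderCharsetDemo {

    // Decodes raw body bytes with the given charset, as a Reader built
    // with that charset would.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // U+3042 is encoded in UTF-8 as the three bytes E3 81 82.
        byte[] body = "\u3042".getBytes(StandardCharsets.UTF_8);

        String wrong = decode(body, StandardCharsets.ISO_8859_1);
        String right = decode(body, StandardCharsets.UTF_8);

        // ISO-8859-1 maps every byte to one char, so three chars come back.
        System.out.println("ISO-8859-1 chars: " + wrong.length()); // prints 3
        System.out.println("UTF-8 chars: " + right.length());      // prints 1
    }
}
```

Three wrongly decoded characters would be consistent with the three '?'
characters in the reported log line, assuming the console replaced each
character it could not display.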

I'll look into a fix.

Mark
