Posted to users@tomcat.apache.org by Mark Juszczec <ma...@gmail.com> on 2016/10/16 18:09:56 UTC

Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Hello

I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache 2.4.6

I'm using AJP 1.3 for communication between Apache and Tomcat

It's all powered by Java 1.8.

I'm having a problem with international characters when I send them as the
request *URI* (which is used by GET requests and this is a GET request).

Let's say I get the string AOËL

mod_jk.log records the bytes with the message

 "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to ajp13 pos=4 len=1411 max=8192"

and shows them to be:

  41 4f c3 8b 4c

AFAIK this means the correct bytes are being sent to AJP.  Is that correct?
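One quick way to confirm that these are the expected bytes is to encode the string yourself. The Python sketch below (Python 3.8+ for bytes.hex with a separator) is only a neutral checking tool; nothing in the Apache/Tomcat setup depends on it.

```python
# Encode the example string as UTF-8 and print the bytes in the same
# space-separated hex style as the mod_jk trace dump.
text = "AO\u00cbL"  # "AOËL"; U+00CB is LATIN CAPITAL LETTER E WITH DIAERESIS
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "))  # 41 4f c3 8b 4c
```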

Running remote debugging via Spring Tool Suite to hook up to my code shows
me I receive:

    41 4f c3 c3 83 c2 c2 8b 4c
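Whether a byte sequence is well-formed UTF-8 can be checked the same way. In this Python sketch the helper name first_invalid_utf8_offset is made up for illustration. Strict decoding fails at offset 2, because the lead byte 0xc3 must be followed by a continuation byte (0x80 to 0xbf), not by another 0xc3:

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError as err:
        return err.start
    return None


received = bytes.fromhex("41 4f c3 c3 83 c2 c2 8b 4c")  # bytes seen in the debugger
offset = first_invalid_utf8_offset(received)
print(offset, hex(received[offset]))  # 2 0xc3
```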

I have verified that the incorrect bytes are already present as early in the
call stack as the invocation of CoyoteAdapter.process().

I have specified URIEncoding="UTF-8" on the AJP <Connector>, and it has had
no effect.

I've also set useBodyEncodingForURI to true, with no effect.

Conventional wisdom says the data is inadvertently being interpreted as
ISO-8859-1 somewhere along the line. Since the data is correct (per
mod_jk.log) heading into AJP and incorrect once CoyoteAdapter.java starts
handling it, something is somehow going wrong when the data is interpreted
after being read from the AJP port.

Is that correct?
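The ISO-8859-1 theory can be tested by simulating it. This Python sketch assumes a single misinterpretation step (UTF-8 bytes decoded as ISO-8859-1, then re-encoded as UTF-8); where such a step would happen in the stack is not known. The result matches the sequence Mark Thomas predicts later in the thread, and it still differs from the bytes seen in the debugger:

```python
# Simulate the suspected bug: correct UTF-8 bytes are decoded as
# ISO-8859-1 (one character per byte) and the result is re-encoded as UTF-8.
sent = bytes.fromhex("41 4f c3 8b 4c")   # what mod_jk logged for "AOËL"
misread = sent.decode("iso-8859-1")      # 0xc3 -> U+00C3, 0x8b -> U+008B
double_encoded = misread.encode("utf-8")
print(double_encoded.hex(" "))           # 41 4f c3 83 c2 8b 4c

observed = bytes.fromhex("41 4f c3 c3 83 c2 c2 8b 4c")
print(double_encoded == observed)        # False
```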

I am at a loss as to how to correct this. The only two things the docs say
are to use URIEncoding="UTF-8" and useBodyEncodingForURI="true". I'm
doing that and it's not working.

I am at a loss about what else to try or where to look.

If you were faced with this, what would you try?  Any advice or suggestions
will be greatly appreciated.


Mark

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Mark Juszczec <ma...@gmail.com>.
On Tue, Oct 18, 2016 at 10:28 AM, Mark Juszczec <ma...@gmail.com>
wrote:

>
>
> On Tue, Oct 18, 2016 at 10:23 AM, André Warnier (tomcat) <aw...@ice-sa.com>
> wrote:
>>
>>
>> Good. That's our goal here. We live to help :-)
>>
>>
> You all have been helpful beyond description.
>
>
>> I don't think that there is a need for a formal "petition". This being a
>> Tomcat list, and the mod_jk Connector being part of the Tomcat project
>> (despite being an Apache httpd add-on), I believe that this is in scope of
>> the list, as long as we do not stray too far in httpd-specific things.
>> One problem is that this list strips most attachments, and that posting
>> your whole configuration setup may be a bit much for pasting it into your
>> next message.
>> Is there a way by which you can post your configuration to some
>> publicly-accessible place, and provide a link ?
>>
>>
>
httpd.conf:

ServerRoot "/apache2.4.6/install"
DocumentRoot "/apache2.4.6/install/proj"
PidFile bin/someFile.pid
ServerTokens Prod
Timeout 60
KeepAlive Off
MaxKeepAliveRequests 100
KeepAliveTimeout 15
ExtendedStatus On
UseCanonicalName On
HostnameLookups Off
ServerSignature Off

ServerName my.server.name:9001

ServerAdmin root@localhost
Listen 9001
TraceEnable off
AddDefaultCharset On

<IfModule prefork.c>
#  parms deleted for space
</IfModule>

<IfModule mpm_worker_module>
#  parms deleted for space
</IfModule>

LoadModule unixd_module modules/mod_unixd.so
LoadModule access_compat_module modules/mod_access_compat.so
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule authz_core_module modules/mod_authz_core.so
LoadModule status_module modules/mod_status.so
LoadModule info_module modules/mod_info.so
LoadModule mpm_worker_module modules/mod_mpm_worker.so
LoadModule dir_module modules/mod_dir.so
LoadModule userdir_module modules/mod_userdir.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule log_forensic_module modules/mod_log_forensic.so

LoadModule deflate_module modules/mod_deflate.so
LoadModule filter_module modules/mod_filter.so
LoadModule headers_module modules/mod_headers.so
LoadModule mime_module modules/mod_mime.so
LoadModule mime_magic_module modules/mod_mime_magic.so
LoadModule autoindex_module modules/mod_autoindex.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule negotiation_module modules/mod_negotiation.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule authn_core_module modules/mod_authn_core.so
LoadModule authn_file_module modules/mod_authn_file.so
LoadModule auth_basic_module  modules/mod_auth_basic.so

User myUser
Group myUser

<Directory />
    Options FollowSymLinks
    AllowOverride None
    Require all denied
</Directory>

<IfModule log_forensic_module>
     ForensicLog logs/forensic_log
</IfModule>

<IfModule mod_userdir.c>
    UserDir disabled
</IfModule>

<Directory "/var/www">
    AllowOverride None
    Require all granted
</Directory>

<Directory "/apache2.4.6/install/proj">
    Options FollowSymLinks
    AllowOverride None
    Require all granted
</Directory>

<IfModule dir_module>
    DirectoryIndex index.html
</IfModule>

<Files ".ht*">
    Require all denied
</Files>

ErrorLog "logs/error_log"

<IfModule mod_rewrite.c>
LogLevel mod_rewrite.c:trace8
</IfModule>

<IfModule log_config_module>
    CustomLog logs/mj_debug "%{canonical}p %{local}p %{remote}p %r %q %{FirstName}e %{FirstName}n %{FirstName}o"
</IfModule>

IndexOptions FancyIndexing VersionSort NameWidth=* HTMLTable

# removed icon stuff

ReadmeName README.html
HeaderName HEADER.html

IndexIgnore .??* *~ *# HEADER* README* RCS CVS *,v *,t

AddLanguage ca .ca
AddLanguage cs .cz .cs
AddLanguage da .dk
AddLanguage de .de
AddLanguage el .el
AddLanguage en .en
AddLanguage eo .eo
AddLanguage es .es
AddLanguage et .et
AddLanguage fr .fr
AddLanguage he .he
AddLanguage hr .hr
AddLanguage it .it
AddLanguage ja .ja
AddLanguage ko .ko
AddLanguage ltz .ltz
AddLanguage nl .nl
AddLanguage nn .nn
AddLanguage no .no
AddLanguage pl .po
AddLanguage pt .pt
AddLanguage pt-BR .pt-br
AddLanguage ru .ru
AddLanguage sv .sv
AddLanguage zh-CN .zh-cn
AddLanguage zh-TW .zh-tw

LanguagePriority en ca cs da de el eo es et fr he hr it ja ko ltz nl nn no pl pt pt-BR ru sv zh-CN zh-TW

ForceLanguagePriority Prefer Fallback

<IfModule mime_module>
    TypesConfig /etc/mime.types
    DefaultType None
    AddType application/x-compress .Z
    AddType application/x-gzip .gz .tgz
    AddType text/html .shtml
    AddOutputFilter INCLUDES .shtml
</IfModule>

AddDefaultCharset UTF-8

<IfModule mime_magic_module>
    MIMEMagicFile conf/magic
</IfModule>

# removed a bunch of BrowserMatch directives for space

IncludeOptional conf.d/*.conf

AddOutputFilterByType DEFLATE text/html text/xml text/javascript text/css image/bmp application/x-amf application/pdf

DeflateCompressionLevel 6

<Location /server-status>
SetHandler server-status
#  parms deleted for space
</Location>

<Location /server-info>
 SetHandler server-info
#  parms deleted for space
 </Location>

<Location /jkmanager>
JkMount jkstatus
#  parms deleted for space
</Location>

--------------------------------------------------------------


located in conf.d/*.conf


LoadModule jk_module /apache2.4.6/install/modules/mod_jk.so

JkWorkersFile /apache2.4.6/install/conf.d/workers.properties

JkLogFile /apache2.4.6/install/logs/mod_jk.log

JkLogLevel trace

JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "

JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

JkRequestLogFormat "%w %V %T"

<VirtualHost *:9001>
        DocumentRoot /apache2.4.6/install/proj/

 RewriteCond  %{LA-U:REQUEST_FILENAME} !-f
 RewriteRule  ^/(content/help/Help-)[a-zA-Z_]+(.*) $1en$2 [R,L]

 RewriteCond  %{LA-U:REQUEST_FILENAME} !-f
 RewriteRule  ^/(content_reparent/Reparent_help-)[a-zA-Z_]+(.*) $1en$2 [R,L]

<Location /webapp1/cache-monitor >
#  parms removed for brevity
</Location>

<Location /webapp2/automation.html >
#  parms removed for brevity
</Location>

################### Shibboleth Configuration #################

UseCanonicalPhysicalPort On

<Location /Shibboleth.sso>
  SetHandler shib
</Location>

<Location /shib>
#  parms removed for brevity
</Location>

<Location /path/to/some.fve>
 AuthType Shibboleth
 ShibRequestSetting requireSession true
 require shib-session
 Require valid-user
</Location>

JkEnvVar Var1
JkEnvVar Var2



JkMount /SiteA/* lbA2
JkMount /SiteA lbA2

JkMount /SiteB/* lbA2
JkMount /SiteB lbA2

JkMount /shib/* lbA2
JkMount /shib lbA2

JkMount /SiteC/* lbB
JkMount /SiteC lbB

JKMount /jkmanager/* jkstatus
JKMount /jkmanager jkstatus

JKMount /SiteD/* lbC
JKMount /SiteD lbC


</VirtualHost>


Some questions (if these are not relevant, please disregard):

I'm loading a whole bunch of modules.  Could some of them be incompatible?

DocumentRoot refers to a directory that does not exist.  Is that a problem?

What does AddLanguage do?

Is AddDefaultCharset redundant?

Are +ForwardKeySize and -ForwardDirectories somehow disabling what
+ForwardURIEscaped does?

I have verified the data coming out of Shibboleth is what we expect.

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Mark Juszczec <ma...@gmail.com>.
On Tue, Oct 18, 2016 at 10:23 AM, André Warnier (tomcat) <aw...@ice-sa.com>
wrote:
>
>
> Good. That's our goal here. We live to help :-)
>
>
You all have been helpful beyond description.


> I don't think that there is a need for a formal "petition". This being a
> Tomcat list, and the mod_jk Connector being part of the Tomcat project
> (despite being an Apache httpd add-on), I believe that this is in scope of
> the list, as long as we do not stray too far in httpd-specific things.
> One problem is that this list strips most attachments, and that posting
> your whole configuration setup may be a bit much for pasting it into your
> next message.
> Is there a way by which you can post your configuration to some
> publicly-accessible place, and provide a link ?
>
>
I was going to post skeletons of the conf files in the text of the emails.
I didn't plan to send them as attachments.

I recognize I'll have to edit them and strip out the parts not significant
to the conversation, and I'm completely happy to do so.

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by "André Warnier (tomcat)" <aw...@ice-sa.com>.
On 18.10.2016 16:16, Mark Juszczec wrote:
> On Tue, Oct 18, 2016 at 10:10 AM, André Warnier (tomcat) <aw...@ice-sa.com>
> wrote:
>
>>
>> This being a list dedicated to Tomcat, maybe we are going a bit deep in
>> the Apache httpd configuration and precedence rules here.
>> It is anyway difficult to answer your questions, without seeing the whole
>> of the Apache httpd configuration files.
>>
>>
> I certainly don't want to run afoul of the forum rules.
>
> This is the most progress I've made in 3 weeks.
>

Good. That's our goal here. We live to help :-)

> Is there a way I can petition to be allowed to post my Apache config files
> here OR is there a more suitable forum and can I invite the people
> contributing to this thread to continue commenting in the more suitable forum?
>

I don't think that there is a need for a formal "petition". This being a Tomcat list, and 
the mod_jk Connector being part of the Tomcat project (despite being an Apache httpd 
add-on), I believe that this is in scope of the list, as long as we do not stray too far 
in httpd-specific things.
One problem is that this list strips most attachments, and that posting your whole 
configuration setup may be a bit much for pasting it into your next message.
Is there a way by which you can post your configuration to some publicly-accessible place, 
and provide a link ?


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Mark Juszczec <ma...@gmail.com>.
On Tue, Oct 18, 2016 at 10:10 AM, André Warnier (tomcat) <aw...@ice-sa.com>
wrote:

>
> This being a list dedicated to Tomcat, maybe we are going a bit deep in
> the Apache httpd configuration and precedence rules here.
> It is anyway difficult to answer your questions, without seeing the whole
> of the Apache httpd configuration files.
>
>
I certainly don't want to run afoul of the forum rules.

This is the most progress I've made in 3 weeks.

Is there a way I can petition to be allowed to post my Apache config files
here OR is there a more suitable forum and can I invite the people
contributing to this thread to continue commenting in the more suitable forum?

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by "André Warnier (tomcat)" <aw...@ice-sa.com>.
On 18.10.2016 15:22, Mark Juszczec wrote:
> On Tue, Oct 18, 2016 at 9:13 AM, Mark Juszczec <ma...@gmail.com>
> wrote:
>
>>
>>
>>    <VirtualHost *:9001>
>>      DocumentRoot /some/dir/thatDoesNotExist/
>>      JkEnvVar nameWithIntlChar
>>      JkMount /myService/* lbAjpWorker
>>      JkMount /myService lbAjpWorker
>>
>>    </VirtualHost>
>>
>>
> I forgot to ask something.
>
> The above DocumentRoot does not exist.  There is another DocumentRoot
> defined outside of the <VirtualHost *:9001> I posted, but it does not exist
> either.
>
> Could this have anything to do with my problem?
>
> What should these values be set to?
>

This being a list dedicated to Tomcat, maybe we are going a bit deep in the Apache httpd 
configuration and precedence rules here.
It is anyway difficult to answer your questions, without seeing the whole of the Apache 
httpd configuration files.
Generally speaking :
- whatever configuration directive is outside a <VirtualHost> section, acts as a 
"default", which is inherited by all <VirtualHost> sections.
- if a <VirtualHost> section contains a similar directive to one of these default values, 
then for requests to this <VirtualHost>, the directive that is inside the <VirtualHost> 
section overrides the one that is outside.

But there are some total or partial exceptions to the above rules, some of which apply to 
mod_jk (see for example JkMountCopy).
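The two general rules above can be pictured as a dictionary merge. This Python sketch is a toy model only, not httpd's algorithm: real httpd merges configuration per module, some directives combine rather than replace, and exceptions such as JkMountCopy exist. The directive values are borrowed from configuration posted in this thread; the function name effective_config is hypothetical.

```python
# Toy model only (assumption): directives outside <VirtualHost> act as
# defaults, and a directive set inside the <VirtualHost> overrides them for
# that host. Real httpd merging is done per module; some directives combine
# instead of replacing, and exceptions such as JkMountCopy exist.
def effective_config(global_cfg: dict, vhost_cfg: dict) -> dict:
    merged = dict(global_cfg)  # inherited defaults (copy, global stays intact)
    merged.update(vhost_cfg)   # directives set in the vhost win
    return merged


global_cfg = {
    "DocumentRoot": "/apache2.4.6/install/proj",
    "JkOptions": "+ForwardKeySize +ForwardURIEscaped -ForwardDirectories",
}
vhost_cfg = {"DocumentRoot": "/some/dir/thatDoesNotExist/"}

for name, value in effective_config(global_cfg, vhost_cfg).items():
    print(f"{name}: {value}")
```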

About the VirtualHost section which you list above : it looks strange to me, because :
- it would apply only to HTTP requests directed to port 9001, which is a bit unusual
- it does not seem to have a ServerName, which for name-based VirtualHosts is quite essential
- and without a valid DocumentRoot, any request for something other than the "JkMounted" 
URIs "/myService*" would have nowhere to be served from, and would thus return a "Not Found"

But again, without seeing the whole of the Apache httpd configuration, and without knowing 
exactly how browsers access this server, it is difficult to make a final call.
On which platform (OS) is this running?

Separate note : if you are more familiar or at ease with Apache httpd configuration 
sections than with the JkMount/JkUnmount directives, you may want to have a look at an 
alternative way of configuring the httpd -> Tomcat forwarding.
See this page : http://tomcat.apache.org/connectors-doc/reference/apache.html
and scroll down to the section :
Using SetHandler and Environment Variables

This method *replaces* the usage of JkMount/JkUnmount directives, by directives enclosed 
in Apache httpd <Location> sections.  Personally, I find that (for someone familiar with 
Apache httpd) this configuration method is clearer than JkMount/JkUnmount, because with 
JkMount/JkUnmount, it is sometimes unclear what the precedence rules are with respect to 
Alias, rewrite, proxy, etc.

With respect to your (later) question about JkOptions : the same above page, in the 
initial "Configuration Directives" section, clearly specifies where each "Jk*" directive 
can be used and to what scope it applies. (The "global" term means : in the part of the 
configuration which is /not/ inside a <VirtualHost> section).





Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Mark Juszczec <ma...@gmail.com>.
On Tue, Oct 18, 2016 at 9:13 AM, Mark Juszczec <ma...@gmail.com>
wrote:

>
>
>   <VirtualHost *:9001>
>     DocumentRoot /some/dir/thatDoesNotExist/
>     JkEnvVar nameWithIntlChar
>     JkMount /myService/* lbAjpWorker
>     JkMount /myService lbAjpWorker
>
>   </VirtualHost>
>
>
I forgot to ask something.

The above DocumentRoot does not exist.  There is another DocumentRoot
defined outside of the <VirtualHost *:9001> I posted, but it does not exist
either.

Could this have anything to do with my problem?

What should these values be set to?

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Mark Juszczec <ma...@gmail.com>.
On Tue, Oct 18, 2016 at 8:36 AM, André Warnier (tomcat) <aw...@ice-sa.com>
wrote:

> On 18.10.2016 13:03, Mark Juszczec wrote:
>>
>>
>> No, the following line:
>>
>> JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories
>>
>> is in an Apache conf file, but not in a VirtualHost entry.
>>
>> Can this directive go in multiple places?
>>
>>
> See here : http://tomcat.apache.org/connectors-doc/reference/apache.html
> -->Forwarding
>
> JkOptions are generally inherited by the <VirtualHost> sections, from the
> "main" (or "default") Apache configuration (aka outside <VirtualHost>
> entries).
> But there are exceptions and additional rules, so check carefully.
>
>
Ok. My JkOptions directive appears outside the <VirtualHost> entry, so I'm
going to assume it's being inherited.

I will experiment with placing it in the <VirtualHost> section to see what
happens.

One quick question, do you or anyone reading know if something special
needs to be done to force % encoding when you have a workers.properties
file?

My workers.properties contains the following:

  worker.template.type=ajp13
  worker.template.ping_mode=A
  worker.template.socket_timeout=30
  worker.template.retries=2
  worker.template.host=localhost

  worker.list= someWorker, lbAjpWorker,

  worker.lbAjpWorker.reference=worker.template
  worker.lbAjpWorker.port=8045

In my conf file, someone has tied them together as follows:

  LoadModule jk_module /somepath/modules/mod_jk.so

  JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

  <VirtualHost *:9001>
    DocumentRoot /some/dir/thatDoesNotExist/
    JkEnvVar nameWithIntlChar
    JkMount /myService/* lbAjpWorker
    JkMount /myService lbAjpWorker

  </VirtualHost>

I don't see anything that says "make sure everything going to lbAjpWorker
is % encoded"

Is the JkOptions appearing outside <VirtualHost *:9001> supposed to take
care of that?

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by "André Warnier (tomcat)" <aw...@ice-sa.com>.
On 18.10.2016 13:03, Mark Juszczec wrote:
> On Tue, Oct 18, 2016 at 1:14 AM, Rainer Jung <ra...@kippdata.de>
> wrote:
>
>> Am 17.10.2016 um 22:38 schrieb Mark Juszczec:
>>
>>>
>>>
>>> I've tried adding +ForwardURIEscaped in my conf file as follows:
>>>
>>> # JkOptions indicate to send SSL KEY SIZE,
>>> JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories
>>>
>>> I would have expected mod_jk log to show the data % encoded, but it does
>>> not:
>>>
>>> text: J O Ë ‹ L
>>> hex: 0x4a 0x4f 0xc3 0x8b 0x4c
>>>
>>> I had expected to see something like:
>>>
>>> JO%C3%8BL
>>>
>>> Is that reasonable?  Does it make sense?
>>>
>>
>> Yes.
>>
>> Could something be turning off the encoding?  Do the headers values need to
>>> be set to something specific?
>>>
>>
>> Did you put the directive into the correct VirtualHost?
>>
>>
>> Regards,
>>
>> Rainer
>>
>
> No, the following line:
>
> JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories
>
> is in an Apache conf file, but not in a VirtualHost entry.
>
> Can this directive go in multiple places?
>

See here : http://tomcat.apache.org/connectors-doc/reference/apache.html
-->Forwarding

JkOptions are generally inherited by the <VirtualHost> sections, from the "main" (or 
"default") Apache configuration (aka outside <VirtualHost> entries).
But there are exceptions and additional rules, so check carefully.




Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Mark Juszczec <ma...@gmail.com>.
On Tue, Oct 18, 2016 at 1:14 AM, Rainer Jung <ra...@kippdata.de>
wrote:

> Am 17.10.2016 um 22:38 schrieb Mark Juszczec:
>
>>
>>
>> I've tried adding +ForwardURIEscaped in my conf file as follows:
>>
>> # JkOptions indicate to send SSL KEY SIZE,
>> JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories
>>
>> I would have expected mod_jk log to show the data % encoded, but it does
>> not:
>>
>> text: J O Ë ‹ L
>> hex: 0x4a 0x4f 0xc3 0x8b 0x4c
>>
>> I had expected to see something like:
>>
>> JO%C3%8BL
>>
>> Is that reasonable?  Does it make sense?
>>
>
> Yes.
>
> Could something be turning off the encoding?  Do the headers values need to
>> be set to something specific?
>>
>
> Did you put the directive into the correct VirtualHost?
>
>
> Regards,
>
> Rainer
>

No, the following line:

JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

is in an Apache conf file, but not in a VirtualHost entry.

Can this directive go in multiple places?

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Rainer Jung <ra...@kippdata.de>.
Am 17.10.2016 um 22:38 schrieb Mark Juszczec:
> On Mon, Oct 17, 2016 at 8:20 AM, Rainer Jung <ra...@kippdata.de>
> wrote:
>
>> Am 17.10.2016 um 12:35 schrieb Mark Juszczec:
>>
>>> On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas <ma...@apache.org> wrote:
>>>
>>>
>>>> A small hint. I'd expect those to be % encoded.
>>>>
>>>>
>>> Thank you very much for your reply.
>>>
>>> I've been thinking the problem is lack of % encoding after reading:
>>>
>>> *"Default encoding for GET*
>>> The character set for HTTP query strings (that's the technical term for
>>> 'GET parameters') can be found in sections 2 and 2.1 the "URI Syntax"
>>> specification. The character set is defined to be US-ASCII
>>> <http://en.wikipedia.org/wiki/ASCII>. Any character that does not map to
>>> US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax
>>> specification says that characters outside of US-ASCII must be encoded
>>> using
>>>  % escape sequences: each character is encoded as a literal % followed by
>>> the two hexadecimal codes which indicate its character code. Thus, a
>>> (US-ASCII
>>> character code 97 = 0x61) is equivalent to %61. There *is no default
>>> encoding for URIs* specified anywhere, which is why there is a lot of
>>> confusion when it comes to decoding these values. "
>>>
>>> from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
>>>
>>> Do you know if there's a way to force something (mod_jk, mod_rewrite or
>>> something else) to % encode the data being fed into the AJP port?
>>>
>>
>> You can force mod_jk to %-encode the URI before forwarding:
>>
>> JkOptions     +ForwardURIEscaped
>>
>>
> I've tried adding +ForwardURIEscaped in my conf file as follows:
>
> # JkOptions indicate to send SSL KEY SIZE,
> JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories
>
> I would have expected mod_jk log to show the data % encoded, but it does
> not:
>
> text: J O Ë ‹ L
> hex: 0x4a 0x4f 0xc3 0x8b 0x4c
>
> I had expected to see something like:
>
> JO%C3%8BL
>
> Is that reasonable?  Does it make sense?

Yes.

> Could something be turning off the encoding?  Do the headers values need to
> be set to something specific?

Did you put the directive into the correct VirtualHost?

Regards,

Rainer



Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Mark Juszczec <ma...@gmail.com>.
On Mon, Oct 17, 2016 at 8:20 AM, Rainer Jung <ra...@kippdata.de>
wrote:

> Am 17.10.2016 um 12:35 schrieb Mark Juszczec:
>
>> On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas <ma...@apache.org> wrote:
>>
>>
>>> A small hint. I'd expect those to be % encoded.
>>>
>>>
>> Thank you very much for your reply.
>>
>> I've been thinking the problem is lack of % encoding after reading:
>>
>> *"Default encoding for GET*
>> The character set for HTTP query strings (that's the technical term for
>> 'GET parameters') can be found in sections 2 and 2.1 the "URI Syntax"
>> specification. The character set is defined to be US-ASCII
>> <http://en.wikipedia.org/wiki/ASCII>. Any character that does not map to
>> US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax
>> specification says that characters outside of US-ASCII must be encoded
>> using
>>  % escape sequences: each character is encoded as a literal % followed by
>> the two hexadecimal codes which indicate its character code. Thus, a
>> (US-ASCII
>> character code 97 = 0x61) is equivalent to %61. There *is no default
>> encoding for URIs* specified anywhere, which is why there is a lot of
>> confusion when it comes to decoding these values. "
>>
>> from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
>>
>> Do you know if there's a way to force something (mod_jk, mod_rewrite or
>> something else) to % encode the data being fed into the AJP port?
>>
>
> You can force mod_jk to %-encode the URI before forwarding:
>
> JkOptions     +ForwardURIEscaped
>
>
I've tried adding +ForwardURIEscaped in my conf file as follows:

# JkOptions indicate to send SSL KEY SIZE,
JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

I would have expected mod_jk log to show the data % encoded, but it does
not:

text: J O Ë ‹ L
hex: 0x4a 0x4f 0xc3 0x8b 0x4c

I had expected to see something like:

JO%C3%8BL

Is that reasonable?  Does it make sense?
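For comparison, Python's urllib.parse.quote produces exactly this form from the UTF-8 bytes. It is only a stand-in for checking the expectation: mod_jk's +ForwardURIEscaped uses its own escaping rules, which may differ in which characters they leave unescaped.

```python
from urllib.parse import quote

# Percent-encode the UTF-8 bytes of "JOËL"; quote() emits upper-case hex.
escaped = quote("JO\u00cbL", encoding="utf-8")
print(escaped)  # JO%C3%8BL
```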

Could something be turning off the encoding?  Do the headers values need to
be set to something specific?

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Rainer Jung <ra...@kippdata.de>.
Am 17.10.2016 um 12:35 schrieb Mark Juszczec:
> On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas <ma...@apache.org> wrote:
>
>> On 17/10/2016 08:30, Mark Thomas wrote:
>>> On 16/10/2016 19:09, Mark Juszczec wrote:
>>>> Hello
>>>>
>>>> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache
>> 2.4.6
>>>>
>>>> I'm using AJP 1.3 for communication between Apache and Tomcat
>>>>
>>>> It's all powered by Java 1.8
>>>>
>>>> I'm having a problem with international characters when I send them as
>> the
>>>> request *URI* (which is used by GET requests and this is a GET request).
>>>>
>>>> Let's say I get the string AOËL
>>>>
>>>> mod_jk log  logs the bytes with the message
>>>>
>>>>  "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to
>> ajp13
>>>> pos=4 len=1411 max=8192" (at
>>>> ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:
>>>>
>>>>   41 4f c3 8b 4c
>>>>
>>>> AFAIK this means the correct bytes are being sent to AJP.  Is that
>> correct?
>>>
>>> That is the correct UTF-8 byte encoding for the characters AOËL.
>>
>> A small hint. I'd expect those to be % encoded.
>>
>
> Thank you very much for your reply.
>
> I've been thinking the problem is lack of % encoding after reading:
>
> *"Default encoding for GET*
> The character set for HTTP query strings (that's the technical term for
> 'GET parameters') can be found in sections 2 and 2.1 the "URI Syntax"
> specification. The character set is defined to be US-ASCII
> <http://en.wikipedia.org/wiki/ASCII>. Any character that does not map to
> US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax
> specification says that characters outside of US-ASCII must be encoded using
>  % escape sequences: each character is encoded as a literal % followed by
> the two hexadecimal codes which indicate its character code. Thus, a (US-ASCII
> character code 97 = 0x61) is equivalent to %61. There *is no default
> encoding for URIs* specified anywhere, which is why there is a lot of
> confusion when it comes to decoding these values. "
>
> from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
>
> Do you know if there's a way to force something (mod_jk, mod_rewrite or
> something else) to % encode the data being fed into the AJP port?

You can force mod_jk to %-encode the URI before forwarding:

JkOptions     +ForwardURIEscaped

(see http://tomcat.apache.org/connectors-doc/webserver_howto/apache.html)

You might need to experiment whether that really fixes your issues, e.g. 
when parts of the URI are already %-encoded etc.
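The caveat about URIs that are already %-encoded can be illustrated with a short Python sketch. urllib.parse.quote stands in for "an escaping step" here; mod_jk's own escaping rules may differ in detail. Escaping an already-escaped string turns each '%' into '%25', so a single decoding pass on the receiving side no longer yields the original text:

```python
from urllib.parse import quote, unquote

raw = "/myService/JO\u00cbL"  # path from the thread, with a non-ASCII character

once = quote(raw)    # '/' is in quote()'s default safe set, so it stays as-is
twice = quote(once)  # escaping again turns every '%' into '%25'

print(once)            # /myService/JO%C3%8BL
print(twice)           # /myService/JO%25C3%258BL
print(unquote(twice))  # one decoding pass only removes one layer
```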

Regards,

Rainer



Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Mark Juszczec <ma...@gmail.com>.
On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas <ma...@apache.org> wrote:

> On 17/10/2016 08:30, Mark Thomas wrote:
> > On 16/10/2016 19:09, Mark Juszczec wrote:
> >> Hello
> >>
> >> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache
> 2.4.6
> >>
> >> I'm using AJP 1.3 for communication between Apache and Tomcat
> >>
> >> It's all powered by Java 1.8
> >>
> >> I'm having a problem with international characters when I send them as
> the
> >> request *URI* (which is used by GET requests and this is a GET request).
> >>
> >> Let's say I get the string AOËL
> >>
> >> mod_jk log  logs the bytes with the message
> >>
> >>  "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to
> ajp13
> >> pos=4 len=1411 max=8192" (at
> >> ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:
> >>
> >>   41 4f c3 8b 4c
> >>
> >> AFAIK this means the correct bytes are being sent to AJP.  Is that
> correct?
> >
> > That is the correct UTF-8 byte encoding for the characters AOËL.
>
> A small hint. I'd expect those to be % encoded.
>

Thank you very much for your reply.

I've been thinking the problem is lack of % encoding after reading:

*"Default encoding for GET*
The character set for HTTP query strings (that's the technical term for
'GET parameters') can be found in sections 2 and 2.1 the "URI Syntax"
specification. The character set is defined to be US-ASCII
<http://en.wikipedia.org/wiki/ASCII>. Any character that does not map to
US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax
specification says that characters outside of US-ASCII must be encoded using
 % escape sequences: each character is encoded as a literal % followed by
the two hexadecimal codes which indicate its character code. Thus, a (US-ASCII
character code 97 = 0x61) is equivalent to %61. There *is no default
encoding for URIs* specified anywhere, which is why there is a lot of
confusion when it comes to decoding these values. "

from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

Do you know if there's a way to force something (mod_jk, mod_rewrite or
something else) to % encode the data being fed into the AJP port?

Mark

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Mark Thomas <ma...@apache.org>.
On 17/10/2016 08:30, Mark Thomas wrote:
> On 16/10/2016 19:09, Mark Juszczec wrote:
>> Hello
>>
>> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache 2.4.6
>>
>> I'm using AJP 1.3 for communication between Apache and Tomcat
>>
>> It's all powered by Java 1.8
>>
>> I'm having a problem with international characters when I send them as the
>> request *URI* (which is used by GET requests and this is a GET request).
>>
>> Let's say I get the string AOËL
>>
>> mod_jk log  logs the bytes with the message
>>
>>  "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to ajp13
>> pos=4 len=1411 max=8192" (at
>> ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:
>>
>>   41 4f c3 8b 4c
>>
>> AFAIK this means the correct bytes are being sent to AJP.  Is that correct?
> 
> That is the correct UTF-8 byte encoding for the characters AOËL.

A small hint. I'd expect those to be % encoded.

Mark


> 
> 
>> Running remote debugging via Spring Tool Suite to hook up to my code shows
>> me I receive:
>>
>>     41 4f c3 c3 83 c2 c2 8b 4c
> 
> That is not valid UTF-8. If the UTF-8 bytes had been treated as
> ISO-8859-1 and then re-encoded as UTF-8 I'd expect to see:
> 
> 41 4f c3 83 c2 8b 4c
> 
>> I have verified the incorrect bytes appear as early in the call stack as
>> when CoyoteAdapter.process() is invoked
> 
> I think you need to go a little further up the stack to track this down.
> 
>> I have UTF-8 specified as URIEncoding in ajp <Connector> and it has had no
>> effect.
> 
> That is the change I would have expected to be required.
> 
>> I've also specified useBodyEncodingForURI as true with no effect.
> 
> That won't help for a GET request.
> 
>> Conventional wisdom says the data is inadvertently being interpreted as ISO-8859-1
>> somewhere along the line. Since the data is correct (per mod_jk.log)
>> heading into AJP and incorrect once CoyoteAdapter.java starts handling it
>> somehow, something is going wrong when the data is interpreted after being
>> read from the AJP port.
>>
>> Is that correct?
> 
> It looks to be something like that.
> 
>> I am at a loss as to how to correct this.  The only 2 things the docs say
>> are to use URIEncoding="UTF-8" and useBodyEncodingForURI="true".  I'm
>> doing that and it's not working.
>>
>> I am at a loss about what else to try or where to look.
>>
>> If you were faced with this, what would you try?  Any advice or suggestions
>> will be greatly appreciated.
> 
> I'd dig into the connector code. You need to figure out where those
> bytes are being transformed and why.
> 
> Mark
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
> 




Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Posted by Mark Thomas <ma...@apache.org>.
On 16/10/2016 19:09, Mark Juszczec wrote:
> Hello
> 
> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache 2.4.6
> 
> I'm using AJP 1.3 for communication between Apache and Tomcat
> 
> It's all powered by Java 1.8
> 
> I'm having a problem with international characters when I send them as the
> request *URI* (which is used by GET requests and this is a GET request).
> 
> Let's say I get the string AOËL
> 
> The mod_jk log entry with the message
> 
>  "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to ajp13
> pos=4 len=1411 max=8192" (at
> ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:
> 
>   41 4f c3 8b 4c
> 
> AFAIK this means the correct bytes are being sent to AJP.  Is that correct?

That is the correct UTF-8 byte encoding for the characters AOËL.
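This can be checked by hand. Ë is U+00CB, which falls in the two-byte UTF-8 range (U+0080 to U+07FF): the first byte is 110 followed by the top five bits of the code point, the second is 10 followed by the low six bits. Below is a minimal Java sketch of that arithmetic, cross-checked against the JDK's own encoder. The class and method names are illustrative only, not from the thread.

```java
import java.nio.charset.StandardCharsets;

public class Utf8TwoByteDemo {

    // Encodes a code point in U+0080..U+07FF as two UTF-8 bytes:
    // 110xxxxx carries the top 5 bits, 10xxxxxx carries the low 6 bits.
    static byte[] encodeTwoByte(int cp) {
        if (cp < 0x80 || cp > 0x7FF) {
            throw new IllegalArgumentException("not a two-byte code point: " + cp);
        }
        return new byte[] {
            (byte) (0xC0 | (cp >> 6)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    // Formats bytes as lowercase hex separated by spaces, like the mod_jk log.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0xCB >> 6 = 0x03, giving 0xC3; 0xCB & 0x3F = 0x0B, giving 0x8B
        System.out.println(hex(encodeTwoByte(0xCB)));                          // c3 8b
        System.out.println(hex("AO\u00CBL".getBytes(StandardCharsets.UTF_8))); // 41 4f c3 8b 4c
    }
}
```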


> Running remote debugging via Spring Tool Suite to hook up to my code shows
> me I receive:
> 
>     41 4f c3 c3 83 c2 c2 8b 4c

That is not valid UTF-8. If the UTF-8 bytes had been treated as
ISO-8859-1 and then re-encoded as UTF-8 I'd expect to see:

41 4f c3 83 c2 8b 4c
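Both observations can be reproduced with a short Java sketch (hypothetical class name, JDK classes only). It simulates UTF-8 bytes being read as ISO-8859-1 and re-encoded as UTF-8, using the bytes from the mod_jk log. It also tests well-formedness with a `CharsetDecoder` set to report, rather than replace, malformed input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates the suspected bug: UTF-8 bytes decoded as ISO-8859-1, then re-encoded as UTF-8.
    static byte[] latin1RoundTrip(byte[] utf8) {
        String misread = new String(utf8, StandardCharsets.ISO_8859_1); // one char per byte
        return misread.getBytes(StandardCharsets.UTF_8);                // chars >= 0x80 become 2 bytes
    }

    // True only if the bytes are well-formed UTF-8; errors are reported, never replaced.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Formats bytes as lowercase hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Builds a byte[] from int literals so values above 0x7f need no casts at the call site.
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) out[i] = (byte) values[i];
        return out;
    }

    public static void main(String[] args) {
        byte[] sent = bytes(0x41, 0x4f, 0xc3, 0x8b, 0x4c);                             // per mod_jk log
        byte[] observed = bytes(0x41, 0x4f, 0xc3, 0xc3, 0x83, 0xc2, 0xc2, 0x8b, 0x4c); // per debugger

        System.out.println(hex(latin1RoundTrip(sent))); // 41 4f c3 83 c2 8b 4c
        System.out.println(isValidUtf8(sent));          // true
        System.out.println(isValidUtf8(observed));      // false
    }
}
```

The simulated round trip yields exactly the sequence predicted above, not the one seen in the debugger. This is consistent with the point that the observed bytes are not explained by a single ISO-8859-1 misread.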

> I have verified the incorrect bytes appear as early in the call stack as
> when CoyoteAdapter.process() is invoked

I think you need to go a little further up the stack to track this down.

> I have UTF-8 specified as URIEncoding in ajp <Connector> and it has had no
> effect.

That is the change I would have expected to be required.
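For context, URIEncoding names the charset the connector uses to decode the URI bytes after %xx decoding. Tomcat does this with its own decoder rather than `URLDecoder`, so the following Java sketch is only an analogy (hypothetical class name). The same escaped bytes give AOËL under UTF-8. Under ISO-8859-1, the Tomcat 8.0 default when URIEncoding is not set, they give the two characters U+00C3 U+008B in place of Ë.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriEncodingDemo {

    // Decodes %XX escapes and interprets the resulting bytes with the named charset.
    // Analogy only: this is not the code path Tomcat's connector uses.
    static String decode(String escaped, String charset) {
        try {
            return URLDecoder.decode(escaped, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String escaped = "AO%C3%8BL";
        System.out.println(decode(escaped, "UTF-8").equals("AO\u00CBL"));             // true
        System.out.println(decode(escaped, "ISO-8859-1").equals("AO\u00C3\u008BL")); // true
    }
}
```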

> I've also specified useBodyEncodingForURI as true with no effect.

That won't help for a GET request.

> Conventional wisdom says the data is inadvertently being interpreted as ISO-8859-1
> somewhere along the line. Since the data is correct (per mod_jk.log)
> heading into AJP and incorrect once CoyoteAdapter.java starts handling it
> somehow, something is going wrong when the data is interpreted after being
> read from the AJP port.
> 
> Is that correct?

It looks to be something like that.

> I am at a loss as to how to correct this.  The only 2 things the docs say
> are to use URIEncoding="UTF-8" and useBodyEncodingForURI="true".  I'm
> doing that and it's not working.
> 
> I am at a loss about what else to try or where to look.
> 
> If you were faced with this, what would you try?  Any advice or suggestions
> will be greatly appreciated.

I'd dig into the connector code. You need to figure out where those
bytes are being transformed and why.
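When stepping through the connector code, it may help to print values as code points, so the IDE's font and console encoding cannot hide anything. It may also help to test whether a String looks like UTF-8 that was misread as ISO-8859-1. Below is a minimal Java sketch of both, assuming the value at the breakpoint is available as a `String`. The class, the method names and the heuristic itself are illustrative, not Tomcat APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingProbe {

    // Lists each code point, e.g. "U+0041 U+004F U+00CB U+004C".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Heuristic: if every char fits in one ISO-8859-1 byte, and those bytes are valid
    // UTF-8 that decodes to different text, s was probably UTF-8 misread as ISO-8859-1.
    static boolean looksLikeLatin1Misread(String s) {
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return false; // has chars above U+00FF, so no ISO-8859-1 decode produced it
        }
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1); // recover the original bytes
        try {
            String asUtf8 = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
            return !asUtf8.equals(s); // pure ASCII decodes to itself and is not flagged
        } catch (CharacterCodingException e) {
            return false; // the bytes are not UTF-8, so this explanation does not fit
        }
    }

    public static void main(String[] args) {
        String good = "AO\u00CBL";           // AOËL
        String garbled = "AO\u00C3\u008BL"; // the same UTF-8 bytes read as ISO-8859-1
        System.out.println(codePoints(good));                // U+0041 U+004F U+00CB U+004C
        System.out.println(looksLikeLatin1Misread(good));    // false
        System.out.println(looksLikeLatin1Misread(garbled)); // true
    }
}
```

The heuristic can misfire on genuine Latin-1 text that happens to form valid UTF-8, so it is a debugging aid only. The actual fix is still to find where the bytes are decoded with the wrong charset.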

Mark

