Posted to users@httpd.apache.org by Jens Stutte <st...@email.it> on 2005/03/22 18:52:25 UTC

[users@httpd] mod_proxy_html and URL encoded parameters in form

Hello,

I have a problem with a reverse proxy that uses mod_proxy_html.
The application on the server to which the reverse proxy points opens a
form whose parameters then have to be posted back. By default, the
application delivers all content in ISO-8859-1 encoding. If I connect
directly with the browser, the form parameters are URL-encoded
according to ISO-8859-1. For example, an Italian à becomes %E0.
 
If I put the reverse proxy in between (see the configuration below),
mod_proxy_html delivers all content as UTF-8. The browser therefore
encodes an à as %C3%A0 in the form parameters. This parameter is
passed to the backend web server unchanged, but it seems that the
application expects ISO-8859-1 as input and that the HTTP header of
the request has been changed to ISO-8859-1 by mod_proxy_html.
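
The mismatch described above can be reproduced outside Apache. The
following is only a minimal sketch using Python's standard library; it
is not part of the original setup, and the variable names are made up
for illustration. It shows how the same character is percent-encoded
under each charset, and what a backend sees if it decodes UTF-8 bytes
as ISO-8859-1.

```python
from urllib.parse import quote, unquote

ch = "à"

# Percent-encoding as a browser would do it for an ISO-8859-1 page.
latin1_form = quote(ch, encoding="iso-8859-1")

# Percent-encoding as a browser would do it for a UTF-8 page.
# UTF-8 needs two bytes for this character, hence two escapes.
utf8_form = quote(ch, encoding="utf-8")

# What an application assuming ISO-8859-1 reads from the UTF-8 form data:
# each of the two bytes is interpreted as a separate character.
misread = unquote(utf8_form, encoding="iso-8859-1")

print(latin1_form)    # %E0
print(utf8_form)      # %C3%A0
print(repr(misread))  # two characters instead of one
```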

Is this a bug in mod_proxy_html, or am I missing something?
I had a quick look at the module's code, but I wasn't able to find
where the encoding used towards the backend server is determined.

Any help would be much appreciated. Thanks in advance!


Appendix: The config
ProxyRequests off

#----------------------------------------
# The standard ProxyPass directives
ProxyPass /www/ http://portal01.tuconti.telecomitalia.it:7001/
ProxyPass /cal/ http://ldap01.tuconti.telecomitalia.it:8080/
ProxyPass /aut/ http://ldap01.tuconti.telecomitalia.it:80/
ProxyPassReverse /www/ http://portal01.tuconti.telecomitalia.it:7001/
ProxyPassReverse /cal/ http://ldap01.tuconti.telecomitalia.it:8080/
ProxyPassReverse /aut/ http://ldap01.tuconti.telecomitalia.it:80/

#----------------------------------------
# Global mod_proxy_html configuration
ProxyHTMLExtended On
ProxyHTMLStripComments Off
ProxyHTMLURLMap http://portal01.tuconti.telecomitalia.it:7001/ /www/
ProxyHTMLURLMap http://ldap01.tuconti.telecomitalia.it:8080/ /cal/
ProxyHTMLURLMap http://ldap01.tuconti.telecomitalia.it:80/ /aut/


<Location /www/>
        SetOutputFilter  proxy-html

        ProxyHTMLURLMap /       /www/ ec
        ProxyHTMLURLMap /www    /www  ec

        ProxyHTMLURLMap (['"])/   $1/www/ Rh

        RequestHeader   unset   Accept-Encoding
</Location>

<Location /cal/>
        SetOutputFilter proxy-html

        ProxyHTMLURLMap /       /cal/ ec
        ProxyHTMLURLMap /cal    /cal ec

$1/cal/$2 Rh


        RequestHeader   unset   Accept-Encoding
</Location>

<Location /aut/>
        SetOutputFilter proxy-html

        ProxyHTMLURLMap /       /aut/ ec
        ProxyHTMLURLMap /aut    /aut ec

        ProxyHTMLURLMap (['"])/   $1/aut/ Rh

        RequestHeader   unset   Accept-Encoding
</Location>

ProxyHTMLLogVerbose On
LogLevel Info



---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


[users@httpd] Re: mod_proxy_html and URL encoded parameters in form

Posted by Jens Stutte <st...@email.it>.
Just to complete the picture: special characters in the HTML body are
displayed correctly in the browser; only the form parameters are wrong.





[users@httpd] Re: mod_proxy_html and URL encoded parameters in form

Posted by Jens Stutte <st...@email.it>.
Nick Kew <nick <at> webthing.com> writes:


> libxml2 uses utf-8 internally and transcodes on input, so mod_proxy_html
> determines encoding and tells libxml2.  To re-transcode on output would
> be an additional overhead, which mod_proxy_html doesn't incur.  Since
> transcoding filters (like mod_charset_lite) are available, it would be
> superfluous for mod_proxy_html to do that too.
> 
> You could insert a transcoding filter after mod_proxy_html to get back
> your iso-8859-1.  Or you insert a transcoding input filter when you
> accept the form data.
> 

Thank you for this valuable hint! As far as you know, is there a way to
determine the original encoding used by the backend server, so that
everything can be re-encoded as it was? I will have to integrate several
applications and cannot guarantee that they all use the same encoding.

Best regards,

Jens
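
One place where a backend normally declares its encoding is the charset
parameter of the Content-Type response header. The following is only a
sketch in Python of reading that parameter, not something taken from
mod_proxy_html or mod_charset_lite; the function name
charset_from_content_type is hypothetical. A backend that sends no
charset parameter would still need a configured per-application default.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value,
    or None if the header carries no charset parameter."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("text/html"))
```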





[users@httpd] Re: mod_proxy_html and URL encoded parameters in form

Posted by Jens Stutte <st...@email.it>.
After a bit of digging, I have modified mod_charset_lite, as it currently
does not work on proxied requests. See my post on the devel list.






Re: [users@httpd] mod_proxy_html and URL encoded parameters in form

Posted by Nick Kew <ni...@webthing.com>.
Jens Stutte wrote:

> If I put the reverse proxy in between (see the configuration below),
> mod_proxy_html delivers all content as UTF-8. The browser therefore
> encodes an à as %C3%A0 in the form parameters. This parameter is
> passed to the backend web server unchanged, but it seems that the
> application expects ISO-8859-1 as input and that the HTTP header of
> the request has been changed to ISO-8859-1 by mod_proxy_html.
> 
> Is this a bug in mod_proxy_html, or am I missing something?
> I had a quick look at the module's code, but I wasn't able to find
> where the encoding used towards the backend server is determined.
> 
> Any help would be much appreciated. Thanks in advance!

libxml2 uses utf-8 internally and transcodes on input, so mod_proxy_html
determines encoding and tells libxml2.  To re-transcode on output would
be an additional overhead, which mod_proxy_html doesn't incur.  Since
transcoding filters (like mod_charset_lite) are available, it would be
superfluous for mod_proxy_html to do that too.

You could insert a transcoding filter after mod_proxy_html to get back
your iso-8859-1.  Or you insert a transcoding input filter when you
accept the form data.

-- 
Nick Kew
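
The second option above, transcoding the form data on input, amounts to
decoding the percent-escapes as UTF-8 and re-encoding them as
ISO-8859-1. The following is only a sketch of that idea in Python, not
an Apache filter and not how mod_charset_lite is implemented; the
function name transcode_form and its parameters are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def transcode_form(body: str, src: str = "utf-8", dst: str = "iso-8859-1") -> str:
    """Re-encode an application/x-www-form-urlencoded body from src to dst.

    Raises UnicodeDecodeError if the body is not valid in src, and
    UnicodeEncodeError if a character cannot be represented in dst.
    """
    # Decode each name and value from its percent-escaped src bytes.
    pairs = parse_qsl(body, keep_blank_values=True, encoding=src, errors="strict")
    # Percent-escape the same text again, this time as dst bytes.
    return urlencode(pairs, encoding=dst, errors="strict")


print(transcode_form("citta=%C3%A0&x=1"))  # citta=%E0&x=1
```

Strict error handling is a deliberate choice in this sketch: a character
such as the euro sign has no ISO-8859-1 representation, and failing
loudly is safer than silently passing altered data to the backend.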
