Posted to bugs@httpd.apache.org by bu...@apache.org on 2020/05/15 07:09:59 UTC

[Bug 64443] New: POSTing form data through proxy_html with different frontend / backend charsets

https://bz.apache.org/bugzilla/show_bug.cgi?id=64443

            Bug ID: 64443
           Summary: POSTing form data through proxy_html with different
                    frontend / backend charsets
           Product: Apache httpd-2
           Version: 2.4.43
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: mod_proxy_html
          Assignee: bugs@httpd.apache.org
          Reporter: a.suarez@defensor-and.es
  Target Milestone: ---

Per design and by default, proxy_html will translate HTML content into UTF-8
regardless of the backend charset. This is fine, since UTF-8 has wide browser
support as far as I know.

In that scenario, the browser will encode POSTed form data in UTF-8, but that
may not match the backend charset when proxy_html re-submits the form content
upstream. E.g.:

GET:  Client <--(UTF-8)--- proxy_html <--(ISO-8859-1)--- Backend
POST: Client ---(UTF-8)--> proxy_html ---(UTF-8)-------> Backend
                                     (encoding mismatch!)

A simple workaround is to specify the backend charset by adding an
accept-charset attribute to HTML <form> tags. That attribute isn't usually
needed, as the form encoding usually matches that of the HTML document, so (I
guess) it's rarely used. When moving a site from direct to proxied publishing,
that means the whole site would need to be checked and recoded to add that
accept-charset attribute to every <form>.
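
The mismatch and the workaround described above can be reproduced outside the
proxy. The sketch below is only an illustration in Python, not anything
proxy_html itself does: the sample value and the variable names are
assumptions, and urllib.parse stands in for the browser's form encoding and the
backend's decoding.

```python
from urllib.parse import quote, unquote

BACKEND_CHARSET = "iso-8859-1"   # charset the backend expects (as in the diagram)
FRONTEND_CHARSET = "utf-8"       # charset proxy_html serves to the browser

value = "Suárez"  # hypothetical form field value with a non-ASCII character

# Without accept-charset, the browser percent-encodes the field using the
# charset of the page it received (UTF-8 after proxy_html's translation).
posted = quote(value, encoding=FRONTEND_CHARSET)

# The backend decodes those bytes with its own charset, so the text is garbled.
seen_by_backend = unquote(posted, encoding=BACKEND_CHARSET)

# With accept-charset set to the backend charset, the browser encodes the
# field in that charset instead and the value survives the round trip.
posted_fixed = quote(value, encoding=BACKEND_CHARSET)
seen_fixed = unquote(posted_fixed, encoding=BACKEND_CHARSET)

print(posted, "->", seen_by_backend)
print(posted_fixed, "->", seen_fixed)
```

The first line printed shows the garbled value the backend would store; the
second shows the same field arriving intact once the form is encoded in the
backend charset.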

As proxy_html deals automatically with different frontend / backend charsets in
downstream content, maybe it would be expected to do the same with upstream
POSTed form data. Maybe a "stateful" approach to it (i.e. proxy_html keeping
track of every form translated downstream that should be reverse-translated
when posted upstream) isn't convenient or even feasible. In my very humble
opinion (with no knowledge of the internals of it) maybe a simpler solution
could be having that accept-charset attribute added automatically by proxy_html
when translating HTML forms. As per the docs, proxy_html's mission is just to
"rewrite HTML links in a proxy situation", but maybe it could be more widely
scoped to make HTML content coherent across an Apache HTTP proxy.

Thank you in advance. Best regards,

Antonio

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 64443] POSTing form data through proxy_html with different frontend / backend charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64443

Antonio Suárez <a....@defensor-and.es> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WORKSFORME
             Status|NEW                         |RESOLVED

--- Comment #2 from Antonio Suárez <a....@defensor-and.es> ---
Seems well thought out :)

Also: checked out from trunk and works fine.

(only tested the <form> handling part under conditions b) and c) so far;
willing to test it more widely as soon as able)

Thanks for the great job!



[Bug 64443] POSTing form data through proxy_html with different frontend / backend charsets

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64443

Nick Kew <ni...@webthing.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |FixedInTrunk

--- Comment #1 from Nick Kew <ni...@webthing.com> ---
Committed a fix to trunk in r1878553.

This is the patch I posted and you tested on-list, fleshed out to test whether
the attribute is really necessary rather than insert it willy-nilly:

(a) if the input is utf-8, then we can't have broken anything, so don't fix it.
(b) if ProxyHTMLCharsetOut is set, assume the sysop is in charge, and don't fix
anything.
(c) if the backend set its own accept-charset attribute, don't mess with it!
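
The three conditions above amount to a small decision rule. The following is a
rough Python sketch of that rule only; it is not the code committed in
r1878553 (mod_proxy_html is written in C), and the function names, parameters
and the dict representation of a form's attributes are all hypothetical.

```python
def needs_accept_charset(input_charset, charset_out_set, form_attrs):
    """Return True when a <form> should get an accept-charset attribute."""
    # (a) UTF-8 input: nothing was transcoded, so nothing can be broken.
    if input_charset.lower() in ("utf-8", "utf8"):
        return False
    # (b) ProxyHTMLCharsetOut is configured: the sysop is in charge.
    if charset_out_set:
        return False
    # (c) The backend already set its own accept-charset: leave it alone.
    if any(name.lower() == "accept-charset" for name in form_attrs):
        return False
    return True


def rewrite_form_attrs(input_charset, charset_out_set, form_attrs):
    """Return a copy of the form attributes, adding accept-charset if needed."""
    attrs = dict(form_attrs)
    if needs_accept_charset(input_charset, charset_out_set, attrs):
        # Ask the browser to submit in the charset the backend produced.
        attrs["accept-charset"] = input_charset
    return attrs


# A Latin-1 backend with no explicit configuration gets the attribute added.
print(rewrite_form_attrs("ISO-8859-1", False, {"action": "/submit"}))
```

Only the case where the backend charset differs from UTF-8, no output charset
is configured, and the form carries no accept-charset of its own results in a
change; every other form passes through untouched.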
