Posted to dev@httpd.apache.org by Carlo Perassi <ca...@linux.it> on 2002/08/12 22:59:16 UTC

Why can't ap_send_error_response() count on charset?

Hi all.
In modules/http/http_protocol.c
the comment says
ap_send_error_response is used for any response that can be generated by the
server from the request record. This includes all [snip] messages that have
not been redirected to another handler via the ErrorDocument feature.
On line 2331 I read:
/* can't count on a charset filter being in place here,
 * so do ebcdic->ascii translation explicitly (if needed)
 */

It would be trivial to pass ap_rvputs_proto_in_ascii() on line 2336 a string like
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or similar... but the comment says we "can't count on a charset".

Anyway... with the current code, the HTML generated by ap_send_error_response
can't pass the W3C Validator test (with the missing meta line added it would be OK).

I'd like the HTML generated by ap_send_error_response to pass the W3C
Validator test in the default configuration (that is, without using external HTML
files for 404 and so on).

The patch is trivial but I don't understand why (we) "can't count on a charset
filter being in place here".

Thank you.

-- 
Carlo Perassi - http://www.linux.it/~carlo/
Do only what only you can do (Edsger Wybe Dijkstra: 1930-2002)

Re[2]: Why can't ap_send_error_response() count on charset?

Posted by Denis Karavayev <he...@yandex.ru>.
Hello!

    Well, maybe this is not so close to the subject... But...
    I've run into a problem with type-map files (for example, the
default error pages in the Apache distribution). Can anyone tell me
what exactly the following line (which can be inserted in type-map
files) means:

    Content-type: text/html;charset=WINDOWS-1251

    This is from index.html.var (htdocs dir).
    OK, here is my problem: I just translated all the error pages into my own
language (Russian); for example, these lines from HTTP_NOT_FOUND.html.var:

[... previous languages ...]

Content-language: ru
Content-type: text/html;charset=WINDOWS-1251
Body:----------ru--
<!--#set var="TITLE" value="Object not found!" -->
<!--#include virtual="include/top.html" -->

    The requested URL was not found on this server. (Assume it's in
Russian).
    Bla-bla-bla...

<!--#include virtual="include/bottom.html" -->
----------ru--

    Is it OK? ;) It is OK! But when I try to view it ;), I get an
unreadable page (the header tags the page with the charset from
AddDefaultCharset, ISO-8859-1). It seems that

Content-type: text/html;charset=WINDOWS-1251

means NOTHING (at least in terms of charset). It simply does not work (or
maybe I simply don't understand the meaning of this line ;). I can put
all the error text in a file and rewrite the whole Body section to

URI: error.html

but this will not change anything. I MUST add the .ru.cp-1251 extension to
this file so that mod_mime wakes up...

    General question:
    Is it possible to define a charset in type-map files (for Body
sections)???

                                                   Best regards, Denis.


Re: Why can't ap_send_error_response() count on charset?

Posted by Carlo Perassi <ca...@linux.it>.
On Tue, Aug 13, 2002 at 12:52:25PM -0400, Greg Ames sent those random bytes:
> in the html.  I am curious to hear what the W3C Validator people say.

Well, my message to the W3C generated a thread of ten emails.
Here is a short report of their thoughts.
1 - There is no need to specify a meta charset in HTML documents if the
    charset is given in the Content-Type header.
    <Liam Quinn>
    But there may be an additional complication: Some 404s may be in
    other encodings than iso-8859-1. In that case, the header would
    be wrong. As long as this is just for the built-in 'last resort'
    error message that doesn't change, it's okay. But in case it's
    tagged onto any arbitrary error message, it's a problem.
    (So with Greg's fix Apache should be fine - Carlo)
    <Martin Duerst>
    BTW, a related problem is the directive 'AddDefaultCharset'.
    This adds a 'charset' parameter to *every* Content-Type that
    doesn't already have one. This means that if you have some
    gifs, they get served as Content-Type: image/gif; charset=foo.
    This is quite useless.
    <Martin Duerst>
    (About the AddDefaultCharset problem noted by Duerst)
    The Apache documentation implies that, but it isn't actually the case in
    my testing with Apache 1.3.26.  The charset parameter only seems to be
    added for text/html and text/plain.  It's not added for image/* or
    text/vnd.wap.wml.
    <Liam Quinn>
2 - About the default HTML code provided for a 404:
    (Apache developers) should change <hr /> to <hr>.  <hr /> is for
    XHTML/XML only, but they've specified HTML 2.0.
    <Liam Quinn>
3 - Some of the W3C people think that an option to 'validate error messages' in
    the validator form is a good idea, because they want to be able to validate
    all HTML.
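
A minimal sketch of the AddDefaultCharset behaviour that Liam Quinn reports
observing under point 1 (charset added only for text/html and text/plain, and
only when none is present). This is a model of the observation quoted above,
not Apache's actual code; the function name add_default_charset and the
TEXT_TYPES set are made up for illustration.

```python
from email.message import Message

# Assumption: only these two media types get a default charset, as observed
# in the report above for Apache 1.3.26.
TEXT_TYPES = {"text/html", "text/plain"}


def add_default_charset(content_type, default="iso-8859-1"):
    """Return content_type with a default charset appended where applicable."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() not in TEXT_TYPES:
        return content_type  # e.g. image/gif is left untouched
    if msg.get_param("charset") is not None:
        return content_type  # an explicit charset always wins
    return f"{content_type}; charset={default}"


for value in ("text/html", "image/gif", "text/vnd.wap.wml",
              "text/html;charset=WINDOWS-1251"):
    print(value, "->", add_default_charset(value))
```

Under this model a gif is never served with a charset parameter, which matches
the test result reported above rather than the reading of the documentation.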

-- 
Carlo Perassi - http://www.linux.it/~carlo/
Do only what only you can do (Edsger Wybe Dijkstra: 1930-2002)

Re: Why can't ap_send_error_response() count on charset?

Posted by Greg Ames <gr...@apache.org>.
Carlo Perassi wrote:
> 
> On Tue, Aug 13, 2002 at 11:06:57AM -0400, Greg Ames sent those random bytes:
> > Can you try it again with current cvs HEAD?  I'm not familiar with the W3C
> > Validator test, but I would hope that if it saw a good http Content-Type header,
> > it wouldn't need the stuff in the html meta line.

> it's trivial to change the Apache C code to generate W3C pages but they have
> technical reasons which don't permit to define a meta tag with charset
> definition... 

Well, really it's more of a philosophical issue than a technical reason.  This
code should be as simple as possible due to its use in error situations, plus I
dislike the idea of having to maintain the same information in two different
places.  But if the meta tag is necessary for some reason (which I don't
understand) we can easily add it, as you pointed out.  What do the html experts
think?  Joshua?  Cliff?

> so some minutes ago, on the Apache CVS tree it's appeared a fix
> for a header problem, and as Greg Ames <gr...@apache.org> said
> "I would hope that if (the Validator) saw a good http Content-Type header,
> it wouldn't need the stuff in the html meta line."
> 
> Before trying the new Apache CVS code... 

Carlo, I don't think the fix I committed will make a difference here, now that I
see what you are doing.  So trying cvs HEAD would be a waste of your time.  It
only improves things when an ErrorDocument local redirect fails.  You are
already getting a good http Content-Type header from apache.org, but no meta tag
in the html.  I am curious to hear what the W3C Validator people say.

> carlo@voyager:~$ telnet www.apache.org 80
> Trying 63.251.56.142...
> Connected to daedalus.apache.org.
> Escape character is '^]'.
> GET http://www.apache.org/doesntexist.html HTTP/1.0
> 
> HTTP/1.1 404 Not Found
> Date: Tue, 13 Aug 2002 15:41:38 GMT
> Server: Apache/2.0.40 (Unix)
> Content-Length: 287
> Connection: close
> Content-Type: text/html; charset=iso-8859-1
> 
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>404 Not Found</title>
> </head><body>
> <h1>Not Found</h1>
> <p>The requested URL /doesntexist.html was not found on this server.</p>
> <hr />
> <address>Apache/2.0.40 Server at www.apache.org Port 80</address>
> </body></html>
> Connection closed by foreign host.
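
The situation in the session above (charset present in the HTTP Content-Type
header, no meta tag in the HTML) can be checked mechanically. What follows is a
rough sketch, not the validator's logic: the helper names are invented here,
the meta detection is a simple regex rather than an HTML parser, and the body
is a shortened copy of the 404 page. It follows the HTML 4.01 rule that the
HTTP header takes priority over a meta declaration.

```python
import re
from email.message import Message


def header_charset(content_type):
    """Charset parameter of an HTTP Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html):
    """Charset named in a <meta ...> tag, or None (rough regex, not a parser)."""
    m = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', html, re.IGNORECASE)
    return m.group(1).lower() if m else None


def effective_charset(content_type, html):
    """HTTP header first; the meta tag is only a fallback."""
    return header_charset(content_type) or meta_charset(html)


CONTENT_TYPE = "text/html; charset=iso-8859-1"
BODY = """<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
</body></html>"""

print("header:", header_charset(CONTENT_TYPE))
print("meta:", meta_charset(BODY))
print("effective:", effective_charset(CONTENT_TYPE, BODY))
```

With a charset already in the header, adding a meta line would not change the
effective charset under this rule.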

Greg

Re: Why can't ap_send_error_response() count on charset?

Posted by Marc Slemko <ma...@znep.com>.
On Tue, 13 Aug 2002, Roy T. Fielding wrote:

> Someone could try adding the meta tag to the HTML output instead of
> on the content-type, but then they would have to check to see if this
> still reduces the cross-site scripting problems that Marc found earlier.

My recollection (I would have to check my notes to be sure) is that it is
not sufficient to put it as a meta tag to attempt to protect from charset
related cross site scripting attacks because it can be overridden.

In fact, the exact browser bug that this thread is trying to work around
(redirect with an explicit encoding in the HTTP headers results in that
encoding being used for the target of the redirect instead of anything
specified in a meta tag in that document) is one good example of why
setting it in the meta tag ourselves wouldn't be sufficient to avoid charset
related cross site scripting attacks on browsers with this bug... since
the attacker site could just set one in their HTTP headers.
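
To illustrate why the charset a browser ends up using matters for cross site
scripting, here is a small sketch. It assumes the UTF-7 variant of the attack,
which is a commonly cited example; the thread does not say which encodings
were involved, and the payload string is made up for illustration.

```python
# Bytes an attacker might get echoed into an error page. They contain no
# literal angle brackets, so escaping "<" and ">" would leave them untouched.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Interpreted as iso-8859-1 (the charset sent in the HTTP header), the bytes
# are harmless text.
as_latin1 = payload.decode("iso-8859-1")

# Interpreted as UTF-7 (a charset chosen by someone else), they become markup.
as_utf7 = payload.decode("utf-7")

print(as_latin1)
print(as_utf7)
```

The same bytes are inert or active depending only on which charset wins, which
is why a declaration that can be overridden is not a reliable defence.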


Re: Why can't ap_send_error_response() count on charset?

Posted by "Roy T. Fielding" <fi...@apache.org>.
>> Can you try it again with current cvs HEAD?  I'm not familiar with the W3C
>> Validator test, but I would hope that if it saw a good http Content-Type header,
>> it wouldn't need the stuff in the html meta line.
>
> Me too but I found a problem/feature due to the validator so I just wrote the
> following email to the w3c validator team:

Just turn off the validator feature for detecting charset through the
META tag.  That check is bogus anyway and the W3C knows it.  The charset
of the error documents is iso-8859-1, which is the default for HTTP.

Someone could try adding the meta tag to the HTML output instead of
on the content-type, but then they would have to check to see if this
still reduces the cross-site scripting problems that Marc found earlier.

....Roy


Re: Why can't ap_send_error_response() count on charset?

Posted by Carlo Perassi <ca...@linux.it>.
On Tue, Aug 13, 2002 at 11:06:57AM -0400, Greg Ames sent those random bytes:
> Can you try it again with current cvs HEAD?  I'm not familiar with the W3C
> Validator test, but I would hope that if it saw a good http Content-Type header,
> it wouldn't need the stuff in the html meta line.

Me too but I found a problem/feature due to the validator so I just wrote the
following email to the w3c validator team:

/*

Hi all
the default "404 Not Found" page generated by the latest version of Apache HTTP
Server (and the similar pages) doesn't pass the W3C Validator test
(
it's HTML 2.0 code shipped without a meta tag giving a charset value: try this
nonexistent page to see it:
http://www.apache.org/doesntexist.html
)

As I explained to the Apache developers
(
see
http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=102918549709592&w=2
and
http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=102925143132691&w=2
)
it's trivial to change the Apache C code to generate pages that pass the W3C
Validator, but they have technical reasons which don't permit defining a meta
tag with a charset definition... so some minutes ago, a fix for a header problem
appeared on the Apache CVS tree, and as Greg Ames <gr...@apache.org> said
"I would hope that if (the Validator) saw a good http Content-Type header,
it wouldn't need the stuff in the html meta line."

Before trying the new Apache CVS code... I found a "problem": when your
Validator finds a "404" in the response header of the server, it doesn't
parse the HTML provided anymore.

see this session and, trust me, the validator doesn't parse the code below:

#
# BEGIN
#

carlo@voyager:~$ telnet www.apache.org 80
Trying 63.251.56.142...
Connected to daedalus.apache.org.
Escape character is '^]'.
GET http://www.apache.org/doesntexist.html HTTP/1.0

HTTP/1.1 404 Not Found
Date: Tue, 13 Aug 2002 15:41:38 GMT
Server: Apache/2.0.40 (Unix)
Content-Length: 287
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /doesntexist.html was not found on this server.</p>
<hr />
<address>Apache/2.0.40 Server at www.apache.org Port 80</address>
</body></html>
Connection closed by foreign host.

#
# END
#

My question is: why don't you have the Validator parse the HTML code, even
when the return code is different from 200?
If you do this, the Apache team will be able to check whether the fix to the code
which produces the response header is enough to pass the test.

Thank you.

*/

So I (we) should wait for their answer.
Thanks.

-- 
Carlo Perassi - http://www.linux.it/~carlo/
Do only what only you can do (Edsger Wybe Dijkstra: 1930-2002)

Re: Why can't ap_send_error_response() count on charset?

Posted by Greg Ames <gr...@apache.org>.
Carlo Perassi wrote:
> 
> Hi all.
> In modules/http/http_protocol.c
> the comment says
> ap_send_error_response is used for any response that can be generated by the
> server from the request record. This includes all [snip] messages that have
> not been redirected to another handler via the ErrorDocument feature.
> On line 2331 I read:
> /* can't count on a charset filter being in place here,
>  * so do ebcdic->ascii translation explicitly (if needed)
>  */

This code produces worst-case error messages.  We can't assume that
mod_charset_lite's output filter (or any other resource filter) is present and
working, because there are many reasons why it might be missing or
non-functional, including user configuration errors.  So on ebcdic platforms, we
must explicitly translate the error messages from the native charset used in the
source code to ascii.  In non-error situations, ebcdic->ascii charset
translations (and vice versa) are done using filters.
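
As a rough illustration of the explicit translation described above (this is
not the C code in http_protocol.c), the sketch below converts a message held
in a native EBCDIC charset to ASCII before it goes on the wire. The cp037 code
page is only a stand-in for whatever native charset a real EBCDIC build uses,
and to_wire_ascii is a made-up name.

```python
# Assumption: cp037 stands in for the platform's native EBCDIC charset.
NATIVE_CHARSET = "cp037"


def to_wire_ascii(native_bytes):
    """Translate bytes in the native charset to ASCII, without any filter."""
    return native_bytes.decode(NATIVE_CHARSET).encode("ascii")


# On an EBCDIC machine the string literals compiled into the server are stored
# in the native charset, so they are simulated here by encoding.
native = "<h1>Not Found</h1>".encode(NATIVE_CHARSET)

# Sent untranslated, these bytes would be gibberish to an ASCII client.
print(native)
print(to_wire_ascii(native))
```

Doing the conversion at the point where the message is written means the error
path does not depend on any output filter being configured or working.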

> Anyway... with the current code, the HTML generated by ap_send_error_response
> can't pass the W3C Validator test (with the missing meta line added it would be OK).

I just committed a fix a few minutes ago which should fix http header problems
with the worst case error messages, as well as the ebcdic issue.  With this fix,
ap_send_error_response should *always* send a good http Content-Type header
containing the values set up on line 2264:

ap_set_content_type(r, "text/html; charset=iso-8859-1");

Can you try it again with current cvs HEAD?  I'm not familiar with the W3C
Validator test, but I would hope that if it saw a good http Content-Type header,
it wouldn't need the stuff in the html meta line.

Greg