Posted to dev@httpd.apache.org by "Roy T. Fielding" <fi...@gbiv.com> on 2004/12/10 11:12:25 UTC

removing AddDefaultCharset from config file

I've looked back at the Jan-Feb 2000 discussion regarding cross-site
scripting in an attempt to find out why AddDefaultCharset is being
set to iso-8859-1 in 2.x (but not in 1.3.x).  I can't find any rationale
for that behavior -- in fact, several people pointed out that it would
be inappropriate to set any default, which is why it was not set in 1.3.

The purpose of AddDefaultCharset is to provide sites that suffer from
poorly written scripts and cross-site scripting issues an immediate
handle by which they can force a single charset.  As it turns out, forcing
a charset does nothing to reduce the problem of cross-site scripting
because the browser will either auto-detect (and switch) or the user,
upon seeing a bunch of gibberish, will go up to the menu and switch
the charset just out of curiosity.  The real solutions were to
stop reflecting client-provided data back to the browser without first
carefully validating or percent-encoding it.
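A minimal Python sketch of that fix: validate client-provided data, then HTML-escape it before echoing it into a page, and percent-encode it before placing it in a URL. The helper names (validate_term, echo_search_term, search_link), the /search path and the length limit are hypothetical illustrations, not anything taken from httpd.

```python
import html
from urllib.parse import quote

MAX_LEN = 100  # hypothetical limit chosen for this sketch

def validate_term(term: str) -> str:
    """Reject over-long input or input containing control characters."""
    if len(term) > MAX_LEN or not term.isprintable():
        raise ValueError("rejected client-provided input")
    return term

def echo_search_term(term: str) -> str:
    """Reflect the term into HTML only after validating and escaping it."""
    # quote=True also escapes " and ' so the value is safe inside attributes.
    safe = html.escape(validate_term(term), quote=True)
    return f"<p>You searched for: {safe}</p>"

def search_link(term: str) -> str:
    """Percent-encode the term before placing it in a URL query string."""
    # safe="" makes quote() encode '/' as well as spaces, '&', '=' and so on.
    return "/search?q=" + quote(validate_term(term), safe="")
```

Escaping like this still assumes the browser decodes the page with the charset the server intended.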

To make matters worse, the documentation in the default config is
completely wrong:

     # Specify a default charset for all pages sent out. This is
     # always a good idea and opens the door for future internationalisation
     # of your web site, should you ever want it. Specifying it as
     # a default does little harm; as the standard dictates that a page
     # is in iso-8859-1 (latin1) unless specified otherwise i.e. you
     # are merely stating the obvious. There are also some security
     # reasons in browsers, related to javascript and URL parsing
     # which encourage you to always set a default char set.
     #
     AddDefaultCharset ISO-8859-1

First, it only applies to text/plain and text/html, in spite of the
convoluted implementation in core.c.  Second, setting a default in the
server config actually hinders internationalization because normal authors
don't understand config files.  Furthermore, it causes harm because
it overrides the indicators present in the content.  There is some argument
to make for doing that to CGI and SSI output for the sake of protecting
idiots from themselves, but not for flat files that do not contain any
generated content.  And the security reasons are not fixed by overriding
the charset anyway -- that just makes it easier for people to ignore the
real problems of unencoded data.  All that is really needed is the
availability of the directive so that *if* a site or tree is subject to
the XSS problem, then the server admins can set a default.
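A simplified Python model of what the directive does to a response's Content-Type, based only on the behavior described above: it touches text/plain and text/html, leaves an explicit charset parameter alone, and otherwise appends the configured default. Because the added parameter travels in the HTTP header, it overrides whatever charset the document itself declares. The function name is hypothetical, and this is not a transcription of the core.c logic (its exact matching rules are not reproduced here).

```python
from typing import Optional

# Media types the directive applies to, per the message above.
NEEDS_CHARSET = ("text/plain", "text/html")

def apply_default_charset(content_type: str, default: Optional[str]) -> str:
    """Hypothetical model of AddDefaultCharset; default=None models "Off"."""
    if default is None:
        return content_type
    # A charset parameter already present in the header is left untouched.
    if "charset=" in content_type.lower():
        return content_type
    # Compare only the media type, ignoring any other parameters.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in NEEDS_CHARSET:
        return f"{content_type}; charset={default}"
    return content_type
```

With the shipped default, a flat file served as plain text/html is relabelled ISO-8859-1 no matter what its own meta element says.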

In short, unless someone can think of a justification for the above
being in the default config for 2.x, I will delete it soon and close
the festering PR 23421.

....Roy


Re: removing AddDefaultCharset from config file

Posted by Nick Kew <ni...@webthing.com>.
On Fri, 10 Dec 2004, Roy T. Fielding wrote:

> In short, unless someone can think of a justification for the above
> being in the default config for 2.x, I will delete it soon and close
> the festering PR 23421.

+1 for removing it.

I presume the reason it remains is that the last time it came up coincided
with discussion of a major reduction of the default shipped httpd.conf.
Did that discussion go anywhere?

-- 
Nick Kew

Re: removing AddDefaultCharset from config file

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Dec 10, 2004, at 4:19 AM, Joe Orton wrote:
> My understanding was that the forced default charset *does* prevent
> browsers (or maybe, MSIE) from guessing the charset as UTF-7; UTF-7
> being the special case as it's already an "escaped" encoding and hence
> defies normal escaping-of-client-provided-data tricks.  Is that not
> correct?

Yes and no -- it is both the source of the problem and the biggest
reason that we should NOT set charset as a default.

Consider the following two identical content resources, the first
being sent as

      Content-Type: text/html; charset=ISO-8859-15

   http://www-uxsup.csx.cam.ac.uk/~jw35/docs/cross-site-demo.html

and the second being sent with only

      Content-Type: text/html

   http://www.ics.uci.edu/~fielding/xss-demo.html
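To make the difference between the two headers concrete, here is a small Python sketch that extracts the charset parameter a client would see. Python's email.message.Message is used only as a convenient MIME header parser, and the helper name declared_charset is hypothetical. The first response declares a charset; the second declares none, leaving the choice to the browser.

```python
from email.message import Message
from typing import Optional

def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None when missing.
    return msg.get_content_charset()
```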

I've tested the above with all of my browsers.  Safari and MSIE-Mac do not
support utf-7 at all.  Firefox (Mac and Win) supports utf-7 but only when
manually set (it does not auto-detect utf-7, even when read from a local file).

MSIE (Windows), of course, does the least intelligent thing -- it does
not allow users to select utf-7 manually, but does auto-detect and interpret
utf-7 if it is read from a local file, or if "auto-detect" is enabled
regardless of the content-type charset parameter -- setting charset has
no effect on MSIE's auto-detect results.  In other words, it
is only at risk for XSS via utf-7 if auto-detect is enabled.
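Why utf-7 "defies normal escaping" can be shown with a short Python sketch. The payload below is a hypothetical example of reflected input written in UTF-7: it contains no <, >, & or quote characters, so HTML escaping has nothing to change, yet a decoder that treats the page as UTF-7 turns it into markup. Python's utf-7 codec stands in here for a browser's auto-detected decoding, which is an assumption about how such a browser would read the bytes.

```python
import html

# Hypothetical reflected input, written in UTF-7 (RFC 2152) form.
payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Server-side escaping sees only letters, digits, '+', '-', '(', ')' and '/'.
escaped = html.escape(payload, quote=True)
assert escaped == payload  # nothing was escaped

# A client that decodes the response as UTF-7 recovers live markup.
decoded = escaped.encode("ascii").decode("utf-7")
print(decoded)  # <script>alert(1)</script>
```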

The problem we have created is that AddDefaultCharset causes entire
sites to default to one charset, usually iso-8859-1.  And because it
is set by default (no brains spent thinking about the right value),
it is often set that way even when installed in non-Latin countries
[and there is also a problem in Europe, since iso-8859-15 is where
the euro symbol was added].  As a result, normal users get a higher
frequency of wrong charset declarations in HTTP, for which the only
"standards-compliant" solution short of manually adjusting every
page received is to turn on auto-detect!  In other words, our default
is now causing more users to be vulnerable to utf-7 XSS attacks than
they would otherwise be if we never sent a default charset.
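The "gibberish" that pushes users toward auto-detect is easy to reproduce. In this Python sketch the strings are hypothetical: bytes saved by an author in one charset are decoded under the ISO-8859-1 label a server-wide default would attach. The euro sign saved as ISO-8859-15 comes out as a generic currency sign, and UTF-8 text comes out as mojibake.

```python
# Author saved the page as ISO-8859-15; the server's default labels it ISO-8859-1.
body = "Price: 10 €".encode("iso-8859-15")   # the euro sign is byte 0xA4 here
shown = body.decode("iso-8859-1")             # 0xA4 is the currency sign in latin1
print(shown)  # Price: 10 ¤

# Author saved the page as UTF-8; same wrong label.
utf8_shown = "café".encode("utf-8").decode("iso-8859-1")
print(utf8_shown)  # cafÃ©
```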

In any case, the only tutorials on cross-site scripting that still
emphasize setting charset are our own (written by Marc) and CERT's
(based on input from Marc).  Those were intended to be temporary
workarounds until folks had a chance to fix the real problems, which
were non-validating scripts that echo untrusted content to users.

After doing another afternoon of research on this one, I am now convinced
that AddDefaultCharset does far more harm than good.

....Roy


Re: removing AddDefaultCharset from config file

Posted by Joe Orton <jo...@redhat.com>.
On Fri, Dec 10, 2004 at 02:12:25AM -0800, Roy T. Fielding wrote:
> I've looked back at the Jan-Feb 2000 discussion regarding cross-site
> scripting in an attempt to find out why AddDefaultCharset is being
> set to iso-8859-1 in 2.x (but not in 1.3.x).  I can't find any rationale
> for that behavior -- in fact, several people pointed out that it would
> be inappropriate to set any default, which is why it was not set in 1.3.
> 
> The purpose of AddDefaultCharset is to provide sites that suffer from
> poorly written scripts and cross-site scripting issues an immediate
> handle by which they can force a single charset.  As it turns out,
> forcing a charset does nothing to reduce the problem of cross-site
> scripting because the browser will either auto-detect (and switch) or
> the user, upon seeing a bunch of gibberish, will go up to the menu and
> switch the charset just out of curiosity.  The real solutions were to
> stop reflecting client-provided data back to the browser without first
> carefully validating or percent-encoding it.

My understanding was that the forced default charset *does* prevent
browsers (or maybe, MSIE) from guessing the charset as UTF-7; UTF-7
being the special case as it's already an "escaped" encoding and hence
defies normal escaping-of-client-provided-data tricks.  Is that not
correct?

joe