Posted to users@tomcat.apache.org by Greg Ward <gw...@python.net> on 2003/10/02 23:18:07 UTC

HTML quoting

What's the standard way of quoting text for inclusion in a web page in
Java?  Ie. I need a method to convert the string

  Jeb said, "Hell & damnation! Is 5 > 4?"

to

  Jeb said, &quot;Hell &amp; damnation! Is 5 &gt; 4?&quot;

(I think: I've never been entirely sure what the right way to handle
quotes is.)  That is, I want the standard Java equivalent of Python's
cgi.escape(), or Perl's CGI::escapeHTML().

To my utter amazement, I cannot find any indication that such a method
even exists in the standard Java library!  (I tried Google'ing and
poking through the JDK 1.4 docs.)

So I went looking in the source for Tomcat 4.1.27 -- surely the HTML
version of the manager app must quote at least the webapp's display
name, since it comes from a user-supplied file and therefore might
contain funny characters.  Surprisingly, the manager just lets funny
characters through without touching them.  Eg. if you put

  <display-name>foo &amp; bar webapp</display-name>

then "&amp;" is translated back to "&" by some part of the XML-parsing
chain, and is emitted as "&" in the manager HTML page.  Most browsers
can deal with minor violations like this, but it's still technically
incorrect.  Just for fun I tried this:

  <display-name>my &lt;script&gt;alert("foo");&lt;/script&gt;</display-name>

...and it works!  The manager emits this HTML:

 <td class="row-left"><small>my <script>alert("foo");</script> webapp</small></td>

and my browser pops up a JavaScript window while rendering the manager
page.  Cool!  I doubt this is a security hole -- not many people can
edit web.xml! -- but surely it at least counts as a rendering bug.  ;-)
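For illustration, here is a minimal sketch of what escaping the display name before printing it could look like. This is not Tomcat code: the class and method names (DisplayNameCell, escapeText, renderCell) are made up, and the sketch only covers text placed between tags, where &, < and > are the significant characters.

```java
// Hypothetical helper, not part of Tomcat: escapes text for use as HTML
// element content (between tags), where only &, < and > are significant.
final class DisplayNameCell {

    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;");  break;
                case '>': sb.append("&gt;");  break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a table cell like the one in the manager page, with the name escaped.
    static String renderCell(String displayName) {
        return "<td class=\"row-left\"><small>" + escapeText(displayName) + "</small></td>";
    }

    public static void main(String[] args) {
        // The value as the XML parser hands it over, entities already decoded.
        System.out.println(renderCell("my <script>alert(\"foo\");</script> webapp"));
    }
}
```

With this in place the script tag would be shown as literal text instead of being executed by the browser.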

So: can someone tell me what the standard way of quoting text for
inclusion in a web page generated by a Java web application is?

Thanks!

        Greg

---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-user-help@jakarta.apache.org


Re: HTML quoting

Posted by Christopher Williams <cc...@ntlworld.com>.
>
> It is obvious then that a space would be &#032; or &#x20; since 32 is
> the ascii code for a space. Though i cannot quite figure out why you
> would want to escape a space...
>

I escape spaces and character entities in form fields; if you do the
following:
    <input name="x" size=20 maxlength=20 value="<%= someVal %>">
and someVal contains space characters (or worse, '>'), it won't display
properly.  If you escape the spaces, it will, and this is what I use the
method for.
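A rough sketch of the attribute case, under the assumption that the value is always wrapped in double quotes. As far as I can tell, spaces then need no escaping at all and only &, <, > and " matter. The names AttrValue, escapeAttr and inputTag are hypothetical.

```java
// Hypothetical helper: escapes a value for a double-quoted HTML attribute.
final class AttrValue {

    static String escapeAttr(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&')       sb.append("&amp;");
            else if (c == '<')  sb.append("&lt;");
            else if (c == '>')  sb.append("&gt;");
            else if (c == '"')  sb.append("&quot;");
            else                sb.append(c);   // spaces pass through unchanged
        }
        return sb.toString();
    }

    // Builds the <input> tag from the example, always quoting every value.
    static String inputTag(String name, String value) {
        return "<input name=\"" + escapeAttr(name)
                + "\" size=\"20\" maxlength=\"20\" value=\""
                + escapeAttr(value) + "\">";
    }

    public static void main(String[] args) {
        System.out.println(inputTag("x", "a b > \"c\""));
    }
}
```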

If I'm emitting HTML where I know what it will be beforehand, I simply
include any appropriate character entities in my resource strings.



---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-user-help@jakarta.apache.org


Re: HTML quoting

Posted by drm <dr...@melp.nl>.
If you refuse to or cannot use
org.apache.commons.lang.StringEscapeUtils (noted by Mike Curwen here),
or the JSP functionality (noted by Tim Funk), it might help to know that
a character reference doesn't have to be named. Every character can be
written as a numeric character reference in the form &#nn..; or &#xhh..;
where 'nn..' and 'hh..' are the character's Unicode code point (not its
index in the page's encoding), in decimal and hexadecimal respectively.

It is obvious then that a space would be &#032; or &#x20; since 32 is
the ASCII code for a space. Though I cannot quite figure out why you
would want to escape a space...
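A small sketch of the two numeric forms, for what it's worth. The class and method names (NumericRef, decimal, hex) are made up for this example, and it assumes the number is the character's Unicode code point, which is my understanding of the HTML rules.

```java
// Hypothetical helper showing the two numeric forms for one character.
final class NumericRef {

    // Decimal form, e.g. &#38; for '&'.
    static String decimal(int codePoint) {
        return "&#" + codePoint + ";";
    }

    // Hexadecimal form, e.g. &#x26; for '&'.
    static String hex(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        // A space, an ampersand and an e-acute (written as a Java escape).
        String s = " &\u00e9";
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.println(decimal(cp) + " " + hex(cp));
            i += Character.charCount(cp);
        }
    }
}
```

Leading zeros, as in &#032;, are allowed but not required.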

The characters you want to escape are those outside the printable ASCII
range 32 <= c <= 126, plus '<', '>', '&', '"' and '\''

so this would be (imho) the better method:

Example:

public static String escapeHtml ( String s ) {
    StringBuilder buffer = new StringBuilder ();

    // Walk code points, not chars, so a surrogate pair becomes one reference.
    for ( int i = 0; i < s.length(); ) {
       int c = s.codePointAt ( i );
       i += Character.charCount ( c );
       if ( c < 32 || c > 126 || c == '<' || c == '>' || c == '&'
             || c == '"' || c == '\'' ) {
          buffer.append ( "&#" ).append ( c ).append ( ';' );
       } else {
          buffer.append ( (char) c );
       }
    }
    return buffer.toString ();
}

One might consider ordering the conditions in the if statement by 
occurrence probability to improve performance...


HTH,
drm



Re: HTML quoting

Posted by Christopher Williams <cc...@ntlworld.com>.
Here's a simple method to quote the most important character entities:

    /**
     * Handles a couple of problematic characters in strings that are
     * printed to an HTML stream, replacing them with their escaped
     * equivalents.
     * @param s an input string
     * @return the escaped string
     */
    public static String escapeSpaces(String s)
    {
        StringBuffer sb = new StringBuffer();
        int nChars = s.length();
        for (int i = 0; i < nChars; i++)
        {
            char c = s.charAt(i);
            if (' ' == c)
            {
                sb.append("&#032;");
            }
            else if ('>' == c)
            {
                sb.append("&gt;");
            }
            else if ('<' == c)
            {
                sb.append("&lt;");
            }
            else if ('\"' == c)
            {
                sb.append("&quot;");
            }
            else if ('&' == c)
            {
                sb.append("&amp;");
            }
            else
            {
                sb.append(c);
            }
        }
        return sb.toString();
    }

A more complete solution would be to look up the complete list of character
entities (e.g 'HTML and XHTML The Definitive Guide'), build a lookup table
and use each character as an index into that table.
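A hedged sketch of the table-driven idea. The class name EntityTable is hypothetical, and the table only holds the four entities from the method above rather than the complete list from a reference book.

```java
// Hypothetical sketch of the table-driven approach; the table here only
// holds four entities, not the complete list.
final class EntityTable {

    private static final String[] TABLE = new String[128];
    static {
        TABLE['<'] = "&lt;";
        TABLE['>'] = "&gt;";
        TABLE['&'] = "&amp;";
        TABLE['"'] = "&quot;";
    }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Characters outside the table, or without an entry, pass through.
            String entity = c < TABLE.length ? TABLE[c] : null;
            if (entity != null) {
                sb.append(entity);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Jeb said, \"Hell & damnation! Is 5 > 4?\""));
    }
}
```

Adding more entities is then a matter of enlarging the table and filling in more slots, with no change to the loop.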

Chris Williams.





Re: HTML quoting

Posted by Greg Ward <gw...@python.net>.
On 02 October 2003, Andy Eastham said:
> "Standard" one is java.net.URLEncoder.encode() and
> java.net.URLEncoder.decode()

No.  HTML quoting and URL encoding are quite different.

URLEncoder.encode() on my test string returns

  Jeb+said%2C+%22Hell+%26+damnation%21+Is+5+%3E+4%3F%22

(It would also be correct to replace every space with "%20".)

HTML quoting is for handling < > & and maybe " '.  Eg. Python's cgi.escape()
returns

  Jeb said, "Hell &amp; damnation! Is 5 &gt; 4?"

while Perl's CGI::escapeHTML() returns

  Jeb said, &quot;Hell &amp; damnation! Is 5 &gt; 4?&quot;

Hmmm, I see that Python's cgi.escape() has an optional arg to specify
quoting " characters.  Looks like Perl's CGI::escapeHTML() always quotes
quotes.  That's essential for cases like

  out.print("<input name=\"foo\" value=\"" + someUserSuppliedValue + "\">");

-- if someUserSuppliedValue has " characters in it, they must be quoted!
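To make the difference concrete, here is a small sketch that runs both kinds of encoding on the test string. URLEncoder is the real JDK class; the class UrlVersusHtml and the method htmlQuote (with a flag modelled loosely on the optional argument of Python's cgi.escape()) are made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class contrasting the two encodings on the test string.
final class UrlVersusHtml {

    // URL (form) encoding: meant for query strings, not for page text.
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Hand-rolled HTML quoting; double quotes are escaped only when asked.
    static String htmlQuote(String s, boolean quote) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&')               sb.append("&amp;");
            else if (c == '<')          sb.append("&lt;");
            else if (c == '>')          sb.append("&gt;");
            else if (quote && c == '"') sb.append("&quot;");
            else                        sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String test = "Jeb said, \"Hell & damnation! Is 5 > 4?\"";
        System.out.println(urlEncode(test));
        System.out.println(htmlQuote(test, false));
        System.out.println(htmlQuote(test, true));
    }
}
```

For attribute values like the one above, the quote flag would have to be true.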

        Greg



Re: HTML quoting

Posted by drm <dr...@melp.nl>.
Nope. 2 different things.

Filip Hanik wrote:
> encoding a URL is not the same as encoding HTML, or is it?


Re: HTML quoting

Posted by Filip Hanik <de...@hanik.com>.
encoding a URL is not the same as encoding HTML, or is it?
----- Original Message -----
From: "Andy Eastham" <an...@barllama.demon.co.uk>
To: "Tomcat Users List" <to...@jakarta.apache.org>
Sent: Thursday, October 02, 2003 2:55 PM
Subject: RE: HTML quoting


Greg,

"Standard" one is java.net.URLEncoder.encode() and
java.net.URLEncoder.decode()

Andy



RE: HTML quoting

Posted by Andy Eastham <an...@barllama.demon.co.uk>.
Greg,

"Standard" one is java.net.URLEncoder.encode() and
java.net.URLEncoder.decode()

Andy



Re: HTML quoting

Posted by David Rees <dr...@greenhydrant.com>.
On Thu, October 2, 2003 at 2:18 pm, Greg Ward sent the following
> What's the standard way of quoting text for inclusion in a web page in
> Java?  Ie. I need a method to convert the string
>
>   Jeb said, "Hell & damnation! Is 5 > 4?"
>
> to
>
>   Jeb said, &quot;Hell &amp; damnation! Is 5 &gt; 4?&quot;
>
> (I think: I've never been entirely sure what the right way to handle
> quotes is.)  That is, I want the standard Java equivalent of Python's
> cgi.escape(), or Perl's CGI::escapeHTML().

I am not aware of a standard utility for doing so, I have written my own
utility class which escapes data for encapsulation in XML/HTML.  I am sure
that one exists out there somewhere, though.

-Dave



Re: HTML quoting

Posted by Greg Ward <gw...@python.net>.
On 02 October 2003, Tim Funk said:
> JSTL by default escapes all output to be HTML friendly
> <c:out value="${myValue}"/>
> and to disable:
> <c:out value="${myValue}" escapeXml="false"/>

Alas, I'm working on some crufty old servlets that are chock full of

  System.out.println( ... boatloads of HTML ...);

We plan to move to a real template language one of these days, but for the
time being we're stuck maintaining this vile code.  ;-(

I asked on the Tomcat list because I know that lots of people who know lots
about web development with Java hang out here...

        Greg



Re: HTML quoting

Posted by Tim Funk <fu...@joedog.org>.
JSTL by default escapes all output to be HTML friendly
<c:out value="${myValue}"/>
and to disable:
<c:out value="${myValue}" escapeXml="false"/>

-Tim

Greg Ward wrote:
> What's the standard way of quoting text for inclusion in a web page in
> Java?  Ie. I need a method to convert the string
> 
>   Jeb said, "Hell & damnation! Is 5 > 4?"
> 
> to
> 
>   Jeb said, &quot;Hell &amp; damnation! Is 5 &gt; 4?&quot;
> 
> (I think: I've never been entirely sure what the right way to handle
> quotes is.)  That is, I want the standard Java equivalent of Python's
> cgi.escape(), or Perl's CGI::escapeHTML().





RE: HTML quoting

Posted by David Rees <dr...@greenhydrant.com>.
On Thu, October 2, 2003 at 2:37 pm, Mike Curwen sent the following
> How about:
> org.apache.commons.lang.StringEscapeUtils  ?
> which:
> Escapes and unescapes Strings for Java, Java Script, HTML, XML, and SQL.

Thanks for the pointer!  That's probably a lot more effective than my
home-brew version.

-Dave



RE: HTML quoting

Posted by Mike Curwen <gb...@gb-im.com>.
How about:
org.apache.commons.lang.StringEscapeUtils  ?
which:
Escapes and unescapes Strings for Java, Java Script, HTML, XML, and SQL.




RE: HTML quoting

Posted by George Sexton <gs...@mhsoftware.com>.
Here is code that we use. It depends on our own variant of StringBuffer
(MHBuffer), but you get the idea:

public  String htmlEncode(String cVal) {
	if (cVal==null || cVal.length()==0) {
		return "";
	}
	// MHBuffer is our in-house buffer class (not shown).
	MHBuffer buf=new MHBuffer(cVal.length()<<2);
	// "&" must stay first so the ampersands added by later entities are not re-escaped.
	final String[] aOld=    {"&",    "<",   ">",   "\""};
	final String[] aReplace={"&amp;","&lt;","&gt;","&quot;"};

	buf.append(cVal);

	for (int i=0; i < aOld.length; i++) {
		buf.replace(aOld[i],aReplace[i]);
	}
	return buf.toString();
}
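Since MHBuffer is not available outside our shop, here is a rough stand-in for the same idea using only java.lang.String. The class name ReplaceEncoder is hypothetical, and String.replace() is assumed to be an acceptable substitute for the buffer's replace().

```java
// Hypothetical stand-in for the MHBuffer version, using only java.lang.String.
final class ReplaceEncoder {

    private static final String[] OLD     = {"&",     "<",    ">",    "\""};
    private static final String[] REPLACE = {"&amp;", "&lt;", "&gt;", "&quot;"};

    static String htmlEncode(String val) {
        if (val == null || val.length() == 0) {
            return "";
        }
        String out = val;
        // "&" is first in the table on purpose: replacing it later would
        // re-escape the ampersands introduced by the other replacements.
        for (int i = 0; i < OLD.length; i++) {
            out = out.replace(OLD[i], REPLACE[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(htmlEncode("1 < 2 & 3 > 2"));
    }
}
```

Each replace() pass creates a new string, so for very large inputs a single pass over the characters is likely cheaper.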


