Posted to users@tomcat.apache.org by Greg Ward <gw...@python.net> on 2003/10/02 23:18:07 UTC
HTML quoting
What's the standard way of quoting text for inclusion in a web page in
Java? Ie. I need a method to convert the string
Jeb said, "Hell & damnation! Is 5 > 4?"
to
Jeb said, &quot;Hell &amp; damnation! Is 5 &gt; 4?&quot;
(I think: I've never been entirely sure what the right way to handle
quotes is.) That is, I want the standard Java equivalent of Python's
cgi.escape(), or Perl's CGI::escapeHTML().
To my utter amazement, I cannot find any indication that such a method
even exists in the standard Java library! (I tried Google'ing and
poking through the JDK 1.4 docs.)
So I went looking in the source for Tomcat 4.1.27 -- surely the HTML
version of the manager app must quote at least the webapp's display
name, since it comes from a user-supplied file and therefore might
contain funny characters. Surprisingly, the manager just lets funny
characters through without touching them. Eg. if you put
<display-name>foo &amp; bar webapp</display-name>
then "&amp;" is translated back to "&" by some part of the XML-parsing
chain, and is emitted as "&" in the manager HTML page. Most browsers
can deal with minor violations like this, but it's still technically
incorrect. Just for fun I tried this:
<display-name>my &lt;script&gt;alert("foo");&lt;/script&gt;</display-name>
...and it works! The manager emits this HTML:
<td class="row-left"><small>my <script>alert("foo");</script> webapp</small></td>
and my browser pops up a JavaScript window while rendering the manager
page. Cool! I doubt this is a security hole -- not many people can
edit web.xml! -- but surely it at least counts as a rendering bug. ;-)
So: can someone tell me what the standard way of quoting text for
inclusion in a web page generated by a Java web application is?
Thanks!
Greg
---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-user-help@jakarta.apache.org
Re: HTML quoting
Posted by Christopher Williams <cc...@ntlworld.com>.
>
> It is obvious then that a space would be &#32; or &#x20; since 32 is
> the ascii code for a space. Though i cannot quite figure out why you
> would want to escape a space...
>
I escape spaces and character entities in form fields; if you do the
following:
<input name="x" size=20 maxlength=20 value="<%= someVal %>">
and someVal contains space characters (or worse, '>'), it won't display
properly. If you escape the spaces, it will, and this is what I use the
method for.
If I'm emitting HTML where I know what it will be beforehand, I simply
include any appropriate character entities in my resource strings.
Re: HTML quoting
Posted by drm <dr...@melp.nl>.
If you refuse to or cannot use
org.apache.commons.lang.StringEscapeUtils (noted by Mike Curwen here)
or the JSP functionality (noted by Tim Funk), it might help to know that
a character entity doesn't have to be named. Every character can be
written as a numeric character reference of the form &#nn..; or &#xhh..;
where 'nn..' and 'hh..' are the character's Unicode code point in decimal
and hexadecimal respectively, whatever encoding the page uses.
It is obvious then that a space would be &#32; or &#x20; since 32 is
the ASCII code for a space. Though I cannot quite figure out why you
would want to escape a space...
The characters you want to escape are those outside the ASCII range
32 <= c <= 127
plus c == '<', c == '>', c == '&', c == '"' and c == '\'',
so this would be (imho) the better method:
Example:
    public static String escapeHtml ( String s ) {
        StringBuffer buffer = new StringBuffer ();
        for ( int i = 0; i < s.length(); i ++ ) {
            char c = s.charAt ( i );
            // control characters, non-ASCII characters and HTML metacharacters
            if ( c < 32 || c > 127 || c == '<' || c == '>' || c == '&'
                    || c == '"' || c == '\'' ) {
                // numeric character reference, e.g. '<' becomes &#60;
                buffer.append ( "&#" + (int)c + ";" );
            } else {
                buffer.append ( c );
            }
        }
        return buffer.toString ();
    }
One might consider ordering the conditions in the if statement by
occurrence probability to improve performance...
HTH,
drm
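A minimal sketch of the same numeric-reference idea that works on whole code points, so a character above U+FFFF comes out as one reference rather than two surrogate halves. The class name NumericEscaper is hypothetical, and the sketch assumes Java 5 or later (String.codePointAt and StringBuilder are newer than the JDK 1.4 discussed in this thread). It treats 32 to 126 as the safe range, which is slightly narrower than the method above.

```java
// Hypothetical helper; the name is illustrative and not from any library.
final class NumericEscaper {

    // Replaces control characters, non-ASCII characters and the HTML
    // metacharacters < > & " ' with decimal numeric character references.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        int i = 0;
        while (i < s.length()) {
            // Read a full code point, even when it spans two chars.
            int cp = s.codePointAt(i);
            boolean unsafe = cp < 32 || cp > 126
                    || cp == '<' || cp == '>' || cp == '&'
                    || cp == '"' || cp == '\'';
            if (unsafe) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling NumericEscaper.escapeHtml("a<b") should give a&#60;b, and plain ASCII text passes through unchanged.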
Re: HTML quoting
Posted by Christopher Williams <cc...@ntlworld.com>.
Here's a simple method to quote the most important character entities:
/**
 * Handles a couple of problematic characters in strings that are printed to
 * an HTML stream, replacing them with their escaped equivalents
 * @param s an input string
 * @return the escaped string
 */
public static String escapeSpaces(String s)
{
    StringBuffer sb = new StringBuffer();
    int nChars = s.length();
    for (int i = 0; i < nChars; i++)
    {
        char c = s.charAt(i);
        if (' ' == c)
        {
            sb.append("&#32;");
        }
        else if ('>' == c)
        {
            sb.append("&gt;");
        }
        else if ('<' == c)
        {
            sb.append("&lt;");
        }
        else if ('\"' == c)
        {
            sb.append("&quot;");
        }
        else if ('&' == c)
        {
            sb.append("&amp;");
        }
        else
        {
            sb.append(c);
        }
    }
    return sb.toString();
}
A more complete solution would be to look up the complete list of character
entities (e.g 'HTML and XHTML The Definitive Guide'), build a lookup table
and use each character as an index into that table.
Chris Williams.
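A rough sketch of the lookup-table idea, with only a handful of entries filled in. The class name EntityTable and the choice of entries are assumptions for illustration; a complete table would carry the full HTML entity list from a reference such as the one mentioned above.

```java
// Hypothetical helper; the name and the entries chosen are illustrative.
final class EntityTable {

    // Index = character code; null means "emit the character unchanged".
    private static final String[] ENTITIES = new String[256];

    static {
        ENTITIES['&'] = "&amp;";
        ENTITIES['<'] = "&lt;";
        ENTITIES['>'] = "&gt;";
        ENTITIES['"'] = "&quot;";
        ENTITIES[0xA0] = "&nbsp;";   // non-breaking space
        ENTITIES[0xA9] = "&copy;";   // copyright sign
        ENTITIES[0xE9] = "&eacute;"; // e with acute accent
    }

    // Uses each character as an index into the table.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String entity = c < ENTITIES.length ? ENTITIES[c] : null;
            if (entity != null) {
                sb.append(entity);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Characters beyond the end of the table are passed through untouched here; whether that is acceptable depends on the encoding the page is served in.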
Re: HTML quoting
Posted by Greg Ward <gw...@python.net>.
On 02 October 2003, Andy Eastham said:
> "Standard" one is java.net.URLEncoder.encode() and
> java.net.URLEncoder.decode()
No. HTML quoting and URL encoding are quite different.
URLEncoder.encode() on my test string returns
Jeb+said%2C+%22Hell+%26+damnation%21+Is+5+%3E+4%3F%22
(It would also be correct to replace every space with "%20".)
HTML quoting is for handling < > & and maybe " '. Eg. Python's cgi.escape()
returns
Jeb said, "Hell &amp; damnation! Is 5 &gt; 4?"
while Perl's CGI::escapeHTML() returns
Jeb said, &quot;Hell &amp; damnation! Is 5 &gt; 4?&quot;
Hmmm, I see that Python's cgi.escape() has an optional arg to specify
quoting " characters. Looks like Perl's CGI::escapeHTML() always quotes
quotes. That's essential for cases like
out.print("<input name=\"foo\" value=\"" + someUserSuppliedValue + "\">");
-- if someUserSuppliedValue has " characters in it, they must be quoted!
Greg
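To make the difference concrete, here is a small sketch that puts the two kinds of quoting side by side. URLEncoder.encode(String, String) is the real JDK call; the class QuotingDemo and its htmlEscape method are hypothetical, written loosely after cgi.escape() with its optional quote flag, and are not part of any standard library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; only URLEncoder comes from the JDK.
final class QuotingDemo {

    // URL encoding: meant for query strings and form data, not page text.
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // HTML quoting: pass quote=true when the result goes inside a
    // double-quoted attribute value.
    static String htmlEscape(String s, boolean quote) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&') {
                sb.append("&amp;");
            } else if (c == '<') {
                sb.append("&lt;");
            } else if (c == '>') {
                sb.append("&gt;");
            } else if (quote && c == '"') {
                sb.append("&quot;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For the input-tag case above, the value would be written as htmlEscape(someUserSuppliedValue, true), so that a " in the value cannot end the attribute early.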
Re: HTML quoting
Posted by drm <dr...@melp.nl>.
Nope. 2 different things.
Filip Hanik wrote:
> encoding a URL is not the same as encoding HTML, or is it?
Re: HTML quoting
Posted by Filip Hanik <de...@hanik.com>.
encoding a URL is not the same as encoding HTML, or is it?
----- Original Message -----
From: "Andy Eastham" <an...@barllama.demon.co.uk>
To: "Tomcat Users List" <to...@jakarta.apache.org>
Sent: Thursday, October 02, 2003 2:55 PM
Subject: RE: HTML quoting
Greg,
"Standard" one is java.net.URLEncoder.encode() and
java.net.URLEncoder.decode()
Andy
RE: HTML quoting
Posted by Andy Eastham <an...@barllama.demon.co.uk>.
Greg,
"Standard" one is java.net.URLEncoder.encode() and
java.net.URLEncoder.decode()
Andy
Re: HTML quoting
Posted by David Rees <dr...@greenhydrant.com>.
On Thu, October 2, 2003 at 2:18 pm, Greg Ward sent the following
> What's the standard way of quoting text for inclusion in a web page in
> Java? Ie. I need a method to convert the string
>
> Jeb said, "Hell & damnation! Is 5 > 4?"
>
> to
>
> Jeb said, &quot;Hell &amp; damnation! Is 5 &gt; 4?&quot;
>
> (I think: I've never been entirely sure what the right way to handle
> quotes is.) That is, I want the standard Java equivalent of Python's
> cgi.escape(), or Perl's CGI::escapeHTML().
I am not aware of a standard utility for doing so; I have written my own
utility class which escapes data for encapsulation in XML/HTML. I am sure
that one exists out there somewhere, though.
-Dave
Re: HTML quoting
Posted by Greg Ward <gw...@python.net>.
On 02 October 2003, Tim Funk said:
> JSTL by default escapes all output to be HTML friendly
> <c:out value="${myValue}"/>
> and to disable:
> <c:out value="${myValue}" escapeXml="false"/>
Alas, I'm working on some crufty old servlets that are chock full of
System.out.println( ... boatloads of HTML ...);
We plan to move to a real template language one of these days, but for the
time being we're stuck maintaining this vile code. ;-(
I asked on the Tomcat list because I know that lots of people who know lots
about web development with Java hang out here...
Greg
Re: HTML quoting
Posted by Tim Funk <fu...@joedog.org>.
JSTL's <c:out> tag escapes the XML special characters (< > & " ') by
default, so its output is HTML friendly:
<c:out value="${myValue}"/>
and to disable:
<c:out value="${myValue}" escapeXml="false"/>
-Tim
RE: HTML quoting
Posted by David Rees <dr...@greenhydrant.com>.
On Thu, October 2, 2003 at 2:37 pm, Mike Curwen sent the following
> How about:
> org.apache.commons.lang.StringEscapeUtils ?
> which:
> Escapes and unescapes Strings for Java, Java Script, HTML, XML, and SQL.
Thanks for the pointer! That's probably a lot more effective than my
home-brew version.
-Dave
RE: HTML quoting
Posted by Mike Curwen <gb...@gb-im.com>.
How about:
org.apache.commons.lang.StringEscapeUtils ?
which:
Escapes and unescapes Strings for Java, Java Script, HTML, XML, and SQL.
RE: HTML quoting
Posted by George Sexton <gs...@mhsoftware.com>.
Here is the code that we use. It depends on MHBuffer, our own variant of
StringBuffer, but you get the idea:
    public String htmlEncode(String cVal) {
        if (cVal==null || cVal.length()==0) {
            return "";
        }
        MHBuffer buf=new MHBuffer(cVal.length()<<2);
        // "&" must come first so the entities added later are not re-escaped
        final String[] aOld= {"&", "<", ">", "\""};
        final String[] aReplace={"&amp;","&lt;","&gt;","&quot;"};
        buf.append(cVal);
        for (int i=0; i < aOld.length; i++) {
            buf.replace(aOld[i],aReplace[i]);
        }
        return buf.toString();
    }
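For anyone without a buffer class like that, the same replace-in-sequence idea can be sketched with String.replace from the standard library. The class name SequentialReplace and the second method are hypothetical; the second method exists only to show what goes wrong if "&" is handled last instead of first.

```java
// Hypothetical helper; names are illustrative and not from any library.
final class SequentialReplace {

    private static final String[] OLD = {"&", "<", ">", "\""};
    private static final String[] NEW = {"&amp;", "&lt;", "&gt;", "&quot;"};

    // Applies the replacements in order, "&" first.
    static String htmlEncode(String value) {
        if (value == null || value.isEmpty()) {
            return "";
        }
        String result = value;
        for (int i = 0; i < OLD.length; i++) {
            result = result.replace(OLD[i], NEW[i]);
        }
        return result;
    }

    // Demonstration only: the same table applied in reverse, so "&" is
    // replaced last and the ampersands of earlier entities get escaped again.
    static String htmlEncodeAmpersandLast(String value) {
        String result = value;
        for (int i = OLD.length - 1; i >= 0; i--) {
            result = result.replace(OLD[i], NEW[i]);
        }
        return result;
    }
}
```

With the reversed order, "5 > 4" turns into "5 &amp;gt; 4", which a browser would display as the literal text "&gt;" rather than ">".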