Posted to user@commons.apache.org by Vitor Costa <fv...@yahoo.com.br> on 2010/08/25 23:50:45 UTC
[lang] StringEscapeUtils.unescapeHtml("&nbsp;") doesn't return a space
Hi,
I am writing a crawler to extract some information from web pages, and I am using
Commons Lang to unescape the HTML.
I was having problems with my regular expressions until I realized that the
following prints false:
System.out.println(" ".equals(StringEscapeUtils.unescapeHtml("&nbsp;")));
Is this a bug? Or is it the expected behavior of the unescape method when
dealing with escaped space characters?
Also, if I unescape 'sbrubbles&nbsp;' and then trim() the result, the space still
appears at the end of the string.
Visually, unescaping '&nbsp;' returns a space, but programmatically neither
equals() nor trim() treats it as a space character.
Thanks in advance,
Vitor.
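The symptoms in the question can be reproduced in plain Java. The following is a minimal sketch, assuming that unescaping "&nbsp;" yields character 160 (U+00A0, NO-BREAK SPACE) rather than 32. The class name NbspDemo and the constant UNESCAPED_NBSP are hypothetical, and the unescaped value is hard-coded so the example does not need Commons Lang on the classpath.

```java
// Sketch: why an unescaped &nbsp; is not equal to " " and survives trim().
// Assumption: unescaping "&nbsp;" yields character 160; that value is
// hard-coded here instead of calling commons-lang.
import java.util.regex.Pattern;

public class NbspDemo {
    // Hypothetical stand-in for the result of unescaping "&nbsp;".
    public static final String UNESCAPED_NBSP = "\u00A0";

    public static void main(String[] args) {
        // false: U+00A0 (160) is a different char from U+0020 (32)
        System.out.println(" ".equals(UNESCAPED_NBSP));
        // prints 160
        System.out.println((int) UNESCAPED_NBSP.charAt(0));

        // trim() only removes chars <= U+0020, so the NBSP stays
        String s = "sbrubbles" + UNESCAPED_NBSP;
        System.out.println(s.trim().length()); // 10

        // By default, \s matches only ASCII whitespace [ \t\n\x0B\f\r]
        System.out.println(Pattern.matches("\\s", UNESCAPED_NBSP)); // false

        // Character.isWhitespace explicitly excludes non-breaking spaces
        System.out.println(Character.isWhitespace('\u00A0')); // false
        // ...but the char is classified as a Unicode space separator
        System.out.println(Character.isSpaceChar('\u00A0')); // true
    }
}
```

This also explains the regex trouble: a pattern built on \s will not match the character that "&nbsp;" decodes to.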
Re: [lang] StringEscapeUtils.unescapeHtml("&nbsp;") doesn't return a space
Posted by "E. Michael Akerman" <mi...@exchange.uark.edu>.
I'm not certain how StringEscapeUtils handles it, but in HTML, &nbsp; should decode to character 160 (U+00A0, NO-BREAK SPACE) instead of 32 (an ordinary space). It has
a different meaning from a regular space: it prevents a line break at that position.
Michael Akerman
Systems Analyst
University IT Services
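One way a crawler might act on this answer is to normalize non-breaking spaces before trimming or matching. The sketch below is an assumption about how to do that, not a Commons Lang feature: the class NbspNormalize, the helper normalizeSpaces, and the ANY_SPACE pattern are hypothetical names. It either replaces U+00A0 with an ordinary space and then calls trim(), or leaves the text intact and uses a pattern that also matches the Unicode space-separator category \p{Zs}.

```java
import java.util.regex.Pattern;

public class NbspNormalize {
    // Hypothetical helper: map NO-BREAK SPACE (U+00A0, char 160) to an
    // ordinary space (U+0020, char 32), then trim as usual.
    public static String normalizeSpaces(String s) {
        return s.replace('\u00A0', ' ').trim();
    }

    // Alternative: keep the text as-is but match both kinds of space.
    // \p{Zs} is the Unicode "space separator" category, which contains
    // both U+0020 and U+00A0; \s still covers tabs and line breaks.
    public static final Pattern ANY_SPACE = Pattern.compile("[\\s\\p{Zs}]+");

    public static void main(String[] args) {
        // Assumed result of unescaping "sbrubbles&nbsp;"
        String unescaped = "sbrubbles\u00A0";
        System.out.println("[" + normalizeSpaces(unescaped) + "]"); // [sbrubbles]
        System.out.println(ANY_SPACE.matcher("a\u00A0b").replaceAll(" ")); // a b
    }
}
```

Replacing the character is simplest when the no-break semantics don't matter for the extracted data; the pattern approach keeps the unescaped text unchanged.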