Posted to issues@commons.apache.org by "Duncan Jones (JIRA)" <ji...@apache.org> on 2014/10/24 16:55:34 UTC
[jira] [Comment Edited] (LANG-1056) StringEscapeUtils.unescapeHtml4 java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/LANG-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182854#comment-14182854 ]
Duncan Jones edited comment on LANG-1056 at 10/24/14 2:54 PM:
--------------------------------------------------------------
The Javadocs are not overly clear on this subject:
bq. If an entity is unrecognized, it is left alone, and inserted verbatim into the result string. e.g. "&gt;&zzzz;x" will become ">&zzzz;x".
The tricky word here is "unrecognized". I think {{�}} is recognised as an escaped Unicode character, but it fails during conversion. That's probably a different scenario to not _recognising_ an invalid entity like {{&zzz;}}.
I would suggest the docs are vague enough to support action in either direction. We either declare this is a bug and fix it or we decide it's good behaviour, but update the Javadocs to make it clearer this will happen.
I welcome comments from others. I think the original intention here was for no exceptions to be thrown, so I'd be in favour of calling this a bug.
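To illustrate the "left alone" reading of the Javadocs, here is a minimal sketch of a lenient unescaper for decimal numeric entities. It is not the Commons Lang implementation: the class {{LenientNumericUnescape}}, the method {{unescapeDecimal}} and the sample entity {{&#99999999;}} are all hypothetical, and hex entities are ignored for brevity. The only assumption about the JDK is the documented behaviour that {{Character.toChars}} throws {{IllegalArgumentException}} for an invalid code point, which matches the stack trace in the report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientNumericUnescape {

    // Matches decimal numeric entities such as &#62; (hex entities are out of scope in this sketch).
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d+);");

    /** Hypothetical helper: decodes valid decimal entities and leaves invalid ones verbatim. */
    public static String unescapeDecimal(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(input, last, m.start());
            out.append(decodeOrKeep(m.group(1), m.group()));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    private static String decodeOrKeep(String digits, String original) {
        try {
            int codePoint = Integer.parseInt(digits);
            // Guard before calling Character.toChars, which throws
            // IllegalArgumentException for values outside 0..0x10FFFF.
            if (Character.isValidCodePoint(codePoint)) {
                return new String(Character.toChars(codePoint));
            }
        } catch (NumberFormatException e) {
            // The number does not even fit in an int: treat it as invalid.
        }
        // Recognised as a numeric entity, but not convertible: keep it as written.
        return original;
    }

    public static void main(String[] args) {
        // &#62; is '>'; &#99999999; is a made-up out-of-range value.
        System.out.println(unescapeDecimal("test &#62; &#99999999;"));
    }
}
```

With this approach an out-of-range entity is treated the same way as an unknown named entity such as {{&zzz;}}, so callers never see an exception. Whether that is the right contract for {{unescapeHtml4}} is exactly the question above.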
was (Author: dmjones500):
The Javadocs are not overly clear on this subject, but I would suggest this isn't a bug. The docs say:
bq. If an entity is unrecognized, it is left alone, and inserted verbatim into the result string. e.g. "&gt;&zzzz;x" will become ">&zzzz;x".
The tricky word here is "unrecognized". I think {{�}} is recognised as an escaped Unicode character, but it fails during conversion. That's probably a different scenario to not _recognising_ an invalid entity like {{&zzz;}}.
I would suggest the docs are vague enough to support action in either direction. We either declare this is a bug and fix it or we decide it's good behaviour, but update the Javadocs to make it clearer this will happen.
I welcome comments from others. I think the original intention here was for no exceptions to be thrown, so I'd be in favour of calling this a bug.
> StringEscapeUtils.unescapeHtml4 java.lang.IllegalArgumentException
> ------------------------------------------------------------------
>
> Key: LANG-1056
> URL: https://issues.apache.org/jira/browse/LANG-1056
> Project: Commons Lang
> Issue Type: Bug
> Components: lang.*
> Affects Versions: 3.3.2
> Environment: Ubuntu 64
> Reporter: Jakub
>
> When I try to unescape
> {code:java}
> String test = "test �";
> StringEscapeUtils.unescapeHtml4(test);
> {code}
> I got:
> {noformat}
> java.lang.IllegalArgumentException
> at java.lang.Character.toChars(Character.java:4982)
> at org.apache.commons.lang3.text.translate.NumericEntityUnescaper.translate(NumericEntityUnescaper.java:128)
> at org.apache.commons.lang3.text.translate.AggregateTranslator.translate(AggregateTranslator.java:52)
> at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:85)
> at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:59)
> at org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(StringEscapeUtils.java:627)
> at unescapeHtml4Test.Main.main(Main.java:10)
> {noformat}
> Is this a bug or not? Should the method return "test �" without throwing an exception?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)