Posted to issues@commons.apache.org by "Sampanna Kahu (JIRA)" <ji...@apache.org> on 2017/02/18 17:16:44 UTC

[jira] [Comment Edited] (TEXT-40) Escape HTML characters only once

    [ https://issues.apache.org/jira/browse/TEXT-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873258#comment-15873258 ] 

Sampanna Kahu edited comment on TEXT-40 at 2/18/17 5:16 PM:
------------------------------------------------------------

Good point Sebb.
I can try to answer your question.
bq. "How does one know whether the input has already been escaped or not?"
The original escapeHtml() method uses a look-up table (LUT).
So, when deciding whether a string has already been escaped, we use the same LUT: we check whether the string to be escaped matches any of the escaped values in the LUT. If it does, we conclude that it has already been escaped and skip it, thereby preventing double escaping.
This is the approach I've used in the escapeHtml3Once() and escapeHtml4Once() methods.
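
A minimal sketch of the idea described above, not the actual code of escapeHtml3Once() or escapeHtml4Once(). The class name EscapeOnceSketch, the method escapeOnce(), and the four-entry table are all hypothetical and chosen only for illustration; the real look-up tables in StringEscapeUtils are much larger. The sketch assumes that "already escaped" means "an escaped value from the table starts at the current position":

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EscapeOnceSketch {
    // Hypothetical, deliberately tiny look-up table: raw character -> entity.
    private static final Map<Character, String> LUT = new LinkedHashMap<>();
    static {
        LUT.put('&', "&amp;");
        LUT.put('<', "&lt;");
        LUT.put('>', "&gt;");
        LUT.put('"', "&quot;");
    }

    // Returns the escaped value from the table that starts at index i, or null if none does.
    private static String entityAt(String input, int i) {
        for (String entity : LUT.values()) {
            if (input.startsWith(entity, i)) {
                return entity;
            }
        }
        return null;
    }

    // Escapes raw characters, but copies sequences that already match an escaped value.
    public static String escapeOnce(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        int i = 0;
        while (i < input.length()) {
            String existing = entityAt(input, i);
            if (existing != null) {
                // Already escaped according to the table: copy as-is and skip past it.
                out.append(existing);
                i += existing.length();
                continue;
            }
            char c = input.charAt(i);
            String escaped = LUT.get(c);
            out.append(escaped != null ? escaped : String.valueOf(c));
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The already-escaped "&lt;" is kept; the raw '&' and '<' are escaped once.
        System.out.println(escapeOnce("100kg &lt; 1000kg & 5 < 6"));
    }
}
```

Note that a check like this cannot tell an already-escaped entity apart from raw text that merely looks like one, so it only fits inputs where such sequences are known to come from earlier escaping.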

The description of this jira ticket (TEXT-40) describes an example where this functionality adds value.

Did I somewhat answer your question, Sebb? Is that what you are asking?
I am open to discussions :-)


was (Author: sampyash):
Good point Sebb.
I can try to answer your question.
bq. "How does one know whether the input has already been escaped or not?"
The original escapeHtml() method uses a look-up table (LUT).
So, when deciding whether a string is escaped or not, we use the same LUT: we check whether the string to be escaped matches any of the escaped values in the LUT. If it does, we conclude that it has already been escaped and skip it, thereby preventing double escaping.
This is the approach I've used in the escapeHtml3Once() and escapeHtml4Once() methods.

The description of this jira ticket (TEXT-40) describes an example where this functionality adds value.

Did I somewhat answer your question, Sebb? Is that what you are asking?
I am open to discussions :-)

> Escape HTML characters only once
> --------------------------------
>
>                 Key: TEXT-40
>                 URL: https://issues.apache.org/jira/browse/TEXT-40
>             Project: Commons Text
>          Issue Type: Improvement
>            Reporter: Sampanna Kahu
>            Assignee: Rob Tompkins
>            Priority: Minor
>              Labels: features, newbie
>
> If the input text contains already-escaped HTML characters, StringEscapeUtils.escapeHtml4() escapes them again.
> For example:
> If the input is:
> 100kg &lt; 1000kg
> Then the output of escapeHtml4() becomes:
> 100kg &amp;lt; 1000kg
> At my workplace, we felt the need for a method in StringEscapeUtils which does not escape already escaped characters.
> I have attempted to create this method. Creating a pull request soon.


