Posted to issues@commons.apache.org by "Sebb (JIRA)" <ji...@apache.org> on 2017/02/06 17:35:41 UTC

[jira] [Reopened] (TEXT-40) Escape HTML characters only once

     [ https://issues.apache.org/jira/browse/TEXT-40?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb reopened TEXT-40:
----------------------

How does one know whether the input has already been escaped or not?

If the input only contains unescaped characters then it has not been escaped.

But if there is a mixture, or if the input consists entirely of escaped entities, how can one know?

For example the input could be text explaining how to escape an ampersand.
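
A minimal sketch of that case, in Java. The class and both methods are hypothetical stand-ins written for illustration; they are not the Commons Text implementation, and the "escape once" rule shown is only one guess at what such a method might do.

```java
// Hypothetical stand-in for an HTML escaper; not the Commons Text implementation.
final class EscapeAmbiguityDemo {

    // Escapes the four characters handled by this sketch: & < > "
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Assumed "escape once" rule: copy through anything that already looks
    // like one of our entities, escape everything else.
    static String escapeOnce(String s) {
        String[] entities = {"&amp;", "&lt;", "&gt;", "&quot;"};
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            String matched = null;
            for (String e : entities) {
                if (s.startsWith(e, i)) {
                    matched = e;
                    break;
                }
            }
            if (matched != null) {
                out.append(matched);
                i += matched.length();
            } else {
                out.append(escape(String.valueOf(s.charAt(i))));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Raw, unescaped text from an author explaining how to escape an ampersand.
        String raw = "Write &amp; to show an ampersand";
        System.out.println(escape(raw));     // Write &amp;amp; to show an ampersand
        System.out.println(escapeOnce(raw)); // Write &amp; to show an ampersand
    }
}
```

Rendered in a browser, the first output shows the author's text as written, while the second shows "Write & to show an ampersand", which is no longer what the author typed. Nothing in the string itself tells the two situations apart.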

I don't think these methods make any sense; the only way to be sure whether input has been escaped or not is to keep track of the state in the application.
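
One way an application might keep track of that state is to carry it in the type rather than in the string. The sketch below is an assumption for illustration only: `EscapedHtml` and its methods are hypothetical names, and the escaping covers just four characters.

```java
// Hypothetical wrapper: the type, not the string contents, records that
// the text has been escaped.
final class EscapedHtml {
    private final String value;

    // Private so that an instance can only be created through fromRaw.
    private EscapedHtml(String value) {
        this.value = value;
    }

    // Raw text is escaped exactly once, here. '&' is replaced first so the
    // ampersands introduced by the later replacements are not escaped again.
    static EscapedHtml fromRaw(String raw) {
        String escaped = raw.replace("&", "&amp;")
                            .replace("<", "&lt;")
                            .replace(">", "&gt;")
                            .replace("\"", "&quot;");
        return new EscapedHtml(escaped);
    }

    // The escaped text, ready to be written into HTML output.
    String value() {
        return value;
    }

    public static void main(String[] args) {
        EscapedHtml safe = EscapedHtml.fromRaw("100 kg < 1000kg");
        System.out.println(safe.value()); // 100 kg &lt; 1000kg
    }
}
```

Code that only accepts `EscapedHtml` cannot be handed raw text by accident, and there is no way to pass an `EscapedHtml` back into `fromRaw`, so the question of whether a value has already been escaped never has to be answered by inspecting it.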

As an analogy, how about a method that multiplies numbers by 10 unless they have already been multiplied by 10?

> Escape HTML characters only once
> --------------------------------
>
>                 Key: TEXT-40
>                 URL: https://issues.apache.org/jira/browse/TEXT-40
>             Project: Commons Text
>          Issue Type: Improvement
>            Reporter: Sampanna Kahu
>            Assignee: Rob Tompkins
>            Priority: Minor
>              Labels: features, newbie
>
> If already-escaped HTML characters are in the input text, they get escaped again by StringEscapeUtils.escapeHtml4().
> For example:
> If the input is:
> 100 kg &lt; 1000kg
> Then the output of escapeHtml4() becomes:
> 100 kg &amp;lt; 1000kg
> At my workplace, we felt the need for a method in StringEscapeUtils which does not escape already escaped characters.
> I have attempted to create this method. Creating a pull request soon.


