Posted to dev@roller.apache.org by Susanne Gladén <su...@gmail.com> on 2011/04/04 14:25:58 UTC

RC5 tests - encoding error in comments

Hi,

I think I have found a bug in the code that handles weblog entry comments.

If I add the comment "Fint väder idag" ("Nice weather today") to a weblog entry,
the comment is displayed as "Fint v&auml;der idag".


In WeblogEntryCommentWrapper.java, in the method getContent():

    public String getContent() {

        String content = this.pojo.getContent();

        // escape content if content-type is text/plain
        if("text/plain".equals(this.pojo.getContentType())) {
            content = StringEscapeUtils.escapeHtml(content);
        }

        // apply plugins
        PluginManager pmgr = WebloggerFactory.getWeblogger().getPluginManager();
        content = pmgr.applyCommentPlugins(this.pojo, content);

        // always add rel=nofollow for links
        content = Utilities.addNofollow(content);

        return content;
    }

First, the content is escaped here:

        if("text/plain".equals(this.pojo.getContentType())) {
            content = StringEscapeUtils.escapeHtml(content);
        }

Then the content is transformed once again by HTMLSubsetPlugin.java, which is applied in:

        content = pmgr.applyCommentPlugins(this.pojo, content);


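This escapes the string twice: "ä" is first turned into "&auml;", and the second pass escapes the "&" again, producing "&amp;auml;", which the browser shows as the literal text "&auml;".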
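A minimal sketch of why two escaping passes produce the displayed text. DoubleEscapeDemo and its tiny entity table are hypothetical stand-ins: the real wrapper calls StringEscapeUtils.escapeHtml() from Commons Lang, which knows all Latin-1 entities, and the second pass is simply modeled here as the same escaper applied again.

```java
import java.util.Map;

/**
 * Hypothetical demo of double HTML escaping. The entity table is deliberately
 * tiny (markup characters plus a-umlaut, U+00E4); it is not Roller code.
 */
public class DoubleEscapeDemo {

    // Simplified entity table for this sketch only.
    private static final Map<Character, String> ENTITIES = Map.of(
            '&', "&amp;",
            '<', "&lt;",
            '>', "&gt;",
            '"', "&quot;",
            '\u00e4', "&auml;");

    // Replace each known character with its entity; copy everything else as-is.
    public static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            String entity = ENTITIES.get(c);
            if (entity != null) {
                out.append(entity);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String comment = "Fint v\u00e4der idag";
        String once = escapeHtml(comment);  // first pass (wrapper, text/plain)
        String twice = escapeHtml(once);    // second pass escapes the '&' of "&auml;"
        System.out.println(once);   // Fint v&auml;der idag
        System.out.println(twice);  // Fint v&amp;auml;der idag
    }
}
```

The doubly escaped "&amp;auml;" is what a browser renders as the literal text "&auml;", matching the reported symptom.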


I found that Utilities.transformToHTMLSubset(String s) tries to work
around this problem for some characters by calling s.replace(
... ), but it's difficult to list all Latin-1 characters ...
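The limitation can be sketched as follows, assuming the double-escaped text looks like "&amp;auml;". restoreListed() is a hypothetical stand-in for the s.replace( ... ) calls (the actual list in Utilities is not reproduced here), and restoreAll() is a hypothetical regex-based alternative that undoes one level of escaping for any entity; it is not something Roller is known to do.

```java
import java.util.regex.Pattern;

/** Hypothetical comparison of list-based and pattern-based entity restoration. */
public class EntityRestoreDemo {

    // List-based workaround: every double-escaped entity must be named explicitly.
    public static String restoreListed(String s) {
        return s.replace("&amp;auml;", "&auml;")
                .replace("&amp;ouml;", "&ouml;")
                .replace("&amp;aring;", "&aring;");
    }

    // Matches "&amp;" followed by a named, decimal or hex entity body and ';'.
    private static final Pattern DOUBLE_ESCAPED =
            Pattern.compile("&amp;(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);");

    // Pattern-based alternative: strips one "amp;" level from any entity.
    public static String restoreAll(String s) {
        return DOUBLE_ESCAPED.matcher(s).replaceAll("&$1;");
    }

    public static void main(String[] args) {
        String doubled = "Fint v&amp;auml;der, caf&amp;eacute;";
        System.out.println(restoreListed(doubled)); // Fint v&auml;der, caf&amp;eacute;
        System.out.println(restoreAll(doubled));    // Fint v&auml;der, caf&eacute;
    }
}
```

Because restoreAll() removes exactly one level, text the user typed literally (for example "&eacute;", which becomes "&amp;amp;eacute;" after two passes) comes back as "&amp;eacute;" and still displays as typed. Escaping only once avoids the need for either workaround.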


/Susanne

Re: RC5 tests - encoding error in comments

Posted by Dave <sn...@gmail.com>.
On Mon, Apr 4, 2011 at 8:25 AM, Susanne Gladén <su...@gmail.com> wrote:
> I found in Utilities.transformToHTMLSubset(String s)  that you try to
> make a fix for this problem for some characters by calling s.replace(
> ... )

transformToSafeHTMLSubset() is designed to unescape only the HTML
tags that are considered safe.
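A minimal sketch of the escape-then-whitelist idea, assuming the function first escapes all markup and then turns the escaped forms of a fixed list of tags back into real tags. SafeSubsetDemo, transformToSafeSubset() and the tag list are hypothetical; Roller's own whitelist and matching rules (attributes, case, self-closing tags) may differ.

```java
import java.util.List;

/** Hypothetical sketch of safe-subset HTML filtering; not Roller's implementation. */
public class SafeSubsetDemo {

    // Hypothetical whitelist of tags allowed in comments.
    private static final List<String> SAFE_TAGS =
            List.of("b", "i", "em", "strong", "p", "blockquote");

    // Escape markup characters; '&' goes first so later entities are not re-escaped.
    public static String escapeMarkup(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Escape everything, then re-enable only the plain open/close forms of safe tags.
    public static String transformToSafeSubset(String s) {
        String result = escapeMarkup(s);
        for (String tag : SAFE_TAGS) {
            result = result.replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                           .replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(transformToSafeSubset("<b>hi</b> <script>x</script>"));
        // Already-escaped input gets its '&' escaped a second time:
        System.out.println(transformToSafeSubset("v&auml;der"));
    }
}
```

The second call shows the catch: input that was already escaped, such as the text/plain output of the wrapper, comes out as "v&amp;auml;der".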

In WeblogEntryCommentWrapper, we only escape content if its content type
is text/plain, which means HTML comments are disabled.

I believe the fix is to change transformToSafeHTMLSubset() to act only
when the comment is text/html and therefore needs safe subsetting.
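A sketch of the proposed behavior, assuming the wrapper and the plugin are folded into one hypothetical static getContent(content, contentType) method: the comment is escaped exactly once, and whitelisted tags are re-enabled only for text/html. The class name, the tag list and the choice to treat every other content type like text/plain are assumptions of this sketch, not details stated in the thread.

```java
import java.util.List;

/** Hypothetical sketch: escape once, apply safe subsetting only to text/html. */
public class CommentContentSketch {

    // Hypothetical whitelist of tags considered safe in comments.
    private static final List<String> SAFE_TAGS = List.of("b", "i", "em", "strong", "p");

    // Single escaping pass: markup characters plus a-umlaut (U+00E4) for illustration.
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("\u00e4", "&auml;");
    }

    // Re-enable whitelisted tags in already-escaped text; performs no escaping itself.
    public static String unescapeSafeTags(String escaped) {
        String result = escaped;
        for (String tag : SAFE_TAGS) {
            result = result.replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                           .replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return result;
    }

    public static String getContent(String content, String contentType) {
        String escaped = escape(content);
        if ("text/html".equals(contentType)) {
            // HTML comments: keep only the safe subset of tags
            return unescapeSafeTags(escaped);
        }
        // text/plain and anything unrecognized: the single escape is the final result
        return escaped;
    }

    public static void main(String[] args) {
        System.out.println(getContent("Fint v\u00e4der idag", "text/plain"));
        System.out.println(getContent("<b>Fint</b> v\u00e4der <script>x</script>", "text/html"));
    }
}
```

Defaulting unknown content types to full escaping keeps the sketch on the safe side; the key point is that no code path escapes the same text twice.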

Thanks,
Dave