You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@roller.apache.org by Allen Gilliland <al...@sun.com> on 2007/07/09 19:25:20 UTC

comment formatting and escaping is a bit kludgy

i am looking at some ways to improve the comment system and provide 
greater flexibility in how comment content is handled when it's 
submitted, particularly in the realm of html vs. non-html comment 
posting.  xss is becoming so common now that allowing anonymous users to 
post html comments is pretty dangerous and so I am trying to find some 
better ways to allow for comment formatting to take place on non-html 
comments.

after looking through what we have right now i think we need to fix a 
couple issues.  the main problem is with the current autoformat and 
escape html comment settings.

1. these settings are global settings and apply to all comments on all 
blogs, however they are not tracked on a per comment basis and this 
makes it nearly impossible to change these settings without messing up 
old comments.  for example, if you start your site by allowing html in 
comments, then disable html and enable autoformatting then it's going to 
apply that to all your old comments which don't need it and make things 
look pretty ugly :/  same situation applies in reverse as well.  if you 
start by using autoformatting and then disable it then all your old 
comments are going to show up unformatted.

2. these settings are applied in velocity templates and macros and are 
not enforced in any way, so it would be easy for a savvy user to 
circumvent these settings if they really wanted too.  it is more 
appropriate if we apply any plugins or transformations inside the code 
so that we aren't relying on logic in macros/templates.

So I think we need to fix these things first, then I'd like to add a 
more flexible way of applying custom formatting options to comments.  #2 
is very easy to fix since we can just put that logic in the comment 
wrapper class.  fixing #1 is more tricky.

For #1 I would like to introduce the concept of pluggable 
CommentFormatter classes.  Basically, these would work exactly like 
CommentValidators except that instead of just checking the comment for 
some kind of illegal content they would actual be performing 
transformations on the content, much like our weblog entry plugins do 
now.  So site admins would be able to define what comment formatters are 
available to the application and enable/disable them globally to make 
them apply to comments.  We would track what formatters to apply to a 
given comment by adding a new column in the db for 'formatters' and 
listing which formatters to apply to each comment.

We would start by defining formatters to replace our current settings 
for autoformatting and escaping html and via a simple upgrade we can set 
these formatters to all existing comments if those settings are enabled.

Then moving forward it's very easy to add formatters to the application 
and enable/disable them whenever the admin wants.  Since the formatters 
are tracked with the comments it is possible to disable a formatter from 
being used on new comments but have it still available to apply to old 
comments.  This also provides a very simple way for site admins to 
create custom formatters to suit whatever their needs may be, for 
example you could do a wiki syntax comment formatter if you wanted.

Thoughts?

-- Allen


Re: Problems upgrading JRoller to Roller 3.1

Posted by Matthew Schmidt <ma...@dzone.com>.
Replies inline:
>
>
> you'll have to elaborate more on exactly what problems you are seeing? 
> if the old models and macros aren't working at all then the pages 
> would just have lots of $etc variables and #showBlah() calls all over 
> the place.
>
> if the legacy models and macros are working but things just aren't 
> looking formatted properly then it could just be a problem with 
> changes made to the html/css between versions.
>
> if you set ...
>
> rendering.legacyModels.enabled=true
>
> ... in your config file then things should be working, at least for 
> the most part.
>
The problem we're seeing is that the main macro to show the weblog 
entries:  #showWeblogEntries("_day" 15) doesn't appear to be outputting 
anything.  Most of the items in the sidebar such as the bookmark and 
category listings appear to be working. 
> no, the upload scheme you mentioned is not in the current code, so 
> that's a customization you'll have to handle on your own.  if i 
> understand your comment you are doing this ...
>
> ${file.uploads}/a/aaa
> ${file.uploads}/a/arrr
> ${file.uploads}/b/blah
>
> an easier way to keep that working in newer versions would be to 
> subclass or modify the FileManagerImpl and just override the necessary 
> code so that it appends that letter before the handle name.  shouldn't 
> be that hard to do.
>
> -- Allen
>
> \
Thanks, we'll put that code in.  We won't have to modify any macros 
correct?  We're really trying to keep our fork type modifications to a 
minimum so that we can maybe have some chance to keeping the code up to 
date :)

-Matt

Re: Problems upgrading JRoller to Roller 3.1

Posted by Allen Gilliland <al...@sun.com>.

Matthew Schmidt wrote:
> Hi guys.  We're in the middle of upgrading JRoller to Roller 3.1 (and 
> then on to 4.0) but we're having a bit of a problem with our themes.  We 
> have turned on the legacy theme support, but it seems that all of these 
> call macros which apparently don't work anymore.  New blogs that are 
> created in the system seem to work fine as they use the new macros, but 
> the old blogs just don't work.  In addition, most people have customized 
> their blogs in one way or another, making it very difficult for us to 
> just bulk fix these.  Is there a way to make our old themes completely 
> work?


you'll have to elaborate more on exactly what problems you are seeing? 
if the old models and macros aren't working at all then the pages would 
just have lots of $etc variables and #showBlah() calls all over the place.

if the legacy models and macros are working but things just aren't 
looking formatted properly then it could just be a problem with changes 
made to the html/css between versions.

if you set ...

rendering.legacyModels.enabled=true

... in your config file then things should be working, at least for the 
most part.


> Separately I have two other issues.  First, Are you guys doing a 301 
> redirect on the /page to /blogname?  If not, I would recommend it as 
> Google will likely not be transferring the pagerank for these blogs if 
> not.  Secondly, at JRoller, we have a custom file upload scheme that 
> stores each weblog's resources in subdirectories sorted by the first 
> letter of their handle.  Was this code ever rolled into the core?  It 
> was code + an update to an image macro that found the proper location.


yes, the old urls are all 301 permanently moved redirects.

no, the upload scheme you mentioned is not in the current code, so 
that's a customization you'll have to handle on your own.  if i 
understand your comment you are doing this ...

${file.uploads}/a/aaa
${file.uploads}/a/arrr
${file.uploads}/b/blah

an easier way to keep that working in newer versions would be to 
subclass or modify the FileManagerImpl and just override the necessary 
code so that it appends that letter before the handle name.  shouldn't 
be that hard to do.

-- Allen


> 
> Thanks,
> Matt
> 
> Dave wrote:
>> On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>>> I think that's fine, but it's extra effort to do that and I don't have
>>> time for it right now.  I think just sticking with a simple html or
>>> plain text option will be fine for now, and there won't be any issues
>>> with expanding on that down the road and allowing admins to choose their
>>> own content-type if they want.
>>>
>>> So for right now I'm just going to leave things as is and provide just a
>>> single "Allow HTML in comments?" checkbox, which is effectively the same
>>> functionality we have right now.  If HTML is allowed then the
>>> content-type for incoming comments will be set to text/html, otherwise
>>> it will be set to text/plain.
>>>
>>> Cool?
>>
>> Yep. Sounds good.
>>
>> Thanks,
>> Dave

Problems upgrading JRoller to Roller 3.1

Posted by Matthew Schmidt <ma...@dzone.com>.
Hi guys.  We're in the middle of upgrading JRoller to Roller 3.1 (and 
then on to 4.0) but we're having a bit of a problem with our themes.  We 
have turned on the legacy theme support, but it seems that all of these 
call macros which apparently don't work anymore.  New blogs that are 
created in the system seem to work fine as they use the new macros, but 
the old blogs just don't work.  In addition, most people have customized 
their blogs in one way or another, making it very difficult for us to 
just bulk fix these.  Is there a way to make our old themes completely 
work? 

Separately I have two other issues.  First, Are you guys doing a 301 
redirect on the /page to /blogname?  If not, I would recommend it as 
Google will likely not be transferring the pagerank for these blogs if 
not.  Secondly, at JRoller, we have a custom file upload scheme that 
stores each weblog's resources in subdirectories sorted by the first 
letter of their handle.  Was this code ever rolled into the core?  It 
was code + an update to an image macro that found the proper location.

Thanks,
Matt

Dave wrote:
> On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>> I think that's fine, but it's extra effort to do that and I don't have
>> time for it right now.  I think just sticking with a simple html or
>> plain text option will be fine for now, and there won't be any issues
>> with expanding on that down the road and allowing admins to choose their
>> own content-type if they want.
>>
>> So for right now I'm just going to leave things as is and provide just a
>> single "Allow HTML in comments?" checkbox, which is effectively the same
>> functionality we have right now.  If HTML is allowed then the
>> content-type for incoming comments will be set to text/html, otherwise
>> it will be set to text/plain.
>>
>> Cool?
>
> Yep. Sounds good.
>
> Thanks,
> Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Dave <sn...@gmail.com>.
On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
> I think that's fine, but it's extra effort to do that and I don't have
> time for it right now.  I think just sticking with a simple html or
> plain text option will be fine for now, and there won't be any issues
> with expanding on that down the road and allowing admins to choose their
> own content-type if they want.
>
> So for right now I'm just going to leave things as is and provide just a
> single "Allow HTML in comments?" checkbox, which is effectively the same
> functionality we have right now.  If HTML is allowed then the
> content-type for incoming comments will be set to text/html, otherwise
> it will be set to text/plain.
>
> Cool?

Yep. Sounds good.

Thanks,
Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Allen Gilliland <al...@sun.com>.

Matt Raible wrote:
> Is it possible that a comment formatting plugin can be applied to a 
> specific
> subset of comments? For example, I'd like to keep the current "allow all
> HTML" for older comments, but do "allow all HTML + auto-line-break" for new
> comments.


yes.  as part of the 4.0 upgrade process we are changing the data model 
so that comment plugins are tracked on a per comment basis, which means 
that you can enabled/disable plugins and that will only affect new 
comments.  so basically, it'll be setup so that when a comment is posted 
it will apply whatever plugins are configured at that time.

you'll have to test out the use of the autoformat plugin combined with 
allowing full html.  the autoformat is more designed for use when html 
is disabled.  but yes, you will be able to do that.

-- Allen


> Matt
> 
> On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>>
>>
>>
>> Matt Raible wrote:
>> > I've been skimming this thread, but not thoroughly reading each 
>> message.
>> > One
>> > of the things I'd like to see (that doesn't happen now) is to allow 
>> full
>> > HTML - no ifs ands or buts. Meaning that I can add JavaScript or any
>> HTML
>> > tags I'd like to a comment. In addition, it would be nice to add some
>> > comment-post pre-processors that allow replacing \n with <br/> before
>> the
>> > entry gets put in the database.
>> > The first feature is for me. I seem to comment as much on my blog as
>> others
>> > do. It's frustrating for me when my HTML gets escaped. Furthermore,
>> since I
>> > get almost instant notification of comments - I know when to delete
>> those
>> > that have XSS text.
>>
>>
>> yep, those are both things that we are trying to solve for with this
>> work.  i've already introduced the concept of comment plugins into the
>> trunk and those basically work like our entry plugins, they get applied
>> to comments as they are rendered and allow you to transform the text any
>> way you want.  we have 2 plugins migrated from old functionality, one
>> which does the autoformatting as you described (text paragraphing to
>> html paragraphing) and one which restricts the content to only a subset
>> of html.
>>
>> i'm not sure we'll have a way to configure things so that certain users
>> have different restrictions on what they can post in comments though.
>>
>>
>> >
>> > The 2nd feature is for readers. A lot of people comment on my blog w/o
>> > doing
>> > "preview" first and end up with comments that run together - even 
>> though
>> > there's line breaks in their entry. I realize I can change the macro to
>> > render the output different, but that affects all comments - whereas I
>> just
>> > want to affect new ones. I regularly go into my database and format
>> > comments
>> > by adding <br/><br/> tags, but it'd be nice not to have to do that.
>>
>>
>> yep, that's in the autoformatting plugin now.  so if you want to do the
>> safe thing you will be able to disable html in comments and then apply
>> the autoformat plugin and it will format things appropriately.
>>
>> -- Allen
>>
>>
>> >
>> > Matt
>> >
>> > On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>> >>
>> >>
>> >>
>> >> Dave wrote:
>> >> > On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
>> >> >> I personally like the idea of rejecting comments with html in them
>> >> when
>> >> >> the admin has disabled html in comments because it seems like the
>> >> safest
>> >> >>   and most consistent thing to do.  My feeling is that it's more
>> >> likely
>> >> >> that users will enter in a comment and try to use html than it is
>> that
>> >> >> users will be using the < and > characters in a non-html
>> fashion.  So
>> >> >> rejecting the comment and telling them not to use those characters
>> >> will
>> >> >> be appropriate and fitting most of the time.
>> >> >>
>> >> >> There certainly will be some cases where users aren't trying to put
>> in
>> >> >> html and their comment gets rejected, but my feeling is that those
>> >> cases
>> >> >> will be relatively few and with a decent error message they should
>> be
>> >> >> able to fix it without much of a problem.
>> >> >
>> >> > I don't like the idea of rejecting a comment because it has a 
>> bracket
>> >> > that definitely doesn't seem right. If HTML Is disabled then we
>> should
>> >> > treat the comment as plain text. An angle bracket is allowed in 
>> plain
>> >> > text and is not considered HTML.
>> >>
>> >>
>> >> I don't like it either, but I'm not sure that will happen all that
>> >> often.  What we really need is a better parser so that we could
>> actually
>> >> distinguish between real html and casual use of the < and > 
>> characters.
>> >>
>> >>
>> >> >
>> >> > I think the most consistent thing to do is this:
>> >> >
>> >> > If HTML is disabled in comments that means that Roller consider the
>> >> > comment content to be of type text/plain. Add a content-type 
>> field to
>> >> > the comment table so we can save that. And whenever Roller displays
>> >> > content of type text/plain in HTML or XML it must be escaped.
>> >> >
>> >> > If HTML is enabled then consider the content type to be text/html.
>> >> > When we display HTML in a comment we always do the HTML
>> >> > subsetting/white-listing bit. I don't think we should ever display
>> raw
>> >> > comment HTML.
>> >>
>> >>
>> >> Okay, that sounds like a pretty good plan to me.  However, I'm not 
>> sure
>> >> I agree about forcing the HTML subsetting/white-listing part.  While I
>> >> agree that it is typically dangerous and undesirable, we have to
>> >> remember that many times blog applications are used in trusted
>> >> environments now, such as corporate intranets, and in those cases it
>> >> would be entirely valid to just allow any html in comments.
>> >>
>> >> I guess I also think that it seems silly to not make that an option,
>> >> since undoubtably someone will want to do things that way.
>> >>
>> >> So I will add one more column to the roller_comment table for
>> >> content-type and tweak the upgrade process so that it gets properly 
>> set
>> >> based on the previous value of the 'users.comments.escapehtml'
>> property.
>> >>   If that property was enabled then the content-type will be 
>> text/plain
>> >> and if not then it will be text/html.
>> >>
>> >>
>> >> >
>> >> > Going farther down that road, maybe we shouldn't have a "Allow HTML
>> in
>> >> > comments" property. Instead, have a "Set content-type to be used for
>> >> > new comments" and offer options Text, HTML and XHTML.
>> >> >
>> >> > This will also make it clear how comments should be treated when 
>> in a
>> >> > feed. See the Atom <content> elements type attribute here:
>> >> > http://tools.ietf.org/html/rfc4287#page-14
>> >>
>> >>
>> >> I think that's fine, but it's extra effort to do that and I don't have
>> >> time for it right now.  I think just sticking with a simple html or
>> >> plain text option will be fine for now, and there won't be any issues
>> >> with expanding on that down the road and allowing admins to choose
>> their
>> >> own content-type if they want.
>> >>
>> >> So for right now I'm just going to leave things as is and provide just
>> a
>> >> single "Allow HTML in comments?" checkbox, which is effectively the
>> same
>> >> functionality we have right now.  If HTML is allowed then the
>> >> content-type for incoming comments will be set to text/html, otherwise
>> >> it will be set to text/plain.
>> >>
>> >> Cool?
>> >>
>> >> -- Allen
>> >>
>> >>
>> >> >
>> >> >
>> >> > If we don't want to go down that road and add a content-type, I 
>> think
>> >> > we should stick with the practice of escaping comments that are not
>> >> > supposed to be HTML rather than rejecting any angle brackets or HTML
>> >> > tags. I don't think you have enough justification for making such a
>> >> > behavior change.
>> >> >
>> >> > - Dave
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > - Dave
>> >>
>> >
>> >
>> >
>>
> 
> 
> 

Re: comment formatting and escaping is a bit kludgy

Posted by Matt Raible <ma...@raibledesigns.com>.
Is it possible that a comment formatting plugin can be applied to a specific
subset of comments? For example, I'd like to keep the current "allow all
HTML" for older comments, but do "allow all HTML + auto-line-break" for new
comments.
Matt

On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>
>
>
> Matt Raible wrote:
> > I've been skimming this thread, but not thoroughly reading each message.
> > One
> > of the things I'd like to see (that doesn't happen now) is to allow full
> > HTML - no ifs ands or buts. Meaning that I can add JavaScript or any
> HTML
> > tags I'd like to a comment. In addition, it would be nice to add some
> > comment-post pre-processors that allow replacing \n with <br/> before
> the
> > entry gets put in the database.
> > The first feature is for me. I seem to comment as much on my blog as
> others
> > do. It's frustrating for me when my HTML gets escaped. Furthermore,
> since I
> > get almost instant notification of comments - I know when to delete
> those
> > that have XSS text.
>
>
> yep, those are both things that we are trying to solve for with this
> work.  i've already introduced the concept of comment plugins into the
> trunk and those basically work like our entry plugins, they get applied
> to comments as they are rendered and allow you to transform the text any
> way you want.  we have 2 plugins migrated from old functionality, one
> which does the autoformatting as you described (text paragraphing to
> html paragraphing) and one which restricts the content to only a subset
> of html.
>
> i'm not sure we'll have a way to configure things so that certain users
> have different restrictions on what they can post in comments though.
>
>
> >
> > The 2nd feature is for readers. A lot of people comment on my blog w/o
> > doing
> > "preview" first and end up with comments that run together - even though
> > there's line breaks in their entry. I realize I can change the macro to
> > render the output different, but that affects all comments - whereas I
> just
> > want to affect new ones. I regularly go into my database and format
> > comments
> > by adding <br/><br/> tags, but it'd be nice not to have to do that.
>
>
> yep, that's in the autoformatting plugin now.  so if you want to do the
> safe thing you will be able to disable html in comments and then apply
> the autoformat plugin and it will format things appropriately.
>
> -- Allen
>
>
> >
> > Matt
> >
> > On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
> >>
> >>
> >>
> >> Dave wrote:
> >> > On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
> >> >> I personally like the idea of rejecting comments with html in them
> >> when
> >> >> the admin has disabled html in comments because it seems like the
> >> safest
> >> >>   and most consistent thing to do.  My feeling is that it's more
> >> likely
> >> >> that users will enter in a comment and try to use html than it is
> that
> >> >> users will be using the < and > characters in a non-html
> fashion.  So
> >> >> rejecting the comment and telling them not to use those characters
> >> will
> >> >> be appropriate and fitting most of the time.
> >> >>
> >> >> There certainly will be some cases where users aren't trying to put
> in
> >> >> html and their comment gets rejected, but my feeling is that those
> >> cases
> >> >> will be relatively few and with a decent error message they should
> be
> >> >> able to fix it without much of a problem.
> >> >
> >> > I don't like the idea of rejecting a comment because it has a bracket
> >> > that definitely doesn't seem right. If HTML Is disabled then we
> should
> >> > treat the comment as plain text. An angle bracket is allowed in plain
> >> > text and is not considered HTML.
> >>
> >>
> >> I don't like it either, but I'm not sure that will happen all that
> >> often.  What we really need is a better parser so that we could
> actually
> >> distinguish between real html and casual use of the < and > characters.
> >>
> >>
> >> >
> >> > I think the most consistent thing to do is this:
> >> >
> >> > If HTML is disabled in comments that means that Roller consider the
> >> > comment content to be of type text/plain. Add a content-type field to
> >> > the comment table so we can save that. And whenever Roller displays
> >> > content of type text/plain in HTML or XML it must be escaped.
> >> >
> >> > If HTML is enabled then consider the content type to be text/html.
> >> > When we display HTML in a comment we always do the HTML
> >> > subsetting/white-listing bit. I don't think we should ever display
> raw
> >> > comment HTML.
> >>
> >>
> >> Okay, that sounds like a pretty good plan to me.  However, I'm not sure
> >> I agree about forcing the HTML subsetting/white-listing part.  While I
> >> agree that it is typically dangerous and undesirable, we have to
> >> remember that many times blog applications are used in trusted
> >> environments now, such as corporate intranets, and in those cases it
> >> would be entirely valid to just allow any html in comments.
> >>
> >> I guess I also think that it seems silly to not make that an option,
> >> since undoubtably someone will want to do things that way.
> >>
> >> So I will add one more column to the roller_comment table for
> >> content-type and tweak the upgrade process so that it gets properly set
> >> based on the previous value of the 'users.comments.escapehtml'
> property.
> >>   If that property was enabled then the content-type will be text/plain
> >> and if not then it will be text/html.
> >>
> >>
> >> >
> >> > Going farther down that road, maybe we shouldn't have a "Allow HTML
> in
> >> > comments" property. Instead, have a "Set content-type to be used for
> >> > new comments" and offer options Text, HTML and XHTML.
> >> >
> >> > This will also make it clear how comments should be treated when in a
> >> > feed. See the Atom <content> elements type attribute here:
> >> > http://tools.ietf.org/html/rfc4287#page-14
> >>
> >>
> >> I think that's fine, but it's extra effort to do that and I don't have
> >> time for it right now.  I think just sticking with a simple html or
> >> plain text option will be fine for now, and there won't be any issues
> >> with expanding on that down the road and allowing admins to choose
> their
> >> own content-type if they want.
> >>
> >> So for right now I'm just going to leave things as is and provide just
> a
> >> single "Allow HTML in comments?" checkbox, which is effectively the
> same
> >> functionality we have right now.  If HTML is allowed then the
> >> content-type for incoming comments will be set to text/html, otherwise
> >> it will be set to text/plain.
> >>
> >> Cool?
> >>
> >> -- Allen
> >>
> >>
> >> >
> >> >
> >> > If we don't want to go down that road and add a content-type, I think
> >> > we should stick with the practice of escaping comments that are not
> >> > supposed to be HTML rather than rejecting any angle brackets or HTML
> >> > tags. I don't think you have enough justification for making such a
> >> > behavior change.
> >> >
> >> > - Dave
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > - Dave
> >>
> >
> >
> >
>



-- 
http://raibledesigns.com

Re: comment formatting and escaping is a bit kludgy

Posted by Allen Gilliland <al...@sun.com>.

Matt Raible wrote:
> I've been skimming this thread, but not thoroughly reading each message. 
> One
> of the things I'd like to see (that doesn't happen now) is to allow full
> HTML - no ifs ands or buts. Meaning that I can add JavaScript or any HTML
> tags I'd like to a comment. In addition, it would be nice to add some
> comment-post pre-processors that allow replacing \n with <br/> before the
> entry gets put in the database.
> The first feature is for me. I seem to comment as much on my blog as others
> do. It's frustrating for me when my HTML gets escaped. Furthermore, since I
> get almost instant notification of comments - I know when to delete those
> that have XSS text.


yep, those are both things that we are trying to solve for with this 
work.  i've already introduced the concept of comment plugins into the 
trunk and those basically work like our entry plugins, they get applied 
to comments as they are rendered and allow you to transform the text any 
way you want.  we have 2 plugins migrated from old functionality, one 
which does the autoformatting as you described (text paragraphing to 
html paragraphing) and one which restricts the content to only a subset 
of html.

i'm not sure we'll have a way to configure things so that certain users 
have different restrictions on what they can post in comments though.


> 
> The 2nd feature is for readers. A lot of people comment on my blog w/o 
> doing
> "preview" first and end up with comments that run together - even though
> there's line breaks in their entry. I realize I can change the macro to
> render the output different, but that affects all comments - whereas I just
> want to affect new ones. I regularly go into my database and format 
> comments
> by adding <br/><br/> tags, but it'd be nice not to have to do that.


yep, that's in the autoformatting plugin now.  so if you want to do the 
safe thing you will be able to disable html in comments and then apply 
the autoformat plugin and it will format things appropriately.

-- Allen


> 
> Matt
> 
> On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>>
>>
>>
>> Dave wrote:
>> > On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
>> >> I personally like the idea of rejecting comments with html in them 
>> when
>> >> the admin has disabled html in comments because it seems like the
>> safest
>> >>   and most consistent thing to do.  My feeling is that it's more 
>> likely
>> >> that users will enter in a comment and try to use html than it is that
>> >> users will be using the < and > characters in a non-html fashion.  So
>> >> rejecting the comment and telling them not to use those characters 
>> will
>> >> be appropriate and fitting most of the time.
>> >>
>> >> There certainly will be some cases where users aren't trying to put in
>> >> html and their comment gets rejected, but my feeling is that those
>> cases
>> >> will be relatively few and with a decent error message they should be
>> >> able to fix it without much of a problem.
>> >
>> > I don't like the idea of rejecting a comment because it has a bracket
>> > that definitely doesn't seem right. If HTML Is disabled then we should
>> > treat the comment as plain text. An angle bracket is allowed in plain
>> > text and is not considered HTML.
>>
>>
>> I don't like it either, but I'm not sure that will happen all that
>> often.  What we really need is a better parser so that we could actually
>> distinguish between real html and casual use of the < and > characters.
>>
>>
>> >
>> > I think the most consistent thing to do is this:
>> >
>> > If HTML is disabled in comments that means that Roller consider the
>> > comment content to be of type text/plain. Add a content-type field to
>> > the comment table so we can save that. And whenever Roller displays
>> > content of type text/plain in HTML or XML it must be escaped.
>> >
>> > If HTML is enabled then consider the content type to be text/html.
>> > When we display HTML in a comment we always do the HTML
>> > subsetting/white-listing bit. I don't think we should ever display raw
>> > comment HTML.
>>
>>
>> Okay, that sounds like a pretty good plan to me.  However, I'm not sure
>> I agree about forcing the HTML subsetting/white-listing part.  While I
>> agree that it is typically dangerous and undesirable, we have to
>> remember that many times blog applications are used in trusted
>> environments now, such as corporate intranets, and in those cases it
>> would be entirely valid to just allow any html in comments.
>>
>> I guess I also think that it seems silly to not make that an option,
>> since undoubtably someone will want to do things that way.
>>
>> So I will add one more column to the roller_comment table for
>> content-type and tweak the upgrade process so that it gets properly set
>> based on the previous value of the 'users.comments.escapehtml' property.
>>   If that property was enabled then the content-type will be text/plain
>> and if not then it will be text/html.
>>
>>
>> >
>> > Going farther down that road, maybe we shouldn't have a "Allow HTML in
>> > comments" property. Instead, have a "Set content-type to be used for
>> > new comments" and offer options Text, HTML and XHTML.
>> >
>> > This will also make it clear how comments should be treated when in a
>> > feed. See the Atom <content> elements type attribute here:
>> > http://tools.ietf.org/html/rfc4287#page-14
>>
>>
>> I think that's fine, but it's extra effort to do that and I don't have
>> time for it right now.  I think just sticking with a simple html or
>> plain text option will be fine for now, and there won't be any issues
>> with expanding on that down the road and allowing admins to choose their
>> own content-type if they want.
>>
>> So for right now I'm just going to leave things as is and provide just a
>> single "Allow HTML in comments?" checkbox, which is effectively the same
>> functionality we have right now.  If HTML is allowed then the
>> content-type for incoming comments will be set to text/html, otherwise
>> it will be set to text/plain.
>>
>> Cool?
>>
>> -- Allen
>>
>>
>> >
>> >
>> > If we don't want to go down that road and add a content-type, I think
>> > we should stick with the practice of escaping comments that are not
>> > supposed to be HTML rather than rejecting any angle brackets or HTML
>> > tags. I don't think you have enough justification for making such a
>> > behavior change.
>> >
>> > - Dave
>> >
>> >
>> >
>> >
>> >
>> > - Dave
>>
> 
> 
> 

Re: comment formatting and escaping is a bit kludgy

Posted by Matt Raible <ma...@raibledesigns.com>.
I've been skimming this thread, but not thoroughly reading each message. One
of the things I'd like to see (that doesn't happen now) is to allow full
HTML - no ifs ands or buts. Meaning that I can add JavaScript or any HTML
tags I'd like to a comment. In addition, it would be nice to add some
comment-post pre-processors that allow replacing \n with <br/> before the
entry gets put in the database.
The first feature is for me. I seem to comment as much on my blog as others
do. It's frustrating for me when my HTML gets escaped. Furthermore, since I
get almost instant notification of comments - I know when to delete those
that have XSS text.

The 2nd feature is for readers. A lot of people comment on my blog w/o doing
"preview" first and end up with comments that run together - even though
there's line breaks in their entry. I realize I can change the macro to
render the output different, but that affects all comments - whereas I just
want to affect new ones. I regularly go into my database and format comments
by adding <br/><br/> tags, but it'd be nice not to have to do that.

Matt

On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>
>
>
> Dave wrote:
> > On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
> >> I personally like the idea of rejecting comments with html in them when
> >> the admin has disabled html in comments because it seems like the
> safest
> >>   and most consistent thing to do.  My feeling is that it's more likely
> >> that users will enter in a comment and try to use html than it is that
> >> users will be using the < and > characters in a non-html fashion.  So
> >> rejecting the comment and telling them not to use those characters will
> >> be appropriate and fitting most of the time.
> >>
> >> There certainly will be some cases where users aren't trying to put in
> >> html and their comment gets rejected, but my feeling is that those
> cases
> >> will be relatively few and with a decent error message they should be
> >> able to fix it without much of a problem.
> >
> > I don't like the idea of rejecting a comment because it has a bracket
> > that definitely doesn't seem right. If HTML Is disabled then we should
> > treat the comment as plain text. An angle bracket is allowed in plain
> > text and is not considered HTML.
>
>
> I don't like it either, but I'm not sure that will happen all that
> often.  What we really need is a better parser so that we could actually
> distinguish between real html and casual use of the < and > characters.
>
>
> >
> > I think the most consistent thing to do is this:
> >
> > If HTML is disabled in comments that means that Roller consider the
> > comment content to be of type text/plain. Add a content-type field to
> > the comment table so we can save that. And whenever Roller displays
> > content of type text/plain in HTML or XML it must be escaped.
> >
> > If HTML is enabled then consider the content type to be text/html.
> > When we display HTML in a comment we always do the HTML
> > subsetting/white-listing bit. I don't think we should ever display raw
> > comment HTML.
>
>
> Okay, that sounds like a pretty good plan to me.  However, I'm not sure
> I agree about forcing the HTML subsetting/white-listing part.  While I
> agree that it is typically dangerous and undesirable, we have to
> remember that many times blog applications are used in trusted
> environments now, such as corporate intranets, and in those cases it
> would be entirely valid to just allow any html in comments.
>
> I guess I also think that it seems silly to not make that an option,
> since undoubtably someone will want to do things that way.
>
> So I will add one more column to the roller_comment table for
> content-type and tweak the upgrade process so that it gets properly set
> based on the previous value of the 'users.comments.escapehtml' property.
>   If that property was enabled then the content-type will be text/plain
> and if not then it will be text/html.
>
>
> >
> > Going farther down that road, maybe we shouldn't have a "Allow HTML in
> > comments" property. Instead, have a "Set content-type to be used for
> > new comments" and offer options Text, HTML and XHTML.
> >
> > This will also make it clear how comments should be treated when in a
> > feed. See the Atom <content> elements type attribute here:
> > http://tools.ietf.org/html/rfc4287#page-14
>
>
> I think that's fine, but it's extra effort to do that and I don't have
> time for it right now.  I think just sticking with a simple html or
> plain text option will be fine for now, and there won't be any issues
> with expanding on that down the road and allowing admins to choose their
> own content-type if they want.
>
> So for right now I'm just going to leave things as is and provide just a
> single "Allow HTML in comments?" checkbox, which is effectively the same
> functionality we have right now.  If HTML is allowed then the
> content-type for incoming comments will be set to text/html, otherwise
> it will be set to text/plain.
>
> Cool?
>
> -- Allen
>
>
> >
> >
> > If we don't want to go down that road and add a content-type, I think
> > we should stick with the practice of escaping comments that are not
> > supposed to be HTML rather than rejecting any angle brackets or HTML
> > tags. I don't think you have enough justification for making such a
> > behavior change.
> >
> > - Dave
> >
> >
> >
> >
> >
> > - Dave
>



-- 
http://raibledesigns.com

Re: comment formatting and escaping is a bit kludgy

Posted by Allen Gilliland <al...@sun.com>.

Dave wrote:
> On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
>> I personally like the idea of rejecting comments with html in them when
>> the admin has disabled html in comments because it seems like the safest
>>   and most consistent thing to do.  My feeling is that it's more likely
>> that users will enter in a comment and try to use html than it is that
>> users will be using the < and > characters in a non-html fashion.  So
>> rejecting the comment and telling them not to use those characters will
>> be appropriate and fitting most of the time.
>>
>> There certainly will be some cases where users aren't trying to put in
>> html and their comment gets rejected, but my feeling is that those cases
>> will be relatively few and with a decent error message they should be
>> able to fix it without much of a problem.
> 
> I don't like the idea of rejecting a comment because it has a bracket
> that definitely doesn't seem right. If HTML Is disabled then we should
> treat the comment as plain text. An angle bracket is allowed in plain
> text and is not considered HTML.


I don't like it either, but I'm not sure that will happen all that 
often.  What we really need is a better parser so that we could actually 
distinguish between real html and casual use of the < and > characters.


> 
> I think the most consistent thing to do is this:
> 
> If HTML is disabled in comments that means that Roller consider the
> comment content to be of type text/plain. Add a content-type field to
> the comment table so we can save that. And whenever Roller displays
> content of type text/plain in HTML or XML it must be escaped.
> 
> If HTML is enabled then consider the content type to be text/html.
> When we display HTML in a comment we always do the HTML
> subsetting/white-listing bit. I don't think we should ever display raw
> comment HTML.


Okay, that sounds like a pretty good plan to me.  However, I'm not sure 
I agree about forcing the HTML subsetting/white-listing part.  While I 
agree that it is typically dangerous and undesirable, we have to 
remember that many times blog applications are used in trusted 
environments now, such as corporate intranets, and in those cases it 
would be entirely valid to just allow any html in comments.

I guess I also think that it seems silly to not make that an option, 
since undoubtably someone will want to do things that way.

So I will add one more column to the roller_comment table for 
content-type and tweak the upgrade process so that it gets properly set 
based on the previous value of the 'users.comments.escapehtml' property. 
  If that property was enabled then the content-type will be text/plain 
and if not then it will be text/html.


> 
> Going farther down that road, maybe we shouldn't have a "Allow HTML in
> comments" property. Instead, have a "Set content-type to be used for
> new comments" and offer options Text, HTML and XHTML.
> 
> This will also make it clear how comments should be treated when in a
> feed. See the Atom <content> elements type attribute here:
> http://tools.ietf.org/html/rfc4287#page-14


I think that's fine, but it's extra effort to do that and I don't have 
time for it right now.  I think just sticking with a simple html or 
plain text option will be fine for now, and there won't be any issues 
with expanding on that down the road and allowing admins to choose their 
own content-type if they want.

So for right now I'm just going to leave things as is and provide just a 
single "Allow HTML in comments?" checkbox, which is effectively the same 
functionality we have right now.  If HTML is allowed then the 
content-type for incoming comments will be set to text/html, otherwise 
it will be set to text/plain.

Cool?

-- Allen


> 
> 
> If we don't want to go down that road and add a content-type, I think
> we should stick with the practice of escaping comments that are not
> supposed to be HTML rather than rejecting any angle brackets or HTML
> tags. I don't think you have enough justification for making such a
> behavior change.
> 
> - Dave
> 
> 
> 
> 
> 
> - Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Dave <sn...@gmail.com>.
On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
> I personally like the idea of rejecting comments with html in them when
> the admin has disabled html in comments because it seems like the safest
>   and most consistent thing to do.  My feeling is that it's more likely
> that users will enter in a comment and try to use html than it is that
> users will be using the < and > characters in a non-html fashion.  So
> rejecting the comment and telling them not to use those characters will
> be appropriate and fitting most of the time.
>
> There certainly will be some cases where users aren't trying to put in
> html and their comment gets rejected, but my feeling is that those cases
> will be relatively few and with a decent error message they should be
> able to fix it without much of a problem.

I don't like the idea of rejecting a comment because it has a bracket
that definitely doesn't seem right. If HTML Is disabled then we should
treat the comment as plain text. An angle bracket is allowed in plain
text and is not considered HTML.

I think the most consistent thing to do is this:

If HTML is disabled in comments that means that Roller consider the
comment content to be of type text/plain. Add a content-type field to
the comment table so we can save that. And whenever Roller displays
content of type text/plain in HTML or XML it must be escaped.

If HTML is enabled then consider the content type to be text/html.
When we display HTML in a comment we always do the HTML
subsetting/white-listing bit. I don't think we should ever display raw
comment HTML.

Going farther down that road, maybe we shouldn't have a "Allow HTML in
comments" property. Instead, have a "Set content-type to be used for
new comments" and offer options Text, HTML and XHTML.

This will also make it clear how comments should be treated when in a
feed. See the Atom <content> elements type attribute here:
http://tools.ietf.org/html/rfc4287#page-14


If we don't want to go down that road and add a content-type, I think
we should stick with the practice of escaping comments that are not
supposed to be HTML rather than rejecting any angle brackets or HTML
tags. I don't think you have enough justification for making such a
behavior change.

- Dave





- Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Allen Gilliland <al...@sun.com>.

Dave wrote:
> On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
>> Okay, well for some reason I am not getting escaped html even though the
>> comment wrapper is escaping html from the comment content before
>> returning it, so that is one thing that needs to be fixed.
>>
>> The other problem is that there isn't (and hasn't been) a way to get to
>> option #3 where html is fully allowed.  This is what is used to render
>> comment content in macros in 3.1 ...
>>
>> #if($config.commentEscapeHtml)
>>    #set($content = $utils.escapeHTML($comment.content))
>> #else
>>    #set($content = $utils.transformToHTMLSubset($utils.escapeHTML($
>> comment.content)))
>> #end
> 
> Hmmm... that must have slipped in during XSS fixes. I'd better check
> 3.1.1 for that issue.


Yeah, that's definitely possible.


> 
> 
>> ... and that basically means your only options are #1 (no html) or #2
>> (partial html), which is not really what we are aiming for i don't think.
>>
>> Then, even if html escaping is working properly for #1, i don't think
>> that's the right way of providing that functionality.  If you don't want
>> people putting html in comments then we should be rejecting those
>> comments, not just leaving the html there and escaping it.  The problem
>> with just escaping the html is that 1) it's ugly and if someone puts
>> lots of html in their comment and it gets escaped then it makes the
>> comment almost unreadable and 2) it's dangerous because technically if
>> at some later point in time html escaping isn't applied anymore then
>> potentially dangerous html in the comment can become active again.  So
>> it makes far more sense to me that if the admin wants to prevent html in
>> comments then we should reject those comments with an error message.
>>
>> So, getting back to my original suggestions again I think we can revise
>> a couple things.
>>
>> 1. I think we still need a new runtime property for "enable html in
>> comments?"  This is kind of like a built in comment validator which if
>> enabled will reject comments with html in them.
> 
> Hmm... When HTML in comments is disabled I'm not sure if we should
> reject HTML, strip it or simply escape it as we have been doing.
> 
> What if somebody wants to write about the <h1> tag and wants the
> comment to display the string <h1> and not have it interpreted as
> HTML. If HTML is turned off, why should should we reject <h1>. Why not
> just treat it as text by escaping it at display time?
> 
> If we had a WYSIWYG comment editor then they could type <h1> and it
> would appear as <h1>, but since we don't they have to type &lt;h1&gt;
> which is bit of a pain. Then again, if somebody is writing about HTML
> elements then they're probably smart enough to know how to escape
> them.
> 
> Not sure what the right thing is to do here, but we need to be careful
> about making such a significant change in comment behavior.


I agree, there are a variety of options and the choice isn't obvious.

I personally like the idea of rejecting comments with html in them when 
the admin has disabled html in comments because it seems like the safest 
  and most consistent thing to do.  My feeling is that it's more likely 
that users will enter in a comment and try to use html than it is that 
users will be using the < and > characters in a non-html fashion.  So 
rejecting the comment and telling them not to use those characters will 
be appropriate and fitting most of the time.

There certainly will be some cases where users aren't trying to put in 
html and their comment gets rejected, but my feeling is that those cases 
will be relatively few and with a decent error message they should be 
able to fix it without much of a problem.

A WYSIWYG comment editor would be nice, but ultimately that doesn't 
solve the problem because at the end of the day we still need to 
properly validate the comment on the server side.  So even if a WYSIWYG 
editor was escaping content for people there is still comment spam and 
other situations where someone could try to submit html in their comment 
and we would need to deal with it.

-- Allen


> 
> 
>> 2. For doing the html subset we can basically leave it how it is now as
>> a comment plugin and what it will do is the same thing it does now, just
>> escape all html then unescape the supported tags.  So the expected use
>> case here would be if the user checked the "enable html in comments?"
>> box to allow for html in their comments, then checked the HTML Subset
>> plugin which transforms the comment contents so that only certain html
>> tags are left unescaped.
> 
> Yes, that sounds good.
> 
> - Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Dave <sn...@gmail.com>.
On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
> Okay, well for some reason I am not getting escaped html even though the
> comment wrapper is escaping html from the comment content before
> returning it, so that is one thing that needs to be fixed.
>
> The other problem is that there isn't (and hasn't been) a way to get to
> option #3 where html is fully allowed.  This is what is used to render
> comment content in macros in 3.1 ...
>
> #if($config.commentEscapeHtml)
>    #set($content = $utils.escapeHTML($comment.content))
> #else
>    #set($content = $utils.transformToHTMLSubset($utils.escapeHTML($
> comment.content)))
> #end

Hmmm... that must have slipped in during XSS fixes. I'd better check
3.1.1 for that issue.


> ... and that basically means your only options are #1 (no html) or #2
> (partial html), which is not really what we are aiming for i don't think.
>
> Then, even if html escaping is working properly for #1, i don't think
> that's the right way of providing that functionality.  If you don't want
> people putting html in comments then we should be rejecting those
> comments, not just leaving the html there and escaping it.  The problem
> with just escaping the html is that 1) it's ugly and if someone puts
> lots of html in their comment and it gets escaped then it makes the
> comment almost unreadable and 2) it's dangerous because technically if
> at some later point in time html escaping isn't applied anymore then
> potentially dangerous html in the comment can become active again.  So
> it makes far more sense to me that if the admin wants to prevent html in
> comments then we should reject those comments with an error message.
>
> So, getting back to my original suggestions again I think we can revise
> a couple things.
>
> 1. I think we still need a new runtime property for "enable html in
> comments?"  This is kind of like a built in comment validator which if
> enabled will reject comments with html in them.

Hmm... When HTML in comments is disabled I'm not sure if we should
reject HTML, strip it or simply escape it as we have been doing.

What if somebody wants to write about the <h1> tag and wants the
comment to display the string <h1> and not have it interpreted as
HTML. If HTML is turned off, why should should we reject <h1>. Why not
just treat it as text by escaping it at display time?

If we had a WYSIWYG comment editor then they could type <h1> and it
would appear as <h1>, but since we don't they have to type &lt;h1&gt;
which is bit of a pain. Then again, if somebody is writing about HTML
elements then they're probably smart enough to know how to escape
them.

Not sure what the right thing is to do here, but we need to be careful
about making such a significant change in comment behavior.


> 2. For doing the html subset we can basically leave it how it is now as
> a comment plugin and what it will do is the same thing it does now, just
> escape all html then unescape the supported tags.  So the expected use
> case here would be if the user checked the "enable html in comments?"
> box to allow for html in their comments, then checked the HTML Subset
> plugin which transforms the comment contents so that only certain html
> tags are left unescaped.

Yes, that sounds good.

- Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Allen Gilliland <al...@sun.com>.
Okay, well for some reason I am not getting escaped html even though the 
comment wrapper is escaping html from the comment content before 
returning it, so that is one thing that needs to be fixed.

The other problem is that there isn't (and hasn't been) a way to get to 
option #3 where html is fully allowed.  This is what is used to render 
comment content in macros in 3.1 ...

#if($config.commentEscapeHtml)
   #set($content = $utils.escapeHTML($comment.content))
#else
   #set($content = $utils.transformToHTMLSubset($utils.escapeHTML($
comment.content)))
#end

... and that basically means your only options are #1 (no html) or #2 
(partial html), which is not really what we are aiming for i don't think.

Then, even if html escaping is working properly for #1, i don't think 
that's the right way of providing that functionality.  If you don't want 
people putting html in comments then we should be rejecting those 
comments, not just leaving the html there and escaping it.  The problem 
with just escaping the html is that 1) it's ugly and if someone puts 
lots of html in their comment and it gets escaped then it makes the 
comment almost unreadable and 2) it's dangerous because technically if 
at some later point in time html escaping isn't applied anymore then 
potentially dangerous html in the comment can become active again.  So 
it makes far more sense to me that if the admin wants to prevent html in 
comments then we should reject those comments with an error message.

So, getting back to my original suggestions again I think we can revise 
a couple things.

1. I think we still need a new runtime property for "enable html in 
comments?"  This is kind of like a built in comment validator which if 
enabled will reject comments with html in them.

2. For doing the html subset we can basically leave it how it is now as 
a comment plugin and what it will do is the same thing it does now, just 
escape all html then unescape the supported tags.  So the expected use 
case here would be if the user checked the "enable html in comments?" 
box to allow for html in their comments, then checked the HTML Subset 
plugin which transforms the comment contents so that only certain html 
tags are left unescaped.

Does that sound good?

-- Allen


Dave wrote:
> On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
>> Okay, this work is basically complete, but after doing a fair amount of
>> testing I am a little uncertain if our handling of html in comments is
>> working how we expected.  My understanding from looking at the UI is
>> that we are supposed to have these options ...
>>
>> 1. HTML disabled and not allowed in comments.  Triggered by checking
>> "escape comment html" on global config page.  This is also the property
>> which toggles the "HTML Syntax: Enabled/Disabled" text on the comment 
>> form.
> 
> What this means is that, if you enter an HTML tag it will be escaped
> on display and therefore tags you enter will be displayed as text and
> not interpreted by the browser as HTML tags.
> 
> 
>> 2. HTML subset available.  Only certain html tags work.  I'm not sure I
>> fully understand how this is supposed to be enforced, it looks like the
>> code tries to escape all html into &lt; &gt; syntax and then unescapes
>> certain tags.
> 
> In this case, we escape on display and then un-escape certain tags
> that we deem to be safe.
> 
> 
>> 3. HTML enabled.  Basically, anything goes.
> 
> 
> Yep, that's how they are supposed to work.
> 
> 
> 
>> Now, from looking at the code it doesn't look like we are properly
>> providing these options.  From my testing, when HTML is supposed to be
>> disabled it's not, and I could still put HTML in comments.  And to a
>> larger degree, it looks like the escapeHTML() method isn't really doing
>> much.  This basically means that the HTML subset option wasn't working
>> either because the comment was never being escaped in the first place,
>> so that option was just doing nothing.
> 
> Works for me. Escaping completely prevents HTML tags from being
> interpreted by the browser.
> 
> 
>> So can someone verify that my assumptions above are correct and that's
>> the actual functionality we are shooting for?
>>
>> I think that fixing #1 should be relatively easy and we can just display
>> a comment error message to users if they put html in a comment when they
>> aren't supposed to.
> 
> If we escape (as we are doing now) that works just fine.
> 
> 
> To handle the conditions for #1 and #3 I'd like to
>> just migrate the old "escape comment html" property to a new one called
>> "enable html in comments" and if it's enabled then we don't do anything
>> to check html in comments, if it's disabled then we reject any comments
>> with html in them and display and error message.
>>
>> #2 is a bit more tricky since you then need to be parsing the comment
>> and looking for specific html.  At the end of the day this one feels
>> more like a comment validator than a built-in option to me because
>> effectively all you are trying to do is check if an incoming comment
>> uses any html that you are not allowing, which is the purpose of our
>> comment validators.  So I think that one should be rewritten as a
>> comment validator.
> 
> What's the problem with the white-list based HTML subsetting we have now?
> 
> - Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Dave <sn...@gmail.com>.
On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
> Okay, this work is basically complete, but after doing a fair amount of
> testing I am a little uncertain if our handling of html in comments is
> working how we expected.  My understanding from looking at the UI is
> that we are supposed to have these options ...
>
> 1. HTML disabled and not allowed in comments.  Triggered by checking
> "escape comment html" on global config page.  This is also the property
> which toggles the "HTML Syntax: Enabled/Disabled" text on the comment form.

What this means is that, if you enter an HTML tag it will be escaped
on display and therefore tags you enter will be displayed as text and
not interpreted by the browser as HTML tags.


> 2. HTML subset available.  Only certain html tags work.  I'm not sure I
> fully understand how this is supposed to be enforced, it looks like the
> code tries to escape all html into &lt; &gt; syntax and then unescapes
> certain tags.

In this case, we escape on display and then un-escape certain tags
that we deem to be safe.


> 3. HTML enabled.  Basically, anything goes.


Yep, that's how they are supposed to work.



> Now, from looking at the code it doesn't look like we are properly
> providing these options.  From my testing, when HTML is supposed to be
> disabled it's not, and I could still put HTML in comments.  And to a
> larger degree, it looks like the escapeHTML() method isn't really doing
> much.  This basically means that the HTML subset option wasn't working
> either because the comment was never being escaped in the first place,
> so that option was just doing nothing.

Works for me. Escaping completely prevents HTML tags from being
interpreted by the browser.


> So can someone verify that my assumptions above are correct and that's
> the actual functionality we are shooting for?
>
> I think that fixing #1 should be relatively easy and we can just display
> a comment error message to users if they put html in a comment when they
> aren't supposed to.

If we escape (as we are doing now) that works just fine.


To handle the conditions for #1 and #3 I'd like to
> just migrate the old "escape comment html" property to a new one called
> "enable html in comments" and if it's enabled then we don't do anything
> to check html in comments, if it's disabled then we reject any comments
> with html in them and display and error message.
>
> #2 is a bit more tricky since you then need to be parsing the comment
> and looking for specific html.  At the end of the day this one feels
> more like a comment validator than a built-in option to me because
> effectively all you are trying to do is check if an incoming comment
> uses any html that you are not allowing, which is the purpose of our
> comment validators.  So I think that one should be rewritten as a
> comment validator.

What's the problem with the white-list based HTML subsetting we have now?

- Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Allen Gilliland <al...@sun.com>.
Okay, this work is basically complete, but after doing a fair amount of 
testing I am a little uncertain if our handling of html in comments is 
working how we expected.  My understanding from looking at the UI is 
that we are supposed to have these options ...

1. HTML disabled and not allowed in comments.  Triggered by checking 
"escape comment html" on global config page.  This is also the property 
which toggles the "HTML Syntax: Enabled/Disabled" text on the comment form.

2. HTML subset available.  Only certain html tags work.  I'm not sure I 
fully understand how this is supposed to be enforced, it looks like the 
code tries to escape all html into &lt; &gt; syntax and then unescapes 
certain tags.

3. HTML enabled.  Basically, anything goes.

Now, from looking at the code it doesn't look like we are properly 
providing these options.  From my testing, when HTML is supposed to be 
disabled it's not, and I could still put HTML in comments.  And to a 
larger degree, it looks like the escapeHTML() method isn't really doing 
much.  This basically means that the HTML subset option wasn't working 
either because the comment was never being escaped in the first place, 
so that option was just doing nothing.

So can someone verify that my assumptions above are correct and that's 
the actual functionality we are shooting for?

I think that fixing #1 should be relatively easy and we can just display 
a comment error message to users if they put html in a comment when they 
aren't supposed to.  To handle the conditions for #1 and #3 I'd like to 
just migrate the old "escape comment html" property to a new one called 
"enable html in comments" and if it's enabled then we don't do anything 
to check html in comments, if it's disabled then we reject any comments 
with html in them and display and error message.

#2 is a bit more tricky since you then need to be parsing the comment 
and looking for specific html.  At the end of the day this one feels 
more like a comment validator than a built-in option to me because 
effectively all you are trying to do is check if an incoming comment 
uses any html that you are not allowing, which is the purpose of our 
comment validators.  So I think that one should be rewritten as a 
comment validator.

Does that all make sense?

-- Allen


Allen Gilliland wrote:
> 
> 
> Dave wrote:
>> On 7/10/07, Allen Gilliland <al...@sun.com> wrote
>>> One thing that I didn't mention but is probably worth saying ... it's
>>> also possible to design things a bit differently such that instead of
>>> saving comments unmodified and then applying reformatting plugins as
>>> they get displayed, we could apply the plugins on the incoming comment
>>> and save the transformed version.
>>>
>>> Each approach has it's own benefits.  Applying plugins on the incoming
>>> comment simplifies things quite a bit because we no longer need to add
>>> the 'plugins' attribute to the db or pojo and it would improve
>>> performance a fair amount since we wouldn't need to apply XX plugins per
>>> comment during rendering.  The downside is that since you modify the
>>> comment before saving it you have no way of getting back to the original
>>> comment if you wanted to, although that's not likely to be necessary.
>>>
>>> I kind of like the idea of just applying the plugins once as the comment
>>> is being submitted and then not having to worry about it after that, but
>>> the solution I detailed above is slightly more flexible.
>>
>> I don't have a strong feeling about this, but I guess I prefer to
>> store the raw comment data.
> 
> 
> Actually, I remember why I abandoned this option as I was prototyping. 
> The major difficulty here is upgrade process because currently we are 
> doing transformations of comment content as it gets displayed and so if 
> we wanted to change that then it means going through all comments in the 
> db, applying the tranformations, and then re-saving.  Basically, a major 
> PITA.
> 
> -- Allen
> 
> 
>>
>> - Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Allen Gilliland <al...@sun.com>.

Dave wrote:
> On 7/10/07, Allen Gilliland <al...@sun.com> wrote
>> One thing that I didn't mention but is probably worth saying ... it's
>> also possible to design things a bit differently such that instead of
>> saving comments unmodified and then applying reformatting plugins as
>> they get displayed, we could apply the plugins on the incoming comment
>> and save the transformed version.
>>
>> Each approach has it's own benefits.  Applying plugins on the incoming
>> comment simplifies things quite a bit because we no longer need to add
>> the 'plugins' attribute to the db or pojo and it would improve
>> performance a fair amount since we wouldn't need to apply XX plugins per
>> comment during rendering.  The downside is that since you modify the
>> comment before saving it you have no way of getting back to the original
>> comment if you wanted to, although that's not likely to be necessary.
>>
>> I kind of like the idea of just applying the plugins once as the comment
>> is being submitted and then not having to worry about it after that, but
>> the solution I detailed above is slightly more flexible.
> 
> I don't have a strong feeling about this, but I guess I prefer to
> store the raw comment data.


Actually, I remember why I abandoned this option as I was prototyping. 
The major difficulty here is upgrade process because currently we are 
doing transformations of comment content as it gets displayed and so if 
we wanted to change that then it means going through all comments in the 
db, applying the tranformations, and then re-saving.  Basically, a major 
PITA.

-- Allen


> 
> - Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Dave <sn...@gmail.com>.
On 7/10/07, Allen Gilliland <al...@sun.com> wrote
> One thing that I didn't mention but is probably worth saying ... it's
> also possible to design things a bit differently such that instead of
> saving comments unmodified and then applying reformatting plugins as
> they get displayed, we could apply the plugins on the incoming comment
> and save the transformed version.
>
> Each approach has it's own benefits.  Applying plugins on the incoming
> comment simplifies things quite a bit because we no longer need to add
> the 'plugins' attribute to the db or pojo and it would improve
> performance a fair amount since we wouldn't need to apply XX plugins per
> comment during rendering.  The downside is that since you modify the
> comment before saving it you have no way of getting back to the original
> comment if you wanted to, although that's not likely to be necessary.
>
> I kind of like the idea of just applying the plugins once as the comment
> is being submitted and then not having to worry about it after that, but
> the solution I detailed above is slightly more flexible.

I don't have a strong feeling about this, but I guess I prefer to
store the raw comment data.

- Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Allen Gilliland <al...@sun.com>.

Allen Gilliland wrote:
> 
> 

*snip*

>>
>> I think you are on the right track. Are you going to propose this for
>> 4.0 or later?
> 
> 
> I would like to get it done for 4.0 because it's one of the things that 
> we want to get fixed up on our site and I already have the work done in 
> my workspace.  I think the work is small enough that it doesn't really 
> need a full proposal, this is the list of changes and additions ...
> 
> 1. Added new column 'plugins' to roller_comment table, varchar(255).
> 
> 2. Added new attribute 'plugins' to WeblogEntryComment pojo, including 
> mapping elements in jpa and hibernate mapping files.
> 
> 3. Added new interface WeblogEntryCommentPlugin in 
> business.plugins.comment package.  Interface has 1 method, 
> render(comment) which returns a string of the transformed content.
> 
> 5. Added 1 new comment plugin, AutoformatPlugin which takes plain text 
> and formats it using html.  This is basically a copy of the convert 
> linebreaks entry plugin as far as functionality goes.
> 
> 6. Added 2 new methods to PluginManager and PluginManagerImpl. 
> applyCommentPlugins(comment) which returns a string, and 
> getCommentPlugins() which returns the list of plugins.
> 
> 7. Added new property 'comments.plugins.classnames' in roller.properties 
> which lists the configured comment plugin classes.
> 
> 8. Added new runtime property 'users.comments.plugins' which is a comma 
> separated list of the plugins which are enabled.
> 
> 9. Modified CommentServlet to basically call 
> comment.setPlugins(runtimeConfig.getProp('users.comments.plugins')). 
> This basically applies whatever comment plugins are configured to the 
> specific comment as it gets saved.
> 
> 10. Modified Comment pojo wrapper so that the call to getContent() 
> actually applies plugins before returning output.
> 
> 11. Unit test for AutoformatPlugin.
> 
> Things that I still need to do ...
> 
> * Add to the current upgrade process to set roller_comment.plugins = 
> 'AutoFormat Plugin' if the installation is currently using comment 
> autoformatting.
> 
> * Add a way for site admins to enable/disable comment plugins via the 
> global settings page.
> 
> * Cleanup current comment macros and move logic out of macro and into 
> pojo wrapper.  We already to html escaping in the pojo wrapper, but we 
> also need to do a couple other things.
> 
> That's it.  Pretty straight forward really.


One thing that I didn't mention but is probably worth saying ... it's 
also possible to design things a bit differently such that instead of 
saving comments unmodified and then applying reformatting plugins as 
they get displayed, we could apply the plugins on the incoming comment 
and save the transformed version.

Each approach has it's own benefits.  Applying plugins on the incoming 
comment simplifies things quite a bit because we no longer need to add 
the 'plugins' attribute to the db or pojo and it would improve 
performance a fair amount since we wouldn't need to apply XX plugins per 
comment during rendering.  The downside is that since you modify the 
comment before saving it you have no way of getting back to the original 
comment if you wanted to, although that's not likely to be necessary.

I kind of like the idea of just applying the plugins once as the comment 
is being submitted and then not having to worry about it after that, but 
the solution I detailed above is slightly more flexible.

Any preferences one way or the other?

-- Allen


> 
> -- Allen
> 
> 
>>
>> - Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Allen Gilliland <al...@sun.com>.

Dave wrote:
> On 7/10/07, Allen Gilliland <al...@sun.com> wrote:
>> I would like to get it done for 4.0 because it's one of the things that
>> we want to get fixed up on our site and I already have the work done in
>> my workspace.  I think the work is small enough that it doesn't really
>> need a full proposal, this is the list of changes and additions ...
>>
>> 1. Added new column 'plugins' to roller_comment table, varchar(255).
> 
> That's OK with me. Glad to hear it's ready to commit.
> 
> I'd like to cut off any further database changes after this, if possible.


Me too.  I didn't want to make this change, it just became necessary 
after I realized how things were currently working.

My 4.0 TODO list is getting very short now and I plan to wrap the final 
things up as soon as I can call it done.  The only thing I have left of 
real importance is to fix up the issues with the task scheduling ... 
ROL-1294.

-- Allen


> 
> - Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Dave <sn...@gmail.com>.
On 7/10/07, Allen Gilliland <al...@sun.com> wrote:
> I would like to get it done for 4.0 because it's one of the things that
> we want to get fixed up on our site and I already have the work done in
> my workspace.  I think the work is small enough that it doesn't really
> need a full proposal, this is the list of changes and additions ...
>
> 1. Added new column 'plugins' to roller_comment table, varchar(255).

That's OK with me. Glad to hear it's ready to commit.

I'd like to cut off any further database changes after this, if possible.

- Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Allen Gilliland <al...@sun.com>.

Dave wrote:
> On 7/9/07, Allen Gilliland <al...@sun.com> wrote:
>> i am looking at some ways to improve the comment system and provide
>> greater flexibility in how comment content is handled when it's
>> submitted, particularly in the realm of html vs. non-html comment
>> posting.  xss is becoming so common now that allowing anonymous users to
>> post html comments is pretty dangerous and so I am trying to find some
>> better ways to allow for comment formatting to take place on non-html
>> comments.
>>
>> after looking through what we have right now i think we need to fix a
>> couple issues.  the main problem is with the current autoformat and
>> escape html comment settings.
>>
>> 1. these settings are global settings and apply to all comments on all
>> blogs, however they are not tracked on a per comment basis and this
>> makes it nearly impossible to change these settings without messing up
>> old comments.  for example, if you start your site by allowing html in
>> comments, then disable html and enable autoformatting then it's going to
>> apply that to all your old comments which don't need it and make things
>> look pretty ugly :/  same situation applies in reverse as well.  if you
>> start by using autoformatting and then disable it then all your old
>> comments are going to show up unformatted.
> 
> Yes. That's a pretty serious problem.
> 
> 
>> 2. these settings are applied in velocity templates and macros and are
>> not enforced in any way, so it would be easy for a savvy user to
>> circumvent these settings if they really wanted too.  it is more
>> appropriate if we apply any plugins or transformations inside the code
>> so that we aren't relying on logic in macros/templates.
> 
> I'm not so worried about blog owners circumventing the settings, but I
> agree that we should move away from logic in macros/templates wherever
> possible.
> 
> 
>> So I think we need to fix these things first, then I'd like to add a
>> more flexible way of applying custom formatting options to comments.  #2
>> is very easy to fix since we can just put that logic in the comment
>> wrapper class.  fixing #1 is more tricky.
>>
>> For #1 I would like to introduce the concept of pluggable
>> CommentFormatter classes.  Basically, these would work exactly like
>> CommentValidators except that instead of just checking the comment for
>> some kind of illegal content they would actual be performing
>> transformations on the content, much like our weblog entry plugins do
>> now.  So site admins would be able to define what comment formatters are
>> available to the application and enable/disable them globally to make
>> them apply to comments.  We would track what formatters to apply to a
>> given comment by adding a new column in the db for 'formatters' and
>> listing which formatters to apply to each comment.
>>
>> We would start by defining formatters to replace our current settings
>> for autoformatting and escaping html and via a simple upgrade we can set
>> these formatters to all existing comments if those settings are enabled.
>>
>> Then moving forward it's very easy to add formatters to the application
>> and enable/disable them whenever the admin wants.  Since the formatters
>> are tracked with the comments it is possible to disable a formatter from
>> being used on new comments but have it still available to apply to old
>> comments.  This also provides a very simple way for site admins to
>> create custom formatters to suit whatever their needs may be, for
>> example you could do a wiki syntax comment formatter if you wanted.
> 
> I think you are on the right track. Are you going to propose this for
> 4.0 or later?


I would like to get it done for 4.0 because it's one of the things that 
we want to get fixed up on our site and I already have the work done in 
my workspace.  I think the work is small enough that it doesn't really 
need a full proposal, this is the list of changes and additions ...

1. Added new column 'plugins' to roller_comment table, varchar(255).

2. Added new attribute 'plugins' to WeblogEntryComment pojo, including 
mapping elements in jpa and hibernate mapping files.

3. Added new interface WeblogEntryCommentPlugin in 
business.plugins.comment package.  Interface has 1 method, 
render(comment) which returns a string of the transformed content.

5. Added 1 new comment plugin, AutoformatPlugin which takes plain text 
and formats it using html.  This is basically a copy of the convert 
linebreaks entry plugin as far as functionality goes.

6. Added 2 new methods to PluginManager and PluginManagerImpl. 
applyCommentPlugins(comment) which returns a string, and 
getCommentPlugins() which returns the list of plugins.

7. Added new property 'comments.plugins.classnames' in roller.properties 
which lists the configured comment plugin classes.

8. Added new runtime property 'users.comments.plugins' which is a comma 
separated list of the plugins which are enabled.

9. Modified CommentServlet to basically call 
comment.setPlugins(runtimeConfig.getProp('users.comments.plugins')). 
This basically applies whatever comment plugins are configured to the 
specific comment as it gets saved.

10. Modified Comment pojo wrapper so that the call to getContent() 
actually applies plugins before returning output.

11. Unit test for AutoformatPlugin.

Things that I still need to do ...

* Add to the current upgrade process to set roller_comment.plugins = 
'AutoFormat Plugin' if the installation is currently using comment 
autoformatting.

* Add a way for site admins to enable/disable comment plugins via the 
global settings page.

* Cleanup current comment macros and move logic out of macro and into 
pojo wrapper.  We already to html escaping in the pojo wrapper, but we 
also need to do a couple other things.

That's it.  Pretty straight forward really.

-- Allen


> 
> - Dave

Re: comment formatting and escaping is a bit kludgy

Posted by Dave <sn...@gmail.com>.
On 7/9/07, Allen Gilliland <al...@sun.com> wrote:
> i am looking at some ways to improve the comment system and provide
> greater flexibility in how comment content is handled when it's
> submitted, particularly in the realm of html vs. non-html comment
> posting.  xss is becoming so common now that allowing anonymous users to
> post html comments is pretty dangerous and so I am trying to find some
> better ways to allow for comment formatting to take place on non-html
> comments.
>
> after looking through what we have right now i think we need to fix a
> couple issues.  the main problem is with the current autoformat and
> escape html comment settings.
>
> 1. these settings are global settings and apply to all comments on all
> blogs, however they are not tracked on a per comment basis and this
> makes it nearly impossible to change these settings without messing up
> old comments.  for example, if you start your site by allowing html in
> comments, then disable html and enable autoformatting then it's going to
> apply that to all your old comments which don't need it and make things
> look pretty ugly :/  same situation applies in reverse as well.  if you
> start by using autoformatting and then disable it then all your old
> comments are going to show up unformatted.

Yes. That's a pretty serious problem.


> 2. these settings are applied in velocity templates and macros and are
> not enforced in any way, so it would be easy for a savvy user to
> circumvent these settings if they really wanted too.  it is more
> appropriate if we apply any plugins or transformations inside the code
> so that we aren't relying on logic in macros/templates.

I'm not so worried about blog owners circumventing the settings, but I
agree that we should move away from logic in macros/templates wherever
possible.


> So I think we need to fix these things first, then I'd like to add a
> more flexible way of applying custom formatting options to comments.  #2
> is very easy to fix since we can just put that logic in the comment
> wrapper class.  fixing #1 is more tricky.
>
> For #1 I would like to introduce the concept of pluggable
> CommentFormatter classes.  Basically, these would work exactly like
> CommentValidators except that instead of just checking the comment for
> some kind of illegal content they would actual be performing
> transformations on the content, much like our weblog entry plugins do
> now.  So site admins would be able to define what comment formatters are
> available to the application and enable/disable them globally to make
> them apply to comments.  We would track what formatters to apply to a
> given comment by adding a new column in the db for 'formatters' and
> listing which formatters to apply to each comment.
>
> We would start by defining formatters to replace our current settings
> for autoformatting and escaping html and via a simple upgrade we can set
> these formatters to all existing comments if those settings are enabled.
>
> Then moving forward it's very easy to add formatters to the application
> and enable/disable them whenever the admin wants.  Since the formatters
> are tracked with the comments it is possible to disable a formatter from
> being used on new comments but have it still available to apply to old
> comments.  This also provides a very simple way for site admins to
> create custom formatters to suit whatever their needs may be, for
> example you could do a wiki syntax comment formatter if you wanted.

I think you are on the right track. Are you going to propose this for
4.0 or later?

- Dave