You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@roller.apache.org by Allen Gilliland <al...@sun.com> on 2007/07/09 19:25:20 UTC
comment formatting and escaping is a bit kludgy
i am looking at some ways to improve the comment system and provide
greater flexibility in how comment content is handled when it's
submitted, particularly in the realm of html vs. non-html comment
posting. xss is becoming so common now that allowing anonymous users to
post html comments is pretty dangerous and so I am trying to find some
better ways to allow for comment formatting to take place on non-html
comments.
after looking through what we have right now i think we need to fix a
couple issues. the main problem is with the current autoformat and
escape html comment settings.
1. these settings are global settings and apply to all comments on all
blogs, however they are not tracked on a per comment basis and this
makes it nearly impossible to change these settings without messing up
old comments. for example, if you start your site by allowing html in
comments, then disable html and enable autoformatting then it's going to
apply that to all your old comments which don't need it and make things
look pretty ugly :/ same situation applies in reverse as well. if you
start by using autoformatting and then disable it then all your old
comments are going to show up unformatted.
2. these settings are applied in velocity templates and macros and are
not enforced in any way, so it would be easy for a savvy user to
circumvent these settings if they really wanted too. it is more
appropriate if we apply any plugins or transformations inside the code
so that we aren't relying on logic in macros/templates.
So I think we need to fix these things first, then I'd like to add a
more flexible way of applying custom formatting options to comments. #2
is very easy to fix since we can just put that logic in the comment
wrapper class. fixing #1 is more tricky.
For #1 I would like to introduce the concept of pluggable
CommentFormatter classes. Basically, these would work exactly like
CommentValidators except that instead of just checking the comment for
some kind of illegal content they would actual be performing
transformations on the content, much like our weblog entry plugins do
now. So site admins would be able to define what comment formatters are
available to the application and enable/disable them globally to make
them apply to comments. We would track what formatters to apply to a
given comment by adding a new column in the db for 'formatters' and
listing which formatters to apply to each comment.
We would start by defining formatters to replace our current settings
for autoformatting and escaping html and via a simple upgrade we can set
these formatters to all existing comments if those settings are enabled.
Then moving forward it's very easy to add formatters to the application
and enable/disable them whenever the admin wants. Since the formatters
are tracked with the comments it is possible to disable a formatter from
being used on new comments but have it still available to apply to old
comments. This also provides a very simple way for site admins to
create custom formatters to suit whatever their needs may be, for
example you could do a wiki syntax comment formatter if you wanted.
Thoughts?
-- Allen
Re: Problems upgrading JRoller to Roller 3.1
Posted by Matthew Schmidt <ma...@dzone.com>.
Replies inline:
>
>
> you'll have to elaborate more on exactly what problems you are seeing?
> if the old models and macros aren't working at all then the pages
> would just have lots of $etc variables and #showBlah() calls all over
> the place.
>
> if the legacy models and macros are working but things just aren't
> looking formatted properly then it could just be a problem with
> changes made to the html/css between versions.
>
> if you set ...
>
> rendering.legacyModels.enabled=true
>
> ... in your config file then things should be working, at least for
> the most part.
>
The problem we're seeing is that the main macro to show the weblog
entries: #showWeblogEntries("_day" 15) doesn't appear to be outputting
anything. Most of the items in the sidebar such as the bookmark and
category listings appear to be working.
> no, the upload scheme you mentioned is not in the current code, so
> that's a customization you'll have to handle on your own. if i
> understand your comment you are doing this ...
>
> ${file.uploads}/a/aaa
> ${file.uploads}/a/arrr
> ${file.uploads}/b/blah
>
> an easier way to keep that working in newer versions would be to
> subclass or modify the FileManagerImpl and just override the necessary
> code so that it appends that letter before the handle name. shouldn't
> be that hard to do.
>
> -- Allen
>
> \
Thanks, we'll put that code in. We won't have to modify any macros
correct? We're really trying to keep our fork type modifications to a
minimum so that we can maybe have some chance to keeping the code up to
date :)
-Matt
Re: Problems upgrading JRoller to Roller 3.1
Posted by Allen Gilliland <al...@sun.com>.
Matthew Schmidt wrote:
> Hi guys. We're in the middle of upgrading JRoller to Roller 3.1 (and
> then on to 4.0) but we're having a bit of a problem with our themes. We
> have turned on the legacy theme support, but it seems that all of these
> call macros which apparently don't work anymore. New blogs that are
> created in the system seem to work fine as they use the new macros, but
> the old blogs just don't work. In addition, most people have customized
> their blogs in one way or another, making it very difficult for us to
> just bulk fix these. Is there a way to make our old themes completely
> work?
you'll have to elaborate more on exactly what problems you are seeing?
if the old models and macros aren't working at all then the pages would
just have lots of $etc variables and #showBlah() calls all over the place.
if the legacy models and macros are working but things just aren't
looking formatted properly then it could just be a problem with changes
made to the html/css between versions.
if you set ...
rendering.legacyModels.enabled=true
... in your config file then things should be working, at least for the
most part.
> Separately I have two other issues. First, Are you guys doing a 301
> redirect on the /page to /blogname? If not, I would recommend it as
> Google will likely not be transferring the pagerank for these blogs if
> not. Secondly, at JRoller, we have a custom file upload scheme that
> stores each weblog's resources in subdirectories sorted by the first
> letter of their handle. Was this code ever rolled into the core? It
> was code + an update to an image macro that found the proper location.
yes, the old urls are all 301 permanently moved redirects.
no, the upload scheme you mentioned is not in the current code, so
that's a customization you'll have to handle on your own. if i
understand your comment you are doing this ...
${file.uploads}/a/aaa
${file.uploads}/a/arrr
${file.uploads}/b/blah
an easier way to keep that working in newer versions would be to
subclass or modify the FileManagerImpl and just override the necessary
code so that it appends that letter before the handle name. shouldn't
be that hard to do.
-- Allen
>
> Thanks,
> Matt
>
> Dave wrote:
>> On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>>> I think that's fine, but it's extra effort to do that and I don't have
>>> time for it right now. I think just sticking with a simple html or
>>> plain text option will be fine for now, and there won't be any issues
>>> with expanding on that down the road and allowing admins to choose their
>>> own content-type if they want.
>>>
>>> So for right now I'm just going to leave things as is and provide just a
>>> single "Allow HTML in comments?" checkbox, which is effectively the same
>>> functionality we have right now. If HTML is allowed then the
>>> content-type for incoming comments will be set to text/html, otherwise
>>> it will be set to text/plain.
>>>
>>> Cool?
>>
>> Yep. Sounds good.
>>
>> Thanks,
>> Dave
Problems upgrading JRoller to Roller 3.1
Posted by Matthew Schmidt <ma...@dzone.com>.
Hi guys. We're in the middle of upgrading JRoller to Roller 3.1 (and
then on to 4.0) but we're having a bit of a problem with our themes. We
have turned on the legacy theme support, but it seems that all of these
call macros which apparently don't work anymore. New blogs that are
created in the system seem to work fine as they use the new macros, but
the old blogs just don't work. In addition, most people have customized
their blogs in one way or another, making it very difficult for us to
just bulk fix these. Is there a way to make our old themes completely
work?
Separately I have two other issues. First, Are you guys doing a 301
redirect on the /page to /blogname? If not, I would recommend it as
Google will likely not be transferring the pagerank for these blogs if
not. Secondly, at JRoller, we have a custom file upload scheme that
stores each weblog's resources in subdirectories sorted by the first
letter of their handle. Was this code ever rolled into the core? It
was code + an update to an image macro that found the proper location.
Thanks,
Matt
Dave wrote:
> On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>> I think that's fine, but it's extra effort to do that and I don't have
>> time for it right now. I think just sticking with a simple html or
>> plain text option will be fine for now, and there won't be any issues
>> with expanding on that down the road and allowing admins to choose their
>> own content-type if they want.
>>
>> So for right now I'm just going to leave things as is and provide just a
>> single "Allow HTML in comments?" checkbox, which is effectively the same
>> functionality we have right now. If HTML is allowed then the
>> content-type for incoming comments will be set to text/html, otherwise
>> it will be set to text/plain.
>>
>> Cool?
>
> Yep. Sounds good.
>
> Thanks,
> Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Dave <sn...@gmail.com>.
On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
> I think that's fine, but it's extra effort to do that and I don't have
> time for it right now. I think just sticking with a simple html or
> plain text option will be fine for now, and there won't be any issues
> with expanding on that down the road and allowing admins to choose their
> own content-type if they want.
>
> So for right now I'm just going to leave things as is and provide just a
> single "Allow HTML in comments?" checkbox, which is effectively the same
> functionality we have right now. If HTML is allowed then the
> content-type for incoming comments will be set to text/html, otherwise
> it will be set to text/plain.
>
> Cool?
Yep. Sounds good.
Thanks,
Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Allen Gilliland <al...@sun.com>.
Matt Raible wrote:
> Is it possible that a comment formatting plugin can be applied to a
> specific
> subset of comments? For example, I'd like to keep the current "allow all
> HTML" for older comments, but do "allow all HTML + auto-line-break" for new
> comments.
yes. as part of the 4.0 upgrade process we are changing the data model
so that comment plugins are tracked on a per comment basis, which means
that you can enabled/disable plugins and that will only affect new
comments. so basically, it'll be setup so that when a comment is posted
it will apply whatever plugins are configured at that time.
you'll have to test out the use of the autoformat plugin combined with
allowing full html. the autoformat is more designed for use when html
is disabled. but yes, you will be able to do that.
-- Allen
> Matt
>
> On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>>
>>
>>
>> Matt Raible wrote:
>> > I've been skimming this thread, but not thoroughly reading each
>> message.
>> > One
>> > of the things I'd like to see (that doesn't happen now) is to allow
>> full
>> > HTML - no ifs ands or buts. Meaning that I can add JavaScript or any
>> HTML
>> > tags I'd like to a comment. In addition, it would be nice to add some
>> > comment-post pre-processors that allow replacing \n with <br/> before
>> the
>> > entry gets put in the database.
>> > The first feature is for me. I seem to comment as much on my blog as
>> others
>> > do. It's frustrating for me when my HTML gets escaped. Furthermore,
>> since I
>> > get almost instant notification of comments - I know when to delete
>> those
>> > that have XSS text.
>>
>>
>> yep, those are both things that we are trying to solve for with this
>> work. i've already introduced the concept of comment plugins into the
>> trunk and those basically work like our entry plugins, they get applied
>> to comments as they are rendered and allow you to transform the text any
>> way you want. we have 2 plugins migrated from old functionality, one
>> which does the autoformatting as you described (text paragraphing to
>> html paragraphing) and one which restricts the content to only a subset
>> of html.
>>
>> i'm not sure we'll have a way to configure things so that certain users
>> have different restrictions on what they can post in comments though.
>>
>>
>> >
>> > The 2nd feature is for readers. A lot of people comment on my blog w/o
>> > doing
>> > "preview" first and end up with comments that run together - even
>> though
>> > there's line breaks in their entry. I realize I can change the macro to
>> > render the output different, but that affects all comments - whereas I
>> just
>> > want to affect new ones. I regularly go into my database and format
>> > comments
>> > by adding <br/><br/> tags, but it'd be nice not to have to do that.
>>
>>
>> yep, that's in the autoformatting plugin now. so if you want to do the
>> safe thing you will be able to disable html in comments and then apply
>> the autoformat plugin and it will format things appropriately.
>>
>> -- Allen
>>
>>
>> >
>> > Matt
>> >
>> > On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>> >>
>> >>
>> >>
>> >> Dave wrote:
>> >> > On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
>> >> >> I personally like the idea of rejecting comments with html in them
>> >> when
>> >> >> the admin has disabled html in comments because it seems like the
>> >> safest
>> >> >> and most consistent thing to do. My feeling is that it's more
>> >> likely
>> >> >> that users will enter in a comment and try to use html than it is
>> that
>> >> >> users will be using the < and > characters in a non-html
>> fashion. So
>> >> >> rejecting the comment and telling them not to use those characters
>> >> will
>> >> >> be appropriate and fitting most of the time.
>> >> >>
>> >> >> There certainly will be some cases where users aren't trying to put
>> in
>> >> >> html and their comment gets rejected, but my feeling is that those
>> >> cases
>> >> >> will be relatively few and with a decent error message they should
>> be
>> >> >> able to fix it without much of a problem.
>> >> >
>> >> > I don't like the idea of rejecting a comment because it has a
>> bracket
>> >> > that definitely doesn't seem right. If HTML Is disabled then we
>> should
>> >> > treat the comment as plain text. An angle bracket is allowed in
>> plain
>> >> > text and is not considered HTML.
>> >>
>> >>
>> >> I don't like it either, but I'm not sure that will happen all that
>> >> often. What we really need is a better parser so that we could
>> actually
>> >> distinguish between real html and casual use of the < and >
>> characters.
>> >>
>> >>
>> >> >
>> >> > I think the most consistent thing to do is this:
>> >> >
>> >> > If HTML is disabled in comments that means that Roller consider the
>> >> > comment content to be of type text/plain. Add a content-type
>> field to
>> >> > the comment table so we can save that. And whenever Roller displays
>> >> > content of type text/plain in HTML or XML it must be escaped.
>> >> >
>> >> > If HTML is enabled then consider the content type to be text/html.
>> >> > When we display HTML in a comment we always do the HTML
>> >> > subsetting/white-listing bit. I don't think we should ever display
>> raw
>> >> > comment HTML.
>> >>
>> >>
>> >> Okay, that sounds like a pretty good plan to me. However, I'm not
>> sure
>> >> I agree about forcing the HTML subsetting/white-listing part. While I
>> >> agree that it is typically dangerous and undesirable, we have to
>> >> remember that many times blog applications are used in trusted
>> >> environments now, such as corporate intranets, and in those cases it
>> >> would be entirely valid to just allow any html in comments.
>> >>
>> >> I guess I also think that it seems silly to not make that an option,
>> >> since undoubtably someone will want to do things that way.
>> >>
>> >> So I will add one more column to the roller_comment table for
>> >> content-type and tweak the upgrade process so that it gets properly
>> set
>> >> based on the previous value of the 'users.comments.escapehtml'
>> property.
>> >> If that property was enabled then the content-type will be
>> text/plain
>> >> and if not then it will be text/html.
>> >>
>> >>
>> >> >
>> >> > Going farther down that road, maybe we shouldn't have a "Allow HTML
>> in
>> >> > comments" property. Instead, have a "Set content-type to be used for
>> >> > new comments" and offer options Text, HTML and XHTML.
>> >> >
>> >> > This will also make it clear how comments should be treated when
>> in a
>> >> > feed. See the Atom <content> elements type attribute here:
>> >> > http://tools.ietf.org/html/rfc4287#page-14
>> >>
>> >>
>> >> I think that's fine, but it's extra effort to do that and I don't have
>> >> time for it right now. I think just sticking with a simple html or
>> >> plain text option will be fine for now, and there won't be any issues
>> >> with expanding on that down the road and allowing admins to choose
>> their
>> >> own content-type if they want.
>> >>
>> >> So for right now I'm just going to leave things as is and provide just
>> a
>> >> single "Allow HTML in comments?" checkbox, which is effectively the
>> same
>> >> functionality we have right now. If HTML is allowed then the
>> >> content-type for incoming comments will be set to text/html, otherwise
>> >> it will be set to text/plain.
>> >>
>> >> Cool?
>> >>
>> >> -- Allen
>> >>
>> >>
>> >> >
>> >> >
>> >> > If we don't want to go down that road and add a content-type, I
>> think
>> >> > we should stick with the practice of escaping comments that are not
>> >> > supposed to be HTML rather than rejecting any angle brackets or HTML
>> >> > tags. I don't think you have enough justification for making such a
>> >> > behavior change.
>> >> >
>> >> > - Dave
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > - Dave
>> >>
>> >
>> >
>> >
>>
>
>
>
Re: comment formatting and escaping is a bit kludgy
Posted by Matt Raible <ma...@raibledesigns.com>.
Is it possible that a comment formatting plugin can be applied to a specific
subset of comments? For example, I'd like to keep the current "allow all
HTML" for older comments, but do "allow all HTML + auto-line-break" for new
comments.
Matt
On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>
>
>
> Matt Raible wrote:
> > I've been skimming this thread, but not thoroughly reading each message.
> > One
> > of the things I'd like to see (that doesn't happen now) is to allow full
> > HTML - no ifs ands or buts. Meaning that I can add JavaScript or any
> HTML
> > tags I'd like to a comment. In addition, it would be nice to add some
> > comment-post pre-processors that allow replacing \n with <br/> before
> the
> > entry gets put in the database.
> > The first feature is for me. I seem to comment as much on my blog as
> others
> > do. It's frustrating for me when my HTML gets escaped. Furthermore,
> since I
> > get almost instant notification of comments - I know when to delete
> those
> > that have XSS text.
>
>
> yep, those are both things that we are trying to solve for with this
> work. i've already introduced the concept of comment plugins into the
> trunk and those basically work like our entry plugins, they get applied
> to comments as they are rendered and allow you to transform the text any
> way you want. we have 2 plugins migrated from old functionality, one
> which does the autoformatting as you described (text paragraphing to
> html paragraphing) and one which restricts the content to only a subset
> of html.
>
> i'm not sure we'll have a way to configure things so that certain users
> have different restrictions on what they can post in comments though.
>
>
> >
> > The 2nd feature is for readers. A lot of people comment on my blog w/o
> > doing
> > "preview" first and end up with comments that run together - even though
> > there's line breaks in their entry. I realize I can change the macro to
> > render the output different, but that affects all comments - whereas I
> just
> > want to affect new ones. I regularly go into my database and format
> > comments
> > by adding <br/><br/> tags, but it'd be nice not to have to do that.
>
>
> yep, that's in the autoformatting plugin now. so if you want to do the
> safe thing you will be able to disable html in comments and then apply
> the autoformat plugin and it will format things appropriately.
>
> -- Allen
>
>
> >
> > Matt
> >
> > On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
> >>
> >>
> >>
> >> Dave wrote:
> >> > On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
> >> >> I personally like the idea of rejecting comments with html in them
> >> when
> >> >> the admin has disabled html in comments because it seems like the
> >> safest
> >> >> and most consistent thing to do. My feeling is that it's more
> >> likely
> >> >> that users will enter in a comment and try to use html than it is
> that
> >> >> users will be using the < and > characters in a non-html
> fashion. So
> >> >> rejecting the comment and telling them not to use those characters
> >> will
> >> >> be appropriate and fitting most of the time.
> >> >>
> >> >> There certainly will be some cases where users aren't trying to put
> in
> >> >> html and their comment gets rejected, but my feeling is that those
> >> cases
> >> >> will be relatively few and with a decent error message they should
> be
> >> >> able to fix it without much of a problem.
> >> >
> >> > I don't like the idea of rejecting a comment because it has a bracket
> >> > that definitely doesn't seem right. If HTML Is disabled then we
> should
> >> > treat the comment as plain text. An angle bracket is allowed in plain
> >> > text and is not considered HTML.
> >>
> >>
> >> I don't like it either, but I'm not sure that will happen all that
> >> often. What we really need is a better parser so that we could
> actually
> >> distinguish between real html and casual use of the < and > characters.
> >>
> >>
> >> >
> >> > I think the most consistent thing to do is this:
> >> >
> >> > If HTML is disabled in comments that means that Roller consider the
> >> > comment content to be of type text/plain. Add a content-type field to
> >> > the comment table so we can save that. And whenever Roller displays
> >> > content of type text/plain in HTML or XML it must be escaped.
> >> >
> >> > If HTML is enabled then consider the content type to be text/html.
> >> > When we display HTML in a comment we always do the HTML
> >> > subsetting/white-listing bit. I don't think we should ever display
> raw
> >> > comment HTML.
> >>
> >>
> >> Okay, that sounds like a pretty good plan to me. However, I'm not sure
> >> I agree about forcing the HTML subsetting/white-listing part. While I
> >> agree that it is typically dangerous and undesirable, we have to
> >> remember that many times blog applications are used in trusted
> >> environments now, such as corporate intranets, and in those cases it
> >> would be entirely valid to just allow any html in comments.
> >>
> >> I guess I also think that it seems silly to not make that an option,
> >> since undoubtably someone will want to do things that way.
> >>
> >> So I will add one more column to the roller_comment table for
> >> content-type and tweak the upgrade process so that it gets properly set
> >> based on the previous value of the 'users.comments.escapehtml'
> property.
> >> If that property was enabled then the content-type will be text/plain
> >> and if not then it will be text/html.
> >>
> >>
> >> >
> >> > Going farther down that road, maybe we shouldn't have a "Allow HTML
> in
> >> > comments" property. Instead, have a "Set content-type to be used for
> >> > new comments" and offer options Text, HTML and XHTML.
> >> >
> >> > This will also make it clear how comments should be treated when in a
> >> > feed. See the Atom <content> elements type attribute here:
> >> > http://tools.ietf.org/html/rfc4287#page-14
> >>
> >>
> >> I think that's fine, but it's extra effort to do that and I don't have
> >> time for it right now. I think just sticking with a simple html or
> >> plain text option will be fine for now, and there won't be any issues
> >> with expanding on that down the road and allowing admins to choose
> their
> >> own content-type if they want.
> >>
> >> So for right now I'm just going to leave things as is and provide just
> a
> >> single "Allow HTML in comments?" checkbox, which is effectively the
> same
> >> functionality we have right now. If HTML is allowed then the
> >> content-type for incoming comments will be set to text/html, otherwise
> >> it will be set to text/plain.
> >>
> >> Cool?
> >>
> >> -- Allen
> >>
> >>
> >> >
> >> >
> >> > If we don't want to go down that road and add a content-type, I think
> >> > we should stick with the practice of escaping comments that are not
> >> > supposed to be HTML rather than rejecting any angle brackets or HTML
> >> > tags. I don't think you have enough justification for making such a
> >> > behavior change.
> >> >
> >> > - Dave
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > - Dave
> >>
> >
> >
> >
>
--
http://raibledesigns.com
Re: comment formatting and escaping is a bit kludgy
Posted by Allen Gilliland <al...@sun.com>.
Matt Raible wrote:
> I've been skimming this thread, but not thoroughly reading each message.
> One
> of the things I'd like to see (that doesn't happen now) is to allow full
> HTML - no ifs ands or buts. Meaning that I can add JavaScript or any HTML
> tags I'd like to a comment. In addition, it would be nice to add some
> comment-post pre-processors that allow replacing \n with <br/> before the
> entry gets put in the database.
> The first feature is for me. I seem to comment as much on my blog as others
> do. It's frustrating for me when my HTML gets escaped. Furthermore, since I
> get almost instant notification of comments - I know when to delete those
> that have XSS text.
yep, those are both things that we are trying to solve for with this
work. i've already introduced the concept of comment plugins into the
trunk and those basically work like our entry plugins, they get applied
to comments as they are rendered and allow you to transform the text any
way you want. we have 2 plugins migrated from old functionality, one
which does the autoformatting as you described (text paragraphing to
html paragraphing) and one which restricts the content to only a subset
of html.
i'm not sure we'll have a way to configure things so that certain users
have different restrictions on what they can post in comments though.
>
> The 2nd feature is for readers. A lot of people comment on my blog w/o
> doing
> "preview" first and end up with comments that run together - even though
> there's line breaks in their entry. I realize I can change the macro to
> render the output different, but that affects all comments - whereas I just
> want to affect new ones. I regularly go into my database and format
> comments
> by adding <br/><br/> tags, but it'd be nice not to have to do that.
yep, that's in the autoformatting plugin now. so if you want to do the
safe thing you will be able to disable html in comments and then apply
the autoformat plugin and it will format things appropriately.
-- Allen
>
> Matt
>
> On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>>
>>
>>
>> Dave wrote:
>> > On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
>> >> I personally like the idea of rejecting comments with html in them
>> when
>> >> the admin has disabled html in comments because it seems like the
>> safest
>> >> and most consistent thing to do. My feeling is that it's more
>> likely
>> >> that users will enter in a comment and try to use html than it is that
>> >> users will be using the < and > characters in a non-html fashion. So
>> >> rejecting the comment and telling them not to use those characters
>> will
>> >> be appropriate and fitting most of the time.
>> >>
>> >> There certainly will be some cases where users aren't trying to put in
>> >> html and their comment gets rejected, but my feeling is that those
>> cases
>> >> will be relatively few and with a decent error message they should be
>> >> able to fix it without much of a problem.
>> >
>> > I don't like the idea of rejecting a comment because it has a bracket
>> > that definitely doesn't seem right. If HTML Is disabled then we should
>> > treat the comment as plain text. An angle bracket is allowed in plain
>> > text and is not considered HTML.
>>
>>
>> I don't like it either, but I'm not sure that will happen all that
>> often. What we really need is a better parser so that we could actually
>> distinguish between real html and casual use of the < and > characters.
>>
>>
>> >
>> > I think the most consistent thing to do is this:
>> >
>> > If HTML is disabled in comments that means that Roller consider the
>> > comment content to be of type text/plain. Add a content-type field to
>> > the comment table so we can save that. And whenever Roller displays
>> > content of type text/plain in HTML or XML it must be escaped.
>> >
>> > If HTML is enabled then consider the content type to be text/html.
>> > When we display HTML in a comment we always do the HTML
>> > subsetting/white-listing bit. I don't think we should ever display raw
>> > comment HTML.
>>
>>
>> Okay, that sounds like a pretty good plan to me. However, I'm not sure
>> I agree about forcing the HTML subsetting/white-listing part. While I
>> agree that it is typically dangerous and undesirable, we have to
>> remember that many times blog applications are used in trusted
>> environments now, such as corporate intranets, and in those cases it
>> would be entirely valid to just allow any html in comments.
>>
>> I guess I also think that it seems silly to not make that an option,
>> since undoubtably someone will want to do things that way.
>>
>> So I will add one more column to the roller_comment table for
>> content-type and tweak the upgrade process so that it gets properly set
>> based on the previous value of the 'users.comments.escapehtml' property.
>> If that property was enabled then the content-type will be text/plain
>> and if not then it will be text/html.
>>
>>
>> >
>> > Going farther down that road, maybe we shouldn't have a "Allow HTML in
>> > comments" property. Instead, have a "Set content-type to be used for
>> > new comments" and offer options Text, HTML and XHTML.
>> >
>> > This will also make it clear how comments should be treated when in a
>> > feed. See the Atom <content> elements type attribute here:
>> > http://tools.ietf.org/html/rfc4287#page-14
>>
>>
>> I think that's fine, but it's extra effort to do that and I don't have
>> time for it right now. I think just sticking with a simple html or
>> plain text option will be fine for now, and there won't be any issues
>> with expanding on that down the road and allowing admins to choose their
>> own content-type if they want.
>>
>> So for right now I'm just going to leave things as is and provide just a
>> single "Allow HTML in comments?" checkbox, which is effectively the same
>> functionality we have right now. If HTML is allowed then the
>> content-type for incoming comments will be set to text/html, otherwise
>> it will be set to text/plain.
>>
>> Cool?
>>
>> -- Allen
>>
>>
>> >
>> >
>> > If we don't want to go down that road and add a content-type, I think
>> > we should stick with the practice of escaping comments that are not
>> > supposed to be HTML rather than rejecting any angle brackets or HTML
>> > tags. I don't think you have enough justification for making such a
>> > behavior change.
>> >
>> > - Dave
>> >
>> >
>> >
>> >
>> >
>> > - Dave
>>
>
>
>
Re: comment formatting and escaping is a bit kludgy
Posted by Matt Raible <ma...@raibledesigns.com>.
I've been skimming this thread, but not thoroughly reading each message. One
of the things I'd like to see (that doesn't happen now) is to allow full
HTML - no ifs ands or buts. Meaning that I can add JavaScript or any HTML
tags I'd like to a comment. In addition, it would be nice to add some
comment-post pre-processors that allow replacing \n with <br/> before the
entry gets put in the database.
The first feature is for me. I seem to comment as much on my blog as others
do. It's frustrating for me when my HTML gets escaped. Furthermore, since I
get almost instant notification of comments - I know when to delete those
that have XSS text.
The 2nd feature is for readers. A lot of people comment on my blog w/o doing
"preview" first and end up with comments that run together - even though
there's line breaks in their entry. I realize I can change the macro to
render the output different, but that affects all comments - whereas I just
want to affect new ones. I regularly go into my database and format comments
by adding <br/><br/> tags, but it'd be nice not to have to do that.
Matt
On 7/12/07, Allen Gilliland <al...@sun.com> wrote:
>
>
>
> Dave wrote:
> > On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
> >> I personally like the idea of rejecting comments with html in them when
> >> the admin has disabled html in comments because it seems like the
> safest
> >> and most consistent thing to do. My feeling is that it's more likely
> >> that users will enter in a comment and try to use html than it is that
> >> users will be using the < and > characters in a non-html fashion. So
> >> rejecting the comment and telling them not to use those characters will
> >> be appropriate and fitting most of the time.
> >>
> >> There certainly will be some cases where users aren't trying to put in
> >> html and their comment gets rejected, but my feeling is that those
> cases
> >> will be relatively few and with a decent error message they should be
> >> able to fix it without much of a problem.
> >
> > I don't like the idea of rejecting a comment because it has a bracket
> > that definitely doesn't seem right. If HTML Is disabled then we should
> > treat the comment as plain text. An angle bracket is allowed in plain
> > text and is not considered HTML.
>
>
> I don't like it either, but I'm not sure that will happen all that
> often. What we really need is a better parser so that we could actually
> distinguish between real html and casual use of the < and > characters.
>
>
> >
> > I think the most consistent thing to do is this:
> >
> > If HTML is disabled in comments that means that Roller consider the
> > comment content to be of type text/plain. Add a content-type field to
> > the comment table so we can save that. And whenever Roller displays
> > content of type text/plain in HTML or XML it must be escaped.
> >
> > If HTML is enabled then consider the content type to be text/html.
> > When we display HTML in a comment we always do the HTML
> > subsetting/white-listing bit. I don't think we should ever display raw
> > comment HTML.
>
>
> Okay, that sounds like a pretty good plan to me. However, I'm not sure
> I agree about forcing the HTML subsetting/white-listing part. While I
> agree that it is typically dangerous and undesirable, we have to
> remember that many times blog applications are used in trusted
> environments now, such as corporate intranets, and in those cases it
> would be entirely valid to just allow any html in comments.
>
> I guess I also think that it seems silly to not make that an option,
> since undoubtably someone will want to do things that way.
>
> So I will add one more column to the roller_comment table for
> content-type and tweak the upgrade process so that it gets properly set
> based on the previous value of the 'users.comments.escapehtml' property.
> If that property was enabled then the content-type will be text/plain
> and if not then it will be text/html.
>
>
> >
> > Going farther down that road, maybe we shouldn't have a "Allow HTML in
> > comments" property. Instead, have a "Set content-type to be used for
> > new comments" and offer options Text, HTML and XHTML.
> >
> > This will also make it clear how comments should be treated when in a
> > feed. See the Atom <content> elements type attribute here:
> > http://tools.ietf.org/html/rfc4287#page-14
>
>
> I think that's fine, but it's extra effort to do that and I don't have
> time for it right now. I think just sticking with a simple html or
> plain text option will be fine for now, and there won't be any issues
> with expanding on that down the road and allowing admins to choose their
> own content-type if they want.
>
> So for right now I'm just going to leave things as is and provide just a
> single "Allow HTML in comments?" checkbox, which is effectively the same
> functionality we have right now. If HTML is allowed then the
> content-type for incoming comments will be set to text/html, otherwise
> it will be set to text/plain.
>
> Cool?
>
> -- Allen
>
>
> >
> >
> > If we don't want to go down that road and add a content-type, I think
> > we should stick with the practice of escaping comments that are not
> > supposed to be HTML rather than rejecting any angle brackets or HTML
> > tags. I don't think you have enough justification for making such a
> > behavior change.
> >
> > - Dave
> >
> >
> >
> >
> >
> > - Dave
>
--
http://raibledesigns.com
Re: comment formatting and escaping is a bit kludgy
Posted by Allen Gilliland <al...@sun.com>.
Dave wrote:
> On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
>> I personally like the idea of rejecting comments with html in them when
>> the admin has disabled html in comments because it seems like the safest
>> and most consistent thing to do. My feeling is that it's more likely
>> that users will enter in a comment and try to use html than it is that
>> users will be using the < and > characters in a non-html fashion. So
>> rejecting the comment and telling them not to use those characters will
>> be appropriate and fitting most of the time.
>>
>> There certainly will be some cases where users aren't trying to put in
>> html and their comment gets rejected, but my feeling is that those cases
>> will be relatively few and with a decent error message they should be
>> able to fix it without much of a problem.
>
> I don't like the idea of rejecting a comment because it has a bracket
> that definitely doesn't seem right. If HTML Is disabled then we should
> treat the comment as plain text. An angle bracket is allowed in plain
> text and is not considered HTML.
I don't like it either, but I'm not sure that will happen all that
often. What we really need is a better parser so that we could actually
distinguish between real html and casual use of the < and > characters.
>
> I think the most consistent thing to do is this:
>
> If HTML is disabled in comments that means that Roller consider the
> comment content to be of type text/plain. Add a content-type field to
> the comment table so we can save that. And whenever Roller displays
> content of type text/plain in HTML or XML it must be escaped.
>
> If HTML is enabled then consider the content type to be text/html.
> When we display HTML in a comment we always do the HTML
> subsetting/white-listing bit. I don't think we should ever display raw
> comment HTML.
Okay, that sounds like a pretty good plan to me. However, I'm not sure
I agree about forcing the HTML subsetting/white-listing part. While I
agree that it is typically dangerous and undesirable, we have to
remember that many times blog applications are used in trusted
environments now, such as corporate intranets, and in those cases it
would be entirely valid to just allow any html in comments.
I guess I also think that it seems silly to not make that an option,
since undoubtably someone will want to do things that way.
So I will add one more column to the roller_comment table for
content-type and tweak the upgrade process so that it gets properly set
based on the previous value of the 'users.comments.escapehtml' property.
If that property was enabled then the content-type will be text/plain
and if not then it will be text/html.
>
> Going farther down that road, maybe we shouldn't have a "Allow HTML in
> comments" property. Instead, have a "Set content-type to be used for
> new comments" and offer options Text, HTML and XHTML.
>
> This will also make it clear how comments should be treated when in a
> feed. See the Atom <content> elements type attribute here:
> http://tools.ietf.org/html/rfc4287#page-14
I think that's fine, but it's extra effort to do that and I don't have
time for it right now. I think just sticking with a simple html or
plain text option will be fine for now, and there won't be any issues
with expanding on that down the road and allowing admins to choose their
own content-type if they want.
So for right now I'm just going to leave things as is and provide just a
single "Allow HTML in comments?" checkbox, which is effectively the same
functionality we have right now. If HTML is allowed then the
content-type for incoming comments will be set to text/html, otherwise
it will be set to text/plain.
Cool?
-- Allen
>
>
> If we don't want to go down that road and add a content-type, I think
> we should stick with the practice of escaping comments that are not
> supposed to be HTML rather than rejecting any angle brackets or HTML
> tags. I don't think you have enough justification for making such a
> behavior change.
>
> - Dave
>
>
>
>
>
> - Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Dave <sn...@gmail.com>.
On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
> I personally like the idea of rejecting comments with html in them when
> the admin has disabled html in comments because it seems like the safest
> and most consistent thing to do. My feeling is that it's more likely
> that users will enter in a comment and try to use html than it is that
> users will be using the < and > characters in a non-html fashion. So
> rejecting the comment and telling them not to use those characters will
> be appropriate and fitting most of the time.
>
> There certainly will be some cases where users aren't trying to put in
> html and their comment gets rejected, but my feeling is that those cases
> will be relatively few and with a decent error message they should be
> able to fix it without much of a problem.
I don't like the idea of rejecting a comment because it has a bracket
that definitely doesn't seem right. If HTML Is disabled then we should
treat the comment as plain text. An angle bracket is allowed in plain
text and is not considered HTML.
I think the most consistent thing to do is this:
If HTML is disabled in comments that means that Roller consider the
comment content to be of type text/plain. Add a content-type field to
the comment table so we can save that. And whenever Roller displays
content of type text/plain in HTML or XML it must be escaped.
If HTML is enabled then consider the content type to be text/html.
When we display HTML in a comment we always do the HTML
subsetting/white-listing bit. I don't think we should ever display raw
comment HTML.
Going farther down that road, maybe we shouldn't have a "Allow HTML in
comments" property. Instead, have a "Set content-type to be used for
new comments" and offer options Text, HTML and XHTML.
This will also make it clear how comments should be treated when in a
feed. See the Atom <content> elements type attribute here:
http://tools.ietf.org/html/rfc4287#page-14
If we don't want to go down that road and add a content-type, I think
we should stick with the practice of escaping comments that are not
supposed to be HTML rather than rejecting any angle brackets or HTML
tags. I don't think you have enough justification for making such a
behavior change.
- Dave
- Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Allen Gilliland <al...@sun.com>.
Dave wrote:
> On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
>> Okay, well for some reason I am not getting escaped html even though the
>> comment wrapper is escaping html from the comment content before
>> returning it, so that is one thing that needs to be fixed.
>>
>> The other problem is that there isn't (and hasn't been) a way to get to
>> option #3 where html is fully allowed. This is what is used to render
>> comment content in macros in 3.1 ...
>>
>> #if($config.commentEscapeHtml)
>> #set($content = $utils.escapeHTML($comment.content))
>> #else
>> #set($content = $utils.transformToHTMLSubset($utils.escapeHTML($
>> comment.content)))
>> #end
>
> Hmmm... that must have slipped in during XSS fixes. I'd better check
> 3.1.1 for that issue.
Yeah, that's definitely possible.
>
>
>> ... and that basically means your only options are #1 (no html) or #2
>> (partial html), which is not really what we are aiming for i don't think.
>>
>> Then, even if html escaping is working properly for #1, i don't think
>> that's the right way of providing that functionality. If you don't want
>> people putting html in comments then we should be rejecting those
>> comments, not just leaving the html there and escaping it. The problem
>> with just escaping the html is that 1) it's ugly and if someone puts
>> lots of html in their comment and it gets escaped then it makes the
>> comment almost unreadable and 2) it's dangerous because technically if
>> at some later point in time html escaping isn't applied anymore then
>> potentially dangerous html in the comment can become active again. So
>> it makes far more sense to me that if the admin wants to prevent html in
>> comments then we should reject those comments with an error message.
>>
>> So, getting back to my original suggestions again I think we can revise
>> a couple things.
>>
>> 1. I think we still need a new runtime property for "enable html in
>> comments?" This is kind of like a built in comment validator which if
>> enabled will reject comments with html in them.
>
> Hmm... When HTML in comments is disabled I'm not sure if we should
> reject HTML, strip it or simply escape it as we have been doing.
>
> What if somebody wants to write about the <h1> tag and wants the
> comment to display the string <h1> and not have it interpreted as
> HTML. If HTML is turned off, why should should we reject <h1>. Why not
> just treat it as text by escaping it at display time?
>
> If we had a WYSIWYG comment editor then they could type <h1> and it
> would appear as <h1>, but since we don't they have to type <h1>
> which is bit of a pain. Then again, if somebody is writing about HTML
> elements then they're probably smart enough to know how to escape
> them.
>
> Not sure what the right thing is to do here, but we need to be careful
> about making such a significant change in comment behavior.
I agree, there are a variety of options and the choice isn't obvious.
I personally like the idea of rejecting comments with html in them when
the admin has disabled html in comments because it seems like the safest
and most consistent thing to do. My feeling is that it's more likely
that users will enter in a comment and try to use html than it is that
users will be using the < and > characters in a non-html fashion. So
rejecting the comment and telling them not to use those characters will
be appropriate and fitting most of the time.
There certainly will be some cases where users aren't trying to put in
html and their comment gets rejected, but my feeling is that those cases
will be relatively few and with a decent error message they should be
able to fix it without much of a problem.
A WYSIWYG comment editor would be nice, but ultimately that doesn't
solve the problem because at the end of the day we still need to
properly validate the comment on the server side. So even if a WYSIWYG
editor was escaping content for people there is still comment spam and
other situations where someone could try to submit html in their comment
and we would need to deal with it.
-- Allen
>
>
>> 2. For doing the html subset we can basically leave it how it is now as
>> a comment plugin and what it will do is the same thing it does now, just
>> escape all html then unescape the supported tags. So the expected use
>> case here would be if the user checked the "enable html in comments?"
>> box to allow for html in their comments, then checked the HTML Subset
>> plugin which transforms the comment contents so that only certain html
>> tags are left unescaped.
>
> Yes, that sounds good.
>
> - Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Dave <sn...@gmail.com>.
On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
> Okay, well for some reason I am not getting escaped html even though the
> comment wrapper is escaping html from the comment content before
> returning it, so that is one thing that needs to be fixed.
>
> The other problem is that there isn't (and hasn't been) a way to get to
> option #3 where html is fully allowed. This is what is used to render
> comment content in macros in 3.1 ...
>
> #if($config.commentEscapeHtml)
> #set($content = $utils.escapeHTML($comment.content))
> #else
> #set($content = $utils.transformToHTMLSubset($utils.escapeHTML($
> comment.content)))
> #end
Hmmm... that must have slipped in during XSS fixes. I'd better check
3.1.1 for that issue.
> ... and that basically means your only options are #1 (no html) or #2
> (partial html), which is not really what we are aiming for i don't think.
>
> Then, even if html escaping is working properly for #1, i don't think
> that's the right way of providing that functionality. If you don't want
> people putting html in comments then we should be rejecting those
> comments, not just leaving the html there and escaping it. The problem
> with just escaping the html is that 1) it's ugly and if someone puts
> lots of html in their comment and it gets escaped then it makes the
> comment almost unreadable and 2) it's dangerous because technically if
> at some later point in time html escaping isn't applied anymore then
> potentially dangerous html in the comment can become active again. So
> it makes far more sense to me that if the admin wants to prevent html in
> comments then we should reject those comments with an error message.
>
> So, getting back to my original suggestions again I think we can revise
> a couple things.
>
> 1. I think we still need a new runtime property for "enable html in
> comments?" This is kind of like a built in comment validator which if
> enabled will reject comments with html in them.
Hmm... When HTML in comments is disabled I'm not sure if we should
reject HTML, strip it or simply escape it as we have been doing.
What if somebody wants to write about the <h1> tag and wants the
comment to display the string <h1> and not have it interpreted as
HTML. If HTML is turned off, why should should we reject <h1>. Why not
just treat it as text by escaping it at display time?
If we had a WYSIWYG comment editor then they could type <h1> and it
would appear as <h1>, but since we don't they have to type <h1>
which is bit of a pain. Then again, if somebody is writing about HTML
elements then they're probably smart enough to know how to escape
them.
Not sure what the right thing is to do here, but we need to be careful
about making such a significant change in comment behavior.
> 2. For doing the html subset we can basically leave it how it is now as
> a comment plugin and what it will do is the same thing it does now, just
> escape all html then unescape the supported tags. So the expected use
> case here would be if the user checked the "enable html in comments?"
> box to allow for html in their comments, then checked the HTML Subset
> plugin which transforms the comment contents so that only certain html
> tags are left unescaped.
Yes, that sounds good.
- Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Allen Gilliland <al...@sun.com>.
Okay, well for some reason I am not getting escaped html even though the
comment wrapper is escaping html from the comment content before
returning it, so that is one thing that needs to be fixed.
The other problem is that there isn't (and hasn't been) a way to get to
option #3 where html is fully allowed. This is what is used to render
comment content in macros in 3.1 ...
#if($config.commentEscapeHtml)
#set($content = $utils.escapeHTML($comment.content))
#else
#set($content = $utils.transformToHTMLSubset($utils.escapeHTML($
comment.content)))
#end
... and that basically means your only options are #1 (no html) or #2
(partial html), which is not really what we are aiming for i don't think.
Then, even if html escaping is working properly for #1, i don't think
that's the right way of providing that functionality. If you don't want
people putting html in comments then we should be rejecting those
comments, not just leaving the html there and escaping it. The problem
with just escaping the html is that 1) it's ugly and if someone puts
lots of html in their comment and it gets escaped then it makes the
comment almost unreadable and 2) it's dangerous because technically if
at some later point in time html escaping isn't applied anymore then
potentially dangerous html in the comment can become active again. So
it makes far more sense to me that if the admin wants to prevent html in
comments then we should reject those comments with an error message.
So, getting back to my original suggestions again I think we can revise
a couple things.
1. I think we still need a new runtime property for "enable html in
comments?" This is kind of like a built in comment validator which if
enabled will reject comments with html in them.
2. For doing the html subset we can basically leave it how it is now as
a comment plugin and what it will do is the same thing it does now, just
escape all html then unescape the supported tags. So the expected use
case here would be if the user checked the "enable html in comments?"
box to allow for html in their comments, then checked the HTML Subset
plugin which transforms the comment contents so that only certain html
tags are left unescaped.
Does that sound good?
-- Allen
Dave wrote:
> On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
>> Okay, this work is basically complete, but after doing a fair amount of
>> testing I am a little uncertain if our handling of html in comments is
>> working how we expected. My understanding from looking at the UI is
>> that we are supposed to have these options ...
>>
>> 1. HTML disabled and not allowed in comments. Triggered by checking
>> "escape comment html" on global config page. This is also the property
>> which toggles the "HTML Syntax: Enabled/Disabled" text on the comment
>> form.
>
> What this means is that, if you enter an HTML tag it will be escaped
> on display and therefore tags you enter will be displayed as text and
> not interpreted by the browser as HTML tags.
>
>
>> 2. HTML subset available. Only certain html tags work. I'm not sure I
>> fully understand how this is supposed to be enforced, it looks like the
>> code tries to escape all html into < > syntax and then unescapes
>> certain tags.
>
> In this case, we escape on display and then un-escape certain tags
> that we deem to be safe.
>
>
>> 3. HTML enabled. Basically, anything goes.
>
>
> Yep, that's how they are supposed to work.
>
>
>
>> Now, from looking at the code it doesn't look like we are properly
>> providing these options. From my testing, when HTML is supposed to be
>> disabled it's not, and I could still put HTML in comments. And to a
>> larger degree, it looks like the escapeHTML() method isn't really doing
>> much. This basically means that the HTML subset option wasn't working
>> either because the comment was never being escaped in the first place,
>> so that option was just doing nothing.
>
> Works for me. Escaping completely prevents HTML tags from being
> interpreted by the browser.
>
>
>> So can someone verify that my assumptions above are correct and that's
>> the actual functionality we are shooting for?
>>
>> I think that fixing #1 should be relatively easy and we can just display
>> a comment error message to users if they put html in a comment when they
>> aren't supposed to.
>
> If we escape (as we are doing now) that works just fine.
>
>
> To handle the conditions for #1 and #3 I'd like to
>> just migrate the old "escape comment html" property to a new one called
>> "enable html in comments" and if it's enabled then we don't do anything
>> to check html in comments, if it's disabled then we reject any comments
>> with html in them and display and error message.
>>
>> #2 is a bit more tricky since you then need to be parsing the comment
>> and looking for specific html. At the end of the day this one feels
>> more like a comment validator than a built-in option to me because
>> effectively all you are trying to do is check if an incoming comment
>> uses any html that you are not allowing, which is the purpose of our
>> comment validators. So I think that one should be rewritten as a
>> comment validator.
>
> What's the problem with the white-list based HTML subsetting we have now?
>
> - Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Dave <sn...@gmail.com>.
On 7/11/07, Allen Gilliland <al...@sun.com> wrote:
> Okay, this work is basically complete, but after doing a fair amount of
> testing I am a little uncertain if our handling of html in comments is
> working how we expected. My understanding from looking at the UI is
> that we are supposed to have these options ...
>
> 1. HTML disabled and not allowed in comments. Triggered by checking
> "escape comment html" on global config page. This is also the property
> which toggles the "HTML Syntax: Enabled/Disabled" text on the comment form.
What this means is that, if you enter an HTML tag it will be escaped
on display and therefore tags you enter will be displayed as text and
not interpreted by the browser as HTML tags.
> 2. HTML subset available. Only certain html tags work. I'm not sure I
> fully understand how this is supposed to be enforced, it looks like the
> code tries to escape all html into < > syntax and then unescapes
> certain tags.
In this case, we escape on display and then un-escape certain tags
that we deem to be safe.
> 3. HTML enabled. Basically, anything goes.
Yep, that's how they are supposed to work.
> Now, from looking at the code it doesn't look like we are properly
> providing these options. From my testing, when HTML is supposed to be
> disabled it's not, and I could still put HTML in comments. And to a
> larger degree, it looks like the escapeHTML() method isn't really doing
> much. This basically means that the HTML subset option wasn't working
> either because the comment was never being escaped in the first place,
> so that option was just doing nothing.
Works for me. Escaping completely prevents HTML tags from being
interpreted by the browser.
> So can someone verify that my assumptions above are correct and that's
> the actual functionality we are shooting for?
>
> I think that fixing #1 should be relatively easy and we can just display
> a comment error message to users if they put html in a comment when they
> aren't supposed to.
If we escape (as we are doing now) that works just fine.
To handle the conditions for #1 and #3 I'd like to
> just migrate the old "escape comment html" property to a new one called
> "enable html in comments" and if it's enabled then we don't do anything
> to check html in comments, if it's disabled then we reject any comments
> with html in them and display and error message.
>
> #2 is a bit more tricky since you then need to be parsing the comment
> and looking for specific html. At the end of the day this one feels
> more like a comment validator than a built-in option to me because
> effectively all you are trying to do is check if an incoming comment
> uses any html that you are not allowing, which is the purpose of our
> comment validators. So I think that one should be rewritten as a
> comment validator.
What's the problem with the white-list based HTML subsetting we have now?
- Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Allen Gilliland <al...@sun.com>.
Okay, this work is basically complete, but after doing a fair amount of
testing I am a little uncertain if our handling of html in comments is
working how we expected. My understanding from looking at the UI is
that we are supposed to have these options ...
1. HTML disabled and not allowed in comments. Triggered by checking
"escape comment html" on global config page. This is also the property
which toggles the "HTML Syntax: Enabled/Disabled" text on the comment form.
2. HTML subset available. Only certain html tags work. I'm not sure I
fully understand how this is supposed to be enforced, it looks like the
code tries to escape all html into < > syntax and then unescapes
certain tags.
3. HTML enabled. Basically, anything goes.
Now, from looking at the code it doesn't look like we are properly
providing these options. From my testing, when HTML is supposed to be
disabled it's not, and I could still put HTML in comments. And to a
larger degree, it looks like the escapeHTML() method isn't really doing
much. This basically means that the HTML subset option wasn't working
either because the comment was never being escaped in the first place,
so that option was just doing nothing.
So can someone verify that my assumptions above are correct and that's
the actual functionality we are shooting for?
I think that fixing #1 should be relatively easy and we can just display
a comment error message to users if they put html in a comment when they
aren't supposed to. To handle the conditions for #1 and #3 I'd like to
just migrate the old "escape comment html" property to a new one called
"enable html in comments" and if it's enabled then we don't do anything
to check html in comments, if it's disabled then we reject any comments
with html in them and display and error message.
#2 is a bit more tricky since you then need to be parsing the comment
and looking for specific html. At the end of the day this one feels
more like a comment validator than a built-in option to me because
effectively all you are trying to do is check if an incoming comment
uses any html that you are not allowing, which is the purpose of our
comment validators. So I think that one should be rewritten as a
comment validator.
Does that all make sense?
-- Allen
Allen Gilliland wrote:
>
>
> Dave wrote:
>> On 7/10/07, Allen Gilliland <al...@sun.com> wrote
>>> One thing that I didn't mention but is probably worth saying ... it's
>>> also possible to design things a bit differently such that instead of
>>> saving comments unmodified and then applying reformatting plugins as
>>> they get displayed, we could apply the plugins on the incoming comment
>>> and save the transformed version.
>>>
>>> Each approach has it's own benefits. Applying plugins on the incoming
>>> comment simplifies things quite a bit because we no longer need to add
>>> the 'plugins' attribute to the db or pojo and it would improve
>>> performance a fair amount since we wouldn't need to apply XX plugins per
>>> comment during rendering. The downside is that since you modify the
>>> comment before saving it you have no way of getting back to the original
>>> comment if you wanted to, although that's not likely to be necessary.
>>>
>>> I kind of like the idea of just applying the plugins once as the comment
>>> is being submitted and then not having to worry about it after that, but
>>> the solution I detailed above is slightly more flexible.
>>
>> I don't have a strong feeling about this, but I guess I prefer to
>> store the raw comment data.
>
>
> Actually, I remember why I abandoned this option as I was prototyping.
> The major difficulty here is upgrade process because currently we are
> doing transformations of comment content as it gets displayed and so if
> we wanted to change that then it means going through all comments in the
> db, applying the tranformations, and then re-saving. Basically, a major
> PITA.
>
> -- Allen
>
>
>>
>> - Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Allen Gilliland <al...@sun.com>.
Dave wrote:
> On 7/10/07, Allen Gilliland <al...@sun.com> wrote
>> One thing that I didn't mention but is probably worth saying ... it's
>> also possible to design things a bit differently such that instead of
>> saving comments unmodified and then applying reformatting plugins as
>> they get displayed, we could apply the plugins on the incoming comment
>> and save the transformed version.
>>
>> Each approach has it's own benefits. Applying plugins on the incoming
>> comment simplifies things quite a bit because we no longer need to add
>> the 'plugins' attribute to the db or pojo and it would improve
>> performance a fair amount since we wouldn't need to apply XX plugins per
>> comment during rendering. The downside is that since you modify the
>> comment before saving it you have no way of getting back to the original
>> comment if you wanted to, although that's not likely to be necessary.
>>
>> I kind of like the idea of just applying the plugins once as the comment
>> is being submitted and then not having to worry about it after that, but
>> the solution I detailed above is slightly more flexible.
>
> I don't have a strong feeling about this, but I guess I prefer to
> store the raw comment data.
Actually, I remember why I abandoned this option as I was prototyping.
The major difficulty here is upgrade process because currently we are
doing transformations of comment content as it gets displayed and so if
we wanted to change that then it means going through all comments in the
db, applying the tranformations, and then re-saving. Basically, a major
PITA.
-- Allen
>
> - Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Dave <sn...@gmail.com>.
On 7/10/07, Allen Gilliland <al...@sun.com> wrote
> One thing that I didn't mention but is probably worth saying ... it's
> also possible to design things a bit differently such that instead of
> saving comments unmodified and then applying reformatting plugins as
> they get displayed, we could apply the plugins on the incoming comment
> and save the transformed version.
>
> Each approach has it's own benefits. Applying plugins on the incoming
> comment simplifies things quite a bit because we no longer need to add
> the 'plugins' attribute to the db or pojo and it would improve
> performance a fair amount since we wouldn't need to apply XX plugins per
> comment during rendering. The downside is that since you modify the
> comment before saving it you have no way of getting back to the original
> comment if you wanted to, although that's not likely to be necessary.
>
> I kind of like the idea of just applying the plugins once as the comment
> is being submitted and then not having to worry about it after that, but
> the solution I detailed above is slightly more flexible.
I don't have a strong feeling about this, but I guess I prefer to
store the raw comment data.
- Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Allen Gilliland <al...@sun.com>.
Allen Gilliland wrote:
>
>
*snip*
>>
>> I think you are on the right track. Are you going to propose this for
>> 4.0 or later?
>
>
> I would like to get it done for 4.0 because it's one of the things that
> we want to get fixed up on our site and I already have the work done in
> my workspace. I think the work is small enough that it doesn't really
> need a full proposal, this is the list of changes and additions ...
>
> 1. Added new column 'plugins' to roller_comment table, varchar(255).
>
> 2. Added new attribute 'plugins' to WeblogEntryComment pojo, including
> mapping elements in jpa and hibernate mapping files.
>
> 3. Added new interface WeblogEntryCommentPlugin in
> business.plugins.comment package. Interface has 1 method,
> render(comment) which returns a string of the transformed content.
>
> 5. Added 1 new comment plugin, AutoformatPlugin which takes plain text
> and formats it using html. This is basically a copy of the convert
> linebreaks entry plugin as far as functionality goes.
>
> 6. Added 2 new methods to PluginManager and PluginManagerImpl.
> applyCommentPlugins(comment) which returns a string, and
> getCommentPlugins() which returns the list of plugins.
>
> 7. Added new property 'comments.plugins.classnames' in roller.properties
> which lists the configured comment plugin classes.
>
> 8. Added new runtime property 'users.comments.plugins' which is a comma
> separated list of the plugins which are enabled.
>
> 9. Modified CommentServlet to basically call
> comment.setPlugins(runtimeConfig.getProp('users.comments.plugins')).
> This basically applies whatever comment plugins are configured to the
> specific comment as it gets saved.
>
> 10. Modified Comment pojo wrapper so that the call to getContent()
> actually applies plugins before returning output.
>
> 11. Unit test for AutoformatPlugin.
>
> Things that I still need to do ...
>
> * Add to the current upgrade process to set roller_comment.plugins =
> 'AutoFormat Plugin' if the installation is currently using comment
> autoformatting.
>
> * Add a way for site admins to enable/disable comment plugins via the
> global settings page.
>
> * Cleanup current comment macros and move logic out of macro and into
> pojo wrapper. We already to html escaping in the pojo wrapper, but we
> also need to do a couple other things.
>
> That's it. Pretty straight forward really.
One thing that I didn't mention but is probably worth saying ... it's
also possible to design things a bit differently such that instead of
saving comments unmodified and then applying reformatting plugins as
they get displayed, we could apply the plugins on the incoming comment
and save the transformed version.
Each approach has it's own benefits. Applying plugins on the incoming
comment simplifies things quite a bit because we no longer need to add
the 'plugins' attribute to the db or pojo and it would improve
performance a fair amount since we wouldn't need to apply XX plugins per
comment during rendering. The downside is that since you modify the
comment before saving it you have no way of getting back to the original
comment if you wanted to, although that's not likely to be necessary.
I kind of like the idea of just applying the plugins once as the comment
is being submitted and then not having to worry about it after that, but
the solution I detailed above is slightly more flexible.
Any preferences one way or the other?
-- Allen
>
> -- Allen
>
>
>>
>> - Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Allen Gilliland <al...@sun.com>.
Dave wrote:
> On 7/10/07, Allen Gilliland <al...@sun.com> wrote:
>> I would like to get it done for 4.0 because it's one of the things that
>> we want to get fixed up on our site and I already have the work done in
>> my workspace. I think the work is small enough that it doesn't really
>> need a full proposal, this is the list of changes and additions ...
>>
>> 1. Added new column 'plugins' to roller_comment table, varchar(255).
>
> That's OK with me. Glad to hear it's ready to commit.
>
> I'd like to cut off any further database changes after this, if possible.
Me too. I didn't want to make this change, it just became necessary
after I realized how things were currently working.
My 4.0 TODO list is getting very short now and I plan to wrap the final
things up as soon as I can call it done. The only thing I have left of
real importance is to fix up the issues with the task scheduling ...
ROL-1294.
-- Allen
>
> - Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Dave <sn...@gmail.com>.
On 7/10/07, Allen Gilliland <al...@sun.com> wrote:
> I would like to get it done for 4.0 because it's one of the things that
> we want to get fixed up on our site and I already have the work done in
> my workspace. I think the work is small enough that it doesn't really
> need a full proposal, this is the list of changes and additions ...
>
> 1. Added new column 'plugins' to roller_comment table, varchar(255).
That's OK with me. Glad to hear it's ready to commit.
I'd like to cut off any further database changes after this, if possible.
- Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Allen Gilliland <al...@sun.com>.
Dave wrote:
> On 7/9/07, Allen Gilliland <al...@sun.com> wrote:
>> i am looking at some ways to improve the comment system and provide
>> greater flexibility in how comment content is handled when it's
>> submitted, particularly in the realm of html vs. non-html comment
>> posting. xss is becoming so common now that allowing anonymous users to
>> post html comments is pretty dangerous and so I am trying to find some
>> better ways to allow for comment formatting to take place on non-html
>> comments.
>>
>> after looking through what we have right now i think we need to fix a
>> couple issues. the main problem is with the current autoformat and
>> escape html comment settings.
>>
>> 1. these settings are global settings and apply to all comments on all
>> blogs, however they are not tracked on a per comment basis and this
>> makes it nearly impossible to change these settings without messing up
>> old comments. for example, if you start your site by allowing html in
>> comments, then disable html and enable autoformatting then it's going to
>> apply that to all your old comments which don't need it and make things
>> look pretty ugly :/ same situation applies in reverse as well. if you
>> start by using autoformatting and then disable it then all your old
>> comments are going to show up unformatted.
>
> Yes. That's a pretty serious problem.
>
>
>> 2. these settings are applied in velocity templates and macros and are
>> not enforced in any way, so it would be easy for a savvy user to
>> circumvent these settings if they really wanted too. it is more
>> appropriate if we apply any plugins or transformations inside the code
>> so that we aren't relying on logic in macros/templates.
>
> I'm not so worried about blog owners circumventing the settings, but I
> agree that we should move away from logic in macros/templates wherever
> possible.
>
>
>> So I think we need to fix these things first, then I'd like to add a
>> more flexible way of applying custom formatting options to comments. #2
>> is very easy to fix since we can just put that logic in the comment
>> wrapper class. fixing #1 is more tricky.
>>
>> For #1 I would like to introduce the concept of pluggable
>> CommentFormatter classes. Basically, these would work exactly like
>> CommentValidators except that instead of just checking the comment for
>> some kind of illegal content they would actual be performing
>> transformations on the content, much like our weblog entry plugins do
>> now. So site admins would be able to define what comment formatters are
>> available to the application and enable/disable them globally to make
>> them apply to comments. We would track what formatters to apply to a
>> given comment by adding a new column in the db for 'formatters' and
>> listing which formatters to apply to each comment.
>>
>> We would start by defining formatters to replace our current settings
>> for autoformatting and escaping html and via a simple upgrade we can set
>> these formatters to all existing comments if those settings are enabled.
>>
>> Then moving forward it's very easy to add formatters to the application
>> and enable/disable them whenever the admin wants. Since the formatters
>> are tracked with the comments it is possible to disable a formatter from
>> being used on new comments but have it still available to apply to old
>> comments. This also provides a very simple way for site admins to
>> create custom formatters to suit whatever their needs may be, for
>> example you could do a wiki syntax comment formatter if you wanted.
>
> I think you are on the right track. Are you going to propose this for
> 4.0 or later?
I would like to get it done for 4.0 because it's one of the things that
we want to get fixed up on our site and I already have the work done in
my workspace. I think the work is small enough that it doesn't really
need a full proposal, this is the list of changes and additions ...
1. Added new column 'plugins' to roller_comment table, varchar(255).
2. Added new attribute 'plugins' to WeblogEntryComment pojo, including
mapping elements in jpa and hibernate mapping files.
3. Added new interface WeblogEntryCommentPlugin in
business.plugins.comment package. Interface has 1 method,
render(comment) which returns a string of the transformed content.
5. Added 1 new comment plugin, AutoformatPlugin which takes plain text
and formats it using html. This is basically a copy of the convert
linebreaks entry plugin as far as functionality goes.
6. Added 2 new methods to PluginManager and PluginManagerImpl.
applyCommentPlugins(comment) which returns a string, and
getCommentPlugins() which returns the list of plugins.
7. Added new property 'comments.plugins.classnames' in roller.properties
which lists the configured comment plugin classes.
8. Added new runtime property 'users.comments.plugins' which is a comma
separated list of the plugins which are enabled.
9. Modified CommentServlet to basically call
comment.setPlugins(runtimeConfig.getProp('users.comments.plugins')).
This basically applies whatever comment plugins are configured to the
specific comment as it gets saved.
10. Modified Comment pojo wrapper so that the call to getContent()
actually applies plugins before returning output.
11. Unit test for AutoformatPlugin.
Things that I still need to do ...
* Add to the current upgrade process to set roller_comment.plugins =
'AutoFormat Plugin' if the installation is currently using comment
autoformatting.
* Add a way for site admins to enable/disable comment plugins via the
global settings page.
* Cleanup current comment macros and move logic out of macro and into
pojo wrapper. We already to html escaping in the pojo wrapper, but we
also need to do a couple other things.
That's it. Pretty straight forward really.
-- Allen
>
> - Dave
Re: comment formatting and escaping is a bit kludgy
Posted by Dave <sn...@gmail.com>.
On 7/9/07, Allen Gilliland <al...@sun.com> wrote:
> i am looking at some ways to improve the comment system and provide
> greater flexibility in how comment content is handled when it's
> submitted, particularly in the realm of html vs. non-html comment
> posting. xss is becoming so common now that allowing anonymous users to
> post html comments is pretty dangerous and so I am trying to find some
> better ways to allow for comment formatting to take place on non-html
> comments.
>
> after looking through what we have right now i think we need to fix a
> couple issues. the main problem is with the current autoformat and
> escape html comment settings.
>
> 1. these settings are global settings and apply to all comments on all
> blogs, however they are not tracked on a per comment basis and this
> makes it nearly impossible to change these settings without messing up
> old comments. for example, if you start your site by allowing html in
> comments, then disable html and enable autoformatting then it's going to
> apply that to all your old comments which don't need it and make things
> look pretty ugly :/ same situation applies in reverse as well. if you
> start by using autoformatting and then disable it then all your old
> comments are going to show up unformatted.
Yes. That's a pretty serious problem.
> 2. these settings are applied in velocity templates and macros and are
> not enforced in any way, so it would be easy for a savvy user to
> circumvent these settings if they really wanted too. it is more
> appropriate if we apply any plugins or transformations inside the code
> so that we aren't relying on logic in macros/templates.
I'm not so worried about blog owners circumventing the settings, but I
agree that we should move away from logic in macros/templates wherever
possible.
> So I think we need to fix these things first, then I'd like to add a
> more flexible way of applying custom formatting options to comments. #2
> is very easy to fix since we can just put that logic in the comment
> wrapper class. fixing #1 is more tricky.
>
> For #1 I would like to introduce the concept of pluggable
> CommentFormatter classes. Basically, these would work exactly like
> CommentValidators except that instead of just checking the comment for
> some kind of illegal content they would actual be performing
> transformations on the content, much like our weblog entry plugins do
> now. So site admins would be able to define what comment formatters are
> available to the application and enable/disable them globally to make
> them apply to comments. We would track what formatters to apply to a
> given comment by adding a new column in the db for 'formatters' and
> listing which formatters to apply to each comment.
>
> We would start by defining formatters to replace our current settings
> for autoformatting and escaping html and via a simple upgrade we can set
> these formatters to all existing comments if those settings are enabled.
>
> Then moving forward it's very easy to add formatters to the application
> and enable/disable them whenever the admin wants. Since the formatters
> are tracked with the comments it is possible to disable a formatter from
> being used on new comments but have it still available to apply to old
> comments. This also provides a very simple way for site admins to
> create custom formatters to suit whatever their needs may be, for
> example you could do a wiki syntax comment formatter if you wanted.
I think you are on the right track. Are you going to propose this for
4.0 or later?
- Dave