Posted to users@tapestry.apache.org by Christian Edward Gruber <ch...@gmail.com> on 2009/05/07 22:26:47 UTC

Encoding with exceptions

Hi,

     I was considering how to write a minimal markup component: that
is, a component that encodes raw text into safe text much the same way
MarkupWriterImpl passes text to a Text node, which encodes HTML
entities etc., but in this case letting a few of them pass.

     One simple way is just to create an <t:output/>-like component
which pre-encodes before passing on the text to the MarkupWriter, but
I think that's sort of a cheap hack.  Can I contribute an alternate
MarkupWriter implementation with a marker annotation that I can obtain
for beginRender()?

     The use case I'm trying to solve (without doing a lot of extra
code/storage) is to allow extremely minimal markup through -
specifically <p>, <strong>, <em>, <ul>, <dl>, <ol>, <li>, <table>,
<tr>, <th>, <td>.  It would disallow all other markup, and strip
out any style, class, or id attributes.  It's to allow a bit of (safe)
data entry that can include some rendering hints.

     Probably my first go will be to create a component that
pre-encodes and does a MarkupWriter.writeRaw() with the results, but it
feels like a bad hack.
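
     A minimal sketch of that pre-encoding step, in plain Java with no
Tapestry dependency.  The class name MinimalMarkupEncoder and its
methods are hypothetical, and it assumes it is acceptable to leave any
tag that carries attributes encoded rather than stripping the
attributes.  Everything is escaped first, and only the exact bare tags
from the list above are restored; the result is what would be handed
to MarkupWriter.writeRaw():

```java
import java.util.regex.Pattern;

public class MinimalMarkupEncoder {

    // Escaped form of the bare tags listed in the post (open or close, no attributes).
    private static final Pattern ALLOWED = Pattern.compile(
        "&lt;(/?)(p|strong|em|ul|dl|ol|li|table|tr|th|td)&gt;");

    // Encode the characters that are significant in HTML text and attributes.
    public static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escape everything, then turn only the allowed bare tags back into markup.
    public static String encode(String raw) {
        return ALLOWED.matcher(escape(raw)).replaceAll("<$1$2>");
    }

    public static void main(String[] args) {
        System.out.println(encode(
            "<p>Hi <strong>there</strong></p><script>alert(1)</script>"));
        // prints: <p>Hi <strong>there</strong></p>&lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```

     With this approach <p class="x"> comes out as visible text rather
than being cleaned up, and unbalanced tags are not repaired; doing
either would need a real parser.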

cheers,
Christian.

Christian Edward Gruber
e-mail: christianedwardgruber@gmail.com
weblog: http://www.geekinasuit.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tapestry.apache.org
For additional commands, e-mail: users-help@tapestry.apache.org


Re: Encoding with exceptions

Posted by Christian Edward Gruber <ch...@gmail.com>.
That's also possible.  I could do that and then render... hmm...

Christian.

On 8-May-09, at 03:18 , Otho wrote:

> Is html input mandatory? If not, how about using bbcode or some wiki
> markup language?


Re: Encoding with exceptions

Posted by Otho <ta...@googlemail.com>.
Is html input mandatory? If not, how about using bbcode or some wiki markup
language?

2009/5/7 Christian Edward Gruber <ch...@gmail.com>

> Yeah - I don't, at least not yet.  I probably will use such an editor
> later, but I need a protected output system so I'm not using <t:outputRaw />
> since that's quite dangerous when writing from a database.  I want to make
> sure that even if bad data got in, it can't come out as an XSS attack or
> something.  But I need to let out a titch of markup.
>
> Christian.

Re: Encoding with exceptions

Posted by Christian Edward Gruber <ch...@gmail.com>.
Yeah - I don't, at least not yet.  I probably will use such an editor
later, but I need a protected output system so I'm not using
<t:outputRaw /> since that's quite dangerous when writing from a
database.  I want to make sure that even if bad data got in, it can't
come out as an XSS attack or something.  But I need to let out a titch
of markup.

Christian.

On 7-May-09, at 17:28 , Martin Strand wrote:

> If you need to parse html input, from a rich text editor, a remote  
> website, uploaded documents, etc, I would recommend nekohtml:
> http://nekohtml.sourceforge.net/
>
> It cleans up broken html and you can easily add a filter to only  
> allow certain tags:
> http://nekohtml.sourceforge.net/filters.html
>
> Martin



Re: Encoding with exceptions

Posted by Howard Lewis Ship <hl...@gmail.com>.
Cool; I tend to do a lot of that from Ruby, using Hpricot.

On Thu, May 7, 2009 at 2:28 PM, Martin Strand
<do...@gmail.com> wrote:
> If you need to parse html input, from a rich text editor, a remote website, uploaded documents, etc, I would recommend nekohtml:
> http://nekohtml.sourceforge.net/
>
> It cleans up broken html and you can easily add a filter to only allow certain tags:
> http://nekohtml.sourceforge.net/filters.html
>
> Martin



-- 
Howard M. Lewis Ship

Creator of Apache Tapestry
Director of Open Source Technology at Formos



Re: Encoding with exceptions

Posted by Martin Strand <do...@gmail.com>.
If you need to parse html input, from a rich text editor, a remote website, uploaded documents, etc, I would recommend nekohtml:
http://nekohtml.sourceforge.net/

It cleans up broken html and you can easily add a filter to only allow certain tags:
http://nekohtml.sourceforge.net/filters.html

Martin

On Thu, 07 May 2009 22:58:56 +0200, Howard Lewis Ship <hl...@gmail.com> wrote:

> I'd tend to do this on the other end, if possible; parse user input
> (or RSS feed, or whatever) into XML and transform out the content you
> don't like, then store that in DB or render it raw.



Re: Encoding with exceptions

Posted by Howard Lewis Ship <hl...@gmail.com>.
I'd tend to do this on the other end, if possible; parse user input
(or RSS feed, or whatever) into XML and transform out the content you
don't like, then store that in DB or render it raw.
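
A rough sketch of that input-side filtering, using only the JDK's XML
APIs.  The class name InputMarkupFilter and the allowed-tag set (taken
from the original post) are assumptions, as is the choice to drop
every attribute rather than only style, class, and id.  It requires
well-formed input; sloppy real-world HTML would need a more forgiving
parser in front of it:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InputMarkupFilter {

    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
        "p", "strong", "em", "ul", "dl", "ol", "li", "table", "tr", "th", "td"));

    // Parse a well-formed fragment, prune it, and return the cleaned markup.
    public static String clean(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so external entities cannot be pulled in.
            factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                new InputSource(new StringReader("<root>" + fragment + "</root>")));
            prune(doc.getDocumentElement());

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(out));

            // Strip the synthetic <root> wrapper again.
            String xml = out.toString();
            int start = xml.indexOf('>') + 1;
            int end = xml.lastIndexOf("</root>");
            return end < start ? "" : xml.substring(start, end);
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed markup", e);
        }
    }

    // Keep text and allowed elements (minus attributes); remove everything else.
    private static void prune(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && ALLOWED.contains(child.getNodeName())) {
                Element e = (Element) child;
                NamedNodeMap attrs = e.getAttributes();
                while (attrs.getLength() > 0) {
                    e.removeAttribute(attrs.item(0).getNodeName());
                }
                prune(child);
            } else if (child.getNodeType() != Node.TEXT_NODE) {
                parent.removeChild(child); // disallowed element, comment, etc.
            }
            child = next;
        }
    }
}
```

Here a disallowed element is dropped together with its contents, so
clean("<p class=\"x\">Hi <script>alert(1)</script></p>") should leave
just the paragraph and its text.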



