Posted to user@struts.apache.org by egetchell <er...@distributedlogic.com> on 2008/10/06 21:11:09 UTC

Using POSIX Regular Expressions for Internationalized Validation

All,

I am one of the architects behind a multi-language site using Struts 2. To
mitigate XSS exposure, defining a whitelist of allowable characters is the
normal approach, but it seems to become a non-trivial exercise when supporting
multiple languages (we will be supporting 15). My understanding is that
POSIX-based regular expressions allow language-specific character sets to be
combined in a single regular expression. This is my first foray into the
subject of multi-language validation, so I apologize in advance if this is
not the correct forum for these two questions:

First, does Struts 2 support POSIX regular expressions? I’ve made a bunch
of attempts to get even a simple example working and have had little luck.
From my research, I believe the following is expected to work:

<field-validator type="regex">

<![CDATA[\\p{Alpha}*]]>

<message>Invalid Field</message>
</field-validator>

Secondly, as a more general (and possibly non-Struts 2-specific) question,
has anyone had experience in implementing multi-language whitelists? The
information on the Internet is quite vague, so I’m not sure if this is still
a black art making people a bit more close-lipped on the subject.

Thanks!

Eric Getchell | Sr. Technologist

Distributed Logic Corporation
600 Unicorn Park
Woburn, MA 01801
Email: eric.getchell@distributedlogic.com

--
View this message in context: http://www.nabble.com/Using-POSIX-Regular-Expressions-for-Internationalized-Validation-tp19844314p19844314.html
Sent from the Struts - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@struts.apache.org
For additional commands, e-mail: user-help@struts.apache.org

Re: Using POSIX Regular Expressions for Internationalized Validation

Posted by Greg Lindholm <gl...@yahoo.com>.

I can see that approach working for certain restrictive fields, like their
postal code example, but as you are finding, it's pretty unworkable in
multi-language Unicode applications.  I've always had to deal with input
fields for notes, comments, descriptions, etc., where there are no
restrictions and special HTML characters like '<' and '>' are allowed.  With
these you have no choice but to escape properly.

If you want to validate in Java you can use Character.isLetter(),
Character.isDigit(), org.apache.commons.lang.StringUtils, etc., and these
work for all Unicode languages.  But trying to do this with declarative
validation using regular expressions... good luck.
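
As a rough illustration of the Java-side check described above, the sketch
below tests every code point with Character.isLetter(), Character.isDigit(),
and Character.isSpaceChar(). The class and method names are hypothetical, and
allowing spaces is an assumption made for the example, not something taken
from this thread.

```java
// Hypothetical helper; only the java.lang.Character calls are real API.
final class LetterCheck {
    private LetterCheck() {}

    /**
     * Returns true if every code point in the input is a Unicode letter,
     * a Unicode digit, or a space character. Returns false for null.
     */
    static boolean isLettersDigitsOrSpaces(String input) {
        if (input == null) {
            return false;
        }
        // codePoints() iterates full code points, so characters outside
        // the Basic Multilingual Plane are classified correctly.
        return input.codePoints().allMatch(cp ->
                Character.isLetter(cp)
                        || Character.isDigit(cp)
                        || Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        // German text with umlaut and sharp s, written with Unicode escapes.
        System.out.println(isLettersDigitsOrSpaces("Gr\u00fc\u00dfe 2008"));
        // Three CJK ideographs.
        System.out.println(isLettersDigitsOrSpaces("\u65e5\u672c\u8a9e"));
        // Markup characters are neither letters nor digits.
        System.out.println(isLettersDigitsOrSpaces("<script>"));
    }
}
```

Because the check relies on Unicode character categories rather than a
per-language list, one method covers every script. Note that an empty string
passes this check, so a required-field rule would still be needed separately.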

egetchell wrote:
> 
> Greg,
> 
> Thanks for the reply.
> 
> The common approach for mitigating XSS is to provide a blacklist of
> XSS-enabling characters; examples would include "<", ">", "%3f", etc.  However,
> these filters are easily bypassed by clever encoding constructs, so the
> blacklist concept quickly fails and the site is open for attack.
> 
> By inverting the solution and supplying only the allowed set of
> characters, the site remains secure no matter what clever encoding scheme
> someone dreams up.
> 
> The OWASP group provides some pretty extensive documentation around this.
> Here is a direct link to some common validation strategies:
> http://www.owasp.org/index.php/Data_Validation#Data_Validation_Strategies
> 
> Their document, as a whole, is a very interesting read.
> 
> 
> Greg Lindholm wrote:
>> 
>> Sorry, I've never heard of whitelisting of allowable characters as being
>> a "normal" approach. <Remainder Removed> 
>> 
> 
> 


RE: Using POSIX Regular Expressions for Internationalized Validation

Posted by Jishnu Viswanath <ji...@tavant.com>.

Hi,
	I don't know exactly how you want to unescape something you escaped,
but whatever ends up in the text field is read from the getter, so you
could decode it there:
public String getFieldName(){
	//TODO: Decode/Unescape here
	return this.fieldName;
}

Regards,

Jishnu Viswanath

Software Engineer

*(+9180)41190300 - 222(Ext) ll * ( + 91 ) 9731209330ll

Tavant Technologies Inc.,

www.tavant.com

PEOPLE :: PASSION :: EXCELLENCE

-----Original Message-----
From: egetchell [mailto:eric.getchell@distributedlogic.com] 
Sent: Wednesday, October 08, 2008 1:48 AM
To: user@struts.apache.org
Subject: Re: Using POSIX Regular Expressions for Internationalized
Validation


Any comments or statements made in this email are not necessarily those of Tavant Technologies.
The information transmitted is intended only for the person or entity to which it is addressed and may 
contain confidential and/or privileged material. If you have received this in error, please contact the 
sender and delete the material from any computer. All e-mails sent from or to Tavant Technologies 
may be subject to our monitoring procedures.


Re: Using POSIX Regular Expressions for Internationalized Validation

Posted by egetchell <er...@distributedlogic.com>.

That’s an interesting approach you guys are proposing.  

I did a quick proof of concept where I coded an Interceptor that uses the
Apache Commons StringEscapeUtils.escapeHtml function to update all incoming
parameter values.  This seems to implement what you guys suggested.  

What is your approach for then displaying this data?  For example, in my
proof of concept, when I escape Japanese Shift-JIS input, the escaped values
are persisted to the database, and rendered to the browser in the escaped
format.  Do you unescape the data prior to persisting it (as it did pass
validation), or do you have special logic in the actions that will unescape
all properties prior to the JSP page rendering the data? 
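
One way to see why escaping incoming parameters can lead to escaped text on
screen is to run the same escaper twice, once at input and once at output.
The sketch below uses a simplified stand-in escaper (hypothetical, not the
Commons Lang implementation). It assumes the escaper turns non-ASCII
characters into numeric entities, which would be consistent with the Japanese
input being stored in escaped form.

```java
// Hypothetical demo; escapeHtml here is a simplified stand-in, not the
// Apache Commons Lang StringEscapeUtils implementation.
final class EscapeTimingDemo {
    private EscapeTimingDemo() {}

    /** Escapes markup characters and writes non-ASCII code points as numeric entities. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp > 0x7F) {
                        // Assumption: non-ASCII becomes a decimal numeric entity.
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\u65e5\u672c <b>";

        // Escaping at the input boundary: this is what would be persisted.
        String storedEscaped = escapeHtml(raw);
        // Escaping again when rendering: the entities themselves get escaped,
        // so the browser would show the entity text instead of the characters.
        String renderedTwice = escapeHtml(storedEscaped);
        // Storing raw text and escaping only when rendering.
        String renderedOnce = escapeHtml(raw);

        System.out.println("stored (escaped at input): " + storedEscaped);
        System.out.println("rendered (escaped twice):  " + renderedTwice);
        System.out.println("rendered (escaped once):   " + renderedOnce);
    }
}
```

Under these assumptions, keeping the raw text in the database and escaping
exactly once, at the point where it is written into HTML, avoids the
double-escaped output.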

Eric

Laurie Harper wrote:
> 
> The validation strategy you cite is well and good when you *have* 'a
> set of tightly constrained known good values.' It's not useful in the
> general case.
> 
> Your concerns with respect to XSS should only present a problem if you
> need to render untrusted HTML (such as is often the case with web-based
> email applications, for example). Unless you need to preserve
> user-submitted HTML, though, the correct answer is, as Greg said, to
> HTML-escape all user-supplied data (or at least, all user-supplied data
> you haven't previously sanitized via strategies such as you referenced).
> 
> If you do that, the browser will never see anything harmful in a context
> it will treat as anything other than text (i.e. it will never try to
> interpret such data as markup) and therefore you won't be vulnerable.
> 
> L.
> 
> 


Re: Using POSIX Regular Expressions for Internationalized Validation

Posted by Laurie Harper <la...@holoweb.net>.

The validation strategy you cite is well and good when you *have* 'a
set of tightly constrained known good values.' It's not useful in the
general case.

Your concerns with respect to XSS should only present a problem if you
need to render untrusted HTML (such as is often the case with web-based
email applications, for example). Unless you need to preserve
user-submitted HTML, though, the correct answer is, as Greg said, to
HTML-escape all user-supplied data (or at least, all user-supplied data
you haven't previously sanitized via strategies such as you referenced).

If you do that, the browser will never see anything harmful in a context
it will treat as anything other than text (i.e. it will never try to
interpret such data as markup) and therefore you won't be vulnerable.

L.

egetchell wrote:
> Greg,
> 
> Thanks for the reply.
> 
> The common approach for mitigating XSS is to provide a blacklist of
> XSS-enabling characters; examples would include "<", ">", "%3f", etc.  However,
> these filters are easily bypassed by clever encoding constructs, so the
> blacklist concept quickly fails and the site is open for attack.
> 
> By inverting the solution and supplying only the allowed set of characters,
> the site remains secure no matter what clever encoding scheme someone dreams
> up.
> 
> The OWASP group provides some pretty extensive documentation around this.
> Here is a direct link to some common validation strategies:
> http://www.owasp.org/index.php/Data_Validation#Data_Validation_Strategies
> 
> Their document, as a whole, is a very interesting read.
> 
> 
> Greg Lindholm wrote:
>> Sorry, I've never heard of whitelisting of allowable characters as being a
>> "normal" approach. <Remainder Removed> 
>>
> 


RE: Using POSIX Regular Expressions for Internationalized Validation

Posted by egetchell <er...@distributedlogic.com>.

Everything is clear now.  I had misunderstood the previous posts as
suggestions to physically alter the user’s input via HTML
escaping/unescaping at the boundary layer of Struts.  

I totally agree with this XSS mitigation approach and have added the
escape="true" attribute to all our property tags and verified we don’t use
the text tag anywhere.

No issues with SQL injection as we’ve been using prepared statements in
Hibernate from day one.

Thanks all!


Brad A Cupit wrote:
> 
> From: egetchell [mailto:eric.getchell@distributedlogic.com] 
> Sent: Wednesday, October 08, 2008 11:56 AM
>> The one thing I noticed is that this escaped
>> data is not translated back to the character
>> set when fed into an input field.  
> 
> Perhaps this is an oversimplification, but could you just persist the
> raw, unescaped text that the user inputs, then use something like this:
> 
> <s:property value="%{rawText}" escape="true"/>
>   -- or --
> <c:out value="${rawText}" escapeXml="true"/>
> 
> For text fields you could then just use the rawText unescaped and it would
> be exactly the way the user entered it.
> 
> Looking back in the history for this post, this idea is basically what
> Greg Lindholm suggested [1].
> 
> To reword what he also said about SQL injection:
> Just use PreparedStatements with '?' placeholders (or Hibernate, or some
> other library which will protect you from SQL injection attacks).
> 
> [1]
> http://www.nabble.com/Using-POSIX-Regular-Expressions-for-Internationalized-Validation-td19844314.html#a19858027
> 
> Brad Cupit
> Louisiana State University - UIS
> 
> 


RE: Using POSIX Regular Expressions for Internationalized Validation

Posted by Brad A Cupit <br...@lsu.edu>.

From: egetchell [mailto:eric.getchell@distributedlogic.com] 
Sent: Wednesday, October 08, 2008 11:56 AM
> The one thing I noticed is that this escaped
> data is not translated back to the character
> set when fed into an input field.  

Perhaps this is an oversimplification, but could you just persist the raw, unescaped text that the user inputs, then use something like this:

<s:property value="%{rawText}" escape="true"/>
  -- or --
<c:out value="${rawText}" escapeXml="true"/>

For text fields you could then just use the rawText unescaped and it would be exactly the way the user entered it.

Looking back in the history for this post, this idea is basically what Greg Lindholm suggested [1].

To reword what he also said about SQL injection:
Just use PreparedStatements with '?' placeholders (or Hibernate, or some other library which will protect you from SQL injection attacks).
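
A minimal sketch of the '?' placeholder approach using plain JDBC follows.
The table and column names (comments, author, body) and the class name are
hypothetical, and the caller is assumed to supply an open Connection; only the
java.sql calls are real API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical query helper; schema names are made up for illustration.
final class CommentQueries {
    private CommentQueries() {}

    // The user-supplied value never becomes part of the SQL text;
    // it is bound to the '?' placeholder instead.
    static final String FIND_BY_AUTHOR =
            "SELECT body FROM comments WHERE author = ?";

    /** Returns the bodies of all comments written by the given author. */
    static List<String> findBodiesByAuthor(Connection conn, String author)
            throws SQLException {
        List<String> bodies = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_AUTHOR)) {
            // Parameter indexes are 1-based in JDBC.
            ps.setString(1, author);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    bodies.add(rs.getString("body"));
                }
            }
        }
        return bodies;
    }

    public static void main(String[] args) {
        // No database is assumed here; this only shows the statement text.
        System.out.println(FIND_BY_AUTHOR);
    }
}
```

With this shape the raw, unescaped text can be stored as entered, since
quotes or other special characters in the author value are passed as data
rather than concatenated into the statement.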

[1] http://www.nabble.com/Using-POSIX-Regular-Expressions-for-Internationalized-Validation-td19844314.html#a19858027

Brad Cupit
Louisiana State University - UIS

RE: Using POSIX Regular Expressions for Internationalized Validation

Posted by egetchell <er...@distributedlogic.com>.

I did determine why POSIX regular expressions did not seem to be working.
The server-side Java-based validations work correctly; it was the
client-side JavaScript implementation that was failing when constructing the
regular expression.  From my brief investigation into this, it would seem
that the JavaScript engine in the browser uses a slightly different
regular expression dialect than Java.
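
For the Java side, the sketch below compares three patterns against English
and Japanese input. It relies on java.util.regex documenting the POSIX-style
class \p{Alpha} as US-ASCII only unless the UNICODE_CHARACTER_CLASS flag
(Java 7 and later) is set, while \p{L} is the Unicode "letter" category. The
class and method names are hypothetical. As far as I know, JavaScript regular
expressions of that period did not recognise \p{...} classes at all, which
may be what the client-side failure came down to, but that is an assumption
rather than something verified in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the java.util.regex calls are real API.
final class RegexClassDemo {
    private RegexClassDemo() {}

    // POSIX-style class: ASCII letters only by default.
    // In Java source the backslash is doubled by string-literal escaping.
    static final Pattern POSIX_ALPHA = Pattern.compile("\\p{Alpha}*");

    // Same expression with UNICODE_CHARACTER_CLASS, which widens the
    // POSIX-style classes to their Unicode equivalents.
    static final Pattern POSIX_ALPHA_UNICODE =
            Pattern.compile("\\p{Alpha}*", Pattern.UNICODE_CHARACTER_CLASS);

    // Unicode general category "Letter": letters in any script.
    static final Pattern UNICODE_LETTER = Pattern.compile("\\p{L}*");

    /** Returns true only if the whole input matches the pattern. */
    static boolean fullyMatches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String english = "Validation";
        String japanese = "\u65e5\u672c\u8a9e"; // three CJK ideographs
        Pattern[] patterns = {POSIX_ALPHA, POSIX_ALPHA_UNICODE, UNICODE_LETTER};
        for (Pattern p : patterns) {
            System.out.println(p.pattern() + " flags=" + p.flags()
                    + " english=" + fullyMatches(p, english)
                    + " japanese=" + fullyMatches(p, japanese));
        }
    }
}
```

For a multi-language whitelist, the Unicode category form is likely the more
useful starting point, since one expression then covers letters from every
supported language.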

Back to the general XSS mitigation approach: I am curious about others'
experiences with HTML escaping.  It would seem that I need to know the
context in which a piece of data is being used for this technique to work
correctly.  I made a second attempt at the HTML encoding, which was to
*only* HTML-escape the data being fed to the UI.  The one thing I noticed is
that this escaped data is not translated back to the character set when fed
into an input field.  So, HTML-escaped Shift-JIS data displays correctly in
static HTML, but remains as the escaped values when loaded into an input
field.  I didn’t find any Struts 2 tag option to unescape data, implying
that I would need to conditionally encode the data going to the UI based on
the context in which it is to be used.  It strikes me that this would
handcuff you when implementing a boundary solution, as the boundary layer
should not know how the piece of data is being used.

Jishnu Viswanath wrote:
> 
> Hey egetchell,
> 	Don't know whether that's your name, but anyway.
> I don't know if this is the solution you are looking for:
> 	<field name="nameOfTheField">
> 		<field-validator type="typeOfValidator">
> 			<message key="error.validation.regexp"/>
> 		</field-validator>
> 	</field>
> 
> Now you need to map the validator.
> Put a validators.xml in the resources folder, the same folder where
> struts.xml exists:
> 
> 
> <validators>
>     <validator name="typeOfValidator" class="package.ClassName"/>
> </validators>
> 
> ClassName should extend RegexFieldValidator.
> Override the validate method and do whatever you want there. This should work.
> 
> 


RE: Using POSIX Regular Expressions for Internationalized Validation

Posted by Jishnu Viswanath <ji...@tavant.com>.

Hey egetchell,
	Don't know whether that's your name, but anyway.
I don't know if this is the solution you are looking for:
	<field name="nameOfTheField">
		<field-validator type="typeOfValidator">
			<message key="error.validation.regexp"/>
		</field-validator>
	</field>

Now you need to map the validator.
Put a validators.xml in the resources folder, the same folder where
struts.xml exists:

<validators>
    <validator name="typeOfValidator" class="package.ClassName"/>
</validators>

ClassName should extend RegexFieldValidator.
Override the validate method and do whatever you want there. This should work.

Regards,

Jishnu Viswanath


-----Original Message-----
From: egetchell [mailto:eric.getchell@distributedlogic.com] 
Sent: Tuesday, October 07, 2008 8:02 PM
To: user@struts.apache.org
Subject: Re: Using POSIX Regular Expressions for Internationalized
Validation


Re: Using POSIX Regular Expressions for Internationalized Validation

Posted by egetchell <er...@distributedlogic.com>.

Greg,

Thanks for the reply.

The common approach for mitigating XSS is to provide a blacklist of
XSS-enabling characters; examples would include "<", ">", "%3f", etc.  However,
these filters are easily bypassed by clever encoding constructs, so the
blacklist concept quickly fails and the site is open for attack.  

By inverting the solution and supplying only the allowed set of characters,
the site remains secure no matter what clever encoding scheme someone dreams
up.  

The OWASP group provides some pretty extensive documentation around this. 
Here is a direct link to some common validation strategies:
http://www.owasp.org/index.php/Data_Validation#Data_Validation_Strategies

Their document, as a whole, is a very interesting read.

Greg Lindholm wrote:
> 
> Sorry, I've never heard of whitelisting of allowable characters as being a
> "normal" approach. <Remainder Removed> 
> 


Re: Using POSIX Regular Expressions for Internationalized Validation

Posted by Greg Lindholm <gl...@yahoo.com>.

Sorry, I've never heard of whitelisting of allowable characters as being a
"normal" approach.
I've developed many multi-language web applications, some with Struts (1 &
2) and some without.
Typically you have to watch for two things: 1) When re-displaying anything a
user has entered, you need to ensure it is properly escaped for HTML; the
Struts <s:property> tag nicely takes care of this for you. 2) Prevent SQL
injection by using compiled, parametrized SQL statements (as Hibernate will
do for you).


egetchell wrote:
> 
> All,
> 
> I am one of the architects behind a multi-language site using Struts 2.
> To mitigate XSS exposure, defining a whitelist of allowable characters is
> the normal approach, but it seems to become a non-trivial exercise when
> supporting multiple languages (we will be supporting 15).  My
> understanding is that POSIX-based regular expressions allow
> language-specific character sets to be combined in a single regular
> expression.  This is my first foray into the subject of multi-language
> validation, so I apologize in advance if this is not the correct forum for
> these two questions:
> 
> First, does Struts 2 support POSIX regular expressions?  I’ve made a
> bunch of attempts to get even a simple example working and have had little
> luck.  From my research, I believe the following is expected to work:
> 
> <field-validator type="regex">
>   
>     <![CDATA[\\p{Alpha}*]]>
>   
>   <message>Invalid Field</message>
> </field-validator>
> 
> Secondly, as a more general (and possibly non-Struts 2-specific) question,
> has anyone had experience in implementing multi-language whitelists?  The
> information on the Internet is quite vague, so I’m not sure if this is
> still a black art making people a bit more close-lipped on the subject.
> 
> Thanks!
> 
> Eric Getchell | Sr. Technologist
> 
> Distributed Logic Corporation
> 600 Unicorn Park
> Woburn, MA 01801
> Email: eric.getchell@distributedlogic.com
> 
> 
