Posted to modperl@perl.apache.org by Clinton Gormley <cl...@traveljury.com> on 2007/06/26 16:22:48 UTC

Config::Loader and HTML::StripScripts

Hi all

I've recently released two modules to CPAN which are of relevance to
mod_perl developers, one as the author and one as the maintainer.

I realise this is a blatant plug, but these modules have been useful to
me in my web-app work, and so there is a good chance that they will be
useful to others.

Config::Loader: 
---------------
 - loads a configuration directory tree (with files containing data in 
   YAML, JSON, XML, Config::General, INI or Perl)

 - allows you to merge in local config (for instance when working
   on a dev machine instead of in production) without accidentally 
   affecting your main config

 - makes the most of shared memory by loading all your config data
   at startup

 - OO or functional interface

 - optional Template Toolkit style key retrieval eg 
     $host = C('app.db.host.1')

 - callbacks to allow you to customise the loading process
   to suit your needs 

    http://search.cpan.org/~drtech/Config-Loader-1.11/
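
To illustrate two of the ideas listed above (merging a local config over the
main one, and dotted-key retrieval), here is a minimal sketch in plain Perl.
It is not Config::Loader's code or API: the helper names merge_config and
lookup, and the sample data, are hypothetical and only show the general
technique.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $main with $local merged over it.
# Nested hashes are merged key by key; any other value in $local replaces
# the value from $main. $main itself is never modified.
sub merge_config {
    my ($main, $local) = @_;
    my %merged = %$main;
    for my $key (keys %$local) {
        if (ref($merged{$key}) eq 'HASH' && ref($local->{$key}) eq 'HASH') {
            $merged{$key} = merge_config($merged{$key}, $local->{$key});
        }
        else {
            $merged{$key} = $local->{$key};
        }
    }
    return \%merged;
}

# Hypothetical helper: follow a dotted path such as 'app.db.host.1' through
# nested hashes and arrays; returns nothing (undef) if the path is missing.
sub lookup {
    my ($data, $path) = @_;
    for my $key (split /\./, $path) {
        if (ref($data) eq 'HASH') {
            return unless exists $data->{$key};
            $data = $data->{$key};
        }
        elsif (ref($data) eq 'ARRAY') {
            return unless $key =~ /^\d+$/ && $key < @$data;
            $data = $data->[$key];
        }
        else {
            return;    # the path goes deeper than the data does
        }
    }
    return $data;
}

# Made-up data for illustration only
my $main = {
    app => { db => { host => [ 'db1.example.com', 'db2.example.com' ], port => 5432 } },
};
my $local = { app => { db => { port => 6543 } } };

my $config = merge_config($main, $local);
print lookup($config, 'app.db.host.1'), "\n";    # host list inherited from $main
print lookup($config, 'app.db.port'), "\n";      # port overridden by $local
```

Because the merge builds new hashes rather than writing into the main tree,
a dev-machine override cannot leak back into the production settings.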


HTML::StripScripts
------------------
 - used to strip XSS scripting from user submitted HTML

 - outputs valid HTML (cleans up nesting, context of tags etc)

 - handles the exploits listed at http://ha.ckers.org/xss.html

 - by default, configured to be safe

 - very customisable via rules including regexes and callbacks,
   e.g.
     - replace <font> tags with <style> tags
     - allow local hrefs only to certain paths in your site, etc.

   http://search.cpan.org/~drtech/HTML-StripScripts-1.00/

   use HTML::StripScripts::Parser to feed tokens to HTML::StripScripts
   http://search.cpan.org/~drtech/HTML-StripScripts-Parser-1.00/


I hope this helps others, and if anybody has any suggestions, please
feed them back to me

Clint


Re: Config::Loader and HTML::StripScripts

Posted by Clinton Gormley <cl...@traveljury.com>.
> Actually, something I feel would be very useful is if it could
> return an XML::LibXML::DocumentFragment object.
>
> I tend to use XML::LibXML to parse user input and insert it in the
> document, which then goes through some XSLT, and since you've
> already parsed stuff, it seems like a waste to parse again.


Ooooh - that sounds nasty :)

It sounds like it needs a subclass like HTML::StripScripts::Parser and
HTML::StripScripts::Regex

So XML::LibXML would do the parsing, then HTML::StripScripts::LibXML
would feed it token by token to HTML::StripScripts, which could then
return the parsed HTML, to be constructed into a DocumentFragment by the
subclass.

> 
> So that's my feature request! :-) 

If you send me an example of the interface you would like to use,
I'll see what I can do.

We'll take it off this list, because I feel that I have been
sufficiently off topic already

Clint
> 
> Cheers,
> 
> Kjetil


HTML::StripScripts::LibXML (was Config::Loader and HTML::StripScripts)

Posted by Clinton Gormley <cl...@traveljury.com>.
Kjetil Kjernsmo requested a front end to HTML::StripScripts that,
instead of returning HTML text, would return a LibXML Document or
DocumentFragment (i.e. a DOM tree).

I have released this as HTML::StripScripts::LibXML:
http://search.cpan.org/~drtech/HTML-StripScripts-LibXML-0.10/LibXML.pm

It handles messy HTML, strips out XSS, and gives you fine-grained
control of the HTML/XML nodes that are returned.

If you are interested in this, please give it a try, and give me some
feedback about how to improve it, options to add etc.

The main question mark I have is what to do with encoding - suggestions
welcome.

Also see my question at Perl Monks:
http://www.perlmonks.org/index.pl?node_id=624334

thanks

Clint

On Tue, 2007-06-26 at 16:34 +0200, Kjetil Kjernsmo wrote:
> On Tuesday 26 June 2007 16:22, Clinton Gormley wrote:
> >  - used to strip XSS scripting from user submitted HTML
> 
> Ooooh, cool! I haven't found any modules that do that well enough.
> 
> >  - outputs valid HTML (cleans up nesting, context of tags etc)
> >
> >  - handles the exploits listed at http://ha.ckers.org/xss.html
> 
> 
> Great!
> 
> > I hope this helps others, and if anybody has any suggestions, please
> > feed them back to me
> 
> Actually, something I feel would be very useful is if it could
> return an XML::LibXML::DocumentFragment object.
>
> I tend to use XML::LibXML to parse user input and insert it in the
> document, which then goes through some XSLT, and since you've
> already parsed stuff, it seems like a waste to parse again.
> 
> So that's my feature request! :-) 
> 
> Cheers,
> 
> Kjetil


Re: Config::Loader and HTML::StripScripts

Posted by Kjetil Kjernsmo <kj...@opera.com>.
On Tuesday 26 June 2007 16:22, Clinton Gormley wrote:
>  - used to strip XSS scripting from user submitted HTML

Ooooh, cool! I haven't found any modules that do that well enough.

>  - outputs valid HTML (cleans up nesting, context of tags etc)
>
>  - handles the exploits listed at http://ha.ckers.org/xss.html


Great!

> I hope this helps others, and if anybody has any suggestions, please
> feed them back to me

Actually, something I feel would be very useful is if it could
return an XML::LibXML::DocumentFragment object.

I tend to use XML::LibXML to parse user input and insert it in the
document, which then goes through some XSLT, and since you've
already parsed stuff, it seems like a waste to parse again.

So that's my feature request! :-) 

Cheers,

Kjetil
-- 
Kjetil Kjernsmo
Information Systems Developer
Opera Software ASA

Re: Config::Loader and HTML::StripScripts

Posted by Clinton Gormley <cl...@traveljury.com>.
I've been looking at how you would add object and embed tags, and it
isn't trivial.  They're not in there by default because of the nasty
things that they can do.  But I could add them in, along with flags to
specify that you want to allow them, much like AllowHref

I'll get back to you.

Again, I'll take this off the list now (until I have something to show
for it).

Jonathan, could you give me some sample code that you would like to
allow through?

thanks

Clint

> already doing that...
> 
> those are placed in object AND embed tags (i don't recall if embed  
> are off by default)
> regardless, it might make sense to mention them in the docs as  
> they're in a grey-area and something to be wary of when enabling  
> objects.
> 
> allowScriptAccess locks the flashplayer down - it can't call any js
> functions or do any document writes/etc.  without it, it's possible to
> have a .swf file that onload starts rewriting the page to load in
> external js files and then write them into the document body (thereby
> avoiding any js xss safeguards).  that's how a lot of old 'skinning'
> and 'tracking' was done - people would write mini-apps hidden in a
> 1x1 swf file that would manipulate the dom and do whatever data
> exchange is needed.  it can be pretty insidious.
> 
> allowNetworking, i think, disables what getURL can do.  i could be  
> wrong on that one, but i believe that is the command that locks down  
> what swf files can redirect browsers to ( same domain as html or any  
> or none )
> 
> 


Re: Config::Loader and HTML::StripScripts

Posted by Jonathan Vanasco <jv...@2xlp.com>.
On Jun 26, 2007, at 11:09 AM, Clinton Gormley wrote:
>>
>> 	allowScriptAccess="never"
>> 	allownetworking="internal"

> I don't know what those are :)
>
> <object> tags are removed by default, and you would still need to
> subclass HTML::StripScripts in order to allow those elements.
>
> The Rules (for safety's sake) are applied after the standard parsing has
> already happened, and objects are not allowed because they are just too
> risky. So if you want to do that, subclass the WHITELIST INITIALIZATION
> METHODS and add the relevant config in there.

already doing that...

those are placed in object AND embed tags (i don't recall if embed  
are off by default)
regardless, it might make sense to mention them in the docs as  
they're in a grey-area and something to be wary of when enabling  
objects.

allowScriptAccess locks the flashplayer down - it can't call any js
functions or do any document writes/etc.  without it, it's possible to
have a .swf file that onload starts rewriting the page to load in
external js files and then write them into the document body (thereby
avoiding any js xss safeguards).  that's how a lot of old 'skinning'
and 'tracking' was done - people would write mini-apps hidden in a
1x1 swf file that would manipulate the dom and do whatever data
exchange is needed.  it can be pretty insidious.

allowNetworking, i think, disables what getURL can do.  i could be  
wrong on that one, but i believe that is the command that locks down  
what swf files can redirect browsers to ( same domain as html or any  
or none )


// Jonathan Vanasco

| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  
- - - - - - - - - - - - - - - - - - -
|   CEO/Founder SyndiClick Networks
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  
- - - - - - - - - - - - - - - - - - -
|     Founder/CTO/CVO
|      FindMeOn.com - The cure for Multiple Web Personality Disorder
|      Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  
- - - - - - - - - - - - - - - - - - -
|      RoadSound.com - Tools For Bands, Stuff For Fans
|      Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  
- - - - - - - - - - - - - - - - - - -



Re: Config::Loader and HTML::StripScripts

Posted by Clinton Gormley <cl...@traveljury.com>.
On Tue, 2007-06-26 at 11:02 -0400, Jonathan Vanasco wrote:
> On Jun 26, 2007, at 10:22 AM, Clinton Gormley wrote:
> 
> > HTML::StripScripts
> 
> thanks!  I'm already a happy user.
> excited to check out the changelog.
> 
> does the new version automagically do the anti-xss flash embed  
> extensions that myspace had adobe put in?
> 	allowScriptAccess="never"
> 	allownetworking="internal"
> 
> in the old version, i need to do that manually.
> xss didn't launch with that, but I believe it's on the site now.

I don't know what those are :)

<object> tags are removed by default, and you would still need to
subclass HTML::StripScripts in order to allow those elements.

The Rules (for safety's sake) are applied after the standard parsing has
already happened, and objects are not allowed because they are just too
risky. So if you want to do that, subclass the WHITELIST INITIALIZATION
METHODS and add the relevant config in there.

After that, the full power of Rules is available to you

Clint