Posted to modperl@perl.apache.org by Clinton Gormley <cl...@traveljury.com> on 2007/06/26 16:22:48 UTC
Config::Loader and HTML::StripScripts
Hi all
I've recently released two modules to CPAN which are of relevance to
mod_perl developers, one as the author and one as the maintainer.
I realise this is a blatant plug, but these modules have been useful to
me in my web-app work, and so there is a good chance that they will be
useful to others.
Config::Loader:
---------------
- loads a configuration directory tree (with files containing data in
YAML, JSON, XML, Config::General, INI or Perl)
- allows you to merge in local config (for instance when working
on a dev machine instead of in production) without accidentally
affecting your main config
- makes the most of shared memory by loading all your config data
at startup (i.e. in the parent process, before Apache forks its children)
- OO or functional interface
- optional Template Toolkit style key retrieval eg
$host = C('app.db.host.1')
- callbacks to allow you to customise the loading process
to suit your needs
http://search.cpan.org/~drtech/Config-Loader-1.11/
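To make the "merge in local config" and dotted-key ideas above concrete, here is a minimal sketch in plain Perl. The helpers `merge_local` and `get_path` are hypothetical and are not Config::Loader's API. The sketch also assumes that a numeric path segment is a 0-based array index, which may not match how Config::Loader itself interprets `app.db.host.1`.

```perl
use strict;
use warnings;

# Hypothetical helper (not Config::Loader's API): recursively merge
# $local over $main and return a new hash. Nested hashes are merged key
# by key; any other value in $local (scalars, arrays) replaces the value
# in $main. Neither input hash is modified.
sub merge_local {
    my ($main, $local) = @_;
    my %merged = %$main;
    for my $key (keys %$local) {
        if (ref $merged{$key} eq 'HASH' && ref $local->{$key} eq 'HASH') {
            $merged{$key} = merge_local($merged{$key}, $local->{$key});
        }
        else {
            $merged{$key} = $local->{$key};
        }
    }
    return \%merged;
}

# Hypothetical helper: walk a dotted path such as 'app.db.host.1'.
# Each segment is a hash key, or a 0-based index when the current node
# is an array ref (an assumption). Returns undef if any step is missing.
sub get_path {
    my ($config, $path) = @_;
    my $node = $config;
    for my $part (split /\./, $path) {
        if (ref $node eq 'HASH') {
            return undef unless exists $node->{$part};
            $node = $node->{$part};
        }
        elsif (ref $node eq 'ARRAY') {
            return undef unless $part =~ /^\d+$/ && $part < @$node;
            $node = $node->[$part];
        }
        else {
            return undef;
        }
    }
    return $node;
}

my $main = {
    app => { db => { host => [ 'db1.example.com', 'db2.example.com' ], port => 3306 } },
};
my $local = { app => { db => { host => ['localhost'] } } };    # dev-machine override

my $config = merge_local($main, $local);
print get_path($config, 'app.db.host.0'), "\n";    # localhost
print get_path($config, 'app.db.port'),   "\n";    # 3306
print get_path($main,   'app.db.host.1'), "\n";    # db2.example.com
```

In this sketch a local array replaces the main one wholesale rather than being merged element by element; that is one of several reasonable policies, and the main config is left untouched either way.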
HTML::StripScripts
------------------
- used to strip XSS scripting from user submitted HTML
- outputs valid HTML (cleans up nesting, context of tags etc)
- handles the exploits listed at http://ha.ckers.org/xss.html
- by default, configured to be safe
- very customisable via rules including regexes and callbacks
eg
- replace <font> tags with <style> tags
- allow local hrefs only to certain paths in your site, etc.
http://search.cpan.org/~drtech/HTML-StripScripts-1.00/
Use HTML::StripScripts::Parser (which builds on HTML::Parser) to parse
your HTML and feed the tokens to HTML::StripScripts:
http://search.cpan.org/~drtech/HTML-StripScripts-Parser-1.00/
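The "allow local hrefs only to certain paths" example can be illustrated with a small check like the one below. `allowed_local_href` is a hypothetical helper, not HTML::StripScripts' Rules syntax; it only shows the kind of test such a rule might apply. It is a sketch, not a complete defence: for instance, it does not decode percent-encoded path segments.

```perl
use strict;
use warnings;

# Hypothetical rule helper (not HTML::StripScripts' Rules API): keep an
# href only if it is a site-absolute path under one of @prefixes.
# Prefixes should end in '/' so '/docs/' cannot match '/docsecret'.
sub allowed_local_href {
    my ($href, @prefixes) = @_;
    return 0 unless defined $href;
    # Exactly one leading slash: rejects 'http:', 'javascript:',
    # relative paths and protocol-relative '//host/' URLs.
    return 0 unless $href =~ m{^/(?!/)};
    # Some browsers treat backslashes like slashes, so refuse them.
    return 0 if $href =~ /\\/;
    # Refuse '..' segments that could climb out of an allowed prefix.
    return 0 if $href =~ m{(?:^|/)\.\.(?:/|\z)};
    for my $prefix (@prefixes) {
        return 1 if index($href, $prefix) == 0;
    }
    return 0;
}

my @allowed = ('/docs/', '/help/');
for my $href ('/docs/intro.html', '//evil.example/', 'javascript:alert(1)',
              '/admin/', '/docs/../admin/') {
    printf "%-20s %s\n", $href, allowed_local_href($href, @allowed) ? 'keep' : 'strip';
}
```

Matching on a fixed prefix that ends in `/` keeps the check simple and predictable; a real rule would probably also normalise the URL before comparing.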
I hope this helps others, and if anybody has any suggestions, please
feed them back to me.
Clint
Re: Config::Loader and HTML::StripScripts
Posted by Clinton Gormley <cl...@traveljury.com>.
> Actually, something I would feel would be very useful is if it could
> return an XML::LibXML::DocumentFragment object.
>
> I tend to use XML::LibXML to parse user input and insert in the
> document, which is then going through some XSLT, and since you've
> allready parsed stuff, it seems like a waste to parse again.
Ooooh - that sounds nasty :)
It sounds like it needs a subclass like HTML::StripScripts::Parser and
HTML::StripScripts::Regex.
So XML::LibXML would do the parsing, then HTML::StripScripts::LibXML
would feed the result token by token to HTML::StripScripts, which would
return the filtered HTML for the subclass to build into a
DocumentFragment.
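The pipeline described above (a parser produces tokens, a filter decides what survives, and a final stage assembles the result) can be sketched in core Perl without XML::LibXML. The token format, `%WHITELIST`, `build_tree` and `to_html` are all hypothetical; the point is only that once filtering works on tokens, the last stage can build a tree instead of concatenating HTML text. Unlike HTML::StripScripts, this sketch does not validate attribute values such as `href`.

```perl
use strict;
use warnings;

# Hypothetical token format: ['start', $tag, \%attr], ['end', $tag], ['text', $str].
# Whitelist: tag => { allowed attribute => 1 }.
my %WHITELIST = (
    p => {},
    b => {},
    a => { href => 1 },
);
# Elements whose entire content is discarded, not just the tags.
my %DROP_CONTENT = map { $_ => 1 } qw(script style);

# Filter a token stream and build a nested tree instead of HTML text.
# Element nodes are { tag => ..., attr => {...}, children => [...] };
# text nodes are plain strings.
sub build_tree {
    my @tokens = @_;
    my $root   = { tag => '#fragment', attr => {}, children => [] };
    my @stack  = ($root);
    my $skip   = 0;    # nesting depth inside script/style

    for my $tok (@tokens) {
        my ($type, $val, $attr) = @$tok;
        if ($type eq 'start') {
            if ($DROP_CONTENT{$val}) { $skip++; next }
            next if $skip || !$WHITELIST{$val};
            my %kept = map  { $_ => $attr->{$_} }
                       grep { $WHITELIST{$val}{$_} } keys %{ $attr || {} };
            my $node = { tag => $val, attr => \%kept, children => [] };
            push @{ $stack[-1]{children} }, $node;
            push @stack, $node;
        }
        elsif ($type eq 'end') {
            if ($DROP_CONTENT{$val}) { $skip-- if $skip; next }
            next if $skip || !$WHITELIST{$val};
            # Close up to the innermost matching open element;
            # stray end tags are ignored.
            my ($depth) = grep { $stack[$_]{tag} eq $val } reverse 1 .. $#stack;
            splice @stack, $depth if defined $depth;
        }
        elsif ($type eq 'text') {
            push @{ $stack[-1]{children} }, $val unless $skip;
        }
    }
    return $root;    # any still-open elements are implicitly closed
}

sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Serialise the tree back to HTML, only to show what survived.
sub to_html {
    my ($node) = @_;
    return esc($node) unless ref $node;
    my $inner = join '', map { to_html($_) } @{ $node->{children} };
    return $inner if $node->{tag} eq '#fragment';
    my $attrs = join '',
        map { sprintf(' %s="%s"', $_, esc($node->{attr}{$_})) } sort keys %{ $node->{attr} };
    return "<$node->{tag}$attrs>$inner</$node->{tag}>";
}

my $tree = build_tree(
    [ 'start', 'p', {} ],
    [ 'text', 'Hello ' ],
    [ 'start', 'b', { onclick => 'evil()' } ],
    [ 'text', 'world' ],
    [ 'start', 'script', {} ],
    [ 'text', 'alert(1)' ],
    [ 'end', 'script' ],
    [ 'end', 'p' ],
);
print to_html($tree), "\n";    # <p>Hello <b>world</b></p>
```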
>
> So that's my feature request! :-)
If you send me an example of the interface you would like to use, I'll
see what I can do.
We'll take it off this list, because I feel that I have been
sufficiently off-topic already.
Clint
>
> Cheers,
>
> Kjetil
HTML::StripScripts::LibXML (was Config::Loader and HTML::StripScripts)
Posted by Clinton Gormley <cl...@traveljury.com>.
Kjetil Kjernsmo requested a front end to HTML::StripScripts that,
instead of returning HTML text, would return a LibXML Document or
DocumentFragment (ie a DOM tree).
I have released this as HTML::StripScripts::LibXML:
http://search.cpan.org/~drtech/HTML-StripScripts-LibXML-0.10/LibXML.pm
It handles messy HTML, strips out XSS, and gives you fine grained
control of the HTML/XML nodes that are returned.
If you are interested in this, please give it a try, and give me some
feedback about how to improve it, options to add etc.
The main open question I have is how to handle character encoding;
suggestions welcome.
Also see my question at Perl Monks:
http://www.perlmonks.org/index.pl?node_id=624334
thanks
Clint
On Tue, 2007-06-26 at 16:34 +0200, Kjetil Kjernsmo wrote:
> On Tuesday 26 June 2007 16:22, Clinton Gormley wrote:
> > - used to strip XSS scripting from user submitted HTML
>
> Ooooh, cool! I haven't found any modules that does that well enough.
>
> > - outputs valid HTML (cleans up nesting, context of tags etc)
> >
> > - handles the exploits listed at http://ha.ckers.org/xss.html
>
>
> Great!
>
> > I hope this helps others, and if anybody has any suggestions, please
> > feed them back to me
>
> Actually, something I would feel would be very useful is if it could
> return an XML::LibXML::DocumentFragment object.
>
> I tend to use XML::LibXML to parse user input and insert in the
> document, which is then going through some XSLT, and since you've
> allready parsed stuff, it seems like a waste to parse again.
>
> So that's my feature request! :-)
>
> Cheers,
>
> Kjetil
Re: Config::Loader and HTML::StripScripts
Posted by Kjetil Kjernsmo <kj...@opera.com>.
On Tuesday 26 June 2007 16:22, Clinton Gormley wrote:
> - used to strip XSS scripting from user submitted HTML
Ooooh, cool! I haven't found any modules that do that well enough.
> - outputs valid HTML (cleans up nesting, context of tags etc)
>
> - handles the exploits listed at http://ha.ckers.org/xss.html
Great!
> I hope this helps others, and if anybody has any suggestions, please
> feed them back to me
Actually, something I would feel would be very useful is if it could
return an XML::LibXML::DocumentFragment object.
I tend to use XML::LibXML to parse user input and insert in the
document, which is then going through some XSLT, and since you've
already parsed stuff, it seems like a waste to parse again.
So that's my feature request! :-)
Cheers,
Kjetil
--
Kjetil Kjernsmo
Information Systems Developer
Opera Software ASA
Re: Config::Loader and HTML::StripScripts
Posted by Clinton Gormley <cl...@traveljury.com>.
I've been looking at how you would add object and embed tags, and it
isn't trivial. They're not in there by default because of the nasty
things that they can do. But I could add them in, along with flags to
specify that you want to allow them, much like AllowHref.
I'll get back to you.
Again, I'll take this off the list now (until I have something to show
for it).
Jonathan, could you give me some sample markup that you would like to
allow through?
thanks
Clint
Re: Config::Loader and HTML::StripScripts
Posted by Jonathan Vanasco <jv...@2xlp.com>.
On Jun 26, 2007, at 11:09 AM, Clinton Gormley wrote:
>>
>> allowScriptAccess="never"
>> allownetworking="internal"
> I don't know what those are :)
>
> <object> tags are removed by default, and you would still need to
> subclass HTML::StripScripts in order to allow those elements.
>
> The Rules (for safety's sake) are applied after the standard parsing has
> already happened, and object's are not allowed because they are just too
> risky. So if you want to do that, subclass the WHITELIST INITIALIZATION
> METHODS and add the relevant config in there.
Already doing that...
Those are placed in object AND embed tags (I don't recall if embed tags
are off by default).
Regardless, it might make sense to mention them in the docs, as
they're in a grey area and something to be wary of when enabling
objects.
allowScriptAccess locks the Flash player down: it can't call any JS
functions or do any document writes, etc. Without it, it's possible to
have a .swf file that, on load, starts rewriting the page to load in
external JS files and then write them into the document body (thereby
avoiding any JS XSS safeguards). That's how a lot of old 'skinning'
and 'tracking' was done: people would write mini-apps hidden in a
1x1 swf file that would manipulate the DOM and do whatever data
exchange was needed. It can be pretty insidious.
allowNetworking, I think, disables what getURL can do. I could be
wrong on that one, but I believe that is the parameter that locks down
what swf files can redirect browsers to (same domain as the HTML, any,
or none).
// Jonathan Vanasco
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| CEO/Founder SyndiClick Networks
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Founder/CTO/CVO
| FindMeOn.com - The cure for Multiple Web Personality Disorder
| Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| RoadSound.com - Tools For Bands, Stuff For Fans
| Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
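Jonathan's suggestion amounts to forcing `allowScriptAccess="never"` and `allowNetworking="internal"` onto any Flash `<embed>` or `<object>` that a subclass lets through, overriding whatever the user supplied. A minimal sketch under that assumption follows; the helper names are hypothetical and are not part of HTML::StripScripts. (Flash Player's `allowScriptAccess` accepts `always`, `sameDomain` and `never`.)

```perl
use strict;
use warnings;

# Flash Player parameters to force on any allowed <embed> or <object>.
my %FORCED = (
    allowScriptAccess => 'never',
    allowNetworking   => 'internal',
);
my %FORCED_LC = map { lc($_) => 1 } keys %FORCED;

# Hypothetical helper (not part of HTML::StripScripts): given the
# attributes of an <embed> that a subclass has decided to allow, drop
# any user-supplied copies of the forced attributes (HTML attribute
# names are case-insensitive) and add the safe values.
sub harden_embed_attrs {
    my ($attr) = @_;
    my %clean = map  { $_ => $attr->{$_} }
                grep { !$FORCED_LC{ lc $_ } } keys %$attr;
    return { %clean, %FORCED };
}

# For <object>, the same settings are passed as <param> children.
# A hypothetical builder would drop user <param>s whose name matches,
# then append these [name, value] pairs.
sub is_forced_param_name { my ($name) = @_; return $FORCED_LC{ lc $name } ? 1 : 0 }
sub forced_object_params { return map { [ $_, $FORCED{$_} ] } sort keys %FORCED }

my $user = { src => '/movie.swf', AllowScriptAccess => 'always', width => 100 };
my $safe = harden_embed_attrs($user);
print join(', ', map { "$_=$safe->{$_}" } sort keys %$safe), "\n";
# allowNetworking=internal, allowScriptAccess=never, src=/movie.swf, width=100
```

Removing the user's copies before adding the forced values matters: otherwise a page could end up with both `AllowScriptAccess="always"` and `allowScriptAccess="never"`, and which one the player honours is not something to rely on.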
Re: Config::Loader and HTML::StripScripts
Posted by Clinton Gormley <cl...@traveljury.com>.
On Tue, 2007-06-26 at 11:02 -0400, Jonathan Vanasco wrote:
> On Jun 26, 2007, at 10:22 AM, Clinton Gormley wrote:
>
> > HTML::StripScripts
>
> thanks! I'm already a happy user.
> excited to check out the changelog.
>
> does the new version automagically do the anti-xss flash embed
> extensions that myspace had adobe put in?
> allowScriptAccess="never"
> allownetworking="internal"
>
> in the old version, i need to do that manually.
> xss didn't launch with that, but I believe its on the site now.
I don't know what those are :)
<object> tags are removed by default, and you would still need to
subclass HTML::StripScripts in order to allow those elements.
The Rules (for safety's sake) are applied after the standard parsing has
already happened, and objects are not allowed because they are just too
risky. So if you want to allow them, subclass HTML::StripScripts,
override the methods described under WHITELIST INITIALIZATION METHODS in
its documentation, and add the relevant config there.
After that, the full power of Rules is available to you.
Clint
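Clint's advice is to subclass HTML::StripScripts and extend its whitelists, since Rules only ever see what the whitelists already allow. The generic Perl pattern (call the parent's method, then add to what it returns) is sketched below against a stand-in base class, `My::StandInFilter`, whose data layout is invented for illustration. HTML::StripScripts' own whitelist methods (such as `init_attrib_whitelist`) return their own structures, which should be checked in its documentation before overriding them.

```perl
use strict;
use warnings;

# Stand-in for the real filter class; the whitelist layout here
# (tag => { attribute => 1 }) is hypothetical.
package My::StandInFilter;

sub new {
    my ($class) = @_;
    my $self = bless {}, $class;
    $self->{attrib} = $self->init_attrib_whitelist;
    return $self;
}

# Base whitelist: no <object> or <embed>.
sub init_attrib_whitelist {
    return {
        p => {},
        a => { href => 1 },
    };
}

sub tag_allowed { my ($self, $tag) = @_; return exists $self->{attrib}{$tag} }

# Subclass: extend the parent's whitelist instead of replacing it.
package My::FlashFilter;
our @ISA = ('My::StandInFilter');

sub init_attrib_whitelist {
    my ($self) = @_;
    my $base = $self->SUPER::init_attrib_whitelist();
    return {
        %$base,
        object => { width => 1, height => 1, data => 1 },
        embed  => { src => 1, width => 1, height => 1 },
    };
}

package main;

my $plain = My::StandInFilter->new;
my $flash = My::FlashFilter->new;
printf "plain: object %s\n", $plain->tag_allowed('object') ? 'allowed' : 'stripped';
printf "flash: object %s\n", $flash->tag_allowed('object') ? 'allowed' : 'stripped';
```

Building on `SUPER::` rather than copying the parent's list means the subclass picks up any later changes to the base whitelist, and only the tags it deliberately adds differ.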