Posted to modperl@perl.apache.org by Clinton Gormley <cl...@traveljury.com> on 2007/07/03 10:15:07 UTC
HTML::StripScripts::LibXML (was Config::Loader and HTML::StripScripts)
Kjetil Kjernsmo requested a front end to HTML::StripScripts that,
instead of returning HTML text, would return a LibXML Document or
DocumentFragment (i.e. a DOM tree).
I have released this as HTML::StripScripts::LibXML:
http://search.cpan.org/~drtech/HTML-StripScripts-LibXML-0.10/LibXML.pm
It handles messy HTML, strips out XSS scripting, and gives you
fine-grained control of the HTML/XML nodes that are returned.
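
A minimal sketch of typical usage, based on my reading of the module's
synopsis. The sample input string is made up, and I am assuming that the
'Flow' context returns a DocumentFragment rather than a full Document, so
check the module documentation before relying on either detail.

```perl
use strict;
use warnings;
use HTML::StripScripts::LibXML ();

# Untrusted input with a script element and unbalanced markup (made-up sample).
my $html = '<p>Hello <b>world<script>alert(1)</script></p>';

# 'Flow' is the HTML::StripScripts context for block-level body content.
# Assumption: in this context the filter hands back a DocumentFragment.
my $hss = HTML::StripScripts::LibXML->new( { Context => 'Flow' } );

# filter_html parses the input and returns the filtered DOM object.
my $frag = $hss->filter_html($html);

# Serialise only to inspect the result; normally you would keep the nodes
# and insert them into your own document.
my $xml = $frag->toString;
print "$xml\n";
```

Since the result is already a DOM object, it can be passed on to other
XML::LibXML code without parsing the filtered HTML a second time.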
If you are interested in this, please give it a try and send me
feedback on how to improve it, which options to add, and so on.
My main open question is how to handle encoding; suggestions are
welcome.
Also see my question at Perl Monks:
http://www.perlmonks.org/index.pl?node_id=624334
thanks
Clint
On Tue, 2007-06-26 at 16:34 +0200, Kjetil Kjernsmo wrote:
> On Tuesday 26 June 2007 16:22, Clinton Gormley wrote:
> > - used to strip XSS scripting from user submitted HTML
>
> Ooooh, cool! I haven't found any modules that do that well enough.
>
> > - outputs valid HTML (cleans up nesting, context of tags etc)
> >
> > - handles the exploits listed at http://ha.ckers.org/xss.html
>
>
> Great!
>
> > I hope this helps others, and if anybody has any suggestions, please
> > feed them back to me
>
> Actually, something I feel would be very useful is if it could
> return an XML::LibXML::DocumentFragment object.
>
> I tend to use XML::LibXML to parse user input and insert it into the
> document, which then goes through some XSLT, and since you've
> already parsed stuff, it seems like a waste to parse again.
>
> So that's my feature request! :-)
>
> Cheers,
>
> Kjetil