Posted to users@spamassassin.apache.org by Mike Cardwell <sp...@lists.grepular.com> on 2010/01/12 21:22:44 UTC

[OT] spamalyser, was "Re: pill image spam learns to walk"

On 12/01/2010 10:24, Henrik K wrote:

>>>> Presently it renders them as plain text. I'm fully aware of the
>>>> potential problems with it. Ideally I'd like to be able to render
>>>> those parts as HTML, but I need to be 100% sure that I've stripped
>>>> out anything dangerous (including embedded remote content by
>>>> default) first. It's on the "ToDo List" page.
>>>
>>> Nice job Mike! :)
>>>
>>> I wrestled with that same issue when I added direct viewing of HTML
>>> content to my offline analysis/FP-pipeline/MassChecks tool.
>>>
>>> Originally, I was using an ActiveX wrapper around IE, which (of
>>> course) made me nervous.  I added some VERY simple, crude tag
>>> stripping (script, iframe, style), but was never happy with it.
>>> I ended up switching to an open source HTML rendering component
>>> which :) lacked support for all the scary stuff.
>>>
>>> Whatever you decide to do, please do post more about it, and q'pla!
>>
>> I shall. There are a multitude of modules on cpan for fixing up html and  
>> stripping out tags. I just need to find time to test them. I've got to  
>> figure out how to "cleanse" the CSS as well. Eg, you can execute  
>> javascript from CSS with stuff like:  
>> background:url("javascript:someFunction();")
> 
> IMO whatever you do, there will always be some hole to be found. Your only
> safe option is to render the HTML into image and display that. It will also
> be always consistent and not depend on browser version.

That was a good suggestion and something I hadn't considered. I've
updated Spamalyser to generate PDFs from HTML parts using the WebKit
rendering engine and Qt, so the HTML should look the same as in any
WebKit-based user agent. From my tests so far, it's an accurate
representation of what you see in your email client. It handles remote
content like images and CSS fine, and also content attached to the email
with Content-ID headers and referenced by cid: URIs. Here's a prime example:
http://spamalyser.com/v/jfv3iz0l/mime#part_1.2
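
For anyone curious how the cid: lookup works in principle, here is a
minimal sketch in Python. It is an illustration only, not Spamalyser's
actual code: the function names build_cid_map and find_cid_references
are made up for this example, and it assumes the standard library
email package is enough to parse the message.

```python
import re
from email import message_from_string
from email.message import Message

# Matches attributes such as src="cid:image1@example" (hypothetical pattern,
# limited to a few common URL-carrying attributes).
CID_ATTR = re.compile(
    r"""(?:src|href|background)\s*=\s*["']cid:([^"']+)["']""",
    re.IGNORECASE,
)


def build_cid_map(msg: Message) -> dict:
    """Map each Content-ID (angle brackets removed) to its decoded payload."""
    cid_map = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid:
            # Content-ID headers look like "<id@host>"; cid: URIs omit the brackets.
            cid_map[cid.strip().strip("<>")] = part.get_payload(decode=True)
    return cid_map


def find_cid_references(html: str) -> list:
    """Return the identifiers used in cid: URIs within an HTML string."""
    return CID_ATTR.findall(html)
```

A renderer could then replace each cid: reference with the matching
attached part before producing the PDF, and flag any reference that
has no matching part.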

PDF is better than an image because it allows you to maintain the links
in the document. A PNG "thumbnail" generated from the PDF is displayed
alongside text/html parts. Clicking that preview image takes you to the
PDF.

I've also tweaked some of the styling so the headers are easier to read.

I've also set up a Mailman-based mailing list, which is linked from
http://spamalyser.com/, so if anyone wants to discuss Spamalyser
further, the discussion should probably move there. Any further
announcements will happen there, not here.

-- 
Mike Cardwell    : UK based IT Consultant, LAMP developer, Linux admin
Cardwell IT Ltd. : UK Company - http://cardwellit.com/       #06920226
Technical Blog   : Tech Blog  - https://secure.grepular.com/blog/
Spamalyser       : Spam Tool  - http://spamalyser.com/

Re: [OT] spamalyser, was "pill image spam learns to walk"

Posted by Kai Schaetzl <ma...@conactive.com>.
Mike Cardwell wrote on Wed, 13 Jan 2010 15:15:24 +0000:

> I intend to give the uploader a choice between whether or not to fetch
> remote content at some point in the near future.

That makes sense.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com




Re: [OT] spamalyser, was "pill image spam learns to walk"

Posted by Kai Schaetzl <ma...@conactive.com>.
Mike Cardwell wrote on Tue, 12 Jan 2010 20:22:44 +0000:

> It handles remote
> content like images and CSS fine

Tip: I would not fetch remote content at all, as doing so may lead to
account verification: the request tells the sender that the address is
live and the message was opened.
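
To illustrate, here is a minimal Python sketch that lists the
references an HTML part would cause a renderer to fetch, so they can
be blocked or stripped before rendering. It is a sketch under stated
assumptions, not anyone's actual implementation: the names
RemoteRefFinder, find_remote_refs, URL_ATTRS and REMOTE_SCHEMES are
hypothetical, the attribute list is not exhaustive, and url()
references inside CSS are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed, non-exhaustive lists for this illustration.
REMOTE_SCHEMES = {"http", "https", "ftp"}
URL_ATTRS = {"src", "href", "background", "poster", "data"}


class RemoteRefFinder(HTMLParser):
    """Collect (tag, url) pairs that a renderer would fetch from the network."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            return  # a plain link is only followed on click, not at render time
        for name, value in attrs:
            if name not in URL_ATTRS or not value:
                continue
            url = value.strip()
            scheme = urlparse(url).scheme.lower()
            # Protocol-relative URLs ("//host/path") are remote as well.
            if scheme in REMOTE_SCHEMES or url.startswith("//"):
                self.remote.append((tag, url))


def find_remote_refs(html: str) -> list:
    """Return the remote references found in an HTML string."""
    finder = RemoteRefFinder()
    finder.feed(html)
    finder.close()
    return finder.remote
```

If find_remote_refs returns anything, the safest default is to render
without those references, since a single tracking image with a unique
URL is enough to confirm the recipient.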

Kai
