Posted to dev@spamassassin.apache.org by Jesse Houwing <j....@student.utwente.nl> on 2004/02/25 18:40:51 UTC

html_tag_balance on html and head tags

Hello all,

I've recently started building my own custom ruleset to catch some of the
spam that has eluded the university's SpamAssassin filter. Recently, a
lot of the messages I get have unbalanced html and head tags. Some even
start with </html> and </head>.

I tried to use eval:html_tag_balance('html' '<0') in a rule to test for
this, but it never seems to trigger. The same is true for the head tag.

I've included a test message on the following wiki page:
http://www.exit0.us/index.php/UnBalancedHTMLorHEADtags

I was wondering whether this is a problem with SpamAssassin, or whether I'm
just on the wrong track here.


Jesse

Re: html_tag_balance on html and head tags

Posted by Daniel Quinlan <qu...@pathname.com>.
"Jesse Houwing" <j....@student.utwente.nl> writes:

> I've recently started building my own custom ruleset to catch some of the
> spam that has eluded the university's SpamAssassin filter. Recently, a
> lot of the messages I get have unbalanced html and head tags. Some even
> start with </html> and </head>.
> 
> I tried to use eval:html_tag_balance('html' '<0') in a rule to test for
> this, but it never seems to trigger. The same is true for the head tag.
> 
> I've included a test message on the following wiki page:
> http://www.exit0.us/index.php/UnBalancedHTMLorHEADtags
> 
> I was wondering whether this is a problem with SpamAssassin, or whether I'm
> just on the wrong track here.

(Unfortunately?) html_tag_balance only checks for tags that were opened
but never closed, not for tags that were closed without being opened.  It
relies on the same code that tries to determine when the parser is between
two tags (for example, between <title> and </title>).

The code should not be too complicated to follow.  Look at HTML.pm and
maybe EvalTests.pm.
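A minimal Python sketch of the difference described above, for illustration
only. It does not reproduce HTML.pm. It uses the standard library
html.parser instead of SpamAssassin's parser, and it assumes that the
tracked "inside" count is decremented on a closing tag only while it is
still positive. TagBalance and balances() are hypothetical names. The
sketch keeps that clamped count next to a plain signed opens-minus-closes
count, so you can see why a '<0' test could never match under that
assumption.

```python
from collections import Counter
from html.parser import HTMLParser


class TagBalance(HTMLParser):
    """Track per-tag balance two ways (hypothetical sketch, not SpamAssassin code).

    inside: +1 on an opening tag, -1 on a closing tag only if already
            positive, so it never drops below zero (assumed approximation
            of the "between two tags" tracking described in the thread).
    signed: plain opens-minus-closes count, which goes negative when a
            tag is closed without having been opened.
    """

    def __init__(self):
        super().__init__()
        self.inside = Counter()
        self.signed = Counter()

    def handle_starttag(self, tag, attrs):
        self.inside[tag] += 1
        self.signed[tag] += 1

    def handle_endtag(self, tag):
        # Clamped count: a stray closing tag is simply ignored.
        if self.inside[tag] > 0:
            self.inside[tag] -= 1
        # Signed count: every closing tag is counted.
        self.signed[tag] -= 1


def balances(html_text):
    """Return (inside, signed) Counters for the given HTML text."""
    parser = TagBalance()
    parser.feed(html_text)
    parser.close()
    return parser.inside, parser.signed


# A body that starts with stray closing tags, like the spam in the report.
sample = "</html></head><body><p>Buy now</p></body></html>"
inside, signed = balances(sample)
print(inside["html"], signed["html"])  # 0 -2
print(inside["head"], signed["head"])  # 0 -1
```

With the clamped count, html and head both stay at 0, so a comparison such
as '<0' is never true. Only an unclamped count like `signed` would let a
rule detect tags that are closed without being opened.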

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting