Posted to users@spamassassin.apache.org by Sven Riedel <sr...@baghus.net> on 2005/06/07 14:57:59 UTC

Would a normalization plugin make sense?

Hi,
since a lot of spam nowadays tries to get past the filters
by repeating random letters, wouldn't it make sense to
introduce normalization plugins to SpamAssassin?

These would run over the mail once before the actual scanning
starts, and perform transformations on the decoded mail body.

Some functions I could think of off the top of my head would
be:
- reducing multiple consecutive instances of a letter to one occurrence
of the given letter

- transforming HTML entities to their Latin letter equivalents

- removing all non-alphanumericals from the mail body
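
To make the three transformations above concrete, here is a minimal sketch in Python. It is not SpamAssassin code; all function names are made up for illustration. Two assumptions go beyond the list: whitespace is kept when stripping non-alphanumerics, and the steps run in the order decode, strip, collapse.

```python
import html
import re
import unicodedata


def decode_entities(text: str) -> str:
    """Turn HTML entities such as &iacute; into plain ASCII letters where possible."""
    decoded = html.unescape(text)
    # NFKD splits an accented letter into base letter + combining mark;
    # dropping the non-ASCII code points then leaves the base letter.
    decomposed = unicodedata.normalize("NFKD", decoded)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def strip_non_alphanumerics(text: str) -> str:
    """Remove everything except ASCII letters, digits and whitespace (assumption)."""
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


def collapse_repeats(text: str) -> str:
    """Reduce a run of the same letter (same case) to a single occurrence."""
    return re.sub(r"([A-Za-z])\1+", r"\1", text)


def normalize_body(text: str) -> str:
    """Hypothetical pipeline: decode entities, strip punctuation, collapse repeats."""
    return collapse_repeats(strip_non_alphanumerics(decode_entities(text)))


if __name__ == "__main__":
    print(normalize_body("V&iacute;&iacute;aaa.g-r*a"))
    print(normalize_body("Freee m0ney!!!"))
```

Note that collapsing repeats also changes legitimate words ("free" becomes "fre"), so any rule matching against the normalized text would have to be written with that in mind.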

This would require a new rule type (e.g. normalbody), to avoid
breaking existing rulesets.
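
A rough Python sketch of why a separate rule type keeps existing rules safe: ordinary body rules still see the untouched text, and only the new rules see the normalized copy. This is not how SpamAssassin is implemented; the scan function, the rule names and the toy normalizer are all hypothetical.

```python
import re
from typing import Callable, Dict, List


def scan(
    body: str,
    body_rules: Dict[str, str],
    normalbody_rules: Dict[str, str],
    normalizer: Callable[[str], str],
) -> List[str]:
    """Return the names of all rules that hit (hypothetical helper)."""
    # Existing rules run against the original, unmodified body.
    hits = [name for name, pat in body_rules.items() if re.search(pat, body)]
    # Only the new rule type runs against the normalized copy.
    normalized = normalizer(body)
    hits += [
        name for name, pat in normalbody_rules.items() if re.search(pat, normalized)
    ]
    return hits


def toy_normalizer(text: str) -> str:
    """Toy stand-in: lowercase and drop everything but letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


if __name__ == "__main__":
    body_rules = {"RAW_VIAGRA": r"viagra"}          # made-up rule name
    normalbody_rules = {"NORM_VIAGRA": r"viagra"}   # made-up rule name
    print(scan("Buy V.i.a.g.r.a now", body_rules, normalbody_rules, toy_normalizer))
```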

Would this make sense? Can this be included in SpamAssassin, or
are the current internals structured in a way that makes the introduction
of such plugins hard/impossible?

Regs,
Sven

Re: Would a normalization plugin make sense?

Posted by Loren Wilton <lw...@earthlink.net>.
> Would this make sense? Can this be included into spamassassin, or
> are the current internals structured in a way that makes the introduction
> of such plugins hard/impossible?

The concept of normalization has been discussed under various names over
time.  My personal impression is that nobody really knows whether this would
help, hurt, or be a waste of time, although you will certainly find plenty
of opinions one way or the other.

I suspect that you could do this as a plugin, but I also suspect you would
have to take ugly liberties with the internal data storage in SA.  For
instance, I suspect (but do not know) that plugins are probably not supposed
to modify the mail text.  You could certainly do something like this by
patching PerMsgStatus.pm.

If this is an idea that interests you enough to work on it, I personally
would suggest you grab the ball and run with it -- find out if this really
helps or not.  It doesn't much matter how you get code working to test your
conclusion.  If the results are wonderful and the code is ugly beyond belief,
I'm sure someone will be willing to rewrite it as needed.  If the results
aren't any good, then the code won't matter much anyway.  ;-)

        Loren