Posted to users@spamassassin.apache.org by Mike Grau <m....@kcc.state.ks.us> on 2005/02/28 18:12:04 UTC

Rule advice please

Hello.

Following discussions on this list about obfuscating words to avoid spam 
detection, and not being a ninja, I'd like some feedback about the 
possible efficacy or pitfalls on rules like the following.

As noted in other discussions, words with scrambled letters between the 
first and last letter can be caught by checking the permutations of the 
letters:

      /\ba(?:ess|ses|sse)s\b/i   <-   finds permutations of "asses"
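
A minimal sketch of how such an alternation might be generated rather than
typed by hand. The function name scramble_regex is made up for illustration,
and the sketch assumes the word is plain letters with the first and last
letter left in place:

```python
import re
from itertools import permutations

def scramble_regex(word):
    """Build a regex matching any reordering of the interior letters of `word`."""
    first, middle, last = word[0], word[1:-1], word[-1]
    # A set drops the duplicate orderings that repeated letters produce.
    variants = sorted({"".join(p) for p in permutations(middle)})
    alternation = "|".join(re.escape(v) for v in variants)
    return r"\b" + re.escape(first) + "(?:" + alternation + ")" + re.escape(last) + r"\b"

pattern = scramble_regex("asses")
print(pattern)  # \ba(?:ess|ses|sse)s\b
```

The number of alternatives grows factorially with the interior length: the
seven distinct interior letters of "exploited" already give 5040 orderings.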
	
However, this quickly gets unwieldy when building a regex that checks all 
the permutations of more than 5 letters. Couldn't one instead use a regex 
that simply looks for the letters used, plus a negative look-ahead 
assertion to eliminate other real words of the same length? Those words 
would be found by first running the expression through a dictionary of 
words and phrases. For example, a rule for the word "exploited" after 
being run through a dictionary of 617709 words and phrases:

     /\b(?!exploited|elliptoid|epitoxoid)e[xploite]{7}d\b/i
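
One way the dictionary pass could be automated is sketched below. The helper
name build_obfu_rule and the tiny word list are assumptions for illustration
only (the 617709-entry dictionary is not reproduced here), and the word is
assumed to consist of plain lowercase letters:

```python
import re

def build_obfu_rule(word, dictionary):
    """Return a regex for scrambled forms of `word`, excluding real words."""
    first, last = word[0], word[-1]
    # Character class: each distinct interior letter once, in order of appearance.
    letters = "".join(dict.fromkeys(word[1:-1]))
    core = f"{first}[{letters}]{{{len(word) - 2}}}{last}"
    loose = re.compile(core, re.IGNORECASE)
    # Any other dictionary word the loose pattern accepts must be excluded.
    collisions = sorted(
        w for w in {d.lower() for d in dictionary}
        if w != word and loose.fullmatch(w)
    )
    excluded = "|".join([word] + collisions)
    return rf"\b(?!{excluded}){core}\b"

# Hypothetical stand-in for a real dictionary file.
words = ["exploited", "elliptoid", "epitoxoid", "explained", "exported", "epileptoid"]
rule = build_obfu_rule("exploited", words)
print(rule)  # \b(?!exploited|elliptoid|epitoxoid)e[xploite]{7}d\b
```

Because the word itself is placed in the look-ahead, the resulting rule hits
only on altered spellings, not on the correctly spelled word.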

or perhaps an additional rule for added letters "expploited", etc.

     /\b(?!epileptoid)e[xploite]{8}d\b/i
     /\be[xploite]{9}d\b/i

or combined:

   /\b(?!exploited|elliptoid|epitoxoid|epileptoid)e[xploite]{7,9}d\b/i
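
As a rough check of what the combined expression accepts, it can be run
against a handful of strings. The sample strings are invented for
illustration, and Python's re module is assumed to behave like Perl for this
particular pattern:

```python
import re

COMBINED = re.compile(
    r"\b(?!exploited|elliptoid|epitoxoid|epileptoid)e[xploite]{7,9}d\b",
    re.IGNORECASE,
)

samples = [
    "exploited",   # the plain word, excluded by the look-ahead
    "exlpoited",   # interior letters swapped
    "expploited",  # one added letter
    "elliptoid",   # dictionary word, excluded
    "epileptoid",  # dictionary word, excluded
    "explained",   # contains letters outside the class
    "eeeeeeeed",   # not a word, but built only from allowed letters
]

results = {text: bool(COMBINED.search(text)) for text in samples}
for text, hit in results.items():
    print(f"{text:12} {'hit' if hit else 'miss'}")
```

With these inputs, "exlpoited" and "expploited" hit as intended, and so does
"eeeeeeeed": the character class limits which letters may appear, not how
often each one appears.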

Usually the obfuscated word still resembles the word with the meaning 
the spammer wants to convey. I doubt the spammer wants to use the word 
"elliptoid", and anyway, the idea is to use these rules as non-scoring 
rules in combination with meta rules.

     (OBFU_EXPLOIT + RULE1 + RULE2 + RULE3) > 1   etc.
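
The arithmetic of such a meta rule can be sketched as follows, assuming each
sub-rule contributes 1 when it hits and 0 otherwise. RULE1 through RULE3 are
the placeholders from the expression above, and meta_fires is a made-up
helper, not part of SpamAssassin:

```python
def meta_fires(hits, rule_names, threshold=1):
    """Return True when more than `threshold` of the named rules hit."""
    return sum(1 for name in rule_names if hits.get(name, False)) > threshold

rules = ["OBFU_EXPLOIT", "RULE1", "RULE2", "RULE3"]

# The obfuscation rule alone does not fire the meta rule...
print(meta_fires({"OBFU_EXPLOIT": True}, rules))                 # False
# ...but it does together with any one other rule.
print(meta_fires({"OBFU_EXPLOIT": True, "RULE2": True}, rules))  # True
```

So an accidental match on the obfuscation rule alone costs nothing; it only
counts when at least one other indicator agrees.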

or whatever. Thoughts?  Other samples:

   subject =~ /\b(?!cartoon|croatan|carroon)c[arto]{5}n\b/i
   subject =~ /\b(?!downloadable)d[ownladb]{10}e\b/i
   subject =~ /\b(?!dripping)d[ripn]{6}g\b/i
   subject =~ /\b(?!ejaculating|enunciating)e[jacultin]{9}g\b/i

Of course, you could add "1" and "0" to the character set if the word 
contained an "l" or an "o", and the like.
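
A small sketch of widening the character set this way. The LOOKALIKES table
holds only the two substitutions mentioned above, the helper names are
invented for illustration, and only the interior letters are widened:

```python
import re

# Only the two substitutions named in the text; extend as needed.
LOOKALIKES = {"o": "0", "l": "1"}

def widen_class(letters):
    """Append the look-alike digit for every letter that has one."""
    extra = "".join(LOOKALIKES[c] for c in letters if c in LOOKALIKES)
    return letters + extra

def obfu_core(word):
    """Build first letter + widened interior class + last letter for `word`."""
    letters = "".join(dict.fromkeys(word[1:-1]))
    return rf"\b{word[0]}[{widen_class(letters)}]{{{len(word) - 2}}}{word[-1]}\b"

core = obfu_core("exploited")
print(core)  # \be[xploite10]{7}d\b
print(bool(re.search(core, "expl0ited", re.IGNORECASE)))  # True
```

The negative look-ahead for dictionary words would be added in front of the
first letter exactly as in the rules above.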

-- Mike

Re: Rule advice please

Posted by Loren Wilton <lw...@earthlink.net>.
> Following discussions on this list about obfuscating words to avoid spam
> detection, and not being a ninja, I'd like some feedback about the
> possible efficacy or pitfalls on rules like the following.
[snip]

In general, there are three main ways of dealing with these obfuscations:
1. Hand-crafted rules looking for the generally expected variants; usually
on a phrase rather than a word.
2. Chris's Obfu generator, which generates an exhaustive (and exhausting, if
you try to read the result ;-) regex to catch just about any obfuscated
variation of a word or phrase, and
3. Tripwire and related rules that will very often end up triggering pretty
heavily on the more creative obfuscations.
4. And then there is SURBL, which renders all the previous pretty moot after
the first hour or two of a new spam target domain.

As a slight subject change, there is another form of obfuscation in the wild
that tends to escape everything except the SURBL test.  I'm actually rather
fond of these spams, since I can always get a good belly laugh from whatever
the spam generator managed to come up with.  I'm guessing that these are
generated by a tool that takes a phrase and then does a thesaurus lookup on
each word, with a *very* creative thesaurus.  Below is an edited sample of
one such.  The message appears twice, with modifications: once in the text
part of the spam, once in the HTML part:

------ begin spam --------
These pills are only similar normal lozenges but they
are specially formulated to be soft and dissolvable
under the glossa. The tablets is sorbed at the oral fissure
and gets in the bloodstream directly alternatively of rising
through with the tummytum. This effects in a quicker much more
powerful outcome which run up to 35 hours!

Our tablets are simply equal usual lozenges but they <BR>
are specially formulated to be pliant and soluble<BR>
below the clapper. The tablets is sorbed at the oral cavity<BR>
and gets into the bloodstream straight alternatively of progressing<BR>
through with the tummytum. This results in a faster more<BR>
strong result which yet up to 39 hours!<BR>
------------ end spam ------------

        Loren