Posted to users@spamassassin.apache.org by NFN Smith <wo...@sacbeemail.com> on 2006/12/05 18:30:39 UTC
Need regexp tip
I'm working on a series of rules to find obfuscated words in subject
lines that have been misspelled by adding an extra character (often a
repeated letter) to a word. For certain words, it seems to be
appropriate to assume that if they're misspelled in that way, it's
deliberate.
I've got the syntax for a regular expression mostly working (including
words with trailing punctuation), but it doesn't identify words where
the last letter is doubled. For example, with a regexp like:
/\b(?!badword)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)/i
I'm getting hits on things like 'baddword' and 'badwoord', and even
'badworrd!', but I'm not getting a hit on 'badwordd'.
I've tried a number of variants, but I'm still not quite getting it.
What am I missing?
Smith
Re: Need regexp tip
Posted by "John D. Hardin" <jh...@impsec.org>.
On Tue, 5 Dec 2006, NFN Smith wrote:
> I'm working on a series of rules to find obfuscated words
>
> /\b(?!badword)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)/i
I have a tool that does this (for double letters as well as other
obfuscations) automatically.
http://www.impsec.org/~jhardin/antispam/
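On the pattern itself, one likely cause: the negative lookahead
(?!badword) rejects anything that merely starts with "badword", and
'badwordd' does, so the match is refused before the rest of the pattern
is tried. A possible fix is to put a word boundary inside the lookahead,
(?!badword\b), so that only the exact word is excluded. Below is a
minimal sketch in Python rather than a SpamAssassin rule; the names
ORIGINAL, ANCHORED and hits are made up for illustration, and it assumes
Python's re module treats \b and lookaheads the same way Perl does for
this pattern.

```python
import re

# Pattern as posted: the lookahead refuses any word that starts with "badword".
ORIGINAL = re.compile(
    r"\b(?!badword)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)", re.I)

# Assumed variant (not from the thread): "\b" inside the lookahead, so only
# the exact word "badword" is excluded.
ANCHORED = re.compile(
    r"\b(?!badword\b)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)", re.I)


def hits(pattern, subject):
    """Return True if the pattern matches anywhere in the subject."""
    return pattern.search(subject) is not None


if __name__ == "__main__":
    samples = ["baddword", "badwoord", "badworrd!",
               "badwordd", "badword", "badword!"]
    for word in samples:
        print(word, hits(ORIGINAL, word), hits(ANCHORED, word))
```

One trade-off to be aware of: the anchored variant also hits ordinary
suffixed forms such as 'badwords', so it may need an extra exclusion for
words where a plural is legitimate.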
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79