Posted to users@spamassassin.apache.org by ma...@swetech.se on 2011/03/30 00:58:05 UTC
Spam
Recently I have been getting a lot of mail with subjects like the one
below, containing a link to some scam/Chinese crap factory.
I run the latest SpamAssassin along with amavis, but these mails keep
getting through. Any ideas?
Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat
Re: Spam
Posted by Per Jessen <pe...@computer.org>.
Adam Katz wrote:
> The multi-lingual dictionary that I use for this kind of purpose has
> 132 words that are 29+ characters. Its longest word is 58 characters:
> Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch is a large
> village on the Welsh island of Anglesey, see
> http://en.wikipedia.org/wiki/Llanfairpwllgwyngyll for more. Wikipedia
> also notes a hill in New Zealand (short name Taumata) with an even
> longer name. The next longest word is
> pneumonoultramicroscopicsilicovolcanoconiosis with 45 letters. German
> words, which I would have expected to take the cake, seem to be
> limited to 35 or so letters.
From:
http://german.about.com/library/blwort_long.htm
Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz
Donaudampfschiffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft
/Per Jessen, Zürich
Re: Spam
Posted by John Hardin <jh...@impsec.org>.
On Wed, 30 Mar 2011, RW wrote:
>>> On Wed, 2011-03-30 at 00:58 +0200, martin@swetech.se wrote:
>>>>
>>>> Re:
>>>> YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat
>
> The subjects have two separate characteristics: the length and the
> number of lower to upper case transitions. I score them separately and
> use:
>
> header SUBJ_LONG_WORD Subject =~ /\b[^[:space:][:punct:]]{30}/
> header SUBJ_ODD_CASE Subject =~ /(?:[[:lower:]][[:upper:]].{0,15}){3}/
How about:
header SUBJ_RUNON Subject =~ /(?:[[:upper:]][[:lower:]]{2,15}[!:,'"]?){10}/
?
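A quick way to see what a rule like this would and would not hit is to try the pattern outside SpamAssassin. The following is only a sketch in Python, not part of the rule itself: it assumes [A-Z] and [a-z] are acceptable stand-ins for the POSIX classes (Python's re module lacks them), and the two non-spam subjects are made up for illustration.

```python
import re

# ASCII approximation of the SUBJ_RUNON pattern: [A-Z] / [a-z] stand in for
# the POSIX classes [[:upper:]] / [[:lower:]], which Python's re lacks.
SUBJ_RUNON = re.compile(r"""(?:[A-Z][a-z]{2,15}[!:,'"]?){10}""")


def is_runon(subject: str) -> bool:
    """True if the subject contains ten capitalised words run together."""
    return SUBJ_RUNON.search(subject) is not None


# Subject taken from the original report.
spam = "Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat"
# Hypothetical subjects, invented for this sketch.
spaced = "Re: You Will Not Believe Your Luck Can Turn That Fast And Well"
short = "Re: JavaScriptAndTypeScriptTips"

for subject in (spam, spaced, short):
    print(is_runon(subject), subject)
```

The spaces in the second subject break the run, and the third has only six capitalised words in a row, so under these assumptions only the reported spam subject matches.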
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
The third basic rule of firearms safety:
Keep your booger hook off the bang switch!
-----------------------------------------------------------------------
2 days until April Fools' day
Re: Spam
Posted by Adam Katz <an...@khopis.com>.
On 03/30/2011 01:23 PM, RW wrote:
> A lot of these long words are rarely used in the wild - other than
> to say how long they are.
>
> The subjects have two separate characteristics: the length and the
> number of lower to upper case transitions. I score them separately
> and use:
>
> header SUBJ_LONG_WORD Subject =~ /\b[^[:space:][:punct:]]{30}/
> header SUBJ_ODD_CASE Subject =~ /(?:[[:lower:]][[:upper:]].{0,15}){3}/
(Personally, I'd prefer to limit it to letters rather than also
including numbers, underscores, and special characters.)
There's also exaggerated text like aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaarg,
hahahahahahahahahahahahahahaha, lollllllllllllllllllllll!11111one,
intentional strings like goodluckwiththat, and suffixes like
"somethingorother" (as in "Mr. Rosensomethingorother").
I think my rule was a little more efficient at accomplishing something
similar. John's was better named and is preferable except for the fact
that it still takes a while to parse (though at least it's limited to
just one line of each message).
Re: Spam
Posted by RW <rw...@googlemail.com>.
On Wed, 30 Mar 2011 09:16:09 -0700
Adam Katz <an...@khopis.com> wrote:
> On 03/29/2011 04:57 PM, Martin Gregorie wrote:
> > On Wed, 2011-03-30 at 00:58 +0200, martin@swetech.se wrote:
> >> Recently I have been getting a lot of mail with subjects like
> >> this one, containing a link to some scam/Chinese crap factory.
> >>
> >> I run the latest SpamAssassin along with amavis, but these mails
> >> keep getting through. Any ideas?
> >>
> >> Re:
> >> YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat
> >
> > Since the longest (English) word I know has 28 letters
> > (antidisestablishmentarianism), a private rule like:
> >
> > header VERY_LONG_WORD Subject =~ /Re:\s+\S{29}/
> >
> > should catch that spam.
>
> The multi-lingual dictionary that I use for this kind of purpose has
> 132 words that are 29+ characters. Its longest word is 58 characters:
> Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch is a large
> village on the Welsh island of Anglesey, ...
A lot of these long words are rarely used in the wild - other than to
say how long they are.
The subjects have two separate characteristics: the length and the
number of lower to upper case transitions. I score them separately and
use:
header SUBJ_LONG_WORD Subject =~ /\b[^[:space:][:punct:]]{30}/
header SUBJ_ODD_CASE Subject =~ /(?:[[:lower:]][[:upper:]].{0,15}){3}/
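To illustrate how the two characteristics behave separately, here is a rough Python sketch of both patterns. It is an approximation under stated assumptions: Python's re has no POSIX classes, so string.punctuation stands in for [:punct:], \s for [:space:], and [a-z]/[A-Z] for [:lower:]/[:upper:] (ASCII only). The subject in `mixed` is made up.

```python
import re
import string

# ASCII stand-in for [:punct:]; re.escape keeps ], \, ^ and - literal
# inside the character class.
PUNCT = re.escape(string.punctuation)

# 30 or more consecutive characters that are neither space nor punctuation.
SUBJ_LONG_WORD = re.compile(r"\b[^\s" + PUNCT + r"]{30}")
# Three lower-to-upper case transitions, each at most 15 characters apart.
SUBJ_ODD_CASE = re.compile(r"(?:[a-z][A-Z].{0,15}){3}")


def check(subject: str) -> tuple:
    """Return (long_word_hit, odd_case_hit) for one subject line."""
    return (
        SUBJ_LONG_WORD.search(subject) is not None,
        SUBJ_ODD_CASE.search(subject) is not None,
    )


spam = "Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat"
welsh = "Re: Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
mixed = "Re: iPhone and MacBook prices"  # hypothetical ham subject

for subject in (spam, welsh, mixed):
    print(check(subject), subject)
```

In this sketch the spam subject trips both patterns, the Welsh place name trips only the length pattern, and the made-up ham subject trips neither, which is the reason for scoring the two separately.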
Re: Spam
Posted by Adam Katz <an...@khopis.com>.
On 03/29/2011 04:57 PM, Martin Gregorie wrote:
> On Wed, 2011-03-30 at 00:58 +0200, martin@swetech.se wrote:
>> Recently I have been getting a lot of mail with subjects like this
>> one, containing a link to some scam/Chinese crap factory.
>>
>> I run the latest SpamAssassin along with amavis, but these mails keep
>> getting through. Any ideas?
>>
>> Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat
>
> Since the longest (English) word I know has 28 letters
> (antidisestablishmentarianism), a private rule like:
>
> header VERY_LONG_WORD Subject =~ /Re:\s+\S{29}/
>
> should catch that spam.
The multi-lingual dictionary that I use for this kind of purpose has 132
words that are 29+ characters. Its longest word is 58 characters:
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch is a large
village on the Welsh island of Anglesey, see
http://en.wikipedia.org/wiki/Llanfairpwllgwyngyll for more. Wikipedia
also notes a hill in New Zealand (short name Taumata) with an even
longer name. The next longest word is
pneumonoultramicroscopicsilicovolcanoconiosis with 45 letters. German
words, which I would have expected to take the cake, seem to be limited
to 35 or so letters.
Maybe try this instead:
header VERY_LONG_WORD Subject =~ /Re:\s+\w(?![a-z]{40})[A-Za-z]{40}/
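The negative lookahead is what spares long all-lowercase dictionary words: after the first character, the next 40 must be letters but must not all be lowercase. A small Python sketch of the same pattern (the syntax used here is common to Perl and Python; the `ordinary` subject is made up):

```python
import re

# Same pattern as the rule: "Re:", whitespace, one word character, then 40
# letters that are not all lowercase.
VERY_LONG_WORD = re.compile(r"Re:\s+\w(?![a-z]{40})[A-Za-z]{40}")


def hits(subject: str) -> bool:
    """True if the pattern matches somewhere in the subject."""
    return VERY_LONG_WORD.search(subject) is not None


spam = "Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat"
welsh = "Re: Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
ordinary = "Re: Quarterly report attached"  # hypothetical ham subject

for subject in (spam, welsh, ordinary):
    print(hits(subject), subject)
```

Only the mixed-case spam subject matches here; the 58-letter village name is rejected by the lookahead because everything after its first letter is lowercase.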
If anybody is interested in the dictionary I use, this should be enough
to replicate it:
$ ls -lGg |sed 's/^.* 1 //; s/ ... .. ..... / /'
total 18M
17M all
32 american-english -> /usr/share/dict/american-english
37 american-english-huge -> /usr/share/dict/american-english-huge
39 american-english-insane -> /usr/share/dict/american-english-insane
86K beale.wordlist.asc
25 brazilian -> /usr/share/dict/brazilian
36 british-english-huge -> /usr/share/dict/british-english-huge
37 canadian-english-huge -> /usr/share/dict/canadian-english-huge
86K diceware.wordlist.asc
1.6K expurgated
22 french -> /usr/share/dict/french
23 italian -> /usr/share/dict/italian
135 make-all
23 ngerman -> /usr/share/dict/ngerman
23 ogerman -> /usr/share/dict/ogerman
23 spanish -> /usr/share/dict/spanish
1.7M twl06.txt
21 words -> /usr/share/dict/words
$ cat make-all
#!/bin/sh
( cat `ls |grep -Ev '^all|.wordlist.asc'`
sed -r '/^[0-9]{5}\s+/!d; s///; /\w/!d' *.wordlist.asc
) |sort -f |uniq -i >all
Expurgated and twl06.txt are Scrabble dictionaries that you'll have to
find specifically. The .wordlist.asc files are for diceware.
Everything else came from a Debian package. If you're not a word nut
like me, all you really need is the largest of each of the languages,
plus perhaps the standard English dictionary so you can determine if
something is an edge case.
This made it really easy for me to verify the cialis-in-word problem we
had here earlier; `grep -ci cialis all` currently counts 287 words.
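For anyone without the shell tools handy, the same two checks (how many words contain a given string, and which words reach a given length) can be sketched in Python. This is an illustration only: the tiny in-memory list below is a made-up stand-in for the combined `all` file, and the helper names are hypothetical.

```python
def count_containing(words, needle):
    """Count words containing needle, ignoring case (like grep -ci)."""
    needle = needle.lower()
    return sum(1 for word in words if needle in word.lower())


def words_at_least(words, length):
    """Return the words that are at least `length` characters long."""
    return [word for word in words if len(word) >= length]


# Made-up sample standing in for the combined word list.
sample = [
    "specialist",
    "Socialism",
    "officialism",
    "cat",
    "antidisestablishmentarianism",
    "pneumonoultramicroscopicsilicovolcanoconiosis",
]

print(count_containing(sample, "cialis"))
print(words_at_least(sample, 29))
```

With a real word list you would read the file line by line and pass the stripped lines in place of `sample`.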
Re: Spam
Posted by "Lawrence @ Rogers" <la...@nl.rogers.com>.
On 29/03/2011 9:27 PM, Martin Gregorie wrote:
> On Wed, 2011-03-30 at 00:58 +0200, martin@swetech.se wrote:
>> Recently I have been getting a lot of mail with subjects like this
>> one, containing a link to some scam/Chinese crap factory.
>>
>> I run the latest SpamAssassin along with amavis, but these mails keep
>> getting through. Any ideas?
>>
>> Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat
> Since the longest (English) word I know has 28 letters
> (antidisestablishmentarianism), a private rule like:
>
> header VERY_LONG_WORD Subject =~ /Re:\s+\S{29}/
>
> should catch that spam.
>
>
> Martin
>
>
We started getting those spams about 6 months ago. What I did was come
up with a low-scoring rule that hits on this:
# Rule 1: check if the Subject contains only numbers, letters, or
# common formatting (no spaces) and is 31 or more characters long
header LW_SUBJECT_SPAMMY Subject =~ /^[0-9a-zA-Z,.+_\-'!\\\/]{31,}$/
describe LW_SUBJECT_SPAMMY Subject appears spammy (31 or more characters without spaces. Only numbers, letters, and formatting)
score LW_SUBJECT_SPAMMY 0.2
#tflags LW_SUBJECT_SPAMMY noautolearn
I'm sure this rule could use some improvement.
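One property worth checking is the anchoring: because the pattern runs from ^ to $, the whole Subject value has to consist of the allowed characters. The Python sketch below uses the same character class and assumes the Subject is matched as a single string with any "Re: " prefix still attached; that assumption about how the header value is presented is mine, not something stated above.

```python
import re

# Same character class and anchors as the rule, written as a Python raw string.
LW_SUBJECT_SPAMMY = re.compile(r"^[0-9a-zA-Z,.+_\-'!\\/]{31,}$")


def is_spammy(subject: str) -> bool:
    """True if the entire subject is 31+ allowed characters with no spaces."""
    return LW_SUBJECT_SPAMMY.search(subject) is not None


bare = "YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat"
with_re = "Re: " + bare

print(is_spammy(bare))
print(is_spammy(with_re))
```

Under that assumption the bare subject matches, but the same subject with a "Re: " prefix does not, since neither the colon nor the space is in the character class.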
The ones we saw also always followed 2 possible patterns (sometimes
containing both in the same e-mail)
1) Hit the HTML_MESSAGE, and either FREEMAIL_FROM or TRACKER_ID, rules.
2) Hit MIME_QP_LONG_LINE and a network test.
We have the above 2 in the form of meta rules and scored at 1.0 each.
We also have a 3rd meta rule, with the first rule + the 2 described
above, scored at 1.5
This has proven to be quite effective at nuking these spams without FP.
This is because the likelihood of a ham e-mail setting off all of the
above rules is quite low.
Regards,
Lawrence
Re: Spam
Posted by Martin Gregorie <ma...@gregorie.org>.
On Wed, 2011-03-30 at 00:58 +0200, martin@swetech.se wrote:
> Recently I have been getting a lot of mail with subjects like this
> one, containing a link to some scam/Chinese crap factory.
>
> I run the latest SpamAssassin along with amavis, but these mails keep
> getting through. Any ideas?
>
> Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat
Since the longest (English) word I know has 28 letters
(antidisestablishmentarianism), a private rule like:
header VERY_LONG_WORD Subject =~ /Re:\s+\S{29}/
should catch that spam.
Martin
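As a rough check of the rule above, here is a Python sketch of the same pattern (\s and \S behave the same way in Perl and Python for these ASCII subjects). The second and third subjects are made up for illustration.

```python
import re

# Same pattern as the rule: "Re:", whitespace, then 29 non-space characters.
VERY_LONG_WORD = re.compile(r"Re:\s+\S{29}")


def hits(subject: str) -> bool:
    """True if the pattern matches somewhere in the subject."""
    return VERY_LONG_WORD.search(subject) is not None


spam = "Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat"
# Hypothetical subjects, invented for this sketch.
long_word = "Re: antidisestablishmentarianism in the 19th century"
pasted_url = "Re: http://en.wikipedia.org/wiki/Llanfairpwllgwyngyll"

for subject in (spam, long_word, pasted_url):
    print(hits(subject), subject)
```

The 28-letter word stays just under the threshold, while the pasted URL matches because \S counts any non-space character, so a rule like this is probably best kept at a low score.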