Posted to users@spamassassin.apache.org by "Martin G. Diehl" <md...@nac.net> on 2005/05/11 23:48:27 UTC

SPAM with low readability

Greetings,

A small non-scientific sample of some SPAM subjects ...
(and an actual serious question later in this message).

[incomprehensible SPAM]
> Subject: ¡Ú±¹³»1À§ Á÷ÀåÀδ롤Ãâ ½ºÆä¼È·Ð 5000¸¸¿ø¿ø±îÁö ³â5~12% 100%½ÂÀÎ!
> Subject: ¡á¡áÇö±ÝÀÌ¿À°¡´Â Ä«Áö³ë °í½ºÅé.Æ÷Ä¿¡á¡á[À̹ÌÁöº¸±âŬ¸¯] cvkuhfq
> Subject: °í¼Óµµ·Î ¹× °¡¼Ó »ç°í¸¦ ¿¹¹æÇص帳´Ï´Ù »ç°í¹«!@ osouaoyoakx tokpf
> Subject: ³×ºñ°ÔÀÌ¼Ç »çÀºÇà»ç ¼³¹® À̺¥Æ®(³×½ºÆÌ ½ºÀ®Æù)
> Subject: ½Å¿ëÄ«µå »ç¿ëÀÚ´Â ¹«¹æ¹® ¹«¼­·ù ´çÀÏ´ëÃâ 100% °¡´ÉÇÕ´Ï´Ù.
> Subject: [¹«¹æ¹®]´©±¸³ª °¡´ÉÇÑ´ëÃâ(Ä«µå,Á÷ÀåÀδëÃâ) °ø½Ä ÃÖÀú¼ö¼ö·á ¾÷ü!!!
> Subject: Áö±ÝÀÎÅͳݰ¡ÀÔÇϸé 6°³¿ù¹«·á+Çö±Ý6¸¸¿ø.»ï¼ºMP3.DVD.»ïõ¸®21´ÜÀÚÀü°Å.HPÄ®¶óÇÁ¸°ÅÍ°¡ °øÂ¥!!! eocqxl  lrmibl
> Subject: Re: ocJfD2xLTANk6
> Subject: ÃÖ°íÀÇ ³ëÆ®ºÏ 50%ÇÒÀÎ ÆǸÅ!

[Russian SPAM]
> Subject: =?Windows-1251?B?7/Du5efkIOIg9uXt8vAg8fLu6+j2+w==?=
> 
> Subject: =?Windows-1251?B?8e/w4OLu9+3o6iDk6/8g7/Du4/Dl8fHg?=
> From: =?Windows-1251?B?2ODt6O3gIMUuzy4=?= <ki...@cs.ucdavis.edu>
> 
> Subject: =?Windows-1251?B?4uX35fAg+PPy7uo=?=
> From: =?Windows-1251?B?1+Xw7e7i4CDOLsEu?= <re...@jwz.org>

[English not spoken here SPAM]
> Subject: Re: Reday 2 Odrer olinne
> Subject:  willie illegitimacy
> Subject: consanguineous alternate

[Makes me think of tricking musicians into performing a P D Q Bach work]
> Subject:    AWARD NOTIFICATION !!!

The incomprehensible SPAM seems to be Korean or Japanese based on the
few times one of those messages was accidentally opened.

The Russians at least tag the character set in the headers ... and the
Cyrillic lettering in the headers looks nice among the rest of the SPAM.

The English not spoken ... category sometimes supplies a little humor
... the second and third examples in that category were actually offers
to refinance my house! ... At competitive rates -- lucky me.

Now for my serious questions ...

(1) Is there a simple rule to detect the incomprehensible SPAM?
Hint: for the most part, those letters have code values of 128 or
greater.
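
To make the hint concrete, here is a minimal Python sketch of the idea, not a SpamAssassin rule: it measures what fraction of a raw Subject's bytes have the high bit set (values of 128 or greater). The function names and the 0.3 threshold are assumptions chosen for illustration, not tuned values. The sample bytes assume that the characters displayed in the last incomprehensible subject above are Latin-1 renderings of the original bytes.

```python
def high_bit_ratio(raw_subject: bytes) -> float:
    """Return the fraction of bytes in raw_subject with values >= 128."""
    if not raw_subject:
        return 0.0
    high = sum(1 for b in raw_subject if b >= 0x80)
    return high / len(raw_subject)


def looks_unreadable(raw_subject: bytes, threshold: float = 0.3) -> bool:
    """Flag a subject whose high-bit share meets the threshold.

    The default threshold is an arbitrary illustrative cutoff.
    """
    return high_bit_ratio(raw_subject) >= threshold


# First two words of the last sample subject, as assumed raw bytes.
sample = b"\xc3\xd6\xb0\xed\xc0\xc7 \xb3\xeb\xc6\xae\xba\xcf"
plain = b"AWARD NOTIFICATION !!!"

print(looks_unreadable(sample))  # 12 of 13 bytes are high-bit
print(looks_unreadable(plain))   # no high-bit bytes at all
```

A ratio is used rather than a single-byte test so that a subject with one stray accented letter is not flagged; whether that trade-off suits a given mail stream is a judgment call.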

(2) In the same line of thinking, is there a way for the scripts to detect
the character set when one is specified?  IOW, could someone code a filter
rule that tests for Russian?
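
As a rough illustration of the character-set idea, the Python sketch below reads the charset label out of the encoded-words in a header (the `=?charset?B?...?=` form seen in the Russian samples above) using the standard library's `email.header.decode_header`. It is a sketch of the logic only, not a SpamAssassin rule. The helper names are hypothetical, and the set of charsets treated as Cyrillic is an assumption for the example, not an exhaustive list.

```python
from email.header import decode_header

# Charset labels treated as Cyrillic for this example (an assumption).
CYRILLIC_CHARSETS = {"windows-1251", "koi8-r", "koi8-u", "iso-8859-5"}


def header_charsets(raw_header: str) -> set:
    """Return the lowercase charset names declared in encoded-words."""
    return {
        charset.lower()
        for _, charset in decode_header(raw_header)
        if charset is not None  # unencoded text carries no charset label
    }


def declares_cyrillic(raw_header: str) -> bool:
    """True if any encoded-word in the header declares a Cyrillic charset."""
    return bool(header_charsets(raw_header) & CYRILLIC_CHARSETS)


russian = "=?Windows-1251?B?4uX35fAg+PPy7uo=?="
english = "Re: Reday 2 Odrer olinne"

print(header_charsets(russian))
print(declares_cyrillic(russian), declares_cyrillic(english))
```

This only works when the sender tags the character set, as the Russian samples do; untagged 8-bit subjects like the first group carry no label to read.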

Thanks for listening ...

Martin


Re: SPAM with low readability

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Martin,

Wednesday, May 11, 2005, 2:48:27 PM, you wrote:

MGD> Now for my serious questions ...

MGD> (1) Is there a simple rule to detect the incomprehensible SPAM?
MGD> Hint: for the most part, those letters have code values of 128 or
MGD> greater.

MGD> (2) In the same line of thinking, is there a way for the scripts to detect
MGD> the character set when one is specified?  IOW, could someone code a filter
MGD> rule that tests for Russian?

Check the SARE rules files, specifically 70_sare_genlsubj_eng.cf and
70_sare_header_eng.cf -- I think you'll find some samples there you
can adapt.

Bob Menschel