Posted to users@spamassassin.apache.org by Marcio Humpris <ma...@gmail.com> on 2013/11/08 00:01:30 UTC

one word spam (still trying...)

Hi, John

This didn't work for me either:

     /^\s{0,80}\S{1,20}\s{0,80}$/

Can you kindly check that it works here?

http://www.softlion.com/webTools/RegExpTest/default.aspx

Here's the original email I want to block:

http://pastebin.com/download.php?i=0D7tfsjf

Thank you!

Re: one word spam (still trying...)

Posted by John Hardin <jh...@impsec.org>.
On Thu, 7 Nov 2013, Marcio Humpris wrote:

> Hi, John
>
> This didn't work for me either:
>
>     /^\s{0,80}\S{1,20}\s{0,80}$/
>
> Can you kindly check that it works here?
>
> http://www.softlion.com/webTools/RegExpTest/default.aspx

The slashes at the ends should be removed if you're testing the RE with 
that tool. If I do that it works as expected there.

> Here's the original email I want to block:
>
> http://pastebin.com/download.php?i=0D7tfsjf

"Tudo bom?" is two words.
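The same check can be reproduced outside the web tool. Below is a minimal Python sketch, assuming Python's `re` module treats this particular pattern the same way Perl does (it uses only constructs common to both). The helper name `is_one_word_line` and the sample strings other than "Tudo bom?" are made up for illustration.

```python
import re

# The RE from the thread, without the surrounding slashes
# (those are delimiters, not part of the pattern itself).
ONE_WORD_LINE = re.compile(r"^\s{0,80}\S{1,20}\s{0,80}$")

def is_one_word_line(line):
    """True if the line holds exactly one run of 1-20 non-whitespace chars."""
    return ONE_WORD_LINE.search(line) is not None

# Illustrative samples; only "Tudo bom?" comes from the thread.
samples = ["hello", "   hello   ", "Tudo bom?", ""]
for s in samples:
    print(repr(s), is_one_word_line(s))
```

The two-word sample fails because the pattern allows only a single run of non-whitespace characters between the optional leading and trailing whitespace.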

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   "They will be slaughtered as result of England's anti-gun laws
   that concentrates power to the Government."
 			-- Shifty Powers (101 abn) observing British
 			subjects training to repel a German invasion
 			using rakes, hoes and pitchforks
-----------------------------------------------------------------------
  4 days until Veterans Day

Re: one word spam (still trying...)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2013-11-25 at 22:26 +0000, Marcio Humpris wrote:
> Dear Karsten,
> Really appreciate it. Thanks John Hardin also.

> Can you kindly tell me how I can add this to my local.cf so I can test it?

> I imagine I have to do this, correct?
> 
>   rawbody __RB_GT_200  /^.{201}/s
>   meta    __RB_LE_200  !__RB_GT_200
>   score __RB_LE_200 1.5

You cannot assign a score directly here. As I mentioned in my previous
post, rules with names starting with two underscores are non-scoring
sub-rules intended to be used in a meta rule.

Thus, you would either have to rename the __RB_LE_200 meta rule and drop
the leading underscores, or preserve these size constraint rules as-is
and add a plain meta rule you can assign a score to.

  meta  LOCAL_RB_LE_200  __RB_LE_200

That might look slightly redundant at first glance, but it lets you reuse
the size constraint rules in other meta rules, match your target more
closely, and prevent false positives by adding further constraints.

The samples in your case have a short body without any URI, so you could
e.g. use another stock sub-rule to prevent firing on mail quickly thrown
together to send a funny link to the colleague in the next cubicle.

  meta  RB_LE_200_NO_URI  __RB_LE_200 && !__HAS_ANY_URI

Until you're comfortable with the rule, I suggest starting with a lower
score -- and raising it over time after observing its performance.

  score RB_LE_200_NO_URI  0.5


> I'm a bit confused: LE checks for less than 200 words and seems to
> negate RB_GT_200...?

Naming used in the original sub-rules above: RB indicates a rule (based
on) type rawbody. GT and LE mean "greater than" and "less than or
equal", respectively, in relation to the trailing number.

The first rule (type rawbody) evaluates true for any message with more
than 200 chars in textual MIME-parts, which one can think of as
"having a body of at least 201 chars". Note that this works on *chars*,
not words.

The second rule does indeed negate that -- which means the textual
MIME-parts (think body) of the message are "less than or equal to 200
chars in length". Again, operating on chars, not words.
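The relationship between the two sub-rules can be sketched in Python. This is only an illustration of the logic, assuming the textual part is handed over as one string, which simplifies how SpamAssassin actually feeds rawbody text to rules. The function names and the sample strings are hypothetical.

```python
import re

# Python stand-in for:  rawbody __RB_GT_200  /^.{201}/s
# re.DOTALL plays the role of /s, so "." also matches newlines.
RB_GT_200 = re.compile(r"^.{201}", re.DOTALL)

def rb_gt_200(text):
    """True for more than 200 chars, i.e. at least 201."""
    return RB_GT_200.search(text) is not None

def rb_le_200(text):
    """Negation of the above, like:  meta __RB_LE_200  !__RB_GT_200"""
    return not rb_gt_200(text)

short_body = "Tudo bom?"        # 9 chars
many_words = "a b " * 30        # 60 words, but only 120 chars
long_body = "word " * 50        # 250 chars

for body in (short_body, many_words, long_body):
    print(len(body), rb_le_200(body))
```

The 60-word sample still counts as "less than or equal" because only its 120 chars are measured, not its word count.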


In your case, since you want to match messages with typically far fewer
than 200 chars, I'd go for a version of about 40 chars, maybe. I briefly
outlined what to adjust for that in my previous post, if it isn't
already clear from the rule definitions.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: one word spam (still trying...)

Posted by Benny Pedersen <me...@junc.eu>.
Marcio Humpris wrote on 2013-11-25 23:26:

> I imagine I have to do this, correct?
> 
>   rawbody __RB_GT_200  /^.{201}/s
>   meta    __RB_LE_200  !__RB_GT_200
>   score __RB_LE_200 1.5

Nope, rules beginning with __ can't be given a score.

meta RB_LE_200 (__RB_LE_200)
describe RB_LE_200 Meta: 200 chars or fewer in rawbody
score RB_LE_200 1.5



Re: one word spam (still trying...)

Posted by Marcio Humpris <ma...@gmail.com>.
Dear Karsten,

Really appreciate it. Thanks John Hardin also.

There was no original rule actually. And yes, 2 words in this case, sorry.

Can you kindly tell me how I can add this to my local.cf so I can test it?

  rawbody __RB_GT_200  /^.{201}/s
  meta    __RB_LE_200  !__RB_GT_200

I imagine I have to do this, correct?

  rawbody __RB_GT_200  /^.{201}/s
  meta    __RB_LE_200  !__RB_GT_200
  score __RB_LE_200 1.5
  
I'm a bit confused: LE checks for less than 200 words and seems to
negate RB_GT_200...?

Thanks again.



Re: one word spam (still trying...)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2013-11-07 at 21:01 -0200, Marcio Humpris wrote:
> This didn't work for me either:
>  
>      /^\s{0,80}\S{1,20}\s{0,80}$/

That RE matches a complete line with a single "word" (anything but
whitespace) of up to 20 chars, and optional whitespace \s before and
after the word.

> Here's the original email I want to block:
> http://pastebin.com/download.php?i=0D7tfsjf

Despite your Subject, that sample has two words in the body.

You did not post the actual SA rule; the above is just an RE. This is
important, because different rule types are applied against different
renderings of the message or body, which also affects what exactly the
beginning ^ and end $ assertions anchor to. And in the case of a body
rule, the Subject becomes the first paragraph.

Even more words, then? In total, yes, but not as far as the above RE is
concerned. In body rules, paragraphs are normalized to newline-delimited
single-line strings. Lacking magic like the /m modifier, the beginning
and end assertions apply per line -- they do not span the entire body. A
single one-word paragraph in a large mail would match.
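To illustrate the per-paragraph behaviour, here is a rough Python sketch. The paragraph splitting is a hypothetical approximation (split on blank lines, fold whitespace), not SpamAssassin's actual normalization, and the helper names and sample mail are made up.

```python
import re

ONE_WORD = re.compile(r"^\s{0,80}\S{1,20}\s{0,80}$")

def paragraphs(body):
    """Assumed normalization: split on blank lines, fold each paragraph to one line."""
    chunks = re.split(r"\n\s*\n", body.strip())
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]

def body_rule_hits(subject, body):
    """The RE is tried against each paragraph on its own; the Subject comes first."""
    return any(ONE_WORD.search(p) for p in [subject] + paragraphs(body))

long_mail = (
    "Dear all,\n\n"
    "here is the long report we discussed last week.\n\n"
    "Cheers\n\n"
    "The whole team"
)
# The lone "Cheers" paragraph is enough for the rule to hit.
print(body_rule_hits("Quarterly report attached", long_mail))
```

This is why the RE alone is a poor fit for "the whole body is one word": any one-word paragraph, such as a sign-off, triggers it.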


Given the sample, what you actually are after might be a "very short
body" rule. This was part of a recent thread:

  rawbody __RB_GT_200  /^.{201}/s
  meta    __RB_LE_200  !__RB_GT_200

The (non-scoring sub-rule) __RB_LE_200 matches any mail with at most 200
chars in the textual body MIME-parts. To lower the size for your
use-case, just replace every instance of 200 and 201 with your desired
maximum size and maximum size plus one, respectively.
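The substitution described above is mechanical, so it can be sketched as a small Python helper that prints the two lines for any size. The function name `short_body_rules` is hypothetical and 40 is only an example value. The output still has to be pasted into local.cf by hand.

```python
def short_body_rules(max_chars):
    """Return the two sub-rule lines for a given maximum body size in chars."""
    if max_chars < 1:
        raise ValueError("max_chars must be a positive integer")
    # The rawbody RE needs max size plus one; the rule names carry the max size.
    return (
        f"rawbody __RB_GT_{max_chars}  /^.{{{max_chars + 1}}}/s\n"
        f"meta    __RB_LE_{max_chars}  !__RB_GT_{max_chars}\n"
    )

print(short_body_rules(40), end="")
```

Generating the names from the same number keeps the rule name and the quantifier from drifting apart when the size is changed later.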

These rules are sub-rules intended to be used in a meta rule.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}