Posted to users@spamassassin.apache.org by Sergio <se...@gmail.com> on 2011/11/27 13:17:27 UTC

Porn rules to share?

Hi,
I am getting tons of emails similar to the following:

S:C H #O+O L "G l, R%L P *0 *R N*
Re:sexy~argentinean_babe_getting messy-facial b941ecfb97289dda phone henin
Re:blonde_babe$wants santa*to_visit-her*cuntzn 016b996cadd31796 headed games
Re:blonde~granny-fucks a=younger_guysr 94dac3a33f0f1b04 surgery

I created the following rule:
header    __PORN_RULE01   SUBJECT =~ /Re.(sexy|blonde).*(messy|wants|fuck|cuntzn)/i
header    __PORN_RULE02   SUBJECT =~ /S.C.H..O.O.L..G.I..R.L.P..0..R.N/I
meta   PORN_RULES (__PORN_RULE01 + __PORN_RULE02 >=1)
score  PORN:_RULES 5.0

But emails are still getting in. Any comments on what I need to fix in the
rule? Or, if someone has a better rule to stop this and is willing to share it,
that would be appreciated.

Regards,

Sergio
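A minimal sketch for checking which sample subjects the two sub-rules actually match, outside SpamAssassin. It uses Python's re module as a stand-in for Perl regexes (an assumption, though these patterns are simple), and treats the trailing /I on __PORN_RULE02 as /i, since Perl does not recognise an uppercase /I modifier. As the first sample appears in this archive, the character after "G" is a lowercase "l" rather than "I", so __PORN_RULE02 does not match it; the "G I" variant at the end is hypothetical.

```python
import re

# Python's re stands in for Perl here (assumption). The first rule's /i maps
# to re.IGNORECASE; the second rule's /I is treated as /i for this sketch.
PORN_RULE01 = re.compile(r"Re.(sexy|blonde).*(messy|wants|fuck|cuntzn)", re.IGNORECASE)
PORN_RULE02 = re.compile(r"S.C.H..O.O.L..G.I..R.L.P..0..R.N", re.IGNORECASE)

# Sample subjects copied from the message above.
SUBJECTS = [
    'S:C H #O+O L "G l, R%L P *0 *R N*',
    "Re:sexy~argentinean_babe_getting messy-facial b941ecfb97289dda phone henin",
    "Re:blonde_babe$wants santa*to_visit-her*cuntzn 016b996cadd31796 headed games",
    "Re:blonde~granny-fucks a=younger_guysr 94dac3a33f0f1b04 surgery",
]


def meta_hits(subject):
    """Model of (__PORN_RULE01 + __PORN_RULE02 >= 1): each sub-rule counts as 1 or 0."""
    hit01 = int(PORN_RULE01.search(subject) is not None)
    hit02 = int(PORN_RULE02.search(subject) is not None)
    return hit01 + hit02 >= 1


for subject in SUBJECTS:
    print(meta_hits(subject), subject)

# Hypothetical variant with a capital I after the G: __PORN_RULE02 now matches.
variant = SUBJECTS[0].replace("G l", "G I")
print(PORN_RULE02.search(variant) is not None)
```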

Re: Porn rules to share?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
> I created the following rule:
> header    __PORN_RULE01   SUBJECT =~ /Re.(sexy|blonde).*(messy|wants|fuck|cuntzn)/i
> header    __PORN_RULE02   SUBJECT =~ /S.C.H..O.O.L..G.I..R.L.P..0..R.N/I
> meta   PORN_RULES (__PORN_RULE01 + __PORN_RULE02 >=1)
> score  PORN:_RULES 5.0
>
> But emails are still getting in, any comment on what I need to fix on 
> the rule? or if someone has a better rule to stop this that wants to 
> share the rule, it will be appreciated.

Your score line has a ':' in the rule name (PORN:_RULES rather than PORN_RULES); that might be the issue.
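A rough Python model of how score lines attach to rules, assuming the defaults described in the Mail::SpamAssassin::Conf documentation: a rule with no matching score line falls back to 1.0, and __ sub-rules are not scored. Under those assumptions, the typo leaves PORN_RULES at 1.0 and the 5.0 attaches to nothing. The function names are hypothetical; running spamassassin --lint after any rule change is a good habit for catching configuration errors.

```python
# Assumed SpamAssassin behaviour (per the Mail::SpamAssassin::Conf docs):
# a rule without a matching "score" line gets a default of 1.0, and rules
# whose names start with "__" are sub-rules that are never scored.
DEFAULT_SCORE = 1.0


def effective_scores(rule_names, score_lines):
    """Return {rule: score} given defined rule names and {name: value} score lines."""
    scores = {}
    for name in rule_names:
        if name.startswith("__"):
            scores[name] = 0.0  # sub-rule: only counts inside metas
        else:
            # A score line only attaches if its name matches exactly.
            scores[name] = score_lines.get(name, DEFAULT_SCORE)
    return scores


def orphan_scores(rule_names, score_lines):
    """Score lines that name no defined rule, e.g. a typo like PORN:_RULES."""
    return sorted(set(score_lines) - set(rule_names))


rules = ["__PORN_RULE01", "__PORN_RULE02", "PORN_RULES"]
print(effective_scores(rules, {"PORN:_RULES": 5.0}))  # PORN_RULES falls back to 1.0
print(orphan_scores(rules, {"PORN:_RULES": 5.0}))     # ['PORN:_RULES']
print(effective_scores(rules, {"PORN_RULES": 5.0}))   # PORN_RULES gets 5.0
```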

A good start is to save an example email in mbox format and run this on 
the command line:

spamassassin -t -D < /tmp/example.mbox 2>&1 | grep PORN

and see what it says.

Regards,
KAM

Re: Porn rules to share?

Posted by John Hardin <jh...@impsec.org>.
On Sun, 27 Nov 2011, Sergio wrote:

> my major concern is in the garbled words like:
>
> S:C H #O+O L "G l, R%L P *0 *R N*
> T\E /EN"S} P)0_R \N
> S:C H #O+O L "G l, R%L P *0 *R N*
> G ,RA _N N}Y } P %0 ~R |N \
> P,0_ R .N PI ~C}T+U-R(E%S.
> TR %A *N #S S. E. X{UA`L P&0/R N_
>
> What it will be the best way to catch any type of garbled word?

There is a rule for these in my sandbox. Unfortunately it's not in the 
current rules update.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

Re: Porn rules to share?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Sun, 2011-11-27 at 10:40 -0500, Kevin A. McGrail wrote:
> On 11/27/2011 10:24 AM, Sergio wrote:
> >
> > I want to thank you KAM for the share of his rules, I have learned a 
> > lot looking on them and thanks to that I have modified the rules that 
> > I had to make them more easy to work, the arithmetic on the rules with 
> > the operand "+" is working really nice I have joined a lot of rules 
> > and make them active with ">=1" so if any of the rules on the group 
> > applies then the rule is triggered.
> You are welcome.  As you can see, my focus with content-based rules is 
> to try and use meta rules almost exclusively to minimize FPs.
> 
> > With the porn rule that I have,  it is working but it still left spam 
> > of this type pass, the score line that I wrote on the email had a typo 
> > that is not in my working rule and my major concern is in the garbled 
> > words like:
> >
> > S:C H #O+O L "G l, R%L P *0 *R N*
> > T\E /EN"S} P)0_R \N
> > S:C H #O+O L "G l, R%L P *0 *R N*
> > G ,RA _N N}Y } P %0 ~R |N \
> > P,0_ R .N PI ~C}T+U-R(E%S.
> > TR %A *N #S S. E. X{UA`L P&0/R N_
> >
> > What it will be the best way to catch any type of garbled word?
> Those could be hard because you can get some false positives pretty quickly.
> 
> If this is JUST on the subject header, it might be ok to look at a rule 
> like:
> 
> P.{0,2}[0o].{0,2}R.{0,2}N
> 
> That looks like it might hit on all the variants above but I wouldn't 
> score it too high.
> 
This should also work. It matches all your example headings and is
general enough to match almost any subject line that uses this type of
obfuscation:

header RULENAME Subject =~ /([A-Z][^A-Z]{1,2}){3,}/

The minimum number of adjacent match groups needs to be exactly 3: setting
it higher causes some of your examples to be missed, while setting it lower
starts to generate FPs on inoffensive subject lines.

Caution: I developed this regex using grep with the -P option to scan a
text file containing one of your subject examples per line plus the
following normal subject lines. It hit all the example subjects you
provided and did not hit either of:

	Unspaced title
	CAPITALLISED TITLE

but has not been tested as part of an SA rule.
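The same kind of check can be scripted. A sketch in Python, whose re module stands in for grep -P here (an assumption), with the minimum repeat count made a parameter so the trade-off above can be seen. "Q&A session" is a hypothetical normal subject added for illustration; the rest are the samples and normal subjects from this thread. With a minimum of 3, all five spam samples hit and none of the normal subjects do; a minimum of 4 misses the second sample, and 2 hits "Q&A session".

```python
import re


def obfuscation_pattern(min_groups):
    """Martin's pattern with the minimum repeat count as a parameter."""
    return re.compile(r"([A-Z][^A-Z]{1,2}){%d,}" % min_groups)


# Example subjects from Sergio's message (backslashes and quotes escaped).
SPAM = [
    'S:C H #O+O L "G l, R%L P *0 *R N*',
    'T\\E /EN"S} P)0_R \\N',
    'G ,RA _N N}Y } P %0 ~R |N \\',
    "P,0_ R .N PI ~C}T+U-R(E%S.",
    "TR %A *N #S S. E. X{UA`L P&0/R N_",
]
# Martin's two normal subjects, plus one hypothetical extra.
HAM = ["Unspaced title", "CAPITALLISED TITLE", "Q&A session"]

for n in (2, 3, 4):
    pat = obfuscation_pattern(n)
    spam_hits = sum(bool(pat.search(s)) for s in SPAM)
    ham_hits = sum(bool(pat.search(s)) for s in HAM)
    print(f"min {n}: spam {spam_hits}/{len(SPAM)}, ham {ham_hits}/{len(HAM)}")
```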


Martin



Re: Porn rules to share?

Posted by Sergio <se...@gmail.com>.
On Sun, Nov 27, 2011 at 9:40 AM, Kevin A. McGrail <KM...@pccc.com> wrote:

> On 11/27/2011 10:24 AM, Sergio wrote:
>
>>
>> I want to thank you KAM for the share of his rules, I have learned a lot
>> looking on them and thanks to that I have modified the rules that I had to
>> make them more easy to work, the arithmetic on the rules with the operand
>> "+" is working really nice I have joined a lot of rules and make them
>> active with ">=1" so if any of the rules on the group applies then the rule
>> is triggered.
>>
> You are welcome.  As you can see, my focus with content-based rules is to
> try and use meta rules almost exclusively to minimize FPs.
>
>
>  With the porn rule that I have,  it is working but it still left spam of
>> this type pass, the score line that I wrote on the email had a typo that is
>> not in my working rule and my major concern is in the garbled words like:
>>
>> S:C H #O+O L "G l, R%L P *0 *R N*
>> T\E /EN"S} P)0_R \N
>> S:C H #O+O L "G l, R%L P *0 *R N*
>> G ,RA _N N}Y } P %0 ~R |N \
>> P,0_ R .N PI ~C}T+U-R(E%S.
>> TR %A *N #S S. E. X{UA`L P&0/R N_
>>
>> What it will be the best way to catch any type of garbled word?
>>
> Those could be hard because you can get some false positives pretty quickly.
>
> If this is JUST on the subject header, it might be ok to look at a rule
> like:
>
> P.{0,2}[0o].{0,2}R.{0,2}N
>
> That looks like it might hit on all the variants above but I wouldn't
> score it too high.
>
> The odd part is that I'm not really seeing these spams slipping through so
> I have very little corpora to compare.  I usually hammer the sexually
> explicit spams pretty hard.
>
> I wonder if you need to invest more time in setting up some RBL tests?
>  Are you using any RBLs right now?
>
> Regards,
> KAM
>
Yes, I use the usual RBLs, including NEWSPAMHAUS, and I have 4 RBLs in my
firewall. I have also collected 400 IPs that are blocked in my firewall.

I will give your definition a try and check how it works. Thanks.

Sergio

Re: Porn rules to share?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 11/27/2011 10:24 AM, Sergio wrote:
>
> I want to thank you KAM for the share of his rules, I have learned a 
> lot looking on them and thanks to that I have modified the rules that 
> I had to make them more easy to work, the arithmetic on the rules with 
> the operand "+" is working really nice I have joined a lot of rules 
> and make them active with ">=1" so if any of the rules on the group 
> applies then the rule is triggered.
You are welcome.  As you can see, my focus with content-based rules is 
to try and use meta rules almost exclusively to minimize FPs.

> With the porn rule that I have,  it is working but it still left spam 
> of this type pass, the score line that I wrote on the email had a typo 
> that is not in my working rule and my major concern is in the garbled 
> words like:
>
> S:C H #O+O L "G l, R%L P *0 *R N*
> T\E /EN"S} P)0_R \N
> S:C H #O+O L "G l, R%L P *0 *R N*
> G ,RA _N N}Y } P %0 ~R |N \
> P,0_ R .N PI ~C}T+U-R(E%S.
> TR %A *N #S S. E. X{UA`L P&0/R N_
>
> What it will be the best way to catch any type of garbled word?
Those could be hard because you can get some false positives pretty quickly.

If this is JUST on the subject header, it might be ok to look at a rule 
like:

P.{0,2}[0o].{0,2}R.{0,2}N

That looks like it might hit on all the variants above but I wouldn't 
score it too high.
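A quick way to try this pattern is sketched below in Python, assuming re behaves like Perl for it. The pattern is written without a literal comma before N and without a trailing ".*{0,2}", which stacks a quantifier on a quantifier and is rejected by both Perl and Python; that cleaned-up form is an assumption about what was intended. It stays case-sensitive, so it only covers upper-case P, R and N. All five P0RN-style samples from this thread match, and, as said above, it should not be scored too high.

```python
import re

# Assumed cleaned-up form of the suggested pattern: up to two junk characters
# between each letter, with "0" or "o" standing in for the O. Case-sensitive.
P0RN = re.compile(r"P.{0,2}[0o].{0,2}R.{0,2}N")

# Garbled subjects from this thread (backslashes and quotes escaped).
SAMPLES = [
    'S:C H #O+O L "G l, R%L P *0 *R N*',
    'T\\E /EN"S} P)0_R \\N',
    'G ,RA _N N}Y } P %0 ~R |N \\',
    "P,0_ R .N PI ~C}T+U-R(E%S.",
    "TR %A *N #S S. E. X{UA`L P&0/R N_",
]

for s in SAMPLES:
    m = P0RN.search(s)
    # Show the exact span that matched, or None if the pattern missed.
    print(repr(m.group(0)) if m else None, "<-", s)
```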

The odd part is that I'm not really seeing these spams slipping through 
so I have very little corpora to compare.  I usually hammer the sexually 
explicit spams pretty hard.

I wonder if you need to invest more time in setting up some RBL tests?  
Are you using any RBLs right now?

Regards,
KAM

Re: Porn rules to share?

Posted by Sergio <se...@gmail.com>.
Thank you all for your input. As you can see, I am creating my own rules, as
SA needs help stopping spam.

I want to thank KAM for sharing his rules. I have learned a lot looking at
them, and thanks to that I have modified my rules to make them easier to work
with. Arithmetic in meta rules with the "+" operator is working really nicely:
I have combined a lot of rules and made the meta fire on ">=1", so if any of
the rules in the group hits, the meta rule is triggered.

My porn rule is working, but it still lets this type of spam through. The
score line I wrote in the email had a typo that is not in my working rule. My
major concern is garbled words like:

S:C H #O+O L "G l, R%L P *0 *R N*
T\E /EN"S} P)0_R \N
S:C H #O+O L "G l, R%L P *0 *R N*
G ,RA _N N}Y } P %0 ~R |N \
P,0_ R .N PI ~C}T+U-R(E%S.
TR %A *N #S S. E. X{UA`L P&0/R N_

What would be the best way to catch any type of garbled word?

Sergio

On Sun, Nov 27, 2011 at 7:53 AM, Kevin A. McGrail <KM...@pccc.com> wrote:

> On 11/27/2011 8:26 AM, Martin Gregorie wrote:
>
>>
>> Change the meta to this:
>>
>> meta   PORN_RULES (__PORN_RULE01 || __PORN_RULE02)
>>
>> A quick glance at the SA rules for name prefixes would have told you
>> that rules with names that start with a double underscore have a zero
>> score, so your meta will never work: these rules are designed to be
>> combined by using logical operators.
>>
>>
>>  Martin,
>
> That's not true from my knowledge or experience.  The meta mathematical
> operators are binary.  ("The value of the sub rule in an arithmetic meta
> rule is the true/false (1/0) value for whether or not the rule hit. " from
> http://wiki.apache.org/spamassassin/WritingRules)
>
> i.e.
> True = 1
> False = 0
>
> However, your test would have worked as it simplifies the math with an OR
> condition.
>
> That said, his meta of __PORN_RULE01 + __PORN_RULE02 >=1 will work.
>
> Though I wish sometimes you could do what you've described.  I've done
> some crazy work to try and give meta rules extra weighting. But I think
> doing so would give the mass check algorithm more permutations than it
> could ever handle.
>
> For example, here's how I weighted two options to have the weight of just
> one in detecting a refinance spam:
>
> meta            KAM_REFI        (__KAM_REFI1 + __KAM_REFI2 + __KAM_REFI3 + __KAM_REFI4 + (__KAM_REFI5 + __KAM_REFI6 >= 1) + __KAM_REFI7 + __KAM_REFI8 >= 4)
>
> Regards,
> KAM
>

Re: Porn rules to share?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 11/27/2011 8:26 AM, Martin Gregorie wrote:
>
> Change the meta to this:
>
> meta   PORN_RULES (__PORN_RULE01 || __PORN_RULE02)
>
> A quick glance at the SA rules for name prefixes would have told you
> that rules with names that start with a double underscore have a zero
> score, so your meta will never work: these rules are designed to be
> combined by using logical operators.
>
>
Martin,

That's not true from my knowledge or experience. In an arithmetic meta, 
each sub-rule contributes a binary value ("The value of the sub rule in an 
arithmetic meta rule is the true/false (1/0) value for whether or not the 
rule hit." from http://wiki.apache.org/spamassassin/WritingRules)

i.e.
True = 1
False = 0

However, your test would have worked as it simplifies the math with an 
OR condition.

That said, his meta of __PORN_RULE01 + __PORN_RULE02 >=1 will work.
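To make the point concrete: if each sub-rule contributes 1 when it hits and 0 otherwise, as the WritingRules quote says, then "A + B >= 1" is true exactly when "A || B" is. A small Python model (not SpamAssassin code) that checks every combination; raising the threshold to 2 turns the same sum into an AND.

```python
from itertools import product


def arithmetic_meta(hits, threshold):
    """Sum the sub-rule hits (True -> 1, False -> 0) and compare to a threshold."""
    return sum(int(h) for h in hits) >= threshold


# Every combination of two sub-rules: "A + B >= 1" agrees with "A || B",
# and "A + B >= 2" agrees with "A && B".
for a, b in product([False, True], repeat=2):
    assert arithmetic_meta([a, b], 1) == (a or b)
    assert arithmetic_meta([a, b], 2) == (a and b)
    print(f"A={int(a)} B={int(b)}  sum>=1: {arithmetic_meta([a, b], 1)}"
          f"  sum>=2: {arithmetic_meta([a, b], 2)}")
```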

Though I sometimes wish you could do what you've described. I've done 
some crazy work to try and give meta rules extra weighting. But I think 
doing so would give the mass check algorithm more permutations than it 
could ever handle.

For example, here's how I weighted two options to have the weight of 
just one in detecting a refinance spam:

meta            KAM_REFI        (__KAM_REFI1 + __KAM_REFI2 + __KAM_REFI3 + __KAM_REFI4 + (__KAM_REFI5 + __KAM_REFI6 >= 1) + __KAM_REFI7 + __KAM_REFI8 >= 4)
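A Python model of how that meta counts, treating each sub-rule as 0 or 1 (the hits helper is hypothetical). Because the parenthesised "(__KAM_REFI5 + __KAM_REFI6 >= 1)" is itself a single 0/1 value, those two sub-rules together add at most 1 toward the threshold of 4.

```python
def kam_refi(hit):
    """Evaluate the KAM_REFI meta for a dict of sub-rule name -> bool."""
    # The inner comparison yields True/False, i.e. at most 1 for REFI5 and REFI6 combined.
    pair = int(hit["__KAM_REFI5"]) + int(hit["__KAM_REFI6"]) >= 1
    total = (int(hit["__KAM_REFI1"]) + int(hit["__KAM_REFI2"])
             + int(hit["__KAM_REFI3"]) + int(hit["__KAM_REFI4"])
             + int(pair)
             + int(hit["__KAM_REFI7"]) + int(hit["__KAM_REFI8"]))
    return total >= 4


def hits(*numbers):
    """Hypothetical helper: a hit table where only the listed sub-rules fired."""
    return {f"__KAM_REFI{i}": i in numbers for i in range(1, 9)}


print(kam_refi(hits(1, 2, 5, 6)))  # False: 5 and 6 count once, total is 3
print(kam_refi(hits(1, 2, 3, 5)))  # True: total is 4
```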

Regards,
KAM

Re: Porn rules to share?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Sun, 2011-11-27 at 06:17 -0600, Sergio wrote:

> But emails are still getting in, any comment on what I need to fix on the
> rule? or if someone has a better rule to stop this that wants to share the
> rule, it will be appreciated.
> 
Change the meta to this:

meta   PORN_RULES (__PORN_RULE01 || __PORN_RULE02)

A quick glance at the SA rules for name prefixes would have told you
that rules with names that start with a double underscore have a zero
score, so your meta will never work: these rules are designed to be
combined by using logical operators.

Always test a new rule by running it through SA to prove it works. The
simplest is to use something like:

spamassassin <saved_spam.txt | less

If you routinely write rules you should consider installing a copy of SA on
a host that's not part of your mail processing chain. Writing a set of
simple shell scripts for validating, testing and installing rules on the
mail chain makes adding or extending a rule much quicker. Building a
spam corpus is useful for regression and performance testing your rule
set and, if you classify the corpus (e.g. as porn, phishing, sale, etc.
spam), it's almost as good as a ham corpus for checking rule specificity.


Martin