Posted to users@spamassassin.apache.org by Harry Putnam <re...@newsguy.com> on 2004/02/06 03:47:41 UTC

push pull rule... how can I manage to run it

I'd like to run a sort of push pull regex in local.cf like this:

header To_Newsguy_Not_Reader To =~ /\@newsguy.com/ && !/reader\@.newsguy/

But sa sees it as bad perl:

Use of uninitialized value in pattern match (m//) at
/etc/mail/spamassassin/local.cf, rule To_Newsguy_Not_Reader, line 1.

Is it possible to run this kind of dual regex?  If not then how might
I phrase it in a single regex.

I see a fair bit of spam here that doesn't raise any alarms with sa
but it has a certain characteristic.
The To field will contain addresses that end in @newsguy.com but not my
address (reader@newsguy.com).  However my address will appear in the
Cc: field.



Re: push pull rule... how can I manage to run it

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Fri, 6 Feb 2004, Bob Apthorpe wrote:

> So you want to match an address in the To field where the domain is
> 'newsguy.com' and the user is anyone except 'reader', correct?
>
> Will this work?:
>
>   header To_Newsguy_Not_Reader To =~ /(?<!reader)\@newsguy\.com\b/i
>   describe To_Newsguy_Not_Reader mesg to newsguy recipient but not reader
>   score To_Newsguy_Not_Reader 6
>
> where (?<! ... ) is a zero-width negative look-behind assertion, a
> somewhat obscure bit of perl arcana found with 'perldoc perlre', which
> is written in some kind of crazy moon language.

Using it in limited space (such as a specific header) is OK, just
avoid it for general body matching. The negative look-behind matching
can be a performance killer.


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: push pull rule... how can I manage to run it

Posted by Harry Putnam <re...@newsguy.com>.
Bob Apthorpe <ap...@cynistar.net> writes:

Just an aside to Theo:
  I take your point about using Mail::SpamAssassin::Conf more but I
  will say it would have been a very long time before I put this
  together from reading that.

  It's always the same problem with documentation... too few examples.
  But of course an example for everyone's need would make it untenably
  long.

> Will this work?:
>
>   header To_Newsguy_Not_Reader To =~ /(?<!reader)\@newsguy\.com\b/i
>   describe To_Newsguy_Not_Reader mesg to newsguy recipient but not reader
>   score To_Newsguy_Not_Reader 6

Yup, that does work and testing both ways using spamd, against some 64
messages and 3 runs each, seems to indicate (with just simple `time'
tests on a non-busy os) they are about the same time wise.  At least
on my choice of messages.  (all spam but only 15 match this
particular target rule)

With slick regex
 [ header To_Newsguy_Not_Reader To =~ /(?<!reader)\@newsguy\.com\b/i
   score To_Newsguy_Not_Reader 6 ]
-- 
  real    0m9.871s
  user    0m1.990s
  sys     0m0.610s
  
  real    0m9.878s
  user    0m1.870s
  sys     0m0.780s
  
  real    0m9.846s
  user    0m1.980s
  sys     0m0.630s

With 3 part meta rules:
 [ header __To_Newsguy To =~ /\@newsguy.com/ 
   describe To newsguy (for a meta combination rule)
   score To_Newsguy 1
  
   header __To_Newsguy_Reader To  =~ /reader\@.newsguy/
   describe To reader at newsguy (for a meta combination rule)
   score To_Newsguy_Reader 1
 
   meta To_Newsguy_Not_Reader __To_Newsguy && !__To_Newsguy_Reader
   describe mesg to newsguy recipient but not reader
   score To_Newsguy_Not_Reader 6 ]
-- 
  real    0m9.842s
  user    0m1.870s
  sys     0m0.700s
  
  real    0m9.861s
  user    0m1.970s
  sys     0m0.590s
  
  real    0m9.903s
  user    0m1.930s
  sys     0m0.700s


Re: push pull rule... how can I manage to run it

Posted by Bob Apthorpe <ap...@cynistar.net>.
On Thu, 05 Feb 2004 20:47:41 -0600 Harry Putnam <re...@newsguy.com> wrote:

> I'd like to run a sort of push pull regex in local.cf like this:
> 
> header To_Newsguy_Not_Reader To =~ /\@newsguy.com/ && !/reader\@.newsguy/
> 
> But sa sees it as bad perl:
> 
> Use of uninitialized value in pattern match (m//) at
> /etc/mail/spamassassin/local.cf, rule To_Newsguy_Not_Reader, line 1.
> 
> Is it possible to run this kind of dual regex?  If not then how might
> I phrase it in a single regex.
> 
> I see a fair bit of spam here that doesn't raise any alarms with sa
> but it has a certain characteristic.
> The To field will contain addresses that end in @newsguy.com but not my
> address (reader@newsguy.com).  However my address will appear in the
> Cc: field.

So you want to match an address in the To field where the domain is
'newsguy.com' and the user is anyone except 'reader', correct?

Will this work?:

  header To_Newsguy_Not_Reader To =~ /(?<!reader)\@newsguy\.com\b/i
  describe To_Newsguy_Not_Reader mesg to newsguy recipient but not reader
  score To_Newsguy_Not_Reader 6

where (?<! ... ) is a zero-width negative look-behind assertion, a
somewhat obscure bit of perl arcana found with 'perldoc perlre', which
is written in some kind of crazy moon language.
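
One thing worth noting about the look-behind form: the pattern is
unanchored, so it hits as soon as any @newsguy.com address in To is not
immediately preceded by 'reader'. A rough sketch of how the rule would
behave on a few To values follows. The addresses alice@ and xreader@ are
made up for illustration, and the results in the comments follow from
ordinary Perl regex semantics rather than from a test run:

```conf
# local.cf sketch; rule name and score as suggested above
header   To_Newsguy_Not_Reader  To =~ /(?<!reader)\@newsguy\.com\b/i
describe To_Newsguy_Not_Reader  mesg to newsguy recipient but not reader
score    To_Newsguy_Not_Reader  6

# Hypothetical To values and whether the pattern matches:
#   To: alice@newsguy.com                      -> hit
#   To: reader@newsguy.com                     -> no hit
#   To: Reader@NewsGuy.com                     -> no hit (/i also applies inside the look-behind)
#   To: alice@newsguy.com, reader@newsguy.com  -> hit (alice's address satisfies the pattern)
#   To: xreader@newsguy.com                    -> no hit (the six characters before @ are "reader")
```

The fourth case is where this differs from a two-rule meta of the form
"newsguy && !reader", which would not hit when reader@newsguy.com appears
anywhere in To.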

-- Bob

Re: mmm!

Posted by Brian Godette <bg...@idcomm.com>.
On Friday 06 February 2004 03:32 pm, Matt Kettler wrote:
> At 04:47 PM 2/6/2004, Brian Godette wrote:
> >Add to /etc/mail/spamassassin/local.cf
> >def_whitelist_from_rcvd  *@incubator.apache.org apache.org
>
> That's mostly right... Just drop the "def_" part...
>
> def_ is intended to be used only on the whitelists that are in the
> standard ruleset, as distributed by SA.
>
> User defined whitelists should use whitelist_from_rcvd.
>
> This way you can tell if it's one of SA's default whitelists, or one of
> your whitelists that hits, because the rule name will be different..
>
> whitelist_from_rcvd will cause USER_IN_WHITELIST to hit
>
> def_whitelist_from_rcvd will cause USER_IN_DEF_WHITELIST to hit.
>
> If USER_IN_DEF_WHITELIST hits spam, it's an SA bug. If USER_IN_WHITELIST
> hits spam, it's your config that's the problem.
>
> DEF_WHITELIST also only knocks 15 points off, normal is 100... and 100 is
> probably better for sa discussion lists :)

True, however I've not had a message here push it over +5 yet. I'd rather keep 
it as a low negative on the off chance that somehow one is spoofed, and 
really one would think SA's own mailing list would be in SA's default 
whitelist <g>.

If a legitimate message does go over +5 I'll move it off def_ at that point. I 
don't have the option of simply not scanning sa-users due to a site wide 
install and others besides myself receive this list.

While on the topic of def_whitelist in the standard distribution, *@ebay.com 
ebay.com and *@freshmeat.net freshmeat.net should probably be added. 
Freshmeat really needs it as the daily newsletter tends to hit a number of 
standard rules, and a large amount of the custom rules out there, especially 
tripwire and chickenpox due to version numbers and program naming.


Re: mmm!

Posted by Matt Kettler <mk...@evi-inc.com>.
At 04:47 PM 2/6/2004, Brian Godette wrote:
>Add to /etc/mail/spamassassin/local.cf
>def_whitelist_from_rcvd  *@incubator.apache.org apache.org

That's mostly right... Just drop the "def_" part...

def_ is intended to be used only on the whitelists that are in the 
standard ruleset, as distributed by SA.

User defined whitelists should use whitelist_from_rcvd.

This way you can tell if it's one of SA's default whitelists, or one of 
your whitelists that hits, because the rule name will be different..

whitelist_from_rcvd will cause USER_IN_WHITELIST to hit

def_whitelist_from_rcvd will cause USER_IN_DEF_WHITELIST to hit.

If USER_IN_DEF_WHITELIST hits spam, it's an SA bug. If USER_IN_WHITELIST 
hits spam, it's your config that's the problem.

DEF_WHITELIST also only knocks 15 points off, normal is 100... and 100 is 
probably better for sa discussion lists :)
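
Put together, the local.cf entry discussed above might look like the
following. This is only a sketch: the address pattern and relay domain are
the ones quoted in this thread, and whether they suit a given site is an
assumption.

```conf
# /etc/mail/spamassassin/local.cf
# User-defined whitelist entry: a match makes USER_IN_WHITELIST hit.
# First argument: sender address pattern; second: string the relay's
# reverse DNS must match.
whitelist_from_rcvd  *@incubator.apache.org  apache.org

# The def_ form is meant for the ruleset shipped with SA and makes
# USER_IN_DEF_WHITELIST hit instead:
# def_whitelist_from_rcvd  *@incubator.apache.org  apache.org
```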





Re: mmm!

Posted by Brian Godette <bg...@idcomm.com>.
Add to /etc/mail/spamassassin/local.cf
def_whitelist_from_rcvd  *@incubator.apache.org apache.org

On Friday 06 February 2004 11:58 am, cami wrote:
> | 2004-02-05 | 10:07:18  |
>
> <sp...@incubator.apache.org> |
> <ca...@mweb.co.za> | Blocked   |         9 |
>
> Hi All..
>
> The log above shows that at least one message (that i've checked)
> is getting caught by SA.. Can the next version of SA have
> incubator.apache.org whitelisted? (you can imagine how many mails
> from the new mailing lists are gonna get caught by SA itself)
>
> Regards,
> Cami


Re: mmm!

Posted by Rich Puhek <rp...@etnsystems.com>.
cami wrote:

> | 2004-02-05 | 10:07:18  | 
> <sp...@incubator.apache.org> | 
> <ca...@mweb.co.za> | Blocked   |         9 |
> 
> Hi All..
> 
> The log above shows that at least one message (that i've checked)
> is getting caught by SA.. Can the next version of SA have
> incubator.apache.org whitelisted? (you can imagine how many mails
> from the new mailing lists are gonna get caught by SA itself)
> 
> Regards,
> Cami

Umm, you don't want to pump the mailing list through SA if you can help
it. It screws up Bayes auto-learning and, as mentioned, messages will have
lots of FPs (due to discussion of spam, sample spam, presence of rulesets,
etc.).

--Rich



mmm!

Posted by cami <ca...@mweb.co.za>.
| 2004-02-05 | 10:07:18  | 
<sp...@incubator.apache.org> | 
<ca...@mweb.co.za> | Blocked   |         9 |

Hi All..

The log above shows that at least one message (that i've checked)
is getting caught by SA.. Can the next version of SA have
incubator.apache.org whitelisted? (you can imagine how many mails
from the new mailing lists are gonna get caught by SA itself)

Regards,
Cami

Re: push pull rule... how can I manage to run it

Posted by Theo Van Dinter <fe...@kluge.net>.
On Thu, Feb 05, 2004 at 11:36:32PM -0600, David B Funk wrote:
> The score is undefined for double underscore rules; a hit just
> results in the 'defining' of a variable that can be tested in
> meta rules. Thus there is no score to list in the reports unless
> one is assigned by meta rules that evaluate those
> variables.

Well if you really wanted to be pedantic here ...  The score for
those rules is actually 1 (default for all rules w/out a score), but
PerMsgStatus has a special case which doesn't add the score to the hit
total when handling a hit for rules starting with "__".

I was trying to keep the description simple, albeit not necessarily
accurate wrt what actually happens in the code.

-- 
Randomly Generated Tagline:
"What was sliced bread the greatest thing since?"   - Unknown

Re: push pull rule... how can I manage to run it

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Thu, 5 Feb 2004, Theo Van Dinter wrote:

> On Thu, Feb 05, 2004 at 10:41:10PM -0600, Harry Putnam wrote:
> > hitting.  (first one) Or does the double underscore tell spama to
> > ignore the rule unless it's referenced in a meta rule?
>
> Not ignored, but the score is 0 for those rules.  'perldoc
> Mail::SpamAssassin::Conf' may help here. :)

Well, not to be nitpicky, but to be pedantic ;)
If the score were truly 0, the rule would not be run at all.
The score is undefined for double underscore rules; a hit just
results in the 'defining' of a variable that can be tested in
meta rules. Thus there is no score to list in the reports unless
one is assigned by meta rules that evaluate those
variables.

For testing purposes, prefix those rule names with a 'T' then
they will be assigned a small score (0.01) and show up in reports.
EG:

body T__MY_RULE1  /\bword1/
body T__MY_RULE2  /\bword2/
meta MY_RULE      T__MY_RULE1 && T__MY_RULE2
describe MY_RULE  found match on word1 && word2
score MY_RULE     1.5

Once you're happy with the way that they work, remove the 'T's
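
Applied to the To_Newsguy_Not_Reader rule from earlier in this thread, the
same trick might look like this. It is a sketch, not something that was run
here: the T__ sub-rule names and the description text are made up for
illustration, and the patterns are the ones Theo suggested with the dots
escaped.

```conf
# Temporary names: with the leading T the sub-rules show up in reports
header   T__To_Newsguy          To =~ /\@newsguy\.com/i
header   T__To_Reader_Newsguy   To =~ /reader\@newsguy\.com/i
meta     To_Newsguy_Not_Reader  T__To_Newsguy && !T__To_Reader_Newsguy
describe To_Newsguy_Not_Reader  To has a newsguy.com address but not reader
score    To_Newsguy_Not_Reader  6
```

Note that the names inside the meta expression have to be renamed along
with the sub-rules themselves, otherwise the meta refers to rules that no
longer exist.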

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: push pull rule... how can I manage to run it

Posted by Theo Van Dinter <fe...@kluge.net>.
On Thu, Feb 05, 2004 at 10:41:10PM -0600, Harry Putnam wrote:
> hitting.  (first one) Or does the double underscore tell spama to
> ignore the rule unless it's referenced in a meta rule?

Not ignored, but the score is 0 for those rules.  'perldoc
Mail::SpamAssassin::Conf' may help here. :)

-- 
Randomly Generated Tagline:
Sarchasm: The gulf between the author of sarcastic wit, and the recipient
 who doesn't get it.
         - Washington Post

Re: push pull rule... how can I manage to run it

Posted by Harry Putnam <re...@newsguy.com>.
Harry Putnam <re...@newsguy.com> writes:

>> Cool, thanks.  That is handling the messages I was after..
>
> Nope... I jumped the gun here ... misreading the debug output.
>
> How does the scoring work with this setup?
>
> I've tried a number of combinations but I never see the meta rule get
> hit. 

Ah crap... I overlooked your leading underscores on the first 2.  But
still one thing seems odd.  With the syntax right (including the
overlooked underscores) I see in -D debug output that the score is
grabbed from the meta rule, but not from the other one that should be
hitting.  (first one) Or does the double underscore tell spama to
ignore the rule unless it's referenced in a meta rule?


Re: push pull rule... how can I manage to run it

Posted by Harry Putnam <re...@newsguy.com>.
Harry Putnam <re...@newsguy.com> writes:

> Theo Van Dinter <fe...@kluge.net> writes:
>
>> On Thu, Feb 05, 2004 at 08:47:41PM -0600, Harry Putnam wrote:
>>> I'd like to run a sort of push pull regex in local.cf like this:
>>> 
>>> header To_Newsguy_Not_Reader To =~ /\@newsguy.com/ && !/reader\@.newsguy/
>>> 
>>> But sa sees it as bad perl:
>>> 
>>> Use of uninitialized value in pattern match (m//) at
>>> /etc/mail/spamassassin/local.cf, rule To_Newsguy_Not_Reader, line 1.
>>> 
>>> Is it possible to run this kind of dual regex?  If not then how might
>>> I phrase it in a single regex.
>>
>> Not in the way you're doing it.  You have to make 2 separate rules,
>> then use a meta to put them together, i.e.:
>>
>> header __To_Newsguy To =~ /\@newsguy.com/
>> header __To_Reader_Newsguy To =~ /reader\@newsguy/
>> meta To_Newsguy_Not_Reader __To_Newsguy && !__To_Reader_Newsguy
>
> Cool, thanks.  That is handling the messages I was after..

Nope... I jumped the gun here ... misreading the debug output.

How does the scoring work with this setup?

I've tried a number of combinations but I never see the meta rule get
hit. 

  header To_Newsguy To =~ /\@newsguy.com/
  describe To newsguy (for a meta combination rule)
  score To_Newsguy 1

  header To_Newsguy_Reader To  =~ /reader\@.newsguy/
  describe To reader at newsguy (for a meta combination rule)
  score To_Newsguy_Reader 1

  meta To_Newsguy_Not_Reader __To_Newsguy && __To_Newsguy_Reader
  describe mesg to newsguy recipient but not reader
  score To_Newsguy_Not_Reader 6

Do the scores of the first 2 have to add up, or can one give a score to
the meta itself like above?

I've tried setting 0 to the first 2 but not getting how this is
supposed to work...


Re: push pull rule... how can I manage to run it

Posted by Harry Putnam <re...@newsguy.com>.
Theo Van Dinter <fe...@kluge.net> writes:

> On Thu, Feb 05, 2004 at 08:47:41PM -0600, Harry Putnam wrote:
>> I'd like to run a sort of push pull regex in local.cf like this:
>> 
>> header To_Newsguy_Not_Reader To =~ /\@newsguy.com/ && !/reader\@.newsguy/
>> 
>> But sa sees it as bad perl:
>> 
>> Use of uninitialized value in pattern match (m//) at
>> /etc/mail/spamassassin/local.cf, rule To_Newsguy_Not_Reader, line 1.
>> 
>> Is it possible to run this kind of dual regex?  If not then how might
>> I phrase it in a single regex.
>
> Not in the way you're doing it.  You have to make 2 separate rules,
> then use a meta to put them together, i.e.:
>
> header __To_Newsguy To =~ /\@newsguy.com/
> header __To_Reader_Newsguy To =~ /reader\@newsguy/
> meta To_Newsguy_Not_Reader __To_Newsguy && !__To_Reader_Newsguy

Cool, thanks.  That is handling the messages I was after..



Re: push pull rule... how can I manage to run it

Posted by Theo Van Dinter <fe...@kluge.net>.
On Thu, Feb 05, 2004 at 08:47:41PM -0600, Harry Putnam wrote:
> I'd like to run a sort of push pull regex in local.cf like this:
> 
> header To_Newsguy_Not_Reader To =~ /\@newsguy.com/ && !/reader\@.newsguy/
> 
> But sa sees it as bad perl:
> 
> Use of uninitialized value in pattern match (m//) at
> /etc/mail/spamassassin/local.cf, rule To_Newsguy_Not_Reader, line 1.
> 
> Is it possible to run this kind of dual regex?  If not then how might
> I phrase it in a single regex.

Not in the way you're doing it.  You have to make 2 separate rules,
then use a meta to put them together, i.e.:

header __To_Newsguy To =~ /\@newsguy.com/
header __To_Reader_Newsguy To =~ /reader\@newsguy/
meta To_Newsguy_Not_Reader __To_Newsguy && !__To_Reader_Newsguy
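
Filled out with a description and a score, the three rules might look like
the sketch below. The score of 6 and the description text are assumptions
for illustration. The dots are escaped so that they match only a literal
'.', and there is no '.' between \@ and newsguy in the second pattern:
/reader\@.newsguy/ would require one extra character after the @ and so
would never match reader@newsguy.com.

```conf
# Sub-rules: the leading double underscore keeps them out of the hit total
header   __To_Newsguy           To =~ /\@newsguy\.com/i
header   __To_Reader_Newsguy    To =~ /reader\@newsguy\.com/i

# Hit when To has a newsguy.com address and reader@newsguy.com is absent
meta     To_Newsguy_Not_Reader  __To_Newsguy && !__To_Reader_Newsguy
describe To_Newsguy_Not_Reader  To has a newsguy.com address but not reader
score    To_Newsguy_Not_Reader  6
```

Running 'spamassassin --lint' after editing local.cf should report any
syntax problems in the rules.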

-- 
Randomly Generated Tagline:
No prisoner's dilemma here.  Over the long term, symbiosis is more
 useful than parasitism.  More fun, too.  Ask any mitochondria.
              -- Larry Wall in <19...@wall.org>