Posted to users@spamassassin.apache.org by Vicki Brown <vl...@cfcl.com> on 2005/03/19 06:49:30 UTC

spamd and spamassassin appear to have different results

The rule
 header __CF_NOT_TO_ME           To !~ /(?:vlb\@cfcl|vicki\.vlb\@gmail)/i
 header __CF_NOT_CC_ME           Cc !~ /(?:vlb\@cfcl|vicki\.vlb\@gmail)/i
 meta   CF_NOT_FOR_ME            __CF_NOT_TO_ME && __CF_NOT_CC_ME
 score CF_NOT_FOR_ME             0.01
 describe CF_NOT_FOR_ME          Neither To nor Cc me
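Here is a Python model of how the three rules combine. It is not SpamAssassin's code. It assumes that an absent header is tested as an empty string, so a `!~` test on a missing Cc: header is true. `header_not_match` and `cf_not_for_me` are hypothetical names, and the address in the example is made up; it only needs to match the pattern. With a matching To: and no Cc:, only the Cc half is true. That agrees with the `subtests=__CF_NOT_CC_ME` line in the debug output below, so the meta rule should not fire.

```python
import re

# Pattern from the rules above; re.IGNORECASE mirrors the trailing /i.
ME = re.compile(r"(?:vlb@cfcl|vicki\.vlb@gmail)", re.IGNORECASE)


def header_not_match(headers: dict, name: str) -> bool:
    """Model of `header __X Name !~ /pattern/i`.

    Assumption: an absent header is tested as an empty string, so the
    negated test is true whenever the header is missing.
    """
    value = headers.get(name, "")
    return ME.search(value) is None


def cf_not_for_me(headers: dict) -> bool:
    """Model of `meta CF_NOT_FOR_ME __CF_NOT_TO_ME && __CF_NOT_CC_ME`."""
    return header_not_match(headers, "To") and header_not_match(headers, "Cc")


if __name__ == "__main__":
    msg = {"To": "<vlb@cfcl.example>"}  # hypothetical address; no Cc: header
    print("__CF_NOT_TO_ME:", header_not_match(msg, "To"))  # False
    print("__CF_NOT_CC_ME:", header_not_match(msg, "Cc"))  # True (header absent)
    print("CF_NOT_FOR_ME:", cf_not_for_me(msg))            # False
```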

The mail:
 Date: Fri, 18 Mar 2005 09:05:50 -0500
 From: "TINY Video Camera" <Di...@sobvt...>
 To: <vl...@cfcl.com>
 Subject: A TINY digital video camera from DigiVu

 This Advertisment was brought to you by Newageoptin...

The SA result:
 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on cfcl.com
 X-Spam-Level:
 X-Spam-Status: No, score=-0.6 required=0.5 tests=ALL_TRUSTED,CF_NOT_FOR_ME,
	HTML_30_40,HTML_MESSAGE,URIBL_SBL autolearn=ham version=3.0.2

And that's not right. It _is_ for me. The CF_NOT_FOR_ME rule should not have
triggered.

What I like even less about this is that if I send that message through
  spamassassin -D
I get the results I expect (CF_NOT_FOR_ME does _not_ trigger).

 debug: is spam? score=-0.371 required=0.5
 debug: tests=ALL_TRUSTED,URIBL_SBL
 debug: subtests=__CF_NOT_CC_ME,__HAS_SUBJECT,__UNUSABLE_MSGID
 Date: Fri, 18 Mar 2005 09:05:50 -0500
 From: "TINY Video Camera" <Di...@...>
 To: <vl...@cfcl.com>
 Subject: A TINY digital video camera from DigiVu
 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on cfcl.com
 X-Spam-Level:
 X-Spam-Status: No, score=-0.4 required=0.5 tests=ALL_TRUSTED,URIBL_SBL
        autolearn=ham version=3.0.2

Spamassassin does what I think it should; spamc/spamd fails me.
I am beginning to get the bad feeling that spamd is not working correctly.
But what if anything can I / should I do about it?
Should I adjust all of our user procmail files to call spamassassin directly
instead of using spamc/spamd?
-- 
Vicki Brown          ZZZ
Journeyman Sourceror:  zz  |\     _,,,---,,_     Code, Docs, Process,
Scripts & Philtres      zz /,`.-'`'    -.  ;-;;,_   Perl, WWW, Mac OS X
http://cfcl.com/vlb       |,4-  ) )-,_. ,\ ( `'-'   SF Bay Area, CA  USA
_______________________  '---''(_/--'  `-'\_)  ___________________________

Re: spamd and spamassassin appear to have different results

Posted by Matt Kettler <mk...@evi-inc.com>.
Vicki Brown wrote:

>At 10:55 -0500 03/19/2005, Matt Kettler wrote:
>  
>
>>And be sure to spamassassin --lint it (should run without any messages),
>>and restart spamd after adding the rules.
>>    
>>
>
><vent>
>I realize that this is standard canonical advice and I will make the
>necessary assumption that it's not really being directed at me but...
>I am soooo tired of seeing this reminder.
>

You are correct, it's standard advice. I attach it to any email where I
suggest that a user edit a configuration file, or where a user is
experiencing really strange behavior. It's such a common problem that I've
made a standard practice of issuing that reminder.

Also, since I only skim threads and offer suggestions where possible I
sometimes do offer the same advice to the same person more than once. My
memory of individual posters isn't perfect. I'm not paid to do this, and
I'm not a SpamAssassin developer. It's purely a "donation of my spare
time to offer small bits of free advice" activity.

If you feel my advice is more frustrating than it's worth you may
request I stop giving you further advice and I'll honor that request
without any hard feelings.

Re: spamd and spamassassin appear to have different results

Posted by jdow <jd...@earthlink.net>.
From: "Vicki Brown" <vl...@cfcl.com>

> At 10:55 -0500 03/19/2005, Matt Kettler wrote:
> >And be sure to spamassassin --lint it (should run without any messages),
> >and restart spamd after adding the rules.
>
> Why can't spamd re-read the system rules file if it's been changed? That's
> not difficult to test for (quickly).  I'll take an option to do this PLEASE.

Because the spamd speedup comes from caching the system rules as well
as from avoiding the perl startup time.

{^_^}



Re: re-read the config file iff it has changed

Posted by alan premselaar <al...@12inch.com>.
Vicki Brown wrote:
> At 17:40 -0800 03/19/2005, jdow wrote:
> 
>>There is a substantial hit, Vicki, on the order of a factor of two on
>>my machines.
> 
> 
> We are talking about _Only when the Config File has Changed_. OK, so you get a
> factor of two, what, once a week?
> 
> Sendmail does this (you run newaliases or "make" to trigger it).

For clarity's sake, sendmail has real-time access to certain db files
(like aliases.db, which is generated by 'newaliases'). Since sendmail has
real-time access to these files, re-creating the .db file from the text
version is all that is necessary.

However, if you make changes to the sendmail.mc file and then run make to
create the sendmail.cf file, you still need to restart sendmail for it
to read those changes.

SpamAssassin reads in all its config files into memory and has no 
real-time file access for configuration files.

> 
> I simply do not believe there can be a "substantial hit" if spamd re-reads
> the config file
> 
>                 Only When The Config File Has Changed

In order to read the config file in >only when it has been changed< you
need to store state information somewhere (in memory or a real-time
accessed db file, etc.) for each config file.  Since SA will read in
/path/to/configfiles/*.cf, there could be any number of files that state
needs to be stored for.  Also, to be prudent, state would also need to
be stored for /usr/share/spamassassin/*.cf, since some people will change
those config files even against recommendations.

When fine-tuning for performance, even a call to stat() on a file or
group of files can introduce performance hits.  This is because it
effectively still has to open and close the file-handle.

Then there's the matter of how, and how often, you poll the .cf files
to check for changes.  That in itself could add a lot of unnecessary
overhead to the program.
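The per-file state described above can be kept as a table of modification times and sizes. This Python sketch is hypothetical: `ConfigWatcher` is not part of SpamAssassin. The glob patterns in the example are just the directories mentioned in this thread, and a real installation may keep its .cf files elsewhere. Any added, removed, or modified file counts as a change. The caller would then have to re-read every file, not just the one that changed.

```python
import glob
import os


class ConfigWatcher:
    """Hypothetical helper (not part of SpamAssassin): remembers
    (mtime, size) for every file matching the given glob patterns."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.state = self._snapshot()

    def _snapshot(self):
        snap = {}
        for pattern in self.patterns:
            for path in sorted(glob.glob(pattern)):
                try:
                    st = os.stat(path)  # one stat() call per file
                except FileNotFoundError:
                    continue  # file vanished between glob() and stat()
                snap[path] = (st.st_mtime_ns, st.st_size)
        return snap

    def changed(self):
        """True if any file was added, removed or modified since the last
        call; a True result means the caller should re-read all files."""
        current = self._snapshot()
        if current != self.state:
            self.state = current
            return True
        return False


if __name__ == "__main__":
    # Example paths only (the two directories mentioned in this thread).
    watcher = ConfigWatcher(["/etc/mail/spamassassin/*.cf",
                             "/usr/share/spamassassin/*.cf"])
    print("changed since start:", watcher.changed())
```

The cost grows with the number of .cf files, because every check stats each of them. That is the trade-off being argued about here.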

[..snip..]

alan

Re: re-read the config file iff it has changed

Posted by Loren Wilton <lw...@earthlink.net>.
> But this is a daemon that notices changes in user prefs files in real time so
> the performance issue is spurious.  It's _already_ taking a performance hit
> _every single time_ for every single user.

No.  For several reasons.

1) Usually user rules are disallowed.  So all SA has to do is open one file
and parse relatively few lines, which don't include rules.  It then
overlays scores on the existing pre-parsed rules.

2) Rules live in many files, dozens or even hundreds of them.  These files do
NOT get reread for every user.  These are the files that have to be reread
to rebuild the rules after a change.

3) The user rules 'files' in many cases are actually database entries, so
there are no files to open in the first place.

Going out and checking the timestamps on 100 or 200 rules files for every
user would be considerably more overhead than checking the timestamp on one
file, and probably more overhead than opening and reading that one small
file.

Now, checking *occasionally* might not be a bad idea, where "occasionally"
means maybe once every few minutes or every few hundred mail messages,
depending on the traffic level at a site.  Even this could of course be the
wrong thing to do, so there would probably need to be an option to enable
this mode.  I'm not at all sure what the appropriate default setting for
this option should be.
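The "check occasionally" idea amounts to a throttle in front of the expensive timestamp check. This Python sketch assumes a hypothetical `ThrottledCheck` class with made-up defaults: check every 500 messages or every 300 seconds, whichever comes first. The clock is injectable only so the logic can be tested.

```python
import time


class ThrottledCheck:
    """Hypothetical throttle: permit a (costly) config check at most once per
    `every_n` messages or once per `every_s` seconds, whichever comes first."""

    def __init__(self, every_n=500, every_s=300.0, clock=time.monotonic):
        self.every_n = every_n
        self.every_s = every_s
        self.clock = clock
        self.count = 0
        self.last = clock()

    def should_check(self):
        """Call once per message; True means "go compare the timestamps now"."""
        self.count += 1
        now = self.clock()
        if self.count >= self.every_n or now - self.last >= self.every_s:
            self.count = 0
            self.last = now
            return True
        return False
```

Per message this costs one counter increment and one clock read. The per-file work only happens when `should_check()` returns True.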

        Loren


Re: re-read the config file iff it has changed

Posted by Vicki Brown <vl...@cfcl.com>.
At 13:55 -0500 03/20/2005, Theo Van Dinter wrote:
>Well, that's not sendmail rereading the config.  "newaliases" generates
>a new DBM/hash file from a flat text file.  Sendmail then realizes the
>file (that it has open) has changed and reopens the new file for access.
>The DB is a lookup table, not a "config" (ala sendmail.cf).

Duh.

>Sendmail then realizes the
>file (that it has open) has changed and reopens the new file for access.

This is what we programmers call an "implementation detail". If spamd/spamc
already _had_ the code to do what I want, I wouldn't be asking for it, now
would I?

At 11:24 +0900 03/21/2005, alan premselaar wrote:
>For clarity's sake, sendmail has real-time access to certain db files
>(like aliases.db which is generated by 'newaliases'). since sendmail has
>real-time access to these files, re-creating the .db file from the text
>version is all that is necessary.

Uhuh.

>in order to read the config file in >only when it has been changed< you
>need to store state information somewhere

uhuh.

>
>when fine-tuning for performance, even a call to stat() on a file or
>group of files can introduce performance hits.  This is because it
>effectively still has to open and close the file-handle.

"options". Recall that I did say "an option to...".  I will accept the hit
(which I personally think wouldn't be big enough to notice).

Theo Van Dinter wrote:
>This is a very standard method of having a daemon notice a config change.

But this is a daemon that notices changes in user prefs files in real time so
the performance issue is spurious.  It's _already_ taking a performance hit
_every single time_ for every single user.

Re: re-read the config file iff it has changed

Posted by Theo Van Dinter <fe...@kluge.net>.
On Sun, Mar 20, 2005 at 07:06:10PM -0800, Vicki Brown wrote:
> What's one more on rare occasions, really?

Exactly, "rare occasions".  Just send a SIGHUP.

> I'm sorry. I don't buy the arguments. I will remain unconvinced.

Ditto. :)

-- 
Randomly Generated Tagline:
"It was nice of you to let me reattach your arm."
  --Zoidber

Re: re-read the config file iff it has changed

Posted by Vicki Brown <vl...@cfcl.com>.
At 13:55 -0500 03/20/2005, Theo Van Dinter wrote:

>> I simply do not believe there can be a "substantial hit" if spamd re-reads
>> the config file
>
>Besides the fact there are tens of config files that would have to be
>watched (

It's _already_ watching and __reading__ "tens of config files".

man spamd:
    ..."spamd" will check per-user config files for every message,

What's one more on rare occasions, really?

I'm sorry. I don't buy the arguments. I will remain unconvinced.

Re: re-read the config file iff it has changed

Posted by Theo Van Dinter <fe...@kluge.net>.
On Sun, Mar 20, 2005 at 10:39:26AM -0800, Vicki Brown wrote:
> Sendmail does this (you run newaliases or "make" to trigger it).

Well, that's not sendmail rereading the config.  "newaliases" generates
a new DBM/hash file from a flat text file.  Sendmail then realizes the
file (that it has open) has changed and reopens the new file for access.
The DB is a lookup table, not a "config" (ala sendmail.cf).

> I simply do not believe there can be a "substantial hit" if spamd re-reads
> the config file

Besides the fact there are tens of config files that would have to be
watched (versus the handful that most daemons have), there's a ton of
processing involved with loading in the config.  There's no way to remove
a previously loaded config, so spamd would have to reread all of the
config files anytime there's a change to any of them.  At that point,
it's more efficient to simply send a SIGHUP to spamd when you've made
a change.  It will then "reload" the config and you'll be off and running.

This is a very standard method of having a daemon notice a config change.
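This is the usual pattern for a daemon that keeps its parsed config in memory: load once, and reload everything only when told to via SIGHUP. Below is a POSIX-only Python sketch of that pattern. `Daemon` and `load_config` are hypothetical names, and this is not how spamd is written. The handler only sets a flag. The full reload happens between messages, which matches the point above that a change means re-reading all of the config.

```python
import signal


class Daemon:
    """Hypothetical skeleton: config is parsed once and kept in memory;
    SIGHUP only sets a flag, and the reload happens between messages."""

    def __init__(self, load_config):
        self.load_config = load_config
        self.config = load_config()
        self.reload_requested = False
        signal.signal(signal.SIGHUP, self._on_hup)  # must run in main thread

    def _on_hup(self, signum, frame):
        # Keep the handler trivial; do the real work outside signal context.
        self.reload_requested = True

    def handle_message(self, msg):
        if self.reload_requested:
            self.reload_requested = False
            self.config = self.load_config()  # full re-read of all config
        return self.config, msg
```

For spamd itself, the equivalent is sending the running process SIGHUP from a shell, e.g. `kill -HUP <spamd pid>`.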

-- 
Randomly Generated Tagline:
"A duel of wits?  To the DEATH?"

re-read the config file iff it has changed

Posted by Vicki Brown <vl...@cfcl.com>.
At 17:40 -0800 03/19/2005, jdow wrote:
>There is a substantial hit, Vicki, on the order of a factor of two on
>my machines.

We are talking about _Only when the Config File has Changed_. OK, so you get a
factor of two, what, once a week?

Sendmail does this (you run newaliases or "make" to trigger it).

I simply do not believe there can be a "substantial hit" if spamd re-reads
the config file

                Only When The Config File Has Changed

>
>You can accomplish the same thing you seem to want by changing your
>call to spamc into a call to spamassassin itself. You can simulate
>exactly what you want by changing to one child and the child runs
>once, I think.

No; that would read the config file every time. I do not want to read the
config file every time.

>Because the spamd speedup comes from caching the system rules as well
>as from avoiding the perl startup time.

That has only a slight peripheral relationship to what I requested.

   Rebuild the Cache IFF the Config file has changed



Re: spamd and spamassassin appear to have different results

Posted by jdow <jd...@earthlink.net>.
From: "Vicki Brown" <vl...@cfcl.com>

> At 13:36 -0600 03/19/2005, Michael Parker wrote:
> >On Sat, Mar 19, 2005 at 11:24:43AM -0800, Vicki Brown wrote:
> >>
> > >> Why can't spamd re-read the system rules file if it's been changed? That's
> >> not difficult to test for (quickly).  I'll take an option to do this
> >>PLEASE.
> >
> >You might enjoy that, but the performance hit it would cause would not
> >be liked by everyone else.
>
> a) I don't think there'd be that much of a performance hit if it first
> checked to see if the file had changed and only read the rule set iff the
> file had changed
>
> b) that's precisely why I said "I'll take an option to do this"
> because that way _no one else would be affected_ unless they were someone
> like me who thought reading the changes was more important than half a
> microsecond.

There is a substantial hit, Vicki, on the order of a factor of two on
my machines.

You can accomplish the same thing you seem to want by changing your
call to spamc into a call to spamassassin itself. You can simulate
exactly what you want by changing to one child and the child runs
once, I think.

Since at this time I am the only person using the 3.02 SpamAssassin
(Loren insists on 2.63 for some of its reporting capabilities) I
simply put my personal rules into the /etc/mail/spamassassin directory
and restart spamassassin after I run "spamassassin --lint" on changes.
It's gotten to be automatic.

{^_^}



Re: spamd and spamassassin appear to have different results

Posted by Vicki Brown <vl...@cfcl.com>.
At 13:36 -0600 03/19/2005, Michael Parker wrote:
>On Sat, Mar 19, 2005 at 11:24:43AM -0800, Vicki Brown wrote:
>>
>> Why can't spamd re-read the system rules file if it's been changed? That's
>> not difficult to test for (quickly).  I'll take an option to do this
>>PLEASE.
>
>You might enjoy that, but the performance hit it would cause would not
>be liked by everyone else.

a) I don't think there'd be that much of a performance hit if it first
checked to see if the file had changed and only read the rule set iff the
file had changed

b) that's precisely why I said "I'll take an option to do this"
because that way _no one else would be affected_ unless they were someone
like me who thought reading the changes was more important than half a
microsecond.



Re: spamd and spamassassin appear to have different results

Posted by Michael Parker <pa...@pobox.com>.
On Sat, Mar 19, 2005 at 11:24:43AM -0800, Vicki Brown wrote:
> 
> Why can't spamd re-read the system rules file if it's been changed? That's
> not difficult to test for (quickly).  I'll take an option to do this PLEASE.

You might enjoy that, but the performance hit it would cause would not
be liked by everyone else.

Michael

Re: spamd and spamassassin appear to have different results

Posted by Vicki Brown <vl...@cfcl.com>.
At 10:55 -0500 03/19/2005, Matt Kettler wrote:
>And be sure to spamassassin --lint it (should run without any messages),
>and restart spamd after adding the rules.

<vent>
I realize that this is standard canonical advice and I will make the
necessary assumption that it's not really being directed at me but...
I am soooo tired of seeing this reminder.

I KNOW about this now. Honest. I only have to be told once.
lint; HUP; edit; lint; HUP.
I'm about to script the #%@^&&* infernal thing.

Why can't spamd re-read the system rules file if it's been changed? That's
not difficult to test for (quickly).  I'll take an option to do this PLEASE.
</vent>
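The "lint; HUP; edit; lint; HUP" cycle can be scripted so that spamd is only signalled when the lint is clean. This Python sketch rests on stated assumptions. spamd's PID is read from a pidfile whose path is site-specific; spamd writes one only if started with its --pidfile option. Following the advice that `spamassassin --lint` "should run without any messages", any output or a nonzero exit status counts as failure. `lint_then_hup` is a hypothetical name.

```python
import os
import signal
import subprocess


def lint_then_hup(pidfile, lint_cmd=("spamassassin", "--lint"), kill=os.kill):
    """Run the lint command; send SIGHUP to spamd only if lint is clean.

    Assumptions: `pidfile` (site-specific path) holds spamd's PID, and any
    output or a nonzero exit status from the lint command means failure.
    """
    result = subprocess.run(list(lint_cmd), capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip()
    if result.returncode != 0 or output:
        print("lint failed; spamd NOT signalled:\n" + output)
        return False
    with open(pidfile) as f:
        pid = int(f.read().strip())
    kill(pid, signal.SIGHUP)  # spamd re-reads its configuration on SIGHUP
    return True
```

The `kill` parameter exists only so the function can be tested without a running spamd. In normal use it is left as `os.kill`.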

Re: spamd and spamassassin appear to have different results

Posted by Matt Kettler <mk...@comcast.net>.
At 12:49 AM 3/19/2005, Vicki Brown wrote:
>The rule
>  header __CF_NOT_TO_ME           To !~ /(?:vlb\@cfcl|vicki\.vlb\@gmail)/i
>  header __CF_NOT_CC_ME           Cc !~ /(?:vlb\@cfcl|vicki\.vlb\@gmail)/i
>  meta   CF_NOT_FOR_ME            __CF_NOT_TO_ME && __CF_NOT_CC_ME
>  score CF_NOT_FOR_ME             0.01
>  describe CF_NOT_FOR_ME          Neither To nor Cc me
>
>The mail:
>  Date: Fri, 18 Mar 2005 09:05:50 -0500
>  From: "TINY Video Camera" <Di...@sobvt...>
>  To: <vl...@cfcl.com>
>  Subject: A TINY digital video camera from DigiVu
>
>  This Advertisment was brought to you by Newageoptin...
>
>The SA result:
>  X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on cfcl.com
>  X-Spam-Level:
>  X-Spam-Status: No, score=-0.6 required=0.5 tests=ALL_TRUSTED,CF_NOT_FOR_ME,
>         HTML_30_40,HTML_MESSAGE,URIBL_SBL autolearn=ham version=3.0.2
>
>And that's not right. It _is_ for me. The CF_NOT_FOR_ME rule should not have
>triggered.
>
>What I like even less about this is that if I send that message through
>   spamassassin -D
>I get the results I expect (CF_NOT_FOR_ME does _not_ trigger).


Question - Is there any chance that your MTA, MDA or MUA re-wrote the To: 
header, causing it to actually be different in each place? Some mail tools 
will add the local domain to a username-only To: header. They also will 
commonly insert a To: header containing the envelope recipient if no To: 
header exists.

You might want to add some -0.01 scored rules that look for several
different combinations, so you can try to debug what's going on:

header L_TO_EXISTS      exists:To
score  L_TO_EXISTS      -0.01

header L_CC_EXISTS      exists:Cc
score  L_CC_EXISTS      -0.01

header L_TO_CFCL        To =~ /\@cfcl/i
score  L_TO_CFCL        -0.01

header L_TO_GMAIL       To =~ /\@gmail/i
score  L_TO_GMAIL       -0.01

header L_TO_VLB         To =~ /vlb\@l/i
score  L_TO_VLB         -0.01


And be sure to spamassassin --lint it (should run without any messages), 
and restart spamd after adding the rules. 
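To check whether something rewrote To:, the same probes can also be run outside SpamAssassin on a saved copy of the message. This Python sketch uses the standard `email` package. `PROBES` and `fired_probes` are hypothetical names. It models only the first four probe rules above, as plain regex searches with an absent header treated as empty. It is a debugging aid, not a replacement for the rules.

```python
import email
import re

# Python stand-ins for the first four probe rules (names copied from them).
PROBES = {
    "L_TO_EXISTS": lambda m: m.get("To") is not None,
    "L_CC_EXISTS": lambda m: m.get("Cc") is not None,
    "L_TO_CFCL": lambda m: re.search(r"@cfcl", m.get("To", ""), re.I) is not None,
    "L_TO_GMAIL": lambda m: re.search(r"@gmail", m.get("To", ""), re.I) is not None,
}


def fired_probes(raw: str):
    """Return the sorted names of the probes that match a raw RFC 822 message."""
    msg = email.message_from_string(raw)
    return sorted(name for name, test in PROBES.items() if test(msg))
```

Comparing the result for the message as delivered with the result for the copy spamc was given would show whether a header changed along the way.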


Re: spamd and spamassassin appear to have different results

Posted by Daniel Quinlan <qu...@pathname.com>.
"jdow" <jd...@earthlink.net> writes:

> Not having read the first part of this I do note there is not any blanket
> way to say it's only related to not starting spamd. There is still the
> 3.0x bug related to spamd children. The FIRST time a child runs a message
> it reads rules properly. Every time after the first time it does not pick
> up the per user rule scores. It picks up the user rules but not the
> scores. I *WISH* this could be repaired.

I didn't really read your reply except the last sentence, but I really
wish I had an ice cream cone.

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

spamd rules and scores

Posted by Vicki Brown <vl...@cfcl.com>.
At 23:25 -0800 03/18/2005, jdow wrote:
>Not having read the first part of this I do note there is not any blanket
>way to say it's only related to not starting spamd. There is still the
>3.0x bug related to spamd children. The FIRST time a child runs a message
>it reads rules properly. Every time after the first time it does not pick
>up the per user rule scores. It picks up the user rules but not the
>scores. I *WISH* this could be repaired.
>
>{^_^}

FERVENT agreement here. This bug is driving me nutso. According to the
bugzilla thread, it's been repaired but where's the patch update?


Re: spamd and spamassassin appear to have different results

Posted by jdow <jd...@earthlink.net>.
From: "Daniel Quinlan" <qu...@pathname.com>

> Vicki Brown <vl...@cfcl.com> writes:
>
> > The rule
> >  header __CF_NOT_TO_ME           To !~ /(?:vlb\@cfcl|vicki\.vlb\@gmail)/i
> >  header __CF_NOT_CC_ME           Cc !~ /(?:vlb\@cfcl|vicki\.vlb\@gmail)/i
> >  meta   CF_NOT_FOR_ME            __CF_NOT_TO_ME && __CF_NOT_CC_ME
> >  score CF_NOT_FOR_ME             0.01
> >  describe CF_NOT_FOR_ME          Neither To nor Cc me
>
> Easier:
>
>   header CF_NOT_FOR_ME            ToCc !~ /(?:vlb\@cfcl|vicki\.vlb\@gmail)/i
>   score CF_NOT_FOR_ME             0.01
>   describe CF_NOT_FOR_ME          Neither To nor Cc me
>
> > Spamassassin does what I think it should; spamc/spamd fails me.
>
> 98% likely to be the issue: you forgot to restart spamd

Not having read the first part of this I do note there is not any blanket
way to say it's only related to not starting spamd. There is still the
3.0x bug related to spamd children. The FIRST time a child runs a message
it reads rules properly. Every time after the first time it does not pick
up the per user rule scores. It picks up the user rules but not the
scores. I *WISH* this could be repaired.

{^_^}



Re: spamd and spamassassin appear to have different results

Posted by Daniel Quinlan <qu...@pathname.com>.
Vicki Brown <vl...@cfcl.com> writes:

> The rule
>  header __CF_NOT_TO_ME           To !~ /(?:vlb\@cfcl|vicki\.vlb\@gmail)/i
>  header __CF_NOT_CC_ME           Cc !~ /(?:vlb\@cfcl|vicki\.vlb\@gmail)/i
>  meta   CF_NOT_FOR_ME            __CF_NOT_TO_ME && __CF_NOT_CC_ME
>  score CF_NOT_FOR_ME             0.01
>  describe CF_NOT_FOR_ME          Neither To nor Cc me

Easier:

  header CF_NOT_FOR_ME            ToCc !~ /(?:vlb\@cfcl|vicki\.vlb\@gmail)/i
  score CF_NOT_FOR_ME             0.01
  describe CF_NOT_FOR_ME          Neither To nor Cc me
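The single-rule form can be checked against the original meta rule with a small model. It assumes the ToCc pseudo-header is the To: and Cc: values joined together; the newline separator used here is an assumption. It also assumes an absent header counts as empty. The pattern contains no newline and no unescaped `.`, so it cannot match across the join, and the two forms should agree. The sample loop checks this. `tocc_value`, `easier_rule` and `meta_rule` are hypothetical names.

```python
import re

# Pattern from the rules above; re.IGNORECASE mirrors the trailing /i.
ME = re.compile(r"(?:vlb@cfcl|vicki\.vlb@gmail)", re.IGNORECASE)


def tocc_value(headers):
    """Assumed model of ToCc: the To and Cc values joined (absent ones skipped)."""
    return "\n".join(v for v in (headers.get("To"), headers.get("Cc")) if v)


def easier_rule(headers):
    # header CF_NOT_FOR_ME  ToCc !~ /.../i
    return ME.search(tocc_value(headers)) is None


def meta_rule(headers):
    # meta CF_NOT_FOR_ME  __CF_NOT_TO_ME && __CF_NOT_CC_ME (absent header -> "")
    return (ME.search(headers.get("To", "")) is None
            and ME.search(headers.get("Cc", "")) is None)


if __name__ == "__main__":
    samples = [{}, {"To": "<vlb@cfcl.example>"},
               {"Cc": "vicki.vlb@gmail.example"},
               {"To": "a@b.example", "Cc": "c@d.example"}]
    for h in samples:
        print(h, easier_rule(h), meta_rule(h))
```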

> Spamassassin does what I think it should; spamc/spamd fails me.

98% likely to be the issue: you forgot to restart spamd

Daniel
