Posted to users@spamassassin.apache.org by Matías López Bergero <ml...@udesa.edu.ar> on 2005/02/11 17:48:18 UTC

different rules hitting with different users

Hi,

I just got a spam message in my inbox that SA did not flag, which I
find very curious because it was clearly spam. When I examined
the headers to see the score SA assigned, I was even more surprised:

   Content analysis details:   (0.4 points, 5.0 required)

    pts rule name              description
   ---- ---------------------- --------------------------------------------------
    0.2 HTML_MESSAGE           BODY: HTML included in message
    0.2 HTML_FONT_BIG          BODY: HTML tag for a big font size

WTF?

I copied the message and fed it to SA:

X-Spam-Status: No, score=2.4 required=5.0 tests=BAYES_99,HTML_80_90,
         HTML_FONT_BIG,HTML_IMAGE_RATIO_08,HTML_MESSAGE,MIME_QP_LONG_LINE
         autolearn=no version=3.0.2

OK, here it hit the rules it was supposed to hit, but it is still
getting a low score.

What can be causing these different scores?

I thought that maybe the user running SA was the problem.
The run above was as root, so I used su to become the user who received the
mail in the first place, and guess what...

X-Spam-Status: No, score=0.4 required=5.0 tests=HTML_FONT_BIG,HTML_MESSAGE
         autolearn=no version=3.0.2

WTH?

I ran it as another user and got different results again:

X-Spam-Status: No, score=1.8 required=5.0 tests=DCC_CHECK,HTML_FONT_BIG,
         HTML_MESSAGE autolearn=no version=3.0.2


What is going on here? Why are the matching rules different for each
user? Any ideas?

BR,
Matías.

Re: different rules hitting with different users

Posted by Matias Lopez Bergero <ml...@udesa.edu.ar>.
Robert Menschel wrote:
> Hello Matías,
> 
> Friday, February 11, 2005, 1:35:19 PM, you wrote:
> 
> 
>>>It's also hitting BAYES_99, which means your global Bayes database
>>>(since you're running as root) thinks for sure this is NOT spam.
>>>You've got a problem with the emails you've been learning in the past.
> 
> 
> MLB> This are bad news.
> MLB> What could happened?? This can be caused only by a bad training?
> MLB> I have stored all the mail that I feed sa-learn with.
> MLB> What can I do to debug and solve this problem??
> 
> The problem here is that since you use the central database as root,
> but not as the user, root's execution of spamassassin is able to see
> this, but the user is not.

I'm going to configure Bayes site-wide and see if I get a better
result.

> As for the other differences, I don't know why you're getting the
> different rule hits.
> 
> If you save the "spamassassin -D" output from each of the three users
> to files, and use diff to compare them, what differences are there?

That's exactly what I did.
The only differences I got are in the output of Razor or DCC.

[root@anubis tmp]# spamassassin -D < /tmp/raro 2>&1 | grep .cf > rootsacheck
[root@anubis tmp]# su - mlopezb
-bash-2.05b$ spamassassin -D < /tmp/raro 2>&1 | grep .cf > mlbsacheck
-bash-2.05b$ logout
[root@anubis tmp]# diff /tmp/rootsacheck ~mlopezb/mlbsacheck
43c43,44
< Feb 14 12:24:49.585575 check[31720]: [ 6] shock.cloudmark.com is a Catalogue Server srl 5060; computed min_cf=6, Server se: C8
---
> Feb 14 12:25:10.727032 check[31797]: [ 6] pride.cloudmark.com is a Catalogue Server srl 5060; computed min_cf=6, Server se: C8
> Feb 14 12:25:11.234550 check[31797]: [ 6] pride.cloudmark.com is a Catalogue Server srl 5060; computed min_cf=6, debug: Using results from Razor v2.67
49d49
< Feb 14 12:24:50.109628 check[31720]: [ 6] shock.cloudmark.com is a Catalogue Server srl 5060; computed min_cf=6, Server se: C8
51c51
< Feb 14 12:24:50.713196 check[31720]: [ 6] mail 1.0 e=4 sig=5Zkyq938lxN0VuKPy1icYZcRSLMA: Is spam: cf 40 >= min_cf 6
---
> Feb 14 12:25:11.844318 check[31797]: [ 6] mail 1.0 e=4 sig=5Zkyq938lxN0VuKPy1icYZcRSLMA: Is spam: cf 40 >= min_cf 6
[root@anubis tmp]#
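
Much of the noise in a diff like the one above comes from timestamps and process IDs rather than from the rules themselves. One possible way to cut that noise, sketched here in Python, is to strip the leading timestamp and check[PID]: prefix before diffing. The regular expression is fitted only to the log lines shown above, and normalize and diff_logs are illustrative names, not part of SpamAssassin or Razor.

```python
import difflib
import re

# Matches prefixes such as "Feb 14 12:24:49.585575 check[31720]: ".
PREFIX = re.compile(r"^[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2}\.\d+ \w+\[\d+\]: ")


def normalize(line):
    """Drop the timestamp and PID so that identical events compare equal."""
    return PREFIX.sub("", line.rstrip("\n"))


def diff_logs(a_lines, b_lines):
    """Return unified-diff lines for two debug logs after normalization."""
    a = [normalize(line) for line in a_lines]
    b = [normalize(line) for line in b_lines]
    return list(difflib.unified_diff(a, b, "root", "user", lineterm=""))


# Two lines taken from the diff above; they differ only in time and PID.
root_log = [
    "Feb 14 12:24:50.713196 check[31720]: [ 6] mail 1.0 e=4 "
    "sig=5Zkyq938lxN0VuKPy1icYZcRSLMA: Is spam: cf 40 >= min_cf 6",
]
user_log = [
    "Feb 14 12:25:11.844318 check[31797]: [ 6] mail 1.0 e=4 "
    "sig=5Zkyq938lxN0VuKPy1icYZcRSLMA: Is spam: cf 40 >= min_cf 6",
]

result = diff_logs(root_log, user_log)
print("\n".join(result) if result else "no differences after normalization")
```

Lines that still differ after this step (for example a different Razor server name) are real differences between the two runs, although not necessarily relevant ones.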

Thanks a lot for your help Bob.

BR,
Matías.

Re[2]: different rules hitting with different users

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Matías,

Friday, February 11, 2005, 1:35:19 PM, you wrote:

>> It's also hitting BAYES_99, which means your global Bayes database
>> (since you're running as root) thinks for sure this is NOT spam.
>> You've got a problem with the emails you've been learning in the past.

MLB> This are bad news.
MLB> What could happened?? This can be caused only by a bad training?
MLB> I have stored all the mail that I feed sa-learn with.
MLB> What can I do to debug and solve this problem??

Sorry -- I've been working too hard.  BAYES_99 says "most definitely
this IS spam."  Your Bayes database here is OK.

The problem here is that since you use the central database as root,
but not as the user, root's execution of spamassassin is able to see
this, but the user is not.

As for the other differences, I don't know why you're getting the
different rule hits.

If you save the "spamassassin -D" output from each of the three users
to files, and use diff to compare them, what differences are there?

Bob Menschel




Re: different rules hitting with different users

Posted by Matías López Bergero <ml...@udesa.edu.ar>.
Robert Menschel wrote:
> Hello Matías,
> 
> Friday, February 11, 2005, 8:48:18 AM, you wrote:
> 
> MLB> I just got a spam message in my inbox without being flagged by SA, and I
> MLB> find it very curios because it was a clear spam message. Wen I examine
> MLB> the headers to see the score assigned by SA I get more surprised
> 
> MLB>    Content analysis details:   (0.4 points, 5.0 required)
> MLB>     0.2 HTML_MESSAGE           BODY: HTML included in message
> MLB>     0.2 HTML_FONT_BIG          BODY: HTML tag for a big font size
> 
> MLB> I copy the message a feed SA with it
> MLB> X-Spam-Status: No, score=2.4 required=5.0 tests=BAYES_99,HTML_80_90,
> MLB> HTML_FONT_BIG,HTML_IMAGE_RATIO_08,HTML_MESSAGE,MIME_QP_LONG_LINE
> MLB>          autolearn=no version=3.0.2
> 
> MLB> OK, here it hit the rules that it was supposed to hit. But is still
> MLB> getting a low score.
> 
> It's also hitting BAYES_99, which means your global Bayes database
> (since you're running as root) thinks for sure this is NOT spam.
> You've got a problem with the emails you've been learning in the past.

This is bad news.
What could have happened? Can this only be caused by bad training?
I have stored all the mail that I fed to sa-learn.
What can I do to debug and solve this problem?

> MLB> What can be causing this different scores??
> 
> MLB> I thought that maybe the running user name was the problem...
> MLB> So I run SA before as root, now I su to the user name which receive the
> MLB> mail the first time and guess what...
> 
> MLB> X-Spam-Status: No, score=0.4 required=5.0
> MLB> tests=HTML_FONT_BIG,HTML_MESSAGE
> MLB>          autolearn=no version=3.0.2
> 
> This suggests that maybe the user's user_prefs file is turning off
> some rules, or maybe you've got a configuration problem such that the
> user's emails aren't going against the same rules files.
> 
> If you run spamassassin -D on this email from root, and from your two
> users, are the lists of *.cf rules files used the same?

Yes, exactly the same.
I have never touched those user preferences.
What can be happening?

I moved the SARE rules from the /usr/share/spamassassin dir to the
/etc/mail/spamassassin dir this morning, but I restarted spamd and the
checks were run after that, so if one user is affected, the others
should be too. Besides, the spamassassin -D output shows
the same .cf files.

Thanks for your help Bob.

BR,
Matías.

Re: different rules hitting with different users

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Matías,

Friday, February 11, 2005, 8:48:18 AM, you wrote:

MLB> I just got a spam message in my inbox without being flagged by SA, and I
MLB> find it very curios because it was a clear spam message. Wen I examine
MLB> the headers to see the score assigned by SA I get more surprised

MLB>    Content analysis details:   (0.4 points, 5.0 required)
MLB>     0.2 HTML_MESSAGE           BODY: HTML included in message
MLB>     0.2 HTML_FONT_BIG          BODY: HTML tag for a big font size

MLB> I copy the message a feed SA with it
MLB> X-Spam-Status: No, score=2.4 required=5.0 tests=BAYES_99,HTML_80_90,
MLB> HTML_FONT_BIG,HTML_IMAGE_RATIO_08,HTML_MESSAGE,MIME_QP_LONG_LINE
MLB>          autolearn=no version=3.0.2

MLB> OK, here it hit the rules that it was supposed to hit. But is still
MLB> getting a low score.

It's also hitting BAYES_99, which means your global Bayes database
(since you're running as root) thinks for sure this is NOT spam.
You've got a problem with the emails you've been learning in the past.

MLB> What can be causing this different scores??

MLB> I thought that maybe the running user name was the problem...
MLB> So I run SA before as root, now I su to the user name which receive the
MLB> mail the first time and guess what...

MLB> X-Spam-Status: No, score=0.4 required=5.0
MLB> tests=HTML_FONT_BIG,HTML_MESSAGE
MLB>          autolearn=no version=3.0.2

This suggests that maybe the user's user_prefs file is turning off
some rules, or maybe you've got a configuration problem such that the
user's emails aren't going against the same rules files.

If you run spamassassin -D on this email from root, and from your two
users, are the lists of *.cf rules files used the same?

Bob Menschel




Re: different rules hitting with different users

Posted by Matías López Bergero <ml...@udesa.edu.ar>.
Matt Kettler wrote:
> At 11:48 AM 2/11/2005, Matías López Bergero wrote:
> 
>> I copy the message a feed SA with it
>>
>> X-Spam-Status: No, score=2.4 required=5.0 tests=BAYES_99,HTML_80_90,
>>          HTML_FONT_BIG,HTML_IMAGE_RATIO_08,HTML_MESSAGE,MIME_QP_LONG_LINE
>>          autolearn=no version=3.0.2
>>
>> OK, here it hit the rules that it was supposed to hit. But is still
>> getting a low score.
>>
>> What can be causing this different scores??
> 
> 
> First, the big difference is BAYES_99 hitting for the one user, and no 
> bayes at all for the other. That's a pretty significant difference.

I think this is happening because access to the db is being denied :-P
I'm seeing lots of error messages like this one in my mail log file:
spamd[848]: bayes: bayes db version 0 is not able to be used, aborting!
at /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm
line 160, <GEN1564> line 69.

Michael Parker thinks that this is a pretty good indication that SA
couldn't get a lock on the bayes db files.

http://article.gmane.org/gmane.mail.spam.spamassassin.general/63264

I'm running spamd as root and training Bayes as root, so the db in
root's home must be the problem.

I'm currently reading the SA wiki page on setting up site-wide Bayesian
filtering. I guess that is going to be my next move.
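
Since the suspicion is that some users cannot open or lock the Bayes files, a quick permission check for whichever user runs it might look like the following Python sketch. It assumes the default DBM layout (bayes_toks, bayes_seen and bayes_journal under ~/.spamassassin); the directory needs adjusting if bayes_path points elsewhere. check_bayes_access is a made-up helper, and it only looks at file permissions, not at locking or at the db version reported in the log.

```python
import os

# Assumed default file names for the DBM Bayes store; bayes_journal may
# legitimately be absent at times.
BAYES_FILES = ("bayes_toks", "bayes_seen", "bayes_journal")


def check_bayes_access(directory):
    """Map each Bayes file name to (exists, readable, writable) for the current user."""
    report = {}
    for name in BAYES_FILES:
        path = os.path.join(directory, name)
        report[name] = (
            os.path.exists(path),
            os.access(path, os.R_OK),
            os.access(path, os.W_OK),
        )
    return report


if __name__ == "__main__":
    # Hypothetical location: the per-user default directory.
    directory = os.path.expanduser("~/.spamassassin")
    for name, (exists, readable, writable) in check_bayes_access(directory).items():
        print(f"{name}: exists={exists} readable={readable} writable={writable}")
```

Running it once as root and once after su to the affected user would show whether the two accounts see the same files at all.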

> The other hits would seem to indicate the messages are formatted 
> differently. In particular MIME_QP_LONG_LINE suggests the message has 
> been re-encoded.
> 
> Define, exactly, what "copy the message and feed SA with it" means. 
> Specifically, where are you copying from, how are you copying, and where 
> are you copying to? Has the message been touched by any other servers, 
> or a mail client in between?

I got the email via POP using the Mozilla mail client. It stores
messages in mbox format, so I copied the mail to a separate "folder"
in my Mozilla inbox, and copied that folder to my mail server using scp.
$ scp .mozilla/[...]/Inbox.sbd/SPAM.sbd/raro matias@mail:~

Then, on the mail server, I scanned the file as root and as the other
users, as I said.

$ spamassassin --mbox < raro

The differences in the scanning results appear on the mail
server, after the file has already been copied, so it's not a
truncated-message problem IMHO.

BR,
Matías.


Re: different rules hitting with different users

Posted by Matt Kettler <mk...@evi-inc.com>.
At 11:48 AM 2/11/2005, Matías López Bergero wrote:
>I copy the message a feed SA with it
>
>X-Spam-Status: No, score=2.4 required=5.0 tests=BAYES_99,HTML_80_90,
>          HTML_FONT_BIG,HTML_IMAGE_RATIO_08,HTML_MESSAGE,MIME_QP_LONG_LINE
>          autolearn=no version=3.0.2
>
>OK, here it hit the rules that it was supposed to hit. But is still
>getting a low score.
>
>What can be causing this different scores??

First, the big difference is BAYES_99 hitting for the one user, and no 
bayes at all for the other. That's a pretty significant difference.

The other hits would seem to indicate the messages are formatted 
differently. In particular MIME_QP_LONG_LINE suggests the message has been 
re-encoded.
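
To check whether a given copy of the message really contains over-long quoted-printable lines, something like the Python sketch below could be run against each copy. It assumes that MIME_QP_LONG_LINE corresponds to encoded lines longer than the 76 characters RFC 2045 allows, which is how the rule is usually described. long_qp_lines is a hypothetical helper and not how SpamAssassin itself implements the test.

```python
import email

QP_LIMIT = 76  # maximum encoded line length permitted by RFC 2045


def long_qp_lines(raw_message):
    """Return (content type, line length) for each over-long quoted-printable line."""
    msg = email.message_from_string(raw_message)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if cte != "quoted-printable":
            continue
        # Without decode=True, get_payload() returns the still-encoded body text.
        for line in part.get_payload().splitlines():
            if len(line) > QP_LIMIT:
                found.append((part.get_content_type(), len(line)))
    return found


# Made-up sample message with one 100-character quoted-printable line.
sample = (
    "Content-Type: text/html\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    + "A" * 100 + "\n"
    + "short line\n"
)
print(long_qp_lines(sample))
```

If the copy scanned from the command line reports such lines while the copy that went through spamd does not, the two inputs were not byte-for-byte the same message.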

Define, exactly, what "copy the message and feed SA with it" means. 
Specifically, where are you copying from, how are you copying, and where 
are you copying to? Has the message been touched by any other servers, or a 
mail client in between?