Posted to users@spamassassin.apache.org by Randy Ramsdell <rr...@livedatagroup.com> on 2007/12/13 17:29:21 UTC

Manual check vs. auto

Hi,

I have been doing some checking of spam messages that make it through our
mail filtering systems and noticed that the spam score does not match
what I get when checking manually.

An example spam report:

X-Spam-Status: No, score=3.068 tagged_above=-9999 required=5
 tests=[BAYES_50=0.001, HELO_DYNAMIC_DHCP=3.066, HTML_MESSAGE=0.001]
X-Spam-Score: 3.068
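The score in the automated header is simply the sum of the rules listed in tests=[...], so any rule missing from that list contributed nothing. A minimal shell sketch to check that sum, assuming the amavis-style header has been unfolded onto one line (the function name sum_tests is hypothetical):

```shell
# Sum the per-rule scores in an X-Spam-Status "tests=[...]" list.
# Assumes the header is on one line and each entry looks like RULE=score,
# with entries separated by commas.
sum_tests() {
  # sed keeps only the text inside tests=[...]; tr puts one RULE=score per
  # line; awk adds up the scores and prints the total to 3 decimals.
  printf '%s\n' "$1" |
    sed -n 's/.*tests=\[\([^]]*\)].*/\1/p' |
    tr ',' '\n' |
    awk -F= 'NF == 2 { total += $2 } END { printf "%.3f\n", total }'
}

sum_tests 'X-Spam-Status: No, score=3.068 tagged_above=-9999 required=5 tests=[BAYES_50=0.001, HELO_DYNAMIC_DHCP=3.066, HTML_MESSAGE=0.001]'
# prints 3.068
```

By the same arithmetic, the manual report's 12.5 is 3.1 + 2.0 + 0.0 + 3.5 + 3.9 + 0.0, so about 9.4 points of the gap come from TVD_FUZZY_DEGREE, BAYES_99 and RCVD_IN_XBL, none of which appear in the automated list.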



But when using "spamassassin -D -lint < $message" it hits more rules:

Content analysis details:   (12.5 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.1 HELO_DYNAMIC_DHCP      Relay HELO'd using suspicious hostname (DHCP)
 2.0 TVD_FUZZY_DEGREE       BODY: TVD_FUZZY_DEGREE
 0.0 HTML_MESSAGE           BODY: HTML included in message
 3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.0000]
 3.9 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [41.212.143.24 listed in zen.spamhaus.org]
 0.0 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
                            [41.212.143.24 listed in zen.spamhaus.org]

That is a big difference!
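One quick way to see which rules account for the difference is to compare the two rule lists directly. A hedged sketch, assuming the rule names have already been copied by hand out of the X-Spam-Status header and the spamassassin report (the helper name rules_only_in is made up for illustration):

```shell
# Print the rules in list $1 that do not appear in list $2, sorted.
# Both arguments are whitespace-separated rule names.
rules_only_in() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(b, other)            # default FS: split on runs of whitespace
    for (i = 1; i <= n; i++) seen[other[i]] = 1
    m = split(a, mine)
    for (i = 1; i <= m; i++)
      if (!(mine[i] in seen)) print mine[i]
  }' | LC_ALL=C sort -u
}

auto="BAYES_50 HELO_DYNAMIC_DHCP HTML_MESSAGE"
manual="HELO_DYNAMIC_DHCP TVD_FUZZY_DEGREE HTML_MESSAGE BAYES_99 RCVD_IN_XBL RCVD_IN_PBL"
rules_only_in "$manual" "$auto"   # rules only in the manual run
rules_only_in "$auto" "$manual"   # rules only in the automated run
```

With the lists from this post, the first call prints BAYES_99, RCVD_IN_PBL, RCVD_IN_XBL and TVD_FUZZY_DEGREE, and the second prints BAYES_50: mostly a Bayes difference plus DNSBL (network) rules.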

Any ideas about why this is?

Thanks,
Randy Ramsdell



Re: Manual check vs. auto

Posted by Matt Kettler <mk...@verizon.net>.
Theo Van Dinter wrote:
> On Thu, Dec 13, 2007 at 11:29:21AM -0500, Randy Ramsdell wrote:
>   
>> I have been doing some checking of spam messages that make it through our
>> mail filtering systems and noticed that the spam score does not match
>> what I get when checking manually.
>>
>> An example spam report:
>> X-Spam-Status: No, score=3.068 tagged_above=-9999 required=5
>> tests=[BAYES_50=0.001, HELO_DYNAMIC_DHCP=3.066, HTML_MESSAGE=0.001]
>> X-Spam-Score: 3.068
>>
>> But when using "spamassassin -D -lint < $message" it hits more rules:
>>     
> [...]
>   
Are you *SURE* that works, Randy?

Note that --lint specifies rule-test-only mode, and message scanning is
disabled. In recent versions, lint also force-disables any network
tests, so hitting RCVD_IN_XBL would be impossible with the --lint parameter.

>> 3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
>> 3.9 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
>> 0.0 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
>>
>> That is a big difference!
>> Any ideas about why this is?
>>     
>
> It appears that the first results are a) using a different Bayes DB,
> and b) not using network tests (aka: local mode).
>
>   
Also:

 c) performed using amavis, which will force local-only mode via the
$sa_local_tests_only option in the amavis config. If you want network
tests, set this to 0.
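To check what an amavis config file actually sets, a rough shell sketch follows. It assumes a plain "$sa_local_tests_only = N;" assignment on its own line (the amavis config is Perl, so anything fancier is not handled), and the function name is hypothetical:

```shell
# Print the last uncommented "$sa_local_tests_only = N;" value in an amavis
# config file (a later assignment overrides an earlier one when Perl runs it).
# Prints nothing if the option is not set in that file.
sa_local_tests_only() {
  sed -n 's/^[[:space:]]*\$sa_local_tests_only[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*;.*/\1/p' "$1" |
    tail -n 1
}
```

For example, sa_local_tests_only /etc/amavisd.conf; that path is only an assumption, since it differs between distributions and some split the config across several files. A result of 1 means network tests such as RCVD_IN_XBL are skipped.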

Also, amavis will run the tests as whatever user amavis itself runs as, so
if you want to do any sa-learn training or get valid spamassassin test
results, run them as that user, via something like:

 su $amavis_userid -c "spamassassin -t < $message"





Re: Manual check vs. auto

Posted by Randy Ramsdell <rr...@livedatagroup.com>.
Richard Frovarp wrote:
> Postfix is looking at the connecting host. SA is looking in all the 
> untrusted RCVD lines. Hence the rule name RCVD_IN_

Yep thanks.


Re: Manual check vs. auto

Posted by Richard Frovarp <ri...@sendit.nodak.edu>.
Randy Ramsdell wrote:
> Correction.
>
> 1. Obviously the log above was from Postfix and not SpamAssassin, and
> SpamAssassin is probably set up for local only! But this leads to an
> interesting question: how would the Postfix "sbl-xbl" check miss this
> while SpamAssassin catches it? That does appear to be the case.
>
>
>
Postfix is looking at the connecting host. SA is looking in all the 
untrusted RCVD lines. Hence the rule name RCVD_IN_
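To illustrate the difference: the Postfix RBL check queries only the connecting client's IP, while an RCVD_IN_* rule looks at relay IPs taken from the Received headers. A simplified sketch that lists DNSBL query names for every bracketed IPv4 address in a message's Received headers. Unlike SpamAssassin it does not separate trusted from untrusted relays (that depends on trusted_networks), the function name is hypothetical, and it relies on grep -o, which GNU and BSD grep support but POSIX does not require:

```shell
# Print DNSBL query names (reversed octets + zone) for every bracketed IPv4
# address in the Received headers of a message file.
# Simplification: SpamAssassin only checks untrusted relays; this checks all.
dnsbl_queries() {
  # $1 = DNSBL zone, $2 = message file
  # Unfold continuation lines and stop at the blank line ending the header.
  awk '/^$/ { exit }
       /^[ \t]/ { sub(/^[ \t]+/, " "); line = line $0; next }
       { if (line != "") print line; line = $0 }
       END { if (line != "") print line }' "$2" |
    grep -i '^Received:' |
    grep -o '\[[0-9][0-9.]*]' |
    sed 's/[][]//g' |
    awk -F. -v zone="$1" 'NF == 4 { print $4 "." $3 "." $2 "." $1 "." zone }'
}

# Usage (file name is a placeholder):
#   dnsbl_queries zen.spamhaus.org message.eml
```

For 41.212.143.24, the relay address in the original report, the query name is 24.143.212.41.zen.spamhaus.org. Each name is then looked up as an ordinary DNS A query: a listed address resolves, an unlisted one returns NXDOMAIN.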

Re: Manual check vs. auto

Posted by Randy Ramsdell <rr...@livedatagroup.com>.
Randy Ramsdell wrote:
> This is a log message from our server which shows it checks
> sbl-xbl.spamhaus.org and rejects the message. Also, it is using a
> different Bayes DB, and I am not sure about that either. Actually, I think
> I do, and will check, but it looks like I need to sort out some things
> here.
>
> postfix/smtpd[10855]: NOQUEUE: reject: RCPT from 
> acd34.internetdsl.tpnet.pl[83.16.55.34]: 554 Service unavailable; 
> Client host [83.16.55.34] blocked using sbl-xbl.spamhaus.org; 
> http://www.spamhaus.org/query/bl?ip=83.16.55.34; 
> from=<ab...@roytompkins.com> to=<hu...@preforeclosure.com> 
> proto=ESMTP helo=<acd34.internetdsl.tpnet.pl>
>
Correction.

1. Obviously the log above was from Postfix and not SpamAssassin, and
SpamAssassin is probably set up for local only! But this leads to an
interesting question: how would the Postfix "sbl-xbl" check miss this
while SpamAssassin catches it? That does appear to be the case.

2. The Bayes databases are different because one belongs to root and the
other to the user that SpamAssassin runs as. The root Bayes DB seems much
better for this particular e-mail. Is it recommended to swap these
databases, since I believe some learning was done as the wrong user?



Re: Manual check vs. auto

Posted by Randy Ramsdell <rr...@livedatagroup.com>.
Theo Van Dinter wrote:
> It appears that the first results are a) using a different Bayes DB,
> and b) not using network tests (aka: local mode).
>
>   

This is a log message from our server which shows it checks
sbl-xbl.spamhaus.org and rejects the message. Also, it is using a different
Bayes DB, and I am not sure about that either. Actually, I think I do, and
will check, but it looks like I need to sort out some things here.

postfix/smtpd[10855]: NOQUEUE: reject: RCPT from 
acd34.internetdsl.tpnet.pl[83.16.55.34]: 554 Service unavailable; Client 
host [83.16.55.34] blocked using sbl-xbl.spamhaus.org; 
http://www.spamhaus.org/query/bl?ip=83.16.55.34; 
from=<ab...@roytompkins.com> to=<hu...@preforeclosure.com> 
proto=ESMTP helo=<acd34.internetdsl.tpnet.pl>



Re: Manual check vs. auto

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Dec 13, 2007 at 11:29:21AM -0500, Randy Ramsdell wrote:
> I have been doing some checking of spam messages that make it through our
> mail filtering systems and noticed that the spam score does not match
> what I get when checking manually.
> 
> An example spam report:
> X-Spam-Status: No, score=3.068 tagged_above=-9999 required=5
> tests=[BAYES_50=0.001, HELO_DYNAMIC_DHCP=3.066, HTML_MESSAGE=0.001]
> X-Spam-Score: 3.068
> 
> But when using "spamassassin -D -lint < $message" it hits more rules:
[...]
> 3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
> 3.9 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
> 0.0 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
> 
> That is a big difference!
> Any ideas about why this is?

It appears that the first results are a) using a different Bayes DB,
and b) not using network tests (aka: local mode).

-- 
Randomly Selected Tagline:
"So on one hand, honey is an amazingly sophisticated and efficient food
 source. On the other hand it's bee backwash."
             - Alton Brown, Good Eats, "Pantry Raid IV: Comb Alone"