Posted to users@spamassassin.apache.org by Charles Gregory <cg...@hwcn.org> on 2009/06/01 21:12:29 UTC

Re: [sa] Re: Identifying Source of False Positives

On Mon, 1 Jun 2009, Rich Shepard wrote:
> 	 *  2.5 EMPTY_BODY BODY: Message has subject but no body
>  There is certainly body content in the message; it's not empty so I don't
> understand the 2.5 on that third test. I also don't know where the 3.5 on
> the second test arises.

Just to be clear, are you looking at the body in the actual rejected 
message, to make sure it is still there (not 'stripped' from the message)?
First guess, look at the procmail code that 'chooses' to run spamassassin.
Have you used an 'h' where you meant to use an 'H', thereby feeding *only* 
the header to spamassassin?

- C
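
A minimal sketch of the 'h' versus default difference described above, written in Python rather than procmail so the effect can be seen without touching a live recipe. The sample message and the helper names (split_message, filter_sees_body) are made up for illustration, and header_only_feed only approximates what procmail would pipe to a filter when a recipe carries the 'h' flag.

```python
from typing import Tuple


def split_message(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw message into (header, body) at the first blank line."""
    hits = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, sep) for pos, sep in hits if pos != -1]
    if not hits:
        # No blank line at all: everything is header, the body is empty.
        return raw, b""
    pos, sep = min(hits)  # earliest separator wins
    return raw[:pos], raw[pos + len(sep):]


def filter_sees_body(feed: bytes) -> bool:
    """True if the data handed to the filter contains a non-blank body."""
    return split_message(feed)[1].strip() != b""


# Hypothetical sample resembling a locally generated log report.
message = (
    b"From: root@localhost\n"
    b"Subject: Mail log summary\n"
    b"\n"
    b"Log summary text...\n"
)

header, body = split_message(message)
whole_feed = message                 # default: header and body are piped
header_only_feed = header + b"\n\n"  # approximation of an 'h' recipe's feed

print("whole message, body visible:", filter_sees_body(whole_feed))
print("header only, body visible:  ", filter_sees_body(header_only_feed))
```

If a header-only feed is what reaches spamassassin, a body rule firing on a message that visibly has a body in the mailbox would be the expected symptom.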

Re: [sa] Re: Identifying Source of False Positives

Posted by Rich Shepard <rs...@appl-ecosys.com>.
On Mon, 1 Jun 2009, Charles Gregory wrote:


> Is there anywhere in the procmail recipe *above* this one that some
> special condition has been specified as:
>
>   :0fwh
>
> ...which has the effect of 'filtering' the message down to just its
> headers? It wouldn't necessarily have to be a recent change to your
> procmailrc, it might just be a subtle change in the log mail that
> 'triggers' the rule when it didn't before.

Charles,

# BEGIN RECIPES

# Nuke duplicate messages
#:0 Wh: msgid.lock
#| $FORMAIL -D 8192 msgid.cache

## Call SpamAssassin
:0fw: spamassassin.lock
* < 256000
| spamassassin

   The first recipe has been commented out for a while now, so the call to SA
is at the top of the list.

> Next guess: Has this log summary grown in size past some limit that would 
> cause the whole body to be 'truncated'?

   No. The log summary report (with headers) is < 26,000 bytes.

Rich

Re: [sa] Re: Identifying Source of False Positives

Posted by Charles Gregory <cg...@hwcn.org>.
>>  First guess, look at the procmail code that 'chooses' to run spamassassin.
>>  Have you used an 'h' where you meant to use an 'H', thereby feeding *only*
>>  the header to spamassassin?
> ## Call SpamAssassin
> :0fw: spamassassin.lock
> * < 256000
> | spamassassin

Is there anywhere in the procmail recipe *above* this one that some 
special condition has been specified as:

    :0fwh

...which has the effect of 'filtering' the message down to just its
headers? It wouldn't necessarily have to be a recent change to your
procmailrc, it might just be a subtle change in the log mail that
'triggers' the rule when it didn't before.

Next guess: Has this log summary grown in size past some limit that would 
cause the whole body to be 'truncated'?

- Charles

Re: [sa] Re: Identifying Source of False Positives

Posted by Rich Shepard <rs...@appl-ecosys.com>.
On Mon, 1 Jun 2009, Charles Gregory wrote:

> Just to be clear, are you looking at the body in the actual rejected
> message,

Charles,

   Yes. The body consists of the mail log summary.

> First guess, look at the procmail code that 'chooses' to run spamassassin.
> Have you used an 'h' where you meant to use an 'H', thereby feeding *only*
> the header to spamassassin?

## Call SpamAssassin
:0fw: spamassassin.lock
* < 256000
| spamassassin

   This is how it's been for years.

Rich

Re: Identifying Source of False Positives

Posted by Rich Shepard <rs...@appl-ecosys.com>.
On Tue, 2 Jun 2009, Charles Gregory wrote:

> This *really* suggests that one of two things MUST be occurring:
> 1) What you are seeing is NOT what spamassassin "sees".

Charles,

   Quite possible.

> 2) A character (null/ascii-zeros?) has been injected into the e-mail
>   somewhere in the headers, causing Spamassassin to cease its scan at that
>   point...

   Hmm-m-m-m. I cannot perceive a scenario where this is selective. For
example, the log reports sent by local root to me on the local machine, some
messages posted to this mail list (but not others in the same thread), some
messages posted to other mail lists (again, not all in the same thread), and
so on. There is no consistent pattern other than the locally generated log
summary reports.

> Presuming upon the latter, try examining all the headers injected by other
> processes like clamav. Particularly where *some* messages receive this
> treatment, but not *all*, you should be able to find a 'header difference'
> between the passed and failed messages.

   No clamav or similar. We run only linux with incoming mail processed by
postfix and procmail.

> Something to try:
> Set up a custom rule in local.cf to match a custom header
>   X-Spam-Test: YES
> And then, just before you scan the e-mail with spamassassin, use 'formail' to 
> add that header to the mail.

   I've not before used formail. SA is called from within
~/procmail/recipes.rc:

## Call SpamAssassin
:0fw: spamassassin.lock
* < 256000
| spamassassin

   Where do I insert a call to formail and what is the appropriate format?

Thanks,

Rich

-- 
Richard B. Shepard, Ph.D.               |  Integrity            Credibility
Applied Ecosystem Services, Inc.        |            Innovation
<http://www.appl-ecosys.com>     Voice: 503-667-4517      Fax: 503-667-8863

Re: Identifying Source of False Positives

Posted by Charles Gregory <cg...@hwcn.org>.
On Tue, 2 Jun 2009, Rich Shepard wrote:
>  This morning not only was the mail log report and logwatch report falsely
> flagged as spam, but so were several messages posted to the google group
> mail list for an application I use. What is interesting to me is that every
> one had a +2.5 score for EMPTY_BODY, while none of them had empty bodies.

This *really* suggests that one of two things MUST be occurring:

1) What you are seeing is NOT what spamassassin "sees".

2) A character (null/ascii-zeros?) has been injected into the e-mail
    somewhere in the headers, causing Spamassassin to cease its scan at
    that point...

Presuming upon the latter, try examining all the headers injected by other 
processes like clamav. Particularly where *some* messages receive this 
treatment, but not *all*, you should be able to find a 'header difference' 
between the passed and failed messages.
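
One way to check the second possibility on a saved copy of a flagged message is to scan the raw bytes for NULs and note whether they fall in the header or the body. The following is a rough sketch under stated assumptions: the function names and the sample bytes are invented for illustration, and whether a NUL would actually make SpamAssassin stop reading remains a guess, not something this code demonstrates.

```python
from typing import List, Tuple


def header_length(raw: bytes) -> int:
    """Offset of the first blank line (end of header), or len(raw) if none."""
    positions = [p for p in (raw.find(b"\r\n\r\n"), raw.find(b"\n\n")) if p != -1]
    return min(positions) if positions else len(raw)


def find_nul_bytes(raw: bytes) -> List[Tuple[int, str]]:
    """Return (offset, region) for every NUL byte; region is 'header' or 'body'."""
    end = header_length(raw)
    return [
        (offset, "header" if offset < end else "body")
        for offset, byte in enumerate(raw)
        if byte == 0
    ]


# Hypothetical message with a NUL injected into a header field.
sample = b"From: root@localhost\nX-Injected: bad\x00value\n\nLog summary\n"

for offset, region in find_nul_bytes(sample):
    print("NUL byte at offset", offset, "in the", region)
```

For a real message, the bytes would come from something like open(path, "rb").read(), where path points at a saved copy of one of the falsely flagged mails. Running it over one passed and one failed message gives a quick version of the 'header difference' comparison.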

Something to try:
Set up a custom rule in local.cf to match a custom header
    X-Spam-Test: YES
And then, just before you scan the e-mail with spamassassin, use 'formail' 
to add that header to the mail. It will get injected at the end of the 
headers. If the test rule 'hits' then you have a real mystery. If the test 
rule does *not* 'hit', then we have evidence that something is causing 
SpamAssassin to behave as if an End-Of-File condition has been reached on 
the mail before it has read it all... Null/zeros or something...

- Charles
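
The marker-header test above relies on the new field landing at the very end of the header block, so a scanner that gives up partway through the headers never reaches it. In the procmail setup this would be done with formail (its -A option appends a field to the header). The sketch below is not that recipe; it is a Python stand-in for experimenting on a saved message offline, and the function name append_header and the sample message are assumptions made for illustration.

```python
def append_header(raw: bytes, field: bytes) -> bytes:
    """Insert `field` as the last header line, just before the blank separator."""
    hits = [
        (raw.find(sep), eol)
        for sep, eol in ((b"\r\n\r\n", b"\r\n"), (b"\n\n", b"\n"))
    ]
    hits = [(pos, eol) for pos, eol in hits if pos != -1]
    if not hits:
        # No blank line found: treat the whole input as header.
        return raw.rstrip(b"\r\n") + b"\n" + field + b"\n\n"
    pos, eol = min(hits)  # earliest separator marks the end of the header
    return raw[:pos] + eol + field + raw[pos:]


# Hypothetical sample resembling a locally generated log report.
message = (
    b"From: root@localhost\n"
    b"Subject: Mail log summary\n"
    b"\n"
    b"Log summary text...\n"
)

tagged = append_header(message, b"X-Spam-Test: YES")
print(tagged.decode("ascii"))
```

The tagged copy can then be fed to spamassassin by hand. If a local.cf rule matching X-Spam-Test does not hit on it, that would support the idea that scanning stops before the end of the headers.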

Re: [sa] Re: Identifying Source of False Positives

Posted by Rich Shepard <rs...@appl-ecosys.com>.
On Mon, 1 Jun 2009, Charles Gregory wrote:

> Just to be clear, are you looking at the body in the actual rejected
> message, to make sure it is still there (not 'stripped' from the message)?

Charles,

   I hope the following information is helpful in telling you more
experienced folks why I'm having these false positives.

   This morning not only was the mail log report and logwatch report falsely
flagged as spam, but so were several messages posted to the google group
mail list for an application I use. What is interesting to me is that every
one had a +2.5 score for EMPTY_BODY, while none of them had empty bodies.

   Those from the google group mail list might have had an HTML-formatted
part, but the log reports are plain ASCII text.

   What might possibly be triggering this false rule?

Rich