Posted to users@spamassassin.apache.org by Rich Shepard <rs...@appl-ecosys.com> on 2009/06/05 17:57:29 UTC

Re: Identifying Source of False Positives -- RESOLVED

On Tue, 2 Jun 2009, Rich Shepard wrote:

>  I started doing this today. Each of the false positive messages was
> exported from alpine to a file, and I ran sa-learn on that file telling it
> the text is ham.

   Today the mail and logwatch summary reports appeared in my inbox and there
were no false positives in the holding cell. This may have resolved the
issue of missing messages, but I'll continue to monitor and train SA on the
ham that was mistakenly labeled as spam.
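Rich's training step can be batched. He exports each false positive from alpine to a file and runs sa-learn on it as ham. The Python sketch below wraps that in a small script. It assumes the exported files sit in one directory, and that directory name is hypothetical. It relies only on sa-learn's `--ham` and `--mbox` options. Pass `mbox=True` only if the exports are in mbox format, which you should verify for your own alpine setup.

```python
import subprocess
from pathlib import Path


def sa_learn_ham_command(files: list[str], mbox: bool = False) -> list[str]:
    """Build an sa-learn command line that trains the given files as ham."""
    cmd = ["sa-learn", "--ham"]
    if mbox:
        cmd.append("--mbox")  # treat each file as an mbox holding several messages
    return cmd + list(files)


def train_ham(export_dir: str, mbox: bool = False) -> int:
    """Run sa-learn once over every regular file in export_dir.

    Returns sa-learn's exit status, or 0 if there was nothing to train.
    """
    files = sorted(str(p) for p in Path(export_dir).iterdir() if p.is_file())
    if not files:
        return 0
    return subprocess.run(sa_learn_ham_command(files, mbox)).returncode
```

For example, `train_ham("false-positives")` would run `sa-learn --ham` once over every file in that (hypothetical) directory. sa-learn accepts several file arguments, so one invocation is enough.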

>> The empty body problem is a more difficult problem.  Have procmail save a
>> copy of the raw message somewhere and take a look at it.  Make sure there
>> is a blank line between the headers and the body.  Run 'spamassassin -D'
>> on this saved message and look for anything unusual in the debug output.

   This seems to have been resolved by replacing the old
/etc/mail/spamassassin/local.cf with the new version. Many fewer rules and
other entries, but I no longer see the EMPTY_BODY test adding 2.5 to the
scores.
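The quoted advice includes checking that a saved raw message has a blank line between the headers and the body. The Python sketch below is a quick heuristic for that check, not a full RFC 5322 parser. It assumes that a file written by procmail may start with an mbox-style `From ` envelope line and skips it. `spamassassin -D` remains the diagnostic the thread actually recommends.

```python
import re

# Header field name: printable US-ASCII except ':' (RFC 5322), then ':'.
HEADER_LINE = re.compile(r'^[!-9;-~]+:')


def has_header_body_separator(raw: str) -> bool:
    """Return True if the header block is terminated by a blank line."""
    text = raw.replace("\r\n", "\n")
    lines = text.split("\n")
    if text.endswith("\n"):
        lines.pop()  # the final newline terminates the last line; it is not a blank line
    for i, line in enumerate(lines):
        if i == 0 and line.startswith("From "):
            continue  # assumed mbox envelope line, not a header field
        if line == "":
            return True  # first blank line: the headers end here
        if HEADER_LINE.match(line) or line.startswith((" ", "\t")):
            continue  # header field or folded continuation line
        return False  # body-like text appeared before any blank line
    return False  # ran out of lines without finding the separator
```

A message whose first body line directly follows the last header would return False here. That is the situation the advice asks you to rule out.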

Thank you all very much,

Rich

-- 
Richard B. Shepard, Ph.D.               |  Integrity            Credibility
Applied Ecosystem Services, Inc.        |            Innovation
<http://www.appl-ecosys.com>     Voice: 503-667-4517      Fax: 503-667-8863

Re: [SA] Identifying Source of False Positives -- RESOLVED

Posted by Rich Shepard <rs...@appl-ecosys.com>.
On Fri, 5 Jun 2009, Adam Katz wrote:

> Since that regex matches nothing, I assume you meant it to be
> m'^[^\n]+\n\s*$'s  or  m'^[^\n]+\n\s*$'ms

Adam,

   I didn't write this. It apparently came with the local.cf file a few years
ago.

Rich

Re: [SA] Identifying Source of False Positives -- RESOLVED

Posted by Adam Katz <an...@khopis.com>.
Rich Shepard wrote:
> # for empty message bodies:
> body       EMPTY_BODY   m'^[^\n]+\n\s*$'
> describe   EMPTY_BODY   Message has subject but no body
> score      EMPTY_BODY   2.5

Egads ... that's an unbounded multi-line regex (that little plus sign is
quite CPU-intensive).  I don't understand its intent, either ... it
looks for a line that includes linebreaks but with no multi-line flag.
Ignoring that bug, it wants a non-empty line followed by either a blank
line or a line containing only whitespace.  How does this characterize an
empty body?  What does this have to do with the presence of a subject?

Since that regex matches nothing, I assume you meant it to be
m'^[^\n]+\n\s*$'s  or  m'^[^\n]+\n\s*$'ms

With a trailing s, that rule matches one-line emails that end in a blank
line (which are quite common).

With a trailing ms, that rule matches any email with a paragraph in it
(like this one), which is almost every single email.
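The flag discussion above can be tried outside SpamAssassin with the short Python sketch below. For this pattern, Python's `re.S` and `re.M` mirror Perl's `/s` and `/m`. The sample strings are made up.

Keep two caveats in mind. First, `/s` only changes what `.` matches, and this pattern contains no `.`, so in this sketch the `/s` variant behaves exactly like the unflagged one. Second, SpamAssassin's documentation for `body` rules says that line breaks are removed before matching and that the Subject becomes the first paragraph. A pattern that requires `\n` therefore cannot match inside a `body` rule, even though it does match some of the raw strings below.

```python
import re

# The EMPTY_BODY pattern from the old local.cf, transcribed to Python.
# For this pattern, re.S and re.M behave like Perl's /s and /m.
PATTERN = r'^[^\n]+\n\s*$'

# Made-up sample bodies, as raw strings with real line breaks.
SAMPLES = {
    "one line, blank after": "Lunch at noon?\n\n",
    "two paragraphs": "First paragraph.\n\nSecond paragraph.\n",
    "single line, no newline": "Just one line",
}

FLAG_SETS = {
    "no flags": 0,
    "/s": re.S,
    "/ms": re.M | re.S,
}


def matches(flags: int, text: str) -> bool:
    """True if PATTERN matches anywhere in text under the given flags."""
    return re.search(PATTERN, text, flags) is not None


if __name__ == "__main__":
    for label, flags in FLAG_SETS.items():
        for name, text in SAMPLES.items():
            print(f"{label:8} {name:25} {matches(flags, text)}")
```

Run as a script, it prints one line per flag/sample pair. The one-line sample followed by a blank line matches under all three settings. Only the `/ms` variant matches the two-paragraph sample.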

It appears you wanted something like this:

body     __EMPTY_BODY  !~ m'\w\n\w's
meta     SUBJ_NO_BODY  __EMPTY_BODY && __HAS_SUBJECT
describe SUBJ_NO_BODY  Message has subject but no body
score    SUBJ_NO_BODY  2.5

Or perhaps like this:

body     EMPTY_BODY    !~ m'\w\n\w's
describe EMPTY_BODY    Message has no text in body
score    EMPTY_BODY    2.5
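The two suggested rule sets can be modeled the same way. The Python sketch below treats `!~` as "the pattern does not match" and the meta rule as a plain boolean AND, with `has_subject` standing in for `__HAS_SUBJECT`. It runs on raw strings and does not reproduce SpamAssassin's body rendering. It also does not confirm that SpamAssassin accepts `!~` in a `body` rule. One thing it does make visible: a body consisting of a single line of text contains no `\w\n\w` sequence, so it counts as "empty" too.

```python
import re

# Models only the intended boolean logic of the suggested rules (an
# assumption); SpamAssassin's own body preprocessing is not reproduced.
# re.S is kept for fidelity with the 's' modifier; without '.', it has no effect.
TEXT_ACROSS_BREAK = re.compile(r'\w\n\w', re.S)


def empty_body(body: str) -> bool:
    """Mirror of __EMPTY_BODY: true when no word character is followed by a
    line break and another word character (the rule's !~ negation)."""
    return TEXT_ACROSS_BREAK.search(body) is None


def subj_no_body(body: str, has_subject: bool) -> bool:
    """Mirror of the SUBJ_NO_BODY meta rule: both sub-tests must be true."""
    return empty_body(body) and has_subject


if __name__ == "__main__":
    print(empty_body(""))                       # True
    print(empty_body("line one\nline two\n"))   # False
    print(empty_body("Just one line\n"))        # True
```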

Also, that score seems pretty high, and I wonder about your intent.  If
you're trying to use it to catch image-only spam, please use the other
rules we've proposed on the list, like MIME_IMAGE_ONLY.

Re: Identifying Source of False Positives -- RESOLVED

Posted by Rich Shepard <rs...@appl-ecosys.com>.
On Fri, 5 Jun 2009, Bowie Bailey wrote:

> In that case, you should be able to track down the issue by comparing the
> two files. Is the EMPTY_BODY rule defined in the old local.cf file? If
> so, what does it say?

Bowie,

   Yes, it was in the old local.cf:

# for empty message bodies:
body       EMPTY_BODY   m'^[^\n]+\n\s*$'
describe   EMPTY_BODY   Message has subject but no body
score      EMPTY_BODY   2.5

   It apparently used to work, but it doesn't with the new SA version I
upgraded to a few months ago.

Thanks,

Rich

Re: Identifying Source of False Positives -- RESOLVED

Posted by Bowie Bailey <Bo...@BUC.com>.
Rich Shepard wrote:
>>> The empty body problem is a more difficult problem.  Have procmail save a
>>> copy of the raw message somewhere and take a look at it.  Make sure there
>>> is a blank line between the headers and the body.  Run 'spamassassin -D'
>>> on this saved message and look for anything unusual in the debug output.
>
>   This seems to have been resolved by replacing the old
> /etc/mail/spamassassin/local.cf with the new version. Many fewer rules and
> other entries, but I no longer see the EMPTY_BODY test adding 2.5 to the
> scores.

In that case, you should be able to track down the issue by comparing 
the two files.  Is the EMPTY_BODY rule defined in the old local.cf 
file?  If so, what does it say?
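Bowie's suggestion to compare the two files can be narrowed down to rule names. The Python sketch below lists the names that the old local.cf mentions and the new one does not. The set of directives it recognizes is an assumed subset, not SpamAssassin's full list. The file names in the usage note are hypothetical.

```python
from pathlib import Path

# Directives whose second field is a rule name. This is an assumed subset
# covering the rules seen in this thread, not an exhaustive list.
RULE_DIRECTIVES = {"body", "rawbody", "header", "full", "uri",
                   "meta", "describe", "score", "tflags"}


def rule_names(cf_text: str) -> set[str]:
    """Collect rule names mentioned by rule directives in a .cf file's text."""
    names = set()
    for raw_line in cf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] in RULE_DIRECTIVES:
            names.add(fields[1])
    return names


def rules_only_in_old(old_path: str, new_path: str) -> set[str]:
    """Rule names present in old_path but absent from new_path."""
    old = rule_names(Path(old_path).read_text())
    new = rule_names(Path(new_path).read_text())
    return old - new
```

For example, `rules_only_in_old("local.cf.old", "local.cf")` should include `EMPTY_BODY` if the new file no longer defines that rule. Any other names it reports are candidates to review.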

-- 
Bowie