Posted to users@spamassassin.apache.org by Bram Mertens <br...@linux.be> on 2004/02/18 14:02:05 UTC

strange results: spamassassin assigns score 6.8, spamc -c assigns score 4.3

Hi

This may be related to the mistakes I made earlier, but I noticed some
strange results. While trying to fix those mistakes I told sa-learn to
forget some messages (sent folders and a folder where I store jokes) and
then trained it again with some new spam:
m8ram@linux:~> sa-learn --forget --showdots --no-rebuild --mbox /home/m8ram/evolution/local/kim/subfolders/sent/mbox
sa-learn warning: --forget requires read/write access to the database, and is incompatible with --no-rebuild
Learned from 0 message(s) (141 message(s) examined).
m8ram@linux:~> sa-learn --forget --showdots --mbox /home/m8ram/evolution/local/Bram/subfolders/jokes/mbox
Learned from 0 message(s) (115 message(s) examined).
m8ram@linux:~> sa-learn --forget --showdots --mbox /home/m8ram/evolution/local/Bram/subfolders/afterdawn/mbox
Learned from 0 message(s) (3 message(s) examined).
m8ram@linux:~> sa-learn --forget --showdots --mbox /home/m8ram/evolution/local/M8ram/subfolders/sent/mbox
Learned from 0 message(s) (6 message(s) examined).
m8ram@linux:~> sa-learn --forget --showdots --mbox /home/m8ram/evolution/local/gilbert/subfolders/sent/mbox
Learned from 0 message(s) (13 message(s) examined).
m8ram@linux:~> sa-learn --ham --showdots --no-rebuild --mbox /home/m8ram/evolution/local/SPAM/mbox
Learned from 676 message(s) (676 message(s) examined).
m8ram@linux:~> sa-learn --rebuild

The output for the SPAM mbox confused me a bit: I already trained it on
that mbox yesterday, so I expected the "Learned from" number to be much
lower...
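
To make the numbers above easier to compare, here is a minimal Python
sketch that tallies the "Learned from N message(s) (M message(s)
examined)" lines copied from the transcript. The helper name tally and
the two string constants are hypothetical and only for illustration; the
sketch reads the printed summaries and does not touch the Bayes database.

```python
import re

# Matches the summary line printed by sa-learn in the transcript above.
LINE_RE = re.compile(
    r"Learned from (\d+) message\(s\) \((\d+) message\(s\) examined\)\."
)

def tally(output):
    """Sum the learned and examined counts over all summary lines in output."""
    learned = examined = 0
    for match in LINE_RE.finditer(output):
        learned += int(match.group(1))
        examined += int(match.group(2))
    return learned, examined

# Summary lines of the five --forget runs, copied from the transcript.
FORGET_RUNS = """\
Learned from 0 message(s) (141 message(s) examined).
Learned from 0 message(s) (115 message(s) examined).
Learned from 0 message(s) (3 message(s) examined).
Learned from 0 message(s) (6 message(s) examined).
Learned from 0 message(s) (13 message(s) examined).
"""

# Summary line of the run on SPAM/mbox, copied from the transcript.
LEARN_RUN = "Learned from 676 message(s) (676 message(s) examined)."

if __name__ == "__main__":
    print("forget runs (learned, examined):", tally(FORGET_RUNS))
    print("SPAM/mbox run (learned, examined):", tally(LEARN_RUN))
```

The forget runs report 0 learned out of 278 examined, while the SPAM/mbox
run reports all 676 examined messages as learned.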

After this I ran the SPAM/mbox back through SA (using the spamc -c
command in Evolution), and only about 100 of the 600+ messages were
sorted to the spamassassin folder (where I store the messages marked as
spam).

I found it strange that a message with the subject "Reality F*ck Tour
Across America!" would still be marked as ham...

So I saved the message to a text file and tested it with spamassassin:
m8ram@linux:~> spamassassin < spam.txt > spam.out
Looking at spam.out, the message is marked as spam:
X-Spam-Status: Yes, hits=6.8 required=5.0 tests=HTML_60_70,
        HTML_FONTCOLOR_UNKNOWN,HTML_FONT_BIG,HTML_FONT_INVISIBLE,HTML_MESSAGE,
        HTML_TAG_BALANCE_A,HTTP_EXCESSIVE_ESCAPES,MAILTO_TO_SPAM_ADDR,
        MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,RCVD_IN_BL_SPAMCOP_NET,
        RCVD_IN_DSBL autolearn=no version=2.60

But when I run it through spamc -c I get the following:
m8ram@linux:~> spamc -c < spam.txt 
4.3/5.0
m8ram@linux:~> cat spam.txt | spamc -c
4.3/5.0
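
To put the two results side by side, here is a minimal Python sketch that
parses the X-Spam-Status header written by spamassassin and the
"score/threshold" line printed by spamc -c, then reports the gap. The
function names parse_status_header and parse_spamc_check are hypothetical
helpers for illustration, not part of SpamAssassin; the sample strings are
copied from the output above.

```python
import re

def parse_status_header(raw):
    """Return (hits, required, tests) from a possibly folded X-Spam-Status header."""
    # Join continuation lines (they start with whitespace). The whitespace is
    # dropped as well so the comma-separated test list becomes one token.
    unfolded = re.sub(r"\r?\n[ \t]+", "", raw.strip())
    hits = float(re.search(r"hits=(-?[\d.]+)", unfolded).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", unfolded).group(1))
    tests_field = re.search(r"tests=(\S*)", unfolded).group(1)
    tests = [name for name in tests_field.split(",") if name]
    return hits, required, tests

def parse_spamc_check(line):
    """Return (score, threshold) from a 'score/threshold' line such as '4.3/5.0'."""
    score, threshold = line.strip().split("/")
    return float(score), float(threshold)

# Header as found in spam.out (copied from the message above).
HEADER = """X-Spam-Status: Yes, hits=6.8 required=5.0 tests=HTML_60_70,
        HTML_FONTCOLOR_UNKNOWN,HTML_FONT_BIG,HTML_FONT_INVISIBLE,HTML_MESSAGE,
        HTML_TAG_BALANCE_A,HTTP_EXCESSIVE_ESCAPES,MAILTO_TO_SPAM_ADDR,
        MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,RCVD_IN_BL_SPAMCOP_NET,
        RCVD_IN_DSBL autolearn=no version=2.60"""

# Line printed by spamc -c (copied from the message above).
SPAMC_LINE = "4.3/5.0"

if __name__ == "__main__":
    hits, required, tests = parse_status_header(HEADER)
    score, threshold = parse_spamc_check(SPAMC_LINE)
    print("spamassassin:", hits, "of", required, "with", len(tests), "tests")
    print("spamc -c:    ", score, "of", threshold)
    print("difference:  ", round(hits - score, 1))
```

The sketch only quantifies the gap between the two scores; it cannot show
which tests account for it, because spamc -c prints no test names.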

Can anybody tell me how I can fix this?

I haven't yet told sa-learn to forget the mbox files I fed it without the
--mbox option.  Do I have to train SA again after doing that?

TIA

Bram
-- 
# Mertens Bram "M8ram"   <br...@linux.be>     Linux User #249103 #
# SuSE Linux 8.2 (i586) kernel 2.4.20-4GB      i686                256MB RAM #
#   1:45pm  up 26 days 17:23,  9 users,  load average: 0.00, 0.02, 0.00 #