Posted to users@spamassassin.apache.org by Chris 'Xenon' Hanson <xe...@alphapixel.com> on 2008/08/19 17:50:43 UTC
SA poor scores after reboot
I run an Ubuntu machine with qmail, qmail-scanner and SpamAssassin. Yes, I know this
isn't the qmail or qmail-scanner list, but I genuinely think this is an SA issue. Well, a
user issue with SA, really.
Normally, the system runs great, rejecting heaps of spam. But after a reboot (our power
failed for a long time this weekend, longer than the UPS could keep up), SA has really
poor filtering for a while. It _is_ running; here are the X-Spam headers from typical
flagged spam:
X-Spam-Status: Yes, hits=5.4 required=4.0
X-Spam-Level: +++++
X-Spam-Report: SA TESTS
1.1 EXTRA_MPART_TYPE Header has extraneous Content-type:...type= entry
0.0 HTML_MESSAGE BODY: HTML included in message
3.1 HTML_IMAGE_ONLY_08 BODY: HTML: images with 400-800 bytes of words
0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
[score: 0.5445]
0.2 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to background
0.9 HTML_SHORT_LINK_IMG_1 HTML is very short with a linked image
X-Spam-Report: SA TESTS
0.1 HTML_90_100 BODY: Message is 90% to 100% HTML
1.1 MIME_HTML_MOSTLY BODY: Multipart message mostly text/html MIME
0.0 HTML_MESSAGE BODY: HTML included in message
3.1 HTML_IMAGE_ONLY_08 BODY: HTML: images with 400-800 bytes of words
0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
[score: 0.5008]
0.9 HTML_SHORT_LINK_IMG_1 HTML is very short with a linked image
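For anyone who wants to compare reports like these side by side, here is a minimal Python sketch that pulls the per-rule scores out of the report text. The function name parse_report and the regular expression are illustrative assumptions, not part of SpamAssassin or qmail-scanner; the sample text is the first report above, copied as-is.

```python
import re

# Assumed layout of one rule line: "<score> <RULE_NAME> <description>".
# Lines such as "[score: 0.5445]" or "X-Spam-Report: ..." do not match.
RULE_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\s+(.*)$")


def parse_report(text):
    """Return a list of (score, rule_name, description) tuples."""
    rules = []
    for line in text.splitlines():
        m = RULE_LINE.match(line)
        if m:
            rules.append((float(m.group(1)), m.group(2), m.group(3)))
    return rules


report = """\
1.1 EXTRA_MPART_TYPE Header has extraneous Content-type:...type= entry
0.0 HTML_MESSAGE BODY: HTML included in message
3.1 HTML_IMAGE_ONLY_08 BODY: HTML: images with 400-800 bytes of words
0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
[score: 0.5445]
0.2 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to background
0.9 HTML_SHORT_LINK_IMG_1 HTML is very short with a linked image
"""

rules = parse_report(report)

# List the rules from highest to lowest score.
for score, name, _ in sorted(rules, key=lambda r: r[0], reverse=True):
    print(f"{score:4.1f}  {name}")

# Show which Bayes bucket the message landed in.
bayes = [name for _, name, _ in rules if name.startswith("BAYES_")]
print("Bayes rules hit:", bayes)
```

In both reports the only Bayes rule listed is BAYES_50, which adds 0.0 to the total.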
I have qmail-scanner set to flag at 4 and silently delete at 6. Normally, most genuine
spam disappears silently because it has SUCH a high score, and I get a few "flagged" false
negatives. I almost never get a false positive flagged.
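To make that policy concrete, here is a small Python sketch of the two thresholds. It only illustrates the arithmetic; qmail-scanner applies its own logic, and the function name disposition and the assumption that both thresholds are inclusive are mine.

```python
FLAG_AT = 4.0    # score at which mail is flagged as spam
DELETE_AT = 6.0  # score at which mail is silently deleted


def disposition(score, flag_at=FLAG_AT, delete_at=DELETE_AT):
    """Return 'delete', 'flag', or 'deliver' for a SpamAssassin score.

    Assumption: a score exactly equal to a threshold triggers that action.
    """
    if score >= delete_at:
        return "delete"
    if score >= flag_at:
        return "flag"
    return "deliver"


# The first message above scored hits=5.4: flagged, but not deleted.
print(disposition(5.4))
```

With scores in the low 5s, as in both reports above, spam lands in the "flag" band instead of being deleted.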
I have a Bayes database that I hand-fed a huge number of ham and spam messages a few months
ago, and it has worked really well. I also have Bayes autolearn on. I did an sa-learn --dump, and
while I didn't really understand all the data it listed, it seemed to have a large dataset.
Where would I begin to investigate this? Normally, SA is such a "utility" program, it
just works, and works great without fuss. I have it set to fetch new rules frequently, and
am using the JMason sought.cf ruleset as well.
Thanks much in advance for any advice.
--
Chris 'Xenon' Hanson, omo sanza lettere Xenon AlphaPixel.com
PixelSense Landsat processing now available! http://www.alphapixel.com/demos/
"There is no Truth. There is only Perception. To Perceive is to Exist." - Xen