Posted to users@spamassassin.apache.org by Chris 'Xenon' Hanson <xe...@alphapixel.com> on 2008/08/19 17:50:43 UTC

SA poor scores after reboot

  I run an Ubuntu machine with qmail, qmail-scanner and SpamAssassin. Yes, I know this
isn't the qmail or qmail-scanner list, but I genuinely think this is an SA issue. Well, a
user issue with SA, really.

  Normally, the system runs great, rejecting heaps of spam. But after a reboot (our power
failed for a long time this weekend, longer than the UPS could keep up), SA has really
poor filtering for a while. It _is_ running; here are the X-Spam headers from typical
flagged spam:

X-Spam-Status: Yes, hits=5.4 required=4.0
X-Spam-Level: +++++
X-Spam-Report: SA TESTS
  1.1 EXTRA_MPART_TYPE       Header has extraneous Content-type:...type= entry
  0.0 HTML_MESSAGE           BODY: HTML included in message
  3.1 HTML_IMAGE_ONLY_08     BODY: HTML: images with 400-800 bytes of words
  0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                             [score: 0.5445]
  0.2 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to background
  0.9 HTML_SHORT_LINK_IMG_1  HTML is very short with a linked image


X-Spam-Report: SA TESTS
  0.1 HTML_90_100            BODY: Message is 90% to 100% HTML
  1.1 MIME_HTML_MOSTLY       BODY: Multipart message mostly text/html MIME
  0.0 HTML_MESSAGE           BODY: HTML included in message
  3.1 HTML_IMAGE_ONLY_08     BODY: HTML: images with 400-800 bytes of words
  0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                             [score: 0.5008]
  0.9 HTML_SHORT_LINK_IMG_1  HTML is very short with a linked image


  I have qmail-scanner set to flag at 4 and silently delete at 6. Normally, most genuine
spam disappears silently because it has SUCH a high score, and I get a few "flagged" false
negatives. I almost never get a false positive flagged.
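
  To make the arithmetic concrete, here is a minimal Python sketch (not part of my actual
setup) that sums the per-rule scores from the first report above and applies the flag and
delete thresholds. The regular expression, the function names, and the assumption that
both thresholds are inclusive (>=) are illustrative only; qmail-scanner's real comparison
may differ.

```python
import re
from decimal import Decimal

# Rule lines in the report look like "  1.1 EXTRA_MPART_TYPE   description".
# Continuation lines such as "[score: 0.5445]" do not match and are skipped.
RULE_LINE = re.compile(r"^\s*(-?\d+\.\d+)\s+([A-Z0-9_]+)\s")

# Thresholds taken from the post; treating them as inclusive is an assumption.
FLAG_AT = Decimal("4")
DELETE_AT = Decimal("6")


def rule_scores(report):
    """Return {rule_name: score} for every rule line in an X-Spam-Report body."""
    scores = {}
    for line in report.splitlines():
        match = RULE_LINE.match(line)
        if match:
            scores[match.group(2)] = Decimal(match.group(1))
    return scores


def action(total):
    """Map a total score to what happens to the message (hypothetical helper)."""
    if total >= DELETE_AT:
        return "delete"
    if total >= FLAG_AT:
        return "flag"
    return "deliver"


REPORT = """\
  1.1 EXTRA_MPART_TYPE       Header has extraneous Content-type:...type= entry
  0.0 HTML_MESSAGE           BODY: HTML included in message
  3.1 HTML_IMAGE_ONLY_08     BODY: HTML: images with 400-800 bytes of words
  0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                             [score: 0.5445]
  0.2 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar to background
  0.9 HTML_SHORT_LINK_IMG_1  HTML is very short with a linked image
"""

scores = rule_scores(REPORT)
total = sum(scores.values(), Decimal("0"))
print(total, action(total))
```

  The listed rules add up to 5.3, which lands between the two thresholds, so the message
is flagged rather than deleted. The header itself says hits=5.4; I assume the small
difference comes from the per-rule scores being rounded for display. Note that BAYES_50
contributes 0.0 in both reports.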


  I have a Bayes database that I hand-fed a huge number of ham and spam messages a few
months ago, and it has worked really well. I also have Bayes autolearn on. I did an
sa-learn dump, and while I didn't really understand all the data it listed, it seemed to
have a large dataset.

  Where would I begin to investigate this? Normally, SA is such a "utility" program: it
just works, and works great without fuss. I have it set to fetch new rules frequently, and
am using the JMason sought.cf ruleset as well.


  Thanks much in advance for any advice.


-- 
Chris 'Xenon' Hanson, omo sanza lettere                  Xenon AlphaPixel.com
PixelSense Landsat processing now available! http://www.alphapixel.com/demos/
"There is no Truth. There is only Perception. To Perceive is to Exist." - Xen