Posted to users@spamassassin.apache.org by Enrique Perez-Terron <en...@online.no> on 2008/02/15 05:00:26 UTC

Html font invisible or "low contrast" - no score?

I am trying to figure out why almost all spam continues to get through. I 
use 
  Fedora 8, 
  Evolution 2.12.3, and 
  spamassassin 3.2.4

I have marked more than 100 mails as junk and more than 100 as not 
junk; probably more than 200 by now.

I have saved the source of one typical spam message to a file.
This mail contains sequences like

  <span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white"> rqz </span>

embedded in the middle of "sensitive" words. That makes the word look like

  spa       massa      ssin

(substitute your favorite merchandise). The sequence above selects white 
letters on a white background and, in addition, makes the letters very 
small, two pixels high. In this way, words that would otherwise trigger 
a filter rule are split, and the pieces are separated by other words or 
letter combinations that do not show up on the screen.
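
To illustrate the trick, here is a minimal Python sketch of my own (it 
is not SpamAssassin code). It compares the text a naive tag stripper 
would see with the text that is actually visible. The class 
TextExtractor, the helper extract, and the "white or 2px means hidden" 
heuristic are assumptions made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text, optionally skipping spans whose inline style hides them."""

    def __init__(self, skip_hidden):
        super().__init__()
        self.skip_hidden = skip_hidden
        self.hidden_depth = 0  # > 0 while inside a hidden <span>
        self.parts = []

    @staticmethod
    def looks_hidden(style):
        # Crude assumption for this sketch: white text or a 2px font
        # counts as invisible. A real filter would compare against the
        # actual background color and use a proper CSS parser.
        style = style.lower().replace(" ", "")
        return "color:white" in style or "font-size:2px" in style

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        style = dict(attrs).get("style") or ""
        # Spans nested inside a hidden span are hidden as well.
        if self.hidden_depth or self.looks_hidden(style):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.skip_hidden and self.hidden_depth:
            return
        self.parts.append(data)


def extract(html, skip_hidden):
    parser = TextExtractor(skip_hidden)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


HIDDEN = '<span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white"> rqz </span>'
SAMPLE = "spa" + HIDDEN + "massa" + HIDDEN + "ssin"

print(extract(SAMPLE, skip_hidden=False))  # spa rqz massa rqz ssin
print(extract(SAMPLE, skip_hidden=True))   # spamassassin
```

The first line is what a word-matching rule gets if it only strips the 
tags; the second is what the reader sees on the screen.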

Googling around I found a list of Spamassassin tests, including

   Area tested:   body
   Description:   HTML font color similar to background
   Test name:     HTML_FONT_LOW_CONTRAST
   Default score: 
    local:        0.131 
    net:          0.543 
    bayes:        0.663 
    bayes + net:  0.124

(I do not understand these scores. Why are they different? When do they 
apply - e.g. does the 'local' value apply if I run "spamassassin --local"? 
But if so, why is low font contrast less significant when --local is 
used? etc.)

There was also another test named HTML_FONT_INVISIBLE, but I later found 
that this test appears to be associated with earlier versions of 
spamassassin.

Since Evolution runs "spamc --local", I tried "spamassassin --local" and 
looked at the output. Here is one:

  X-Spam-Status: No, score=3.4 required=5.0 tests=AWL,DATE_IN_PAST_24_48,
	HS_INDEX_PARAM,HTML_MESSAGE,RDNS_NONE autolearn=no version=3.2.4

There is no indication of the low-contrast rule having been triggered.  
Should this be so? Is this header supposed to show all tests with non-
zero scores?  How can I have spamassassin give me a complete list of 
tests with nonzero scores?

I added lines to my .spamassassin/user_prefs

  score HTML_FONT_INVISIBLE 9.99
  score HTML_FONT_LOW_CONTRAST 9.99

but could not see any change.

Then I tried to look at the source code. I found a function 
"html_font_invisible", which starts by computing the foreground and 
background colors. I inserted an extra line of code to have the function 
log its determinations. Here is some of the output:

  backgroud:#ffffff foreground:#000000
  backgroud:#ffffff foreground:#ffffff
  backgroud:#ffffff foreground:#000000
  backgroud:#ffffff foreground:#ffffff
  backgroud:#ffffff foreground:#000000
  backgroud:#ffffff foreground:#ffffff
  backgroud:#ffffff foreground:#000000
  backgroud:#ffffff foreground:#000000
  backgroud:#ffffff foreground:#000000
  backgroud:#ffffff foreground:#ffffff
  backgroud:#ffffff foreground:#000000

That is, the function assumes the background is white, and correctly 
finds that the text color is sometimes black, sometimes white.

This shows that Spamassassin does run that code, and does correctly 
determine that some of the text has the same color as the background.
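
For reference, this is roughly how I read the log lines above. It is a 
small Python sketch of my own, not the logic SpamAssassin uses: the 
names classify and color_distance and the threshold of 96 are 
hypothetical, chosen only to show the difference between "invisible" 
and "low contrast".

```python
import re

# Matches the debug lines shown above, including my "backgroud" spelling.
LOG_LINE = re.compile(
    r"backgroud:#([0-9a-fA-F]{6})\s+foreground:#([0-9a-fA-F]{6})"
)

# Hypothetical cut-off for "low contrast"; not taken from SpamAssassin.
LOW_CONTRAST_THRESHOLD = 96


def to_rgb(hex6):
    """Convert 'rrggbb' into an (r, g, b) tuple of integers."""
    return tuple(int(hex6[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(bg, fg):
    """Sum of absolute per-channel differences (0 means identical colors)."""
    return sum(abs(b - f) for b, f in zip(to_rgb(bg), to_rgb(fg)))


def classify(line):
    match = LOG_LINE.search(line)
    if match is None:
        raise ValueError(f"not a color log line: {line!r}")
    distance = color_distance(*match.groups())
    if distance == 0:
        return "invisible"
    if distance < LOW_CONTRAST_THRESHOLD:
        return "low contrast"
    return "visible"


for line in (
    "backgroud:#ffffff foreground:#000000",
    "backgroud:#ffffff foreground:#ffffff",
):
    print(line, "->", classify(line))
```

By this reading, every "#ffffff on #ffffff" line in the log is text I 
would expect a contrast rule to flag.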

However, finding one's way through all of spamassassin's code is likely 
to be a monumental task, so I wish to ask if somebody knows anything 
about this problem.

Further googling turned up some discussions suggesting that the 
combination of Fedora, Evolution, and junk filtering draws more 
complaints than e.g. Ubuntu. However, I did not see any resolution (the 
web server went offline).

Any ideas? Any pointers?

Thanks