Posted to users@spamassassin.apache.org by Aleksander Adamowski <al...@altkom.pl> on 2004/02/18 12:54:31 UTC

Semi-invisible font missed by SA

A message containing a very bright font used to conceal Bayes poison got 
through yesterday.

Spamassassin score was:

X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on
    nmail.altkom.pl
X-Spam-Status: No, hits=1.9 required=5.0 tests=BAYES_50,
    HTML_FONTCOLOR_UNKNOWN,HTML_MESSAGE,MIME_HTML_ONLY,SUBJECT_PHARMACY
    autolearn=no version=2.60
X-Spam-Level: *

Relevant fragments:

--- Start sample ---
<font color=

#fee4e8> sleepwalk negotiate sonic, diluting democrat humpback, 
encamping palaces MacDonald Hewlett brainstem crops cautions chartering 
discharged chronicle disagree presided accordion. sentiments transplant 
corpse defeat downright, immersed Boltzmann skulk beatitudes espouse 
planks palmer compresses populace almsman bivouac tolerance. cookery 
Ridgway scalded. ribbing mockery Oakley glover reopens 
satellites.</font><br>
ONLY REAL SUPER VIAGDRA CALLED CIADLIS IS EFFECTIVE! Annual Sale: ONLY 
$3 per dose<br>
--- End sample ---

another one:

--- Start sample ---
<font color=

#ebeff4>whitely chord cowing gayety aviary, nostalgic glucose Hyannis 
employ; subdued movements mischief smartly intonation reserved distaff 
standoff terrifies. heavily acquirable beach adulthood invertible, 
traversing vacuo enraged Dobbin Avogadro Agnes Bruno enfeeble credible 
notorious carelessly octaves. negotiate makeup SIMULA. sagebrush 
imaginably heiressesfalcons.</font><br>
--- End sample ---

Anybody got a rule for this type of stuff?

BTW I've refined the rule that catches invisible font sizes to include 0 
and 1 pixel/point fontsizes:

rawbody LOCAL_ZERO_FONTSIZE     /\bfont-size\: ?[01]p[xt]\b/i

Please check if it hits your ham.
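
A minimal sketch for trying the pattern before deploying it, assuming a Python port of the regex behaves like the Perl one for these inputs. The test lines are made up for illustration and are not taken from real mail.

```python
import re

# Python port of the rawbody pattern above. The only change is dropping the
# unneeded backslash before the colon; the match behaviour is the same.
ZERO_FONTSIZE = re.compile(r"\bfont-size: ?[01]p[xt]\b", re.IGNORECASE)

# Hypothetical test lines mapped to whether the rule is expected to hit.
samples = {
    '<span style="font-size: 0px">hidden</span>': True,
    '<span style="FONT-SIZE:1pt">hidden</span>': True,
    '<span style="font-size: 10px">visible</span>': False,
    '<span style="font-size:12pt">visible</span>': False,
    '<span style="font-size: 1em">visible</span>': False,
}


def hits(line):
    """Return True if the line would trigger the pattern."""
    return ZERO_FONTSIZE.search(line) is not None


for line, expected in samples.items():
    print(hits(line), expected, line)
```

Sizes such as 10px or 12pt are not matched because the single digit must be followed directly by "px" or "pt".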

-- 
Best Regards,
    Aleksander Adamowski
        GG#: 274614
        ICQ UIN: 19780575 
	http://olo.ab.altkom.pl



Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by "Keith C. Ivey" <kc...@cpcug.org>.
Aleksander Adamowski <al...@altkom.pl> wrote:

> More specifically, when there's a dangling attribute value like this in 
> HTML source:
> 
> <font color=
> 
> #feefea>
> 
> , then in the html_font_invisible() method the foreground color ($fg 
> variable) has the value of 'color' instead of hex code of the HTML color.

Yes, that does sound like a parsing problem.  It may be 
connected to the fact that the HTML is invalid, since an 
attribute value containing "#" must be quoted.  Of course, SA 
needs to handle that sort of invalid markup, just as mail 
readers do.

Have you submitted a bug report at 
http://bugzilla.spamassassin.org/ ?


-- 
Keith C. Ivey <kc...@cpcug.org>
Washington, DC


Re: Testing markup tags

Posted by Aleksander Adamowski <al...@altkom.pl>.
John Hardin wrote:

>ITYM it fools recipes that expect the tag to all be on a single line,
>which is my point. I don't think SA is actually parsing the HTML (beyond
>the uri stuff).
>  
>
That's a pity, using a standard HTML parsing module from CPAN would 
offload the work involving handling of maliciously malformed HTML syntax 
to that external module and its maintainer.

This would make SA's HTML interpretation much closer to that of MUAs like 
Outlook, Mozilla Mail or KMail, which is a good thing. There would be a 
better chance that whatever the user sees in a certain way, the 
SpamAssassin engine sees the same way.

Anyway, I've published samples of that malicious spam here:
<http://olo.ab.altkom.pl/domowa/spam/samples/low_contrast/>

You can check for yourself how these are handled by your installation of SA.

-- 
Best Regards,
  Aleksander Adamowski
    GG#: 274614
    ICQ UIN: 19780575 
  http://olo.ab.altkom.pl


Re: Testing markup tags

Posted by Aleksander Adamowski <al...@altkom.pl>.
John Hardin wrote:

>On Mon, 2004-02-23 at 10:03, Aleksander Adamowski wrote:
>  
>
>>More specifically, when there's a dangling attribute value like this in 
>>HTML source:
>>
>><font color=
>>
>>#feefea>
>>    
>>
>Hmm.
>
>Perhaps SA should have a test similar to the URI test, named perhaps
>"tag", that matches a single markup tag with all line breaks removed,
>obfuscation encodings decoded, whitespace collapsed, etc...
>  
>
Agreed, but ideally HTML::Parser (or whatever parses those font tags) 
should be made resistant to such simple attacks. Mozilla and IE 
correctly interpret the color attribute of that mangled tag, so Perl 
HTML::Parser should too...
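
The "tag" test idea quoted above can be sketched roughly like this. It is only an illustration in Python, not an existing SA feature: the function name is hypothetical, it only handles line breaks and whitespace, and it does not decode any obfuscation encodings.

```python
import re

# A markup tag is approximated as '<', anything except angle brackets, '>'.
# The character class also matches newlines, so split tags are found whole.
TAG = re.compile(r"<[^<>]*>")


def normalize_tags(html):
    """Yield each tag with line breaks removed and whitespace collapsed."""
    for match in TAG.finditer(html):
        # split() with no arguments drops newlines and runs of spaces.
        tag = " ".join(match.group(0).split())
        # Remove whitespace around '=' so 'color= #feefea' becomes
        # 'color=#feefea'. A real implementation would have to leave
        # quoted attribute values alone.
        tag = re.sub(r"\s*=\s*", "=", tag)
        yield tag


sample = "<font color=\n\n#feefea>some text</font><br>"
print(list(normalize_tags(sample)))
```

A single-line rule such as /<font color=#f[0-9a-f]{5}>/i could then be run against each normalized tag instead of each raw line.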

BTW I've noticed that my MUA has snipped the trailing space, so those 
samples of the font tag were identical. The problematic font tag has a line 
break immediately after "color=" in the original spam message, and this 
fools the HTML parser.

-- 
Best Regards,
    Aleksander Adamowski
        GG#: 274614
        ICQ UIN: 19780575 
	http://olo.ab.altkom.pl


Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by Aleksander Adamowski <al...@altkom.pl>.
Keith C. Ivey wrote:

>You've lost me there.  What do "dangling attributes" have to do 
>with this case?  HTML_FONTCOLOR_UNKNOWN was triggered, so the 
>COLOR attributes were seen.  The problem is they weren't 
>recognized as being nearly invisible, so the problem seems to 
>be with the HTML_FONT_LOW_CONTRAST test, not with parsing.
>
Well, there's some problem, hard to tell if it's parsing or the test 
itself, but here's what I've found out after adding some debugging calls 
to HTML.pm:

   1. The html_font_invisible() method gets called
   2. There's a problem with arguments passed to it. The foreground
      color is seen by Perl code as the string 'color', not the string
      in the form of '#feefea'. This explains why the test
      HTML_FONTCOLOR_UNKNOWN was triggered ('color' is not a known color
      name or a HTML hex code), and why HTML_FONT_LOW_CONTRAST test has
      failed.
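
To illustrate the two points above, here is a rough sketch of a low-contrast check. It is not the code from HTML.pm: the distance metric (sum of per-channel differences), the white background and the threshold of 64 are all assumptions made for the example.

```python
import re

HEX_COLOR = re.compile(r"^#([0-9a-fA-F]{6})$")


def parse_hex(value):
    """Return an (r, g, b) tuple for '#rrggbb', or None if it is not one."""
    m = HEX_COLOR.match(value)
    if m is None:
        return None
    digits = m.group(1)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def low_contrast(fg, bg="#ffffff", threshold=64):
    """Return None for an unknown colour, else True if fg is close to bg.

    The metric and the threshold are hypothetical, chosen for illustration.
    """
    f, b = parse_hex(fg), parse_hex(bg)
    if f is None or b is None:
        return None
    distance = sum(abs(x - y) for x, y in zip(f, b))
    return distance < threshold


for colour in ("#feefea", "#fee4e8", "#ebeff4", "#000000", "color"):
    print(colour, low_contrast(colour))
```

When the foreground arrives as the string 'color', there is nothing to measure, so raising the threshold cannot make a contrast check fire.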

More specifically, when there's a dangling attribute value like this in 
HTML source:

<font color=

#feefea>

, then in the html_font_invisible() method the foreground color ($fg 
variable) has the value of 'color' instead of hex code of the HTML color.

If I add a single space after the equality mark in the tag seen above, 
html_font_invisible() receives correct data and $fg variable holds the 
hex code of font color. So this (notice the space after equality mark):

<font color=

#feefea>

is processed correctly and the nearly invisible font is detected.

Moreover, after adding debugging calls to the method html_fgcolor() 
(which extracts foreground color information from a HTML element), I can 
see that the attribute "color" of this font tag already has the value of 
'color', instead of hex code (which should be #feefea in my testcase), 
so the problem is deeper than in html_font_invisible() method.

This suggests a parsing problem somewhere, as far as I understood the 
code... If I am correct in my suspicions that the Perl expression 
"$attr->{color}" is an attribute of a HTML::Parser object, then the 
problem is indeed in HTML parser code (correct me if I'm wrong).
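
For comparison only, here is how a lenient parser can be asked for the same attribute. This sketch uses Python's standard html.parser module, not Perl's HTML::Parser, so it says nothing about where the fault in SA lies; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class FontColors(HTMLParser):
    """Collect the value of the color attribute of every font tag."""

    def __init__(self):
        super().__init__()
        self.colors = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            # attrs is a list of (name, value) pairs.
            self.colors.append(dict(attrs).get("color"))


def font_colors(html):
    """Return the color attribute values seen in the given HTML."""
    parser = FontColors()
    parser.feed(html)
    parser.close()
    return parser.colors


# The split tag from the spam sample: line breaks right after 'color='.
print(font_colors("<font color=\n\n#feefea>text</font>"))
```

Python's parser skips whitespace, including line breaks, between the equals sign and an unquoted value, so it should report '#feefea' here, which matches how the browsers mentioned in this thread treat the tag.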

-- 
Best Regards,
    Aleksander Adamowski
        GG#: 274614
        ICQ UIN: 19780575 
	http://olo.ab.altkom.pl


Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by Aleksander Adamowski <al...@altkom.pl>.
Keith C. Ivey wrote:

>>This is mainly because HTML.pm can be fooled by dangling attributes.
>>    
>>
>You've lost me there.  What do "dangling attributes" have to do 
>with this case?  HTML_FONTCOLOR_UNKNOWN was triggered, so the 
>COLOR attributes were seen.  The problem is they weren't 
>recognized as being nearly invisible, so the problem seems to 
>be with the HTML_FONT_LOW_CONTRAST test, not with parsing.
>
That's interesting. I'll try to debug HTML.pm tomorrow using those test 
messages. I have a suspicion that the html_font_invisible() function 
isn't called at all...

-- 
Best Regards,
  Aleksander Adamowski
    GG#: 274614
    ICQ UIN: 19780575 
  http://olo.ab.altkom.pl


Re: Semi-invisible font missed by SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Mon, 23 Feb 2004, Keith C. Ivey wrote:
> > This is mainly because HTML.pm can be fooled by dangling attributes.
> You've lost me there.  What do "dangling attributes" have to do 
> with this case?  HTML_FONTCOLOR_UNKNOWN was triggered.....

I would suspect that rule triggers whenever it fails to see a proper color
spec on the SAME LINE. Which makes sense because otherwise the color
specified is NOT an 'unknown' - it is a valid color. 

- C



Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by "Keith C. Ivey" <kc...@cpcug.org>.
Aleksander Adamowski <al...@altkom.pl> wrote:

> Understood, what I wanted to say is that Bayes isn't effective against
> this sort of stuff and currently the other SA mechanisms aren't
> sufficient to catch this spam.

My point was that the extra words have no effect one way or the 
other on the Bayes classification.  If they hadn't been there, 
the message would still have slipped through, so it's not 
appropriate to call the extra words "Bayes poison".  People 
talk about "Bayes poison" a lot, but I have yet to see an 
example that actually affects Bayes.

> This is mainly because HTML.pm can be fooled by dangling attributes.

You've lost me there.  What do "dangling attributes" have to do 
with this case?  HTML_FONTCOLOR_UNKNOWN was triggered, so the 
COLOR attributes were seen.  The problem is they weren't 
recognized as being nearly invisible, so the problem seems to 
be with the HTML_FONT_LOW_CONTRAST test, not with parsing.

-- 
Keith C. Ivey <kc...@cpcug.org>
Washington, DC


Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by Aleksander Adamowski <al...@altkom.pl>.
Keith C. Ivey wrote:

>Could you post the debug output from spamassassin for one of 
>these messages?  I'm very curious to see why you think the 
>poison is defeating Bayes.  It's certainly possible that every 
>once in a while a spammer will randomly hit on a word that's a 
>good nonspam indicator for you, but I don't believe it can 
>happen for any substantial fraction of messages.

The output you asked for is posted at the end. I've already trained my
Bayes with 2 messages similar to this one (layout is identical, but the
URL is in a different domain in each one, and of course, completely
different set of random poison words).

>The only SA change the message you posted seems to suggest is 
>a modification of the rule for catching low-contrast font 
>color, which has nothing to do with Bayes.  Looking at the 
>spam, it got BAYES_50, so the "poison" didn't affect Bayes at 
>all.  It had no strong spam or nonspam indicators even without 
>the added words.

Understood, what I wanted to say is that Bayes isn't effective against
this sort of stuff and currently the other SA mechanisms aren't
sufficient to catch this spam.

This is mainly because HTML.pm can be fooled by dangling attributes.
Ideally, HTML parser should parse HTML the same way as popular browsers
(IE, Mozilla). Unfortunately I cannot fix this in HTML.pm myself, this
code is a bit too convoluted for me. I think that the help of the original
author of HTML.pm is needed here.

--- BEGIN OUTPUT ---
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/kerberos/sbin', keeping.
debug: PATH included '/usr/kerberos/bin', keeping.
debug: PATH included '/usr/lib/courier/bin', keeping.
debug: PATH included '/usr/lib/courier/sbin', keeping.
debug: PATH included '/usr/local/sbin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/usr/X11R6/bin', keeping.
debug: PATH included '/root/bin', keeping.
debug: Final PATH set to: 
/usr/kerberos/sbin:/usr/kerberos/bin:/usr/lib/courier/bin:/usr/lib/courier/sbin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/root/bin
debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/root/.spamassassin" for user state dir
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 28543 tie-ing to DB file R/O /etc/mail/spamassassin/bayes_toks
debug: bayes: 28543 tie-ing to DB file R/O /etc/mail/spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 3 chosen.
debug: Initialising learner
debug: is Net::DNS::Resolver available? yes
debug: trying (3) microsoft.com...
debug: looking up MX for 'microsoft.com'
debug: MX for 'microsoft.com' exists? 1
debug: MX lookup of microsoft.com succeeded => Dns available (set 
dns_available to hardcode)
debug: is DNS available? 1
debug: all '*From' addrs: ninawrithed@beerbloat.com
debug: running header regexp tests; score so far=0
debug: running body-text per-line regexp tests; score so far=0.5
debug: bayes corpus size: nspam = 1819, nham = 6265
debug: uri tests: Done uriRE
debug: tokenize: header tokens for *p = "U*ninawrithed D*beerbloat.com 
D*com"
debug: tokenize: header tokens for *M = " qsnlk 636881hohclfgayg 
Kmorphynbkzderzfc com "
debug: tokenize: header tokens for *F = "U*ninawrithed D*beerbloat.com 
D*com"
debug: tokenize: header tokens for To = "U*olo D*altkom.com.pl D*com.pl 
D*pl"
debug: tokenize: header tokens for Mime-Version = "1.0"
debug: tokenize: header tokens for *c = "/html; charset=iso-8859-1"
debug: tokenize: header tokens for Content-Transfer-Encoding = "7bit"
debug: tokenize: header tokens for X-Mime-Autoconverted = "from 8bit to 
7bit by courier 0.44"
debug: tokenize: header tokens for *r = "  olo ([::ffff:202.196.220]) by 
nmail.altkom.pl   esmtp; "
debug: bayes token 'H*c:html' => 0.997358361790176
debug: bayes token 'disagree' => 0.00297237569060773
debug: bayes token 'Sale' => 0.996940397350993
debug: bayes token 'Bruno' => 0.00410687022900763
debug: bayes token 'H*r:olo' => 0.993492957746479
debug: bayes token 'beach' => 0.993492957746479
debug: bayes token 'CALLED' => 0.990941176470588
debug: bayes token 'adulthood' => 0.985096774193548
debug: bayes token 'CheapPharmacy' => 0.978
debug: bayes token 'CIADLIS' => 0.978
debug: bayes token 'VIAGDRA' => 0.978
debug: bayes token 'EFFECTIVE!' => 0.978
debug: bayes token 'tolerance' => 0.0256190476190476
debug: bayes token 'carelessly' => 0.0256190476190476
debug: bayes token 'URI' => 0.96844194358858
debug: bayes token 'REAL' => 0.958964997782887
debug: bayes token 'movements' => 0.958
debug: bayes token 'makeup' => 0.958
debug: bayes token 'chord' => 0.958
debug: bayes token 'downright' => 0.958
debug: bayes token 'sagebrush' => 0.958
debug: bayes token 'corpse' => 0.958
debug: bayes token 'aviary' => 0.958
debug: bayes token 'HTo:U*olo' => 0.95257244243949
debug: bayes token 'Hewlett' => 0.0489090909090909
debug: bayes token 'reopens' => 0.0489090909090909
debug: bayes token 'chronicle' => 0.0489090909090909
debug: bayes token 'cautions' => 0.0489090909090909
debug: bayes token 'discharged' => 0.0489090909090909
debug: bayes token 'compresses' => 0.0489090909090909
debug: bayes token 'notorious' => 0.0489090909090909
debug: bayes token 'defeat' => 0.0489090909090909
debug: bayes token 'smartly' => 0.0489090909090909
debug: bayes token 'credible' => 0.947986086684282
debug: bayes token 'ONLY' => 0.942476065364982
debug: bayes token 'employ' => 0.929714918635859
debug: bayes token 'SUPER' => 0.928422130125509
debug: bayes token 'dose' => 0.916960992788963
debug: bayes token 'href' => 0.902075983318674
debug: bayes token 'HTo:D*altkom.com.pl' => 0.885057514092106
debug: bayes token 'HTo:D*com.pl' => 0.883537429955864
debug: bayes token 'H*r:ffff' => 0.876766493636693
debug: bayes token 'Annual' => 0.864494012282618
debug: bayes: score = 0.622580654529175
debug: bayes: 28543 untie-ing
debug: bayes: 28543 untie-ing db_toks
debug: bayes: 28543 untie-ing db_seen
debug: Razor2 is not available
debug: running raw-body-text per-line regexp tests; score so far=1.92
debug: running uri tests; score so far=2.62
debug: uri tests: Done uriRE
debug: running full-text regexp tests; score so far=2.62
debug: Current PATH is: 
/usr/kerberos/sbin:/usr/kerberos/bin:/usr/lib/courier/bin:/usr/lib/courier/sbin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/root/bin
debug: Pyzor is not available: pyzor not found
debug: Razor2 is not available
debug: DCCifd is not available: no r/w dccifd socket found.
debug: DCC is not available: no executable dccproc found.
debug: all '*To' addrs: olo@altkom.com.pl
debug: DNS MX records found: 1
debug: RBL: success for 1 of 1 queries
debug: running meta tests; score so far=2.62
debug: is spam? score=4.212 required=5 
tests=BAYES_60,HTML_FONTCOLOR_UNKNOWN,HTML_MESSAGE,LOC_HTMLSPLITFONT,MIME_HTML_ONLY,SUBJECT_PHARMACY
Delivered-To: olo@altkom.com.pl
Return-Path: <ni...@beerbloat.com>
Received: from olo ([::ffff:202.196.220.93])
  by nmail.altkom.pl with esmtp; Tue, 17 Feb 2004 10:21:19 +0100
Message-ID: <qs...@Kmorphynbkzderzfc.com>
From: "Kmorphy" <ni...@beerbloat.com>
Date: Tue, 17 Feb 2004 17:21:40 +0800
To: olo@altkom.com.pl
Subject: upholders CheapPharmacy acoustics
Mime-Version: 1.0
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 7bit
X-Mime-Autoconverted: from 8bit to 7bit by courier 0.44
X-Spam-Level: ****
X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on
	nmail.altkom.pl
X-Spam-Status: No, hits=4.2 required=5.0 tests=BAYES_60,
	HTML_FONTCOLOR_UNKNOWN,HTML_MESSAGE,LOC_HTMLSPLITFONT,MIME_HTML_ONLY,
	SUBJECT_PHARMACY autolearn=no version=2.60

<html>
<font color=

#fee4e8> sleepwalk negotiate sonic, diluting democrat humpback, 
encamping palaces MacDonald Hewlett brainstem crops cautions chartering 
discharged chronicle disagree presided accordion. sentiments transplant 
corpse defeat downright, immersed Boltzmann skulk beatitudes espouse 
planks palmer compresses populace almsman bivouac tolerance. cookery 
Ridgway scalded. ribbing mockery Oakley glover reopens 
satellites.</font><br>
ONLY REAL SUPER VIAGDRA CALLED CIADLIS IS EFFECTIVE! Annual Sale: ONLY 
$3 per dose<br>
<br>convening<br>
<br><a hrefredrawnhref=http://multilayer.com href=

"http://goandtakeit.com/sv/index.php?pid=expert">Website</a>
<br><br>
<font color=

#ebeff4>whitely chord cowing gayety aviary, nostalgic glucose Hyannis 
employ; subdued movements mischief smartly intonation reserved distaff 
standoff terrifies. heavily acquirable beach adulthood invertible, 
traversing vacuo enraged Dobbin Avogadro Agnes Bruno enfeeble credible 
notorious carelessly octaves. negotiate makeup SIMULA. sagebrush 
imaginably heiressesfalcons.</font><br>
</html>

--- END OUTPUT ---
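
For anyone who wants to experiment with the token list above, here is a rough sketch that combines per-token probabilities with chi-squared combining as used in SpamBayes. This is an assumption about method, not a reproduction of what SA 2.60 computes, so it is not expected to give the 0.62 shown in the debug output. The values are copied from the debug lines and rounded to four places; the split into hidden-font words and other tokens follows the message body above.

```python
import math


def chi2q(x2, dof):
    """Survival function of the chi-square distribution for an even dof."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine token probabilities; 0.5 is neutral, near 1.0 is spammy.

    Probabilities must lie strictly between 0 and 1.
    """
    n = len(probs)
    spam_evidence = -2.0 * sum(math.log(1.0 - p) for p in probs)
    ham_evidence = -2.0 * sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(spam_evidence, 2 * n)
    h = 1.0 - chi2q(ham_evidence, 2 * n)
    return (s - h + 1.0) / 2.0


# Tokens that come from the pale-font filler text.
hidden = [
    0.0030, 0.0041, 0.9935, 0.9851, 0.0256, 0.0256,  # disagree .. carelessly
    0.958, 0.958, 0.958, 0.958, 0.958, 0.958, 0.958,  # movements .. aviary
    0.0489, 0.0489, 0.0489, 0.0489, 0.0489,           # Hewlett .. discharged
    0.0489, 0.0489, 0.0489, 0.0489,                   # compresses .. smartly
    0.9480, 0.9297,                                   # credible, employ
]

# Header, URI and visible-text tokens.
other = [
    0.9974, 0.9969, 0.9935, 0.9909, 0.978, 0.978, 0.978, 0.978,
    0.9684, 0.9590, 0.9526, 0.9425, 0.9284, 0.9170, 0.9021,
    0.8851, 0.8835, 0.8768, 0.8645,
]

print("all tokens:         ", combine(hidden + other))
print("without hidden text:", combine(other))
```

Comparing the two printed scores gives a feel for how much the filler words move the result under this particular combining scheme.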




-- 
Best Regards,
     Aleksander Adamowski
         GG#: 274614
         ICQ UIN: 19780575
	http://olo.ab.altkom.pl

Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by "Keith C. Ivey" <kc...@cpcug.org>.
Aleksander Adamowski <al...@altkom.pl> wrote:
> Raquel Rice wrote:

> >The problem with your theory, is that your bayes hasn't been trained
> >the way mine has, nor has mine been trained the way that Matt's has.
> > The likelihood of any given spam getting past two of us, let alone
> >all three of us, is very slim indeed.
> 
> Unfortunately the sample spam I've sent is quite good at defeating bayes 
> with its poison and hiding the poison from both Spamassassin rules and 
> the eye of the recipient.

Could you post the debug output from spamassassin for one of 
these messages?  I'm very curious to see why you think the 
poison is defeating Bayes.  It's certainly possible that every 
once in a while a spammer will randomly hit on a word that's a 
good nonspam indicator for you, but I don't believe it can 
happen for any substantial fraction of messages.

The only SA change the message you posted seems to suggest is 
a modification of the rule for catching low-contrast font 
color, which has nothing to do with Bayes.  Looking at the 
spam, it got BAYES_50, so the "poison" didn't affect Bayes at 
all.  It had no strong spam or nonspam indicators even without 
the added words.

-- 
Keith C. Ivey <kc...@cpcug.org>
Washington, DC


Re: [spa] Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by Raquel Rice <ra...@thericehouse.net>.
On Thu, 19 Feb 2004 14:36:42 -0500 (EST)
Charles Gregory <cg...@hwcn.org> wrote:

> On Thu, 19 Feb 2004, Raquel Rice wrote:
> > > That would be my guess. We are now fully into the part of the
> > > 'game' where the spammers get hold of spamassassin and run
> > > their spew through it *before* trying to mail it, so that they
> > > can try 'tricks' like these, and keep trying different ones
> > > until something works.
> > The problem with your theory, is that your bayes hasn't been
> > trained the way mine has, nor has mine been trained the way that
> > Matt's has. The likelihood of any given spam getting past two of
> > us, let alone all three of us, is very slim indeed.
> 
> Unless, of course, you are an ISP, with limited resources, who
> can't run Bayes Databases per user, and with such a diverse user
> base that site-wide Bayes database might not be a good thing.....
> :-(
> 
> Still, my theory is still not that bad, even with a properly
> trained Bayes, because the spammer needs to defeat the rule checks
> anyways....
> 
> - C
> 

It still holds, from server to server, that there are enough
differences that a spammer will have a tough time getting spam out
to everyone using spamassassin ... or any other bayes system.

What rules do you use that I don't?  What other differences are
there between your SA and mine?

-- 
Raquel
============================================================
The world acquires value only through its extremes and endures only
through moderation; extremists make the world great, the moderates
give it stability.
  --Paul Valery


Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by Aleksander Adamowski <al...@altkom.pl>.
Raquel Rice wrote:

>On Thu, 19 Feb 2004 13:46:25 -0500 (EST)
>Charles Gregory <cg...@hwcn.org> wrote:
>  
>
>The problem with your theory, is that your bayes hasn't been trained
>the way mine has, nor has mine been trained the way that Matt's has.
> The likelihood of any given spam getting past two of us, let alone
>all three of us, is very slim indeed.
>

Unfortunately the sample spam I've sent is quite good at defeating bayes 
with its poison and hiding the poison from both Spamassassin rules and 
the eye of the recipient.

Re: [spa] Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by Greg Cirino - Cirelle Enterprises <gc...@cirelle.com>.
From: "Charles Gregory" <cg...@hwcn.org>
| Unless, of course, you are an ISP, with limited resources, who can't run
| Bayes Databases per user, and with such a diverse user base that site-wide
| Bayes database might not be a good thing..... :-(

In our case, we can only use a central bayes because we are
using a database to store individual user prefs; from what I
understand, the alternative is not going to become available.

Most spam is so similar that individual bayes dbs may not be
that important as long as each user can adjust their threshold,
white lists and black lists.

There are quite a few very nice filters available as well, the best
of all are the ones that block style (for the lack of a better term)
over those that block specific words.

Although, I'm still looking for one that gives me the name, address
phone number and possibly a photo of the spammer.

Greg

Re: [spa] Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Thu, 19 Feb 2004, Raquel Rice wrote:
> > That would be my guess. We are now fully into the part of the
> > 'game' where the spammers get hold of spamassassin and run their
> > spew through it *before* trying to mail it, so that they can try
> > 'tricks' like these, and keep trying different ones until
> > something works.
> The problem with your theory, is that your bayes hasn't been trained
> the way mine has, nor has mine been trained the way that Matt's has.
> The likelihood of any given spam getting past two of us, let alone
> all three of us, is very slim indeed.

Unless, of course, you are an ISP, with limited resources, who can't run
Bayes Databases per user, and with such a diverse user base that site-wide
Bayes database might not be a good thing..... :-(

Still, my theory is still not that bad, even with a properly trained
Bayes, because the spammer needs to defeat the rule checks anyways....

- C


Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by Raquel Rice <ra...@thericehouse.net>.
On Thu, 19 Feb 2004 13:46:25 -0500 (EST)
Charles Gregory <cg...@hwcn.org> wrote:

> On Thu, 19 Feb 2004, Aleksander Adamowski wrote:
> > What's intriguing is that the standard SA 2.60 test 
> > HTML_FONT_LOW_CONTRAST didn't catch it. I've tried raising
> > distance threshold in HTML.pm (line 350) to even absurdly high
> > values (over 80) and it still didn't match!
> > Is it possible that the dangling value causes HTML.pm tests to
> > miss this tag?
> 
> That would be my guess. We are now fully into the part of the
> 'game' where the spammers get hold of spamassassin and run their
> spew through it *before* trying to mail it, so that they can try
> 'tricks' like these, and keep trying different ones until
> something works.
> 
> - Charles
> 

The problem with your theory, is that your bayes hasn't been trained
the way mine has, nor has mine been trained the way that Matt's has.
 The likelihood of any given spam getting past two of us, let alone
all three of us, is very slim indeed.

-- 
Raquel
============================================================
Be a fountain, not a drain.
  --Rex Hudler


Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Thu, 19 Feb 2004, Aleksander Adamowski wrote:
> What's intriguing is that the standard SA 2.60 test 
> HTML_FONT_LOW_CONTRAST didn't catch it. I've tried raising distance 
> threshold in HTML.pm (line 350) to even absurdly high values (over 80) 
> and it still didn't match!
> Is it possible that the dangling value causes HTML.pm tests to miss this 
> tag?

That would be my guess. We are now fully into the part of the 'game' where
the spammers get hold of spamassassin and run their spew through it
*before* trying to mail it, so that they can try 'tricks' like these, and 
keep trying different ones until something works.

- Charles


Re: [spa] Semi-invisible font missed by SA

Posted by Aleksander Adamowski <al...@altkom.pl>.
Charles Gregory wrote:

>On Wed, 18 Feb 2004, Aleksander Adamowski wrote:
>  
>
>><font color=
>>
>>#fee4e8> sleepwalk negotiate sonic, diluting democrat humpback, 
>>    
>>
>
>I added a check for the dangling color spec....
>
>rawbody LOC_HTMLSPLITFONT  /^\#[a-z0-9]{6}\>/i
>describe LOC_HTMLSPLITFONT font color on separate line from font tag
>score LOC_HTMLSPLITFONT    0.7
>
>Can't score it too high because of potential FP's, but it helps.
>

What's intriguing is that the standard SA 2.60 test 
HTML_FONT_LOW_CONTRAST didn't catch it. I've tried raising distance 
threshold in HTML.pm (line 350) to even absurdly high values (over 80) 
and it still didn't match!

Is it possible that the dangling value causes HTML.pm tests to miss this 
tag?

-- 
Best Regards,
    Aleksander Adamowski
        GG#: 274614
        ICQ UIN: 19780575 
	http://olo.ab.altkom.pl


Re: [spa] Re: [spa] Semi-invisible font missed by SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Thu, 19 Feb 2004, Aleksander Adamowski wrote:
> Here's the message in question, it's interesting in general as it 
> received an unusually low SA score.

Sorry, you sent it as an attachment. We don't do those. (smile)
Please post plain text, where possible....

- C


Re: [spa] Semi-invisible font missed by SA

Posted by Aleksander Adamowski <al...@altkom.pl>.
Here's the message in question, it's interesting in general as it 
received an unusually low SA score.

-- 
Best Regards,
     Aleksander Adamowski
         GG#: 274614
         ICQ UIN: 19780575
	http://olo.ab.altkom.pl

[OT] Developers search

Posted by German Staltari <gs...@arnet.net.ar>.
Hi, we're looking for developers who would be interested in continuing the 
BlackHole Project:

http://blackhole.lohiser.com/
http://sourceforge.net/projects/blackholespam/

If someone is interested, just contact "Jonathan L. Haase" 
<jl...@iland.net>

TIA

German Staltari


Re: Semi-invisible font missed by SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Wed, 18 Feb 2004, Robert Menschel wrote:
> Hello Charles,
> 
> Doesn't hit much here, but at least no FPs either:
> LOC_HTMLSPLITFONT -- 33s/0h of 100794 corpus (82099s/18695h) 02/18/04

Thanks for running these through. I really need to set up my own 'corpus'
and be able to test out new rules myself. Is there a handy reference for
setting up and using one?

 -Charles


Re[2]: [spa] Semi-invisible font missed by SA

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Charles,

Wednesday, February 18, 2004, 8:14:59 AM, you wrote:

CG> On Wed, 18 Feb 2004, Aleksander Adamowski wrote:
>> <font color=
>> 
>> #fee4e8> sleepwalk negotiate sonic, diluting democrat humpback, 

CG> I added a check for the dangling color spec....

CG> rawbody LOC_HTMLSPLITFONT  /^\#[a-z0-9]{6}\>/i
CG> describe LOC_HTMLSPLITFONT font color on separate line from font tag
CG> score LOC_HTMLSPLITFONT    0.7

CG> Can't score it too high because of potential FP's, but it helps.

Doesn't hit much here, but at least no FPs either:

Section 3 -- Frequencies Log
(First numeric frequencies, followed by percentage frequencies)

OVERALL     SPAM      HAM     S/O    RANK   SCORE  NAME
 100794    82099    18695    0.815   0.00    0.00  (all messages)
     33       33        0    1.000   1.00    0.70  LOC_HTMLSPLITFONT

LOC_HTMLSPLITFONT -- 33s/0h of 100794 corpus (82099s/18695h) 02/18/04
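
As a quick check on the S/O column, here is a small sketch that assumes S/O is simply spam hits divided by all hits. That assumption reproduces both rows above, but the real tool may derive the figure from percentages instead.

```python
def s_o(spam_hits, ham_hits):
    """Fraction of a rule's hits that were spam, or None with no hits."""
    total = spam_hits + ham_hits
    return spam_hits / total if total else None


print(round(s_o(82099, 18695), 3))  # whole corpus
print(round(s_o(33, 0), 3))         # LOC_HTMLSPLITFONT
```

A rule with 33 spam hits and no ham hits gets 1.0, but with so few hits a clean result on one corpus does not rule out false positives elsewhere.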

Bob Menschel



Re: [spa] Semi-invisible font missed by SA

Posted by Charles Gregory <cg...@hwcn.org>.
On Wed, 18 Feb 2004, Aleksander Adamowski wrote:
> <font color=
> 
> #fee4e8> sleepwalk negotiate sonic, diluting democrat humpback, 

I added a check for the dangling color spec....

rawbody LOC_HTMLSPLITFONT  /^\#[a-z0-9]{6}\>/i
describe LOC_HTMLSPLITFONT font color on separate line from font tag
score LOC_HTMLSPLITFONT    0.7

Can't score it too high because of potential FP's, but it helps.
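
A small sketch for trying the pattern outside SA, assuming a Python port of the regex and per-line matching, as rawbody rules run per line. The function name and the second test line are made up for the example.

```python
import re

# Python port of the rawbody pattern; the backslashes before '#' and '>'
# are not needed, so they are dropped here.
SPLIT_FONT = re.compile(r"^#[a-z0-9]{6}>", re.IGNORECASE)


def split_font_hits(raw_body):
    """Return the lines of the raw body that would trigger the rule."""
    return [line for line in raw_body.splitlines() if SPLIT_FONT.search(line)]


sample = (
    "<font color=\n"
    "\n"
    "#fee4e8> sleepwalk negotiate sonic,\n"
    "<font color=#fee4e8>inline</font>\n"
)
print(split_font_hits(sample))

# [a-z0-9] is wider than hex digits, so this also counts as a hit.
print(SPLIT_FONT.search("#zzzzzz>") is not None)
```

Only the line that starts with the dangling colour value hits; the same colour written inline in the tag does not.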

- C