Posted to users@spamassassin.apache.org by Marc Perkel <ma...@perkel.com> on 2007/03/22 17:55:07 UTC

Is Bayes Dead? Have the spammers won?

Maybe I'm doing something wrong, but with the various methods of Bayes
poisoning going on I've found that Bayes is just lowering the score of
spam and causing more spam to get through. Where Bayes used to be the
centerpiece of spam filtering, I have now turned it off to increase accuracy.

Anyone else seeing this, or are there some new tricks that I'm missing out on?


Re: R: Is Bayes Dead? Have the spammers won?

Posted by ".rp" <pr...@moveupdate.com>.
> > On Thu, 22 Mar 2007 09:55:07 -0700, Marc Perkel <ma...@perkel.com>
> > wrote:
> > > Maybe I'm doing something wrong but with the various methods of
> > > bayes poisoning going on I've found that bayes is just lowering
> > > the score of spam and causing more spam to get through. Where
> > > bayes used to be the centerpiece of spam filtering now I have
> > > turned it off to increase accuracy.
> > >
> > > Anyone else seeing this or is there some new tricks that I'm
> > > missing out on?

	I use a 3-tier system to minimize the effect of poisoning the Bayes tables.
First we check against a few databases for known spammer addresses,
then check the message for obvious spam (claiming to come from our server, 
honeypot addresses, words in subjects, high SA score with no Bayes scoring)
and then we do the Bayes scoring.
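
A minimal Python sketch of the three tiers described above. The post does not show the real implementation, so every name, address and threshold here (KNOWN_SPAMMERS, HONEYPOTS, OUR_SERVER, BAD_SUBJECT_WORDS, the 5.0 cutoff) is a hypothetical placeholder. The point it illustrates is only the ordering: a message caught in tier 1 or 2 is decided before Bayes is consulted, so a poisoned Bayes score cannot pull it back under the threshold.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    recipient: str
    subject: str
    helo: str = ""


# Hypothetical placeholder data; a real setup would query its own lists.
KNOWN_SPAMMERS = {"bulk@spammer.example"}
HONEYPOTS = {"trap@ourdomain.example"}
OUR_SERVER = "mail.ourdomain.example"
BAD_SUBJECT_WORDS = {"viagra", "lottery"}


def classify(msg, rule_score, bayes_score, threshold=5.0):
    """Return (verdict, tier), where tier names the stage that decided."""
    # Tier 1: databases of known spammer addresses.
    if msg.sender.lower() in KNOWN_SPAMMERS:
        return "spam", "tier1-known-sender"

    # Tier 2: obvious spam, decided without any Bayes contribution.
    if msg.helo.lower() == OUR_SERVER:
        return "spam", "tier2-claims-to-be-us"
    if msg.recipient.lower() in HONEYPOTS:
        return "spam", "tier2-honeypot"
    if any(word in msg.subject.lower() for word in BAD_SUBJECT_WORDS):
        return "spam", "tier2-subject"
    if rule_score >= threshold:
        return "spam", "tier2-rules-without-bayes"

    # Tier 3: only now is the Bayes score added to the rule score.
    total = rule_score + bayes_score
    return ("spam" if total >= threshold else "ham"), "tier3-bayes"
```

With this ordering, a message scoring 6.0 on the non-Bayes rules stays spam even if Bayes would have contributed -2.0.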


R: Is Bayes Dead? Have the spammers won?

Posted by Giampaolo Tomassoni <g....@libero.it>.
> -----Original Message-----
> From: --[ UxBoD ]-- [mailto:uxbod@splatnix.net]
> 
> Using a combination of numerous SA rules, bayes, FuzzyOCR and BotNet on
> a new server Ive just built we are trashing the SPAM.  Attached graph
> is for today :-

What does "received" mean in the graph?

Giampaolo


> Regards,
> 
> UxBoD
> 
> On Thu, 22 Mar 2007 09:55:07 -0700, Marc Perkel <ma...@perkel.com>
> wrote:
> > Maybe I'm doing something wrong but with the various methods of bayes
> > poisoning going on I've found that bayes is just lowering the score
> > of spam and causing more spam to get through. Where bayes used to be
> > the centerpiece of spam filtering now I have turned it off to
> > increase accuracy.
> >
> > Anyone else seeing this or is there some new tricks that I'm missing
> > out on?
> >
> >
> > --
> > This message has been scanned for viruses and dangerous content by
> > MailScanner, and is
> > believed to be clean.
> --
> --[ UxBoD ]--
> // PGP Key: "curl -s http://www.splatnix.net/uxbod.asc | gpg --import"
> // Fingerprint: 543A E778 7F2D 98F1 3E50 9C1F F190 93E0 E8E8 0CF8
> // Keyserver: www.keyserver.net Key-ID: 0xE8E80CF8
> // SIP Phone: uxbod@sip.splatnix.net



Re: Is Bayes Dead? Have the spammers won?

Posted by --[ UxBoD ]-- <ux...@splatnix.net>.
Using a combination of numerous SA rules, Bayes, FuzzyOCR and BotNet on a new server I've just built, we are trashing the spam.  Attached graph is for today :-

Regards,

UxBoD

On Thu, 22 Mar 2007 09:55:07 -0700, Marc Perkel <ma...@perkel.com> wrote:
> Maybe I'm doing something wrong but with the various methods of bayes
> poisoning going on I've found that bayes is just lowering the score of
> spam and causing more spam to get through. Where bayes used to be the
> centerpiece of spam filtering now I have turned it off to increase
> accuracy.
> 
> Anyone else seeing this or is there some new tricks that I'm missing out
> on?
> 
> 
-- 
--[ UxBoD ]--
// PGP Key: "curl -s http://www.splatnix.net/uxbod.asc | gpg --import"
// Fingerprint: 543A E778 7F2D 98F1 3E50 9C1F F190 93E0 E8E8 0CF8
// Keyserver: www.keyserver.net Key-ID: 0xE8E80CF8
// SIP Phone: uxbod@sip.splatnix.net


Re: Is Bayes Dead? Have the spammers won?

Posted by Marc Perkel <ma...@perkel.com>.

Henrik Krohns wrote:
> On Thu, Mar 22, 2007 at 09:55:07AM -0700, Marc Perkel wrote:
>   
>> Maybe I'm doing something wrong but with the various methods of bayes 
>> poisoning going on I've found that bayes is just lowering the score of 
>> spam and causing more spam to get through.
>>     
>
> So is there actually any real proof that Bayes poisoning works? I've yet to
> find any evidence. All the cases have been admins/users messing it up
> themselves.
>   

I'm just relating my experience and perhaps wondering if I'm doing 
something wrong.


Re: Is Bayes Dead? Have the spammers won?

Posted by Michel R Vaillancourt <mi...@wolfstar.ca>.
Henrik Krohns wrote:
> On Thu, Mar 22, 2007 at 09:55:07AM -0700, Marc Perkel wrote:
>> Maybe I'm doing something wrong but with the various methods of bayes 
>> poisoning going on I've found that bayes is just lowering the score of 
>> spam and causing more spam to get through.
> 
> So is there actually any real proof that Bayes poisoning works? I've yet to
> find any evidence. All the cases have been admins/users messing it up
> themselves.

	In point of fact, my own experience is that poisoning attempts make no difference at all.  Because the number of poison tokens in an established database is so small, they don't change anything.  However, the incidence of other spam-positive keys "tips the hand".

	I use auto-learning.  Always have.  It has NEVER been a problem;  if I get an FP or FN, I resubmit those mails to the DB for retraining.

	I've even gone so far as to take a spam mail that was visually more than 80% "poison", copy the poison out, put it around another spam mail and mail it to myself from a dummy account.  Result?  BAYES_99.  Took the same poison, wrapped it around a legitimate mail and sent it to myself.  Result?  BAYES_00.  You can't keep a good Bayes down;  auto-learned or not.

	And I'm a little guy; 5000 messages a day ... 10000 if the lists I host are busy.  It's not like I have a massive Bayes DB to work against.  The Big Boys should be even more accurate just by raw weight of statistical incidence.  Bayes poison is fiction;  it's not even good fiction.
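
A toy Python model of why the experiment above could come out that way. This is not SpamAssassin's actual Bayes code: the smoothing formula and the log-odds combiner are simplifying assumptions, and the token counts are invented. It only shows the general mechanism: tokens the database has never seen score as neutral (0.5) and contribute nothing, so padding a message with them leaves the verdict to the tokens the database does know.

```python
import math


def token_probability(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spam probability for one token (assumed formula).

    Unseen tokens return the neutral value x; rarely seen tokens are
    pulled toward x with strength s.
    """
    seen = spam_hits + ham_hits
    if seen == 0:
        return x
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    raw = spam_rate / (spam_rate + ham_rate)
    return (s * x + seen * raw) / (s + seen)


def spam_score(tokens, db, n_spam, n_ham):
    """Combine per-token probabilities by summing log-odds (toy combiner)."""
    log_odds = 0.0
    for tok in set(tokens):
        spam_hits, ham_hits = db.get(tok, (0, 0))
        p = token_probability(spam_hits, ham_hits, n_spam, n_ham)
        log_odds += math.log(p / (1.0 - p))  # exactly 0.0 when p == 0.5
    return 1.0 / (1.0 + math.exp(-log_odds))


# Invented training data: token -> (times seen in spam, times seen in ham).
db = {"viagra": (50, 0), "meeting": (0, 40)}
poison = [f"zq{i}" for i in range(20)]  # 20 random words the DB never saw

spam_plain = spam_score(["viagra"], db, 100, 100)
spam_poisoned = spam_score(["viagra"] + poison, db, 100, 100)
ham_poisoned = spam_score(["meeting"] + poison, db, 100, 100)
```

In this toy, spam_poisoned equals spam_plain and ham_poisoned stays close to zero. Poison made of words that are common in the site's real ham would behave differently, which is a case this sketch does not cover.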
-- 
	--Michel Vaillancourt
	Wolfstar Systems
	www.wolfstar.ca

Re: Is Bayes Dead? Have the spammers won?

Posted by Henrik Krohns <he...@hege.li>.
On Thu, Mar 22, 2007 at 09:55:07AM -0700, Marc Perkel wrote:
> Maybe I'm doing something wrong but with the various methods of bayes 
> poisoning going on I've found that bayes is just lowering the score of 
> spam and causing more spam to get through.

So is there actually any real proof that Bayes poisoning works? I've yet to
find any evidence. All the cases have been admins/users messing it up
themselves.

Re: Is Bayes Dead? Have the spammers won?

Posted by Mike Jackson <mj...@barking-dog.net>.
> /me continues to wait for the spammers to tire of greylisting

I work for a managed hosting provider, and I have seen spam messages get 
past customers' greylisting setups. It may be isolated, but some 
spammers are already starting to work around it.

Re: Is Bayes Dead? Have the spammers won?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Mar 22, 2007 at 02:41:03PM -0500, maillist wrote:
> I don't know about that.  I'd say that 95% of all spam filtered in my 
> system has BAYES_99 as a trigger, and of that, probably 75% - 85% would 
> not have been caught if not for that trigger.

Don't confuse filtering methods with rules.

-- 
Randomly Selected Tagline:
Harriet's Dining Observation:
 	In every restaurant, the hardness of the butter pats
 	increases in direct proportion to the softness of the bread.

Re: extract message-id's from logfile

Posted by Mark Samples <lp...@dmsgranbury.com>.
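Perl:
#!/usr/bin/perl
use strict;
use warnings;

# Print the Message-ID from each spamd "result:" log line read on STDIN.
while (my $line = <STDIN>) {
    # [^>]* stops at the first ">" so the capture cannot run past the ID.
    if ($line =~ /mid=<([^>]*)>/) {
        print "$1\n";
    }
}

cat spamd.log | <whatever you name above perl script>

will give you all of your 'mid' values (message IDs) from the spamd.log file
(or whatever you call your spam log file for SA).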
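
For comparison, the same extraction sketched in Python. It assumes each spamd result entry sits on a single line in the real log file (the line breaks in the quoted sample below look like mail-client wrapping), and the function name extract_message_ids is made up for this example.

```python
import re

# Capture everything between "mid=<" and the next ">".
MID_RE = re.compile(r"mid=<([^>]*)>")


def extract_message_ids(lines):
    """Return the Message-IDs found in an iterable of log lines, in order."""
    return [m.group(1) for line in lines for m in MID_RE.finditer(line)]


if __name__ == "__main__":
    import sys

    # Usage: python extract_mids.py < spamd.log
    for mid in extract_message_ids(sys.stdin):
        print(mid)
```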

Starckjohann, Ove wrote:

>Hi!
>
>bit offtopic, but maybe it's easy and someone is able to drop me the
>*magic* snippet of code:
>
>My logfile looks like:
>
>Mar 23 10:15:55 admin05 spamd[6084]: spamd: result: Y 5 -
>AWL,BAYES_00,DCC_CHECK,DIGEST_MULTIPLE,HTML_MESSAGE,LOGINHASH2,MIME_HTML
>_ONLY,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CHECK
>scantime=1.8,size=4860,user=(unknown),uid=1002,required_score=5.0,rhost=
>mailgate.wee.com,raddr=10.10.10.21,rport=9661,mid=<15669820.200703231447
>06@thai-icecream.com>,bayes=1.25626575044335e-05,autolearn=no
>Mar 23 10:19:38 admin05 spamd[6084]: spamd: result: Y 7 -
>BAYES_00,DCC_CHECK,DIGEST_MULTIPLE,FRT_CONTACT,HTML_30_40,HTML_MESSAGE,H
>TML_TITLE_UNTITLED,LOGINHASH2,MULTIPART_ALT_NON_TEXT,NO_RECEIVED,NO_RELA
>YS,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CHECK
>scantime=2.7,size=12337,user=(unknown),uid=1002,required_score=5.0,rhost
>=mailgate.wee.com,raddr=10.10.10.21,rport=9897,mid=<32E4DC5C.0109384@nis
>hikoi.com>,bayes=1.66533453693773e-16,autolearn=no
>...
>
>i do need to extract the message-id's from there to get the following
>list:
>15669820.20070323144706@thai-icecream.com
>32E4DC5C.0109384@nishikoi.com
>
>How to realize ??
>
>Any skilled grep'ers / awk'ers / sed'ers alive here ?
>
>
>Ove Starckjohann
>
>  
>


extract message-id's from logfile

Posted by "Starckjohann, Ove" <st...@norddeutsche.de>.
Hi!

bit offtopic, but maybe it's easy and someone is able to drop me the
*magic* snippet of code:

My logfile looks like:

Mar 23 10:15:55 admin05 spamd[6084]: spamd: result: Y 5 -
AWL,BAYES_00,DCC_CHECK,DIGEST_MULTIPLE,HTML_MESSAGE,LOGINHASH2,MIME_HTML
_ONLY,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CHECK
scantime=1.8,size=4860,user=(unknown),uid=1002,required_score=5.0,rhost=
mailgate.wee.com,raddr=10.10.10.21,rport=9661,mid=<15669820.200703231447
06@thai-icecream.com>,bayes=1.25626575044335e-05,autolearn=no
Mar 23 10:19:38 admin05 spamd[6084]: spamd: result: Y 7 -
BAYES_00,DCC_CHECK,DIGEST_MULTIPLE,FRT_CONTACT,HTML_30_40,HTML_MESSAGE,H
TML_TITLE_UNTITLED,LOGINHASH2,MULTIPART_ALT_NON_TEXT,NO_RECEIVED,NO_RELA
YS,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CHECK
scantime=2.7,size=12337,user=(unknown),uid=1002,required_score=5.0,rhost
=mailgate.wee.com,raddr=10.10.10.21,rport=9897,mid=<32E4DC5C.0109384@nis
hikoi.com>,bayes=1.66533453693773e-16,autolearn=no
...

I need to extract the message IDs from there to get the following
list:
15669820.20070323144706@thai-icecream.com
32E4DC5C.0109384@nishikoi.com

How can I do this?

Any skilled grep'ers / awk'ers / sed'ers alive here?


Ove Starckjohann

Re: Is Bayes Dead? Have the spammers won?

Posted by maillist <ma...@emailacs.com>.
Theo Van Dinter wrote:
> On Thu, Mar 22, 2007 at 09:55:07AM -0700, Marc Perkel wrote:
>   
>> Where bayes used to be the centerpiece of spam filtering ...
>>     
>
> FWIW, I don't think Bayes has really ever been the "centerpiece" of
> spam filtering.  Definitely not within SA anyway.  It's a good tool,
> but it's just another tool in the belt.
>   
I don't know about that.  I'd say that 95% of all spam filtered in my 
system has BAYES_99 as a trigger, and of that, probably 75% - 85% would 
not have been caught if not for that trigger.  But I don't autolearn, or 
autowhitelist.  I just don't have enough faith in my own setup to allow 
it to make its own decisions.

-=Aubrey=-
> /me continues to wait for the spammers to tire of greylisting
>
>   


Re: Is Bayes Dead? Have the spammers won?

Posted by Leander Koornneef <l....@ic-s.nl>.
On 22-mrt-2007, at 20:02, Theo Van Dinter wrote:

> On Thu, Mar 22, 2007 at 09:55:07AM -0700, Marc Perkel wrote:
>> Where bayes used to be the centerpiece of spam filtering ...
>
> FWIW, I don't think Bayes has really ever been the "centerpiece" of
> spam filtering.  Definitely not within SA anyway.  It's a good tool,
> but it's just another tool in the belt.
>
> /me continues to wait for the spammers to tire of greylisting

Yes, exactly! Greylisting is still working amazingly well here.
Also, most spams that get past the greylisting border are still
hitting BAYES_90 or higher, even on instances where the
bayes system is only being trained by autolearning.

I do feel that greylisting is slowly becoming less effective though.
The number of spams that get through may have risen by as much
as 50%, although this is extremely relative, because this means
that in my case six spams make it through each day, instead of
four, whereas I used to get >80 spams per day without greylisting.
I noticed that almost all of the spams that get through are GIF image
stock spam. Apparently, I should "GET IN ON THE YOUTUBE OF
CHINA NOW!", because that is all I'm reading about these days ;-)
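
The greylisting figures above can be checked with a few lines of Python. The baseline of 80 is taken from ">80 spams per day", so the blocked shares are lower bounds; the function names are made up for this example.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


def percent_blocked(getting_through, baseline):
    """Share of the baseline volume that no longer arrives, in percent."""
    return 100.0 * (1 - getting_through / baseline)


rise = percent_change(4, 6)              # four spams a day became six
blocked_before = percent_blocked(4, 80)  # against at least 80 a day
blocked_now = percent_blocked(6, 80)
```

So the "50% rise" still leaves greylisting stopping at least about 92.5% of the former daily volume, down from at least about 95%.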

Leander

Re: Is Bayes Dead? Have the spammers won?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Mar 22, 2007 at 09:55:07AM -0700, Marc Perkel wrote:
> Where bayes used to be the centerpiece of spam filtering ...

FWIW, I don't think Bayes has really ever been the "centerpiece" of
spam filtering.  Definitely not within SA anyway.  It's a good tool,
but it's just another tool in the belt.

/me continues to wait for the spammers to tire of greylisting

-- 
Randomly Selected Tagline:
"If you build something that any idiot can use, any idiot will."
                   - Patrick St. Jean

Re: Image spam

Posted by maillist <ma...@emailacs.com>.
David Gibbs wrote:
> --[ UxBoD ]-- wrote:
>   
>> Yes image spam can be a real pain. 
>>     
>
> While I agree that image spam is a PITA ... I have to wonder how ANYONE
> in the right mind could fall for that garbage.
>
> I mean, be real ... if the message you get contains an image, surrounded
> by garbage text, and the image quality is worse than a 60's era
> television picture, how hard is to figure out that the message is
> questionable?
>
> Half the image spam's I've read are so garbled, to avoid the ocr tests,
> it's impossible to decipher what they are trying to pump & dump anyways.
>
> david
>
>   

Maybe there's real genius involved in this image spam.  I have only 
received a few of them because luckily BAYES_99 catches them all (maybe 
I get one in 2 months).  We used to try to decipher what the messages 
were, just to see who could get it first.  Plus, it reminded us of Atari.

sa-learn them, and they should go away.

-=Aubrey=-

Re: Image spam (was: Is Bayes Dead? Have the spammers won?)

Posted by "John D. Hardin" <jh...@impsec.org>.
On Tue, 27 Mar 2007, David Gibbs wrote:

> While I agree that image spam is a PITA ... I have to wonder how
> ANYONE in the right mind could fall for that garbage.
> 
> I mean, be real ... if the message you get contains an image,
> surrounded by garbage text, and the image quality is worse than a
> 60's era television picture, how hard is to figure out that the
> message is questionable?

Two things are infinite: the universe and human stupidity; and I'm not 
sure about the universe.
                                      -- Albert Einstein

But then, I'm a cynic.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  ...much of our country's counterterrorism security spending is not
  designed to protect us from the terrorists, but instead to protect
  our public officials from criticism when another attack occurs.
                                                    -- Bruce Schneier
-----------------------------------------------------------------------
 17 days until Thomas Jefferson's 264th Birthday


Image spam (was: Is Bayes Dead? Have the spammers won?)

Posted by David Gibbs <da...@midrange.com>.
--[ UxBoD ]-- wrote:
> Yes image spam can be a real pain. 

While I agree that image spam is a PITA ... I have to wonder how ANYONE
in their right mind could fall for that garbage.

I mean, be real ... if the message you get contains an image, surrounded
by garbage text, and the image quality is worse than a 60's era
television picture, how hard is it to figure out that the message is
questionable?

Half the image spams I've read are so garbled, to avoid the OCR tests,
that it's impossible to decipher what they are trying to pump & dump anyway.

david


Re: Is Bayes Dead? Have the spammers won?

Posted by --[ UxBoD ]-- <ux...@splatnix.net>.
Yes image spam can be a real pain. I have just implemented a new mailserver and image spam is certainly on the increase :-

mysql> select count(*) from maillog;
+----------+
| count(*) |
+----------+
|    15091 | 
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from maillog where spamreport like '%FUZZY_OCR%';
+----------+
| count(*) |
+----------+
|     3438 | 
+----------+
1 row in set (0.04 sec)

mysql> select count(*) from maillog where spamreport like '%FUZZY_OCR_KNOWN_HASH%';
+----------+
| count(*) |
+----------+
|     1070 | 
+----------+
1 row in set (0.04 sec)
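
Putting the three counts above into proportions, as a small Python sketch. Because '%FUZZY_OCR%' also matches FUZZY_OCR_KNOWN_HASH, the third count is a subset of the second; the variable names are made up for this example.

```python
def share(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


total = 15091       # select count(*) from maillog
fuzzy_ocr = 3438    # spamreport like '%FUZZY_OCR%'
known_hash = 1070   # spamreport like '%FUZZY_OCR_KNOWN_HASH%'

ocr_share_of_mail = share(fuzzy_ocr, total)        # about 22.8
known_share_of_ocr = share(known_hash, fuzzy_ocr)  # about 31.1
known_share_of_mail = share(known_hash, total)     # about 7.1
```

Roughly one logged message in four to five triggered a FuzzyOCR rule, and about a third of those matched an image hash that had been seen before.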


On Fri, 23 Mar 2007 06:46:50 -0700, Marc Perkel <ma...@perkel.com> wrote:
> Perhaps what I need to do is to get rid of autolearn and write my own
> learning system that strips out the body of messages with images and
> just learns the headers. My problem is that when users get image spam
> they put it in the spam folders and they get learned. But the text in
> the image spam causes ham type text to be learned as spam. That causes
> ham to get higher scores.
> 
-- 
--[ UxBoD ]--
// PGP Key: "curl -s http://www.splatnix.net/uxbod.asc | gpg --import"
// Fingerprint: 543A E778 7F2D 98F1 3E50 9C1F F190 93E0 E8E8 0CF8
// Keyserver: www.keyserver.net Key-ID: 0xE8E80CF8
// SIP Phone: uxbod@sip.splatnix.net




Re: Is Bayes Dead? Have the spammers won?

Posted by Marc Perkel <ma...@perkel.com>.

Ian Eiloart wrote:
>
>
> --On 23 March 2007 11:08:12 -0700 Marc Perkel <ma...@perkel.com> wrote:
>
>>
>> What I think my problem might be is that I have done so much work
>> prescreening messages with Exim that what's left isn't good stock for
>> autolearn. I think what I need is a separate dedicated learner server
>> that is selective and smart about what it learns.
>
> You could do a fake reject, post-data, and pass the spam to your 
> learning engine. Don't forget that you MUST also pass it some ham.
>

Yes - that's what I'm doing now. I'm also checking for graphic 
attachments and if there's an embedded graphic I strip the body out and 
learn the headers only to prevent bayes poisoning. It seems to be 
working better now. Still need to give it time.
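
A rough Python sketch of the "headers only for image mail" idea. The actual setup is not shown in this thread, so this is a guess at the shape of it using only the standard library; the function names has_image_part and training_copy are hypothetical. The result would then be handed to whatever learner is in use.

```python
import email
import re

# First blank line in the raw message, with LF or CRLF line endings.
HEADER_END = re.compile(rb"\r?\n\r?\n")


def has_image_part(raw: bytes) -> bool:
    """True if any MIME part of the message has an image/* content type."""
    msg = email.message_from_bytes(raw)
    return any(part.get_content_maintype() == "image" for part in msg.walk())


def training_copy(raw: bytes) -> bytes:
    """Return the message to learn from.

    Messages without images are returned unchanged. Messages with an
    image part are cut after the header block, so the filler text in
    the body never reaches the Bayes learner.
    """
    if not has_image_part(raw):
        return raw
    match = HEADER_END.search(raw)
    if match is None:  # headers only, nothing to strip
        return raw
    return raw[: match.end()]
```

Cutting the raw bytes keeps the headers exactly as received, at the cost of leaving a multipart Content-Type header on a message that no longer has a body.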

Re: Is Bayes Dead? Have the spammers won?

Posted by Ian Eiloart <ia...@sussex.ac.uk>.

--On 23 March 2007 11:08:12 -0700 Marc Perkel <ma...@perkel.com> wrote:

>
> What I think my problem might be is that I have done so much work
> prescreening messages with Exim that what's left isn't good stock for
> autolearn. I think what I need is a separate dedicated learner server
> that is selective and smart about what it learns.

You could do a fake reject, post-data, and pass the spam to your learning 
engine. Don't forget that you MUST also pass it some ham.

-- 
Ian Eiloart
IT Services, University of Sussex
x3148

Re: Is Bayes Dead? Have the spammers won?

Posted by Marc Perkel <ma...@perkel.com>.

Jim Maul wrote:
> Marc Perkel wrote:
>>
>>
>> Jim Maul wrote:
>>> Marc Perkel wrote:
>>>> Perhaps what I need to do is to get rid of autolearn and write my 
>>>> own learning system that strips out the body of messages with 
>>>> images and just learns the headers. My problem is that when users 
>>>> get image spam they put it in the spam folders and they get 
>>>> learned. But the text in the image spam causes ham type text to be 
>>>> learned as spam. That causes ham to get higher scores.
>>>>
>>>>
>>>
>>> Are you sure of this?  Have you also trained these ham messages to 
>>> counter this effect?  Not too long ago we were in the same 
>>> situation.  I have autolearn enabled but I have adjusted the 
>>> thresholds to avoid learning false positives/negatives.  We were 
>>> getting ham (although arguably - they were newsletter type ham) that 
>>> was hitting BAYES_99.  As soon as i started training them as ham the 
>>> problem went away.  Spam is still detected correctly by bayes and 
>>> these newsletters no longer hit bayes_99.
>>>
>>> -Jim
>>>
>>
>> What I think my problem might be is that I have done so much work 
>> prescreening messages with Exim that what's left isn't good stock for 
>> autolearn. I think what I need is a separate dedicated learner server 
>> that is selective and smart about what it learns.
>>
>>
>
> This is quite possible.  I have heard other stories of people using 
> things like greylisting and rbls to reject at smtp time that the only 
> things that eventually made it to SA were so limited that it would 
> produce odd results for bayes.  From my experience, the more you throw 
> at bayes, the better it gets.  The more selective you are, the less it 
> has to work with.
>
> Jim
>

Yes - I think that's what's happening to me. I also created an automatic 
whitelisting system that lets about 1/2 of the ham bypass SA. What 
I need to do is fork off a copy of a lot of the email that's bypassing SA 
and stuff it into the learner. Like I said originally, bayes used to be 
my best tool. I'd like to get that back.


Re: Is Bayes Dead? Have the spammers won?

Posted by Jim Maul <jm...@elih.org>.
R Lists06 wrote:
>>>>>
>>>> Are you sure of this?  Have you also trained these ham messages to
>>>> counter this effect?  Not too long ago we were in the same situation.
>>>> I have autolearn enabled but I have adjusted the thresholds to avoid
>> This is quite possible.  I have heard other stories of people using
>> things like greylisting and rbls to reject at smtp time that the only
>> things that eventually made it to SA were so limited that it would
>> produce odd results for bayes.  From my experience, the more you throw
>> at bayes, the better it gets.  The more selective you are, the less it
>> has to work with.
>>
>> Jim
> 
> So are you saying for these purposes that you do not use RBLs or greylisting
> or other similar tools that cut down on the obvious cycle consuming garbage?
> 
>

Correct, I do not use RBLs or greylisting.  However, I have 1 domain, 
approx 100 users and receive only 2k messages/day.  We have one machine 
running qmail/SA/clamav which more than handles this load.  I can afford 
not to use RBLs or greylisting - other larger setups may not be able to.

-Jim




RE: Is Bayes Dead? Have the spammers won?

Posted by R Lists06 <li...@abbacomm.net>.
> >>>
> >>>
> >>
> >> Are you sure of this?  Have you also trained these ham messages to
> >> counter this effect?  Not too long ago we were in the same situation.
> >> I have autolearn enabled but I have adjusted the thresholds to avoid
> This is quite possible.  I have heard other stories of people using
> things like greylisting and rbls to reject at smtp time that the only
> things that eventually made it to SA were so limited that it would
> produce odd results for bayes.  From my experience, the more you throw
> at bayes, the better it gets.  The more selective you are, the less it
> has to work with.
> 
> Jim

So are you saying for these purposes that you do not use RBLs or greylisting
or other similar tools that cut down on the obvious cycle consuming garbage?

 - rh

--
Robert - Abba Communications
http://www.abbacomm.net/


Re: Is Bayes Dead? Have the spammers won?

Posted by Jim Maul <jm...@elih.org>.
Marc Perkel wrote:
> 
> 
> Jim Maul wrote:
>> Marc Perkel wrote:
>>> Perhaps what I need to do is to get rid of autolearn and write my own 
>>> learning system that strips out the body of messages with images and 
>>> just learns the headers. My problem is that when users get image spam 
>>> they put it in the spam folders and they get learned. But the text in 
>>> the image spam causes ham type text to be learned as spam. That 
>>> causes ham to get higher scores.
>>>
>>>
>>
>> Are you sure of this?  Have you also trained these ham messages to 
>> counter this effect?  Not too long ago we were in the same situation.  
>> I have autolearn enabled but I have adjusted the thresholds to avoid 
>> learning false positives/negatives.  We were getting ham (although 
>> arguably - they were newsletter type ham) that was hitting BAYES_99.  
>> As soon as i started training them as ham the problem went away.  Spam 
>> is still detected correctly by bayes and these newsletters no longer 
>> hit bayes_99.
>>
>> -Jim
>>
> 
> What I think my problem might be is that I have done so much work 
> prescreening messages with Exim that what's left isn't good stock for 
> autolearn. I think what I need is a separate dedicated learner server 
> that is selective and smart about what it learns.
> 
> 

This is quite possible.  I have heard other stories of people using 
things like greylisting and rbls to reject at smtp time that the only 
things that eventually made it to SA were so limited that it would 
produce odd results for bayes.  From my experience, the more you throw 
at bayes, the better it gets.  The more selective you are, the less it 
has to work with.

Jim

Re: Is Bayes Dead? Have the spammers won?

Posted by Marc Perkel <ma...@perkel.com>.

Jim Maul wrote:
> Marc Perkel wrote:
>> Perhaps what I need to do is to get rid of autolearn and write my own 
>> learning system that strips out the body of messages with images and 
>> just learns the headers. My problem is that when users get image spam 
>> they put it in the spam folders and they get learned. But the text in 
>> the image spam causes ham type text to be learned as spam. That 
>> causes ham to get higher scores.
>>
>>
>
> Are you sure of this?  Have you also trained these ham messages to 
> counter this effect?  Not too long ago we were in the same situation.  
> I have autolearn enabled but I have adjusted the thresholds to avoid 
> learning false positives/negatives.  We were getting ham (although 
> arguably - they were newsletter type ham) that was hitting BAYES_99.  
> As soon as i started training them as ham the problem went away.  Spam 
> is still detected correctly by bayes and these newsletters no longer 
> hit bayes_99.
>
> -Jim
>

What I think my problem might be is that I have done so much work 
prescreening messages with Exim that what's left isn't good stock for 
autolearn. I think what I need is a separate dedicated learner server 
that is selective and smart about what it learns.

Re: Is Bayes Dead? Have the spammers won?

Posted by frank jones <ab...@hotmail.com>.
Images were killing us until we installed focr. It really helped. I'm 
dreading the day that the scum find a way to circumvent that though. As an 
aside, I just noticed a bunch of spam like this in our quarantine (scored 
very very high so no one normally sees it, but I look sometimes):


Subject: SPAM: HIGH *  anti-spammers are lamers
Parts/Attachments:
   1   OK      3 lines  Text (charset: ISO-8859-2)
   2 Shown   ~14 lines  Text (charset: ISO-8859-2)
----------------------------------------

subj

regards, spammer.


>From: "Luis Hernán Otegui" <lu...@gmail.com>
>To: "Spamassassin talk list" <us...@spamassassin.apache.org>
>Subject: Re: Is Bayes Dead? Have the spammers won?
>Date: Fri, 23 Mar 2007 11:45:22 -0300
>
>Well, my two cents on this:
>When I upgraded my servers (about half a year ago) and started using a
>mysql-based Bayes DB, image spams began to drive me crazy. Seemed like 
>there
>was no way to stop them. But with a good purge of bayes, a rebuild, and the
>addition of sa-update rules, it all began to get better. Right now, I have
>implemented a system for my users to train a global Bayes database, and I
>must say it is working almost flawlessly. Only a few discussion lists got
>BAYES_99 hits, but as soon as the users forwarded them to the ham training
>account (or moved them to their webmail-based HAM folders), everything got
>better. I'm a small fish in this fight (two servers, about 400 users each,
>~25000 messages a day, ~20000 rejected via zenspamhaus.org mostly, ~1100
>spam messages, and ~30 virus messages a day), but I must say that taking
>good care of my Bayes database has improved a lot the spam fighting
>capabilities of my servers. It includes making sa-forget of false 
>positives,
>then feeding them to sa-learn as ham, sa-forget of false negatives and
>making SA analyze and report them, etc. Luckily, I managed to write some
>scripts to do the work for me. They're still at test stage, but I'm
>convinced that they seem to perform very well...
>
>A taste: http://www.biol.unlp.edu.ar/cgi-bin/mailgraph.cgi
>
>
>Luis
>
>2007/3/23, Jim Maul <jm...@elih.org>:
>>
>>Marc Perkel wrote:
>> > Perhaps what I need to do is to get rid of autolearn and write my own
>> > learning system that strips out the body of messages with images and
>> > just learns the headers. My problem is that when users get image spam
>> > they put it in the spam folders and they get learned. But the text in
>> > the image spam causes ham type text to be learned as spam. That causes
>> > ham to get higher scores.
>> >
>> >
>>
>>Are you sure of this?  Have you also trained these ham messages to
>>counter this effect?  Not too long ago we were in the same situation.  I
>>have autolearn enabled but I have adjusted the thresholds to avoid
>>learning false positives/negatives.  We were getting ham (although
>>arguably - they were newsletter type ham) that was hitting BAYES_99.  As
>>soon as i started training them as ham the problem went away.  Spam is
>>still detected correctly by bayes and these newsletters no longer hit
>>bayes_99.
>>
>>-Jim
>>
>
>
>
>--
>-------------------------------------------------
>GNU-GPL: "May The Source Be With You...
>-------------------------------------------------



Re: Is Bayes Dead? Have the spammers won?

Posted by Matt <lm...@gmail.com>.
>  But with a good purge of bayes, a rebuild, and the
> addition of sa-update rules,

How do you safely purge bayes anyway?
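
One cautious way to approach this, sketched in Python around the sa-learn command line: dump the database with `sa-learn --backup` first, and only then wipe it with `sa-learn --clear`, so that `sa-learn --restore` remains an option. Nobody in this thread confirms this as the procedure actually used; the helper names and the backup path are hypothetical, and the commands need to run as the user that owns the Bayes database.

```python
import subprocess


def purge_plan(backup_path):
    """Return the steps as (argv, stdout_file) pairs without running them."""
    return [
        # --backup writes the database dump to standard output.
        (["sa-learn", "--backup"], backup_path),
        # --clear wipes the existing Bayes database.
        (["sa-learn", "--clear"], None),
    ]


def run_plan(plan):
    """Execute each step, stopping at the first failure."""
    for argv, stdout_path in plan:
        if stdout_path is not None:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)


# Inspect the steps first; call run_plan(plan) only once they look right.
plan = purge_plan("/tmp/bayes-backup.txt")
```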


Matt

Re: Is Bayes Dead? Have the spammers won?

Posted by Luis Hernán Otegui <lu...@gmail.com>.
Well, my two cents on this:
When I upgraded my servers (about half a year ago) and started using a
MySQL-based Bayes DB, image spam began to drive me crazy. It seemed like there
was no way to stop it. But with a good purge of Bayes, a rebuild, and the
addition of sa-update rules, it all began to get better. Right now, I have
implemented a system for my users to train a global Bayes database, and I
must say it is working almost flawlessly. Only a few discussion lists got
BAYES_99 hits, but as soon as the users forwarded them to the ham training
account (or moved them to their webmail-based HAM folders), everything got
better. I'm a small fish in this fight (two servers, about 400 users each,
~25000 messages a day, ~20000 of them rejected, mostly via zen.spamhaus.org,
~1100 spam messages, and ~30 virus messages a day), but I must say that taking
good care of my Bayes database has greatly improved the spam-fighting
capabilities of my servers. That includes running sa-learn --forget on false
positives and then feeding them to sa-learn as ham, running sa-learn --forget
on false negatives and having SA analyze and report them, etc. Luckily, I
managed to write some scripts to do the work for me. They're still at the
test stage, but so far they seem to perform very well...
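
A minimal sketch of the forget-then-relearn step described above, in
Python. The helper names (retrain_commands, retrain_folder) and the
one-message-per-file folder layout are assumptions for illustration, not
the scripts mentioned in this message; only the sa-learn options
--forget, --ham and --spam come from SpamAssassin itself. It defaults to
a dry run that just returns the commands it would execute.

```python
import subprocess
from pathlib import Path


def retrain_commands(message_path, correct_class):
    """Build the sa-learn commands that unlearn a message, then relearn it.

    correct_class is "ham" for a false positive, "spam" for a false negative.
    """
    if correct_class not in ("ham", "spam"):
        raise ValueError("correct_class must be 'ham' or 'spam'")
    path = str(message_path)
    return [
        ["sa-learn", "--forget", path],           # drop the wrong classification
        ["sa-learn", "--" + correct_class, path], # learn the corrected one
    ]


def retrain_folder(folder, correct_class, dry_run=True):
    """Collect (and optionally run) retraining commands for every file in folder."""
    commands = []
    for message in sorted(Path(folder).iterdir()):
        if message.is_file():
            commands.extend(retrain_commands(message, correct_class))
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop on the first failure
    return commands
```

Relearning a message with the correct class may already override the
earlier classification on current versions, in which case the explicit
--forget is redundant; it is kept here only to mirror the workflow as
described.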

A taste: http://www.biol.unlp.edu.ar/cgi-bin/mailgraph.cgi


Luis

2007/3/23, Jim Maul <jm...@elih.org>:
>
> Marc Perkel wrote:
> > Perhaps what I need to do is to get rid of autolearn and write my own
> > learning system that strips out the body of messages with images and
> > just learns the headers. My problem is that when users get image spam
> > they put it in the spam folders and they get learned. But the text in
> > the image spam causes ham type text to be learned as spam. That causes
> > ham to get higher scores.
> >
> >
>
> Are you sure of this?  Have you also trained these ham messages to
> counter this effect?  Not too long ago we were in the same situation.  I
> have autolearn enabled but I have adjusted the thresholds to avoid
> learning false positives/negatives.  We were getting ham (although
> arguably - they were newsletter type ham) that was hitting BAYES_99.  As
> soon as i started training them as ham the problem went away.  Spam is
> still detected correctly by bayes and these newsletters no longer hit
> bayes_99.
>
> -Jim
>



-- 
-------------------------------------------------
GNU-GPL: "May The Source Be With You...
-------------------------------------------------

Re: Is Bayes Dead? Have the spammers won?

Posted by Jim Maul <jm...@elih.org>.
Marc Perkel wrote:
> Perhaps what I need to do is to get rid of autolearn and write my own 
> learning system that strips out the body of messages with images and 
> just learns the headers. My problem is that when users get image spam 
> they put it in the spam folders and they get learned. But the text in 
> the image spam causes ham type text to be learned as spam. That causes 
> ham to get higher scores.
> 
> 

Are you sure of this?  Have you also trained these ham messages to
counter this effect?  Not too long ago we were in the same situation.  I
have autolearn enabled but I have adjusted the thresholds to avoid
learning false positives/negatives.  We were getting ham (arguably ham,
anyway: it was newsletter-type mail) that was hitting BAYES_99.  As
soon as I started training them as ham the problem went away.  Spam is
still detected correctly by Bayes and these newsletters no longer hit
BAYES_99.
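
The threshold adjustment can be pictured with a simplified sketch of an
autolearn decision. It assumes the two cut-offs play the role of
SpamAssassin's bayes_auto_learn_threshold_nonspam and
bayes_auto_learn_threshold_spam settings; the values used below (-1.0
and 15.0) are illustrative, not the ones used on this system, and the
real autolearner applies further conditions that are not modeled here.

```python
def autolearn_decision(score, ham_threshold, spam_threshold):
    """Return "ham", "spam", or None (do not learn) for a message score.

    Simplified model: only messages scoring well outside the uncertain
    middle band are fed to Bayes automatically.
    """
    if ham_threshold >= spam_threshold:
        raise ValueError("ham_threshold must be below spam_threshold")
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None  # too uncertain: leave it for manual training


# Assumed, deliberately conservative cut-offs.
for score in (-2.5, 4.0, 18.0):
    print(score, autolearn_decision(score, ham_threshold=-1.0, spam_threshold=15.0))
```

Widening the gap between the two thresholds means fewer messages are
learned automatically, but the ones that are learned are less likely to
be misclassified.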

-Jim

Re: Is Bayes Dead? Have the spammers won?

Posted by "John D. Hardin" <jh...@impsec.org>.
On Fri, 23 Mar 2007, Marc Perkel wrote:

> Perhaps what I need to do is to get rid of autolearn and write my
> own learning system that strips out the body of messages with
> images and just learns the headers. My problem is that when users
> get image spam they put it in the spam folders and they get
> learned. But the text in the image spam causes ham type text to be
> learned as spam. That causes ham to get higher scores.

Perhaps better: purge the learning folders of messages with image 
attachments before learning.
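
A minimal sketch of that purge step, assuming the learning folder has
already been read into a list of raw message bytes. The function names
are hypothetical; the MIME parsing uses Python's standard email package.

```python
from email import message_from_bytes


def has_image_part(raw_message):
    """True if any MIME part of the message has an image/* content type."""
    message = message_from_bytes(raw_message)
    return any(part.get_content_maintype() == "image" for part in message.walk())


def without_image_messages(raw_messages):
    """Keep only the messages that carry no image parts."""
    return [raw for raw in raw_messages if not has_image_part(raw)]
```

Only the messages returned by without_image_messages would then be
handed to the learner; the image-bearing ones are simply skipped.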

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  It is not the place of government to make right every tragedy and
  woe that befalls every resident of the nation.
-----------------------------------------------------------------------
 592 days until the Presidential Election


Re: Is Bayes Dead? Have the spammers won?

Posted by Marc Perkel <ma...@perkel.com>.
Perhaps what I need to do is to get rid of autolearn and write my own 
learning system that strips out the body of messages with images and 
just learns the headers. My problem is that when users get image spam 
they put it in the spam folders and they get learned. But the text in 
the image spam causes ham type text to be learned as spam. That causes 
ham to get higher scores.
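
A rough sketch of the headers-only idea, assuming the message has
already been identified as image spam and is available as raw bytes.
The function name strip_to_headers is hypothetical, and this is only
one way such a learning system might prepare its input.

```python
import re


def strip_to_headers(raw_message):
    """Return the header block of a raw message followed by an empty body."""
    # Headers end at the first blank line (CRLF or bare LF line endings).
    header_block = re.split(rb"\r?\n\r?\n", raw_message, maxsplit=1)[0]
    newline = b"\r\n" if b"\r\n" in raw_message else b"\n"
    return header_block + newline + newline
```

One caveat: the original Content-Type header survives, so a multipart
message ends up declaring parts that are no longer there. Whether the
learner copes with that would need testing.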

Re: Is Bayes Dead? Have the spammers won?

Posted by Johann Spies <js...@sun.ac.za>.
On Thu, Mar 22, 2007 at 09:55:07AM -0700, Marc Perkel wrote:
> Maybe I'm doing something wrong but with the various methods of bayes 
> poisoning going on I've found that bayes is just lowering the score of 
> spam and causing more spam to get through. Where bayes used to be the 
> centerpiece of spam filtering now I have turned it off to increase accuracy.
> 
> Anyone else seeing this or is there some new tricks that I'm missing out on?

We had to lower our Bayesian filter's score from 7.2 to something like
6.4 (8.0 threshold) as a result of the image spam, but it is still doing
a good job.
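
To make the effect of that change concrete, here is a small worked
example. The 7.2, 6.4 and 8.0 figures are the ones quoted above; the
second rule (OTHER_RULE, worth 1.0) is hypothetical, and tagging at
"total >= threshold" is an assumption of this sketch.

```python
def total_score(rule_scores):
    """Sum the scores of all rules that hit, rounded to avoid float noise."""
    return round(sum(rule_scores.values()), 2)


def is_tagged(rule_scores, threshold):
    """True if the combined score reaches the spam threshold."""
    return total_score(rule_scores) >= threshold


THRESHOLD = 8.0
old_hits = {"BAYES_99": 7.2, "OTHER_RULE": 1.0}  # OTHER_RULE is made up
new_hits = {"BAYES_99": 6.4, "OTHER_RULE": 1.0}

print(total_score(old_hits), is_tagged(old_hits, THRESHOLD))
print(total_score(new_hits), is_tagged(new_hits, THRESHOLD))
```

With the lower Bayes score, a message needs 1.6 points from other rules
to reach the threshold instead of 0.8, so Bayes alone carries less
weight in the final decision.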

My experience with FuzzyOCR was not good enough to implement it on all
our mail servers.  Exim had regular problems with the feedback from
SpamAssassin when FuzzyOCR was active, and recently SpamAssassin died
because of some problem FuzzyOCR had with some messages, so I disabled
it on the one server where I was trying it out.

The result is more image spam.  Maybe it is time to rebuild the Bayesian
database from "clean" spam (excluding image spam) and a lot of ham
messages.

Regards
Johann
-- 
Johann Spies          Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

     "Jesus said unto her, I am the resurrection, and the 
      life; he that believeth in me, though he were dead, 
      yet shall he live; And whosoever liveth and believeth 
      in me shall never die.        John 11:25,26 

Re: Is Bayes Dead? Have the spammers won?

Posted by Anthony Peacock <a....@chime.ucl.ac.uk>.
Hi,

My Bayes is just as accurate as it has always been.

The false negatives I do get almost all hit BAYES_99; they just don't
have enough other rule hits to raise the overall score above the threshold.

Marc Perkel wrote:
> Maybe I'm doing something wrong but with the various methods of bayes 
> poisoning going on I've found that bayes is just lowering the score of 
> spam and causing more spam to get through. Where bayes used to be the 
> centerpiece of spam filtering now I have turned it off to increase 
> accuracy.
> 
> Anyone else seeing this or is there some new tricks that I'm missing out 
> on?
> 
> 


-- 
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"If you have an apple and I have  an apple and we  exchange apples
then you and I will still each have  one apple. But  if you have an
idea and I have an idea and we exchange these ideas, then each of us
will have two ideas." -- George Bernard Shaw

Re: Is Bayes Dead? Have the spammers won?

Posted by Jason Marshall <ma...@spots.ab.ca>.
I was wondering the same thing, idly.  Then one day my Bayes stopped 
working and I went from 30-40 spams getting through in a day to 500-600 
getting through.  Believe me, I think Bayes is doing a decent job of 
adding to the scores of spammy messages...

> Maybe I'm doing something wrong but with the various methods of bayes 
> poisoning going on I've found that bayes is just lowering the score of spam 
> and causing more spam to get through. Where bayes used to be the centerpiece 
> of spam filtering now I have turned it off to increase accuracy.
>
> Anyone else seeing this or is there some new tricks that I'm missing out on?
>

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
| Jason Marshall, marshalj@spots.ab.ca. Spots InterConnect, Inc. Calgary, AB |
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Re: Is Bayes Dead? Have the spammers won?

Posted by Rajkumar S <ra...@gmail.com>.
On 3/22/07, Kris Deugau <kd...@vianet.ca> wrote:
> Anyone using SA in an ISP environment will run into this problem;

I agree here. I am using SA at an ISP and I have disabled Bayes. There
is no way I can get a regular, good supply of ham from our customers. No
one wants to forward their good mail to me (or any ISP) regularly to
train Bayes.  And we have a wide spectrum of customers, so Bayes will
do more harm than good if I do not get enough volume of mail
for training.

I am interested in hearing from anyone using Bayes at an ISP, though.

raj

Re: Is Bayes Dead? Have the spammers won?

Posted by "John D. Hardin" <jh...@impsec.org>.
On Thu, 22 Mar 2007, Kris Deugau wrote:

> John D. Hardin wrote:
> > I've never trusted automatic learning. Why let your Bayes database be 
> > (even partially) under the control of a third party, particularly 
> > when that third party is the attacker?
> 
> Because there's no other (practical and/or ethical) way of getting 
> enough ham to make it useful?

Fair enough. I've only ever administered a limited-size trusted
environment (small corporate and personal).

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  USMC Rules of Gunfighting #7: In ten years nobody will remember the
  details of caliber, stance, or tactics. They will only remember who
  lived.
-----------------------------------------------------------------------
 593 days until the Presidential Election


Re: Is Bayes Dead? Have the spammers won?

Posted by Kris Deugau <kd...@vianet.ca>.
John D. Hardin wrote:
> I've never trusted automatic learning. Why let your Bayes database be 
> (even partially) under the control of a third party, particularly 
> when that third party is the attacker?

Because there's no other (practical and/or ethical) way of getting 
enough ham to make it useful?

Anyone using SA in an ISP environment will run into this problem;  about 
the only way I can see to legitimately get any real volume of ham is to 
send customers' outbound mail into a learning queue somewhere.  Even 
that has its limits and issues - for instance, the fact that any ISP 
larger than a few thousand customers will likely have completely 
separate paths for inbound and outbound mail, which *will* affect the 
usefulness of the learning.  :/

I've been running the same Bayes databases on one system and my personal
email since I upgraded from SA 2.44 to 2.54 and started using Bayes;  I'd
be running the original Bayes DB on another system too, if I had figured
out at the time that I *could* just continue to use the exact same files
when upgrading 2.64->3.1.7.

Accuracy on the continuous-use databases hasn't suffered for the 
autolearning, so far as I can tell...  but the more out-of-date SA 
itself got the worse it was at tagging spam.

I *do* regularly feed back both my own missed-spams (my account, and 
three role accounts), as well as customer-submitted missed-spam.  Lately 
there have only been four or five (reported) FNs per day, across the 
whole system.

-kgd

Re: Is Bayes Dead? Have the spammers won?

Posted by "John D. Hardin" <jh...@impsec.org>.
On Thu, 22 Mar 2007, Marc Perkel wrote:

> Maybe I'm doing something wrong but with the various methods of
> bayes poisoning going on I've found that bayes is just lowering
> the score of spam and causing more spam to get through. Where
> bayes used to be the centerpiece of spam filtering now I have
> turned it off to increase accuracy.

I've never trusted automatic learning. Why let your Bayes database be 
(even partially) under the control of a third party, particularly 
when that third party is the attacker?

If a spam technique is found that scores low, or that keeps the
commercial message out of the textual parts of the message, and you
have automatic learning turned on, then the bad guys can influence
your token balance to some degree.

Hand-trained Bayes can't be affected by poisoning.
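
To illustrate how automatically learned messages can shift the token
balance, here is a deliberately simplified per-token estimate. It is
not SpamAssassin's actual algorithm, which combines token evidence
differently; the function name and all of the counts are made up for
illustration.

```python
def token_spam_probability(spam_count, ham_count, total_spam, total_ham):
    """Estimate how spammy a token is from how often it appears in each class."""
    spam_rate = spam_count / total_spam if total_spam else 0.0
    ham_rate = ham_count / total_ham if total_ham else 0.0
    if spam_rate + ham_rate == 0:
        return 0.5  # never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# An ordinary word: seen in 5 of 1000 spam and 200 of 1000 ham messages.
before = token_spam_probability(5, 200, 1000, 1000)

# 300 more messages containing that word are learned as spam.
after = token_spam_probability(305, 200, 1300, 1000)

print(round(before, 3), round(after, 3))
```

In this toy model a word that was strong evidence of ham becomes weak
evidence of spam purely because of what was learned, without a single
ham message changing.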

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  USMC Rules of Gunfighting #7: In ten years nobody will remember the
  details of caliber, stance, or tactics. They will only remember who
  lived.
-----------------------------------------------------------------------
 593 days until the Presidential Election