Posted to users@spamassassin.apache.org by Joe Flowers <fl...@social.chass.ncsu.edu> on 2005/02/16 14:26:43 UTC

Time for my monthly beating again...

Later today I'll be implementing a "drifting" spam/ham dividing line 
(one "line" for the entire system - not individually set per email 
account) to see how effective it is or how effective it appears to be.

I'm curious to know if the dividing line will drift into a wall on some 
self-imposed boundary edge or if it will converge to a point for us or 
if it will slowly drift around in circles.

I'm "determining" the dividing line by taking the average of all of the 
SA hits of all of the messages and changing the dividing line, on the 
fly, for each subsequent message.

Anyone want to tell me or speculate on how this experiment will end or 
what it will tell me, whether I'm listening or not?

For us, SA *seems* to score SPAM messages with lower and lower hit 
scores as time goes by, and the users get more and more glassy-eyed over 
its ("my" if you prefer) effectiveness as time goes by too.

I've spent a lot of time with the bayesian stuff and sa-learn, but still 
it seems to drift downward.

And, I have to agree that SA is very good but requires a lot of 
attention by someone who knows what they are doing - which, of course, 
may or may not be me.

Nonetheless, I have this problem before me and am attempting a possible 
solution.

Joe

Re: Time for my monthly beating again...

Posted by Martin Hepworth <ma...@solid-state-logic.com>.

Joe

ahh well then....the additional rules from www.rulesemporium.com (not 
bigevil.cf) will help a lot.

as will the URI-RBL extras from www.surbl.org (see 
http://sourceforge.net/projects/spamcopuri/ for a 2.64 patch to enable 
this).


--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300


Joe Flowers wrote:
> 2.64 currently...I'm hoping to move to 3.0x soon...after I see how this 
> experiment goes.
> It's just a plain-jane out-of-the-box install, nothing special, except 
> maybe I'm doing AWL checks too, which I've seen from the list can cause 
> some headaches from its use or misuse. Although, I have run this without 
> AWL and the same drifting downward seemed to be occuring then too.
> 
> Oh yeah, I think I've disabled the ALL_TRUSTED rule...can't think of 
> anything else.
> 
> J
> 
> Martin Hepworth wrote:
> 
>> Joe
>>
>> what SA version and what extra rules? Using the URI-RBL's?
>>
>> -- 
>> Martin Hepworth
>> Snr Systems Administrator
>> Solid State Logic
>> Tel: +44 (0)1865 842300
>>
>>
>> Joe Flowers wrote:
>>
>>> Later today I'll be implementing a "drifting" spam/ham dividing line 
>>> (one "line" for the entire system - not individually set per email 
>>> account) to see how effective it is or how effective it appears to be.
>>>
>>> I'm curious to know if the dividing line will drift into a wall on 
>>> some self-imposed boundary edge or if it will converge to a point for 
>>> us or if it will slowly drift around in circles.
>>>
>>> I'm "determining" the dividing line by taking the average of all of 
>>> the SA hits of all of the messages and changing the dividing line, on 
>>> the fly, for each subsequent message.
>>>
>>> Anyone want to tell me or speculate on how this experiment will end 
>>> or what it will tell me, whether I'm listening or not?
>>>
>>> For us, SA *seems* to score SPAM messages with lower and lower hit 
>>> scores as time goes by, and the users get more and more glassy-eyed 
>>> over it's ("my" if you prefer) effectiveness as time goes by too.
>>>
>>> I've spent a lot of time with the bayesian stuff and sa-learn, but 
>>> still it seems to drift downward.
>>>
>>> And, I have to agree that SA is very good but requires a lot of 
>>> attention by someone who knows what they are doing - which, of 
>>> course, may or may not be me.
>>>
>>> Nonetheless, I have this problem before me and am attempting a 
>>> possible solution.
>>>
>>> Joe
>>>
>>>
>>
>> **********************************************************************
>>
>> This email and any files transmitted with it are confidential and
>> intended solely for the use of the individual or entity to whom they
>> are addressed. If you have received this email in error please notify
>> the system manager.
>>
>> This footnote confirms that this email message has been swept
>> for the presence of computer viruses and is believed to be clean.
>>
>> **********************************************************************
>>
>>
> 
> 

Re: Time for my monthly beating again...

Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.

2.64 currently...I'm hoping to move to 3.0x soon...after I see how this 
experiment goes.
It's just a plain-jane out-of-the-box install, nothing special, except 
maybe I'm doing AWL checks too, which I've seen from the list can cause 
some headaches from its use or misuse. Although, I have run this without 
AWL and the same drifting downward seemed to be occurring then too.

Oh yeah, I think I've disabled the ALL_TRUSTED rule...can't think of 
anything else.

J

Martin Hepworth wrote:

> Joe
>
> what SA version and what extra rules? Using the URI-RBL's?
>
> -- 
> Martin Hepworth
> Snr Systems Administrator
> Solid State Logic
> Tel: +44 (0)1865 842300
>
>
> Joe Flowers wrote:
>
>> Later today I'll be implementing a "drifting" spam/ham dividing line 
>> (one "line" for the entire system - not individually set per email 
>> account) to see how effective it is or how effective it appears to be.
>>
>> I'm curious to know if the dividing line will drift into a wall on 
>> some self-imposed boundary edge or if it will converge to a point for 
>> us or if it will slowly drift around in circles.
>>
>> I'm "determining" the dividing line by taking the average of all of 
>> the SA hits of all of the messages and changing the dividing line, on 
>> the fly, for each subsequent message.
>>
>> Anyone want to tell me or speculate on how this experiment will end 
>> or what it will tell me, whether I'm listening or not?
>>
>> For us, SA *seems* to score SPAM messages with lower and lower hit 
>> scores as time goes by, and the users get more and more glassy-eyed 
>> over it's ("my" if you prefer) effectiveness as time goes by too.
>>
>> I've spent a lot of time with the bayesian stuff and sa-learn, but 
>> still it seems to drift downward.
>>
>> And, I have to agree that SA is very good but requires a lot of 
>> attention by someone who knows what they are doing - which, of 
>> course, may or may not be me.
>>
>> Nonetheless, I have this problem before me and am attempting a 
>> possible solution.
>>
>> Joe
>>
>>
>

Re: Time for my monthly beating again...

Posted by Martin Hepworth <ma...@solid-state-logic.com>.

Joe

what SA version and what extra rules? Using the URI-RBL's?

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300


Joe Flowers wrote:
> Later today I'll be implementing a "drifting" spam/ham dividing line 
> (one "line" for the entire system - not individually set per email 
> account) to see how effective it is or how effective it appears to be.
> 
> I'm curious to know if the dividing line will drift into a wall on some 
> self-imposed boundary edge or if it will converge to a point for us or 
> if it will slowly drift around in circles.
> 
> I'm "determining" the dividing line by taking the average of all of the 
> SA hits of all of the messages and changing the dividing line, on the 
> fly, for each subsequent message.
> 
> Anyone want to tell me or speculate on how this experiment will end or 
> what it will tell me, whether I'm listening or not?
> 
> For us, SA *seems* to score SPAM messages with lower and lower hit 
> scores as time goes by, and the users get more and more glassy-eyed over 
> it's ("my" if you prefer) effectiveness as time goes by too.
> 
> I've spent a lot of time with the bayesian stuff and sa-learn, but still 
> it seems to drift downward.
> 
> And, I have to agree that SA is very good but requires a lot of 
> attention by someone who knows what they are doing - which, of course, 
> may or may not be me.
> 
> Nonetheless, I have this problem before me and am attempting a possible 
> solution.
> 
> Joe
> 
> 

Re: Time for my monthly beating again...

Posted by Kevin Peuhkurinen <ke...@hepcoe.com>.

Hey Joe.  My 2.64 install is running so well, I almost don't want to 
upgrade to 3.0.2, and I really don't need to spend too much time on it 
to keep it that way.     Perhaps you just need to devote a couple of 
days to do some tweaking and thereafter it should run well on its own.   
Finding out what works for others and taking the time to implement it 
would probably be a better use of your time than to attempt the 
experiment you are suggesting.

My current set up is:  SA runs as a relay, checking email then passing 
it on to our Exchange server.    I have the following extra rulesets 
taken from rulesemporium.com:
70_sare_adult.cf            
70_sare_html0.cf     
70_sare_bayes_poison_nxm.cf 
70_sare_spoof.cf
70_sare_genlsubj0.cf        
70_sare_html1.cf     
72_sare_bml_post25x.cf
70_sare_header0.cf          
70_sare_random.cf   

I am also using DCC, Razor2, and SpamcopURI.   My Bayes database is 
global with autolearning and I am not using AWL.  I've tweaked the 
scores on some of the tests and disabled a few tests.

I have it set up such that anything that hits 3.5 or higher is considered 
spam. Anything that scores 8 or higher, which is the vast majority of 
spam (about 2000 emails per day), is kept on the SA server, and a script 
automatically deletes any emails over 2 weeks old. Anything from 3.5 to 
7 is sent to a special mailbox on my Exchange server. This is about 
100-200 emails per day. I spend about five minutes each morning 
glancing through these emails looking for false positives, of which I 
see at most one per week. These are forwarded to the correct recipient 
and copies of them are placed into a special "false negative" folder. 
I also have a special "false positive" folder into which my users can 
drag and drop spam that gets through to them. On a typical day I see 
between five and ten emails put into there, which out of about 450 users 
is pretty darn good. I spend another five minutes each afternoon 
looking at these emails and making sure that they are in fact spam. A 
script on the SA server runs automatically each night and feeds the 
false negatives and positives through sa-learn. If I start to see a 
bunch of similar spams getting through, I'll spend an hour or two 
writing, testing, and deploying a rule to catch them. This happens 
about once a month.
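
The three-way split described above could be sketched in C roughly as 
follows. This is only an illustration, not the actual relay script: the 
names are made up, and it assumes that scores between 7 and 8 also go 
to the review mailbox, since the message does not say where they land.

```c
/* Hypothetical names; only the two thresholds come from the message. */
enum disposition { DELIVER, REVIEW_MAILBOX, HOLD_ON_SA_SERVER };

#define SPAM_THRESHOLD 3.5  /* "3.5 or higher is considered spam" */
#define HOLD_THRESHOLD 8.0  /* "8 or higher ... is kept on the SA server" */

/* Decide where a message goes from its SpamAssassin score.
 * Assumption: anything from 3.5 up to (but not including) 8 is
 * sent to the special mailbox for the morning false-positive check. */
enum disposition route_message(double score)
{
    if (score >= HOLD_THRESHOLD)
        return HOLD_ON_SA_SERVER;
    if (score >= SPAM_THRESHOLD)
        return REVIEW_MAILBOX;
    return DELIVER;
}
```

The point of the middle tier is that only the borderline scores, where 
false positives are most likely, need a human to look at them.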

So, all in all I generally spend 10-15 minutes per day looking after SA 
while achieving very satisfactory false positive and negative rates. 
The only reason I'm bothering to upgrade is that I'm starting to see 
some SA timeouts as my production server is a 350 MHz clunker. I got 
myself a brand new 3 GHz server to take over and figure that I ought to 
do the upgrade to 3.0.2 at the same time.

Joe Flowers wrote:

> Later today I'll be implementing a "drifting" spam/ham dividing line 
> (one "line" for the entire system - not individually set per email 
> account) to see how effective it is or how effective it appears to be.
>
> I'm curious to know if the dividing line will drift into a wall on 
> some self-imposed boundary edge or if it will converge to a point for 
> us or if it will slowly drift around in circles.
>
> I'm "determining" the dividing line by taking the average of all of 
> the SA hits of all of the messages and changing the dividing line, on 
> the fly, for each subsequent message.
>
> Anyone want to tell me or speculate on how this experiment will end or 
> what it will tell me, whether I'm listening or not?
>
> For us, SA *seems* to score SPAM messages with lower and lower hit 
> scores as time goes by, and the users get more and more glassy-eyed 
> over it's ("my" if you prefer) effectiveness as time goes by too.
>
> I've spent a lot of time with the bayesian stuff and sa-learn, but 
> still it seems to drift downward.
>
> And, I have to agree that SA is very good but requires a lot of 
> attention by someone who knows what they are doing - which, of course, 
> may or may not be me.
>
> Nonetheless, I have this problem before me and am attempting a 
> possible solution.
>
> Joe
>
>
>

Re: Time for my monthly beating again...

Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.

Interesting Chris...thanks for the feedback...at least maybe I'm still 
on the planet somewhere..

My "monthly" word means that I've been feeling too good about myself 
lately, so I'm due for a slap-down on how dumb I am.

J

Chr. von Stuckrad wrote:

>On Wed, Feb 16, 2005 at 08:26:43AM -0500, Joe Flowers wrote:
>  
>
>>For us, SA *seems* to score SPAM messages with lower and lower hit 
>>scores as time goes by, and the users get more and more glassy-eyed over 
>>it's ("my" if you prefer) effectiveness as time goes by too.
>>    
>>
>
>OH, interesting, I think I had the same effect with a
>GLOBAL bayes database deteriorating slowly
>(being slowly poisoned, I assumed).
>
>I have two completely identical servers, one *working*
>as virus-, the other as spam-filter (so both can be switched
>to do both, if one breaks).
>
>But only the 'actualy running spamfilter' has the
>'actual/current' bayes database.
>
>Testing the same new/undetected spam gave me slowly decreasing
>Values on the 'learning' and 'nearly the same(*)' values on
>the 'not-learning' machine! (* both hosts get my updated configs,
>so values changed anyway).
>
>I simply dropped the 'rotten' bayes database and the problem
>went away ... I'm waiting what comes up now ...
>(does your subject ipmply the poison to work monthly?)
>
>Stucki
>
>  
>

Re: Time for my monthly beating again...

Posted by "Chr. von Stuckrad" <st...@mi.fu-berlin.de>.

On Wed, Feb 16, 2005 at 08:26:43AM -0500, Joe Flowers wrote:
> For us, SA *seems* to score SPAM messages with lower and lower hit 
> scores as time goes by, and the users get more and more glassy-eyed over 
> it's ("my" if you prefer) effectiveness as time goes by too.

OH, interesting, I think I had the same effect with a
GLOBAL bayes database deteriorating slowly
(being slowly poisoned, I assumed).

I have two completely identical servers, one *working*
as virus filter, the other as spam filter (so either can be switched
to do both, if one breaks).

But only the 'actually running spam filter' has the
'actual/current' bayes database.

Testing the same new/undetected spam gave me slowly decreasing
values on the 'learning' machine and 'nearly the same(*)' values on
the 'not-learning' machine! (* both hosts get my updated configs,
so values changed anyway).

I simply dropped the 'rotten' bayes database and the problem
went away ... I'm waiting to see what comes up now ...
(does your subject imply the poison works monthly?)

Stucki

-- 
Christoph von Stuckrad     * * |nickname |<st...@math.fu-berlin.de>\
Freie Universitaet Berlin  |/_*|'stucki' |Tel(days):+49 30 838-75 459|
Fachbereich Mathematik, EDV|\ *|if online|Tel(else):+49 30 77 39 6600|
Arnimallee 2-6/14195 Berlin* * |on IRCnet|Fax(alle):+49 30 838-75454/

Re: Time for my monthly beating again...

Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.

Michael,

I apologize for the perceived or real hostility....People have told me 
of that implementation before, and that implementation is perfectly 
fine with me. More power to them, best wishes, and all the best. Let's 
put some added value into NetMail, which I think is a great product, and 
help some people out. But it always seems (perhaps falsely) to come in 
the context of "why are you wasting your time?" Actually, my hope is 
that I can help SpamAssassin and any other implementations, including the 
one you've identified, as much as I possibly can. It's only going to help 
me and many others in the end.

Again, I apologize Michael, but I do hope you understand that from my 
perspective, what I've done is not a waste of time.

Sincerely,

Joe

Michael Parker wrote:

>On Sat, Feb 19, 2005 at 01:16:39AM -0500, Joe Flowers wrote:
>  
>
>>I know of that implemenation. And, I'm sure there are pluses and minus 
>>to both implementations.
>>
>>I've already tested my replacement spamd on SA 3.02 and it works the 
>>same with no problems found.
>>I know there are a deprecated call or two (get_hits for example) but I 
>>see no reason that the new calls won't work fine.
>>
>>I bet my tiny little C program is a lot faster than the spamd 
>>implementation with a fraction of the resource consumption and problems.
>>
>>Also, a very significant part of my server-side load is not being 
>>shouldered by the already heavily burdened NMAP NetMail Agent.
>>
>>Do as you wish, but I would bet my ragged little implementation is built 
>>on a potentially much much faster and much more scalable and much more 
>>generic (say many more options) foundation.
>>
>>i.e., I fear not.
>>
>>    
>>
>
>Wow, not sure where the hostility came from.  I'm sure your code is
>much better than anything existing.  I was just providing a pointer to
>one Netmail implementation.  I myself didn't know about it til today,
>so just spreading the love.
>
>Michael
>  
>

Re: Time for my monthly beating again...

Posted by Michael Parker <pa...@pobox.com>.

On Sat, Feb 19, 2005 at 01:16:39AM -0500, Joe Flowers wrote:
> I know of that implemenation. And, I'm sure there are pluses and minus 
> to both implementations.
> 
> I've already tested my replacement spamd on SA 3.02 and it works the 
> same with no problems found.
> I know there are a deprecated call or two (get_hits for example) but I 
> see no reason that the new calls won't work fine.
> 
> I bet my tiny little C program is a lot faster than the spamd 
> implementation with a fraction of the resource consumption and problems.
> 
> Also, a very significant part of my server-side load is not being 
> shouldered by the already heavily burdened NMAP NetMail Agent.
> 
> Do as you wish, but I would bet my ragged little implementation is built 
> on a potentially much much faster and much more scalable and much more 
> generic (say many more options) foundation.
> 
> i.e., I fear not.
> 

Wow, not sure where the hostility came from.  I'm sure your code is
much better than anything existing.  I was just providing a pointer to
one Netmail implementation.  I myself didn't know about it til today,
so just spreading the love.

Michael

Re: Time for my monthly beating again...

Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.

I know of that implementation. And, I'm sure there are pluses and minuses 
to both implementations.

I've already tested my replacement spamd on SA 3.0.2 and it works the 
same with no problems found.
I know there is a deprecated call or two (get_hits for example) but I 
see no reason that the new calls won't work fine.

I bet my tiny little C program is a lot faster than the spamd 
implementation with a fraction of the resource consumption and problems.

Also, a very significant part of my server-side load is not being 
shouldered by the already heavily burdened NMAP NetMail Agent.

Do as you wish, but I would bet my ragged little implementation is built 
on a potentially much much faster and much more scalable and much more 
generic (say many more options) foundation.

i.e., I fear not.

Joe

Michael Parker wrote:

>On Sat, Feb 19, 2005 at 12:55:24AM -0500, Joe Flowers wrote:
>  
>
>>I'll try to keep it as short as possible.
>>
>>By my preference and from hearing continuing horror stories about spamd, 
>>I have a C program in the place of spamd. It makes calls to Perl - Perl 
>>is "embedded" in the C program. The C spamd replacement talks to a C 
>>program running on our NetWare NetMail (soon to be Hula) email servers. 
>>Actually, the same Linux box running SpamAssassin uses this spamd 
>>replacement to talk to 3 different email servers over TCP sockets at the 
>>same time.
>>    
>>
>
>I assume you've seen this:
>http://netmail.sourceforge.net/
>
>Old version of spamd, but has all the NMAP magic needed to play nicely
>with Netmail and most likely Hula.
>
>Michael
>  
>

Re: Time for my monthly beating again...

Posted by Michael Parker <pa...@pobox.com>.

On Sat, Feb 19, 2005 at 12:55:24AM -0500, Joe Flowers wrote:
> I'll try to keep it as short as possible.
> 
> By my preference and from hearing continuing horror stories about spamd, 
> I have a C program in the place of spamd. It makes calls to Perl - Perl 
> is "embedded" in the C program. The C spamd replacement talks to a C 
> program running on our NetWare NetMail (soon to be Hula) email servers. 
> Actually, the same Linux box running SpamAssassin uses this spamd 
> replacement to talk to 3 different email servers over TCP sockets at the 
> same time.

I assume you've seen this:
http://netmail.sourceforge.net/

Old version of spamd, but has all the NMAP magic needed to play nicely
with Netmail and most likely Hula.

Michael

Re: Time for my monthly beating again...

Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.

I'll try to keep it as short as possible.

By my preference and from hearing continuing horror stories about spamd, 
I have a C program in the place of spamd. It makes calls to Perl - Perl 
is "embedded" in the C program. The C spamd replacement talks to a C 
program running on our NetWare NetMail (soon to be Hula) email servers. 
Actually, the same Linux box running SpamAssassin uses this spamd 
replacement to talk to 3 different email servers over TCP sockets at the 
same time.

The default SpamAssassin v2.64 
(Mail::SpamAssassin::PerMsgStatus::get_hits) score of 5.0 corresponds to 
51 "+" marks that are placed into each incoming email message header. 
The NetMail "Rule server" on the email servers then filters on those 
pluses. The more pluses, the more likely the message is a Spam message. 
Every user can adjust his or her threshold away from the 51 default, so 
the user does have some control over his/her own Spam settings. The 
program on the email servers will never put more than 101 "+" marks nor 
fewer than 1 "+" mark in any email header. If the number of pluses in the 
message header reaches or exceeds the threshold set by the user, 
then the message is filtered by the NetMail Rule server and placed in the 
user's "MostlySpam" folder, i.e., server-side filtering.
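
To make the threshold comparison concrete, here is a minimal C sketch 
of the check on the filtering side. It is not how the NetMail Rule 
server is actually implemented (that is not described here); the 
function names are hypothetical, and it assumes the header value is 
just a run of "+" characters, possibly after some leading whitespace.

```c
#include <stddef.h>

/* Count the run of '+' marks at the start of a header value,
 * skipping any leading spaces or tabs. */
size_t count_plus_marks(const char *value)
{
    size_t n = 0;
    while (*value == ' ' || *value == '\t')
        value++;
    while (value[n] == '+')
        n++;
    return n;
}

/* Returns 1 if the message reaches or exceeds the user's threshold
 * (51 by default) and so belongs in "MostlySpam", otherwise 0. */
int is_mostly_spam(const char *header_value, size_t user_threshold)
{
    return count_plus_marks(header_value) >= user_threshold;
}
```

A user who lowers the threshold below 51 filters more aggressively; one 
who raises it lets more borderline mail through.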

A SA get_hits score of 0 or less corresponds to 1 "+" mark in the email 
message header.
A SA get_hits score of 10 or more corresponds to 101 "+" marks in the 
email message header.
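
The fixed mapping just described (before any drifting is added) might 
look like the following in C. This is a sketch under the figures given 
above: 10 pluses per point, 51 pluses at a score of 5.0, and hard 
limits of 1 and 101. The function and macro names are made up for the 
example.

```c
#include <string.h>

#define MIN_PLUSES     1
#define MAX_PLUSES     101
#define DIVIDING_LINE  5.0   /* default SpamAssassin score for "spam" */
#define PLUSES_AT_LINE 51.0  /* pluses written for a score of exactly 5.0 */

/* Map a get_hits score onto 1..101 plus marks, keeping the tenths. */
int score_to_pluses(double hits)
{
    double raw = 10.0 * (hits - DIVIDING_LINE) + PLUSES_AT_LINE;
    if (!(raw > MIN_PLUSES))    /* also catches NaN */
        return MIN_PLUSES;
    if (raw > MAX_PLUSES)
        return MAX_PLUSES;
    return (int)(raw + 0.5);    /* raw is positive here: rounds half up */
}

/* Fill buf (at least MAX_PLUSES + 1 bytes) with the plus marks. */
void fill_pluses(char *buf, double hits)
{
    int n = score_to_pluses(hits);
    memset(buf, '+', (size_t)n);
    buf[n] = '\0';
}
```

With these numbers a score of 0 or less gives 1 plus, 5.0 gives 51, and 
10 or more gives 101, matching the end points listed above.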

Right or wrong (?), I thought that since SA defaults to 5.0, then most 
of the crucial action must be happening between 0 and 10. Also, I didn't 
want to deal with NULL problems that are associated with 0 "+" marks in 
the headers, and I didn't want to clog up the headers unnecessarily with 
an ungodly number of pluses, but I still wanted as fine a control as 
possible within a get_hits of 0 and 10 - I didn't want to just discard 
the significant information held in the tenths spot of the get_hits score.

On the spamd replacement side, the average of all of the get_hits of all 
of the messages is stored in a very small ("tiny") text file, along 
with the number of messages contributing to this average number - the 
total number of messages processed.
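
As an illustration of that tiny text file, here is one way the two 
numbers could be read and written in C. The on-disk layout (average, 
then count, on one line) and all names are assumptions for the sketch, 
and it leaves out file locking, which would matter with several 
connections updating the file at once.

```c
#include <stdio.h>

struct drift_state {
    double average;       /* running mean of the get_hits scores */
    unsigned long count;  /* messages that contributed to the mean */
};

/* Load the state; returns 0 on success, -1 if the file is missing or
 * malformed (the caller should then not trust *st and can start over
 * from average 0.0, count 0). */
int load_state(const char *path, struct drift_state *st)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int fields = fscanf(fp, "%lf %lu", &st->average, &st->count);
    fclose(fp);
    return fields == 2 ? 0 : -1;
}

/* Overwrite the file with the current state; returns 0 on success. */
int save_state(const char *path, const struct drift_state *st)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    /* %.17g keeps enough digits to round-trip a double exactly. */
    int ok = fprintf(fp, "%.17g %lu\n", st->average, st->count) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```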

Basically and roughly:
-----------------------------------------------------------------------------

hits = get_hits; // Mail::SpamAssassin::PerMsgStatus::get_hits

// Let's try to keep control of the "outliers" - prevent the averages
// from being so sensitive to large positive or negative get_hits
// values. Hopefully, the averages will never reach -20 or 30 (the
// "walls"). If they do run into these walls, then we either need to
// adjust or broaden these limits, abandon this technique altogether,
// or re-think the implementation.

if (hits < -20.0) { hits = -20.0; }
if (hits > 30.0) { hits = 30.0; }

// Read the "OldAverage" and "TotalNumberOfMessagesProcessed" from the
// tiny text file.

NumberOfPluses = (10.0 * (hits - OldAverage)) + 51.0;

// Round NumberOfPluses off correctly: keep the integer part, then add
// one if the fraction is at least one half.
FractionPartOfNumberOfPluses = modf(NumberOfPluses,
                                    &IntegerPartOfNumberOfPluses);
NumberOfPluses = IntegerPartOfNumberOfPluses;
if (FractionPartOfNumberOfPluses >= 0.5) {
    NumberOfPluses = NumberOfPluses + 1.0;
}

// Put an upper and lower bound on the number of pluses (+).
if (NumberOfPluses < 1.0) { NumberOfPluses = 1.0; }
if (NumberOfPluses > 101.0) { NumberOfPluses = 101.0; }

NewAverage = (OldAverage * TotalNumberOfMessagesProcessed) + hits;
TotalNumberOfMessagesProcessed++;
NewAverage = NewAverage / TotalNumberOfMessagesProcessed;

// Update the "OldAverage" (with NewAverage) and
// "TotalNumberOfMessagesProcessed" in the tiny text file.

-----------------------------------------------------------------------------

That's the heart of it.... I hope that made sense with enough meat.
The jury is still out of course and I've got my fingers crossed, but 
everything is still going very very well.
Right now, we're somewhere around the 15K TotalNumberOfMessagesProcessed 
mark.
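
One thing worth checking at that message count is how far a single 
message can still move the average. The sketch below writes the same 
update in incremental form; the function name is made up, and it is 
only a way to see the step size, not a change to the approach.

```c
/* Running mean after folding in one more score. Algebraically the same
 * as (old_average * count + hits) / (count + 1), but it shows the step
 * directly: the average moves by (hits - old_average) / (count + 1). */
double updated_average(double old_average, unsigned long count, double hits)
{
    return old_average + (hits - old_average) / (double)(count + 1);
}
```

For example, with an average of -1.0 after 15000 messages, a message 
sitting right at the 30.0 wall moves the average by 31/15001, or about 
0.002 points, which is roughly a fiftieth of one "+" mark. So a plain 
cumulative average like this will tend to settle down as the count 
grows rather than keep drifting; whether that is the desired behaviour 
is a separate question.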

Just looking at it, SpamAssassin itself has to be doing an incredibly 
good job at identifying and scoring these messages from a relativistic 
point of view; otherwise, there is no way I would be seeing these great 
results, and I probably would have run into a "wall" long before now.

Joe
------------------------------------------------------------

Joe Emenaker wrote:

> Joe Flowers wrote:
>
>> Very preliminary results are no less than AWESOME.
>
>
> So... how are you implementing the "drifting" spam threshold?
>
> - Joe

Re: Time for my monthly beating again...

Posted by Joe Emenaker <jo...@emenaker.com>.

Joe Flowers wrote:

> Very preliminary results are no less than AWESOME.

So... how are you implementing the "drifting" spam threshold?

- Joe

Re: Time for my monthly beating again...

Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.

Very preliminary results are no less than AWESOME. I'm seeing and people 
are reporting much higher rates of Spam being caught with no *reports* 
of an increase in false-positives. We'll see if that continues; the 
proof's in the pudding. No sign of the dividing line drifting into a wall 
yet. It seems to be drifting between average SA scores of -1.44 to -0.5 
instead of being fixed at 5.00 as before. I hope the SA developers will 
take notice and improve upon the idea.
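
To put those averages in terms of SA scores, here is a small C sketch. 
It assumes the plus-mark scheme described elsewhere in this thread (10 
"+" marks per point, 51 marks at the dividing line, at most 101 marks, 
default user threshold of 51); the function name is hypothetical and 
rounding of the plus count is ignored.

```c
/* SA score at which a message reaches `user_threshold` plus marks,
 * given the current system-wide average. */
double effective_cutoff(int user_threshold, double average)
{
    return average + ((double)user_threshold - 51.0) / 10.0;
}
```

With the default threshold of 51 the cut-off is simply the average 
itself, so somewhere between -1.44 and -0.5 rather than 5.0, which 
would go a long way toward explaining why more Spam is being caught. 
It also means that with an average of -1.0 the highest cut-off a user 
can select (threshold 101) is a score of 4.0, so false positives among 
low-scoring ham are the thing to keep watching.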

Joe

Joe Flowers wrote:

> Later today I'll be implementing a "drifting" spam/ham dividing line 
> (one "line" for the entire system - not individually set per email 
> account) to see how effective it is or how effective it appears to be.
>
> I'm curious to know if the dividing line will drift into a wall on 
> some self-imposed boundary edge or if it will converge to a point for 
> us or if it will slowly drift around in circles.
>
> I'm "determining" the dividing line by taking the average of all of 
> the SA hits of all of the messages and changing the dividing line, on 
> the fly, for each subsequent message.
>
> Anyone want to tell me or speculate on how this experiment will end or 
> what it will tell me, whether I'm listening or not?
>
> For us, SA *seems* to score SPAM messages with lower and lower hit 
> scores as time goes by, and the users get more and more glassy-eyed 
> over it's ("my" if you prefer) effectiveness as time goes by too.
>
> I've spent a lot of time with the bayesian stuff and sa-learn, but 
> still it seems to drift downward.
>
> And, I have to agree that SA is very good but requires a lot of 
> attention by someone who knows what they are doing - which, of course, 
> may or may not be me.
>
> Nonetheless, I have this problem before me and am attempting a 
> possible solution.
>
> Joe
>
>
>