Posted to users@spamassassin.apache.org by Joe Flowers <fl...@social.chass.ncsu.edu> on 2005/02/16 14:26:43 UTC
Time for my monthly beating again...
Later today I'll be implementing a "drifting" spam/ham dividing line
(one "line" for the entire system - not individually set per email
account) to see how effective it is or how effective it appears to be.
I'm curious to know if the dividing line will drift into a wall on some
self-imposed boundary edge or if it will converge to a point for us or
if it will slowly drift around in circles.
I'm "determining" the dividing line by taking the average of all of the
SA hits of all of the messages and changing the dividing line, on the
fly, for each subsequent message.
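A minimal C sketch of the dividing line described above. It assumes that the line is simply the running mean of every SA score seen so far, that a message at or above the line is treated as spam, and that each message is judged before it is folded into the mean. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state for one system-wide "drifting" spam/ham line. */
typedef struct {
    double average; /* running mean of every SA score seen so far */
    size_t count;   /* number of messages folded into that mean */
} DriftingLine;

/* Judge one message against the current line, then move the line.
 * Assumption: a score at or above the line counts as spam. The message
 * is judged before it is folded in, so it only moves the line for the
 * messages that come after it. */
bool classify_and_update(DriftingLine *line, double score)
{
    bool is_spam = score >= line->average;

    /* Incremental mean: same result as (avg * n + score) / (n + 1),
     * without forming the product avg * n. */
    line->count++;
    line->average += (score - line->average) / (double)line->count;

    return is_spam;
}
```

Starting from `DriftingLine line = {0.0, 0};`, every message, spam and ham alike, pulls on the mean. The line therefore tends toward the average score of the mail stream as a whole rather than staying at a fixed 5.0.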
Anyone want to tell me or speculate on how this experiment will end or
what it will tell me, whether I'm listening or not?
For us, SA *seems* to score SPAM messages with lower and lower hit
scores as time goes by, and the users get more and more glassy-eyed over
its ("my" if you prefer) effectiveness as time goes by too.
I've spent a lot of time with the Bayesian stuff and sa-learn, but still
it seems to drift downward.
And, I have to agree that SA is very good but requires a lot of
attention by someone who knows what they are doing - which, of course,
may or may not be me.
Nonetheless, I have this problem before me and am attempting a possible
solution.
Joe
Re: Time for my monthly beating again...
Posted by Martin Hepworth <ma...@solid-state-logic.com>.
Joe
Ah, well then... the additional rules from www.rulesemporium.com (not
bigevil.cf) will help a lot, as will the URI-RBL extras from
www.surbl.org (see http://sourceforge.net/projects/spamcopuri/ for a
patch that enables this on 2.64).
--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300
**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.
This footnote confirms that this email message has been swept
for the presence of computer viruses and is believed to be clean.
**********************************************************************
Re: Time for my monthly beating again...
Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.
2.64 currently...I'm hoping to move to 3.0x soon...after I see how this
experiment goes.
It's just a plain-jane, out-of-the-box install, nothing special, except
perhaps that I'm also doing AWL checks, which I've seen from the list can
cause some headaches through use or misuse. That said, I have run this
without AWL and the same downward drift seemed to be occurring then too.
Oh yeah, I think I've disabled the ALL_TRUSTED rule...can't think of
anything else.
J
Re: Time for my monthly beating again...
Posted by Martin Hepworth <ma...@solid-state-logic.com>.
Joe
What SA version, and what extra rules? Are you using the URI-RBLs?
Re: Time for my monthly beating again...
Posted by Kevin Peuhkurinen <ke...@hepcoe.com>.
Hey Joe. My 2.64 install is running so well, I almost don't want to
upgrade to 3.0.2, and I really don't need to spend too much time on it
to keep it that way. Perhaps you just need to devote a couple of
days to some tweaking, and after that it should run well on its own.
Finding out what works for others and taking the time to implement it
would probably be a better use of your time than to attempt the
experiment you are suggesting.
My current setup is: SA runs as a relay, checking email and then passing
it on to our Exchange server. I have the following extra rulesets
taken from rulesemporium.com:
70_sare_adult.cf
70_sare_html0.cf
70_sare_bayes_poison_nxm.cf
70_sare_spoof.cf
70_sare_genlsubj0.cf
70_sare_html1.cf
72_sare_bml_post25x.cf
70_sare_header0.cf
70_sare_random.cf
I am also using DCC, Razor2, and SpamcopURI. My Bayes database is
global with autolearning and I am not using AWL. I've tweaked the
scores on some of the tests and disabled a few tests.
I have it set up such that anything that hits 3.5 or higher is considered
spam. Anything that scores 8 or higher, which is the vast majority of
spam (about 2000 emails per day) is kept on the SA server and a script
automatically deletes any emails over 2 weeks old. Anything from 3.5 to
7 is sent to a special mailbox on my Exchange server. This is about
100-200 emails per day. I spend about five minutes each morning
glancing through these emails looking for false positives, of which I
see at most one per week. These are forwarded to the correct recipient
and copies of them are placed into a special "false negative" folder.
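The score tiers above can be sketched in C as a single routing decision plus the two-week purge rule for the quarantine tier. This is an illustration only: the enum and function names are hypothetical, and since the text says "3.5 to 7" for the review tier, it is an assumption here that scores between 7 and 8 also go to review, so that every score lands somewhere.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical destinations for the three tiers described above. */
typedef enum {
    ROUTE_DELIVER,    /* below 3.5: pass on to the Exchange server */
    ROUTE_REVIEW,     /* 3.5 up to 8: special review mailbox */
    ROUTE_QUARANTINE  /* 8 or higher: keep on the SA server */
} Route;

/* Assumption: scores between 7 and 8 also go to review, so the tiers
 * cover every score with no gap. */
Route route_for_score(double score)
{
    if (score >= 8.0)
        return ROUTE_QUARANTINE;
    if (score >= 3.5)
        return ROUTE_REVIEW;
    return ROUTE_DELIVER;
}

/* Purge rule for the quarantine tier: delete anything stored more
 * than two weeks ago. */
bool should_purge(time_t stored_at, time_t now)
{
    const double two_weeks = 14.0 * 24.0 * 60.0 * 60.0; /* seconds */
    return difftime(now, stored_at) > two_weeks;
}
```

`difftime` is used instead of subtracting `time_t` values directly because the C standard does not guarantee that `time_t` counts seconds.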
I also have a special "false positive" folder into which my users can
drag and drop spam that gets through to them. On a typical day I see
between five and ten emails put into there, which out of about 450 users
is pretty darn good. I spend another five minutes each afternoon
looking at these emails and making sure that they are in fact spam. A
script on the SA server runs automatically each night and feeds the
false negatives and positives through sa-learn. If I start to see a
bunch of similar spams getting through, I'll spend an hour or two
writing, testing, and deploying a rule to catch them. This happens
about once a month.
So, all in all I generally spend 10-15 minutes per day looking after SA
while achieving very satisfactory false positive and negative rates.
The only reason I'm bothering to upgrade is that I'm starting to see
some SA timeouts, as my production server is a 350 MHz clunker. I got
myself a brand new 3 GHz server to take over, and figure that I ought to
do the upgrade to 3.0.2 at the same time.
Re: Time for my monthly beating again...
Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.
Interesting, Chris... thanks for the feedback. At least maybe I'm still
on the planet somewhere.
By "monthly" I mean that I've been feeling too good about myself
lately, so I'm due for a slap-down on how dumb I am.
J
Re: Time for my monthly beating again...
Posted by "Chr. von Stuckrad" <st...@mi.fu-berlin.de>.
On Wed, Feb 16, 2005 at 08:26:43AM -0500, Joe Flowers wrote:
> For us, SA *seems* to score SPAM messages with lower and lower hit
> scores as time goes by, and the users get more and more glassy-eyed over
> it's ("my" if you prefer) effectiveness as time goes by too.
Oh, interesting. I think I had the same effect with a
GLOBAL Bayes database deteriorating slowly
(being slowly poisoned, I assumed).
I have two completely identical servers, one *working*
as a virus filter, the other as a spam filter (so either can be switched
to do both if one breaks).
But only the 'actually running spam filter' has the
'actual/current' Bayes database.
Testing the same new/undetected spam gave me slowly decreasing
values on the 'learning' machine and 'nearly the same(*)' values on
the 'not-learning' machine! (* both hosts get my updated configs,
so values changed anyway).
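The two-host comparison above works as a simple drift check: score the same spam samples on the learning and the non-learning host and see whether the learning host scores them consistently lower. A minimal C sketch, assuming both hosts run the same rules and scores (as noted, config updates reach both), so that any gap comes mostly from the Bayes database. The function names and the `margin` parameter are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mean of n scores; returns 0.0 for an empty set. */
static double mean(const double *scores, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += scores[i];
    return n ? sum / (double)n : 0.0;
}

/* learning[i] and reference[i] are the scores the two hosts gave the
 * same message i. Returns true when the learning host scores that spam
 * lower on average by more than `margin` points, a hint that its Bayes
 * database may have been degraded. */
bool bayes_drift_suspected(const double *learning, const double *reference,
                           size_t n, double margin)
{
    return mean(reference, n) - mean(learning, n) > margin;
}
```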
I simply dropped the 'rotten' Bayes database and the problem
went away... I'm waiting to see what comes up now...
(does your subject imply that the poison works monthly?)
Stucki
--
Christoph von Stuckrad * * |nickname |<st...@math.fu-berlin.de>\
Freie Universitaet Berlin |/_*|'stucki' |Tel(days):+49 30 838-75 459|
Fachbereich Mathematik, EDV|\ *|if online|Tel(else):+49 30 77 39 6600|
Arnimallee 2-6/14195 Berlin* * |on IRCnet|Fax(alle):+49 30 838-75454/
Re: Time for my monthly beating again...
Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.
Michael,
I apologize for the perceived or real hostility. People have told me
about that implementation before, and that implementation is perfectly
fine with me. More power to them, best wishes, and all the best. Let's
put some added value into NetMail, which I think is a great product, and
help some people out. But it always seems (perhaps falsely) to come in
the context of "why are you wasting your time?" Actually, my hope is
that I can help SpamAssassin and any other implementations, including the
one you've identified, as much as I possibly can. It will only help
me and many others in the end.
Again, I apologize, Michael, but I do hope you understand that from my
perspective, what I've done is not a waste of time.
Sincerely,
Joe
Re: Time for my monthly beating again...
Posted by Michael Parker <pa...@pobox.com>.
Wow, not sure where the hostility came from. I'm sure your code is
much better than anything existing. I was just providing a pointer to
one Netmail implementation. I myself didn't know about it until today,
so just spreading the love.
Michael
Re: Time for my monthly beating again...
Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.
I know of that implementation, and I'm sure there are pluses and minuses
to both implementations.
I've already tested my replacement spamd on SA 3.0.2 and it works the
same, with no problems found.
I know there is a deprecated call or two (get_hits, for example), but I
see no reason the new calls won't work fine.
I bet my tiny little C program is a lot faster than the spamd
implementation with a fraction of the resource consumption and problems.
Also, a very significant part of my server-side load is not being
shouldered by the already heavily burdened NMAP NetMail Agent.
Do as you wish, but I would bet my ragged little implementation is built
on a potentially much, much faster, much more scalable, and much more
generic (that is, many more options) foundation.
i.e., I fear not.
Joe
Re: Time for my monthly beating again...
Posted by Michael Parker <pa...@pobox.com>.
On Sat, Feb 19, 2005 at 12:55:24AM -0500, Joe Flowers wrote:
> I'll try to keep it as short as possible.
>
> By my preference and from hearing continuing horror stories about spamd,
> I have a C program in the place of spamd. It makes calls to Perl - Perl
> is "embedded" in the C program. The C spamd replacement talks to a C
> program running on our NetWare NetMail (soon to be Hula) email servers.
> Actually, the same Linux box running SpamAssassin uses this spamd
> replacement to talk to 3 different email servers over TCP sockets at the
> same time.
I assume you've seen this:
http://netmail.sourceforge.net/
Old version of spamd, but has all the NMAP magic needed to play nicely
with Netmail and most likely Hula.
Michael
Re: Time for my monthly beating again...
Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.
I'll try to keep it as short as possible.
By my preference and from hearing continuing horror stories about spamd,
I have a C program in the place of spamd. It makes calls to Perl - Perl
is "embedded" in the C program. The C spamd replacement talks to a C
program running on our NetWare NetMail (soon to be Hula) email servers.
Actually, the same Linux box running SpamAssassin uses this spamd
replacement to talk to 3 different email servers over TCP sockets at the
same time.
The default SpamAssassin v2.64
(Mail::SpamAssassin::PerMsgStatus::get_hits) score of 5.0 corresponds to
51 "+" marks that are placed into each incoming email message header.
The NetMail "Rule server" on the email servers then filters on those
pluses. The more pluses, the more likely the message is a Spam message.
Every user can adjust his or her threshold away from the 51 default. So,
the user does have some control over his/her own Spam settings. The
program on the email servers will never put more than 101 "+" marks or
fewer than 1 "+" mark in any email header. If the email message header
reaches or exceeds the threshold (the number of pluses) set by the user,
then the message is filtered by the NetMail Rule server and placed in a
user's "MostlySpam" folder. i.e., server-side filtering.
A SA get_hits score of 0 or less corresponds to 1 "+" mark in the email
message header.
A SA get_hits score of 10 or more corresponds to 101 "+" marks in the
email message header.
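The fixed (non-drifting) mapping just described, one "+" per tenth of a point with 0 giving 1, 5.0 giving 51 and 10 giving 101, works out to pluses = 10 * hits + 1, clamped to 1..101. That is the same as the drifting formula further down with the average pinned at 5.0, since 10 * (hits - 5.0) + 51 = 10 * hits + 1. A small C sketch of the encoding and of the rule server's per-user test; the function names are hypothetical.

```c
#include <stdbool.h>

/* Convert an SA score to the number of '+' marks written into the
 * header under the fixed mapping: one '+' per tenth of a point,
 * 0 -> 1, 5.0 -> 51, 10 -> 101, clamped to the range 1..101. */
int pluses_for_score(double hits)
{
    double pluses = 10.0 * hits + 1.0;
    if (pluses < 1.0)
        pluses = 1.0;
    if (pluses > 101.0)
        pluses = 101.0;
    return (int)(pluses + 0.5); /* round to nearest; always positive here */
}

/* The rule server's test: the message goes to "MostlySpam" once the
 * header reaches the user's threshold (51 by default). */
bool goes_to_mostly_spam(int pluses, int user_threshold)
{
    return pluses >= user_threshold;
}
```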
Right or wrong (?), I thought that since SA defaults to 5.0, then most
of the crucial action must be happening between 0 and 10. Also, I didn't
want to deal with NULL problems that are associated with 0 "+" marks in
the headers, and I didn't want to clog up the headers unnecessarily with
an ungodly number of pluses, but I still wanted as fine a control as
possible for get_hits values between 0 and 10. I didn't want to just
discard the significant information held in the tenths place of the
get_hits score.
On the spamd replacement side, the average of the get_hits scores of all
of the messages is stored in a very small ("tiny") text file, along
with the number of messages contributing to that average, i.e. the
total number of messages processed.
Basically and roughly:
-----------------------------------------------------------------------------
// Fragment; needs <math.h> for modf(). Variables are assumed to be doubles.
hits = get_hits;  // Mail::SpamAssassin::PerMsgStatus::get_hits

// Keep control of the "outliers": prevent the average from being so
// sensitive to large positive or negative get_hits values. Hopefully
// the average will never reach -20 or 30 (the "walls"). If it does run
// into these walls, then we either need to adjust or broaden these
// limits, re-think the implementation, or abandon this technique.
if (hits < -20.0) { hits = -20.0; }
if (hits > 30.0)  { hits = 30.0; }

// Read "OldAverage" and "TotalNumberOfMessagesProcessed" from the
// tiny text file.

NumberOfPluses = (10.0 * (hits - OldAverage)) + 51.0;

// Round NumberOfPluses to the nearest whole number (halves round up).
FractionPartOfNumberOfPluses = modf(NumberOfPluses,
                                    &IntegerPartOfNumberOfPluses);
NumberOfPluses = IntegerPartOfNumberOfPluses;
if (FractionPartOfNumberOfPluses >= 0.5) { NumberOfPluses += 1.0; }

// Put an upper and lower bound on the number of pluses (+).
if (NumberOfPluses < 1.0)   { NumberOfPluses = 1.0; }
if (NumberOfPluses > 101.0) { NumberOfPluses = 101.0; }

// Fold this message's (clamped) score into the running average.
NewAverage = (OldAverage * TotalNumberOfMessagesProcessed) + hits;
TotalNumberOfMessagesProcessed++;
NewAverage = NewAverage / TotalNumberOfMessagesProcessed;

// Write NewAverage (the next message's "OldAverage") and
// "TotalNumberOfMessagesProcessed" back to the tiny text file.
-----------------------------------------------------------------------------
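The block above leaves the reading and writing of the "tiny text file" as comments. Here is a minimal sketch of that part together with the averaging step. It assumes a one-line "<average> <count>" file format; the format, the function names and any file path are hypothetical, not taken from the original program.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical on-disk format: one line, "<average> <count>\n". */
bool load_state(const char *path, double *average, unsigned long *count)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return false; /* caller can start fresh, e.g. 0.0 and 0 */
    int fields = fscanf(fp, "%lf %lu", average, count);
    fclose(fp);
    return fields == 2;
}

bool save_state(const char *path, double average, unsigned long count)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return false;
    /* %.17g keeps enough digits for a double to read back exactly. */
    int written = fprintf(fp, "%.17g %lu\n", average, count);
    return (fclose(fp) == 0) && written > 0;
}

/* Fold one (already clamped) score into the running average, using the
 * same arithmetic as the fragment above. */
void fold_score(double *average, unsigned long *count, double hits)
{
    *average = (*average * (double)*count + hits) / (double)(*count + 1);
    (*count)++;
}
```

Because the replacement spamd serves three mail servers at the same time, any setup where more than one process or thread can update this file concurrently would need some form of locking (or writing a temporary file and renaming it over the old one). This sketch does not attempt either.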
That's the heart of it... I hope that made sense and had enough meat.
The jury is still out, of course, and I've got my fingers crossed, but
everything is still going very, very well.
Right now, we're somewhere around the 15K TotalNumberOfMessagesProcessed
mark.
Just looking at it, SpamAssassin itself has to be doing an incredibly
good job of identifying and scoring these messages relative to one
another; otherwise, there is no way I would be seeing these great
results, and I probably would have run into a "wall" long before now.
Joe
Re: Time for my monthly beating again...
Posted by Joe Emenaker <jo...@emenaker.com>.
Joe Flowers wrote:
> Very preliminary results are no less than AWESOME.
So... how are you implementing the "drifting" spam threshold?
- Joe
Re: Time for my monthly beating again...
Posted by Joe Flowers <fl...@social.chass.ncsu.edu>.
Very preliminary results are nothing less than AWESOME. I'm seeing, and
people are reporting, much higher rates of Spam being caught with no
*reports* of an increase in false positives. We'll see if that continues;
the proof is in the pudding. No sign of the dividing line drifting into a
wall yet. It seems to be drifting between average SA scores of -1.44 and
-0.5 instead of being fixed at 5.00 as before. I hope the SA developers
will take notice and improve upon the idea.
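Under the formula pluses = 10 * (hits - average) + 51, a message reaches the default threshold of 51 pluses when its score is roughly equal to the current average. So the numbers above imply that the effective default cutoff moved from 5.00 down to somewhere between -1.44 and -0.5. A small C helper translates any per-user plus threshold back into a raw SA score. It ignores rounding and the 1..101 clamp, and its name is hypothetical.

```c
/* Raw SA score at which a message first reaches `threshold` pluses
 * under pluses = 10 * (hits - average) + 51, ignoring rounding and
 * the 1..101 clamp. */
double score_for_threshold(double average, int threshold)
{
    return average + (threshold - 51) / 10.0;
}
```

For example, with the average at -1.0, a user who raised the threshold to 71 pluses would be filtering at a raw score of about 1.0 (`score_for_threshold(-1.0, 71)`). Under the old fixed mapping, 71 pluses meant 7.0 (`score_for_threshold(5.0, 71)`).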
Joe