Posted to users@spamassassin.apache.org by Paul J Fries <pa...@cwie.net> on 2004/12/29 01:03:36 UTC

Is this a dumb idea?

I was thinking, would it be feasible to have an option for spamassassin
to exit after a message reached the spam threshold?

You would need to run all of the negative scoring (non-spammy) rules
first, and then start running the positive scoring tests. Then once the
message reached the spam threshold, mark it as spam, and move on to the
next one. The idea is that you would not have to run every single
negative scoring test on every message. This should save some CPU
cycles.

I think I remember a similar feature in a (very) early version of SA
back in the day, but it disappeared a long time ago.

Anyway, just a thought. Was hoping to get some feedback, and see if any
developer thought it would be practical.

Thanks!

Regards,
Paul Fries
paul@cwie.net

Re: Is this a dumb idea?

Posted by jdow <jd...@earthlink.net>.
From: "Loren Wilton" <lw...@earthlink.net>

> > I was thinking, would it be feasible to have an option for spamassassin
> > to exit after a message reached the spam threshold?
>
> This gets discussed periodically, and I think there may be a bugzilla ticket
> open on it as an enhancement request.
>
> It seems there are some implementation annoyances with actually making
> something like this work.  Like network tests, which are started early but
> may not come in until very late.  And Bayes, which can have a positive or
> negative score.  And there are some possible pathological cases, like having
> to run rules that are in metas before the metas can run, so you could
> potentially get a lot of high-scoring rules before they are negated by a
> negative-scoring meta.
>
> I personally don't see any of these objections as really serious problems.
> But then, I'm not a dev, so there are things I could be missing.  It
> certainly is something that would take a little serious thought to make sure
> that the rule ordering was correct, and you didn't end up bailing too early.
> And that the cost of rule ordering didn't exceed the cost of simply running
> all of the rules and looking at the final total!
>
> It isn't clear that you can completely precompute the evaluation order,
> since user rule files can change scores on rules, and users can (sometimes)
> also have rules of their own.  You would potentially have to compute a graph
> for each user, and somehow detect when the user's rules or scores have
> changed and you need to recompute the graph.
>
> Of course, one way of doing this would be to have a graph compiler that you
> were required to run after making rule or score changes.  This would be an
> annoyance, but I don't know that it would be completely unacceptable.

I think the old argument was that some scores are positive and some are
negative, and the rules are not sorted by score before they are applied.
So you can add up quite a score and then discover another rule in the
pipeline that pulls the message back out of spam range. (Suppose you elected
to give this list a huge negative score after the Apache sysadmin taught his
silly antispam filter to let this list's email through, upon realizing
that spam is food for this list. It was fun to see the occasional spam
that seemed to hit every rule in the book and run up a HUGE score on low
scoring rules. If that negative score were applied as one of the last
rules, the early dropout would miss it, and you'd have to do something
else to get the email from this list.)
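
To make the ordering problem concrete, here is a minimal Python sketch. The
rule names, the scores, and the threshold of 5.0 are all invented for
illustration and are not real SpamAssassin rules. It only shows how a naive
early exit can disagree with the full total when a large negative rule
happens to run last.

```python
# Hypothetical rule hits in the order they happen to run: (name, score).
# None of these names or scores come from a real SpamAssassin ruleset.
HITS = [
    ("LOW_RULE_A", 1.5),
    ("LOW_RULE_B", 2.0),
    ("LOW_RULE_C", 2.5),
    ("LIST_WHITELIST", -100.0),  # large negative score applied last
]

THRESHOLD = 5.0  # assumed spam threshold


def full_score(hits):
    """Run every rule and classify on the final total."""
    return sum(score for _, score in hits) >= THRESHOLD


def naive_early_exit(hits):
    """Stop as soon as the running total reaches the threshold."""
    total = 0.0
    for _, score in hits:
        total += score
        if total >= THRESHOLD:
            return True  # bails out before later negative rules run
    return False


print(full_score(HITS))        # False
print(naive_early_exit(HITS))  # True
```

With these made-up numbers the three low-scoring rules reach 6.0, so the
naive version calls the message spam before the -100 rule is ever consulted.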

{^_^}



Re: Is this a dumb idea?

Posted by Loren Wilton <lw...@earthlink.net>.
> I was thinking, would it be feasible to have an option for spamassassin
> to exit after a message reached the spam threshold?

This gets discussed periodically, and I think there may be a bugzilla ticket
open on it as an enhancement request.

It seems there are some implementation annoyances with actually making
something like this work.  Like network tests, which are started early but
may not come in until very late.  And Bayes, which can have a positive or
negative score.  And there are some possible pathological cases, like having
to run rules that are in metas before the metas can run, so you could
potentially get a lot of high-scoring rules before they are negated by a
negative-scoring meta.

I personally don't see any of these objections as really serious problems.
But then, I'm not a dev, so there are things I could be missing.  It
certainly is something that would take a little serious thought to make sure
that the rule ordering was correct, and you didn't end up bailing too early.
And that the cost of rule ordering didn't exceed the cost of simply running
all of the rules and looking at the final total!

It isn't clear that you can completely precompute the evaluation order,
since user rule files can change scores on rules, and users can (sometimes)
also have rules of their own.  You would potentially have to compute a graph
for each user, and somehow detect when the user's rules or scores have
changed and you need to recompute the graph.

Of course, one way of doing this would be to have a graph compiler that you
were required to run after making rule or score changes.  This would be an
annoyance, but I don't know that it would be completely unacceptable.
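
As a rough illustration of the "graph compiler" idea, the Python sketch below
orders rules so that sub-rules run before the metas that depend on them, and
recomputes that order only when the rules or scores change. The rule table,
the function names, and the fingerprint-based cache are assumptions made up
for this example; nothing here reflects how SpamAssassin actually stores or
orders its rules.

```python
import hashlib
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rules: name -> (score, names of sub-rules it depends on).
RULES = {
    "SUB_A": (1.0, []),
    "SUB_B": (2.0, []),
    "META_AB": (-3.0, ["SUB_A", "SUB_B"]),
}


def config_fingerprint(rules):
    """Digest of the rule set, used to notice rule or score changes."""
    canonical = repr(sorted((n, s, sorted(d)) for n, (s, d) in rules.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()


def compile_order(rules):
    """Return an evaluation order in which sub-rules precede their metas."""
    graph = {name: set(deps) for name, (_, deps) in rules.items()}
    return list(TopologicalSorter(graph).static_order())


_cache = {}  # fingerprint -> compiled order (one entry per distinct config)


def evaluation_order(rules):
    """Recompute the order only when the fingerprint changes."""
    key = config_fingerprint(rules)
    if key not in _cache:
        _cache[key] = compile_order(rules)
    return _cache[key]


print(evaluation_order(RULES))
```

Keying the cache on a digest of each user's effective rules is one way to
avoid an explicit "recompile" step, at the price of hashing the configuration
on every lookup.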

        Loren


Re: Is this a dumb idea?

Posted by "Robin Lynn Frank (SA)" <rl...@paradigm-omega.com>.
Paul J Fries wrote:
> I was thinking, would it be feasible to have an option for spamassassin
> to exit after a message reached the spam threshold?
> 
> You would need to run all of the negative scoring (non-spammy) rules
> first, and then start running the positive scoring tests. Then once the
> message reached the spam threshold, mark it as spam, and move on to the
> next one. The idea is that you would not have to run every single
> negative scoring test on every message. This should save some CPU
> cycles.
> 
> I think I remember a similar feature in a (very) early version of SA
> back in the day, but it disappeared a long time ago.
> 
> Anyway, just a thought. Was hoping to get some feedback, and see if any
> developer thought it would be practical.
> 
> Thanks!
> 
> Regards,
> Paul Fries
> paul@cwie.net
> 
> 
I'm no developer, but it seems to me that if SA has to calculate if it 
has reached the magic number each time it runs a rule, you might not be 
reducing CPU cycles very much.

-- 
Robin Lynn Frank      Director of Operations
  (0 0)                Paradigm-Omega, LLC
    V                  http://www.paradigm-omega.com/
=====================================================================
Infinite spamtraps at http://paradigm-omega.net/cgi-bin/custmail.cgi
=====================================================================
Commerce is the art of parting a sucker from his money.

Re: Is this a dumb idea?

Posted by Thomas Arend <ml...@arend-whv.info>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday, 29 December 2004 01:03, Paul J Fries wrote:
> I was thinking, would it be feasible to have an option for spamassassin
> to exit after a message reached the spam threshold?
>
> You would need to run all of the negative scoring (non-spammy) rules
> first, and then start running the positive scoring tests. Then once the
> message reached the spam threshold, mark it as spam, and move on to the
> next one. The idea is that you would not have to run every single
> negative scoring test on every message. This should save some CPU
> cycles.

But one way would be to order the rules by negative and positive scores and
check whether it's possible for the remaining checks with negative scores to
get the message score below the threshold (spam / autolearn). If not, bail out.
But you would lose statistical data.

There are a lot of rules. Maybe it would not save many processing cycles.

For the Bayesian filter to work properly, scoring should continue until the
auto_learn threshold is reached.
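
The ordering and bound check suggested above could be sketched roughly as
follows in Python. Everything here is hypothetical: the rule tuples, the
`score_with_bailout` name, and the choice to run the largest positive scores
first are assumptions for illustration, not SpamAssassin behaviour. The
`threshold` argument is whatever total you need to be sure of, so to keep
autolearning useful you would pass the higher of the spam and auto_learn
thresholds.

```python
def score_with_bailout(rules, message, threshold):
    """rules: list of (name, score, test) where test(message) -> bool.

    Returns (total, rules_run). Stops early once no remaining negative
    rule could pull the total back below the threshold.
    """
    # Negative-scoring rules first, then positive ones, largest magnitude
    # first within each group (the within-group order is an assumption).
    ordered = sorted(rules, key=lambda r: (r[1] >= 0, -abs(r[1])))

    # remaining_negative[i] = sum of the negative scores of rules i onward,
    # i.e. the most the total could still drop if every one of them hit.
    remaining_negative = [0.0] * (len(ordered) + 1)
    for i in range(len(ordered) - 1, -1, -1):
        remaining_negative[i] = remaining_negative[i + 1] + min(ordered[i][1], 0.0)

    total = 0.0
    for i, (_name, score, test) in enumerate(ordered):
        if test(message):
            total += score
        if total + remaining_negative[i + 1] >= threshold:
            return total, i + 1  # verdict can no longer change
    return total, len(ordered)


# Hypothetical rules, invented for this example.
RULES = [
    ("P3", 1.0, lambda m: "click" in m),
    ("P2", 3.0, lambda m: "free" in m),
    ("NEG", -2.0, lambda m: "hello" in m),
    ("P1", 4.0, lambda m: "viagra" in m),
]

print(score_with_bailout(RULES, "free viagra", 5.0))  # (7.0, 3)
print(score_with_bailout(RULES, "hello", 5.0))        # (-2.0, 4)
```

In the first call the rule P3 is never run, which is exactly the statistical
data that gets lost: the reported total is a lower bound rather than the
score a full run would have produced.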


Regards

Thomas Arend

[]

> Thanks!
>
> Regards,
> Paul Fries
> paul@cwie.net

- -- 
icq:133073900
http://www.t-arend.de
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFB0lFGHe2ZLU3NgHsRAjl1AJ9NlbW4UeTuj1/O53AMkDsfvvxvuQCdEYI6
3sXZh+5kjrKoV2Lci0qCmVk=
=ISeK
-----END PGP SIGNATURE-----