Posted to dev@spamassassin.apache.org by Daniel Quinlan <qu...@pathname.com> on 2004/09/12 07:51:39 UTC

proposed high-level 3.1 goals

I know we've discussed this some, but since decisions are officially
made on the mailing list, I think it's time to discuss and see if we can
generally agree about our high-level goals for 3.1.  These are not
absolute goals (as in we won't tie our hands too severely), but I think
we are evolving a consensus that our focus needs to shift a bit going
from 3.0 to 3.1.

So, the goals:

 - lower resource usage: higher throughput and lower memory usage
 - higher accuracy: lower FPs and lower FNs (rules, rules, rules... this
   also includes some notion of speeding up the mass-check process)
 - convert optional/non-performance-sensitive code to plugins (I think
   this is lower priority, but we've often talked about it and it also
   helps achieve the first goal of lower resource usage)

And the anti-goals:

 - features: extra options, non-critical changes not related to the
   above goals, etc. (except perhaps in plugins)
 - option bloat (except perhaps in plugins)

We should probably evolve some understanding of what we want to convert
to plugins.  Here's the list mostly based on conversations with Theo,
Justin, and Michael:

 - Razor
 - DCC
 - Pyzor
 - SpamCop reporting
 - nuke AWL and replace with "History" plugin
 - TextCat

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: Revisiting high-level 3.1 goals

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Daniel,

Saturday, January 29, 2005, 9:46:05 PM, you wrote:

>>  - higher accuracy: lower FPs and lower FNs (rules, rules, rules... this
>>    also includes some notion of speeding up the mass-check process)

DQ> I've been banging away on this.  We're closer to fixing the autolearn
DQ> thing and Henry has expressed some interest in coordinating a test of
DQ> perfect (train on everything) and perfect-sample (train on sample)
DQ> learning.

DQ> bin-doph's ReplaceTags plugin will also really help with rule writing, I
DQ> think, so I hope we get that into the tree soon.

DQ> I also now have a working prototype of network-test reuse code and boy
DQ> does it speed up network mass-checks.

Look forward to all of those.  I'm also trying to develop a
"mass-check installation/setup script" of my own, based on what you
were able to give me last year, which will enable people to simply run
a script and build a mass-check system. It will enable people to do
their own mass-checks the way we do in SARE, and it will also enable
them to participate in the primary nightly mass-check run.

My install/setup is still very rough, and has a long way to go, so I
don't want to try to put a time table on it, but I have hopes it will
be a help to people.

Bob Menschel




Re: [SURBL-Discuss] Re: Revisiting high-level 3.1 goals

Posted by Daniel Quinlan <qu...@pathname.com>.
> BTW Any ideas when the last mass check for 3.1 might happen?

No, but it'll be announced in advance.

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: [SURBL-Discuss] Re: Revisiting high-level 3.1 goals

Posted by Jeff Chan <je...@surbl.org>.
Thanks, I closed the ticket for the JP rule.

BTW Any ideas when the last mass check for 3.1 might happen?

We'd want to take the JP data out of WS before then.

Jeff C.


Re: [SURBL-Discuss] Re: Revisiting high-level 3.1 goals

Posted by Theo Van Dinter <fe...@kluge.net>.
On Sun, Jan 30, 2005 at 05:24:09AM -0800, Jeff Chan wrote:
> I've created a bugzilla 4114 to request a separate JP rule in the
> default config for 3.1.  When that is added we should remove the
> JP data from WS.
> 
> SA devs please give us a heads up when the JP rule is added and
> that will trigger our changes on the SURBL data side.

I responded in the ticket, but JP has had its own rule in 3.1 for ages:

------------------------------------------------------------------------
r47078 | felicity | 2004-09-22 19:27:17 -0400 (Wed, 22 Sep 2004) | 1 line

add in support for surbl jp list
------------------------------------------------------------------------

-- 
Randomly Generated Tagline:
Money is better than poverty, if only for financial reasons.

Re: [SURBL-Discuss] Re: Revisiting high-level 3.1 goals

Posted by Jeff Chan <je...@surbl.org>.
On Sunday, January 30, 2005, 4:34:07 AM, Raymond Dijkxhoorn wrote:

[When SA 3.1?]

>>> Real soon now.

>> OK, let's ask Raymond and Joe to remove the JP data from WS
>> before your final pre-3.1 mass check.  Should we do that now?

>>>> One of the things we planned for it was to move JP data out of WS on
>>>> the SURBL lists.

>>> So, WS includes all of JP?  Or, are JP entries individually considered
>>> and added manually over time to WS?  Or, is the problem something else?

> Please let us know what we should do, cutting out we should announce, the 
> actual removal is just altering one export script...

I've created a bugzilla 4114 to request a separate JP rule in the
default config for 3.1.  When that is added we should remove the
JP data from WS.

SA devs please give us a heads up when the JP rule is added and
that will trigger our changes on the SURBL data side.

Cheers,

Jeff C.


Re: [SURBL-Discuss] Re: Revisiting high-level 3.1 goals

Posted by Jeff Chan <je...@surbl.org>.
On Sunday, January 30, 2005, 8:03:37 PM, Daniel Quinlan wrote:
> Oh crap.  It's *me* that's confused and I'm sure I'll get 5 replies from
> people who don't read all their mail before sending replies telling me
> that.  Anyway, disregard my last message.

> Adding JP to WS was clearly a horrible idea to begin with.  However,
> wasting a bit on this is silly (and I think that's what I'm reacting to
> here), especially considering that Henry and I have been discussing a
> revamp of the SURBL rules where source would not matter and the number
> of bits set would matter -- we'd have to special case this.

Not to worry, I had to remove my foot from my mouth before
I could speak too.  ;-)

I think we have all the cases covered.   After we remove
JP from WS, anyone lacking a separate JP rule and not upgrading
to 3.1 can simply add a JP rule, as we've advised from the
beginning of JP.  It's just that such a change did not get
into the 3.0 release.

Jeff C.


Re: [SURBL-Discuss] Re: Revisiting high-level 3.1 goals

Posted by Daniel Quinlan <qu...@pathname.com>.
Oh crap.  It's *me* that's confused and I'm sure I'll get 5 replies from
people who don't read all their mail before sending replies telling me
that.  Anyway, disregard my last message.

Adding JP to WS was clearly a horrible idea to begin with.  However,
wasting a bit on this is silly (and I think that's what I'm reacting to
here), especially considering that Henry and I have been discussing a
revamp of the SURBL rules where source would not matter and the number
of bits set would matter -- we'd have to special case this.
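A rough sketch of what "the number of bits set would matter" could look
like, assuming the combined SURBL zone answers with an address whose last
octet is a bitmask of source lists. The bit values and the helper names
below are assumptions for illustration, not taken from this thread or from
the SpamAssassin tree:

```python
from typing import List

# Assumed bit assignments for a combined SURBL-style answer (illustrative only).
LIST_BITS = {"SC": 2, "WS": 4, "PH": 8, "OB": 16, "AB": 32, "JP": 64}


def decode_lists(answer: str) -> List[str]:
    """Return the names of the lists whose bit is set in the last octet."""
    last_octet = int(answer.rsplit(".", 1)[1])
    return [name for name, bit in LIST_BITS.items() if last_octet & bit]


def lists_hit(answer: str) -> int:
    """Count how many lists flagged the domain, ignoring which ones they were."""
    return len(decode_lists(answer))
```

Under these assumptions, while JP data is still folded into WS, a domain
that only JP found would come back with both bits set and count as two
hits, which is one reading of the special case mentioned above.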

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: [SURBL-Discuss] Re: Revisiting high-level 3.1 goals

Posted by Daniel Quinlan <qu...@pathname.com>.
Justin Mason <jm...@jmason.org> writes:

> btw, I think requiring people to upgrade ASAP isn't necessarily a great
> idea; we can avoid it by setting up a new BL for "WS minus JP".  then
> 3.1.0 can look up

Gah!!! I think you might be confused about this.  There's no issue as
far as I can tell.  JP is the new blacklist and it includes WS, the old
blacklist.  The new blacklist should never have included WS
(#include-style, incidental overlap is okay, of course).

No longer importing WS into JP has no negative effect on any of these
users:

  - people with pre-JP versions: don't have JP anyway
  - people with pre-JP versions who added JP on their own: will still
    have both JP and WS
  - people running SVN HEAD or anything else that included JP already:
    will still have WS!

No new blacklist needed.

Warning is merely a courtesy for any oddballs who are only using JP.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: [SURBL-Discuss] Re: Revisiting high-level 3.1 goals

Posted by Daniel Quinlan <qu...@pathname.com>.
Raymond Dijkxhoorn <ra...@prolocation.net> writes:

> Please let us know what we should do, cutting out we should announce, the 
> actual removal is just altering one export script...

Considering that SA hasn't shipped with JP yet and that those hosts are
already caught in WS (which predates JP), I'd announce that you're
making the change in a week and then make the change.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: [SURBL-Discuss] Re: Revisiting high-level 3.1 goals

Posted by Raymond Dijkxhoorn <ra...@prolocation.net>.
Hi!

>> Real soon now.

> OK, let's ask Raymond and Joe to remove the JP data from WS
> before your final pre-3.1 mass check.  Should we do that now?

>>> One of the things we planned for it was to move JP data out of WS on
>>> the SURBL lists.

>> So, WS includes all of JP?  Or, are JP entries individually considered
>> and added manually over time to WS?  Or, is the problem something else?

Please let us know what we should do, cutting out we should announce, the 
actual removal is just altering one export script...

Bye,
Raymond.

Re: Revisiting high-level 3.1 goals

Posted by Jeff Chan <je...@surbl.org>.
On Saturday, January 29, 2005, 11:29:28 PM, Daniel Quinlan wrote:
> Jeff Chan <je...@surbl.org> writes:

>> Can you give an estimate on when 3.1 is coming out?

> Real soon now.

OK, let's ask Raymond and Joe to remove the JP data from WS
before your final pre-3.1 mass check.  Should we do that now?

>> One of the things we planned for it was to move JP data out of WS on
>> the SURBL lists.

> So, WS includes all of JP?  Or, are JP entries individually considered
> and added manually over time to WS?  Or, is the problem something else?

WS includes all of JP.  They're simply added in right now.
That's because 3.0.0 came out before you could add a separate
rule for JP, and we wanted to get the benefit of JP into 3.0.0.
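As a purely hypothetical sketch of the export step being discussed (the
real SURBL export script is not shown in this thread, and the function and
parameter names here are invented), folding JP into WS and later dropping
it again amounts to toggling a set union:

```python
def export_ws(ws_native, jp, include_jp):
    """Build the published WS list.

    ws_native:  domains that WS collected on its own
    jp:         domains collected by JP
    include_jp: True reproduces the current behaviour (JP folded into WS),
                False publishes WS without the imported JP data
    """
    domains = set(ws_native)
    if include_jp:
        domains |= set(jp)
    return sorted(domains)
```

Domains that WS found independently stay in the export either way, so only
entries that came solely from JP disappear from WS when the flag is turned
off.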

Is there a separate rule for JP now?  If not would you please
add one?

Jeff C.


Re: Revisiting high-level 3.1 goals

Posted by Daniel Quinlan <qu...@pathname.com>.
Jeff Chan <je...@surbl.org> writes:

> Can you give an estimate on when 3.1 is coming out?

Real soon now.

> One of the things we planned for it was to move JP data out of WS on
> the SURBL lists.

So, WS includes all of JP?  Or, are JP entries individually considered
and added manually over time to WS?  Or, is the problem something else?

> The rules would need to be rescored for that so we'd want to get it in
> before the final mass check for 3.1, in addition to letting people
> know to update their scores manually if they weren't upgrading.

Well, keep us in the loop.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: Revisiting high-level 3.1 goals

Posted by Jeff Chan <je...@surbl.org>.
Can you give an estimate on when 3.1 is coming out?  One of the
things we planned for it was to move JP data out of WS on the
SURBL lists.  The rules would need to be rescored for that so
we'd want to get it in before the final mass check for 3.1,
in addition to letting people know to update their scores
manually if they weren't upgrading.

Jeff C.


Re: Revisiting high-level 3.1 goals

Posted by Theo Van Dinter <fe...@kluge.net>.
On Sat, Jan 29, 2005 at 09:46:05PM -0800, Dan Quinlan wrote:
> And there was a thundering silence on the list in response to that post,
> so maybe we agreed or maybe everyone was having fun back in September.
> Anyway, just wanted to go over it again because I think we're closing in
> on the 3.1 release.

I remember September...  Back when I kind of had time for things.  <sigh>

What happened to the list of stuff we were going to do for 3.1?
I thought it was a wiki doc, but didn't see it on there.  It may have
been an email instead.

Anyway, a few things come to mind that we haven't really done anything
with yet: Short Circuit, lower mass-check time, auto-updates (I have
some more complete notes written about this since the last time BTW),
changing spamd to not need Storable, etc...

IMO, we really need to get the mass-check time down.  SC would be really
nice (would help bring mass-check time down as well), but I don't
think it's a blocker.  Auto-updates can be launched separately since
it's really not part of SA main-line IMO.  I'd really like to get the
Storable requirement bit out.

I'll see if I can't get my update notes together and post it sometime soon.

-- 
Randomly Generated Tagline:
It's not whether you win or lose but how you played the game.
 		-- Grantland Rice

Revisiting high-level 3.1 goals

Posted by Daniel Quinlan <qu...@pathname.com>.
Daniel Quinlan <qu...@pathname.com> writes:

> I know we've discussed this some, but since decisions are officially
> made on the mailing list, I think it's time to discuss and see if we can
> generally agree about our high-level goals for 3.1.  These are not
> absolute goals (as in we won't tie our hands too severely), but I think
> we are evolving a consensus that our focus needs to shift a bit going
> from 3.0 to 3.1.

And there was a thundering silence on the list in response to that post,
so maybe we agreed or maybe everyone was having fun back in September.
Anyway, just wanted to go over it again because I think we're closing in
on the 3.1 release.
 
> So, the goals:
> 
>  - lower resource usage: higher throughput and lower memory usage

Hmmm... we still need to whack the pristine issue and our memory usage
could be improved.  I really like Theo's idea in bug 3978 (Internally
use scalars for Message objects) and there are a few other bugs we
should think about if they don't destabilize the tree too much.

>  - higher accuracy: lower FPs and lower FNs (rules, rules, rules... this
>    also includes some notion of speeding up the mass-check process)

I've been banging away on this.  We're closer to fixing the autolearn
thing and Henry has expressed some interest in coordinating a test of
perfect (train on everything) and perfect-sample (train on sample)
learning.

bin-doph's ReplaceTags plugin will also really help with rule writing, I
think, so I hope we get that into the tree soon.

I also now have a working prototype of network-test reuse code and boy
does it speed up network mass-checks.

>  - convert optional/non-performance-sensitive code to plugins (I think
>    this is lower priority, but we've often talked about it and it also
>    helps achieve the first goal of lower resource usage)

I think everyone has been working on this.
 
> And the anti-goals:
> 
>  - features: extra options, non-critical changes not related to the
>    above goals, etc. (except perhaps in plugins)
>  - option bloat (except perhaps in plugins)

Seems to be under control.  Mostly.
 
> We should probably evolve some understanding of what we want to convert
> to plugins.  Here's the list mostly based on conversations with Theo,
> Justin, and Michael:
> 
>  - Razor

Done.

>  - DCC

Done.

>  - Pyzor

Done.

>  - SpamCop reporting

Done.

>  - nuke AWL and replace with "History" plugin

Not done.  Michael's working on this, his plan is to create two plugins,
one for the old AWL and one for the new History plugin.

>  - TextCat

Not done, but should be easy enough.  For optional code not always
needed in core, it seems to be a pretty darn small win relative to the
size of the AWL or Bayes.

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/