Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2006/09/06 19:56:03 UTC

Re: Timelines ?

Theo Van Dinter writes:
> We've been doing pretty good recently at getting a new 3.1.x release out
> every month or so, congrats all around. :)   I'd like to keep it up and
> get 3.1.6 released at the start of October, so just an fyi there.
> 
> However, I was thinking about this the other day ...  Keeping the
> stable release updated is great and all, but what's going on with 3.2?
> 
> We generally release a new minor version (3.x) every year or so, and
> it'll be 1 year from 3.1.0 in about 2 weeks.  With the constant updates
> of 3.1, plus the addition of sa-update, we haven't felt a lot of heat
> to get 3.2 out, so we can do the "it's ready when it's ready" thing...
> But I think we need to generally figure out some kind of schedule goal
> that we want to hit.
> 
> So just to throw it out there:  how does end of December or the beginning
> of January sound for a 3.2 release?  Is this achievable?
> 
> I'd like to see us get tickets sorted out -- punt things from 3.2 that
> we won't get to, move things from "future" and "undefined" into 3.2
> appropriately if they should get done first, etc.  I'd also like to see
> us get the major changes between 3.1 and 3.2 documented appropriately --
> API, config options, etc.

End of December strikes me as good!  I'm keen to get 3.2.0 on the road,
alright, so the auto-generated rule updates go live.

--j.

Re: Timelines ?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Sep 07, 2006 at 03:17:40PM +0100, Justin Mason wrote:
> OK -- agreed.  However, my point is that we'd be better off doing that
> work as part of the 3.2.0 development, and later for 3.3.0 -- rather
> than trying to "retrofit" it into 3.1.6 or 3.1.7, I think.
> 
> There's no *need* to keep 3.1.x going, if we can start getting 3.2.0
> released instead.

Sure.  My point was that as far as I can tell, it's the same amount
of work to get it working for 3.[123] as it is for 3.[23], so why not
do that?

> Hmm -- I think I'd need more details of how that'd work -- I'm not
> sure I get it.

Ok.  Keep 3.1 the way it currently is, semi-revert 3.2 to have a rules
directory and we'll put a minimal set of 3.2 rules in there.

External to the normal distro, we have code that generates updates based on
the mass-check results (or whatever else we want to base them on).

> One thing I'd want to avoid is having to set up two separate SVN
> workspaces to get a usable checkout, or having to download two separate
> tarballs to get a usable release.  In my opinion, the core code
> is nearly useless without rules, so there isn't a need to ship it
> without them.

What do you mean by "usable checkout" ?   If you mean you want to do one
checkout to get the engine and all the rules, we can discuss how to do that.
I think the engine with a small set of core rules (which we could update
manually from the rulesrc area) would work perfectly well.

The original idea to split off the rules stuff into a subproject was to more
concretely separate the two areas that we work on, which we've been working
towards for a while now.  At the moment, the rules stuff is being forced to
integrate with the engine in roughly the same way it did before, which doesn't
make a lot of sense to me.

At ACUS last year, the idea we discussed (and I thought agreed on) was
to split the two areas and use sa-update to deliver all the rules --
during which time I clearly remember at least mentioning the idea that:

- user downloads the engine, installs it
- user runs sa-update, gets the rules
- user starts using SA, keeps running sa-update periodically to get new rules

The idea there was to specifically have no rules distributed with the engine,
and people would have to use sa-update (or download an update manually) to
get the rules.  I think our current methodology of:

- user downloads the engine, installs it, it comes with a core set of rules
- user is encouraged to run sa-update, and then run it periodically as well to
  get the complete set of rules

works well.  It doesn't require us to have everything in the engine
distribution though.

> I was also thinking we should set up some trusted spamtraps to collect
> lots of spam with "live" network test data -- I think most of the spam
> corpora we're mass-checking nowadays are incomplete.  For example, my
> corpus will omit everything that hit SBL+XBL, and Michael's is similarly
> omitting lots of those too.  Nowadays spamtrapping may be the only viable
> way to get a really representative spam corpus....

FWIW, my personal mail and my spamtraps have no filtering other than SA.
I can create new/share some of my current spamtrap addresses if people
want to "spread them around" more than I have (which isn't a lot).

-- 
Randomly Generated Tagline:
"I instigated Linus's first shooting expedition in a long while a few months
 back (I can report that he is a steady, competent shot with a 9mm semi)."
                   - Eric Raymond

Re: Timelines ?

Posted by Doc Schneider <ma...@maddoc.net>.
Theo Van Dinter wrote:
> On Wed, Sep 06, 2006 at 03:08:48PM -0500, Doc Schneider wrote:
>>>       6  0.0 ham-bb-doc.log
>>>    15006  3.9 spam-bb-doc.log
>> Hmm... is my masschecker still working? I had thought it died. Anyway, I 
>> need to figure out how to make mine more automagically work. Things here 
>> have been just crazy.
> 
> The buildbot runs are working apparently -- results range between (EDT):
> 
> Sat Jul 23 01:51:24 2005
> Wed Sep  6 05:11:34 2006
> 
> Perhaps you switched methods?
> 

Yeah could very well be what is happening. Guess I should do an audit of 
my crons and see what is happening where and to whom, eh?

-- 

  -Doc

  Penguins: Do it on the ice.
    3:32pm  up 4 days, 19:05, 15 users,  load average: 0.07, 0.20, 0.33

  SARE HQ  http://www.rulesemporium.com/

Re: Timelines ?

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Sep 06, 2006 at 03:08:48PM -0500, Doc Schneider wrote:
> >       6  0.0 ham-bb-doc.log
> >    15006  3.9 spam-bb-doc.log
> 
> Hmm... is my masschecker still working? I had thought it died. Anyway, I 
> need to figure out how to make mine more automagically work. Things here 
> have been just crazy.

The buildbot runs are working apparently -- results range between (EDT):

Sat Jul 23 01:51:24 2005
Wed Sep  6 05:11:34 2006

Perhaps you switched methods?

-- 
Randomly Generated Tagline:
"it's so easy! you click, you kill, you loot!"
         - Gonzo Granzeau paraphrasing a friend about Diablo II

Re: Timelines ?

Posted by Doc Schneider <ma...@maddoc.net>.
Theo Van Dinter wrote:
> On Wed, Sep 06, 2006 at 07:07:42PM +0100, Justin Mason wrote:
>> the problem is that it needs to read the rules from rulesrc/sandbox/* --
>> and those rules are pretty dependent in places on the rules in
>> rulesrc/core.  Those rules, in turn, are the 3.2.0 core ruleset, which
>> doesn't mix well with (ie stomps all over) the 3.1.x core ruleset.
>>
>> We could come up with a way to use the 3.2.0 core ruleset in place of
>> the 3.1.x one -- but I think the effort required would be too much, esp.
>> since it's easier to just concentrate on the 3.2.0 release instead.
> 
> I'm not sure I agree with this, and we *need* to solve this problem
> going forward, or else we won't be able to do 3.2 updates when we're
> working on 3.3.
> 
> The rules are pretty version agnostic, except for the ones which have a
> dependency on a plugin or other code change that 3.1 doesn't have.  I think
> it'd be pretty easy to do one run with the 3.2 code and one with the 3.1 code,
> and figure out which those are.
> 
> Rules that don't work the same get an "if version" wrapper, the rest can stay
> the way they are.  We can also look at backporting the differences as
> appropriate.
> 
> As for rulesrc, mkrules, etc -- 3.1 doesn't need any of that.  This is also my
> main issue with how 3.2 currently does stuff.  I don't understand why this
> stuff is part of the normal distro.  I like to think of the distro as the
> engine side of the project, and mkrules/rulesrc as the rules side of the
> project, and there's no reason they have to be together.
> 
> So for 3.1, we generate, externally, the rules directory and include it in the
> directory that gets mass-check'ed.  For 3.2, same thing.  Then in the normal
> SA distribution, we don't need the whole svn:external/rulesrc/mkrules/etc
> stuff, it'll just be a rules dir like before.
> 
>> last week featuring data from 7 contributors.
> 
> Hrm.  It still seems like a small number of messages/diversity:
> 
>        6  0.0 ham-bb-doc.log
>    14998 18.9 ham-bb-jm.log
>        6  0.0 ham-bb-zmi.log
>     6357  8.0 ham-cthielen.log
>     1510  1.9 ham-daf.log
>      167  0.2 ham-dos.log
>     1958  2.5 ham-parkerm.log
>    46895 59.0 ham-theo.log
>     2028  2.5 ham-wtogami.log
>     5619  7.1 ham-zmi.log
> 
>     15006  3.9 spam-bb-doc.log
>     15000  3.9 spam-bb-jm.log
>      8358  2.2 spam-bb-zmi.log
>     13783  3.5 spam-cthielen.log
>      6261  1.6 spam-daf.log
>      4676  1.2 spam-dos.log
>     61619 15.9 spam-parkerm.log
>    253448 65.2 spam-theo.log
>      2156  0.6 spam-wtogami.log
>      8359  2.2 spam-zmi.log
> 
> (that's 468210 total, btw)   and why does zmi have two sets of files?
> 
> 
> 

Hmm... is my masschecker still working? I had thought it died. Anyway, I 
need to figure out how to make mine more automagically work. Things here 
have been just crazy.

-- 

  -Doc

  SA/SARE/URIBL/SURBL -- Ninja
    3:04pm  up 4 days, 18:37, 15 users,  load average: 1.07, 0.67, 0.55

  SARE HQ  http://www.rulesemporium.com/

Re: Timelines ?

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Sep 06, 2006 at 07:07:42PM +0100, Justin Mason wrote:
> the problem is that it needs to read the rules from rulesrc/sandbox/* --
> and those rules are pretty dependent in places on the rules in
> rulesrc/core.  Those rules, in turn, are the 3.2.0 core ruleset, which
> doesn't mix well with (ie stomps all over) the 3.1.x core ruleset.
> 
> We could come up with a way to use the 3.2.0 core ruleset in place of
> the 3.1.x one -- but I think the effort required would be too much, esp.
> since it's easier to just concentrate on the 3.2.0 release instead.

I'm not sure I agree with this, and we *need* to solve this problem
going forward, or else we won't be able to do 3.2 updates when we're
working on 3.3.

The rules are pretty version agnostic, except for the ones which have a
dependency on a plugin or other code change that 3.1 doesn't have.  I think
it'd be pretty easy to do one run with the 3.2 code and one with the 3.1 code,
and figure out which those are.
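
A minimal sketch of that comparison, assuming the per-message rule hits from
each run have already been parsed into a mapping of message path to the set
of rule names that fired.  The function names, message paths and rule names
here are hypothetical, and parsing the mass-check logs is left out:

```python
from collections import defaultdict


def messages_by_rule(hits):
    """Invert {message: {rule, ...}} into {rule: {message, ...}}."""
    by_rule = defaultdict(set)
    for message, rules in hits.items():
        for rule in rules:
            by_rule[rule].add(message)
    return by_rule


def version_dependent_rules(hits_31, hits_32):
    """Return the rules that did not hit the same messages in both runs."""
    by_rule_31 = messages_by_rule(hits_31)
    by_rule_32 = messages_by_rule(hits_32)
    all_rules = set(by_rule_31) | set(by_rule_32)
    return sorted(
        rule
        for rule in all_rules
        if by_rule_31.get(rule, set()) != by_rule_32.get(rule, set())
    )


# Hypothetical results from the same two messages checked by both versions.
hits_31 = {"msg1": {"RULE_A", "RULE_B"}, "msg2": {"RULE_A"}}
hits_32 = {"msg1": {"RULE_A", "RULE_B"}, "msg2": {"RULE_A", "RULE_C"}}

print(version_dependent_rules(hits_31, hits_32))
```

Rules that come back from this are the candidates for a closer look; anything
not listed behaved identically on the shared corpus.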

Rules that don't work the same get an "if version" wrapper, the rest can stay
the way they are.  We can also look at backporting the differences as
appropriate.
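
As a rough illustration of the wrapper, here is a sketch that takes rule text
and encloses it in a version conditional.  It assumes the config "if (version
>= N)" / "endif" form with the version written as major.minor.patch packed
into a decimal (3.2.0 as 3.002000); the helper name and the example rule are
made up for illustration:

```python
def wrap_for_min_version(rule_text, major, minor, patch=0):
    """Wrap rule text so it only loads on the given engine version or later.

    The version is packed as major.MMMPPP, e.g. 3.2.0 -> 3.002000.
    """
    version = f"{major}.{minor:03d}{patch:03d}"
    body = rule_text.rstrip("\n")
    return f"if (version >= {version})\n{body}\nendif\n"


# Hypothetical rule that relies on something only the 3.2 code provides.
rule = "header EXAMPLE_NEEDS_32 Subject =~ /example/\n"
print(wrap_for_min_version(rule, 3, 2), end="")
```

Rules that behave the same on both versions would be left unwrapped.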

As for rulesrc, mkrules, etc -- 3.1 doesn't need any of that.  This is also my
main issue with how 3.2 currently does stuff.  I don't understand why this
stuff is part of the normal distro.  I like to think of the distro as the
engine side of the project, and mkrules/rulesrc as the rules side of the
project, and there's no reason they have to be together.

So for 3.1, we generate, externally, the rules directory and include it in the
directory that gets mass-check'ed.  For 3.2, same thing.  Then in the normal
SA distribution, we don't need the whole svn:external/rulesrc/mkrules/etc
stuff, it'll just be a rules dir like before.

> last week featuring data from 7 contributors.

Hrm.  It still seems like a small number of messages/diversity:

       6  0.0 ham-bb-doc.log
   14998 18.9 ham-bb-jm.log
       6  0.0 ham-bb-zmi.log
    6357  8.0 ham-cthielen.log
    1510  1.9 ham-daf.log
     167  0.2 ham-dos.log
    1958  2.5 ham-parkerm.log
   46895 59.0 ham-theo.log
    2028  2.5 ham-wtogami.log
    5619  7.1 ham-zmi.log

    15006  3.9 spam-bb-doc.log
    15000  3.9 spam-bb-jm.log
     8358  2.2 spam-bb-zmi.log
    13783  3.5 spam-cthielen.log
     6261  1.6 spam-daf.log
     4676  1.2 spam-dos.log
    61619 15.9 spam-parkerm.log
   253448 65.2 spam-theo.log
     2156  0.6 spam-wtogami.log
     8359  2.2 spam-zmi.log

(that's 468210 total, btw)   and why does zmi have two sets of files?
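
For reference, the totals and percentages in that listing can be reproduced
with a short sketch like the following.  The counts are copied from the
listing; the variable and function names are only illustrative:

```python
# Message counts per mass-check log, copied from the listing above.
ham = {
    "ham-bb-doc.log": 6,
    "ham-bb-jm.log": 14998,
    "ham-bb-zmi.log": 6,
    "ham-cthielen.log": 6357,
    "ham-daf.log": 1510,
    "ham-dos.log": 167,
    "ham-parkerm.log": 1958,
    "ham-theo.log": 46895,
    "ham-wtogami.log": 2028,
    "ham-zmi.log": 5619,
}
spam = {
    "spam-bb-doc.log": 15006,
    "spam-bb-jm.log": 15000,
    "spam-bb-zmi.log": 8358,
    "spam-cthielen.log": 13783,
    "spam-daf.log": 6261,
    "spam-dos.log": 4676,
    "spam-parkerm.log": 61619,
    "spam-theo.log": 253448,
    "spam-wtogami.log": 2156,
    "spam-zmi.log": 8359,
}


def shares(counts):
    """Return the total and each log's share of it, as a percentage."""
    total = sum(counts.values())
    percent = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, percent


ham_total, ham_percent = shares(ham)
spam_total, spam_percent = shares(spam)
grand_total = ham_total + spam_total

print(ham_total, spam_total, grand_total)
print(ham_percent["ham-theo.log"], spam_percent["spam-theo.log"])
```

A single contributor accounts for about 59% of the ham and 65% of the spam,
which is the diversity concern in a nutshell.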



-- 
Randomly Generated Tagline:
"Have you paged him?
  Yes, he hasn't answered ...
  Well, page him again.  It's like a defibrillator -- you don't stop after
  just one time!"
                                 - Theo and Brian O'Neill, 2002.07.15

Re: Timelines ?

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Sep 06, 2006 at 06:56:03PM +0100, Justin Mason wrote:
> End of December strikes me as good!  I'm keen to get 3.2.0 on the road,
> alright, so the auto-generated rule updates go live.

Yeah, I think we need to talk about that...  ;)

I'm not sure what the problem is with 3.1 getting auto-updates (isn't it just
parsing the results from a 3.1 mass-check and doing something with the
results?).  I also think there may be an issue wrt the number of results that
we currently have, or lack thereof.

-- 
Randomly Generated Tagline:
"Can not say.  Saying, I would know.  Do not know, so can not say.
  Very damaged.  Zathras can never have anything nice."
                 - Zathras, Babylon 5, "War Without End I"