Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/08/26 05:23:10 UTC

Re: Re[2]: daily updates

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Robert Menschel writes:
> Tuesday, August 24, 2004, 7:02:00 PM, you wrote:
> JM> That's the main issue that we had in the past with "external"
> JM> rulesets -- most of those were developed without measuring accuracy,
> JM> and once tested they don't come out too hot.  But from what I can see
> JM> (from outside) it looks like you all have been doing that for a
> JM> while, which is cool.
> 
> Yes.  We post the rules to each other, run them through two or more
> (usually three) corpora, and use the combined results to determine
> whether rules are viable. (We're hoping to add a fourth corpus soon.)
> 
> Viable to us is less strict than viable to the development team (we use
> lower thresholds), but the basic philosophies are the same, I think.

Yep, from what I can see, agreed ;)
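
To make the "combined results" idea concrete, here is a minimal sketch of
pooling one rule's hit counts from several corpora and applying a viability
cutoff. Everything named here is an assumption for illustration: the
CorpusResult record, the S/O-style ratio (spam hit rate divided by spam plus
ham hit rate), and the threshold values are hypothetical, not the numbers
used by either group in this thread.

```python
from dataclasses import dataclass


@dataclass
class CorpusResult:
    """Hit counts for one rule on one corpus (hypothetical record layout)."""
    name: str
    spam_hits: int
    spam_total: int
    ham_hits: int
    ham_total: int


def combine(results):
    """Pool counts across corpora; return (spam_rate, ham_rate, so_ratio)."""
    spam_hits = sum(r.spam_hits for r in results)
    spam_total = sum(r.spam_total for r in results)
    ham_hits = sum(r.ham_hits for r in results)
    ham_total = sum(r.ham_total for r in results)
    spam_rate = spam_hits / spam_total if spam_total else 0.0
    ham_rate = ham_hits / ham_total if ham_total else 0.0
    overall = spam_rate + ham_rate
    # S/O-style ratio: 1.0 means the rule only ever hit spam.
    so_ratio = spam_rate / overall if overall else 0.0
    return spam_rate, ham_rate, so_ratio


def is_viable(results, min_so=0.95, min_spam_rate=0.001):
    """Assumed cutoffs; a stricter team would simply raise them."""
    spam_rate, _ham_rate, so_ratio = combine(results)
    return so_ratio >= min_so and spam_rate >= min_spam_rate


if __name__ == "__main__":
    runs = [
        CorpusResult("corpus-a", 50, 1000, 0, 1000),
        CorpusResult("corpus-b", 30, 1000, 1, 1000),
    ]
    print(combine(runs), is_viable(runs))
```

Keeping the thresholds as parameters mirrors the point above: the two groups
can share the same measurement and differ only in how strict the cutoff is.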

> JM> (BTW I should qualify what Daniel means by "non-heavyweight" -- in
> JM> other words, the rule doesn't greatly affect speed/RAM usage.  I
> JM> think that's what he means at least.)
> 
> Also important to us.  My system, for instance, does a comprehensive
> mass-check on anything from a single rule to dozens of rules in about
> half an hour. If any rule causes a noticeable jump in this performance
> measure, we either fix it or toss it.
> 
> (I can't really measure RAM usage on my system, but the same concern
> applies.)
> 
> We've also been trying to some extent to document a rule's history, so we
> know whether it came from a CLA member or elsewhere. We're discussing
> ways of making that more formal.

The version 2 Apache license has some text allowing "trivial" contributions
to not require CLAs -- but then, what's the definition of "trivial"? We
haven't got a really good definition of that as it applies to rules yet,
unfortunately ;)

> JM> If we can work something out, that'll be great ;)
> 
> We're all agreed about that.  I'm hopeful we can.

cool.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFBLVedQTcbUG5Y7woRAoR2AKDibOVadoq72mOsUiSRc1eMVM2SEwCfU312
iiAoHwRTNkPlN0RX/yODUU0=
=VPad
-----END PGP SIGNATURE-----


Re: Re[2]: daily updates

Posted by Daniel Quinlan <qu...@pathname.com>.
Loren Wilton <lw...@earthlink.net> writes:

> I'd suggest as a lower limit that a contribution of a single rule
> ought to be 'trivial' in the CLA sense, even if it does happen to tag
> 50% of the spam and no ham.
>
> 30 rules might be a different question.

The "trivial" thing is an Apache guideline.  Basically, if the
contribution is trivial, we're okay with relying on the Apache License
2.0 which includes this bit:

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

If it's non-trivial (and a series of trivial contributions is generally
considered non-trivial), the ASF requires that we get a signed
Contributor License Agreement.

For the trivial case, I should also note that:

 - Rules are often (usually?) not derivative works of SpamAssassin,
   therefore the Apache license does not necessarily apply.
 - The rules have to be intentionally submitted **to the ASF**.

Also, even if we know who wrote it and they submitted a CLA previously,
we still need them to okay or make the contribution.  A CLA is not a
license for us to grab anything.  :-)

One way to simplify the process of getting a new untested rule all the
way into the ASF would be to do new untested rule development in the ASF
rather than outside of the ASF.  Once it's in the door and legally
kosher, any ASF project can use a contribution.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re[4]: daily updates

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Loren,

Wednesday, August 25, 2004, 9:32:26 PM, you wrote:

>> The version 2 Apache license has some text allowing "trivial" contributions
>> to not require CLAs -- but then, what's the definition of "trivial"? We
>> haven't got a really good definition of that as it applies to rules yet,
>> unfortunately ;)

LW> I'd suggest as a lower limit that a contribution of a single rule ought to
LW> be 'trivial' in the CLA sense, even if it does happen to tag 50% of the spam
LW> and no ham.

A rule like
> header T_DOUBLE_USCORES  Subject =~ /__/
yes.  A rule that had seven or eight negative look-aheads, eight or nine
character classes, and nine or ten alternatives, might be the type of
rule that requires a CLA.
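
As a rough illustration of that contrast, the sketch below applies the
one-line rule's pattern to a subject line and then counts the constructs
mentioned above (negative look-aheads, character classes, alternation bars)
in a pattern string. The rule_features helper and the longer sample pattern
are hypothetical, and the counting is deliberately crude: it works on the
text of the pattern and would miscount a "[" or "|" that sits inside a
character class. It is a way to eyeball complexity, not a legal test for
when a CLA is needed.

```python
import re


def rule_features(pattern: str) -> dict:
    """Crude textual counts of regex constructs in a rule pattern."""
    # Drop escaped characters such as \[ or \| so they are not counted.
    unescaped = re.sub(r"\\.", "", pattern)
    return {
        "negative_lookaheads": unescaped.count("(?!"),
        "character_classes": unescaped.count("["),
        "alternation_bars": unescaped.count("|"),
    }


def subject_matches(pattern: str, subject: str) -> bool:
    """Rough equivalent of a 'header ... Subject =~ /pattern/' test."""
    return re.search(pattern, subject) is not None


if __name__ == "__main__":
    simple = r"__"  # the pattern from T_DOUBLE_USCORES above
    print(subject_matches(simple, "Cheap__meds"))
    print(rule_features(simple))

    # Hypothetical, more elaborate pattern for comparison only.
    elaborate = r"(?!re:)(?!fwd:)[a-z]+(?:foo|bar|baz)\[x\]"
    print(rule_features(elaborate))
```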

I'm more concerned with the method of submission. If a single rule is
submitted with the intent that it be available to the entire community,
that's good for me. If we don't know that, however, it's possible the
submitter /meant/ it to be available to individual systems that are doing
their own anti-spam work, but does *not* want it to be included in any
distribution that will be incorporated into and then sold as a commercial
product.

That's why I'm trying to track authorship of the SARE rules I manage, so
we can find out whether the original author has any objections to full
SA distribution if/when appropriate.

Bob Menschel




Re: Re[2]: daily updates

Posted by Loren Wilton <lw...@earthlink.net>.
> The version 2 Apache license has some text allowing "trivial" contributions
> to not require CLAs -- but then, what's the definition of "trivial"? We
> haven't got a really good definition of that as it applies to rules yet,
> unfortunately ;)

I'd suggest as a lower limit that a contribution of a single rule ought to
be 'trivial' in the CLA sense, even if it does happen to tag 50% of the spam
and no ham.

30 rules might be a different question.

        Loren