You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/09/25 20:23:27 UTC

class renaming

So in thinking about the class cleanup we've been wanting to do for a
while; I think the top items on the list (my list at least) are:

  - rename the Mail::SpamAssassin::PerMsgStatus class

  - break it up into multiple, smaller classes

So here's what I propose for the first one.


Rename the Mail::SpamAssassin::PerMsgStatus class
-------------------------------------------------

Initially the purpose was as a "per-message status" object, describing the
results of a scan of one message -- in other words, that message's spam
status -- "is it spam or not?". I think we all now agree the name is
not so hot ;)

It's purpose has eventually turned out to be:

  - (public) methods that actually cause the scan to happen
  - (public) the results of a scan operation
  - (public) message rewriting functionality

  - (internally, plugins) state for a scan operation; plugins and our code
    can store state on this object for the duration of the scan
  - (plugins) a set of APIs to access aspects of the message being
    scanned, and plugin support APIs

  - (internally) methods that implement Eval tests
  - (internally) methods that control how tests are run,
    their ordering etc.
  - (internally) methods that implement the DNS event-driven algorithm
  - (internally) methods that perform auto-learning
  - (internally) methods that compile tests into perl bytecode at runtime
  - (internally) methods that parse aspects of the message
  - (internally) the tests compiled as perl bytecode


So in my opinion, in cases like this where there's lots of internal and
external APIs, it's more sensible to name the class after what it's
external APIs do.   (in fact, most OO design would indicate that this
means you need to refactor out into >1 class. I'm getting to that ;)

So, I think "Mail::SpamAssassin::Scan" is a better name -- the object
returned from M::SpamAssassin::check() is the results of a scan of a
single message.

("Scanner" is another poss, but I think "Scan" is better because we aren't
returning the object that *did* the scan, we're returning the *results* of
the scan.)


The next thing is backwards compatibility.   We can only do this if we
don't break third-party code.   We *can* rename *this* class without
breaking backwards compatibility, thankfully.  Our requirements here are:

- 1. plugins and third-party perl code will very likely contain "use
  Mail::SpamAssassin::PerMsgStatus;" lines, so having some kind of
  "use"able file there, is a MUST.

- 2. there are possibly locations in third-party code where a
  Mail::SpamAssassin::PerMsgStatus object is created other than through
  the Mail::SpamAssassin::check() API, so being able to support that is a
  SHOULD.   
  
- 3. However, the majority of callers should not be creating PerMsgStatus
  objects directly, or depending in any way on the object being of
  that specific type.  (hooray: perl's not strongly typed! ;)

Here's how I propose to do that:

  - rename the current Mail::SpamAssassin::PerMsgStatus class to
    Mail::SpamAssassin::Scan

  - create a "Facade" Mail::SpamAssassin::PerMsgStatus object that is a
    sub-class of ::Scan, with no additional methods or data.  in other
    words, all method calls and member var accesses will fall through into
    ::Scan.

  - If we deprecate any 3.0.0 APIs in the 3.1.0 cycle, we can move
    their "backwards compatibility" methods into that facade class,
    because 3.1.0 code will be "Scan-native".

  - keep the facade object around for a while, at least until the next
    major cycle, because it's super-cheap; we won't even have a "use" line
    for it in our code, so it'll take up roughly 200 bytes on disk and
    that's it.

Sound useful?

in my opinion this is definitely useful in the 3.1.0 tree.




Break it up into multiple, smaller classes
------------------------------------------

OK, part the second.  in my opinion, this is also a very good idea -- as
the XP guys say, PerMsgStatus has a bad "code smell" -- it's a big file
with lots of totally different functionality mushed into one class.  In
fact, it even loads methods from *multiple* files, which is totally
nasty. ;)

Here's some more details about what APIs are on the ::PerMsgStatus object
(or the ::Scan object as it may be renamed):
 
  - (public) methods that actually cause the scan to happen

        check

    This should be left as a public API, but its code moved to a new
    class.   see "(internally) methods that control how tests are run,
    their ordering etc" below.

  - (public) the results of a scan operation

        is_spam
        get_names_of_tests_hit / get_names_of_subtests_hit
        get_score / get_hits
        get_required_score / get_required_hits
        get_autolearn_status / get_report
        get_content_preview
        finish

    These are the main thing that the ::Scan object does, so they stay.

  - (public) message rewriting functionality

        rewrite_mail

    move code into another class; leave this public API on the ::Scan
    object which calls into that class.
    Proposed class name: Mail::SpamAssassin::Scan::Rewriter?

  - (internally, plugins) state for a scan operation; plugins
    and our code can store state on this object for the
    duration of the scan.

        [no methods involved, just the obj itself]

    I don't see any need to change this.

  - (plugins) a set of APIs to access aspects of the message
    being scanned, and plugin support APIs

        get_message / get_current_eval_rule_name
        get_uri_list / get
        create_fulltext_tmpfile

    The first two, I think make sense to leave on ::Scan. Last three,
    possibly move the logic into another class, but leave the API on
    ::Scan.   In particular, get_uri_list() and get() are both
    more appropriate somewhere in the Mail::SpamAssassin::Message
    hierarchy.

  - (internally) methods that implement Eval tests

        [entire contents of EvalTests.pm which
        do this horrible hack of putting themselves into
        the PerMsgStatus namespace]

    move code into another namespace.  Eval tests use the
    PerMsgStatus object as $self, and since they're just
    functions, not objects themselves, that doesn't need to
    change -- they'd still get the ::Scan object as their
    first arg.

    Proposed namespace: Mail::SpamAssassin::Test::Eval?

  - (internally) methods that control how tests are run,
    their ordering etc.

        [parts of check]
        [parts of do_head_tests / etc. ]

    Definitely move.
    Proposed class: Mail::SpamAssassin::TestRunner?
    RunTests? Runner? Scanner?

  - (internally) methods that implement the DNS event-driven algorithm

        [entire contents of Dns.pm which do this horrible hack of putting
        themselves into the PerMsgStatus namespace]

    into Mail::SpamAssassin::TestRunner as above?

  - (internally) methods that perform auto-learning

        learn

    Proposed class: Mail::SpamAssassin::AutoLearn?   (I don't think
    mushing into PerMsgLearner, Bayes, or Mail::SpamAssassin makes
    sense, so a new class would be better.)

  - (internally) methods that compile tests into perl bytecode at runtime

        do_head_tests / do_body_tests / do_body_uri_tests
        do_rawbody_tests etc.

    into Mail::SpamAssassin::TestRunner as above?

  - (internally) the tests compiled as perl bytecode

        MAKE_MONEY_FAST_body_test etc.

    like eval tests, the functions here get the Scan object passed
    in as arg 1, and that doesn't need to change.
    Proposed namespace: Mail::SpamAssassin::Test::Body, etc., with a
    namespace for each type of test?  or should we carry on mushing them
    into one namespace, Mail::SpamAssassin::Test::Compiled?

--j.

Re: class renaming

Posted by Daniel Quinlan <qu...@pathname.com>.
Justin Mason <jm...@jmason.org> writes:

> yes, actually, you're right.

Hmmm... I'm not sure how to take that.  ;-)

>> I'd propose that rewriting be part of a Mail::SpamAssassin::Format
>> class.

> any particular reason for that name?

It was the best name I could come up with.  :-)

>     - Scan: object returned to user
>     - [this class]: object that contains the algorithm and code
>       to run whatever subset of the tests in whatever order
>
> And the idea is that all this logic shouldn't be in the
> simple "results" object we give back to the user.

I was figuring that [this class] would be underneath the "Scan" object.

-- 
Daniel Quinlan                     ApacheCon! 13-17 November (3 SpamAssassin
http://www.pathname.com/~quinlan/  http://www.apachecon.com/  sessions & more)

Re: class renaming

Posted by Theo Van Dinter <fe...@kluge.net>.
On Mon, Sep 27, 2004 at 06:19:03PM +0200, Malte S. Stretz wrote:
> As much as I loved to have this thing renamed, why didn't we do this 
> *before* we released 3.0?  Or to quote you from bug 3668: "there's *no way* 

3.0 dragged on for a long time.  We really should have finished the API
changes beforehand, but what can you do?

> I'd be happy making any of these changes before 4.0.0 ;)"  (Actually, the 
> "no way" is exaggerated but I don't like the idea at this point).

I'm against making large-scale API changes in the 3.x tree at the moment.  The
PMS thing does suck, but it's what we have for now.  I'd like to focus on
getting 3.0 stabilized, and deal with the more pressing issues for 3.1.

If we can do the break-up w/out affecting API, then 3.2 may not be a bad
choice.  Otherwise 4.0 is generally the way to do it IMO.

-- 
Randomly Generated Tagline:
"Who will guard the guards" == "quis custodiet ipsos"

Re: class renaming

Posted by "Malte S. Stretz" <ms...@gmx.net>.
On Monday 27 September 2004 18:19 CET Malte S. Stretz wrote:
> On Sunday 26 September 2004 10:42 CET Daniel Quinlan wrote:
> >[...]
> > Do we really need to do this now?  This is not going to significantly
> > help performance, accuracy, or memory usage, is it?
>
> As much as I loved to have this thing renamed, why didn't we do this
> *before* we released 3.0?  Or to quote you from bug 3668: "there's *no
> way* I'd be happy making any of these changes before 4.0.0 ;)" 

Hrmmm... wrong quoting.  The quote from "you" was of course from Justin, not 
Daniel.

Cheers,
Malte

-- 
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
      <http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
      <http://www.catb.org/~esr/faqs/smart-questions.html>

Re: class renaming

Posted by "Malte S. Stretz" <ms...@gmx.net>.
On Sunday 26 September 2004 10:42 CET Daniel Quinlan wrote:
>[...]
> Do we really need to do this now?  This is not going to significantly
> help performance, accuracy, or memory usage, is it?

As much as I loved to have this thing renamed, why didn't we do this 
*before* we released 3.0?  Or to quote you from bug 3668: "there's *no way* 
I'd be happy making any of these changes before 4.0.0 ;)"  (Actually, the 
"no way" is exaggerated but I don't like the idea at this point).

Cheers,
Malte

-- 
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
      <http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
      <http://www.catb.org/~esr/faqs/smart-questions.html>

Re: class renaming

Posted by Daniel Quinlan <qu...@pathname.com>.
jm@jmason.org (Justin Mason) writes:

>   - (public) message rewriting functionality
> 
>         rewrite_mail
> 
>     move code into another class; leave this public API on the ::Scan
>     object which calls into that class.
>     Proposed class name: Mail::SpamAssassin::Scan::Rewriter?

Rewriting should not be part of the Scan object.  I'd propose that
rewriting be part of a Mail::SpamAssassin::Format class.
 
>   - (internally) methods that implement Eval tests
> 
>         [entire contents of EvalTests.pm which
>         do this horrible hack of putting themselves into
>         the PerMsgStatus namespace]
> 
>     move code into another namespace.  Eval tests use the
>     PerMsgStatus object as $self, and since they're just
>     functions, not objects themselves, that doesn't need to
>     change -- they'd still get the ::Scan object as their
>     first arg.
> 
>     Proposed namespace: Mail::SpamAssassin::Test::Eval?

Just Mail::SpamAssassin::Tests.pm ?
 
>   - (internally) methods that control how tests are run,
>     their ordering etc.
> 
>         [parts of check]
>         [parts of do_head_tests / etc. ]
> 
>     Definitely move.
>     Proposed class: Mail::SpamAssassin::TestRunner?
>     RunTests? Runner? Scanner?

Shouldn't this just be part of "Scan"?
 
>   - (internally) methods that implement the DNS event-driven algorithm
> 
>         [entire contents of Dns.pm which do this horrible hack of putting
>         themselves into the PerMsgStatus namespace]
> 
>     into Mail::SpamAssassin::TestRunner as above?

I'd say this belongs in the EvalTests module, wherever it ends up.
 
>   - (internally) methods that perform auto-learning
> 
>         learn
> 
>     Proposed class: Mail::SpamAssassin::AutoLearn?   (I don't think
>     mushing into PerMsgLearner, Bayes, or Mail::SpamAssassin makes
>     sense, so a new class would be better.)

I think there's too much breaking up of stuff here.  Bayes would be
fine.
 
Do we really need to do this now?  This is not going to significantly
help performance, accuracy, or memory usage, is it?

What's the effect on stability?

How does this affect our release cycle?

Daniel

-- 
Daniel Quinlan                     ApacheCon! 13-17 November (3 SpamAssassin
http://www.pathname.com/~quinlan/  http://www.apachecon.com/  sessions & more)

Re: class renaming

Posted by Daniel Quinlan <qu...@pathname.com>.
"Loren Wilton" <lw...@earthlink.net> writes:

> Personally I'd prefer Mail::SpamAssassin::Assassinate.  No, it doesn't make
> any obvious sense, but a project isn't worthwhile if you can't have one
> class somewhere with a fun name that everyone "just knows what it does".
> 
> Although that might be a better name for the TestRunner or Rewriter.

It's all fun and games until you have to type the name 10,000 times
during development.  The "SpamAssassin" directory name is bad enough.
;-)

Daniel

-- 
Daniel Quinlan                     ApacheCon! 13-17 November (3 SpamAssassin
http://www.pathname.com/~quinlan/  http://www.apachecon.com/  sessions & more)

Re: class renaming

Posted by Loren Wilton <lw...@earthlink.net>.
>   - rename the current Mail::SpamAssassin::PerMsgStatus class to
>     Mail::SpamAssassin::Scan

Personally I'd prefer Mail::SpamAssassin::Assassinate.  No, it doesn't make
any obvious sense, but a project isn't worthwhile if you can't have one
class somewhere with a fun name that everyone "just knows what it does".

Although that might be a better name for the TestRunner or Rewriter.

        Loren