You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/09/25 20:23:27 UTC
class renaming
So in thinking about the class cleanup we've been wanting to do for a
while; I think the top items on the list (my list at least) are:
- rename the Mail::SpamAssassin::PerMsgStatus class
- break it up into multiple, smaller classes
So here's what I propose for the first one.
Rename the Mail::SpamAssassin::PerMsgStatus class
-------------------------------------------------
Initially the purpose was as a "per-message status" object, describing the
results of a scan of one message -- in other words, that message's spam
status -- "is it spam or not?". I think we all now agree the name is
not so hot ;)
It's purpose has eventually turned out to be:
- (public) methods that actually cause the scan to happen
- (public) the results of a scan operation
- (public) message rewriting functionality
- (internally, plugins) state for a scan operation; plugins and our code
can store state on this object for the duration of the scan
- (plugins) a set of APIs to access aspects of the message being
scanned, and plugin support APIs
- (internally) methods that implement Eval tests
- (internally) methods that control how tests are run,
their ordering etc.
- (internally) methods that implement the DNS event-driven algorithm
- (internally) methods that perform auto-learning
- (internally) methods that compile tests into perl bytecode at runtime
- (internally) methods that parse aspects of the message
- (internally) the tests compiled as perl bytecode
So in my opinion, in cases like this where there's lots of internal and
external APIs, it's more sensible to name the class after what it's
external APIs do. (in fact, most OO design would indicate that this
means you need to refactor out into >1 class. I'm getting to that ;)
So, I think "Mail::SpamAssassin::Scan" is a better name -- the object
returned from M::SpamAssassin::check() is the results of a scan of a
single message.
("Scanner" is another poss, but I think "Scan" is better because we aren't
returning the object that *did* the scan, we're returning the *results* of
the scan.)
The next thing is backwards compatibility. We can only do this if we
don't break third-party code. We *can* rename *this* class without
breaking backwards compatibility, thankfully. Our requirements here are:
- 1. plugins and third-party perl code will very likely contain "use
Mail::SpamAssassin::PerMsgStatus;" lines, so having some kind of
"use"able file there, is a MUST.
- 2. there are possibly locations in third-party code where a
Mail::SpamAssassin::PerMsgStatus object is created other than through
the Mail::SpamAssassin::check() API, so being able to support that is a
SHOULD.
- 3. However, the majority of callers should not be creating PerMsgStatus
objects directly, or depending in any way on the object being of
that specific type. (hooray: perl's not strongly typed! ;)
Here's how I propose to do that:
- rename the current Mail::SpamAssassin::PerMsgStatus class to
Mail::SpamAssassin::Scan
- create a "Facade" Mail::SpamAssassin::PerMsgStatus object that is a
sub-class of ::Scan, with no additional methods or data. in other
words, all method calls and member var accesses will fall through into
::Scan.
- If we deprecate any 3.0.0 APIs in the 3.1.0 cycle, we can move
their "backwards compatibility" methods into that facade class,
because 3.1.0 code will be "Scan-native".
- keep the facade object around for a while, at least until the next
major cycle, because it's super-cheap; we won't even have a "use" line
for it in our code, so it'll take up roughly 200 bytes on disk and
that's it.
Sound useful?
in my opinion this is definitely useful in the 3.1.0 tree.
Break it up into multiple, smaller classes
------------------------------------------
OK, part the second. in my opinion, this is also a very good idea -- as
the XP guys say, PerMsgStatus has a bad "code smell" -- it's a big file
with lots of totally different functionality mushed into one class. In
fact, it even loads methods from *multiple* files, which is totally
nasty. ;)
Here's some more details about what APIs are on the ::PerMsgStatus object
(or the ::Scan object as it may be renamed):
- (public) methods that actually cause the scan to happen
check
This should be left as a public API, but its code moved to a new
class. see "(internally) methods that control how tests are run,
their ordering etc" below.
- (public) the results of a scan operation
is_spam
get_names_of_tests_hit / get_names_of_subtests_hit
get_score / get_hits
get_required_score / get_required_hits
get_autolearn_status / get_report
get_content_preview
finish
These are the main thing that the ::Scan object does, so they stay.
- (public) message rewriting functionality
rewrite_mail
move code into another class; leave this public API on the ::Scan
object which calls into that class.
Proposed class name: Mail::SpamAssassin::Scan::Rewriter?
- (internally, plugins) state for a scan operation; plugins
and our code can store state on this object for the
duration of the scan.
[no methods involved, just the obj itself]
I don't see any need to change this.
- (plugins) a set of APIs to access aspects of the message
being scanned, and plugin support APIs
get_message / get_current_eval_rule_name
get_uri_list / get
create_fulltext_tmpfile
The first two, I think make sense to leave on ::Scan. Last three,
possibly move the logic into another class, but leave the API on
::Scan. In particular, get_uri_list() and get() are both
more appropriate somewhere in the Mail::SpamAssassin::Message
hierarchy.
- (internally) methods that implement Eval tests
[entire contents of EvalTests.pm which
do this horrible hack of putting themselves into
the PerMsgStatus namespace]
move code into another namespace. Eval tests use the
PerMsgStatus object as $self, and since they're just
functions, not objects themselves, that doesn't need to
change -- they'd still get the ::Scan object as their
first arg.
Proposed namespace: Mail::SpamAssassin::Test::Eval?
- (internally) methods that control how tests are run,
their ordering etc.
[parts of check]
[parts of do_head_tests / etc. ]
Definitely move.
Proposed class: Mail::SpamAssassin::TestRunner?
RunTests? Runner? Scanner?
- (internally) methods that implement the DNS event-driven algorithm
[entire contents of Dns.pm which do this horrible hack of putting
themselves into the PerMsgStatus namespace]
into Mail::SpamAssassin::TestRunner as above?
- (internally) methods that perform auto-learning
learn
Proposed class: Mail::SpamAssassin::AutoLearn? (I don't think
mushing into PerMsgLearner, Bayes, or Mail::SpamAssassin makes
sense, so a new class would be better.)
- (internally) methods that compile tests into perl bytecode at runtime
do_head_tests / do_body_tests / do_body_uri_tests
do_rawbody_tests etc.
into Mail::SpamAssassin::TestRunner as above?
- (internally) the tests compiled as perl bytecode
MAKE_MONEY_FAST_body_test etc.
like eval tests, the functions here get the Scan object passed
in as arg 1, and that doesn't need to change.
Proposed namespace: Mail::SpamAssassin::Test::Body, etc., with a
namespace for each type of test? or should we carry on mushing them
into one namespace, Mail::SpamAssassin::Test::Compiled?
--j.
Re: class renaming
Posted by Daniel Quinlan <qu...@pathname.com>.
Justin Mason <jm...@jmason.org> writes:
> yes, actually, you're right.
Hmmm... I'm not sure how to take that. ;-)
>> I'd propose that rewriting be part of a Mail::SpamAssassin::Format
>> class.
> any particular reason for that name?
It was the best name I could come up with. :-)
> - Scan: object returned to user
> - [this class]: object that contains the algorithm and code
> to run whatever subset of the tests in whatever order
>
> And the idea is that all this logic shouldn't be in the
> simple "results" object we give back to the user.
I was figuring that [this class] would be underneath the "Scan" object.
--
Daniel Quinlan ApacheCon! 13-17 November (3 SpamAssassin
http://www.pathname.com/~quinlan/ http://www.apachecon.com/ sessions & more)
Re: class renaming
Posted by Theo Van Dinter <fe...@kluge.net>.
On Mon, Sep 27, 2004 at 06:19:03PM +0200, Malte S. Stretz wrote:
> As much as I loved to have this thing renamed, why didn't we do this
> *before* we released 3.0? Or to quote you from bug 3668: "there's *no way*
3.0 dragged on for a long time. We really should have finished the API
changes beforehand, but what can you do?
> I'd be happy making any of these changes before 4.0.0 ;)" (Actually, the
> "no way" is exaggerated but I don't like the idea at this point).
I'm against making large-scale API changes in the 3.x tree at the moment. The
PMS thing does suck, but it's what we have for now. I'd like to focus on
getting 3.0 stabilized, and deal with the more pressing issues for 3.1.
If we can do the break-up w/out affecting API, then 3.2 may not be a bad
choice. Otherwise 4.0 is generally the way to do it IMO.
--
Randomly Generated Tagline:
"Who will guard the guards" == "quis custodiet ipsos"
Re: class renaming
Posted by "Malte S. Stretz" <ms...@gmx.net>.
On Monday 27 September 2004 18:19 CET Malte S. Stretz wrote:
> On Sunday 26 September 2004 10:42 CET Daniel Quinlan wrote:
> >[...]
> > Do we really need to do this now? This is not going to significantly
> > help performance, accuracy, or memory usage, is it?
>
> As much as I loved to have this thing renamed, why didn't we do this
> *before* we released 3.0? Or to quote you from bug 3668: "there's *no
> way* I'd be happy making any of these changes before 4.0.0 ;)"
Hrmmm... wrong quoting. The quote from "you" was of course from Justin, not
Daniel.
Cheers,
Malte
--
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
<http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
<http://www.catb.org/~esr/faqs/smart-questions.html>
Re: class renaming
Posted by "Malte S. Stretz" <ms...@gmx.net>.
On Sunday 26 September 2004 10:42 CET Daniel Quinlan wrote:
>[...]
> Do we really need to do this now? This is not going to significantly
> help performance, accuracy, or memory usage, is it?
As much as I loved to have this thing renamed, why didn't we do this
*before* we released 3.0? Or to quote you from bug 3668: "there's *no way*
I'd be happy making any of these changes before 4.0.0 ;)" (Actually, the
"no way" is exaggerated but I don't like the idea at this point).
Cheers,
Malte
--
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
<http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
<http://www.catb.org/~esr/faqs/smart-questions.html>
Re: class renaming
Posted by Daniel Quinlan <qu...@pathname.com>.
jm@jmason.org (Justin Mason) writes:
> - (public) message rewriting functionality
>
> rewrite_mail
>
> move code into another class; leave this public API on the ::Scan
> object which calls into that class.
> Proposed class name: Mail::SpamAssassin::Scan::Rewriter?
Rewriting should not be part of the Scan object. I'd propose that
rewriting be part of a Mail::SpamAssassin::Format class.
> - (internally) methods that implement Eval tests
>
> [entire contents of EvalTests.pm which
> do this horrible hack of putting themselves into
> the PerMsgStatus namespace]
>
> move code into another namespace. Eval tests use the
> PerMsgStatus object as $self, and since they're just
> functions, not objects themselves, that doesn't need to
> change -- they'd still get the ::Scan object as their
> first arg.
>
> Proposed namespace: Mail::SpamAssassin::Test::Eval?
Just Mail::SpamAssassin::Tests.pm ?
> - (internally) methods that control how tests are run,
> their ordering etc.
>
> [parts of check]
> [parts of do_head_tests / etc. ]
>
> Definitely move.
> Proposed class: Mail::SpamAssassin::TestRunner?
> RunTests? Runner? Scanner?
Shouldn't this just be part of "Scan"?
> - (internally) methods that implement the DNS event-driven algorithm
>
> [entire contents of Dns.pm which do this horrible hack of putting
> themselves into the PerMsgStatus namespace]
>
> into Mail::SpamAssassin::TestRunner as above?
I'd say this belongs in the EvalTests module, wherever it ends up.
> - (internally) methods that perform auto-learning
>
> learn
>
> Proposed class: Mail::SpamAssassin::AutoLearn? (I don't think
> mushing into PerMsgLearner, Bayes, or Mail::SpamAssassin makes
> sense, so a new class would be better.)
I think there's too much breaking up of stuff here. Bayes would be
fine.
Do we really need to do this now? This is not going to significantly
help performance, accuracy, or memory usage, is it?
What's the effect on stability?
How does this affect our release cycle?
Daniel
--
Daniel Quinlan ApacheCon! 13-17 November (3 SpamAssassin
http://www.pathname.com/~quinlan/ http://www.apachecon.com/ sessions & more)
Re: class renaming
Posted by Daniel Quinlan <qu...@pathname.com>.
"Loren Wilton" <lw...@earthlink.net> writes:
> Personally I'd prefer Mail::SpamAssassin::Assassinate. No, it doesn't make
> any obvious sense, but a project isn't worthwhile if you can't have one
> class somewhere with a fun name that everyone "just knows what it does".
>
> Although that might be a better name for the TestRunner or Rewriter.
It's all fun and games until you have to type the name 10,000 times
during development. The "SpamAssassin" directory name is bad enough.
;-)
Daniel
--
Daniel Quinlan ApacheCon! 13-17 November (3 SpamAssassin
http://www.pathname.com/~quinlan/ http://www.apachecon.com/ sessions & more)
Re: class renaming
Posted by Loren Wilton <lw...@earthlink.net>.
> - rename the current Mail::SpamAssassin::PerMsgStatus class to
> Mail::SpamAssassin::Scan
Personally I'd prefer Mail::SpamAssassin::Assassinate. No, it doesn't make
any obvious sense, but a project isn't worthwhile if you can't have one
class somewhere with a fun name that everyone "just knows what it does".
Although that might be a better name for the TestRunner or Rewriter.
Loren