Posted to users@spamassassin.apache.org by David Birnbaum <da...@pins.net> on 2004/02/25 20:58:04 UTC

Asymmetry in learning spam and ham

Howdy,

Just a question.  For learning a message as spam (and reporting it), it
seems like this is adequate:

  $f->report_as_spam( $mail );

However, for ham, it seems like you need to do this:

  $f->init_learner;
  $status = $f->learn( $mail );
  $f->rebuild_learner_caches;
  $f->finish_learner;

report_as_spam() implies that the message is learnt by Bayes, but I couldn't
tell whether the learn() stuff happens under the hood.  It also appears that
learn() returns a status object, whereas report_as_spam() returns nothing
terribly useful.

Is the code in 2.63 doing the right thing?  This is for a persistent
process, so I need to make sure I get it right.

Thanks,

David.

Re: Asymmetry in learning spam and ham

Posted by David Birnbaum <da...@pins.net>.
On Wed, 25 Feb 2004, Matt Kettler wrote:

> >Just a question.  For learning a message as spam (and reporting it), it
> >seems like this is adequate:
> >
> >   $f->report_as_spam( $mail );
> >
> >However, for ham, it seems like you need to do this:
> >
> >   $f->init_learner;
> >   $status = $f->learn( $mail );
> >   $f->rebuild_learner_caches;
> >   $f->finish_learner;
>
> it's a bit convoluted if you ask me, and it's contrary to the claims of
> what you must do before calling various functions, but it appears to be
> functional.

Yes...labyrinthine even.  I'd really rather not worry about that stuff.
But for now I'll just do it this way.  Perhaps the next time somebody
staggers through this code we can add a learn_as_(sp|h)am() routine that
encapsulates the whole sequence, so callers don't have to make the other
calls themselves.
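A rough sketch of the learn_as_(sp|h)am() routine David proposes, assuming it simply wraps the ham sequence from his first post. learn_as, learn_as_spam and learn_as_ham are hypothetical helpers, not part of SpamAssassin 2.63, and StubLearner is a made-up stand-in that only records method calls so the sketch runs without SpamAssassin installed. Passing ($mail, undef, $isspam, 0) to learn() follows the argument order in the report_as_spam excerpt Matt quotes. Whether rebuild_learner_caches and finish_learner should run after every message in a persistent process is left open here, as it is in the thread.

```perl
use strict;
use warnings;

# StubLearner is a made-up stand-in for a Mail::SpamAssassin object so the
# sketch runs without SpamAssassin; it only records which methods were called.
package StubLearner;

sub new          { my ($class) = @_; return bless { calls => [] }, $class; }
sub init_learner { push @{ $_[0]{calls} }, 'init_learner'; return; }
sub learn {
    my ($self, $mail, $id, $isspam, $forget) = @_;
    push @{ $self->{calls} }, $isspam ? 'learn:spam' : 'learn:ham';
    return 'status';    # the real method returns a status object
}
sub rebuild_learner_caches { push @{ $_[0]{calls} }, 'rebuild_learner_caches'; return; }
sub finish_learner         { push @{ $_[0]{calls} }, 'finish_learner'; return; }

package main;

# Hypothetical helper (not in SpamAssassin 2.63): wraps the sequence from the
# original post so callers make a single call per message.
sub learn_as {
    my ($f, $mail, $isspam) = @_;
    $f->init_learner;
    # Argument order ($mail, $id, $isspam, $forget) as in report_as_spam's call
    my $status = $f->learn($mail, undef, $isspam ? 1 : 0, 0);
    $f->rebuild_learner_caches;
    $f->finish_learner;
    return $status;
}

sub learn_as_spam { my ($f, $mail) = @_; return learn_as($f, $mail, 1); }
sub learn_as_ham  { my ($f, $mail) = @_; return learn_as($f, $mail, 0); }

my $f      = StubLearner->new;
my $status = learn_as_ham($f, 'raw message text');
print "@{ $f->{calls} }\n";   # init_learner learn:ham rebuild_learner_caches finish_learner
```

Keeping one shared learn_as() means the spam and ham paths cannot drift apart, which is the asymmetry the thread is complaining about.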

Does it seem to break anything if you call the init() functions multiple
times?

David.

Re: Asymmetry in learning spam and ham

Posted by Matt Kettler <mk...@evi-inc.com>.
At 02:58 PM 2/25/2004, David Birnbaum wrote:
>Howdy,
>
>Just a question.  For learning a message as spam (and reporting it), it
>seems like this is adequate:
>
>   $f->report_as_spam( $mail );
>
>However, for ham, it seems like you need to do this:
>
>   $f->init_learner;
>   $status = $f->learn( $mail );
>   $f->rebuild_learner_caches;
>   $f->finish_learner;
>
>report_as_spam() implies that the message is learnt by Bayes, but I couldn't
>tell whether the learn() stuff happens under the hood.  It also appears that
>learn() returns a status object, whereas report_as_spam() returns nothing
>terribly useful.

it's a bit convoluted if you ask me, and it's contrary to the claims of
what you must do before calling various functions, but it appears to be
functional.

I also can't find a finish_learner call anywhere in the chain, but that 
doesn't mean it's not there, somewhere in some long chain of sub calls.

Here's my quick trace through the 2.63 code:

report_as_spam in SpamAssassin.pm definitely does learning. You can see
it call self->learn like this:

sub report_as_spam {
  # ... (bunch of code snipped) ...

  # learn as spam if enabled
  if ( $self->{conf}->{bayes_learn_during_report} ) {
    $self->learn ($mail, undef, 1, 0);
  }
  # ...
}


And the learn subroutine calls self->init(1), which in turn calls
init_learner. (Note that the comments around self->learn claim you need to
init the learner first, but apparently this isn't 100% true.)
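A stripped-down model of the path described above: report_as_spam() learns only when bayes_learn_during_report is set, it calls learn($mail, undef, 1, 0), and learn() sets up the learner itself via init(1). StubSpamAssassin is a hypothetical stand-in, not the real Mail::SpamAssassin, and its method bodies are assumptions that only record what happened. On this reading, the ham counterpart of that learning step is learn($mail, undef, 0, 0), with no explicit init_learner call beforehand.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Mail::SpamAssassin modelling only the call path
# described in the thread; nothing is actually learned.
package StubSpamAssassin;

sub new {
    my ($class, %conf) = @_;
    return bless { conf => {%conf}, init_calls => 0, learned => [] }, $class;
}

# Per the thread, init(1) is where init_learner ends up being called.
# This stub re-runs it on every call; whether the real init() skips repeat
# work is not established in the thread.
sub init {
    my ($self, $use_user_prefs) = @_;
    $self->init_learner;
    return;
}

sub init_learner { $_[0]{init_calls}++; return; }

# Argument order taken from the 2.63 excerpt: ($mail, $id, $isspam, $forget).
sub learn {
    my ($self, $mail, $id, $isspam, $forget) = @_;
    $self->init(1);
    push @{ $self->{learned} }, $isspam ? 'spam' : 'ham';
    return { isspam => $isspam };   # stands in for the status object
}

sub report_as_spam {
    my ($self, $mail) = @_;
    # ... reporting to external services elided ...
    if ($self->{conf}{bayes_learn_during_report}) {
        $self->learn($mail, undef, 1, 0);
    }
    return;   # nothing useful for the caller
}

package main;

my $f      = StubSpamAssassin->new(bayes_learn_during_report => 1);
my $report = $f->report_as_spam('spam message');
my $status = $f->learn('ham message', undef, 0, 0);   # ham: $isspam = 0
print "learned: @{ $f->{learned} }; init_learner calls: $f->{init_calls}\n";
# learned: spam ham; init_learner calls: 2
```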


Now, the SpamAssassin.pm learn routine calls msg->learn_spam.

PerMsgLearner.pm's learn_spam calls:

         $self->{bayes_scanner}->learn (1, $self->{msg}, $id);
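The tail of that chain can be sketched the same way. In this stub, learn_spam() forwards to the Bayes scanner's learn(1, $msg, $id) exactly as quoted above. The learn_ham() counterpart passing 0 is an assumption about the 2.63 code, not something shown in the thread, and both StubBayes and StubPerMsgLearner are hypothetical stand-ins. The sketch shows spam and ham converging on the same Bayes call, differing only in the first flag.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Bayes scanner: records each learn() call.
package StubBayes;
sub new { return bless { seen => [] }, shift; }
sub learn {
    my ($self, $isspam, $msg, $id) = @_;
    push @{ $self->{seen} }, [ $isspam, $id ];
    return 1;
}

# Hypothetical stand-in for Mail::SpamAssassin::PerMsgLearner.
package StubPerMsgLearner;
sub new {
    my ($class, $bayes, $msg) = @_;
    return bless { bayes_scanner => $bayes, msg => $msg }, $class;
}
# Mirrors the quoted 2.63 line: learn(1, $self->{msg}, $id)
sub learn_spam {
    my ($self, $id) = @_;
    return $self->{bayes_scanner}->learn(1, $self->{msg}, $id);
}
# Assumed ham counterpart: same call with the flag set to 0
sub learn_ham {
    my ($self, $id) = @_;
    return $self->{bayes_scanner}->learn(0, $self->{msg}, $id);
}

package main;

my $bayes = StubBayes->new;
StubPerMsgLearner->new($bayes, 'msg A')->learn_spam('id-a');
StubPerMsgLearner->new($bayes, 'msg B')->learn_ham('id-b');
print join(', ', map { "$_->[1]=" . ($_->[0] ? 'spam' : 'ham') } @{ $bayes->{seen} }), "\n";
# id-a=spam, id-b=ham
```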