You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by Henrik K <he...@hege.li> on 2022/05/03 15:08:23 UTC

Re: svn commit: r1900514 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm

I really dislike these kinds of "band-aids" which really don't help the main
cause: terribly convoluted plugin code.  Why skip debug output if owner is
missing?  Is the a good reason for owner missing, what is the cause for that
and can it be better fixed upstream?  Why are there all kinds of regexs
without any sanity/error checks?  Etc..

If I don't hear any volunteers, I guess I need to try to clean up this too.


On Tue, May 03, 2022 at 02:56:35PM -0000, gbechis@apache.org wrote:
> Author: gbechis
> Date: Tue May  3 14:56:35 2022
> New Revision: 1900514
> 
> URL: http://svn.apache.org/viewvc?rev=1900514&view=rev
> Log:
> silence a warning if uri_to_domain fails.
> bz #7984
> 
> Modified:
>     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> 
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm?rev=1900514&r1=1900513&r2=1900514&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm Tue May  3 14:56:35 2022
> @@ -390,9 +390,9 @@ sub _check_fromnamespoof
>      $pms->set_tag("FNSFNAMEDOMAIN", $fnd{'domain'});
>      $pms->set_tag("FNSFADDRDOMAIN", $fad{'domain'});
>  
> -    dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}");
> -    dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}");
> -    dbg("To Address: $tod{addr} $tod{domain} $tod{owner}");
> +    dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}") if defined $fnd{owner};
> +    dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}") if defined $fad{owner};
> +    dbg("To Address: $tod{addr} $tod{domain} $tod{owner}") if defined $tod{owner};
>    }
>  }
>  
> @@ -410,6 +410,8 @@ sub _find_address_owner
>  
>    my $owner = $self->{main}->{registryboundaries}->uri_to_domain($check);
>  
> +  return if not defined $owner;
> +
>    $check =~ /^([^\@]+)\@(.*)$/;
>  
>    if ($owner ne $2) {
> 

Re: svn commit: r1900514 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm

Posted by Henrik K <he...@hege.li>.
FYI I will clean up the whole plugin today.  Including fixing the DKIM
dependency which would randomly work.

If someone wants to work on creating tests for the plugin, that would be
helpful..

On Tue, May 03, 2022 at 06:21:38PM +0300, Henrik K wrote:
> 
> For example just above the added defined checks is this..  what's the point
> of checking some defines later, as this would croak anyway..
> 
>   if ($fnd{'owner'} ne $fad{'owner'}) {
>     $pms->{fromname_owner_different} = 1;
>   }
> 
> 
> On Tue, May 03, 2022 at 06:08:23PM +0300, Henrik K wrote:
> > 
> > I really dislike these kinds of "band-aids" which really don't help the main
> > cause: terribly convoluted plugin code.  Why skip debug output if owner is
> > missing?  Is the a good reason for owner missing, what is the cause for that
> > and can it be better fixed upstream?  Why are there all kinds of regexs
> > without any sanity/error checks?  Etc..
> > 
> > If I don't hear any volunteers, I guess I need to try to clean up this too.
> > 
> > 
> > On Tue, May 03, 2022 at 02:56:35PM -0000, gbechis@apache.org wrote:
> > > Author: gbechis
> > > Date: Tue May  3 14:56:35 2022
> > > New Revision: 1900514
> > > 
> > > URL: http://svn.apache.org/viewvc?rev=1900514&view=rev
> > > Log:
> > > silence a warning if uri_to_domain fails.
> > > bz #7984
> > > 
> > > Modified:
> > >     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> > > 
> > > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> > > URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm?rev=1900514&r1=1900513&r2=1900514&view=diff
> > > ==============================================================================
> > > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm (original)
> > > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm Tue May  3 14:56:35 2022
> > > @@ -390,9 +390,9 @@ sub _check_fromnamespoof
> > >      $pms->set_tag("FNSFNAMEDOMAIN", $fnd{'domain'});
> > >      $pms->set_tag("FNSFADDRDOMAIN", $fad{'domain'});
> > >  
> > > -    dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}");
> > > -    dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}");
> > > -    dbg("To Address: $tod{addr} $tod{domain} $tod{owner}");
> > > +    dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}") if defined $fnd{owner};
> > > +    dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}") if defined $fad{owner};
> > > +    dbg("To Address: $tod{addr} $tod{domain} $tod{owner}") if defined $tod{owner};
> > >    }
> > >  }
> > >  
> > > @@ -410,6 +410,8 @@ sub _find_address_owner
> > >  
> > >    my $owner = $self->{main}->{registryboundaries}->uri_to_domain($check);
> > >  
> > > +  return if not defined $owner;
> > > +
> > >    $check =~ /^([^\@]+)\@(.*)$/;
> > >  
> > >    if ($owner ne $2) {
> > > 

Re: svn commit: r1900514 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm

Posted by Henrik K <he...@hege.li>.
For example just above the added defined checks is this..  what's the point
of checking some defines later, as this would croak anyway..

  if ($fnd{'owner'} ne $fad{'owner'}) {
    $pms->{fromname_owner_different} = 1;
  }


On Tue, May 03, 2022 at 06:08:23PM +0300, Henrik K wrote:
> 
> I really dislike these kinds of "band-aids" which really don't help the main
> cause: terribly convoluted plugin code.  Why skip debug output if owner is
> missing?  Is the a good reason for owner missing, what is the cause for that
> and can it be better fixed upstream?  Why are there all kinds of regexs
> without any sanity/error checks?  Etc..
> 
> If I don't hear any volunteers, I guess I need to try to clean up this too.
> 
> 
> On Tue, May 03, 2022 at 02:56:35PM -0000, gbechis@apache.org wrote:
> > Author: gbechis
> > Date: Tue May  3 14:56:35 2022
> > New Revision: 1900514
> > 
> > URL: http://svn.apache.org/viewvc?rev=1900514&view=rev
> > Log:
> > silence a warning if uri_to_domain fails.
> > bz #7984
> > 
> > Modified:
> >     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> > 
> > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> > URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm?rev=1900514&r1=1900513&r2=1900514&view=diff
> > ==============================================================================
> > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm (original)
> > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm Tue May  3 14:56:35 2022
> > @@ -390,9 +390,9 @@ sub _check_fromnamespoof
> >      $pms->set_tag("FNSFNAMEDOMAIN", $fnd{'domain'});
> >      $pms->set_tag("FNSFADDRDOMAIN", $fad{'domain'});
> >  
> > -    dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}");
> > -    dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}");
> > -    dbg("To Address: $tod{addr} $tod{domain} $tod{owner}");
> > +    dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}") if defined $fnd{owner};
> > +    dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}") if defined $fad{owner};
> > +    dbg("To Address: $tod{addr} $tod{domain} $tod{owner}") if defined $tod{owner};
> >    }
> >  }
> >  
> > @@ -410,6 +410,8 @@ sub _find_address_owner
> >  
> >    my $owner = $self->{main}->{registryboundaries}->uri_to_domain($check);
> >  
> > +  return if not defined $owner;
> > +
> >    $check =~ /^([^\@]+)\@(.*)$/;
> >  
> >    if ($owner ne $2) {
> > 

Re: Some talk about FromNameSpoof and async coding

Posted by Henrik K <he...@hege.li>.
On Fri, May 06, 2022 at 07:49:14AM +0300, Henrik K wrote:
>
> I see now that we can keep adding evals into queue as they are called, and
> keep the check_cleanup() around to make sure any pending queue is run.

Now fixed and better commented in revision 1900613.


Re: Some talk about FromNameSpoof and async coding

Posted by Henrik K <he...@hege.li>.
On Fri, May 06, 2022 at 07:49:14AM +0300, Henrik K wrote:
> 
> I was initially scared of users messing with priorities so that some of the
> evals run at completely different times.  This can happen for example with
> meta rules and $pms->fix_priorities() which can adjust priorities if some of
> the FNS rules are depended on.  So it's possible that when DKIMDOMAIN is
> set, not all eval functions are even registered in async queue etc.

... and of course it's possible that DKIMDOMAIN will never get
set, thus action_depends_on_tags will never call anything.  In this case
check_cleanup() is also needed to make sure _check_fromnamespoof is run.


Some talk about FromNameSpoof and async coding

Posted by Henrik K <he...@hege.li>.
On Thu, May 05, 2022 at 10:25:07PM +0200, Michael Storz wrote:
> 
> @Henrik: I was only surprised that you took the asynchronous method with
> rule_pending/rule_ready + check_cleanup instead of the method for tags with
> action_depends_on_tags. This separates the evaluation of the eval functions
> of the two plugins in time, since check_cleanup is called at the very end
> after the evaluation of all priorities. Since _check_fromnamespoof in turn
> defines a set of tags, all functions that depend on these tags are also
> evaluated only at this time and thus all meta rules that depend on them,
> etc.
> 
> If, on the other hand, action_depends_on_tags is used, e.g.
> 
> sub _check_eval {
>   my ($self, $pms, $result) = @_;
> 
>   if (exists $pms->{fromname_async_queue}) {
>     my $rulename = $pms->get_current_eval_rule_name();
>     $pms->action_depends_on_tags2('DKIMDOMAIN', sub {
>       $self->_check_fromnamespoof($pms);
>       return $result->() ? $pms->got_hit($rulename, '', ruletype =>
> 'header') : 0 ;
>     });
>     return;
>   }
> 
>   $self->_check_fromnamespoof($pms);
>   return $result->();
> }
> 
> then after the DKIMDOMAIN tag is defined, all eval functions that depend on
> it are called immediately. Thus, all eval functions run at priority level 0
> and all dependent eval and meta rules can run at the same priority level.
> This solution runs on 3.4.6. Therefore I would be interested which reasons
> speak against this solution. Maybe there is something in 4.0 which changes
> the game.

Appreciate the thoughtful post.

I was initially scared of users messing with priorities so that some of the
evals run at completely different times.  This can happen for example with
meta rules and $pms->fix_priorities() which can adjust priorities if some of
the FNS rules are depended on.  So it's possible that when DKIMDOMAIN is
set, not all eval functions are even registered in async queue etc.

( As a random note, just noticed that $pms->fix_priorities() seems to act
  strange too, adjusting different/random rules on each call.  Since all
  meta rules were changed into dynamic evaluation some time ago, not sure if
  fix_priorities() is even needed, atleast for anything else than
  shortcircuiting meta rules.. )

I see now that we can keep adding evals into queue as they are called, and
keep the check_cleanup() around to make sure any pending queue is run.

As we all can see, it's very hard to make 100% robust logic when there are
async stuff or complex dependencies between plugin states.  Imagine someone
new stumbling into SA and trying to make some plugins, there really isn't
any sane documentation for it.  I've been tweaking these for a long time,
but keep getting bitten.  Even just noticed that the ASN plugin logic is
broken (for example expecting 'ASN' tag, which might not be never set as
user can change the tag name).

Cheers,
Henrik

Re: svn commit: r1900514 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm

Posted by Michael Storz <Mi...@lrz.de>.
Am 2022-05-03 17:08, schrieb Henrik K:
> I really dislike these kinds of "band-aids" which really don't help the 
> main
> cause: terribly convoluted plugin code.  Why skip debug output if owner 
> is
> missing?  Is the a good reason for owner missing, what is the cause for 
> that
> and can it be better fixed upstream?  Why are there all kinds of regexs
> without any sanity/error checks?  Etc..
> 
> If I don't hear any volunteers, I guess I need to try to clean up this 
> too.
> 
> 

@Henrik: Please read the end of the text for the question I have about 
your rewrite of FromNameSpoof.pm

After Henrik asked for volunteers, I took a quick look at the 
 FromNameSpoof.pm plugin and, like Henrik, immediately realized that the 
DKIM signature part can't work. You can't call the same function in both 
async and sync mode. The call to action_depends_on_tags2 is not a 
declaration that makes every call to _check_fromnamespoof an 
asynchronous call.

Why does the plugin seem to work anyway? If you were to examine the log 
files closely, you would find that the plugin works most of the time but 
not always. In one of my plugins I fell for a similar bug related to 
DKIMDOMAIN. So I think it is interesting to present a detailed analysis 
of the problem.

Apart from the 35 predefined common tags in the PerMsgStatus.pm plugin, 
which are mainly for use in templates, there is another class of tags 
defined via set_tag, which for me I divide into static and dynamic tags. 
Static tags are those that are defined before the rules are executed. 
These essentially come from the Metadata.pm and PerMsgStatus.pm modules. 
Examples are SENDERDOMAIN and AUTHORDOMAIN.

Dynamic tags are defined by executing rules with an eval function. 
DKIMDOMAIN belongs to the dynamic tags even if one is tempted to count 
it among the static tags due to the similarity in name to SENDERDOMAIN 
and AUTHORDOMAIN.

The tag DKIMDOMAIN is set together with the tags DKIMIDENTITY and 
DKIMSELECTOR by the worker routine _check_dkim_signature of the plugin 
DKIM.pm. The two plugins DKIM.pm and FromNameSpoof.pm have a similar 
structure. Both have a worker routine that is activated by the first 
eval function called and calculates and caches the results. The 
respective eval functions then only use the result from the cache.

The plugin DKIM.pm has 9 eval functions that can call the worker routine 
_check_dkim_signature:

check_dkim_signed
check_dkim_valid
check_dkim_valid_author_sig
check_dkim_valid_envelopefrom
check_dkim_dependable
check_dkim_adsp
check_dkim_signsome
check_dkim_signall
check_dkim_testing

These are in turn called by 17 rules:

full     DKIM_SIGNED            eval:check_dkim_signed()
full     DKIM_VALID             eval:check_dkim_valid()
full     DKIM_VALID_AU          eval:check_dkim_valid_author_sig()
full     DKIM_VALID_EF          eval:check_dkim_valid_envelopefrom()
full     __DKIM_DEPENDABLE      eval:check_dkim_dependable()
header   DKIM_ADSP_NXDOMAIN     eval:check_dkim_adsp('N')
header   DKIM_ADSP_DISCARD      eval:check_dkim_adsp('D')
header   DKIM_ADSP_ALL          eval:check_dkim_adsp('A')
header   DKIM_ADSP_CUSTOM_LOW   eval:check_dkim_adsp('1')
header   DKIM_ADSP_CUSTOM_MED   eval:check_dkim_adsp('2')
header   DKIM_ADSP_CUSTOM_HIGH  eval:check_dkim_adsp('3')
full     __RESIGNER1     eval:check_dkim_valid('linkedin.com')
full     __RESIGNER2     
eval:check_dkim_valid('googlegroups.com','yahoogroups.com','yahoogroups.de')
full   DKIM_VERIFIED            eval:check_dkim_valid()
header DKIM_POLICY_TESTING      eval:check_dkim_testing()
header DKIM_POLICY_SIGNSOME     eval:check_dkim_signsome()
header DKIM_POLICY_SIGNALL      eval:check_dkim_signall()

The plugin FromNameSpoof.pm has 7 eval functions that can call the 
worker routine _check_fromnamespoof:

check_fromname_different
check_fromname_domain_differ
check_fromname_spoof
check_fromname_contains_email
check_fromname_equals_replyto
check_fromname_equals_to
check_fromname_owners_differ

These are called by (only) 2 and 3 rules, respectively:

header    __FROM_EQ_REPLY             
eval:check_fromname_equals_replyto() ifplugin 
Mail::SpamAssassin::Plugin::FreeMail
header    __PLUGIN_FROMNAME_EQUALS_TO eval:check_fromname_equals_to()
header    __PLUGIN_FROMNAME_SPOOF     eval:check_fromname_spoof()

The generation of the evaluation of the rules happens in some sense in 
the plugin Check.pm in the subroutine run_generic_tests in

     while (my($rulename, $test) = each %{$opts{testhash}->{$priority}}) 
{
       $opts{loop_body}->($self, $pms, $conf, $rulename, $test, %nopts);
     }

As you can see the hash testhash which holds the rules is accessed here. 
This leads to the fact that each evaluation of the rules of a priority 
level is carried out in a different order. This is how the random order 
of the evaluation occurs.

Since no priorities were assigned for the rules of the two plugins, the 
rules are all executed with the same priority 0.

This means that in

17 / (17 +2) * 100 = 89.5% resp. 17 / (17 +3) * 100 = 85.0%

of the cases an eval function of DKIM.pm is executed first. Only in the 
rest of the cases an eval function of FromNameSpoof.pm is executed 
first.

The subroutine parsed_metadata contains an asynchronous call to the 
worker routine _check_fromnamespoof to wait for the tag DKIMDOMAIN 
(action_depends_on_tags).

If now first an eval function of DKIM.pm is called, then the worker 
routine _check_dkim_signature is called which defines the tag 
DKIMDOMAIN. After that the waiting worker routine _check_fromnamespoof 
is called immediately, which evaluates the tag DKIMDOMAIN and calculates 
the actual data of the plugin. From this point on, when an eval function 
is called from one of the two plugins, only the data from the caches is 
read.

On the other hand, if one of the eval functions from FromNameSpoof.pm is 
called first, the worker routine _check_fromnamespoof is called 
immediately (synchronously). The value of the DKIMDOMAIN tag is now 
undefined and cannot be used or is used incorrectly in the calculation 
of the data. As we have seen in the above calculation, this is the case 
in 10.5% or 15% of the evaluations.

What can be done about it now? The simplest and dirtiest solution is 
what the plugins AWL.pm and TxRep.pm do:

Mail::SpamAssassin::Plugin::AWL
header    AWL    eval:check_from_in_auto_welcomelist()
priority  AWL    1000

Mail::SpamAssassin::Plugin::TxRep
header    TXREP  eval:check_senders_reputation()
priority  TXREP  1000

i.e. via the priority the order of the rules is controlled. For this it 
would be enough to run one of the DKIM rules with a higher priority.

The clean solution is to call all eval functions asynchronously if the 
DKIMDOMAIN tag has to be considered. This is the way Henrik has taken.

@Henrik: I was only surprised that you took the asynchronous method with 
rule_pending/rule_ready + check_cleanup instead of the method for tags 
with action_depends_on_tags. This separates the evaluation of the eval 
functions of the two plugins in time, since check_cleanup is called at 
the very end after the evaluation of all priorities. Since 
_check_fromnamespoof in turn defines a set of tags, all functions that 
depend on these tags are also evaluated only at this time and thus all 
meta rules that depend on them, etc.

If, on the other hand, action_depends_on_tags is used, e.g.

sub _check_eval {
   my ($self, $pms, $result) = @_;

   if (exists $pms->{fromname_async_queue}) {
     my $rulename = $pms->get_current_eval_rule_name();
     $pms->action_depends_on_tags2('DKIMDOMAIN', sub {
       $self->_check_fromnamespoof($pms);
       return $result->() ? $pms->got_hit($rulename, '', ruletype => 
'header') : 0 ;
     });
     return;
   }

   $self->_check_fromnamespoof($pms);
   return $result->();
}

then after the DKIMDOMAIN tag is defined, all eval functions that depend 
on it are called immediately. Thus, all eval functions run at priority 
level 0 and all dependent eval and meta rules can run at the same 
priority level. This solution runs on 3.4.6. Therefore I would be 
interested which reasons speak against this solution. Maybe there is 
something in 4.0 which changes the game.

Regards,
Michael