You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by Henrik K <he...@hege.li> on 2022/05/03 15:08:23 UTC
Re: svn commit: r1900514 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
I really dislike these kinds of "band-aids" which really don't help the main
cause: terribly convoluted plugin code. Why skip debug output if owner is
missing? Is the a good reason for owner missing, what is the cause for that
and can it be better fixed upstream? Why are there all kinds of regexs
without any sanity/error checks? Etc..
If I don't hear any volunteers, I guess I need to try to clean up this too.
On Tue, May 03, 2022 at 02:56:35PM -0000, gbechis@apache.org wrote:
> Author: gbechis
> Date: Tue May 3 14:56:35 2022
> New Revision: 1900514
>
> URL: http://svn.apache.org/viewvc?rev=1900514&view=rev
> Log:
> silence a warning if uri_to_domain fails.
> bz #7984
>
> Modified:
> spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
>
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm?rev=1900514&r1=1900513&r2=1900514&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm Tue May 3 14:56:35 2022
> @@ -390,9 +390,9 @@ sub _check_fromnamespoof
> $pms->set_tag("FNSFNAMEDOMAIN", $fnd{'domain'});
> $pms->set_tag("FNSFADDRDOMAIN", $fad{'domain'});
>
> - dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}");
> - dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}");
> - dbg("To Address: $tod{addr} $tod{domain} $tod{owner}");
> + dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}") if defined $fnd{owner};
> + dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}") if defined $fad{owner};
> + dbg("To Address: $tod{addr} $tod{domain} $tod{owner}") if defined $tod{owner};
> }
> }
>
> @@ -410,6 +410,8 @@ sub _find_address_owner
>
> my $owner = $self->{main}->{registryboundaries}->uri_to_domain($check);
>
> + return if not defined $owner;
> +
> $check =~ /^([^\@]+)\@(.*)$/;
>
> if ($owner ne $2) {
>
Re: svn commit: r1900514 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
Posted by Henrik K <he...@hege.li>.
FYI I will clean up the whole plugin today. Including fixing the DKIM
dependency which would randomly work.
If someone wants to work on creating tests for the plugin, that would be
helpful..
On Tue, May 03, 2022 at 06:21:38PM +0300, Henrik K wrote:
>
> For example just above the added defined checks is this.. what's the point
> of checking some defines later, as this would croak anyway..
>
> if ($fnd{'owner'} ne $fad{'owner'}) {
> $pms->{fromname_owner_different} = 1;
> }
>
>
> On Tue, May 03, 2022 at 06:08:23PM +0300, Henrik K wrote:
> >
> > I really dislike these kinds of "band-aids" which really don't help the main
> > cause: terribly convoluted plugin code. Why skip debug output if owner is
> > missing? Is the a good reason for owner missing, what is the cause for that
> > and can it be better fixed upstream? Why are there all kinds of regexs
> > without any sanity/error checks? Etc..
> >
> > If I don't hear any volunteers, I guess I need to try to clean up this too.
> >
> >
> > On Tue, May 03, 2022 at 02:56:35PM -0000, gbechis@apache.org wrote:
> > > Author: gbechis
> > > Date: Tue May 3 14:56:35 2022
> > > New Revision: 1900514
> > >
> > > URL: http://svn.apache.org/viewvc?rev=1900514&view=rev
> > > Log:
> > > silence a warning if uri_to_domain fails.
> > > bz #7984
> > >
> > > Modified:
> > > spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> > >
> > > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> > > URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm?rev=1900514&r1=1900513&r2=1900514&view=diff
> > > ==============================================================================
> > > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm (original)
> > > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm Tue May 3 14:56:35 2022
> > > @@ -390,9 +390,9 @@ sub _check_fromnamespoof
> > > $pms->set_tag("FNSFNAMEDOMAIN", $fnd{'domain'});
> > > $pms->set_tag("FNSFADDRDOMAIN", $fad{'domain'});
> > >
> > > - dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}");
> > > - dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}");
> > > - dbg("To Address: $tod{addr} $tod{domain} $tod{owner}");
> > > + dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}") if defined $fnd{owner};
> > > + dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}") if defined $fad{owner};
> > > + dbg("To Address: $tod{addr} $tod{domain} $tod{owner}") if defined $tod{owner};
> > > }
> > > }
> > >
> > > @@ -410,6 +410,8 @@ sub _find_address_owner
> > >
> > > my $owner = $self->{main}->{registryboundaries}->uri_to_domain($check);
> > >
> > > + return if not defined $owner;
> > > +
> > > $check =~ /^([^\@]+)\@(.*)$/;
> > >
> > > if ($owner ne $2) {
> > >
Re: svn commit: r1900514 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
Posted by Henrik K <he...@hege.li>.
For example just above the added defined checks is this.. what's the point
of checking some defines later, as this would croak anyway..
if ($fnd{'owner'} ne $fad{'owner'}) {
$pms->{fromname_owner_different} = 1;
}
On Tue, May 03, 2022 at 06:08:23PM +0300, Henrik K wrote:
>
> I really dislike these kinds of "band-aids" which really don't help the main
> cause: terribly convoluted plugin code. Why skip debug output if owner is
> missing? Is the a good reason for owner missing, what is the cause for that
> and can it be better fixed upstream? Why are there all kinds of regexs
> without any sanity/error checks? Etc..
>
> If I don't hear any volunteers, I guess I need to try to clean up this too.
>
>
> On Tue, May 03, 2022 at 02:56:35PM -0000, gbechis@apache.org wrote:
> > Author: gbechis
> > Date: Tue May 3 14:56:35 2022
> > New Revision: 1900514
> >
> > URL: http://svn.apache.org/viewvc?rev=1900514&view=rev
> > Log:
> > silence a warning if uri_to_domain fails.
> > bz #7984
> >
> > Modified:
> > spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> >
> > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
> > URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm?rev=1900514&r1=1900513&r2=1900514&view=diff
> > ==============================================================================
> > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm (original)
> > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm Tue May 3 14:56:35 2022
> > @@ -390,9 +390,9 @@ sub _check_fromnamespoof
> > $pms->set_tag("FNSFNAMEDOMAIN", $fnd{'domain'});
> > $pms->set_tag("FNSFADDRDOMAIN", $fad{'domain'});
> >
> > - dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}");
> > - dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}");
> > - dbg("To Address: $tod{addr} $tod{domain} $tod{owner}");
> > + dbg("From name spoof: $fnd{addr} $fnd{domain} $fnd{owner}") if defined $fnd{owner};
> > + dbg("Actual From: $fad{addr} $fad{domain} $fad{owner}") if defined $fad{owner};
> > + dbg("To Address: $tod{addr} $tod{domain} $tod{owner}") if defined $tod{owner};
> > }
> > }
> >
> > @@ -410,6 +410,8 @@ sub _find_address_owner
> >
> > my $owner = $self->{main}->{registryboundaries}->uri_to_domain($check);
> >
> > + return if not defined $owner;
> > +
> > $check =~ /^([^\@]+)\@(.*)$/;
> >
> > if ($owner ne $2) {
> >
Re: Some talk about FromNameSpoof and async coding
Posted by Henrik K <he...@hege.li>.
On Fri, May 06, 2022 at 07:49:14AM +0300, Henrik K wrote:
>
> I see now that we can keep adding evals into queue as they are called, and
> keep the check_cleanup() around to make sure any pending queue is run.
Now fixed and better commented in revision 1900613.
Re: Some talk about FromNameSpoof and async coding
Posted by Henrik K <he...@hege.li>.
On Fri, May 06, 2022 at 07:49:14AM +0300, Henrik K wrote:
>
> I was initially scared of users messing with priorities so that some of the
> evals run at completely different times. This can happen for example with
> meta rules and $pms->fix_priorities() which can adjust priorities if some of
> the FNS rules are depended on. So it's possible that when DKIMDOMAIN is
> set, not all eval functions are even registered in async queue etc.
... and of course it's possible that DKIMDOMAIN will never get
set, thus action_depends_on_tags will never call anything. In this case
check_cleanup() is also needed to make sure _check_fromnamespoof is run.
Some talk about FromNameSpoof and async coding
Posted by Henrik K <he...@hege.li>.
On Thu, May 05, 2022 at 10:25:07PM +0200, Michael Storz wrote:
>
> @Henrik: I was only surprised that you took the asynchronous method with
> rule_pending/rule_ready + check_cleanup instead of the method for tags with
> action_depends_on_tags. This separates the evaluation of the eval functions
> of the two plugins in time, since check_cleanup is called at the very end
> after the evaluation of all priorities. Since _check_fromnamespoof in turn
> defines a set of tags, all functions that depend on these tags are also
> evaluated only at this time and thus all meta rules that depend on them,
> etc.
>
> If, on the other hand, action_depends_on_tags is used, e.g.
>
> sub _check_eval {
> my ($self, $pms, $result) = @_;
>
> if (exists $pms->{fromname_async_queue}) {
> my $rulename = $pms->get_current_eval_rule_name();
> $pms->action_depends_on_tags2('DKIMDOMAIN', sub {
> $self->_check_fromnamespoof($pms);
> return $result->() ? $pms->got_hit($rulename, '', ruletype =>
> 'header') : 0 ;
> });
> return;
> }
>
> $self->_check_fromnamespoof($pms);
> return $result->();
> }
>
> then after the DKIMDOMAIN tag is defined, all eval functions that depend on
> it are called immediately. Thus, all eval functions run at priority level 0
> and all dependent eval and meta rules can run at the same priority level.
> This solution runs on 3.4.6. Therefore I would be interested which reasons
> speak against this solution. Maybe there is something in 4.0 which changes
> the game.
Appreciate the thoughtful post.
I was initially scared of users messing with priorities so that some of the
evals run at completely different times. This can happen for example with
meta rules and $pms->fix_priorities() which can adjust priorities if some of
the FNS rules are depended on. So it's possible that when DKIMDOMAIN is
set, not all eval functions are even registered in async queue etc.
( As a random note, just noticed that $pms->fix_priorities() seems to act
strange too, adjusting different/random rules on each call. Since all
meta rules were changed into dynamic evaluation some time ago, not sure if
fix_priorities() is even needed, atleast for anything else than
shortcircuiting meta rules.. )
I see now that we can keep adding evals into queue as they are called, and
keep the check_cleanup() around to make sure any pending queue is run.
As we all can see, it's very hard to make 100% robust logic when there are
async stuff or complex dependencies between plugin states. Imagine someone
new stumbling into SA and trying to make some plugins, there really isn't
any sane documentation for it. I've been tweaking these for a long time,
but keep getting bitten. Even just noticed that the ASN plugin logic is
broken (for example expecting 'ASN' tag, which might not be never set as
user can change the tag name).
Cheers,
Henrik
Re: svn commit: r1900514 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/FromNameSpoof.pm
Posted by Michael Storz <Mi...@lrz.de>.
Am 2022-05-03 17:08, schrieb Henrik K:
> I really dislike these kinds of "band-aids" which really don't help the
> main
> cause: terribly convoluted plugin code. Why skip debug output if owner
> is
> missing? Is the a good reason for owner missing, what is the cause for
> that
> and can it be better fixed upstream? Why are there all kinds of regexs
> without any sanity/error checks? Etc..
>
> If I don't hear any volunteers, I guess I need to try to clean up this
> too.
>
>
@Henrik: Please read the end of the text for the question I have about
your rewrite of FromNameSpoof.pm
After Henrik asked for volunteers, I took a quick look at the
FromNameSpoof.pm plugin and, like Henrik, immediately realized that the
DKIM signature part can't work. You can't call the same function in both
async and sync mode. The call to action_depends_on_tags2 is not a
declaration that makes every call to _check_fromnamespoof an
asynchronous call.
Why does the plugin seem to work anyway? If you were to examine the log
files closely, you would find that the plugin works most of the time but
not always. In one of my plugins I fell for a similar bug related to
DKIMDOMAIN. So I think it is interesting to present a detailed analysis
of the problem.
Apart from the 35 predefined common tags in the PerMsgStatus.pm plugin,
which are mainly for use in templates, there is another class of tags
defined via set_tag, which for me I divide into static and dynamic tags.
Static tags are those that are defined before the rules are executed.
These essentially come from the Metadata.pm and PerMsgStatus.pm modules.
Examples are SENDERDOMAIN and AUTHORDOMAIN.
Dynamic tags are defined by executing rules with an eval function.
DKIMDOMAIN belongs to the dynamic tags even if one is tempted to count
it among the static tags due to the similarity in name to SENDERDOMAIN
and AUTHORDOMAIN.
The tag DKIMDOMAIN is set together with the tags DKIMIDENTITY and
DKIMSELECTOR by the worker routine _check_dkim_signature of the plugin
DKIM.pm. The two plugins DKIM.pm and FromNameSpoof.pm have a similar
structure. Both have a worker routine that is activated by the first
eval function called and calculates and caches the results. The
respective eval functions then only use the result from the cache.
The plugin DKIM.pm has 9 eval functions that can call the worker routine
_check_dkim_signature:
check_dkim_signed
check_dkim_valid
check_dkim_valid_author_sig
check_dkim_valid_envelopefrom
check_dkim_dependable
check_dkim_adsp
check_dkim_signsome
check_dkim_signall
check_dkim_testing
These are in turn called by 17 rules:
full DKIM_SIGNED eval:check_dkim_signed()
full DKIM_VALID eval:check_dkim_valid()
full DKIM_VALID_AU eval:check_dkim_valid_author_sig()
full DKIM_VALID_EF eval:check_dkim_valid_envelopefrom()
full __DKIM_DEPENDABLE eval:check_dkim_dependable()
header DKIM_ADSP_NXDOMAIN eval:check_dkim_adsp('N')
header DKIM_ADSP_DISCARD eval:check_dkim_adsp('D')
header DKIM_ADSP_ALL eval:check_dkim_adsp('A')
header DKIM_ADSP_CUSTOM_LOW eval:check_dkim_adsp('1')
header DKIM_ADSP_CUSTOM_MED eval:check_dkim_adsp('2')
header DKIM_ADSP_CUSTOM_HIGH eval:check_dkim_adsp('3')
full __RESIGNER1 eval:check_dkim_valid('linkedin.com')
full __RESIGNER2
eval:check_dkim_valid('googlegroups.com','yahoogroups.com','yahoogroups.de')
full DKIM_VERIFIED eval:check_dkim_valid()
header DKIM_POLICY_TESTING eval:check_dkim_testing()
header DKIM_POLICY_SIGNSOME eval:check_dkim_signsome()
header DKIM_POLICY_SIGNALL eval:check_dkim_signall()
The plugin FromNameSpoof.pm has 7 eval functions that can call the
worker routine _check_fromnamespoof:
check_fromname_different
check_fromname_domain_differ
check_fromname_spoof
check_fromname_contains_email
check_fromname_equals_replyto
check_fromname_equals_to
check_fromname_owners_differ
These are called by (only) 2 and 3 rules, respectively:
header __FROM_EQ_REPLY
eval:check_fromname_equals_replyto() ifplugin
Mail::SpamAssassin::Plugin::FreeMail
header __PLUGIN_FROMNAME_EQUALS_TO eval:check_fromname_equals_to()
header __PLUGIN_FROMNAME_SPOOF eval:check_fromname_spoof()
The generation of the evaluation of the rules happens in some sense in
the plugin Check.pm in the subroutine run_generic_tests in
while (my($rulename, $test) = each %{$opts{testhash}->{$priority}})
{
$opts{loop_body}->($self, $pms, $conf, $rulename, $test, %nopts);
}
As you can see the hash testhash which holds the rules is accessed here.
This leads to the fact that each evaluation of the rules of a priority
level is carried out in a different order. This is how the random order
of the evaluation occurs.
Since no priorities were assigned for the rules of the two plugins, the
rules are all executed with the same priority 0.
This means that in
17 / (17 +2) * 100 = 89.5% resp. 17 / (17 +3) * 100 = 85.0%
of the cases an eval function of DKIM.pm is executed first. Only in the
rest of the cases an eval function of FromNameSpoof.pm is executed
first.
The subroutine parsed_metadata contains an asynchronous call to the
worker routine _check_fromnamespoof to wait for the tag DKIMDOMAIN
(action_depends_on_tags).
If now first an eval function of DKIM.pm is called, then the worker
routine _check_dkim_signature is called which defines the tag
DKIMDOMAIN. After that the waiting worker routine _check_fromnamespoof
is called immediately, which evaluates the tag DKIMDOMAIN and calculates
the actual data of the plugin. From this point on, when an eval function
is called from one of the two plugins, only the data from the caches is
read.
On the other hand, if one of the eval functions from FromNameSpoof.pm is
called first, the worker routine _check_fromnamespoof is called
immediately (synchronously). The value of the DKIMDOMAIN tag is now
undefined and cannot be used or is used incorrectly in the calculation
of the data. As we have seen in the above calculation, this is the case
in 10.5% or 15% of the evaluations.
What can be done about it now? The simplest and dirtiest solution is
what the plugins AWL.pm and TxRep.pm do:
Mail::SpamAssassin::Plugin::AWL
header AWL eval:check_from_in_auto_welcomelist()
priority AWL 1000
Mail::SpamAssassin::Plugin::TxRep
header TXREP eval:check_senders_reputation()
priority TXREP 1000
i.e. via the priority the order of the rules is controlled. For this it
would be enough to run one of the DKIM rules with a higher priority.
The clean solution is to call all eval functions asynchronously if the
DKIMDOMAIN tag has to be considered. This is the way Henrik has taken.
@Henrik: I was only surprised that you took the asynchronous method with
rule_pending/rule_ready + check_cleanup instead of the method for tags
with action_depends_on_tags. This separates the evaluation of the eval
functions of the two plugins in time, since check_cleanup is called at
the very end after the evaluation of all priorities. Since
_check_fromnamespoof in turn defines a set of tags, all functions that
depend on these tags are also evaluated only at this time and thus all
meta rules that depend on them, etc.
If, on the other hand, action_depends_on_tags is used, e.g.
sub _check_eval {
my ($self, $pms, $result) = @_;
if (exists $pms->{fromname_async_queue}) {
my $rulename = $pms->get_current_eval_rule_name();
$pms->action_depends_on_tags2('DKIMDOMAIN', sub {
$self->_check_fromnamespoof($pms);
return $result->() ? $pms->got_hit($rulename, '', ruletype =>
'header') : 0 ;
});
return;
}
$self->_check_fromnamespoof($pms);
return $result->();
}
then after the DKIMDOMAIN tag is defined, all eval functions that depend
on it are called immediately. Thus, all eval functions run at priority
level 0 and all dependent eval and meta rules can run at the same
priority level. This solution runs on 3.4.6. Therefore I would be
interested which reasons speak against this solution. Maybe there is
something in 4.0 which changes the game.
Regards,
Michael