You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2020/06/07 09:46:30 UTC

[Bug 7822] New: HashBL not examining all addresses in a message

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7822

            Bug ID: 7822
           Summary: HashBL not examining all addresses in a message
           Product: Spamassassin
           Version: 3.4.4
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Plugins
          Assignee: dev@spamassassin.apache.org
          Reporter: mgrant@grant.org
  Target Milestone: Undefined

HashBL isn't examining all email addresses in a message.  

On line 344 of HashBO.pm, it gets the stripped body of lines back in an array
reference.  The join is returning an array and not a string in this case.  (see
https://www.perlmonks.org/?node_id=222012).  So all we need to do is deref it
into an array.

Here's the fix:

*** /usr/share/perl5/Mail/SpamAssassin/Plugin/HashBL.pm 2020-06-07
05:38:02.517917366 -0400
--- /usr/share/perl5/Mail/SpamAssassin/Plugin/HashBL.pm.orig    2019-11-10
23:09:44.000000000 -0500
***************
*** 341,347 ****
          }
        }
      }
!     my $body = join('', @{$pms->get_decoded_stripped_body_text_array()});
      if ($opts =~ /\bnouri\b/) {
        # strip urls with possible emails inside
        $body =~ s#<?https?://\S{0,255}(?:\@|%40)\S{0,255}# #gi;
--- 341,347 ----
          }
        }
      }
!     my $body = join('', $pms->get_decoded_stripped_body_text_array());
      if ($opts =~ /\bnouri\b/) {
        # strip urls with possible emails inside
        $body =~ s#<?https?://\S{0,255}(?:\@|%40)\S{0,255}# #gi;

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7822] HashBL not examining all addresses in a message

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7822

--- Comment #8 from Henrik Krohns <ap...@hege.li> ---

Multiple emails now logged.

Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/HashBL.pm
Sending        trunk/lib/Mail/SpamAssassin/Plugin/HashBL.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1878574.

Note that 3.4 requires also Revision 1878572, this doesn't work without
check_cleanup callback.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7822] HashBL not examining all addresses in a message

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7822

Michael Grant <mg...@grant.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mgrant@grant.org

--- Comment #1 from Michael Grant <mg...@grant.org> ---
(sorry that was a "reverse patch" but you see what I did)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7822] HashBL not examining all addresses in a message

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7822

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
                 CC|                            |apache@hege.li

--- Comment #2 from Henrik Krohns <ap...@hege.li> ---
Thanks fixed.

Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/HashBL.pm
Sending        trunk/lib/Mail/SpamAssassin/Plugin/HashBL.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1878559.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7822] HashBL not examining all addresses in a message

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7822

--- Comment #9 from Henrik Krohns <ap...@hege.li> ---
(In reply to Henrik Krohns from comment #8)
> Multiple emails now logged.
> 
> Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/HashBL.pm
> Sending        trunk/lib/Mail/SpamAssassin/Plugin/HashBL.pm
> Transmitting file data ..done
> Committing transaction...
> Committed revision 1878574.
> 
> Note that 3.4 requires also Revision 1878572, this doesn't work without
> check_cleanup callback.

FYI this change was reverted in 3.4, due to Bug 7897.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7822] HashBL not examining all addresses in a message

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7822

--- Comment #6 from Michael Grant <mg...@grant.org> ---
Not shouting at you.  With such a long report, I am merely trying to draw
attention to what I thought was the relavant part of the message, as in bold.

Thanks for pointing out that those were not deprecated, that's my misreading of
the docs then.  

I did spend about 3 full days digging into these things.  I'm no expert at
spamassassin modules.  I hope at least I shedded some light on what's going on
or at minimum make it clear that there seems to be problems that I couldn't fix
and someone with better knowledge/experience with these modules can take it
further.

At minimum, with the changes I did, the module *seems* usable. (my emphasis,
not shouting!)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7822] HashBL not examining all addresses in a message

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7822

--- Comment #4 from Henrik Krohns <ap...@hege.li> ---
(In reply to Michael Grant from comment #3)
> HashBL uses this bgsend_and_start_lookup function which seems to be
> DEPRECATED.

What makes you think this? It's not.

In trunk the workings were streamlined and register_async_rule_* was deprecated
as redundant there.

> It then calls _fininsh_query() which calls got_hit() which I think
> may be not causing it to collect up any further evidence.  

All DNS queries are independent and each of them will hit _finish_query() in
the end, nothing stopping the rest from hitting.

> PLEASE REPORT ALL ADDRESSES that get hits in the hashbl.

Shouting doesn't help, but I'll try to have a look.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7822] HashBL not examining all addresses in a message

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7822

--- Comment #7 from Henrik Krohns <ap...@hege.li> ---
(In reply to Michael Grant from comment #6)
>
> At minimum, with the changes I did, the module *seems* usable. (my emphasis,
> not shouting!)

Your logic is in the right direction, one can save all the hits without calling
got_hit() and only later check if there was a hit. But it needs to be done in
slightly more correct way, I'll see if I can make it today..

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7822] HashBL not examining all addresses in a message

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7822

--- Comment #5 from Henrik Krohns <ap...@hege.li> ---
(In reply to Henrik Krohns from comment #4)
>
> Shouting doesn't help, but I'll try to have a look.

I've created Bug 7824, but it's not a simple fix. So it will only be
implemented in 4.0+ if someone has time for it.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7822] HashBL not examining all addresses in a message

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7822

--- Comment #3 from Michael Grant <mg...@grant.org> ---
While debugging this, I also see that HashBL.pm is only reporting the first
hit.  

HashBL.pm looks though a message and extracts out all the email addresses in
the message headers and body.  

        *  0.1 HASHBL_EMAIL Message contains email address found on
        *      the EBL
        *      [user[at]example.com]

Here's what's going on.  On lines 646,647:
 646│  $ent = $pms->{async}->bgsend_and_start_lookup($lookup, $type, undef,
$ent,
 647│    sub { my ($ent, $pkt) = @_; $self->_finish_query($pms, $ent, $pkt); },

HashBL uses this bgsend_and_start_lookup function which seems to be DEPRECATED.
 It then calls _fininsh_query() which calls got_hit() which I think may be not
causing it to collect up any further evidence.  

One could argue that if a message contains at least one address on the hashbl,
why continue?  However, we have already made the queries for each address (or
gotten them from the cache).  If you don't put them in the headers, you obscure
why valuable information in downstream spam reporting as to why this message
was flagged as spam.  PLEASE REPORT ALL ADDRESSES that get hits in the hashbl.

Here's a larger fix for that, but I don't know if I did this the best, cleanest
way. Probably updating this module not to use the deprecated
bgsend_and_start_lookup() function and rewriting whatever replacement for the
finish_query() function is probably the better way to fix this, but for now,
this at least shows me all the addresses that get hits in a , separated list. 
I did not know how to get the addresses to come out on separate lines, and
adding a space after the , (as in ", ") seems to force a separate line in a
strange way.

*** /usr/share/perl5/Mail/SpamAssassin/Plugin/HashBL.pm.orig    2019-11-10
23:09:44.000000000 -0500
--- /usr/share/perl5/Mail/SpamAssassin/Plugin/HashBL.pm 2020-06-07
06:53:06.459383712 -0400
***************
*** 150,155 ****
--- 150,157 ----
    $self->register_eval_rule("check_hashbl_uris");
    $self->register_eval_rule("check_hashbl_bodyre");
    $self->set_config($mailsa->{conf});
+   $self->{outstanding_queries}=0;
+   $self->{hits}="";

    return $self;
  }
***************
*** 341,347 ****
          }
        }
      }
!     my $body = join('', $pms->get_decoded_stripped_body_text_array());
      if ($opts =~ /\bnouri\b/) {
        # strip urls with possible emails inside
        $body =~ s#<?https?://\S{0,255}(?:\@|%40)\S{0,255}# #gi;
--- 343,349 ----
          }
        }
      }
!     my $body = join('', @{$pms->get_decoded_stripped_body_text_array()});
      if ($opts =~ /\bnouri\b/) {
        # strip urls with possible emails inside
        $body =~ s#<?https?://\S{0,255}(?:\@|%40)\S{0,255}# #gi;
***************
*** 643,648 ****
--- 645,651 ----
      value => $value,
      subtest => $subtest,
    };
+   $self->{outstanding_queries}++;
    $ent = $pms->{async}->bgsend_and_start_lookup($lookup, $type, undef, $ent,
      sub { my ($ent, $pkt) = @_; $self->_finish_query($pms, $ent, $pkt); },
      master_deadline => $pms->{master_deadline}
***************
*** 665,676 ****
      if ($rr->address =~ $dnsmatch) {
        dbg("$ent->{rulename}: $ent->{zone} hit '$ent->{value}'");
        $ent->{value} =~ s/\@/[at]/g;
!       $pms->test_log($ent->{value});
        $pms->got_hit($ent->{rulename}, '', ruletype => 'eval');
        $pms->register_async_rule_finish($ent->{rulename});
-       return;
-     }
    }
  }

  # Version features
--- 668,684 ----
      if ($rr->address =~ $dnsmatch) {
        dbg("$ent->{rulename}: $ent->{zone} hit '$ent->{value}'");
        $ent->{value} =~ s/\@/[at]/g;
!       if ($self->{hits}) { $self->{hits} .= ","; }
!       $self->{hits} .= $ent->{value};
!     }
!   }
!   $self->{outstanding_queries}--;
!   if ($self->{outstanding_queries}==0) {
!       $pms->test_log($self->{hits});
        $pms->got_hit($ent->{rulename}, '', ruletype => 'eval');
        $pms->register_async_rule_finish($ent->{rulename});
    }
+   return;
  }

  # Version features

-- 
You are receiving this mail because:
You are the assignee for the bug.