You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2022/11/09 22:37:00 UTC

[Bug 8072] New: AWL incorrectly resets message count

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8072

            Bug ID: 8072
           Summary: AWL incorrectly resets message count
           Product: Spamassassin
           Version: 3.4.6
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Plugins
          Assignee: dev@spamassassin.apache.org
          Reporter: mnalis-sabug@voyager.hr
  Target Milestone: Undefined

I've had situations when some legit messages suddenly have spam score increase
of several hundreds after one email. Trying to debug, I've tracked it down to
AWL module (using mysql backend).

First looking at the mail log, it seems AWL is changing minimally as expected,
and then suddenly it jumps HUGELY (and than trail off slowly again):
AWL=0.016
AWL=0.016
AWL=0.016
AWL=-0.085
AWL=0.016
AWL=0.937
AWL=1.406
AWL=1.627
AWL=318.182
AWL=265.033
AWL=176.606
[...]

sifting through mysqlbinlog to find what happened I find this piece of
evidence:

/*!*/;
# at 585992025
# at 585992247
#221104  9:54:16 server id 1  end_log_pos 585992247 CRC32 0x3d2066fc   
Annotate_rows:
#Q> UPDATE awl SET msgcount = '5', totscore = totscore + '-0.459' WHERE
username = 'amavis' AND email = 'redacted@example.com' AND signedby =
'example-com.20210112.gappssmtp.com' AND i>
#221104  9:54:16 server id 1  end_log_pos 585992317 CRC32 0x3463013b   
Table_map: `spamassassin`.`awl` mapped to number 22
# at 585992317
#221104  9:54:16 server id 1  end_log_pos 585992531 CRC32 0xd7fcc32a   
Update_rows: table id 22 flags: STMT_END_F
### UPDATE `spamassassin`.`awl`
### WHERE
###   @1='amavis'
###   @2='redacted@example.com'
###   @3='213.191'
###   @4=2392
###   @5=3189.25
###   @6='example-com.20210112.gappssmtp.com'
###   @7=1667550625
### SET
###   @1='amavis'
###   @2='redacted@example.com'
###   @3='213.191'
###   @4=5
###   @5=3188.79
###   @6='example-com.20210112.gappssmtp.com'
###   @7=1667552056
# Number of rows: 1
# at 585992531
#221104  9:54:16 server id 1  end_log_pos 585992562 CRC32 0xff0bfeb5    Xid =
8567075
COMMIT/*!*/;

So it seems that while totscore is correctly increased, message count gets
reset  -- instead of increasing from 2392 to 2393, it gets reset to 5 -- which
obviously leads to all further uses to get hugely increased AWL!

That is  spamassassin 3.4.6-1 from Debian Bullseye.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8072] AWL incorrectly resets message count

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8072

Matija Nalis <mn...@voyager.hr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mnalis-sabug@voyager.hr

--- Comment #1 from Matija Nalis <mn...@voyager.hr> ---
Some 20 minutes earlier, I find possibly related AWL update - for same email
address, but different IP range, the msgcount was increased from 3 to 4 (and
the above obviously buggy change sets msgcount to 5:

/*!*/;
# at 579668809
# at 579669029
#221104  9:34:03 server id 1  end_log_pos 579669029 CRC32 0xff91a46d   
Annotate_rows:
#Q> UPDATE awl SET msgcount = '4', totscore = totscore + '1.39' WHERE username
= 'amavis' AND email = 'redacted@example.com' AND signedby =
'example-com.20210112.gappssmtp.com' AND ip >
#221104  9:34:03 server id 1  end_log_pos 579669099 CRC32 0xa7d7c1f0   
Table_map: `spamassassin`.`awl` mapped to number 22
# at 579669099
#221104  9:34:03 server id 1  end_log_pos 579669313 CRC32 0x460a91e9   
Update_rows: table id 22 flags: STMT_END_F
### UPDATE `spamassassin`.`awl`
### WHERE
###   @1='amavis'
###   @2='redacted@example.com'
###   @3='195.190'
###   @4=3
###   @5=9.794
###   @6='example-com.20210112.gappssmtp.com'
###   @7=1667550842
### SET
###   @1='amavis'
###   @2='redacted@example.com'
###   @3='195.190'
###   @4=4
###   @5=11.184
###   @6='example-com.20210112.gappssmtp.com'
###   @7=1667550843
# Number of rows: 1
# at 579669313
#221104  9:34:03 server id 1  end_log_pos 579669344 CRC32 0xb73804b3    Xid =
8550665
COMMIT/*!*/;

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8072] AWL incorrectly resets message count

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8072

--- Comment #3 from Matija Nalis <mn...@voyager.hr> ---
Interestingly, code has following comment:

"NOTE: This code uses a self referential SQL call (ie set foo = foo + 1) which
is supported by most modern database backends, but not everything calling      
 itself a SQL database."

That might explain historic reasons why it uses "set msgcount = ?" instead of
"set msgcount = msgcount + 1", WHERE IT NOT FOR it also using "totscore =
totscore + ?" at the same time (in which case, such non-compliant SQL database
would fail anyway).

There are 3 places in that file using "msgcount = ?" which could easily be
fixed by converting them to "msgcount = msgcount + 1" (and removing that
parameter from SQL argument list).

While that would be welcome and make this specific problem go away (or at least
reduce its severity by several orders of magnitude), what makes me uneasy is
the fact that $entry->{msgcount} (and thus probably whole $entry) gets
corrupted (e.g. probably replaced by some other entry. Brings me back
nightmares from my previous life about non-reentrant C library functions used
in threaded code). 

Or has such $entry corruption bug have been found and fixed since 3.4.6
release?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8072] AWL incorrectly resets message count

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8072

--- Comment #2 from Matija Nalis <mn...@voyager.hr> ---

the SQL statement seems to be in the  Mail/SpamAssassin/SQLBasedAddrList.pm
which seems to set msgcount to value specified by script:

    my $sql = "UPDATE $self->{tablename} ".
              "SET msgcount = ?, totscore = totscore + ? ".
              "WHERE username = ? AND email = ?";

I haven't tried 4.0/trunk for that, but looking in SVN
https://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/SQLBasedAddrList.pm?revision=1903510&view=markup#l361
it seems to use similar logic.

Is there a reason why the SA code specifies the final msgcount score manually?
Sure, original bug leading to that should be found and fixed, but wouldn't
letting SQL update the counter avoid such horrific pitfalls, as we presumably
always work on message-by-message basis?

E.g. something like:

    my $sql = "UPDATE $self->{tablename} ".
              "SET msgcount = msgcount + 1, totscore = totscore + ? ".
              "WHERE username = ? AND email = ?";

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8072] AWL incorrectly resets message count

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8072

Paul Stead <pa...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |paul.stead@gmail.com

--- Comment #4 from Paul Stead <pa...@gmail.com> ---
It's pretty late here, but as I recall, the reason for the "set msgcount = ?"
is because if a message is re-learned by AWL, the totscore is adjusted
positively/negatively, however the msgcount needs to stay the same to make the
calculation work.

-- 
You are receiving this mail because:
You are the assignee for the bug.