Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2019/12/11 13:03:23 UTC

[Bug 7776] New: Limit Bayes parsed token count

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776

            Bug ID: 7776
           Summary: Limit Bayes parsed token count
           Product: Spamassassin
           Version: 3.4.2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Learner
          Assignee: dev@spamassassin.apache.org
          Reporter: apache@hege.li
  Target Milestone: Undefined

As discussed on the users list, we should probably limit the number of tokens
parsed, as a safeguard when scanning degenerate messages (example: a 20MB
uuencoded message body generating 6000000 tokens).
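
For a sense of scale, here is a rough estimate of what 6000000 tokens might
cost in memory. It is only a sketch: the per-token costs are the approximate
figures quoted later in this thread (~750B file-based, ~600B redis-based), and
the names BYTES_PER_TOKEN and estimated_memory_bytes are made up for
illustration.

```python
# Rough memory estimate. The per-token costs are the approximate figures
# reported in this thread, not measurements made here.
BYTES_PER_TOKEN = {"file-based": 750, "redis-based": 600}


def estimated_memory_bytes(token_count: int, bytes_per_token: int) -> int:
    """Return the approximate memory needed to hold token_count tokens."""
    return token_count * bytes_per_token


if __name__ == "__main__":
    for backend, cost in BYTES_PER_TOKEN.items():
        total = estimated_memory_bytes(6_000_000, cost)
        print(f"{backend}: ~{total / 1e9:.1f} GB")
```

Under those assumptions the degenerate message lands in the range of several
gigabytes, which is why a cap on token count matters more than the exact
per-token cost.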

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7776] Limit Bayes parsed token count

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776

--- Comment #3 from Henrik Krohns <ap...@hege.li> ---
Just in case 3.4.4 gets released..

Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/Bayes.pm
Transmitting file data .done
Committing transaction...
Committed revision 1871708.


[Bug 7776] Limit Bayes parsed token count

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |apache@hege.li

--- Comment #1 from Henrik Krohns <ap...@hege.li> ---
On Wed, Dec 11, 2019 at 01:58:03PM +0100, Matus UHLAR - fantomas wrote:
>
> My question was whether there's a bug in the Bayes code causing it to eat
> too much memory.  Both ~750B per token with file-based Bayes and ~600B per
> token with Redis-based Bayes look like too much to me.

Not so much a bug, but we should probably add some internal limit on parsed
tokens (10000?) - a normal message would not contain more tokens.  At those
counts the per-token memory usage is irrelevant (but we could look at
optimizing it too).  We just need to be careful not to create a loophole for
spammers (filling up a few 50k parts with random short tokens, so that the
last part won't be tokenized at all?)
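
To illustrate the loophole concern, here is a minimal sketch of a single
global cap applied across message parts in order. It is not the Bayes.pm
tokenizer: tokenize() is a whitespace-splitting stand-in, and
tokenize_with_global_cap() is a hypothetical helper used only to show how
filler in early parts could keep a later part from being tokenized.

```python
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Stand-in tokenizer: split on whitespace (the real one is far richer)."""
    return text.split()


def tokenize_with_global_cap(parts: Iterable[str], limit: int) -> List[str]:
    """Tokenize parts in order, stopping once `limit` tokens are collected."""
    tokens: List[str] = []
    for part in parts:
        for tok in tokenize(part):
            if len(tokens) >= limit:
                return tokens  # later parts are never looked at
            tokens.append(tok)
    return tokens


if __name__ == "__main__":
    # First part: 10000 short filler tokens. Second part: the real payload.
    filler = " ".join(f"x{i}" for i in range(10_000))
    parts = [filler, "buy cheap pills now"]
    seen = tokenize_with_global_cap(parts, limit=10_000)
    print("pills" in seen)  # False: the payload part was never tokenized
```

With a cap this naive, the filler alone exhausts the budget, so any limit
would need to be generous enough, or split up in some way, that normal and
hostile messages alike still get their meaningful parts tokenized.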


[Bug 7776] Limit Bayes parsed token count

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Henrik Krohns <ap...@hege.li> ---

A quick analysis of the corpus shows token counts pretty much never exceed
~2000.  Implemented some hard limits which are way above that: 50k tokens for
body, 10k for uris/headers.
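
As a rough illustration of limits like these, the sketch below caps each
token category separately. It is not the committed Perl code: MAX_TOKENS and
cap_tokens are hypothetical names, and treating "10k for uris/headers" as a
separate 10k cap for each of the two categories is an assumption.

```python
from typing import Dict, List

# Limits as described in the comment above. The names are hypothetical, and
# separate 10k caps for URIs and headers are an assumption.
MAX_TOKENS = {"body": 50_000, "uri": 10_000, "header": 10_000}


def cap_tokens(tokens_by_kind: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Truncate each category's token list to its own hard limit.

    Categories without a configured limit are passed through unchanged.
    """
    return {
        kind: tokens[: MAX_TOKENS[kind]] if kind in MAX_TOKENS else tokens
        for kind, tokens in tokens_by_kind.items()
    }


if __name__ == "__main__":
    parsed = {
        "body": ["t"] * 60_000,    # degenerate body, over the limit
        "uri": ["u"] * 5,          # normal message, well under the limit
        "header": ["h"] * 20_000,  # over the limit
    }
    for kind, tokens in cap_tokens(parsed).items():
        print(kind, len(tokens))
```

One property of capping per category in a sketch like this is that an
oversized body cannot use up the budget for header or URI tokens.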

Sending        trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm
Transmitting file data .done
Committing transaction...
Committed revision 1871706.


[Bug 7776] Machine Learning Online Course

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
                 CC|                            |apache@hege.li
             Status|NEW                         |RESOLVED

--- Comment #1 from Henrik Krohns <ap...@hege.li> ---
spam

Anyone have permissions to delete the user?


[Bug 7776] Machine Learning Online Course

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Building & Packaging        |trash
           Assignee|dev@spamassassin.apache.org |apache@hege.li


[Bug 7776] Limit Bayes parsed token count

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |4.0.0
