Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2019/12/11 13:03:23 UTC
[Bug 7776] New: Limit Bayes parsed token count
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776
Bug ID: 7776
Summary: Limit Bayes parsed token count
Product: Spamassassin
Version: 3.4.2
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P2
Component: Learner
Assignee: dev@spamassassin.apache.org
Reporter: apache@hege.li
Target Milestone: Undefined
As discussed on the users list, we should probably limit the number of tokens
parsed, as a safeguard against scanning degenerate messages (example: a 20MB
uuencoded message body generating 6,000,000 tokens).
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7776] Limit Bayes parsed token count
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776
--- Comment #3 from Henrik Krohns <ap...@hege.li> ---
Just in case 3.4.4 materializes...
Sending spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/Bayes.pm
Transmitting file data .done
Committing transaction...
Committed revision 1871708.
[Bug 7776] Limit Bayes parsed token count
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776
Henrik Krohns <ap...@hege.li> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |apache@hege.li
--- Comment #1 from Henrik Krohns <ap...@hege.li> ---
On Wed, Dec 11, 2019 at 01:58:03PM +0100, Matus UHLAR - fantomas wrote:
>
> My question was whether there's a bug in the Bayes code causing it to eat too
> much memory. Both ~750B per token with file-based Bayes and ~600B per
> token with Redis-based Bayes look like too much to me.
It's not so much a bug, but we should probably add some internal limit on parsed
tokens (10000?); a normal message would not contain more tokens. At those
counts the per-token memory usage is irrelevant (but we could look at
optimizing it too). We just need to be careful not to create a loophole for
spammers (filling up a few 50k parts with random short tokens, so the last part
won't be tokenized at all?)
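A back-of-the-envelope check (an editorial sketch, not part of the thread) of why a cap makes the per-token cost irrelevant. It assumes memory scales linearly with token count at the per-token figures quoted above and ignores any fixed overhead; the 6,000,000-token case is the degenerate message from the original report and 10,000 is the limit proposed here.

```python
# Rough per-token memory figures quoted earlier in the thread (assumption:
# memory grows linearly with token count, with no fixed overhead).
BYTES_PER_TOKEN = {"file-based": 750, "redis-based": 600}


def estimate_bytes(tokens: int, per_token: int) -> int:
    """Naive linear estimate: total bytes = token count x bytes per token."""
    return tokens * per_token


for backend, per_token in BYTES_PER_TOKEN.items():
    uncapped = estimate_bytes(6_000_000, per_token)  # 20MB uuencoded example
    capped = estimate_bytes(10_000, per_token)       # proposed 10000 limit
    print(f"{backend}: {uncapped / 1e9:.1f} GB uncapped, "
          f"{capped / 1e6:.1f} MB capped")
# prints:
# file-based: 4.5 GB uncapped, 7.5 MB capped
# redis-based: 3.6 GB uncapped, 6.0 MB capped
```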
[Bug 7776] Limit Bayes parsed token count
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776
Henrik Krohns <ap...@hege.li> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #2 from Henrik Krohns <ap...@hege.li> ---
A quick analysis of the corpus shows token counts pretty much never exceed
~2000. I implemented some hard limits which are well above that: 50k tokens
for the body, 10k for URIs/headers.
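A minimal Python sketch of per-section hard limits like the ones described in this comment. It is not the committed Bayes.pm code (which is Perl). The whitespace tokenizer, the function names, and the choice of separate 10k budgets for headers and for URIs are all assumptions for illustration, since the comment does not say whether headers and URIs share one budget.

```python
import re
from itertools import islice
from typing import Dict, Iterator, List

# Hypothetical limits mirroring the figures in the comment above;
# names and structure are illustrative, not taken from Bayes.pm.
MAX_BODY_TOKENS = 50_000
MAX_HEADER_URI_TOKENS = 10_000

# Simplistic whitespace tokenizer (assumption: the real tokenizer is far richer).
TOKEN_RE = re.compile(r"\S+")


def iter_tokens(text: str) -> Iterator[str]:
    """Lazily yield tokens so a huge input is never split all at once."""
    for match in TOKEN_RE.finditer(text):
        yield match.group(0)


def tokenize_capped(text: str, limit: int) -> List[str]:
    """Return at most `limit` tokens; scanning stops once the cap is hit."""
    return list(islice(iter_tokens(text), limit))


def tokenize_message(headers: str, uris: str, body: str) -> Dict[str, List[str]]:
    # Each section gets its own budget, so a flooded body cannot
    # use up the budget reserved for headers or URIs.
    return {
        "headers": tokenize_capped(headers, MAX_HEADER_URI_TOKENS),
        "uris": tokenize_capped(uris, MAX_HEADER_URI_TOKENS),
        "body": tokenize_capped(body, MAX_BODY_TOKENS),
    }


if __name__ == "__main__":
    # A degenerate body of 200,000 tokens is cut off at the body cap.
    toks = tokenize_message("Subject: hi", "http://example.com", "abc " * 200_000)
    print(len(toks["body"]))  # 50000
```

Separate budgets keep a flooded body from crowding out header and URI tokens, but on their own they do not address the concern from comment #1 about junk-filled earlier body parts leaving later parts untokenized.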
Sending trunk/lib/Mail/SpamAssassin/Plugin/Bayes.pm
Transmitting file data .done
Committing transaction...
Committed revision 1871706.
[Bug 7776] Machine Learning Online Course
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776
Henrik Krohns <ap...@hege.li> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
                 CC|                            |apache@hege.li
             Status|NEW                         |RESOLVED
--- Comment #1 from Henrik Krohns <ap...@hege.li> ---
Spam.
Does anyone have permission to delete the user?
[Bug 7776] Machine Learning Online Course
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776
Henrik Krohns <ap...@hege.li> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Building & Packaging        |trash
           Assignee|dev@spamassassin.apache.org |apache@hege.li
[Bug 7776] Limit Bayes parsed token count
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7776
Henrik Krohns <ap...@hege.li> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |4.0.0