Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/02/13 10:40:18 UTC
[Bug 4135] New: Suggestion for Improved Bayesian Filtering
http://bugzilla.spamassassin.org/show_bug.cgi?id=4135
Summary: Suggestion for Improved Bayesian Filtering
Product: Spamassassin
Version: 3.0.2
Platform: PC
OS/Version: other
Status: NEW
Severity: normal
Priority: P3
Component: spamassassin
AssignedTo: dev@spamassassin.apache.org
ReportedBy: tom@hedges.com
1) Use current rules to convert text that mixes special characters with ASCII into common ASCII text
throughout incoming e-mail.
2) By analyzing boundaries between text and HTML, develop heuristics to discard leading and trailing plain
text "red herring" blocks of text that are added simply to undermine conventional Bayesian analysis.
3) Use technology from speech recognition to discard semantically unlike groups of 2, 3 and 4 words (work
by Dragon and ViaVoice is very advanced in this area).
4) Only do Bayesian analysis after performing the prior steps to reduce "poisoning".
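The ordering proposed in steps 1, 2 and 4 could be sketched roughly as follows. This is only an illustration, not SpamAssassin code: the substitution table, the function names, and the (kind, text) representation of message parts are all assumptions made for the example, and step 3 (semantic grouping) is not attempted.

```python
import re

# Hypothetical table of look-alike characters (an assumption, not SpamAssassin's rules).
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "@": "a"})

def normalize_obfuscation(text):
    """Step 1: map character-mixed words such as 'v1agr@' to plain ASCII."""
    def fix(match):
        word = match.group(0)
        # Rewrite only tokens that mix letters with look-alike symbols,
        # so plain numbers such as '2005' are left untouched.
        # Limitation of this sketch: e-mail addresses would be rewritten too.
        if re.search(r"[A-Za-z]", word) and re.search(r"[0134@]", word):
            return word.translate(LOOKALIKES)
        return word
    return re.sub(r"[A-Za-z0-9@]+", fix, text)

def drop_decoy_blocks(parts):
    """Step 2: drop plain-text parts before the first and after the last HTML part.

    parts is a list of (kind, text) tuples, kind being 'plain' or 'html',
    with markup already stripped from the text (assumed input shape).
    """
    html_positions = [i for i, (kind, _) in enumerate(parts) if kind == "html"]
    if not html_positions:
        return parts  # no text/HTML boundary to reason about
    return parts[html_positions[0]:html_positions[-1] + 1]

def tokens_for_bayes(parts):
    """Step 4: tokenize for Bayes only after steps 1 and 2 have run."""
    kept = drop_decoy_blocks(parts)
    text = " ".join(normalize_obfuscation(body) for _, body in kept)
    return re.findall(r"[a-z]+", text.lower())
```

For example, given a message whose HTML part is wrapped in two unrelated plain-text blocks, only the normalized tokens of the HTML part would reach the Bayes tokenizer. The boundary heuristic here is deliberately crude; a real one would need to avoid discarding legitimate plain-text alternatives.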
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4135] Suggestion for Improved Bayesian Filtering
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4135
jm@jmason.org changed:
           What |Removed   |Added
----------------------------------------------------------------------------
         Status |NEW       |RESOLVED
     Resolution |          |WONTFIX
------- Additional Comments From jm@jmason.org 2007-04-16 05:38 -------
I think the meat of this will be implemented in the Bayesian Noise Reduction
code being written in this year's summer of code...
Re: [Bug 4135] New: Suggestion for Improved Bayesian Filtering
Posted by Robert Menschel <Ro...@Menschel.net>.
Sunday, February 13, 2005, 1:40:18 AM, tom@hedges.com wrote:
> http://bugzilla.spamassassin.org/show_bug.cgi?id=4135
> Summary: Suggestion for Improved Bayesian Filtering
> ReportedBy: tom@hedges.com
> 2) By analyzing boundaries between text and HTML, develop heuristics
> to discard leading and trailing plain text "red herring" blocks of
> text that are added simply to undermine conventional Bayesian
> analysis.
Tom, do you find that these "red herring" blocks of text actually
cause any problems? I find that their very use of randomized or
literary text provides fodder for Bayes, because that text differs
significantly from conversational email, technical email, and
newsletter email (none of which resembles the red herring sections
closely enough to cause Bayes any confusion here).
Bob Menschel
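Bob's point can be illustrated with a simplified per-token estimate. This is a minimal sketch of a Graham-style token probability, not SpamAssassin's actual Bayes implementation; the function name and the toy counts are assumptions for the example.

```python
def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how spam-like a token is from per-corpus message counts."""
    spam_rate = spam_counts.get(token, 0) / n_spam
    ham_rate = ham_counts.get(token, 0) / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # unseen token: no evidence either way
    return spam_rate / (spam_rate + ham_rate)

# Toy counts (invented for illustration): a literary decoy word that has
# only ever appeared in spam, and an everyday word seen only in ham.
spam_counts = {"ishmael": 3}
ham_counts = {"meeting": 5}

decoy = token_spam_prob("ishmael", spam_counts, ham_counts, 10, 10)
everyday = token_spam_prob("meeting", spam_counts, ham_counts, 10, 10)
```

Under these toy counts the decoy word scores as strongly spam-like, because the user's legitimate mail never contains it. Decoy text only dilutes the score if its vocabulary overlaps with that user's real ham.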
[Bug 4135] Suggestion for Improved Bayesian Filtering
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4135
Bob@Menschel.net changed:
             What |Removed     |Added
----------------------------------------------------------------------------
 Target Milestone |Undefined   |Future