Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/02/13 10:40:18 UTC

[Bug 4135] New: Suggestion for Improved Bayesian Filtering

http://bugzilla.spamassassin.org/show_bug.cgi?id=4135

           Summary: Suggestion for Improved Bayesian Filtering
           Product: Spamassassin
           Version: 3.0.2
          Platform: PC
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P3
         Component: spamassassin
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: tom@hedges.com


1) Use the current rules to convert ASCII text obfuscated with special characters back to plain ASCII text 
in incoming e-mail.
2) By analyzing boundaries between text and HTML, develop heuristics to discard leading and trailing plain 
text "red herring" blocks that are added simply to undermine conventional Bayesian analysis.
3) Use technology from speech recognition to discard semantically unlike groups of 2, 3 and 4 words (work 
by Dragon and ViaVoice is very advanced in this area).
4) Only do Bayesian analysis after performing the prior steps, to reduce "poisoning".
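
As a rough illustration of steps 1 and 4 above, here is a minimal Python sketch, not SpamAssassin code. The look-alike map, the normalize() function, and the preprocess() helper are all hypothetical names chosen for this example; steps 2 and 3 would be further functions passed in the same list.

```python
import re
import unicodedata

# Hypothetical look-alike map for illustration only; not SpamAssassin's rules.
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "$": "s"})


def normalize(text):
    """Step 1: fold accents and look-alike characters to plain lowercase ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")

    def fix(match):
        word = match.group(0)
        # Only rewrite words that mix letters with digits or "$",
        # so genuine numbers such as "2005" are left alone.
        if re.search(r"[A-Za-z]", word) and re.search(r"[0-9$]", word):
            return word.translate(LOOKALIKES)
        return word

    return re.sub(r"[A-Za-z0-9$]+", fix, ascii_text).lower()


def preprocess(text, steps):
    """Step 4: run every cleanup step first, then hand tokens to Bayes."""
    for step in steps:
        text = step(text)
    return text.split()


tokens = preprocess("Fr3e V1AGRA in 2005", [normalize])
```

The point of the sketch is only the ordering: the Bayesian tokenizer sees text after the cleanup steps have run, so obfuscated spellings collapse onto the tokens Bayes has already learned.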



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4135] Suggestion for Improved Bayesian Filtering

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4135


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX




------- Additional Comments From jm@jmason.org  2007-04-16 05:38 -------
I think the meat of this will be implemented in the Bayesian Noise Reduction
code being written in this year's summer of code...




Re: [Bug 4135] New: Suggestion for Improved Bayesian Filtering

Posted by Robert Menschel <Ro...@Menschel.net>.
Sunday, February 13, 2005, 1:40:18 AM, tom@hedges.com wrote:

> http://bugzilla.spamassassin.org/show_bug.cgi?id=4135
>            Summary: Suggestion for Improved Bayesian Filtering
>         ReportedBy: tom@hedges.com

> 2) By analyzing boundaries between text and HTML, develop heuristics
> to discard leading and trailing plain text "red herring" blocks of
> text that are added simply to undermine conventional Bayesian
> analysis.

Tom, do you find that these "red herring" blocks of text actually
cause any problems?  I find that the very use of randomized text, or
literary text, provides fodder for Bayes because it differs so much
from conversational email, technical email, and newsletter email
(none of which bears enough of a relationship to the red herring
sections to cause Bayes any confusion here).

Bob Menschel
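
The argument above can be illustrated with a simplified per-token estimate in Python. This is a sketch under stated assumptions: it is not the formula SpamAssassin's Bayes implementation uses, token_spam_probability() is a hypothetical helper, and the tiny corpora are invented for the example.

```python
def token_spam_probability(token, spam_docs, ham_docs):
    """Fraction-based estimate of how strongly a token indicates spam.

    Each document is a set of tokens. Returns 0.5 for unseen tokens.
    """
    spam_rate = sum(token in doc for doc in spam_docs) / len(spam_docs)
    ham_rate = sum(token in doc for doc in ham_docs) / len(ham_docs)
    if spam_rate + ham_rate == 0:
        return 0.5  # no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# Invented training data: the spam carries literary filler words.
spam = [{"viagra", "whale", "harpoon"}, {"free", "whale"}]
ham = [{"meeting", "patch"}, {"patch", "bayes"}]

filler_score = token_spam_probability("whale", spam, ham)
ham_score = token_spam_probability("patch", spam, ham)
```

In this toy data the filler word "whale" appears only in spam, so it scores as a spam indicator rather than diluting the result, which is the effect described above.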




[Bug 4135] Suggestion for Improved Bayesian Filtering

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4135


Bob@Menschel.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |Future





