Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/04/30 21:51:56 UTC

[Bug 4294] New: enh mass-check / hit-frequencies / perceptron to ignore duplicates

http://bugzilla.spamassassin.org/show_bug.cgi?id=4294

           Summary: enh mass-check / hit-frequencies / perceptron to ignore
                    duplicates
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Masses
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: Bob@Menschel.net


Given the flood of spam, it's unreasonable to expect humans, even those of us who
can accurately and reliably tell ham from spam, to identify duplicate emails.

It's generally considered worthwhile to eliminate duplicates before feeding
messages into mass-check, hit-frequencies, and/or scoring activities. I therefore
propose that we find a way to do this automatically.

Attached will be an extract from my personal mass-check script (bash), which I
use for this purpose. It might be a useful starting point.
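The attached bash script is not reproduced here. As a rough illustration only,
the Python sketch below shows one way a corpus could be deduplicated before
mass-check. It assumes that "duplicate" means two messages whose decoded body
text is identical once whitespace is collapsed; headers such as Received,
Message-ID and Date usually differ between copies of the same spam, so they are
ignored. The function names body_fingerprint and dedupe are hypothetical and
not part of SpamAssassin.

```python
import hashlib
from email import message_from_bytes, policy


def body_fingerprint(raw: bytes) -> str:
    """Hypothetical helper: hash only the decoded body parts of a message.

    Headers (Received, Message-ID, Date, ...) are ignored on the assumption
    that they differ between otherwise identical copies.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        payload = part.get_payload(decode=True) or b""
        # Collapse all runs of whitespace so re-wrapped copies still match.
        chunks.append(b" ".join(payload.split()))
    return hashlib.sha256(b"\n".join(chunks)).hexdigest()


def dedupe(messages):
    """Hypothetical helper: keep the first copy of each fingerprint, in order."""
    seen = set()
    unique = []
    for raw in messages:
        fp = body_fingerprint(raw)
        if fp not in seen:
            seen.add(fp)
            unique.append(raw)
    return unique
```

In practice the raw bytes would be read from each file in a ham or spam corpus
directory, and only the surviving files passed on to mass-check. Matching on the
body alone is a deliberately simple choice; it will not catch spam that varies
a few words per copy.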



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4294] enh mass-check / hit-frequencies / perceptron to ignore duplicates

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4294


quinlan@pathname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX




------- Additional Comments From quinlan@pathname.com  2005-04-30 19:45 -------
It's better to do this before running the mass-check.  There are scripts
in the tree to do that.

[Bug 4294] enh mass-check / hit-frequencies / perceptron to ignore duplicates

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4294


Bob@Menschel.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.2.0

[Bug 4294] enh mass-check / hit-frequencies / perceptron to ignore duplicates

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4294





------- Additional Comments From Bob@Menschel.net  2005-04-30 12:52 -------
Created an attachment (id=2818)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2818&action=view)
bash script which I use to eliminate duplicates before hit-frequencies