Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/07/20 20:58:25 UTC
[Bug 4493] New: add pre-tokenize text munge to learner
http://bugzilla.spamassassin.org/show_bug.cgi?id=4493
Summary: add pre-tokenize text munge to learner
Product: Spamassassin
Version: 3.0.4
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Learner
AssignedTo: dev@spamassassin.apache.org
ReportedBy: dharris@drh.net
I set up SpamAssassin with a site-wide Bayes database. Users report their own
spam, and after an administrator approves it, that spam is used to train the
SpamAssassin Bayes database.
Because users report spam into a global Bayes database, I want the learner to
ignore my users' e-mail addresses. Otherwise, if one user happens to report lots
of spam, Bayes would learn that their address means spam. I don't want this.
I have already excluded the To, Cc, and Bcc headers using the bayes_ignore_header
config option, but e-mail addresses also show up in my Received headers, as in
the following example, and can appear in other places too.
Received: from w3.drh.net ([64.21.76.5])
(envelope-sender <dh...@drh.net>)
by secondary.scan1.myactv.net (qmail-ldap-1.03) with SMTP
for <te...@mail.myactv.net>; 20 Jul 2005 18:12:32 -0000
So I created a patch that applies the regular expression below to any text
before it is tokenized by Bayes, to wipe out the username:
s/[a-z0-9][a-z0-9\_\.-]{1,48}\@(myactv.net|mail.myactv.net|mss1.myactv.net)/MYACTVREPLACEDUSERNAME\@myactv.net/gi;
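For illustration, the substitution above can be expressed with Python's re module as follows. This is a translation sketch, not the code in the patch; the addresses alice@mail.myactv.net and someone@drh.net are hypothetical stand-ins for the truncated ones in the header sample. As in the original expression, the dots in the domain alternatives are left unescaped, so each one matches any character.

```python
import re

# Python translation of the Perl substitution:
#   s/[a-z0-9][a-z0-9\_\.-]{1,48}\@(myactv.net|mail.myactv.net|mss1.myactv.net)/MYACTVREPLACEDUSERNAME\@myactv.net/gi;
# Dots in the domain list are unescaped, exactly as in the original.
LOCAL_USER_RE = re.compile(
    r"[a-z0-9][a-z0-9_.-]{1,48}@(myactv.net|mail.myactv.net|mss1.myactv.net)",
    re.IGNORECASE,  # the /i flag
)


def mask_local_users(text: str) -> str:
    """Replace every local user address with one fixed placeholder (the /g flag)."""
    return LOCAL_USER_RE.sub("MYACTVREPLACEDUSERNAME@myactv.net", text)


if __name__ == "__main__":
    # Hypothetical address standing in for the truncated one in the sample header.
    print(mask_local_users("for <alice@mail.myactv.net>; 20 Jul 2005 18:12:32 -0000"))
    # prints "for <MYACTVREPLACEDUSERNAME@myactv.net>; 20 Jul 2005 18:12:32 -0000"
```

Addresses at other domains (such as the envelope sender at drh.net) pass through unchanged, so Bayes still sees them as tokens.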
Because I have multiple MX servers, I also used this regular expression to
solve the problem described at http://wiki.apache.org/spamassassin/BayesBitMe:
s/scan\d.myactv.net/scan1.myactv.net/g;
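The hostname rewrite collapses every scanN relay name into one spelling, so mail that arrived through different MX hosts yields the same Received tokens. A minimal Python sketch, assuming the relays are all named scan<digit>.myactv.net; unlike the original, it escapes the dots so only a literal "." matches.

```python
import re

# Python version of: s/scan\d.myactv.net/scan1.myactv.net/g;
# Dots are escaped here (a small deviation from the original) so that
# only a literal "." matches between the name parts.
SCAN_HOST_RE = re.compile(r"scan\d\.myactv\.net")


def normalize_mx_host(text: str) -> str:
    """Rewrite any scanN.myactv.net relay name to scan1.myactv.net."""
    return SCAN_HOST_RE.sub("scan1.myactv.net", text)


if __name__ == "__main__":
    print(normalize_mx_host("by secondary.scan2.myactv.net (qmail-ldap-1.03) with SMTP"))
    # prints "by secondary.scan1.myactv.net (qmail-ldap-1.03) with SMTP"
```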
A configurable way to rewrite text before tokenization would be appreciated.
Also note that crm114 (http://crm114.sourceforge.net/) has a feature to do this
same thing.
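One possible shape for such a configurable feature is an ordered list of (pattern, replacement, ignore_case) rules applied to the text before it reaches the tokenizer. The rule format and function names below are hypothetical, not an existing SpamAssassin option; the sketch just loads the two rewrites from this report as data.

```python
import re
from typing import List, Tuple

# Hypothetical rule format: (pattern, replacement, ignore_case).
Rule = Tuple[str, str, bool]

RULES: List[Rule] = [
    # Mask local user addresses (first rewrite in this report).
    (r"[a-z0-9][a-z0-9_.-]{1,48}@(myactv.net|mail.myactv.net|mss1.myactv.net)",
     "MYACTVREPLACEDUSERNAME@myactv.net", True),
    # Collapse MX relay names (second rewrite, dots escaped).
    (r"scan\d\.myactv\.net", "scan1.myactv.net", False),
]


def compile_rules(rules: List[Rule]) -> list:
    """Compile each rule once, up front, into a (regex, replacement) pair."""
    compiled = []
    for pattern, replacement, ignore_case in rules:
        flags = re.IGNORECASE if ignore_case else 0
        compiled.append((re.compile(pattern, flags), replacement))
    return compiled


def rewrite_before_tokenize(text: str, compiled: list) -> str:
    """Apply every rule in order; each rule sees the previous rule's output."""
    for regex, replacement in compiled:
        text = regex.sub(replacement, text)
    return text


if __name__ == "__main__":
    rules = compile_rules(RULES)
    print(rewrite_before_tokenize(
        "by secondary.scan3.myactv.net (qmail-ldap-1.03) with SMTP "
        "for <alice@mail.myactv.net>", rules))
```

Applying the rules in order keeps the behaviour predictable when two patterns could overlap, at the cost of making the order itself part of the configuration.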
Here is my patch to add this feature manually:
http://www.davideous.com/qmail/Mail-SpamAssassin-3.0.4-antietam-bayes-customizations-040719-just-rewrite.patch
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4493] add pre-tokenize text munge to learner
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4493
felicity@apache.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|Undefined |3.2.0
------- Additional Comments From felicity@apache.org 2006-12-31 12:48 -------
It seems like a plugin call in Bayes::tokenize() would solve this. Then people
could filter out whatever tokens they don't want, add new tokens, and so on.
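A rough sketch of the hook idea in Python. All names here (HookedTokenizer, register_hook, drop_local_addresses, add_marker) are hypothetical and do not reflect SpamAssassin's actual Perl plugin API; the whitespace split stands in for the real Bayes tokenizer. Each registered hook receives the token list and returns a new one, so a hook can drop tokens, add tokens, or both.

```python
import re
from typing import Callable, List

# Hypothetical hook type: takes the token list, returns a (possibly modified) list.
TokenHook = Callable[[List[str]], List[str]]

# Matches an address at myactv.net or any of its subdomains.
LOCAL_ADDR_RE = re.compile(r"@([a-z0-9-]+\.)*myactv\.net", re.IGNORECASE)


class HookedTokenizer:
    """Toy tokenizer that runs registered hooks over its token list."""

    def __init__(self) -> None:
        self.hooks: List[TokenHook] = []

    def register_hook(self, hook: TokenHook) -> None:
        self.hooks.append(hook)

    def tokenize(self, text: str) -> List[str]:
        # Whitespace split is only a stand-in for the real tokenizer.
        tokens = re.findall(r"\S+", text)
        for hook in self.hooks:
            tokens = hook(tokens)
        return tokens


def drop_local_addresses(tokens: List[str]) -> List[str]:
    """Example hook: filter out tokens containing a local user address."""
    return [t for t in tokens if not LOCAL_ADDR_RE.search(t)]


def add_marker(tokens: List[str]) -> List[str]:
    """Example hook: append an extra synthetic token."""
    return tokens + ["HOOK:seen"]


if __name__ == "__main__":
    tok = HookedTokenizer()
    tok.register_hook(drop_local_addresses)
    tok.register_hook(add_marker)
    print(tok.tokenize("hello <alice@mail.myactv.net> world"))
    # prints ['hello', 'world', 'HOOK:seen']
```

Compared with rewriting raw text before tokenization, a token-level hook can add tokens as well as remove them; the trade-off is that it works on already-split tokens, so it cannot change how a string is broken up in the first place.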
[Bug 4493] RFE: add pre-tokenize text munge to learner
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4493
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P2 |P5
Summary|add pre-tokenize text munge to learner |RFE: add pre-tokenize text munge to learner
------- Additional Comments From jm@jmason.org 2007-01-14 07:00 -------
seems unlikely to happen in 3.2.0 without a patch
[Bug 4493] RFE: add pre-tokenize text munge to learner
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4493
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|3.2.0 |3.3.0
------- Additional Comments From jm@jmason.org 2007-02-21 12:05 -------
pushing out to 3.3.0, since I don't think it's a 3.2.0 blocker. shout (or change
the milestone) if you disagree....