Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/07/20 21:06:10 UTC
[Bug 4494] New: sa-learn uses local_tests_only=0 which can mess up bayes
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494
Summary: sa-learn uses local_tests_only=0 which can mess up bayes
Product: Spamassassin
Version: 3.0.4
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P5
Component: Learner
AssignedTo: dev@spamassassin.apache.org
ReportedBy: dharris@drh.net
The sa-learn command specifies local_tests_only as false to Mail::SpamAssassin
and there is no way to override this.
If local_tests_only is set to true when doing actual scanning, this can confuse
bayes.
Let me explain: The tokenizer will get different tokens for the same message
depending on the local_tests_only setting, because DNS lookups are sometimes
required to differentiate trusted and un-trusted received headers. Tokens in
trusted and un-trusted received headers are prefixed with "*RT:" and "*RU:", so
completely different tokens get created. I had this mess up bayes in my
situation.
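To illustrate the effect described above, here is a minimal Perl sketch. It is not SpamAssassin's tokenizer: the helper received_tokens, its splitting rule, and the sample header are all hypothetical. Only the "*RT:" and "*RU:" prefixes come from the report. It shows that the same Received header yields two disjoint token sets once its trusted/un-trusted classification flips.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT SpamAssassin's tokenizer: prefix each word of a
# Received header with "*RT:" (trusted) or "*RU:" (un-trusted).
sub received_tokens {
    my ($header, $is_trusted) = @_;
    my $prefix = $is_trusted ? '*RT:' : '*RU:';
    return map { $prefix . $_ }
           grep { length }
           split /[^A-Za-z0-9.-]+/, $header;
}

# Made-up sample header content.
my $received = 'from mail.example.com by mx.example.net';

# Same header, classified two ways (for example with and without DNS lookups).
my %trusted   = map { $_ => 1 } received_tokens($received, 1);
my @untrusted = received_tokens($received, 0);

# Count tokens that appear under both classifications.
my @shared = grep { $trusted{$_} } @untrusted;

print "trusted:   @{[ sort keys %trusted ]}\n";
print "untrusted: @untrusted\n";
print "shared tokens: ", scalar(@shared), "\n";
```

If learning and scanning classify the header differently, none of the learned header tokens can match at scan time, which is the mismatch the report describes.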
Here is my patch. However, you probably want this to be a configurable command
line option.
http://www.davideous.com/qmail/Mail-SpamAssassin-3.0.4-antietam-bayes-customizations-040719-just-salearn.patch
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494
spamassassin@dostech.ca changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Summary |sa-learn uses               |[review] sa-learn uses
        |local_tests_only=0 which can|local_tests_only=0 which can
        |mess up bayes               |mess up bayes
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
[Bug 4494] sa-learn uses local_tests_only=0 which can mess up bayes
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494
Bob@Menschel.net changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dev@spamassassin.apache.org
Target Milestone|Undefined |3.1.0
[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494
------- Additional Comments From dharris@drh.net 2005-08-29 09:31 -------
Justin Mason 2005-08-04 17:25 wrote:
> argh, this is confusing -- I'm not sure Bayes should be seeing
> any diff between message tokens, whether -L is on or not.
>
> David, could you post a demo of what you saw?
I'm sorry for the late reply. I don't have the time to post a demo, and it
might be a moot point right now. But here is more thinking, if you want to
track this down:
The logic for determining whether a header is a trusted header or an untrusted
header is entirely different depending on whether remote network tests are
allowed. I think that if you look in the tokenize headers function (I forget
the name) you can see this. Each token, depending on whether it came from a
trusted or an un-trusted header, gets a different prefix to show the context,
so this really *can* create different tokens from the same e-mail depending on
whether network tests are enabled or not. (If the tokenize headers function
were guaranteed to make the same decisions about trusted/untrusted headers
regardless of having network tests, then it could be much simplified.)
In addition, you might want to add a note in the documentation telling people
to consider re-learning from their corpus after changing the trusted_networks
and/or internal_networks configuration.
[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494
sidney@sidney.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard|1 more vote |ready to apply
------- Additional Comments From sidney@sidney.com 2005-08-27 20:21 -------
+1
[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494
duncf@debian.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard| |1 more vote
------- Additional Comments From duncf@debian.org 2005-08-27 17:54 -------
+1
That's clearly a bug, as the --local option doesn't do *anything* right now. As
to whether or not there should be a difference between local and net with
sa-learn, that should be left till later. (Not 3.1.0 blocking, definitely).
[Bug 4494] sa-learn uses local_tests_only=0 which can mess up bayes
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494
Bob@Menschel.net changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|dev@spamassassin.apache.org |Bob@Menschel.net
Status|NEW |ASSIGNED
------- Additional Comments From Bob@Menschel.net 2005-07-31 20:09 -------
Created an attachment (id=3047)
--> (http://bugzilla.spamassassin.org/attachment.cgi?id=3047&action=view)
patch to sa-learn to activate --local parameter
There is a command line option, sa-learn -L or --local, which should do what
you want, if I read the sa-learn documentation correctly.
The code currently reads:
> local_tests_only => 1,
You change this to:
> local_tests_only => 0,
Instead, if I read the code correctly, it should be:
> local_tests_only => $opt{'local'},
Patch submitted for dev review.
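As a rough sketch of how a -L/--local switch could feed that setting, the Perl below parses the option with Getopt::Long and builds the settings hash. The helper learner_settings is hypothetical and does not reproduce sa-learn's real option handling. Mail::SpamAssassin is deliberately not loaded, so the snippet runs on its own. It also normalizes the value to 1 or 0, which the one-line patch above does not do.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: turn command-line arguments into the settings hash
# that would be handed to the Mail::SpamAssassin constructor.
sub learner_settings {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt, 'local|L')
        or die "usage: [--local|-L]\n";

    # Assumption for this sketch: an absent option means network tests are
    # allowed, so local_tests_only is normalized to 0 rather than undef.
    return { local_tests_only => $opt{'local'} ? 1 : 0 };
}

# Show the resulting setting with no option, the long form, and the short form.
for my $case ([], ['--local'], ['-L']) {
    my $settings = learner_settings(@$case);
    printf "%-8s local_tests_only => %d\n",
        (@$case ? $case->[0] : '(none)'), $settings->{local_tests_only};
}
```

The point of wiring the option through is that learning can then be run with the same local_tests_only value that is used for scanning.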
[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494
duncf@debian.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Additional Comments From duncf@debian.org 2005-08-27 20:32 -------
Fixed on TRUNK and branch, r263807 and r263808.
[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494
------- Additional Comments From jm@jmason.org 2005-08-04 17:25 -------
argh, this is confusing -- I'm not sure Bayes should be seeing any diff between
message tokens, whether -L is on or not.
David, could you post a demo of what you saw?
it may be unavoidable to apply this patch, but I'd prefer to avoid having to do any
network accesses while learning.