Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/07/20 21:06:10 UTC

[Bug 4494] New: sa-learn uses local_tests_only=0 which can mess up bayes

http://bugzilla.spamassassin.org/show_bug.cgi?id=4494

           Summary: sa-learn uses local_tests_only=0 which can mess up bayes
           Product: Spamassassin
           Version: 3.0.4
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Learner
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: dharris@drh.net


The sa-learn command specifies local_tests_only as false to Mail::SpamAssassin 
and there is no way to override this.

If local_tests_only is set to true when doing actual scanning, this can confuse 
bayes.

Let me explain: The tokenizer will get different tokens for the same message 
depending on the local_tests_only setting, because DNS lookups are sometimes 
required to differentiate trusted and un-trusted Received headers. Tokens in 
trusted and un-trusted Received headers are prefixed with "*RT:" and "*RU:" 
respectively, so completely different tokens get created. This messed up bayes 
in my situation.
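
To illustrate the effect described above, here is a minimal sketch in Python 
rather than SpamAssassin's Perl. The function name tokenize_received and its 
boolean trusted argument are hypothetical; only the "*RT:" and "*RU:" prefixes 
come from this report. It shows that the same header text yields disjoint token 
sets once the trust decision flips:

```python
def tokenize_received(header_value, trusted):
    """Prefix each whitespace-separated word of a Received header.

    Hypothetical stand-in for the real tokenizer: only the "*RT:" and
    "*RU:" prefixes are taken from the bug report.
    """
    prefix = "*RT:" if trusted else "*RU:"
    return {prefix + word for word in header_value.split()}


header = "from mail.example.com by mx.example.net"

# Same header, but the trust decision differs (e.g. with and without DNS).
learned = tokenize_received(header, trusted=True)
scanned = tokenize_received(header, trusted=False)

# Tokens learned under one setting never match tokens seen under the other.
overlap = learned & scanned
print(sorted(learned))
print(sorted(scanned))
print("shared tokens:", len(overlap))
```

In this toy model nothing learned from the header under one setting can 
contribute to scoring under the other, which is the mismatch reported here.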

Here is my patch. However, you probably want this to be a configurable command 
line option.

http://www.davideous.com/qmail/Mail-SpamAssassin-3.0.4-antietam-bayes-customizations-040719-just-salearn.patch



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494


spamassassin@dostech.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|sa-learn uses               |[review] sa-learn uses
                   |local_tests_only=0 which can|local_tests_only=0 which can
                   |mess up bayes               |mess up bayes






------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4494] sa-learn uses local_tests_only=0 which can mess up bayes

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494


Bob@Menschel.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dev@spamassassin.apache.org
   Target Milestone|Undefined                   |3.1.0







[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494





------- Additional Comments From dharris@drh.net  2005-08-29 09:31 -------

Justin Mason 2005-08-04 17:25 wrote:
> argh, this is confusing -- I'm not sure Bayes should be seeing 
> any diff between message tokens, whether -L is on or not.
> 
> David, could you post a demo of what you saw?

I'm sorry for the late reply. I don't have the time to post a demo, and it 
might be a moot point right now. But here is some more thinking, if you want to 
track this down:

The logic for determining whether a header is trusted or untrusted is entirely 
different depending on whether remote network tests are allowed. I think that 
if you look in the tokenize headers function (I forget the name) you can see 
this. Each token gets a different prefix depending on whether it came from a 
trusted or an un-trusted header, to show the context, so this really *can* 
create different tokens from the same e-mail depending on whether network tests 
are enabled. (If the tokenize headers function were guaranteed to make the same 
decisions about trusted/untrusted headers regardless of network tests, then it 
could be much simplified.)

In addition, you might want to add a note in the documentation telling people 
to consider re-learning from their corpus after changing the trusted_networks 
and/or internal_networks configuration.





[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|1 more vote                 |ready to apply




------- Additional Comments From sidney@sidney.com  2005-08-27 20:21 -------
+1





[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494


duncf@debian.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|                            |1 more vote




------- Additional Comments From duncf@debian.org  2005-08-27 17:54 -------
+1

That's clearly a bug, as the --local option doesn't do *anything* right now. As
to whether or not there should be a difference between local and net with
sa-learn, that should be left till later. (Not 3.1.0 blocking, definitely).




[Bug 4494] sa-learn uses local_tests_only=0 which can mess up bayes

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494


Bob@Menschel.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|dev@spamassassin.apache.org |Bob@Menschel.net
             Status|NEW                         |ASSIGNED




------- Additional Comments From Bob@Menschel.net  2005-07-31 20:09 -------
Created an attachment (id=3047)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=3047&action=view)
patch to sa-learn to activate --local parameter

There is a command line option, sa-learn -L or --local, which should do what
you want, if I read the sa-learn documentation correctly. 

The code currently reads
> local_tests_only    => 1,
Your patch changes this to
> local_tests_only    => 0,
Instead, if I read the code correctly, it should be
> local_tests_only    => $opt{'local'},

Patch submitted for dev review. 
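
The intent of that one-line change can be sketched as follows, in Python rather 
than sa-learn's Perl. Everything here (build_scanner_args, the argument parser, 
the program name) is a hypothetical stand-in; only the -L/--local option and 
the local_tests_only key come from this thread. The point is that the value 
handed to the scanner follows the command-line flag instead of being a constant:

```python
import argparse


def build_scanner_args(argv):
    """Map a -L/--local flag onto a local_tests_only setting (illustrative)."""
    parser = argparse.ArgumentParser(prog="sa-learn-sketch")
    parser.add_argument("-L", "--local", action="store_true",
                        help="skip network access while learning")
    opts = parser.parse_args(argv)
    # Instead of a hard-coded constant, the setting tracks the flag.
    return {"local_tests_only": 1 if opts.local else 0}


print(build_scanner_args(["--local"]))
print(build_scanner_args([]))
```

With this shape, whoever runs the learner can match the setting used at scan 
time, whichever way that is configured.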




[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494


duncf@debian.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From duncf@debian.org  2005-08-27 20:32 -------
Fixed on TRUNK and branch, r263807 and r263808.




[Bug 4494] [review] sa-learn uses local_tests_only=0 which can mess up bayes

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4494





------- Additional Comments From jm@jmason.org  2005-08-04 17:25 -------
argh, this is confusing -- I'm not sure Bayes should be seeing any diff between
message tokens, whether -L is on or not.

David, could you post a demo of what you saw?

It may be unavoidable to apply this patch, but I'd prefer to avoid having to do any
network accesses while learning.


