Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/09/21 21:05:41 UTC

[Bug 3797] New: spamassassin learns sample-spam as ham

http://bugzilla.spamassassin.org/show_bug.cgi?id=3797

           Summary: spamassassin learns sample-spam as ham
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P4
         Component: spamassassin
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: brian@unearthed.org


Here is the debug output that I thought would be required.  I have Bayes stored
in MySQL, and this was the first test of Bayes in MySQL.  I also tested the
same sample-spam.txt with the Berkeley DB Bayes format (v3, converted from v2)
and had the same result.  This seems a little odd.  I tested with a piece of
spam that I have, and it was auto-learned as spam, not as ham.

debug: bayes: Using username: vscan
debug: bayes: Database connection established
debug: bayes: found bayes db version 3
debug: bayes: Using userid: 12
debug: Score set 3 chosen.
--snip--
debug: Running tests for priority: 0
debug: running header regexp tests; score so far=0
--snip--
debug: running uri tests; score so far=996.7
debug: bayes corpus size: nspam = 160456, nham = 6837
debug: tokenize: header tokens for *M = "  GTUBE1 1010101 example net "
debug: tokenize: header tokens for *F = "U*sender D*example.net D*net"
debug: tokenize: header tokens for To = "U*recipient D*example.net D*net"
debug: tokenize: header tokens for Precedence = " junk"
debug: tokenize: header tokens for MIME-Version = " "
debug: tokenize: header tokens for *c = " /plain; charset=us-ascii"
debug: tokenize: header tokens for Content-Transfer-Encoding = " 7bit"
debug: tokenize: header tokens for *RT = " "
debug: tokenize: header tokens for *RU = " "
debug: bayes: tok_get_all: Token Count: 67
debug: bayes token 'Generic' => 0.999970890303069
debug: bayes token 'Bulk' => 0.999529051987768
debug: bayes token 'HPrecedence:junk' => 0.00444628099173554
debug: bayes token 'characters' => 0.992426229508197
debug: bayes token 'supports' => 0.0108502623784683
debug: bayes token 'installed' => 0.0251183618699686
debug: bayes token 'upper' => 0.917427241876386
debug: bayes token 'spam' => 0.0869097926812392
debug: bayes token 'generic' => 0.903304368382317
debug: bayes token 'string' => 0.10639499498212
debug: bayes token 'detecting' => 0.114304051726548
debug: bayes token 'breaks' => 0.134265664855711
debug: bayes: score = 0.500330325948386
debug: bayes: opportunistic call found expiry due
debug: Syncing Bayes and expiring old tokens...
debug: bayes: expiry check keep size, 0.75 * max: 112500
debug: bayes: token count: 161336, final goal reduction size: 48836
debug: bayes: First pass?  Current: 1095792132, Last: 0, atime: 0, count: 0,
newdelta: 0, ratio: 0, period: 43200
debug: bayes: Can't use estimation method for expiry, something fishy,
calculating optimal atime delta (first pass)
debug: bayes: expiry max exponent: 9
debug: bayes: atime     token reduction
debug: bayes: ========  ===============
debug: bayes: 43200     135848
debug: bayes: 86400     119122
debug: bayes: 172800    97892
debug: bayes: 345600    51698
debug: bayes: 691200    0
debug: bayes: 1382400   0
debug: bayes: 2764800   0
debug: bayes: 5529600   0
debug: bayes: 11059200  0
debug: bayes: 22118400  0
debug: bayes: couldn't find a good delta atime, need more token difference,
skipping expire.
debug: Syncing complete.
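
The figures in the expiry lines above are consistent with simple arithmetic:
112500 is 0.75 of a maximum of 150000 tokens, 48836 is the token count minus
that keep size, and the atime column is the 43200-second period doubled up to
the "max exponent" of 9. The following Python sketch only reproduces that
arithmetic. The function name and the 150000 maximum are assumptions (the
maximum is inferred from 112500 / 0.75), and it does not attempt to model how
SpamAssassin picks a delta or why it skipped the expire here.

```python
def expiry_plan(max_db_size, token_count, period=43200, max_exponent=9):
    """Reproduce the numbers printed in the expiry debug lines.

    max_db_size is an assumption inferred from the log (112500 / 0.75);
    period and max_exponent are taken from the log itself.
    """
    # "expiry check keep size, 0.75 * max"
    keep_size = int(0.75 * max_db_size)
    # "final goal reduction size": tokens above the keep size
    goal_reduction = token_count - keep_size
    # Candidate atime deltas: the period doubled for each exponent 0..max_exponent
    deltas = [period * 2 ** k for k in range(max_exponent + 1)]
    return keep_size, goal_reduction, deltas


keep_size, goal_reduction, deltas = expiry_plan(150000, 161336)
print(keep_size, goal_reduction)
print(deltas)
```

With the token count from the log (161336), this gives a keep size of 112500,
a goal reduction of 48836, and ten candidate deltas from 43200 to 22118400,
matching the table above.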
--snip--
debug: running meta tests; score so far=998.563
debug: running header regexp tests; score so far=998.563
debug: running body-text per-line regexp tests; score so far=998.563
debug: running uri tests; score so far=998.563
debug: running raw-body-text per-line regexp tests; score so far=998.563
debug: running full-text regexp tests; score so far=998.563
debug: Running tests for priority: 1000
debug: running meta tests; score so far=998.563
debug: running header regexp tests; score so far=998.563
debug: running body-text per-line regexp tests; score so far=998.563
debug: running uri tests; score so far=998.563
debug: running raw-body-text per-line regexp tests; score so far=998.563
debug: running full-text regexp tests; score so far=998.563
debug: auto-learn: currently using scoreset 3, recomputing score based on
scoreset 1.
debug: auto-learn: message score: 998.563, computed score for autolearn: -1.115
debug: auto-learn? ham=0.1, spam=12, body-points=1.705, head-points=-2.6,
learned-points=0.001
debug: auto-learn? yes, ham (-1.115 < 0.1)
debug: Learning Ham
debug: all '*From' addrs: sender@example.net
debug: all '*To' addrs: recipient@example.net
debug: bayes: Database connection established
debug: bayes: found bayes db version 3
debug: bayes: Using userid: 12
debug: tokenize: header tokens for *M = "  GTUBE1 1010101 example net "
debug: tokenize: header tokens for *F = "U*sender D*example.net D*net"
debug: tokenize: header tokens for To = "U*recipient D*example.net D*net"
debug: tokenize: header tokens for Precedence = " junk"
debug: tokenize: header tokens for MIME-Version = " "
debug: tokenize: header tokens for *c = " /plain; charset=us-ascii"
debug: tokenize: header tokens for Content-Transfer-Encoding = " 7bit"
debug: tokenize: header tokens for *RT = " "
debug: tokenize: header tokens for *RU = " "
debug: bayes: seen (15b2b262a6ed121f3c1dcb5561dd69dc254dfad8@sa_generated) put
debug: bayes: Learned '15b2b262a6ed121f3c1dcb5561dd69dc254dfad8@sa_generated',
atime: 1058995800
debug: is spam? score=998.563 required=5
debug: tests=ALL_TRUSTED,BAYES_50,DNS_FROM_AHBL_RHSBL,GTUBE,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK
debug: subtests=__CT,__CTE,__CT_TEXT_PLAIN,__HAS_MSGID,__HAS_SUBJECT,__MIME_VERSION,__MSGID_OK_HOST,__SANE_MSGID,__UNUSABLE_MSGID
Received: from localhost by hades.unearthed.org
        with SpamAssassin (version 3.0.0-rc5);
        Tue, 21 Sep 2004 11:42:40 -0700
From: Sender <se...@example.net>
To: Recipient <re...@example.net>
Subject: Test spam mail (GTUBE)
Date: Wed, 23 Jul 2003 23:30:00 +0200
Message-Id: <GT...@example.net>
X-Spam-Flag: YES
X-Spam-Level: **************************************************
X-Spam-Checker-Version: SpamAssassin 3.0.0-rc5 (2004-09-13) on
        hades.unearthed.org
X-Spam-Status: Yes, score=998.6 required=5.0 tests=ALL_TRUSTED,BAYES_50,
        DNS_FROM_AHBL_RHSBL,GTUBE,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK
        autolearn=ham version=3.0.0-rc5
X-Spam-Report:
        * -3.3 ALL_TRUSTED Did not pass through any untrusted hosts
        * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
        *  0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
        *      [score: 0.5003]
        *  0.1 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence level above 50%
        *      [cf: 100]
        *  1.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
        *  0.3 DNS_FROM_AHBL_RHSBL RBL: From: sender listed in dnsbl.ahbl.org
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_41507620.E1C0559F"
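
The auto-learn debug lines show two different scores: the message score
(998.563, which includes the 1000 points from GTUBE) and a separately
recomputed auto-learn score (-1.115), which is the one compared against the
thresholds ham=0.1 and spam=12. The Python sketch below illustrates only that
final comparison. The function name is hypothetical, the `>=` on the spam side
is an assumption, and it leaves out how the -1.115 is derived and any extra
conditions SpamAssassin may apply before learning.

```python
from typing import Optional


def autolearn_decision(autolearn_score: float,
                       ham_threshold: float = 0.1,
                       spam_threshold: float = 12.0) -> Optional[str]:
    """Return "ham", "spam", or None for a recomputed auto-learn score.

    Thresholds default to the values printed in the log (ham=0.1, spam=12).
    """
    # The log shows this comparison directly: "yes, ham (-1.115 < 0.1)"
    if autolearn_score < ham_threshold:
        return "ham"
    # Assumption: scores at or above the spam threshold are learned as spam
    if autolearn_score >= spam_threshold:
        return "spam"
    # Anything in between is not auto-learned
    return None


print(autolearn_decision(-1.115))   # recomputed auto-learn score from the log
print(autolearn_decision(998.563))  # full message score, for comparison
```

Under these assumptions the first call returns "ham" and the second returns
"spam", which is the mismatch this report is about: the decision is made on
the recomputed score, not on the score shown in X-Spam-Status.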


