Posted to users@spamassassin.apache.org by Mike Cavanagh <mi...@5cs.com> on 2005/08/03 06:13:39 UTC

Bayes: not enough usable tokens found

What does this message mean??
    debug: cannot use bayes on this message; not enough usable tokens found
    debug: bayes: not scoring message, returning undef

I am using MIMEDefang version 2.52 and SpamAssassin version 3.0.4.

Below is:
    current status of bayes database (sa-learn --dump=magic)
    sa-mimedefang.cf
    spamassassin --lint --debug

What am I doing wrong?  I am sure this is something simple; I just can't
seem to see it.
Thanks,
Mike.

*********************************************************
SA-LEARN Status:
 /usr/local/bin/sa-learn --username=mimedefang --siteconfigpath=/etc/mail/spamassassin --dump=magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       4275          0  non-token data: nspam
0.000          0        765          0  non-token data: nham
0.000          0     148928          0  non-token data: ntokens
0.000          0 1120235107          0  non-token data: oldest atime
0.000          0 1123040192          0  non-token data: newest atime
0.000          0 1123030366          0  non-token data: last journal sync atime
0.000          0 1123000571          0  non-token data: last expiry atime
0.000          0    2764800          0  non-token data: last expire atime delta
0.000          0       2580          0  non-token data: last expire reduction count
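
For the "non-token data" lines of --dump=magic, the value sits in the third column and the atime values are Unix timestamps. The Python sketch below reads the dump above. The column rule is inferred from this output, and the readiness check assumes Bayes is only consulted once nspam and nham reach bayes_min_spam_num and bayes_min_ham_num (both 200 in the posted config).

```python
from datetime import datetime, timezone

# Output of `sa-learn --dump=magic`, copied from the post above.
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       4275          0  non-token data: nspam
0.000          0        765          0  non-token data: nham
0.000          0     148928          0  non-token data: ntokens
0.000          0 1120235107          0  non-token data: oldest atime
0.000          0 1123040192          0  non-token data: newest atime
0.000          0 1123030366          0  non-token data: last journal sync atime
0.000          0 1123000571          0  non-token data: last expiry atime
0.000          0    2764800          0  non-token data: last expire atime delta
0.000          0       2580          0  non-token data: last expire reduction count
"""

PREFIX = "non-token data: "


def parse_magic(text):
    """Map each 'non-token data' name to its integer value (third column)."""
    magic = {}
    for line in text.splitlines():
        fields = line.split(None, 4)  # prob, spam count, ham count, atime, description
        if len(fields) == 5 and fields[4].startswith(PREFIX):
            magic[fields[4][len(PREFIX):]] = int(fields[2])
    return magic


def bayes_ready(magic, min_spam=200, min_ham=200):
    """Assumed rule: enough spam AND enough ham must have been learned."""
    return magic["nspam"] >= min_spam and magic["nham"] >= min_ham


def as_utc(epoch):
    """Render a Unix timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


magic = parse_magic(DUMP)
print("ready:", bayes_ready(magic))
print("oldest token atime:", as_utc(magic["oldest atime"]))
print("newest token atime:", as_utc(magic["newest atime"]))
print("expire delta (days):", magic["last expire atime delta"] / 86400)
```

On this dump, the database clears both minimums by a wide margin, and its token access times run from 1 July to 3 August 2005 (UTC).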

*********************************************************
Sa-mimedefang.cf:
required_hits           10
ok_locales              en, zh
skip_rbl_checks 0               # Go ahead and check anyways
use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 12.0
bayes_learn_during_report 1
bayes_path /etc/mail/spamassassin/bayes
bayes_file_mode 0700
bayes_min_ham_num 200
bayes_min_spam_num 200
bayes_use_hapaxes 1
bayes_use_chi2_combining 1
bayes_auto_expire 1
bayes_learn_to_journal 0
bayes_journal_max_size 102400
use_dcc 1
use_pyzor 1
use_razor2 1
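
With bayes_auto_learn 1, the two threshold lines above decide which messages are fed to Bayes automatically: below the nonspam threshold as ham, above the spam threshold as spam. The sketch below covers only that threshold test. SpamAssassin's real auto-learner adds further conditions. For example, it scores the message without the Bayes rules and wants points from both header and body rules before learning spam. The min_points check here is an assumed stand-in for those conditions, not SpamAssassin's code.

```python
# Values copied from sa-mimedefang.cf above.
THRESHOLD_NONSPAM = 0.1   # bayes_auto_learn_threshold_nonspam
THRESHOLD_SPAM = 12.0     # bayes_auto_learn_threshold_spam


def autolearn_decision(score, header_points=0.0, body_points=0.0, min_points=3.0):
    """Return 'ham', 'spam', or None (not auto-learned). Simplified model."""
    if score < THRESHOLD_NONSPAM:
        return "ham"
    # Assumed extra guard: spam must also collect points from header AND body rules.
    if score > THRESHOLD_SPAM and header_points >= min_points and body_points >= min_points:
        return "spam"
    return None  # between the thresholds: left alone


# The lint run below logs "autolearn score: -1.948".
print(autolearn_decision(-1.948))
print(autolearn_decision(5.0))
print(autolearn_decision(13.0, header_points=4.0, body_points=5.0))
```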

*********************************************************
Spamassassin Lint:
spamassassin -D --lint --siteconfigpath=/etc/mail/spamassassin
debug: SpamAssassin version 3.0.4
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/usr/ccs/bin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/opt/sfw/bin', keeping.
debug: Final PATH set to: /usr/sbin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/opt/sfw/bin

debug: diag: module not installed: DBI ('require' failed)

debug: diag: module installed: DB_File, version 1.811
debug: diag: module installed: Digest::SHA1, version 2.07
debug: diag: module installed: IO::Socket::UNIX, version 1.21
debug: diag: module installed: MIME::Base64, version 3.03
debug: diag: module installed: Net::DNS, version 0.46

debug: diag: module not installed: Net::LDAP ('require' failed)

debug: diag: module installed: Razor2::Client::Agent, version 2.40
debug: diag: module installed: Storable, version 2.09
debug: diag: module installed: URI, version 1.30
debug: ignore: using a test message to lint rules
debug: using "/opt/sfw/share/spamassassin" for default rules dir
debug: config: read file /opt/sfw/share/spamassassin/10_misc.cf
debug: config: read file /opt/sfw/share/spamassassin/20_anti_ratware.cf
debug: config: read file /opt/sfw/share/spamassassin/20_body_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_compensate.cf
debug: config: read file /opt/sfw/share/spamassassin/20_dnsbl_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_drugs.cf
debug: config: read file /opt/sfw/share/spamassassin/20_fake_helo_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_head_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_html_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_meta_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_phrases.cf
debug: config: read file /opt/sfw/share/spamassassin/20_porn.cf
debug: config: read file /opt/sfw/share/spamassassin/20_ratware.cf
debug: config: read file /opt/sfw/share/spamassassin/20_uri_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/23_bayes.cf
debug: config: read file /opt/sfw/share/spamassassin/25_body_tests_es.cf
debug: config: read file /opt/sfw/share/spamassassin/25_hashcash.cf
debug: config: read file /opt/sfw/share/spamassassin/25_spf.cf
debug: config: read file /opt/sfw/share/spamassassin/25_uribl.cf
debug: config: read file /opt/sfw/share/spamassassin/30_text_de.cf
debug: config: read file /opt/sfw/share/spamassassin/30_text_fr.cf
debug: config: read file /opt/sfw/share/spamassassin/30_text_nl.cf
debug: config: read file /opt/sfw/share/spamassassin/30_text_pl.cf
debug: config: read file /opt/sfw/share/spamassassin/50_scores.cf
debug: config: read file /opt/sfw/share/spamassassin/60_whitelist.cf
debug: using "/etc/mail/spamassassin" for site rules dir
debug: config: read file /etc/mail/spamassassin/local.cf
debug: config: read file /etc/mail/spamassassin/sa-mimedefang.cf
debug: using "/var/tmp/.spamassassin" for user state dir
debug: using "/var/tmp/.spamassassin/user_prefs" for user prefs file
debug: config: read file /var/tmp/.spamassassin/user_prefs
debug: bayes: 8472 tie-ing to DB file R/O /etc/mail/spamassassin/bayes_toks
debug: bayes: 8472 tie-ing to DB file R/O /etc/mail/spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: Score set 3 chosen.
debug: ---- MIME PARSER START ----
debug: main message type: text/plain
debug: parsing normal part
debug: added part, type: text/plain
debug: ---- MIME PARSER END ----
debug: metadata: X-Spam-Relays-Trusted:
debug: metadata: X-Spam-Relays-Untrusted:
debug: is Net::DNS::Resolver available? yes
debug: Net::DNS version: 0.46
debug: trying (3) kernel.org...
debug: looking up NS for 'kernel.org'
debug: NS lookup of kernel.org succeeded => Dns available (set dns_available to hardcode)
debug: is DNS available? 1
debug: all '*From' addrs: ignore@compiling.spamassassin.taint.org

debug: decoding: no encoding detected

debug: Running tests for priority: 0
debug: running header regexp tests; score so far=0
debug: all '*To' addrs:
debug: running body-text per-line regexp tests; score so far=-3.174
debug: running uri tests; score so far=-3.174
debug: bayes corpus size: nspam = 4219, nham = 588
debug: tokenize: header tokens for *F = "U*ignore D*compiling.spamassassin.taint.org D*spamassassin.taint.org D*taint.org D*org"
debug: tokenize: header tokens for *m = "  1122639644 lint_rules "
debug: tokenize: header tokens for *RT = " "
debug: tokenize: header tokens for *RU = " "

debug: cannot use bayes on this message; not enough usable tokens found
debug: bayes: not scoring message, returning undef

debug: bayes: 8472 untie-ing
debug: bayes: 8472 untie-ing db_toks
debug: bayes: 8472 untie-ing db_seen
debug: Razor2 is available
debug: entering helper-app run mode
 Razor-Log: Computed razorhome from env: /var/tmp/.razor
 Razor-Log: Found razorhome: /var/tmp/.razor

 Razor-Log: No /var/tmp/.razor/razor-agent.conf found, skipping.
 Razor-Log: No razor-agent.conf found, using defaults.

Jul 29 05:20:46.813407 check[8472]: [ 2] [bootup] Logging initiated LogDebugLevel=9 to stdout
Jul 29 05:20:46.815185 check[8472]: [ 5] computed razorhome=/var/tmp/.razor, conf=, ident=/var/tmp/.razor/identity
Jul 29 05:20:46.815993 check[8472]: [ 8] Client supported_engines: 4
Jul 29 05:20:46.817660 check[8472]: [ 8]  prep_mail done: mail 1 headers=93, mime0=1376
Jul 29 05:20:46.819373 check[8472]: [ 5] read_file: 1 items read from /var/tmp/.razor/servers.discovery.lst
Jul 29 05:20:46.820761 check[8472]: [ 5] read_file: 2 items read from /var/tmp/.razor/servers.nomination.lst
Jul 29 05:20:46.821861 check[8472]: [ 5] read_file: 1 items read from /var/tmp/.razor/servers.catalogue.lst
Jul 29 05:20:46.823840 check[8472]: [ 9] Assigning defaults to joy.cloudmark.com
Jul 29 05:20:46.824671 check[8472]: [ 9] Assigning defaults to folly.cloudmark.com
Jul 29 05:20:46.825621 check[8472]: [ 9] Assigning defaults to shock.cloudmark.com
Jul 29 05:20:46.828536 check[8472]: [ 5] read_file: 15 items read from /var/tmp/.razor/server.shock.cloudmark.com.conf
Jul 29 05:20:46.830689 check[8472]: [ 5] read_file: 15 items read from /var/tmp/.razor/server.shock.cloudmark.com.conf
Jul 29 05:20:46.831567 check[8472]: [ 5] 135304 seconds before closest server discovery
Jul 29 05:20:46.832323 check[8472]: [ 6] shock.cloudmark.com is a Catalogue Server srl 5084; computed min_cf=6, Server se: C8
Jul 29 05:20:46.833190 check[8472]: [ 8] Computed supported_engines: 4
Jul 29 05:20:46.833763 check[8472]: [ 8] Using next closest server shock.cloudmark.com:2703, cached info srl 5084
Jul 29 05:20:46.834254 check[8472]: [ 8] mail 1 has no subject
Jul 29 05:20:46.838851 check[8472]: [ 6] preproc: mail 1.0 went from 1376 bytes to 1339
Jul 29 05:20:46.839477 check[8472]: [ 6] computing sigs for mail 1.0, len 1339
Jul 29 05:20:46.844532 check[8472]: [ 6] skipping whitelist file (empty?): /var/tmp/.razor/razor-whitelist
Jul 29 05:20:46.845185 check[8472]: [ 5] Connecting to shock.cloudmark.com ...
Jul 29 05:20:46.884792 check[8472]: [ 8] Connection established
Jul 29 05:20:46.885575 check[8472]: [ 4] shock.cloudmark.com >> 36 server greeting: sn=C&srl=5084&a=l&a=cg&ep4=7542-10
Jul 29 05:20:46.887841 check[8472]: [ 4] shock.cloudmark.com << 25
Jul 29 05:20:46.888264 check[8472]: [ 6] cn=razor-agents&cv=2.40
Jul 29 05:20:46.889605 check[8472]: [ 6] shock.cloudmark.com is a Catalogue Server srl 5084; computed min_cf=6, Server se: C8
Jul 29 05:20:46.890367 check[8472]: [ 8] Computed supported_engines: 4
Jul 29 05:20:46.890989 check[8472]: [ 8] mail 1.0 e4 sig: xFaZIZUVHk90OQfARnenjx5BZTMA
Jul 29 05:20:46.891648 check[8472]: [ 8] preparing 1 queries
Jul 29 05:20:46.892380 check[8472]: [ 8] sending 1 batches
Jul 29 05:20:46.893231 check[8472]: [ 4] shock.cloudmark.com << 52
Jul 29 05:20:46.893601 check[8472]: [ 6] a=c&e=4&ep4=7542-10&s=xFaZIZUVHk90OQfARnenjx5BZTMA
Jul 29 05:20:46.991882 check[8472]: [ 4] shock.cloudmark.com >> 5
Jul 29 05:20:46.992336 check[8472]: [ 6] response to sent.2
p=0
Jul 29 05:20:46.993896 check[8472]: [ 6] mail 1.0 e=4 sig=xFaZIZUVHk90OQfARnenjx5BZTMA: sig not found.
Jul 29 05:20:46.994412 check[8472]: [ 7] method 4: mail 1.0: no-contention part, spam=0

Jul 29 05:20:46.994756 check[8472]: [ 7] method 4: mail 1: all non-contention parts not spam, mail not spam
Jul 29 05:20:46.995069 check[8472]: [ 3] mail 1 is not known spam.

Jul 29 05:20:46.995450 check[8472]: [ 5] disconnecting from server shock.cloudmark.com
Jul 29 05:20:46.996116 check[8472]: [ 4] shock.cloudmark.com << 5
Jul 29 05:20:46.996444 check[8472]: [ 6] a=q
debug: Using results from Razor v2.40
debug: Found Razor2 part: part=0 engine=4 ct=0 cf=0
debug: leaving helper-app run mode
debug: Razor2 results: spam? 0  highest cf score: 0
debug: running raw-body-text per-line regexp tests; score so far=-3.174
debug: running full-text regexp tests; score so far=-3.174
debug: Razor2 is available
debug: Current PATH is: /usr/sbin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/opt/sfw/bin

debug: Pyzor is not available: pyzor not found
debug: DCCifd is not available: no r/w dccifd socket found.
debug: DCC is not available: no executable dccproc found.

debug: Running tests for priority: 500
debug: RBL: success for 1 of 1 queries
debug: running meta tests; score so far=-3.174
debug: running header regexp tests; score so far=-1.948
debug: running body-text per-line regexp tests; score so far=-1.948
debug: running uri tests; score so far=-1.948
debug: running raw-body-text per-line regexp tests; score so far=-1.948
debug: running full-text regexp tests; score so far=-1.948
debug: Running tests for priority: 1000
debug: running meta tests; score so far=-1.948
debug: running header regexp tests; score so far=-1.948
debug: using "/var/tmp/.spamassassin" for user state dir
debug: lock: 8472 created /var/tmp/.spamassassin/auto-whitelist.lock.fred.5cs.com.8472
debug: lock: 8472 trying to get lock on /var/tmp/.spamassassin/auto-whitelist with 0 retries
debug: lock: 8472 link to /var/tmp/.spamassassin/auto-whitelist.lock: link ok
debug: Tie-ing to DB file R/W in /var/tmp/.spamassassin/auto-whitelist
debug: auto-whitelist (db-based): ignore@compiling.spamassassin.taint.org|ip=none scores 0/0
debug: AWL active, pre-score: -1.948, autolearn score: -1.948, mean: undef, IP: undef
debug: DB addr list: untie-ing and unlocking.
debug: DB addr list: file locked, breaking lock.
debug: unlock: 8472 unlink /var/tmp/.spamassassin/auto-whitelist.lock
debug: Post AWL score: -1.948
debug: running body-text per-line regexp tests; score so far=-1.948
debug: running uri tests; score so far=-1.948
debug: running raw-body-text per-line regexp tests; score so far=-1.948
debug: running full-text regexp tests; score so far=-1.948
debug: is spam? score=-1.948 required=10
debug: tests=ALL_TRUSTED,MISSING_HEADERS,MISSING_SUBJECT,NO_REAL_NAME
debug: subtests=__HAS_MSGID,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__SANE_MSGID,__UNUSABLE_MSGID


Re: Bayes: not enough usable tokens found

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Mike Cavanagh wrote:
> Hum.  I can see some messages are being caught via the Bayes test, but I
> would think Bayes would find many more, as I have close to 5000 SPAM in
> the Bayes system.
> I get at most 15 messages a day flagged as SPAM, while approx. 100
> messages a day that should be flagged as SPAM come through as non-SPAM.
>
> I have started to include the SpamAssassin footer on all messages to get
> a handle on what passes in the "non-Spam" messages.
>
> Any thoughts on how to improve this would be helpful.
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
> -3.3 ALL_TRUSTED            Did not pass through any untrusted hosts


http://wiki.apache.org/spamassassin/TrustPath


Re: Bayes: not enough usable tokens found

Posted by Loren Wilton <lw...@earthlink.net>.
Hum.  I'm a little confused by that SA score report at the bottom of the
message.  If it refers to a message that should have been spam, you have two
serious problems.  If it refers to a message from this list, you may have one
serious problem and one less serious problem.

 pts rule name              description
---- ---------------------- --------------------------------------------------
-3.3 ALL_TRUSTED            Did not pass through any untrusted hosts
 0.0 HTML_30_40             BODY: Message is 30% to 40% HTML
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 HTML_TITLE_EMPTY       BODY: HTML title contains no text
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%

In general, ALL_TRUSTED shouldn't fire for messages coming from an
external source.  This makes me wonder whether you have trusted_networks
(and internal_networks) set correctly.
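
The trust-path idea behind Daryl's link and Loren's remark can be modelled roughly as follows. Relays are walked from the most recent hop outward. A relay counts as trusted only if it falls inside trusted_networks and every more recent relay was trusted too. This is a simplified sketch, not SpamAssassin's code. The network ranges and IP addresses are hypothetical; 203.0.113.0/24 is a documentation range. In the lint run, ALL_TRUSTED fired even though X-Spam-Relays-Trusted and X-Spam-Relays-Untrusted were both empty, so the sketch treats "no untrusted relays" as all-trusted.

```python
import ipaddress


def split_relays(relay_ips, trusted_networks):
    """Split relays (most recent hop first) into trusted and untrusted lists."""
    nets = [ipaddress.ip_network(n) for n in trusted_networks]
    trusted, untrusted = [], []
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not untrusted and any(addr in net for net in nets):
            trusted.append(ip)
        else:
            untrusted.append(ip)  # once one hop is untrusted, every older hop is too
    return trusted, untrusted


def all_trusted(relay_ips, trusted_networks):
    """Rough analogue of ALL_TRUSTED: no relay in the path was untrusted."""
    _, untrusted = split_relays(relay_ips, trusted_networks)
    return not untrusted


# Hypothetical path: an internal relay, then an outside sender.
path = ["192.168.1.10", "203.0.113.5"]
print(all_trusted(path, ["192.168.1.0/24"]))  # external hop stays untrusted
print(all_trusted(path, ["0.0.0.0/0"]))       # an overly broad setting trusts everything
```

The second call shows why a too-broad trusted_networks would make ALL_TRUSTED fire on outside mail.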

In general, SA (and especially Bayes) shouldn't be seeing this list, since it
carries a lot of real spam and other spammy tokens.  It is far better to have
Postfix, or whatever your mail router is, route this list around SA.

If that report belonged to a spam, BAYES_00 says that Bayes rated it as
almost certainly ham (0 to 1% spam probability).  That would be a sign that
you have a corrupted Bayes database.
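
BAYES_00 and its neighbours are simply probability buckets. The sketch below maps a Bayes probability to the rule that would fire. The bucket boundaries are assumed from the stock SpamAssassin 3.0.x 23_bayes.cf; only the BAYES_00 range (0 to 1%) appears in this thread. The exact handling of values that fall on a boundary is also an assumption. The last example shows that a Bayes result of undef, as in the "not scoring message" debug line, produces no BAYES_xx hit at all.

```python
# Bucket ranges assumed from the stock SpamAssassin 3.0.x 23_bayes.cf.
BAYES_RULES = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]


def bayes_rule(probability):
    """Name of the BAYES_xx rule that would fire, or None when Bayes returned undef."""
    if probability is None:
        return None
    for name, low, high in BAYES_RULES:
        # Half-open ranges (an assumption); the top bucket also takes 1.0.
        if low <= probability < high or (name == "BAYES_99" and probability == 1.0):
            return name
    raise ValueError(f"not a probability: {probability!r}")


print(bayes_rule(0.004))  # Bayes is nearly sure this is ham
print(bayes_rule(0.5))    # no opinion either way
print(bayes_rule(None))   # Bayes skipped the message entirely
```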

        Loren


Re: Bayes: not enough usable tokens found

Posted by Mike Cavanagh <mi...@5cs.com>.
Hum.  I can see some messages are being caught via the Bayes test, but I
would think Bayes would find many more, as I have close to 5000 SPAM in
the Bayes system.
I get at most 15 messages a day flagged as SPAM, while approx. 100
messages a day that should be flagged as SPAM come through as non-SPAM.

I have started to include the SpamAssassin footer on all messages to get
a handle on what passes in the "non-Spam" messages.

Any thoughts on how to improve this would be helpful.

Thanks,
Mike


Loren Wilton wrote:

>>What does this message mean??
>>    debug: cannot use bayes on this message; not enough usable tokens found
>>    debug: bayes: not scoring message, returning undef
>
>Unless you are seeing this a whole lot, I don't think you are doing anything
>wrong.  I think this just means that the particular mail didn't match much
>of anything Bayes had seen before, so it didn't feel competent to assign a
>score to it.  I would have expected that to be a BAYES_50 case, but it looks
>like it just decided to bypass the message.
>
>        Loren

Re: Bayes: not enough usable tokens found

Posted by Loren Wilton <lw...@earthlink.net>.
> What does this message mean??
>     debug: cannot use bayes on this message; not enough usable tokens found
>     debug: bayes: not scoring message, returning undef

Unless you are seeing this a whole lot, I don't think you are doing anything
wrong.  I think this just means that the particular mail didn't match much
of anything Bayes had seen before, so it didn't feel competent to assign a
score to it.  I would have expected that to be a BAYES_50 case, but it looks
like it just decided to bypass the message.

        Loren
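
Loren's explanation can be pictured with a small model. Bayes only produces a score if some of the message's tokens carry a strong spam or ham signal. Tokens the database has never seen, or has seen about equally in spam and ham, contribute nothing. The sketch below is a simplified model, not SpamAssassin's Bayes.pm. It drops tokens whose probability sits near 0.5, keeps the strongest of the rest, and combines them with chi-squared (Fisher/Robinson) combining, since bayes_use_chi2_combining is enabled in the posted config. MIN_PROB_STRENGTH and MAX_SIGNIFICANT are assumed values, and the token names are hypothetical. In the lint run, the message being scored is SpamAssassin's built-in test message (sender ignore@compiling.spamassassin.taint.org), not real mail, so it is plausible that none of its tokens were strong enough to use.

```python
import math

# Assumed constants for illustration only; the real values live in Bayes.pm.
MIN_PROB_STRENGTH = 0.346   # ignore tokens with |p - 0.5| below this
MAX_SIGNIFICANT = 150       # use at most this many of the strongest tokens


def chi2q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def bayes_score(token_probs):
    """Combine per-token spam probabilities; None plays the role of undef."""
    # Unknown tokens are simply absent from token_probs in this model.
    strong = [p for p in token_probs.values() if abs(p - 0.5) >= MIN_PROB_STRENGTH]
    strong.sort(key=lambda p: abs(p - 0.5), reverse=True)
    strong = strong[:MAX_SIGNIFICANT]
    if not strong:
        # cf. "cannot use bayes on this message; not enough usable tokens found"
        return None
    # Keep probabilities strictly inside (0, 1) so the logarithms are defined.
    strong = [min(max(p, 1e-6), 1 - 1e-6) for p in strong]
    n = len(strong)
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in strong), 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in strong), 2 * n)
    return (1.0 + spamminess - hamminess) / 2.0


print(bayes_score({"token_a": 0.52, "token_b": 0.47}))  # only weak tokens: None
print(bayes_score({"token_c": 0.99, "token_d": 0.98}))  # strongly spammy, close to 1
```

In this model a "no opinion" result of 0.5 (which would map to BAYES_50) and "not scoring" (None) are different outcomes. The second happens only when no token survives the strength filter.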