Posted to users@spamassassin.apache.org by Dana Holland <da...@navarrocollege.edu> on 2004/02/19 20:16:50 UTC

another newbie bayes problem - related to my previous question

I finally found this on Google, relating to my problem with messages saying 
that the bayes db had not been initialized:

http://bugzilla.spamassassin.org/show_bug.cgi?id=2261

I don't understand how to install a patch yet (but I'm working on it). 
I need a little help understanding what I'm seeing in the spamassassin 
--lint -D output.  I'm seeing the following:

debug: is DNS available? 1
debug: all '*From' addrs: ignore@compiling.spamassassin.taint.org
debug: running header regexp tests; score so far=0
debug: running body-text per-line regexp tests; score so far=2.077
debug: bayes corpus size: nspam = 2411, nham = 6972
debug: uri tests: Done uriRE
debug: tokenize: header tokens for *F = "U*ignore 
D*compiling.spamassassin.taint.org D*spamassassin.taint.org D*taint.org 
D*org"
debug: tokenize: header tokens for *m = " 1077218228 lint_rules "
debug: cannot use bayes on this message; db not initialised yet
debug: bayes: not scoring message, returning 0.5

Where is it getting the spamassassin.taint.org stuff?  Is this something 
that's in the files I used to train bayes?
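
For reference, the *F header tokens in the debug output above seem to follow a simple pattern: the local part of the address gets a "U*" prefix, and each trailing suffix of the domain gets a "D*" prefix. The sketch below only reproduces that visible pattern. The function name from_header_tokens is hypothetical, and this is not SpamAssassin's actual tokenizer code.

```python
def from_header_tokens(addr):
    """Split an address into U*/D* tokens, mimicking the debug output.

    Hypothetical helper: it copies the pattern seen in the
    'tokenize: header tokens for *F' line, nothing more.
    """
    user, _, domain = addr.partition("@")
    tokens = ["U*" + user]
    labels = domain.split(".")
    # One token per domain suffix: a.b.c, then b.c, then c
    for i in range(len(labels)):
        tokens.append("D*" + ".".join(labels[i:]))
    return tokens


tokens = from_header_tokens("ignore@compiling.spamassassin.taint.org")
print(" ".join(tokens))
```

Running it on the address from the "all '*From' addrs" line gives the same five tokens shown in the debug output.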



Re: another newbie bayes problem - related to my previous question

Posted by Chris Thielen <cm...@someone.dhs.org>.
Dana Holland said:
> Chris Thielen wrote:
>
>> Dana, could you paste the output from "sa-learn --dump magic"?
>
> # sa-learn --dump magic
> 0.000          0          2          0  non-token data: bayes db version
> 0.000          0       2411          0  non-token data: nspam
> 0.000          0       6972          0  non-token data: nham
> 0.000          0    1146536          0  non-token data: ntokens
> 0.000          0 1077124857          0  non-token data: oldest atime
> 0.000          0 1077205613          0  non-token data: newest atime
> 0.000          0 1077205625          0  non-token data: last journal sync atime
> 0.000          0 1077205769          0  non-token data: last expiry atime
> 0.000          0      43200          0  non-token data: last expire atime delta
> 0.000          0     194438          0  non-token data: last expire reduction count
>

OK, scratch that line of thought.   Your bayes DB looks well trained and
has plenty of tokens.
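
If you want to check those counts from a script, here is a minimal sketch in Python that parses the "non-token data" lines of the dump shown above. The function name parse_dump_magic is hypothetical, the sample text is copied from the output above, and the threshold of 200 is an assumption based on the documented defaults for bayes_min_spam_num and bayes_min_ham_num, so check your own configuration.

```python
def parse_dump_magic(text):
    """Return {label: value} for the 'non-token data' lines of a dump.

    Hypothetical helper: assumes the five-column layout shown in the
    'sa-learn --dump magic' output above, with the value in column 3.
    """
    magic = {}
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) != 5 or not fields[4].startswith("non-token data:"):
            continue  # skip anything that is not a magic line
        label = fields[4].split(":", 1)[1].strip()
        magic[label] = int(fields[2])
    return magic


SAMPLE = """\
0.000          0          2          0  non-token data: bayes db version
0.000          0       2411          0  non-token data: nspam
0.000          0       6972          0  non-token data: nham
0.000          0    1146536          0  non-token data: ntokens
"""

# Assumed minimum number of learned spam and ham messages (see lead-in).
MIN_MESSAGES = 200

magic = parse_dump_magic(SAMPLE)
trained = magic["nspam"] >= MIN_MESSAGES and magic["nham"] >= MIN_MESSAGES
print(magic["nspam"], magic["nham"], magic["ntokens"], trained)
```

With the numbers above, both counts are well over the assumed minimum, which is why the db looks trained.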


--
Chris Thielen

Easily generate SpamAssassin rules to catch obfuscated spam phrases
(0BFU$C/\TED SPA/\/\ P|-|RA$ES):
http://www.sandgnat.com/cmos/

Re: another newbie bayes problem - related to my previous question

Posted by Chris Thielen <cm...@someone.dhs.org>.
Dana Holland said:
> I finally found this on Google, relating to my problem with messages saying
> that the bayes db had not been initialized:
>
> http://bugzilla.spamassassin.org/show_bug.cgi?id=2261
>
> I don't understand how to install a patch yet (but I'm working on it).
> I need a little help understanding what I'm seeing in the spamassassin
> --lint -D output.  I'm seeing the following:
>

Dana, could you paste the output from "sa-learn --dump magic"?

Also, the bug you referenced does have a patch, but I think that patch
simply changes which error message is printed in a certain condition
(no tokens in bayes matched tokens in the email).  In other words, I don't
think applying that patch will do you much good.

--
Chris Thielen
