Posted to users@spamassassin.apache.org by Stefan Osterlitz <in...@osterlitz-medien.de> on 2007/09/06 12:39:18 UTC

spamd autolearn fails, spamassassin autolearn works??

Hello Users,
I have a problem with the autolearn function on my server. Mail
submitted to spamd gets autolearn=failed, but mail passed to spamassassin
or sa-learn is learned correctly.
 
I am running SA 3.2.3 with Bayes and AWL on the MySQL backend.
The server is running as user vscan with start options -d -m 10 -x -q
--socketpath=/var/run/spam -u vscan
 
 
vscan@h9902:~/spam> sa-learn --clear

vscan@h9902:~/spam> sa-learn --sync

vscan@h9902:~/spam> sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0 2147483647          0  non-token data: oldest atime
0.000          0          0          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

vscan@h9902:~/spam> spamc -c --headers -U /var/run/spam < 1390002.msg.msg
8.8/4.0

vscan@h9902:~/spam> sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0 2147483647          0  non-token data: oldest atime
0.000          0          0          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
 
Nothing has happened in the database using spamc/spamd..
 
vscan@h9902:~/spam> spamassassin < 1390002.msg.msg
Received: from localhost by h9902.serverkompetenz.net
        with SpamAssassin (version 3.2.3);
        Thu, 06 Sep 2007 12:12:36 +0200
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on
        h9902.serverkompetenz.net
X-Spam-Level: *********************
X-Spam-Status: Yes, score=21.3 required=4.0 tests=DRUGS_DIET=0.001,
        DRUGS_ERECTILE=0.646,DRUGS_MUSCLE=0.001,FB_CIALIS_LEO3=2.815,FB_GVR=0.001,
        HS_INDEX_PARAM=0.001,MISSING_DATE=0.001,MISSING_HB_SEP=2.5,
        MISSING_HEADERS=1.581,MISSING_MID=0.001,MISSING_SUBJECT=1.285,NO_RECEIVED=0,
        URIBL_BLACK=1.961,URIBL_JP_SURBL=2.857,URIBL_OB_SURBL=3,URIBL_SC_SURBL=2.523,
        URIBL_WS_SURBL=2.1
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_46DFD294.09BCF876"
 
This is a multi-part message in MIME format.
 
------------=_46DFD294.09BCF876
...message clipped...
------------=_46DFD294.09BCF876--
 
vscan@h9902:~/spam> sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        279          0  non-token data: ntokens
0.000          0 1189073555          0  non-token data: oldest atime
0.000          0 1189073555          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

spamassassin DID update the database..
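
A before/after comparison of the "sa-learn --dump magic" counters, like the one above, can also be scripted. What follows is only a minimal Python sketch and not part of SpamAssassin: the helper names parse_magic and changed_counters are hypothetical, and the sketch assumes the counter value is the third whitespace-separated field, as it appears in the dumps in this message.

```python
def parse_magic(dump_text):
    """Map each 'non-token data' label to its integer value."""
    marker = "non-token data:"
    counters = {}
    for line in dump_text.splitlines():
        if marker not in line:
            continue  # skip prompts, blank lines and wrapped continuations
        fields, label = line.split(marker, 1)
        columns = fields.split()
        # Assumption: the value sits in the third column, as in the dumps above.
        counters[label.strip()] = int(columns[2])
    return counters


def changed_counters(before, after):
    """Return {label: (old, new)} for every counter whose value differs."""
    return {
        label: (before.get(label), value)
        for label, value in after.items()
        if before.get(label) != value
    }


# Values copied from the dumps taken before and after running spamassassin.
before = parse_magic("""\
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
""")
after = parse_magic("""\
0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        279          0  non-token data: ntokens
""")

print(changed_counters(before, after))
```

An empty result after a spamc run would correspond to the "nothing has happened" case shown earlier.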
 
When I run spamd with -D enabled, I get
 
Sep  6 11:03:04 h9902 spamd[9924]: learn: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1
Sep  6 11:03:04 h9902 spamd[9924]: learn: auto-learn: message score: 10.05, computed score for autolearn: 6.662
Sep  6 11:03:04 h9902 spamd[9924]: learn: auto-learn? ham=-1, spam=6, body-points=6.662, head-points=6.662, learned-points=5
Sep  6 11:03:04 h9902 spamd[9924]: learn: auto-learn? yes, spam (6.662 > 6)
Sep  6 11:03:04 h9902 spamd[9924]: learn: initializing learner
Sep  6 11:03:04 h9902 spamd[9924]: learn: initializing learner
Sep  6 11:03:04 h9902 spamd[9924]: check: is spam? score=10.05 required=4
Sep  6 11:03:04 h9902 spamd[9924]: check: tests=BAYES_95,DRUGS_DIET,DRUGS_ERECTILE,DRUGS_MUSCLE,FB_CIALIS_LEO3,FB_GVR,FROM_LOCAL_NOVOWEL,HS_INDEX_PARAM,HTML_MESSAGE
Sep  6 11:03:04 h9902 spamd[9924]: check: subtests=__BAT_BOUNDARY,__CT,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__DOS_HAS_ANY_URI,__DOS_RCVD_THU,__DRUGS_DIET1,__DRUGS_DIET_PHEN,__DRUGS_ERECTILE1,__DRUGS_ERECTILE10,__DRUGS_ERECTILE3,__DRUGS_ERECTILE_C,__DRUGS_ERECTILE_V,__DRUGS_MUSCLE1,__FH_HAS_XPRIORITY,__FRAUD_DBI,__HAS_ANY_URI,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_MAILER,__LAST_UNTRUSTED_RELAY_NO_AUTH,__LOCAL_PP_NONPPURL,__MIME_HTML,__MIME_VERSION,__MISSING_REF,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__NAKED_TO,__NONEMPTY_BODY,__SANE_MSGID,__SARE_BODY_BLNK_5_100,__SARE_HAS_BG_COLOR,__SARE_HAS_FG_COLOR,__SARE_HEAD_MIME_VALID,__SARE_HTML_HAS_A,__SARE_HTML_HAS_BR,__SARE_HTML_HAS_FONT,__SARE_HTML_HAS_P,__SARE_HTML_HAS_TITLE,__SARE_META_MURTY3,__SARE_SPEC_LRD_COST4,__SARE_SPEC_PROLEO3,__SARE_URI_ANY,__SARE_WHITE_BG_COLOR,__TAG_EXISTS_BODY,__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__THEBAT_MUA,__TOCC_EXISTS
Sep  6 11:03:04 h9902 spamd[9924]: spamd: identified spam (10.1/4.0) for (unknown):65 in 1.0 seconds, 4368 bytes.
Sep  6 11:03:04 h9902 spamd[9924]: spamd: result: Y 10 - BAYES_95,DRUGS_DIET,DRUGS_ERECTILE,DRUGS_MUSCLE,FB_CIALIS_LEO3,FB_GVR,FROM_LOCAL_NOVOWEL,HS_INDEX_PARAM,HTML_MESSAGE scantime=1.0,size=4368,user=(unknown),uid=65,required_score=4.0,rhost=localhost,raddr=127.0.0.1,rport=/var/run/spam,mid=<817429475.43808305819981@acframers.com>,bayes=0.980617,autolearn=unavailable

I don't see anything special there, apart from the autolearn=unavailable.
There are no errors due to permissions: the vscan homedir and /tmp are
writable, and the database works and is writable by the user.
This is my local.cf (score changes left out) - it passes --lint!
 
trusted_networks 81.169.178.43
internal_networks 81.169.178.43
required_score 4
score ALL_TRUSTED 0
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTSSCORES(,)_
ok_locales en de
use_bayes 1
bayes_min_ham_num 50
bayes_min_spam_num 50
bayes_use_hapaxes 0
use_auto_whitelist 1
user_awl_dsn dbi:mysql:spamassassin
user_awl_sql_username spam
user_awl_sql_password hamster
auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList
#bayes_path /tmp/bayes
bayes_sql_dsn dbi:mysql:spamassassin
bayes_sql_username spam
bayes_sql_password hamster
bayes_store_module Mail::SpamAssassin::BayesStore::SQL
bayes_sql_override_username spam
bayes_auto_learn_threshold_nonspam -1
bayes_auto_learn_threshold_spam 6

I have already tried commenting out both SQL modules, running DB_File
instead, commenting out the thresholds, and checking the number of MySQL
connections. Nothing worked.
I have also upgraded Mail::SpamAssassin to the latest version and forced a
rebuild.

Do you have any ideas what I might try? How does the spamd server work
differently from a manual call of spamassassin? Is there any more
debugging I can get from the daemon?
 
Cheers,
Stefan
 



AW: spamd autolearn fails, spamassassin autolearn works?? - SOLVED

Posted by Stefan Osterlitz <in...@osterlitz-medien.de>.
Hello Matt,
thanks for your answer!
I have found the source of the problem - strange matter.

There is a "require PerMsgStatus.pm" on line 508 of SpamAssassin.pm - if
you replace that with a "use" statement, the server is fine again! I
just happened to find that by putting my own debug output statements all
over the place. Seems the code dies without notice in there..


Maybe I should file a bug report??


>> Nothing has happened in the database using spamc/spamd..
> That alone isn't a concern. Note that not every spam will be
> autolearned, and with a score of 8.8, it's probably not going to be.
>
> SA requires at least 3.0 points of header tests AND 3.0 points of body
> tests, regardless of what the total score is.
> It also recalculates the score as if bayes were disabled, so the total
> score the learner uses is likely to be lower.

You would be right, had I not cleared the database before every step;
in this case there are no learned messages at all. Additionally,
spamassassin in standalone mode does learn the message, so the mail is
"learnable".


> Also of note, that spamc call is not compatible with your local.cf
> below and could result in mail corruption. To use the --headers
> parameter to spamc you need to use "report_safe 0" in your local.cf.

This has no effect right now since I am on the command line, so it
cannot be the problem.
On my server I do not use spamc but cgpav; I should have mentioned that!


> If there were permissions issues you'd most likely get
> autolearn=failed, and if it was disabled you'd get
> "autolearn=disabled".
>
> See also:
> http://wiki.apache.org/spamassassin/AutolearningNotWorking

You are right, and I have read that page, but it was not much help.
The strange thing is that when spamd runs as a daemon I get "failed",
and when I run it with debug I get "unavailable".
All that with zero messages in the Bayes DB.

> Why did you zero out ALL_TRUSTED? Was it hitting spam? If so, fix your
> trusted_networks setting, don't zero out this one rule. You have by far
> more problems than just this one rule..

Yes, that was legacy nonsense - I have corrected it now!


> Are you sure SA is *never* autolearning?

Yes, SA never learned in the daemon!

Cheers,
Stefan




Re: spamd autolearn fails, spamassassin autolearn works??

Posted by Matt Kettler <mk...@verizon.net>.
Stefan Osterlitz wrote:
> Hello Users,
> i have a problem with the autolearning function on my server. Mails
> submitted to spamd get autolearn=failed, but mail sent to spamassassin
> or sa-learn will get learned correctly.
>  
> I am running SA 3.2.3 with Bayes and AWL on the MySQL backend.
> The server is running as user vscan with start options -d -m 10 -x -q
> --socketpath=/var/run/spam -u vscan
>  
>  
> vscan@h9902:~/spam> sa-learn --clear
> vscan@h9902:~/spam> sa-learn --sync
> vscan@h9902:~/spam> sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0          0          0  non-token data: nspam
> 0.000          0          0          0  non-token data: nham
> 0.000          0          0          0  non-token data: ntokens
>
> vscan@h9902:~/spam> spamc -c --headers -U /var/run/spam < 1390002.msg.msg
> 8.8/4.0
> vscan@h9902:~/spam> sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0          0          0  non-token data: nspam
> 0.000          0          0          0  non-token data: nham
> Nothing has happened in the database using spamc/spamd..
That alone isn't a concern. Note that not every spam will be
autolearned, and with a score of 8.8, it's probably not going to be.

SA requires at least 3.0 points of header tests AND 3.0 points of body
tests, regardless of what the total score is.
It also recalculates the score as if bayes were disabled, so the total
score the learner uses is likely to be lower.
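
The rule described above can be written down as a small decision function. This is a hedged Python sketch, not SpamAssassin's actual code: the function name autolearn_decision is hypothetical, the default thresholds are the ones from the local.cf quoted in this thread (-1 and 6), the 3.0-point minimum is the figure stated above, and the exact handling of scores that sit right on a threshold is an assumption.

```python
def autolearn_decision(score_without_bayes, head_points, body_points,
                       threshold_nonspam=-1.0, threshold_spam=6.0,
                       min_points=3.0):
    """Return 'spam', 'ham' or 'no' for one message.

    score_without_bayes is the score recomputed as if Bayes were disabled,
    which is why it can be lower than the score reported for the message.
    """
    if score_without_bayes < threshold_nonspam:
        return "ham"
    if score_without_bayes > threshold_spam:
        # A high total is not enough: header and body tests must each
        # contribute at least min_points.
        if head_points >= min_points and body_points >= min_points:
            return "spam"
    return "no"


# Values taken from the spamd debug log quoted in this thread:
# message score 10.05, recomputed score 6.662, body/head points 6.662.
print(autolearn_decision(6.662, head_points=6.662, body_points=6.662))

# A made-up message scoring 8.8 almost entirely from body tests.
print(autolearn_decision(8.8, head_points=1.0, body_points=7.8))
```

With the numbers from the debug log the sketch agrees with spamd's own "auto-learn? yes, spam (6.662 > 6)" line, while the second, made-up message is rejected despite its higher score.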


Also of note, that spamc call is not compatible with your local.cf below
and could result in mail corruption. To use the --headers parameter to
spamc you need to use "report_safe 0" in your local.cf.

from man spamc under --headers:

           Note that this only makes sense if you are using "report_safe 0"
           in the scanning configuration on the remote end; with
           "report_safe 1", it is likely to result in corrupt messages.

>  
> Sep  6 11:03:04 h9902 spamd[9924]: spamd: result: Y 10 -
> BAYES_95,DRUGS_DIET,DRUGS_ERECTILE,DRUGS_MUSCLE,FB_CIALIS_LEO3,FB_GVR,FROM_LOCAL_NOVOWEL,HS_INDEX_PARAM,HTML_MESSAGE
> scantime=1.0,size=4368,user=(unknown),uid=65,required_score=4.0,rhost=localhost,raddr=127.0.0.1,rport=/var/run/spam,mid=<81...@acframers.com>,bayes=0.980617,autolearn=unavailable
> I do see nothing special there, apart from the autolearn=unavailable.
> There are no errors due to permissions, vscan homedir amd /tmp are
> writable, the database works and is writable by the user.
And the autolearn=unavailable isn't concerning either.. It usually means
the message has already been learned.

If there were permissions issues you'd most likely get autolearn=failed,
and if it was disabled you'd get "autolearn=disabled".

See also:
http://wiki.apache.org/spamassassin/AutolearningNotWorking

> This is my local.cf (score changes left out) - it does --lint!
>  
> trusted_networks 81.169.178.43
> internal_networks 81.169.178.43
> required_score 4
> score ALL_TRUSTED 0
Why did you zero out ALL_TRUSTED? Was it hitting spam? If so, fix your
trusted_networks setting, don't zero out this one rule. You have by far
more problems than just this one rule..

See also:

http://wiki.apache.org/spamassassin/TrustPath

> Do you have any ideas what I might try? How does the spamd server work
> differently from a manual call of spamassassin? Is there any more
> debugging I can get from the daemon?

Are you sure SA is *never* autolearning?

i.e., assuming your spamd logs to maillog:
grep "autolearn=yes" /var/log/maillog
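
Since the result lines can carry several different autolearn= values (this thread alone mentions failed, unavailable and disabled), it may be more informative to tally all of them than to grep for a single one. Below is a minimal Python sketch under that assumption; the helper name tally_autolearn is hypothetical, and the sample lines are shortened copies of the spamd log lines quoted earlier in this thread.

```python
import re
from collections import Counter

# Matches the autolearn=<value> field at the end of a spamd result line.
AUTOLEARN_RE = re.compile(r"\bautolearn=([A-Za-z_]+)")


def tally_autolearn(lines):
    """Count how often each autolearn= value occurs in the given log lines."""
    counts = Counter()
    for line in lines:
        match = AUTOLEARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "spamd: identified spam (10.1/4.0) for (unknown):65 in 1.0 seconds, 4368 bytes.",
    "spamd: result: Y 10 - BAYES_95,HTML_MESSAGE scantime=1.0,size=4368,"
    "bayes=0.980617,autolearn=unavailable",
]
print(tally_autolearn(sample))

# Against a real log (path as assumed in the grep above):
# with open("/var/log/maillog") as log:
#     print(tally_autolearn(log))
```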

>  
> Cheers,
> Stefan
>