Posted to users@spamassassin.apache.org by Scott Techlist <te...@msws.org> on 2017/08/08 18:06:26 UTC

Bayes auto-learn - not happening

Centos7
Postfix 3.2.2
Amavisd-new 2.11.0
Spamassassin 3.4.0
Site-wide configuration

This is a new box and I've configured some conservative values for auto-learn.  I've enabled it properly AFAIK, but I can't see any sign of it working.  

I have these set in local.cf
use_bayes               1
bayes_auto_learn        1
bayes_auto_learn_threshold_nonspam -1.7
bayes_auto_learn_threshold_spam 10.0
# this is a filename prefix, not a directory per se
bayes_path              /etc/mail/bayes/bayes
bayes_file_mode         0666
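
A note for readers on the "filename prefix" comment: bayes_path is treated as a prefix that gets suffixes appended, not as a directory, which matches the bayes_toks and bayes_seen files that appear in the debug output later in this thread. Below is a minimal Python sketch of that naming. The helper name bayes_db_files is made up for illustration, and other files (such as a journal) may also appear next to these two.

```python
import os


def bayes_db_files(bayes_path):
    """Return the two database files seen in this thread for a bayes_path prefix.

    Hypothetical helper: it only mirrors the <prefix>_toks / <prefix>_seen naming.
    """
    return [bayes_path + "_toks", bayes_path + "_seen"]


prefix = "/etc/mail/bayes/bayes"
# The directory part is what the scanning process must be able to write to.
print(os.path.dirname(prefix))
for path in bayes_db_files(prefix):
    print(path)
```

The point of the sketch is only that the last path component ("bayes") ends up inside the file names, so permissions matter on the parent directory.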

-------------bayes prep ----------------
Start fresh for troubleshooting:
su amavis -c 'sa-learn --clear'

Add one spam manually and check tokens:

[root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0       2157          0  non-token data: ntokens
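
For anyone who wants to track these counters over time rather than eyeball them, the listing above can be parsed with a few lines of Python. This is a sketch that assumes the column layout shown in this thread (value in the third column, label after "non-token data:"); the function name parse_dump_magic is hypothetical.

```python
def parse_dump_magic(output):
    """Extract the 'non-token data' counters from `sa-learn --dump magic` output."""
    counters = {}
    for line in output.splitlines():
        # Each line of interest looks like:
        # 0.000          0          1          0  non-token data: nspam
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        columns = fields.split()
        # Assumption from the listings in this thread: the value is column three.
        counters[label.strip()] = int(columns[2])
    return counters


sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0       2157          0  non-token data: ntokens
"""
counts = parse_dump_magic(sample)
print(counts["nspam"], counts["nham"], counts["ntokens"])
```

Comparing nspam and nham before and after a test message is a quick way to see whether anything was learned.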

---------amavisd prep----------------

Restart amavisd/spamassassin just to be sure all configs are read.

------- ready to process -------------

The next high-scoring spam arrived and was delivered to my spam mailbox.  It did NOT autolearn.  Nor did several others.

To troubleshoot, I took one that did not autolearn and learned it manually with:
su amavis -c 'sa-learn -D --spam --showdots --mbox /home/mail/onespam'

Even though this message was slightly over the size limit (262203 bytes against a limit of 262144), the log says it learned anyway:
-D log snippet:
---------------------
Aug  8 12:37:27.216 [13198] info: archive-iterator: skipping large message: 858 lines, 262203 bytes, limit 262144 bytes

Learned tokens from 1 message(s) (1 message(s) examined)
---------------------

Verified it learned:

[root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0          2          0  non-token data: nspam


Partial header from that message:

X-Spam-Flag: YES
X-Spam-Score: 17.374
X-Spam-Level: *****************
X-Spam-Status: Yes, score=17.374 tag=-9999 tag2=5 kill=6.31
        tests=[RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_DNSWL_NONE=-0.0001,
        RCVD_IN_RP_RNBL=1.284, RCVD_IN_SBL_CSS=3.558, RCVD_IN_SORBS_WEB=1.5,
        RP_MATCHES_RCVD=-0.001, SUSPICIOUS_RECIPS=2.497,
        URIBL_ABUSE_SURBL=1.948, URIBL_BLACK=1.7, URIBL_DBL_SPAM=2.5,
        URIBL_SBL=0.644, URIBL_SBL_A=0.1] autolearn=no autolearn_force=no
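
When comparing many messages it helps to pull the autolearn result and the per-rule scores out of the header mechanically. The following Python sketch parses the amavisd-style X-Spam-Status value shown above. The function name parse_spam_status and the regular expressions are illustrative assumptions, tuned only to the header format seen in this thread.

```python
import re


def parse_spam_status(value):
    """Split an X-Spam-Status value into score, autolearn result and rule hits."""
    text = " ".join(value.split())  # unfold the continuation lines
    score = re.search(r"score=(-?\d+(?:\.\d+)?)", text)
    tests = re.search(r"tests=\[(.*?)\]", text)
    # "autolearn=" does not match "autolearn_force=", so this finds the result.
    autolearn = re.search(r"autolearn=(\w+)", text)
    hits = {}
    if tests:
        for item in tests.group(1).split(","):
            name, sep, points = item.strip().partition("=")
            if sep:
                hits[name] = float(points)
    return {
        "score": float(score.group(1)) if score else None,
        "autolearn": autolearn.group(1) if autolearn else None,
        "tests": hits,
    }


status = parse_spam_status("""Yes, score=17.374 tag=-9999 tag2=5 kill=6.31
        tests=[RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_DNSWL_NONE=-0.0001,
        RCVD_IN_RP_RNBL=1.284, RCVD_IN_SBL_CSS=3.558, RCVD_IN_SORBS_WEB=1.5,
        RP_MATCHES_RCVD=-0.001, SUSPICIOUS_RECIPS=2.497,
        URIBL_ABUSE_SURBL=1.948, URIBL_BLACK=1.7, URIBL_DBL_SPAM=2.5,
        URIBL_SBL=0.644, URIBL_SBL_A=0.1] autolearn=no autolearn_force=no""")
print(status["autolearn"], status["score"], len(status["tests"]))
```

The per-rule scores in this header add up to the reported total, so the parsed dictionary is a reasonable starting point for checking which rules contributed.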

Why aren't my spams getting auto-learned?  If sa-learn "ate" it, shouldn't auto-learn too?

I know there is a default threshold of 200 learned messages (each of spam and ham) before Bayes starts tagging anything, but I understand it should still learn before then without issue.

Can't figure out what's wrong...

Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
OK, so I don't think auto-learn works on spam.  What about HAM?

I've raised the floor to auto-learn HAM to 1.  Before anyone gives me any
grief, it's just for testing.  I'll rebuild the bayes db from a corpus when
I get it working.

So SPAM takes the 3-way match: 3 from the header, 3 from the body.  But what
about HAM?  Since the default ham threshold for autolearn is much lower than 6,
I presume this same limitation does not apply.

So hams should be free to auto-learn with any score that is below my
threshold (1).

Like this one, right? (assuming it is not already learned):

Aug 10 14:21:46 tn2 amavis[3231]: (03231-06) Passed CLEAN {RelayedInbound},
[168.100.1.7]:43757 [173.167.109.218] ESMTP/LMTP
<ow...@postfix.org> -> <te...@myvirt.com>,
(ESMTPS://[168.100.1.7]:43757 < ESMTPS://173.203.187.85 < 173.167.109.218),
Queue-ID: 5EE363BF5, Message-ID:
<01...@mefox.org>, mail_id: JkAvl418yTui, b:
ZbU4iXvCD, Hits: -5.799, size: 5533, queued_as: BD81F3EE7, Subject: "RE:
reloading postfix with systemd", From: <ne...@mefox.org>, X-Mailer:
Microsoft_Outlook_16.0, helo=english-breakfast.cloud9.net, Tests:
[AM.WBL=-3,BAYES_05=-0.5,HEADER_FROM_DIFFERENT_DOMAINS=0.001,RCVD_IN_DNSWL_MED=-2.3],
autolearn=unavailable autolearn_force=no, autolearnscore=-2.299, 4376 ms

Now this sender and a similar message would have been in my corpus so I
don't expect IT to learn, but I'd expect others to.  
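
The log line above carries two different totals: Hits: -5.799 and autolearnscore=-2.299. The difference is exactly the AM.WBL and BAYES_05 entries. A small Python sketch of that arithmetic follows. The exclusion by name prefix is an inference from these numbers only; the real plugin decides which rules to ignore from their tflags, and the function name autolearn_score is hypothetical.

```python
def autolearn_score(tests):
    """Sum rule scores, skipping the ones assumed not to count for auto-learning."""
    counted = {
        name: points
        for name, points in tests.items()
        # Assumption: BAYES_* rules and amavisd's own AM.* adjustments are left
        # out. This reproduces the log above but is not the plugin's actual test.
        if not name.startswith(("BAYES_", "AM."))
    }
    return round(sum(counted.values()), 3)


tests = {
    "AM.WBL": -3.0,
    "BAYES_05": -0.5,
    "HEADER_FROM_DIFFERENT_DOMAINS": 0.001,
    "RCVD_IN_DNSWL_MED": -2.3,
}
print(round(sum(tests.values()), 3))  # the "Hits" total
print(autolearn_score(tests))         # the "autolearnscore" total
```

With the ham threshold raised to 1, a score of -2.299 is below it, so the score alone would not explain autolearn=unavailable for this message.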





--
View this message in context: http://spamassassin.1065346.n5.nabble.com/Bayes-auto-learn-not-happening-tp138065p138260.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

Re: Bayes auto-learn - not happening

Posted by RW <rw...@googlemail.com>.
On Tue, 8 Aug 2017 13:04:16 -0700 (MST)
Scott wrote:

> The "3 points" criteria does not apply to manually learning 

No, it doesn't. The 3-point rule is just a sanity check on autolearning to
reduce mistraining. If you can, don't use autotraining at all.

RE: Bayes auto-learn - not happening

Posted by Scott Techlist <te...@msws.org>.
>you need to train your bayes *by hand* to start with - how do you expect
>bayes classification with no hints after purging the database - train 200
>ham and spam mails and *after that* look further

Reindl:

Thanks.  I want to use some auto-training with very conservative thresholds set.  All of the messages I've checked would have classified correctly via autolearn comfortably in those ranges.

The 200 threshold is for USING Bayes, not a requirement for auto-learning.  Or that was my clear understanding from many posts.  I saw several old threads where others suggested something similar but were corrected.  Maybe they changed it, dunno.

My concern is that auto-learn is not functioning properly.  I use Amavisd, which calls SpamAssassin and has its own issues.  I'm trying to make sure my system is operating properly.  To me it appears it is not.

No hint should be necessary for it to learn a spam.  Only to use bayes to score anything.  I get that.  No?





Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
Cleared the database, ran below on the same message:

su amavis -c 'spamassassin -D 2>&1 -t onespam' | less

I didn't see any errors obvious to me.  

It recreated the databases and added this message as expected.


I don't know how to tell why it would not have auto-learned.  

Can you tell/ teach me from this?


Content analysis details:   (17.7 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.9 URIBL_ABUSE_SURBL      Contains an URL listed in the ABUSE SURBL
                            blocklist
                            [URIs: 145.239.41.28]
 0.0 SUBJ_DOLLARS           Subject starts with dollar amount
 3.0 SPF_HELO_SOFTFAIL      SPF: HELO does not match SPF record (softfail)
 1.1 DATE_IN_PAST_03_06     Date: is 3 to 6 hours before Received: date
 0.0 NORMAL_HTTP_TO_IP      URI: URI host has a public dotted-decimal IPv4
                            address
 0.0 HTML_EXTRA_CLOSE       BODY: HTML contains far too many close tags
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 3.2 DCC_CHECK              Detected as bulk mail by DCC (dcc-servers.net)
 2.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
 2.4 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
                            above 50%
                            [cf: 100]
 0.4 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                            [cf: 100]
 0.0 DIGEST_MULTIPLE        Message hits more than one network digest check
 0.6 HTML_MIME_NO_HTML_TAG  HTML-only message, but there is no HTML tag
 0.1 MISSING_MID            Missing Message-Id: header
 1.3 RDNS_NONE              Delivered to internal network by a host with no
                            rDNS

Aug  8 15:47:11.098 [17077] dbg: check: tagrun - tag DKIMDOMAIN is still blocking action 0
Aug  8 15:47:11.105 [17077] dbg: plugin: Mail::SpamAssassin::Plugin::MIMEHeader=HASH(0x2ccc328) implements 'finish_tests', priority 0
Aug  8 15:47:11.105 [17077] dbg: plugin: Mail::SpamAssassin::Plugin::Check=HASH(0x2e04e38) implements 'finish_tests', priority 0
Aug  8 15:47:11.116 [17077] dbg: netset: cache trusted_networks hits/attempts: 15/17, 88.2 %

Re: Bayes auto-learn - not happening

Posted by Benny Pedersen <me...@junc.eu>.
Scott wrote on 2017-08-08 22:19:

> Does this one have the requisite 3-point match?  I don't understand how
> to tell yet.

spamassassin -D 2>&1 -t mail.msg | less

should show why

Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
Apologies, I meant sa-learn.  Brain fart.

Thanks for the clarification on the 3-point rule.

I've had a bunch of them come through.  They all get autolearn=no, or I get a
few that say "unavailable" like the sample below.  From trying to figure it
out myself, I gather that "unavailable" may mean the message was already
learned, or something else, whatever that may be, per the wiki.  But if the
database is empty, "already learned" cannot be the reason for "unavailable"
in this case anyway.

X-Spam-Status: Yes, score=20.678 tag=-9999 tag2=5 kill=6.4
        tests=[DATE_IN_PAST_03_06=1.076, DCC_CHECK=3.2, DIGEST_MULTIPLE=0.001,
        HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001,
        HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14,
        NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365,
        RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5, RDNS_NONE=1.274,
        SPF_HELO_SOFTFAIL=3, SPF_SOFTFAIL=3, SUBJ_DOLLARS=0.001,
        URIBL_ABUSE_SURBL=1.948] autolearn=unavailable autolearn_force=no

Does this one have the requisite 3-point match?  I don't understand how to
tell yet. 

I've cleared the db again.  Will let it run to see if it learns *anything*. 
So far I have not seen that happen.  Surely something will get a 3 way
match.

Re: Bayes auto-learn - not happening

Posted by Benny Pedersen <me...@junc.eu>.
Scott wrote on 2017-08-08 22:04:
> The "3 points" criteria does not apply to manually learning via
> sa-update then?

Typo? sa-update does not learn, it just updates rules. Did you mean
sa-learn?

When sa-learn is used, it's not autolearn, so the limits are not applied.

Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
The "3 points" criteria does not apply to manually learning via sa-update
then?

Re: Bayes auto-learn - not happening

Posted by RW <rw...@googlemail.com>.
On Tue, 8 Aug 2017 13:06:26 -0500
Scott Techlist wrote:

> Centos7
> Postfix 3.2.2
> Amavisd-new 2.11.0
> Spamassassin 3.4.0
> Site-wide configuration
> 
> This is a new box and I've configured some conservative values for
> auto-learn.  I've enabled it properly AFAIK, but I can't see any sign
> of it working.  
> 
> I have these set in local.cf
> use_bayes               1
> bayes_auto_learn        1
> bayes_auto_learn_threshold_nonspam -1.7
> bayes_auto_learn_threshold_spam 10.0
> # this is a filename prefix, not a directory per se
> bayes_path              /etc/mail/bayes/bayes
> bayes_file_mode         0666
> 
> -------------bayes prep ----------------
> Start fresh for troubleshooting:
> su amavis -c 'sa-learn --clear'
> 
> Add one spam manually and check tokens:
> 
> [root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0          1          0  non-token data: nspam
> 0.000          0          0          0  non-token data: nham
> 0.000          0       2157          0  non-token data: ntokens
> 
> ---------amavisd prep----------------
> 
> Restart amavisd/spamassassin just to be sure all configs read..
> 
> ------- ready to process -------------
> 
> The next high scoring spam arrives, it was sent to my spam mailbox.
> It did NOT autolearn.  Nor did several others.  
> 
> To troubleshoot, I took one that did not autolearn, and learned it
> manually by: su amavis -c 'sa-learn -D --spam --showdots
> --mbox /home/mail/onespam
> 
> even though this message was slightly over the threshold, the log
> says it learned anyway: -D log snippet:
> ---------------------
> Aug  8 12:37:27.216 [13198] info: archive-iterator: skipping large
> message: 858 lines, 262203 bytes, limit 262144 bytes
> 
> Learned tokens from 1 message(s) (1 message(s) examined)
> ---------------------
> 
> Verified it learned:
> 
> [root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0          2          0  non-token data: nspam
> 
> 
> Partial header from that message:
> 
> X-Spam-Flag: YES
> X-Spam-Score: 17.374
> X-Spam-Level: *****************
> X-Spam-Status: Yes, score=17.374 tag=-9999 tag2=5 kill=6.31
>         tests=[RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_DNSWL_NONE=-0.0001,
>         RCVD_IN_RP_RNBL=1.284, RCVD_IN_SBL_CSS=3.558,
> RCVD_IN_SORBS_WEB=1.5, RP_MATCHES_RCVD=-0.001,
> SUSPICIOUS_RECIPS=2.497, URIBL_ABUSE_SURBL=1.948, URIBL_BLACK=1.7,
> URIBL_DBL_SPAM=2.5, URIBL_SBL=0.644, URIBL_SBL_A=0.1] autolearn=no
> autolearn_force=no
> 
> Why aren't my spams getting auto-learned?  If sa-learn "ate" it,
> shouldn't auto-learn too?

To autolearn spam you need 3 points from the body and 3 from headers.
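
The rule above can be written down as a small decision function. This is a simplified Python sketch, not the AutoLearnThreshold plugin's code. The function name, the exact comparisons at the thresholds, and the idea that header and body points arrive as ready-made numbers are all assumptions for illustration; the real plugin also ignores certain rules when it adds up the points. The default thresholds are the ones from the local.cf quoted above.

```python
def autolearn_decision(score, header_points, body_points,
                       ham_threshold=-1.7, spam_threshold=10.0,
                       min_points=3.0):
    """Return 'ham', 'spam' or 'no' for a message (simplified model)."""
    if score < ham_threshold:
        # Low enough to be learned as ham; no header/body split is required.
        return "ham"
    if score >= spam_threshold:
        # A high total is not enough: both parts must contribute 3 points.
        if header_points >= min_points and body_points >= min_points:
            return "spam"
        return "no"
    # Anything between the two thresholds is left alone.
    return "no"


# Hypothetical point splits, chosen only to exercise each branch.
print(autolearn_decision(17.4, header_points=15.4, body_points=2.0))
print(autolearn_decision(17.4, header_points=9.0, body_points=8.4))
print(autolearn_decision(-2.3, header_points=0.0, body_points=0.0))
```

In this model a message can score well above the spam threshold and still come back as "no" if nearly all of its points come from one side, which is one way to end up with autolearn=no on a high-scoring spam.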

Re: Bayes auto-learn - not happening, tentative success....

Posted by Scott <te...@msxc.com>.
Yeah, I don't know who the culprit is. sa-learn always worked; autolearn did
not. So far this morning it's looking good: an expected spread of autolearn=no,
spam, and ham, and not a single unavailable. I will check this afternoon and
expect to call this done. Summary for other googlers to follow.

Re: Bayes auto-learn - not happening, tentative success....

Posted by RW <rw...@googlemail.com>.
On Thu, 10 Aug 2017 20:15:48 -0700 (MST)
Scott wrote:

>   For reasons beyond my skill set,
> SA will not auto-learn to a bayes db in a folder in /etc/mail/bayes.
> Regardless of wide open permissions on everything except /etc.  And
> the user's confirmed ability to write to the folder.

But sa-learn was working. It seems more likely that the difference is
between the ordinary SA scripts and amavis rather than between auto and
manual training. Amavis uses SA libraries from its own Perl code and
it's free to behave very differently.

Re: Bayes auto-learn - not happening, tentative success....

Posted by Scott <te...@msxc.com>.
Tom:
re selinux:
Yes, once I discovered the fix, I considered that could have been the cause.
FWIW I'm not using it and it's disabled, so it *shouldn't* hose anything. 
But I would not be surprised if it were the culprit.

Re: Bayes auto-learn - not happening, tentative success....

Posted by Tom Hendrikx <to...@whyscream.net>.
On 11-08-17 17:05, Scott wrote:
> I'm going to go back and look at my build notes but I think that directory
> got created for me. It's just as possible I followed some "guide".  I am
> positive I did not think it up on my own LOL.   I remember more than one set
> of instructions with that path setting, and it very well could be the
> related Centos7 package.  Glad I found the cause though, regardless of the
> source.
> 
> In the FWIW department, as shown above, I still don't have it in the default
> location (I know, risks...), but why it is happy there and not under /etc I
> don't know.  And really don't care at this point.   
> 

I had to go way back in thread to look it up, but I noticed you're
running Centos, which has selinux.

Maybe your custom path is disallowed under the amavis/spamd/whatever
role? And manual testing when su'ing from the root role will not have
the same impact as running amavis using an init system.

Kind regards,
	Tom


Re: Bayes auto-learn - not happening, tentative success....

Posted by Scott <te...@msxc.com>.
I'm going to go back and look at my build notes but I think that directory
got created for me. It's just as possible I followed some "guide".  I am
positive I did not think it up on my own LOL.   I remember more than one set
of instructions with that path setting, and it very well could be the
related Centos7 package.  Glad I found the cause though, regardless of the
source.

In the FWIW department, as shown above, I still don't have it in the default
location (I know, risks...), but why it is happy there and not under /etc I
don't know.  And really don't care at this point.   

Re: Bayes auto-learn - not happening, tentative success....

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>On 10.08.17 20:15, Scott wrote:
>>About the only difference in my old, functioning box and this new "clean"
>>install was the location of the bayes files.
>>
>>Old box:
>>/var/spool/amavisd/.spamassassin/
>>New box:
>>/etc/mail/bayes

On 11.08.17 16:22, Matus UHLAR - fantomas wrote:
>Do did you change bayes path in first place?

I mean why, of course

[deleted]

>don't set the path, that way it should work OOTB.
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Spam is for losers who can't get business any other way.

Re: Bayes auto-learn - not happening, tentative success....

Posted by RW <rw...@googlemail.com>.
On Fri, 11 Aug 2017 16:22:50 +0200
Matus UHLAR - fantomas wrote:


> don't set the path, that way it should work OOTB.

Maybe amavis is different and has its own internal default location, but
the equivalent for spamd relies on the packager giving the spamd user a
unix home directory.

I once saw a Bayes howto that recommended:

 mkdir  /nonexistent

Re: Bayes auto-learn - not happening, tentative success....

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 10.08.17 20:15, Scott wrote:
>About the only difference in my old, functioning box and this new "clean"
>install was the location of the bayes files.
>
>Old box:
>/var/spool/amavisd/.spamassassin/
>New box:
>/etc/mail/bayes

Do did you change bayes path in first place?

amavis is the only one who processes the database, there's no need to change
it and play with permissions (which might be the reason why it does not
work).

you can still train as root:

sa-learn --dbpath /var/spool/amavis/.spamassassin/  ...

>Finally to the path setting:
>
>I tried setting the default path, and changing the path filename suffix of
>bayes to mybayes for curiosity...
>bayes_path /var/spool/amavisd/.spamassassin/mybayes
>
>Upon sending a test message, SA promptly auto-learned as ham and created the
>2 new files starting with "mybayes" instead of bayes.
>
>So changing the filename didn't hurt autolearn

don't set the path, that way it should work OOTB.


-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The only substitute for good manners is fast reflexes. 

Re: Bayes auto-learn - not happening, tentative success....

Posted by Scott <te...@msxc.com>.
Well, here's a development...

About the only difference in my old, functioning box and this new "clean"
install was the location of the bayes files.  

Old box:
/var/spool/amavisd/.spamassassin/
New box:
/etc/mail/bayes

The other details that caught my attention were that on the old box, the
ONLY bayes thing that was explicitly set was the path, which was:
bayes_path /home/amavis/.spamassassin/bayes

On this (new) box, I commented out most of the bayes settings similarly.
Restarted amavis/SA and got some errors about no bayes db.  Ignored.

Sent an email anyway.  Guess what?  autolearn=ham right out of the gate, and
it created the database files.

And the very next message received was also shown as autolearn=ham.  

** that eliminates any doubt that a minimum corpus is necessary to autolearn
(whether that's a good practice is a different topic).

I cleared the db's with sa-learn --clear.  I received a new message,
autolearn=ham, again.  Result:

[root@mail2 .spamassassin]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          1          0  non-token data: nham


[root@mail2 root]# ll /var/spool/amavisd/.spamassassin
total 28
-rw-rw-rw- 1 amavis amavis 12288 Aug 10 21:20 bayes_seen
-rw-rw-rw- 1 amavis amavis 12288 Aug 10 21:20 bayes_toks
-rw-r--r-- 1 amavis amavis  1869 Aug  8 15:32 user_prefs

I gradually restored all the bayes settings in local.cf, restarting amavis
each time, clearing the db and sending that same test message.  
Each time the result was autolearn=ham

Finally to the path setting:

I tried setting the default path, and changing the path filename suffix of
bayes to mybayes for curiosity...
bayes_path /var/spool/amavisd/.spamassassin/mybayes

Upon sending a test message, SA promptly auto-learned as ham and created the
2 new files starting with "mybayes" instead of bayes.

So changing the filename didn't hurt autolearn

Next I kept the new name and changed the folder back to where it was:  
/etc/mail/bayes/mybayes
Send a test message
This time, NO new file was created.  It apparently cannot write to it.
Permissions on mail, and on bayes are both amavis:amavis & 777.

**** autolearn=unavailable **** (As Matus expected) 

Next, I logged on as amavis and cd'd to /etc/mail/bayes.  I created a file,
edited it, and then deleted it.  User amavis CAN write there.  To verify I was
really amavis, I tried to cd to some other user's folder and got "permission
denied".  Check.

Finally, because I don't like the hidden directory anyway, I try to move the
bayes folder from the default.  I configure the bayes path to: 
bayes_path /var/spool/amavisd/bayes/bayes

Send my test message, voila, db files created, and autolearn=ham  Success!
(tentative, cautiously optimistic)

I hope I have solved the mystery.  For reasons beyond my skill set, SA will
not auto-learn to a bayes db in a folder in /etc/mail/bayes.  Regardless of
wide open permissions on everything except /etc.  And the user's confirmed
ability to write to the folder.
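
The manual write test described above (create a file as amavis, then delete it) can be scripted so it is easy to repeat for each candidate directory. The Python sketch below checks only what the current user can do from the current process. The function name can_write_in is hypothetical, and, as other replies in this thread point out, a daemon started by the init system may run under different constraints than an interactive shell, so a passing result here does not prove the scanner can write there.

```python
import os
import tempfile


def can_write_in(directory):
    """Return True if this process can create and delete a file in directory."""
    try:
        # NamedTemporaryFile creates the file and removes it again on close.
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"test")
        return True
    except OSError:
        # Covers a missing directory as well as permission errors.
        return False


# bayes_path is a filename prefix, so test the directory that contains it.
bayes_dir = os.path.dirname("/etc/mail/bayes/mybayes")
print(bayes_dir, can_write_in(bayes_dir))
```

Run it as the same user the scanner runs as, for example through su as in the commands earlier in the thread, to get a comparable result.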

Bug I guess?  

SHIT! What a PITA to figure out.

I'm gonna let this cook overnight and see how it does.  Will report back.

Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
Trying to check for any locking issues, I ran sa-learn in debug mode:
su amavis -c 'sa-learn -D --spam --showdots --mbox /home/mail/onespam'

Appears to be creating and dropping lock files.  Nothing left over after
running..

Aug 10 16:48:39.109 [7524] dbg: bayes: expiry starting
Aug 10 16:48:39.110 [7524] dbg: locker: mode is 438
Aug 10 16:48:39.110 [7524] dbg: locker: safe_lock: created
/etc/mail/bayes/bayes.lock.mail2.myserver.com.7524
Aug 10 16:48:39.110 [7524] dbg: locker: safe_lock: trying to get lock on
/etc/mail/bayes/bayes with 0 retries
Aug 10 16:48:39.110 [7524] dbg: locker: safe_lock: link to
/etc/mail/bayes/bayes.lock: link ok
Aug 10 16:48:39.110 [7524] dbg: bayes: tie-ing to DB file R/W
/etc/mail/bayes/bayes_toks
Aug 10 16:48:39.110 [7524] dbg: bayes: tie-ing to DB file R/W
/etc/mail/bayes/bayes_seen
Aug 10 16:48:39.111 [7524] dbg: bayes: found bayes db version 3
Aug 10 16:48:39.111 [7524] dbg: locker: refresh_lock: refresh
/etc/mail/bayes/bayes.lock
Aug 10 16:48:39.111 [7524] dbg: bayes: expiry completed
Aug 10 16:48:39.111 [7524] dbg: archive-iterator:
_set_default_message_selection_opts After: Scanprob[1], want_date[0],
cache[0], from_regex[^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d
\d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)]
Aug 10 16:48:39.115 [7524] dbg: archive-iterator: _run_mailbox
/home/mail/onespam, ofs 0, limit 262144
Aug 10 16:48:39.118 [7524] info: archive-iterator: skipping large message:
1277 lines, 262241 bytes, limit 262144 bytes

Learned tokens from 0 message(s) (0 message(s) examined)
Aug 10 16:48:39.118 [7524] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x29bb798) implements
'learner_close', priority 0
Aug 10 16:48:39.118 [7524] dbg: bayes: untie-ing
Aug 10 16:48:39.119 [7524] dbg: bayes: files locked, now unlocking lock
Aug 10 16:48:39.119 [7524] dbg: locker: safe_unlock: unlink
/etc/mail/bayes/bayes.lock
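
One detail in the output above is easy to miss: the message was skipped for size, and the run reports 0 messages examined, so this run exercised the locking but not the learning. A short Python sketch for pulling the numbers out of that "skipping large message" line follows. The function name parse_skip and the regular expression are assumptions based only on the log format shown here.

```python
import re

SKIP_RE = re.compile(
    r"skipping large message:\s*(\d+) lines,\s*(\d+) bytes,\s*limit (\d+) bytes"
)


def parse_skip(log_text):
    """Return the line count, size and limit from a 'skipping large message' entry."""
    text = " ".join(log_text.split())  # re-join entries wrapped by the mailer
    match = SKIP_RE.search(text)
    if match is None:
        return None
    lines, size, limit = (int(group) for group in match.groups())
    return {"lines": lines, "bytes": size, "limit": limit, "over_by": size - limit}


sample = """Aug 10 16:48:39.118 [7524] info: archive-iterator: skipping large message:
1277 lines, 262241 bytes, limit 262144 bytes"""
print(parse_skip(sample))
```

A later message in this thread passes --max-size=6000000 to sa-learn to get a large test message past this limit.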

Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
Scouring the differences between this and my old server I see this:

Old server:
-rw-------   1 amavis amavis     83472 Aug 10 15:51 bayes_journal
-rw-------   1 amavis amavis      1986 Aug 10 15:51 bayes.mutex
-rw-------   1 amavis amavis 328491008 Aug 10 15:51 bayes_seen
-rw-------   1 amavis amavis   5443584 Aug 10 15:51 bayes_toks

I gathered the journal may very well not always be there and maybe that's
OK.  

But from what I could tell googling the bayes.mutex file is a lock file:
(for others:
http://lists.mailscanner.info/pipermail/mailscanner/2004-November/043067.html)

Is missing IT a problem?  Is this a hint? (fingers crossed)

Re: Bayes auto-learn - not happening

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 08.08.17 14:38, Scott wrote:
>Brand new spam arrives.  It gets
>autolearn=unavailable.
[...]
>su amavis -c 'sa-learn -D --spam --showdots  --max-size=6000000 --mbox
>/home/mail/twospam'
>
>Aug  8 16:35:23.567 [18045] dbg: bayes: learned
>'419769464db0fabb0f1220f9ae0cf12931ad7076@sa_generated', atime: 1502226537
>Learned tokens from 1 message(s) (1 message(s) examined)
>
>And it learned it.  So autolearn=unavailable was NOT due to the token already
>there.

autolearn=unavailable is apparently due to an inaccessible Bayes database.

Try running "ls -la ~amavis/.spamassassin/". Apparently permissions make
the directory, or the files in it, unwritable for the amavis user.
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Atheism is a non-prophet organization. 

Re: Bayes auto-learn - not happening

Posted by John Hardin <jh...@impsec.org>.
On Tue, 8 Aug 2017, Ian Zimmerman wrote:

> I stopped
> autolearning and hacked up some scripts that put duplicate of each ham
> message into a folder which is then processed by sa-learn from a
> cronjob, with sufficient delay that I can review the contents and remove
> any false negatives; and similarly with spam, excluding the utterly
> horrible category which just goes to /dev/null.

This is generally a good idea, unless you have a really high-volume 
environment - are you an ISP?

Keeping your training corpora around lets you review it for 
misclassifications and retrain very easily if things go off the rails.

Autolearn may be useful once you are initially manually trained. Then you 
can focus on manually training the FPs and FNs.

It's also important to be careful what you train with. If you allow users 
to submit messages for training (particularly a global bayes) then you 
either need to have strong trust in those users' judgement, or review what 
they submit before training with it.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Joan Peterson is like that: you expect at least a pseudological
   argument, but instead you get the weird ramblings of a woman with
   the critical thinking abilities of an 18th century peasant.  -- Ken
-----------------------------------------------------------------------
  7 days until the 72nd anniversary of the end of World War II

Re: Bayes auto-learn - not happening

Posted by David Jones <dj...@ena.com>.
On 08/08/2017 08:02 PM, Ian Zimmerman wrote:
> On 2017-08-08 15:20, Scott wrote:
> 
>> Another new one  big score, auto-learn disabled.  This one is fairly small.
>>
>> X-Spam-Status: Yes, score=29.428 tag=-9999 tag2=5 kill=6.4
>>          tests=[DATE_IN_PAST_03_06=1.076, DCC_CHECK=3.2,
>> DIGEST_MULTIPLE=0.001,
>>          FILL_THIS_FORM=0.001, FROM_MISSPACED=0.001, FROM_MISSP_SPF_FAIL=1,
>>          HEADER_FROM_DIFFERENT_DOMAINS=0.001, HEXHASH_WORD=1,
>>          HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001,
>>          HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14,
>>          NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365,
>>          RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5,
>>          RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274, SPF_FAIL=4,
>>          SPF_HELO_FAIL=4, STYLE_GIBBERISH=3.093,
>>          T_HTML_TAG_BALANCE_CENTER=0.01, URIBL_ABUSE_SURBL=1.948,
>>          WEIRD_QUOTING=0.001] autolearn=unavailable autolearn_force=no
>>
>> Can you tell if this one has the 3 point match?
> 
> Scott,
> 
> when I tried to use the autolearn feature I was as confused as you are.
> As far as I remember, the 3 point each from header and body is not the
> only requirement; the full truth is that some rules are "privileged" and
> can contribute to autolearning while others cannot.  I found it opaque
> in the extreme and essentially unpredictable, and so I stopped
> autolearning and hacked up some scripts that put a duplicate of each ham
> message into a folder which is then processed by sa-learn from a
> cronjob, with sufficient delay that I can review the contents and remove
> any false negatives; and similarly with spam, excluding the utterly
> horrible category which just goes to /dev/null.
> 
> It may not be possible for you to adopt such a process if your volume is
> high, but OTOH in that case you probably have users to help you :)
> 
> I think this is what RW is telling you, too.
> 
> FWIW, this is documented (sort of) by:
> 
> perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold
> 

Same here.  I had a little success with autolearn.  When I started 
splitting out messages into a spam and ham folder and using a cron 
script to train explicitly, the BAYES hits became very accurate and 
helped with zero-hour spam which is the hardest to block.

I set up an iRedMail server on a local-only subdomain and send/BCC copies 
of messages over to it.  Then I can use simple Inbox rules to sort or 
discard them.  Then I cron'd spam and ham training based on the Maildir 
"cur" folders.  This requires me to do a quick scan of the unread 
messages.  When I mark them as read, then they get sa-learn'd.  Takes a 
few minutes a day and drastically improved the mail filtering.
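
The cron step described above could look roughly like the following Python sketch. It assumes the standard Maildir naming convention, where a file name ends in ":2," followed by flag letters and "S" means the message has been read. The function names are made up for illustration, and the command is only built, not executed; sa-learn itself accepts --spam or --ham followed by files or directories.

```python
import os

def seen_messages(cur_dir):
    """Return paths in a Maildir 'cur' folder whose flags include S (seen)."""
    seen = []
    for name in sorted(os.listdir(cur_dir)):
        # Maildir file names end in ":2,<flags>"; 'S' marks a read message.
        _, sep, flags = name.rpartition(":2,")
        if sep and "S" in flags:
            seen.append(os.path.join(cur_dir, name))
    return seen

def sa_learn_command(kind, paths):
    """Build (but do not run) an sa-learn command; kind is 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind] + list(paths)
```

A cron job would pass the resulting list to something like subprocess.run, running as the user that owns the Bayes database, and then move or delete the trained messages so they are not processed again.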

A side effect of this has allowed me to easily spot some new spam 
campaigns and messages that are scoring just below the block threshold 
so I can add them to local custom rules.  Sometimes these are legit 
senders with good opt-out so I add them to a whitelist_auth entry.

-- 
David Jones

Re: Bayes auto-learn - not happening

Posted by Ian Zimmerman <it...@very.loosely.org>.
On 2017-08-08 15:20, Scott wrote:

> Another new one  big score, auto-learn disabled.  This one is fairly small.  
> 
> X-Spam-Status: Yes, score=29.428 tag=-9999 tag2=5 kill=6.4
>         tests=[DATE_IN_PAST_03_06=1.076, DCC_CHECK=3.2,
> DIGEST_MULTIPLE=0.001,
>         FILL_THIS_FORM=0.001, FROM_MISSPACED=0.001, FROM_MISSP_SPF_FAIL=1,
>         HEADER_FROM_DIFFERENT_DOMAINS=0.001, HEXHASH_WORD=1,
>         HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001,
>         HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14,
>         NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365,
>         RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5,
>         RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274, SPF_FAIL=4,
>         SPF_HELO_FAIL=4, STYLE_GIBBERISH=3.093,
>         T_HTML_TAG_BALANCE_CENTER=0.01, URIBL_ABUSE_SURBL=1.948,
>         WEIRD_QUOTING=0.001] autolearn=unavailable autolearn_force=no
> 
> Can you tell if this one has the 3 point match?

Scott,

when I tried to use the autolearn feature I was as confused as you are.
As far as I remember, the 3 point each from header and body is not the
only requirement; the full truth is that some rules are "privileged" and
can contribute to autolearning while others cannot.  I found it opaque
in the extreme and essentially unpredictable, and so I stopped
autolearning and hacked up some scripts that put a duplicate of each ham
message into a folder which is then processed by sa-learn from a
cronjob, with sufficient delay that I can review the contents and remove
any false negatives; and similarly with spam, excluding the utterly
horrible category which just goes to /dev/null.

It may not be possible for you to adopt such a process if your volume is
high, but OTOH in that case you probably have users to help you :)

I think this is what RW is telling you, too.

FWIW, this is documented (sort of) by:

perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.

Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
Another new one  big score, auto-learn disabled.  This one is fairly small.  

X-Spam-Status: Yes, score=29.428 tag=-9999 tag2=5 kill=6.4
        tests=[DATE_IN_PAST_03_06=1.076, DCC_CHECK=3.2,
DIGEST_MULTIPLE=0.001,
        FILL_THIS_FORM=0.001, FROM_MISSPACED=0.001, FROM_MISSP_SPF_FAIL=1,
        HEADER_FROM_DIFFERENT_DOMAINS=0.001, HEXHASH_WORD=1,
        HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001,
        HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14,
        NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365,
        RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5,
        RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274, SPF_FAIL=4,
        SPF_HELO_FAIL=4, STYLE_GIBBERISH=3.093,
        T_HTML_TAG_BALANCE_CENTER=0.01, URIBL_ABUSE_SURBL=1.948,
        WEIRD_QUOTING=0.001] autolearn=unavailable autolearn_force=no

Can you tell if this one has the 3 point match?
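
One way to look at the "3 point" question is to pull the rule scores out of the X-Spam-Status header and total them per rule type. The Python sketch below does only that. Per the AutoLearnThreshold plugin documentation, learning as spam needs at least 3 points from header rules and 3 from body rules, and rules flagged noautolearn, learn or userconf are ignored. The header does not say which rules are header or body rules, so the "kinds" mapping is a hypothetical input that has to be filled in by looking up each rule's definition. The autolearn decision may also use a different score set than the one shown in the header, so treat the totals as an approximation.

```python
import re

def parse_tests(status):
    """Extract {rule_name: score} from an X-Spam-Status header value."""
    match = re.search(r"tests=\[(.*?)\]", status, re.S)
    if not match:
        return {}
    scores = {}
    for item in match.group(1).split(","):
        name, sep, value = item.strip().partition("=")
        if sep:
            scores[name] = float(value)
    return scores

def points_by_kind(scores, kinds):
    """Sum scores per kind ('header' or 'body'); rules not in kinds are skipped."""
    totals = {"header": 0.0, "body": 0.0}
    for rule, score in scores.items():
        kind = kinds.get(rule)
        if kind in totals:
            totals[kind] += score
    return totals
```

Leaving a rule out of "kinds" is a simple way to model a rule that does not count toward autolearn.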





--
View this message in context: http://spamassassin.1065346.n5.nabble.com/Bayes-auto-learn-not-happening-tp138065p138085.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
I was getting my commands mixed up, been looking at this too long.  When I
ran

su amavis -c 'spamassassin -D 2>&1 -t onespam'

That caused it to LEARN the spam.  Database went from not there to one
learned.  Auto-learn apparently.  That's what it should have done when it
arrived.

Brand new spam arrives.  It gets
autolearn=unavailable.

X-Spam-Status: Yes, score=20.704 tag=-9999 tag2=5 kill=6.4
        tests=[DATE_IN_PAST_06_12=1.103, DCC_CHECK=3.2,
DIGEST_MULTIPLE=0.001,
        HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001,
        HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14,
        NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365,
        RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5, RDNS_NONE=1.274,
        SPF_HELO_SOFTFAIL=3, SPF_SOFTFAIL=3, URIBL_ABUSE_SURBL=1.948]
        autolearn=unavailable autolearn_force=no

That implies no auto-learn because the token exists (or there was something
else) as I understand it.  So I try to learn that one spam again...

I had to increase the size limit via:

su amavis -c 'sa-learn -D --spam --showdots --max-size=6000000 --mbox /home/mail/twospam'

Aug  8 16:35:23.567 [18045] dbg: bayes: learned
'419769464db0fabb0f1220f9ae0cf12931ad7076@sa_generated', atime: 1502226537
Learned tokens from 1 message(s) (1 message(s) examined)

And it learned it.  So autolearn=unavailable was NOT due to the token already
being there.

Is there a size limit built into autolearn?  









Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
Benny:
re tflags
> tflags foo-rule-name noautolearn
> and you can force autolearn based on rulename
> https://lists.gt.net/spamassassin/users/184996
> there is a long thread there that explain it more
>and all condition must be met for learning 

I read the thread.  Nothing there concrete enough for me to latch onto.  I
mean I get the gist of it, but no details on how to look at my tests and see
if I have the requisite 3 parts needed.







Re: Bayes auto-learn - not happening

Posted by Benny Pedersen <me...@junc.eu>.
Scott wrote on 2017-08-08 22:06:

> Better, what test flags in general disable auto-learn?

tflags foo-rule-name noautolearn

and you can force autolearn based on rulename

https://lists.gt.net/spamassassin/users/184996

there is a long thread there that explains it more

and all conditions must be met for learning
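
To see which rules carry a given tflag, the rule files can be scanned for "tflags" lines. The Python sketch below is a minimal parser for the "tflags RULE_NAME flag ..." syntax shown above. It works on text passed in as a string; the file locations and the rule names in any test input are placeholders, not taken from a real rule set.

```python
def rules_with_flag(config_text, flag="noautolearn"):
    """Return rule names whose tflags line includes the given flag."""
    names = []
    for line in config_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "tflags" and flag in fields[2:]:
            names.append(fields[1])
    return names
```

Running this over the .cf files that SpamAssassin actually loads, and comparing the result with the tests=[...] list of a message, shows which hits would be left out of the autolearn calculation.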

Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
> some of the listed tags have tflags that disable autolearn

< there is nothing to fix here 

Benny:  Will you elaborate for me please?  So I can understand and
self-help.

Better, what test flags in general disable auto-learn?




Re: Bayes auto-learn - not happening

Posted by Benny Pedersen <me...@junc.eu>.
Scott Techlist wrote on 2017-08-08 20:06:

> X-Spam-Flag: YES
> X-Spam-Score: 17.374
> X-Spam-Level: *****************
> X-Spam-Status: Yes, score=17.374 tag=-9999 tag2=5 kill=6.31
>         tests=[RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_DNSWL_NONE=-0.0001,
>         RCVD_IN_RP_RNBL=1.284, RCVD_IN_SBL_CSS=3.558, 
> RCVD_IN_SORBS_WEB=1.5,
>         RP_MATCHES_RCVD=-0.001, SUSPICIOUS_RECIPS=2.497,
>         URIBL_ABUSE_SURBL=1.948, URIBL_BLACK=1.7, URIBL_DBL_SPAM=2.5,
>         URIBL_SBL=0.644, URIBL_SBL_A=0.1] autolearn=no 
> autolearn_force=no

> Can't figure out what's wrong...

some of the listed tags have tflags that disable autolearn

there is nothing to fix here

Re: Bayes auto-learn - not happening - tentative success

Posted by Scott <te...@msxc.com>.
Aug 10, 2017; 10:15pm

Well, here's a development...

About the only difference in my old, functioning box and this new "clean"
install was the location of the bayes files.  

Old box:
/var/spool/amavisd/.spamassassin/
New box:
/etc/mail/bayes

The other details that caught my attention were that on the old box, the
ONLY bayes thing that was explicitly set was the path, which was:
bayes_path /home/amavis/.spamassassin/bayes

On this (new) box, I commented out most of the bayes settings similarly.
Restarted amavis/SA and got some errors about no bayes db.  Ignored.

Sent an email anyway.  Guess what?  autolearn=ham right out of the gate, and
it created the database files.

And the very next message received was also shown as autolearn=ham.  

** that shows a minimum corpus is NOT necessary for autolearn to work
(whether that's a good practice is a different topic).

I cleared the db's with sa-learn --clear.  I received a new message,
autolearn=ham, again.  Result:

[root@mail2 .spamassassin]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          1          0  non-token data: nham


[root@mail2 root]# ll /var/spool/amavisd/.spamassassin
total 28
-rw-rw-rw- 1 amavis amavis 12288 Aug 10 21:20 bayes_seen
-rw-rw-rw- 1 amavis amavis 12288 Aug 10 21:20 bayes_toks
-rw-r--r-- 1 amavis amavis  1869 Aug  8 15:32 user_prefs

I gradually restored all the bayes settings in local.cf, restarting amavis
each time, clearing the db and sending that same test message.  
Each time the result was autolearn=ham

Finally to the path setting:

I tried setting the default path, and changing the filename prefix from
bayes to mybayes for curiosity...
bayes_path /var/spool/amavisd/.spamassassin/mybayes

Upon sending a test message, SA promptly auto-learned as ham and created the
2 new files starting with "mybayes" instead of bayes.

So changing the filename didn't hurt autolearn

Next I kept the new name and changed the folder back to where it was:  
/etc/mail/bayes/mybayes
Sent a test message.
This time, NO new file was created.  It apparently cannot write there.
Permissions on mail, and on bayes are both amavis:amavis & 777.

**** autolearn=unavailable **** (As Matus expected)

Next, I log on as amavis.  cd to /etc/mail/bayes.  Create a file, edit it,
and then delete it.  User amavis CAN write there.  I verify I'm amavis, Try
to cd to some other user's folder, get "permission denied", check.  
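
The manual check above can also be scripted. The Python sketch below walks from a directory up to the filesystem root and reports the permission bits of each component, plus whether the current user may write to and traverse it. It should be run as the same user as the daemon (amavis here). It only looks at ordinary Unix permissions, so any other restriction that applies to the running daemon but not to a login shell would not show up in it.

```python
import os
import stat

def describe_path(path):
    """Return (component, octal mode, usable-by-current-user) for path and each parent."""
    rows = []
    current = os.path.abspath(path)
    while True:
        mode = stat.S_IMODE(os.stat(current).st_mode)
        usable = os.access(current, os.W_OK | os.X_OK)
        rows.append((current, oct(mode), usable))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return rows
```

Only the first row (the Bayes directory itself) needs to be writable; the parents only need to be traversable, so a False in later rows is not necessarily a problem.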

Finally, because I don't like the hidden directory anyway, I try to move the
bayes folder from the default.  I configure the bayes path to:
bayes_path /var/spool/amavisd/bayes/bayes

Send my test message, voila, db files created, and autolearn=ham  Success!
(tentative, cautiously optimistic)

I hope I have solved the mystery.  For reasons beyond my skill set, SA will
not auto-learn to a bayes db in a folder in /etc/mail/bayes.  Regardless of
wide open permissions on everything except /etc.  And the user's confirmed
ability to write to the folder.

Bug I guess?  

SHIT! What a PITA to figure out.

I'm gonna let this cook overnight and see how it does.  Will report back.







Re: Bayes auto-learn - Solved

Posted by John Hardin <jh...@impsec.org>.
On Fri, 11 Aug 2017, John Hardin wrote:

> On Fri, 11 Aug 2017, Scott wrote:
>
>>  I'm chicken.  :D
>>
>>  I don't have much (almost no) experience overriding those yum packages.
>
> It's pretty simple, just "yum install {local_filename}"
>
>>  And those warnings I got when I rebuilt from source made me nervous.
>
> I suppose I could publish the Centos 7 x86/64 RPMs I build and use on my 
> website. They wouldn't be signed...

   http://www.impsec.org/~jhardin/antispam/centos7/

You do need the epel-release package installed to use this.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Je ne suis pas Charlie. Je suis armé.
-----------------------------------------------------------------------
  4 days until the 72nd anniversary of the end of World War II

Re: Bayes auto-learn - Solved

Posted by John Hardin <jh...@impsec.org>.
On Fri, 11 Aug 2017, Scott wrote:

> I'm chicken.  :D
>
> I don't have much (almost no) experience overriding those yum packages.

It's pretty simple, just "yum install {local_filename}"

> And those warnings I got when I rebuilt from source made me nervous.

I suppose I could publish the Centos 7 x86/64 RPMs I build and use on my 
website. They wouldn't be signed...


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   USMC Rules of Gunfighting #6: If you can choose what to bring
   to a gunfight, bring a long gun and a friend with a long gun.
-----------------------------------------------------------------------
  4 days until the 72nd anniversary of the end of World War II

Re: Bayes auto-learn - Solved

Posted by Scott <te...@msxc.com>.
I'm chicken.  :D

I don't have much (almost no) experience overriding those yum packages.  And
those warnings I got when I rebuilt from source made me nervous.

Maybe when the dust settles...






Re: Bayes auto-learn - Solved

Posted by John Hardin <jh...@impsec.org>.
On Fri, 11 Aug 2017, Scott wrote:

> Centos7 (selinux disabled at the time of testing)
> Spamassassin 3.4.0

Next on your plate: upgrading to 3.4.1...

https://dl.fedoraproject.org/pub/fedora/linux/releases/25/Everything/source/tree/Packages/s/spamassassin-3.4.1-9.fc25.src.rpm

It works jes' fine here.

...ooo, time to update:
https://dl.fedoraproject.org/pub/fedora/linux/releases/26/Everything/source/tree/Packages/s/spamassassin-3.4.1-12.fc26.src.rpm


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   ...for a nation to tax itself into prosperity is like
   a man standing in a bucket and trying to
   lift himself up by the handle.                 -- Winston Churchill
-----------------------------------------------------------------------
  4 days until the 72nd anniversary of the end of World War II

Re: Bayes auto-learn - Solved

Posted by Scott <te...@msxc.com>.
Restored my last copy of my manually learned bayes database from the "bad"
directory to the new location.  Let it cook for a day.  Of the messages that 
made it through postscreen, RBLs, etc since my logs rotated early this
morning, 93% were autolearn=no,  2% autolearn=spam, 5% autolearn=ham.  ZERO
autolearn=unavailable.
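
A breakdown like the one above can be produced from the mail log with a few lines of Python. This is a minimal sketch that assumes each processed message produces one log line containing "autolearn=<result>", as in the amavis log lines posted in this thread. The function names are made up for illustration.

```python
import re
from collections import Counter

def autolearn_tally(log_lines):
    """Count autolearn=<result> values; autolearn_force=... is not matched."""
    counts = Counter()
    for line in log_lines:
        match = re.search(r"\bautolearn=(\w+)", line)
        if match:
            counts[match.group(1)] += 1
    return counts

def percentages(counts):
    """Convert counts to percentages rounded to one decimal place."""
    total = sum(counts.values())
    if not total:
        return {}
    return {key: round(100.0 * n / total, 1) for key, n in counts.items()}
```

Feeding it the lines of the current log file gives the share of no, ham, spam and unavailable results; a non-zero "unavailable" share would be the symptom discussed in this thread.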

This is solved.  

Summary for posterity for anyone who may run into the same problem.  Some
tags for searching:

Centos7 (selinux disabled at the time of testing)
Postfix 3.2.2
Amavisd-new amavisd amavis 2.11.0
Spamassassin 3.4.0 
bayes
autolearn=unavailable

I strongly suspected bayes auto-learn was not functioning.  Read the thread
for evidence.  In local.cf had the bayes path set to:
/etc/mail/bayes/bayes

Don't remember if it came packaged that way or if I followed someone else's
"guide" to end up with that bad location.  I do see one well written guide
that specified that folder, honestly I'm not sure. No matter.

In any case, the end result was that any message that would have been
autolearned got "autolearn=unavailable" and did not learn.

The fix for this setup as listed above was to NOT have the directory under
/etc.  Even with wide open (777) write permissions, amavisd/SA was
apparently unable to write there.  

I moved the bayes database under /var/spool/amavisd/bayes and all now
functions properly.  Note the default if no path is specified is
/var/spool/amavisd/.spamassassin/bayes IIRC (assuming /var/spool/amavisd is
the home directory for amavis).  I tested both, it appears happy with either.
local.cf:
bayes_path /var/spool/amavisd/bayes/bayes

I now have a journal file FWIW:
[root@mail2 root]# ls -la /var/spool/amavisd/bayes
total 4280
drwx------ 2 amavis amavis    4096 Aug 11 16:07 .
drwxr-xr-x 7 amavis amavis    4096 Aug 10 22:18 ..
-rw-rw-rw- 1 amavis amavis   81888 Aug 11 16:07 bayes_journal
-rw-rw-rw- 1 amavis amavis   86016 Aug 11 16:07 bayes_seen
-rw-rw-rw- 1 amavis amavis 5267456 Aug 11 16:07 bayes_toks

And my database is happy:
[root@mail2 root]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0        359          0  non-token data: nspam
0.000          0        494          0  non-token data: nham
0.000          0     149970          0  non-token data: ntokens
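
For monitoring, the "sa-learn --dump magic" output above is easy to parse. The Python sketch below turns the "non-token data" lines into a dictionary and checks the counts against the 200 spam / 200 ham minimum that Bayes scoring uses by default (the bayes_min_spam_num and bayes_min_ham_num settings). The function names are made up, and the parser assumes the five-column layout shown in this thread.

```python
def parse_dump_magic(output):
    """Map 'non-token data' labels from `sa-learn --dump magic` to integers."""
    values = {}
    for line in output.splitlines():
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].startswith("non-token data:"):
            label = fields[4].split(":", 1)[1].strip()
            values[label] = int(fields[2])  # third column holds the value
    return values

def bayes_scoring_active(values, min_spam=200, min_ham=200):
    """True if both corpus counts reach the configured minimums."""
    return values.get("nspam", 0) >= min_spam and values.get("nham", 0) >= min_ham
```

Note that this minimum governs when BAYES_* rules start scoring; as the tests in this thread showed, autolearn itself ran on an empty database.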

I now know way more about amavisd-new and spamassassin than I did when I
started.  Guess that's the silver lining to a few days of hair pulling.  

Thanks,
Scott














Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
> Imho You need 100 ham and 100 spam to auto learning working. Do manual
learning

See earlier post today.  I've got it loaded up, right?:

[root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0        349          0  non-token data: nspam
0.000          0        478          0  non-token data: nham
0.000          0     166030          0  non-token data: ntokens






Re: Bayes auto-learn - not happening

Posted by AM <ad...@gmail.com>.
IMHO you need 100 ham and 100 spam for auto-learning to work. Do manual
learning.

08.08.2017 8:20 PM "Scott Techlist" <te...@msws.org> wrote:

> Centos7
> Postfix 3.2.2
> Amavisd-new 2.11.0
> Spamassassin 3.4.0
> Site-wide configuration
>
> This is a new box and I've configured some conservative values for
> auto-learn.  I've enabled it properly AFAIK, but I can't see any sign of it
> working.
>
> I have these set in local.cf
> use_bayes               1
> bayes_auto_learn        1
> bayes_auto_learn_threshold_nonspam -1.7
> bayes_auto_learn_threshold_spam 10.0
> # this is a filename prefix, not a directory per se
> bayes_path              /etc/mail/bayes/bayes
> bayes_file_mode         0666
>
> -------------bayes prep ----------------
> Start fresh for troubleshooting:
> su amavis -c 'sa-learn --clear'
>
> Add one spam manually and check tokens:
>
> [root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0          1          0  non-token data: nspam
> 0.000          0          0          0  non-token data: nham
> 0.000          0       2157          0  non-token data: ntokens
>
> ---------amavisd prep----------------
>
> Restart amavisd/spamassassin just to be sure all configs read..
>
> ------- ready to process -------------
>
> The next high scoring spam arrives, it was sent to my spam mailbox.  It
> did NOT autolearn.  Nor did several others.
>
> To troubleshoot, I took one that did not autolearn, and learned it
> manually by:
> su amavis -c 'sa-learn -D --spam --showdots  --mbox /home/mail/onespam'
>
> even though this message was slightly over the threshold, the log says it
> learned anyway:
> -D log snippet:
> ---------------------
> Aug  8 12:37:27.216 [13198] info: archive-iterator: skipping large
> message: 858 lines, 262203 bytes, limit 262144 bytes
>
> Learned tokens from 1 message(s) (1 message(s) examined)
> ---------------------
>
> Verified it learned:
>
> [root@tn2 mail]# su amavis -c 'sa-learn --dump magic'
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0          2          0  non-token data: nspam
>
>
> Partial header from that message:
>
> X-Spam-Flag: YES
> X-Spam-Score: 17.374
> X-Spam-Level: *****************
> X-Spam-Status: Yes, score=17.374 tag=-9999 tag2=5 kill=6.31
>         tests=[RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_DNSWL_NONE=-0.0001,
>         RCVD_IN_RP_RNBL=1.284, RCVD_IN_SBL_CSS=3.558,
> RCVD_IN_SORBS_WEB=1.5,
>         RP_MATCHES_RCVD=-0.001, SUSPICIOUS_RECIPS=2.497,
>         URIBL_ABUSE_SURBL=1.948, URIBL_BLACK=1.7, URIBL_DBL_SPAM=2.5,
>         URIBL_SBL=0.644, URIBL_SBL_A=0.1] autolearn=no autolearn_force=no
>
> Why aren't my spams getting auto-learned?  If sa-learn "ate" it, shouldn't
> auto-learn too?
>
> I know there is a default 200 threshold before Bayes starts tagging
> anything, but I understand it should learn without issue.
>
> Can't figure out what's wrong...
>
>
>
>
>
>
>
>
>
>
>
>
>
>

RE: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
Here's a verbose log of amavis/spamassassin processing another high score
that just came through.  I don't see a peep about auto-learn.  But it was
unavailable too. 

(posting via nabble, apologies if it wraps)

Aug 10 11:03:39 mail2 amavis[377]: (00377-01) LMTP :10024
/var/spool/amavisd/tmp/amavis-20170810T110339-00377-JQiRqEtF:
<co...@qq.com> -> <sh...@myvirt.com> SIZE=175613 BODY=8BITMIME
ENVID=671416;675610;322132;sachin2 Received: from tn2.companypostoffice.com
([127.0.0.1]) by localhost (tn2.companypostoffice.com [127.0.0.1])
(amavisd-new, port 10024) with LMTP for <sh...@myvirt.com>; Thu, 10 Aug
2017 11:03:39 -0500 (CDT)
Aug 10 11:03:39 mail2 postfix/smtpd[450]: disconnect from
unknown[208.110.82.116] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) Checking: KmK8jCUCqcuq
[208.110.82.116] <co...@qq.com> -> <sh...@myvirt.com>
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: tie-ing to DB
file R/O /etc/mail/bayes/bayes_toks
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: tie-ing to DB
file R/O /etc/mail/bayes/bayes_seen
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: found bayes db
version 3
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: DB journal
sync: last sync: 0
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: corpus size:
nspam = 349, nham = 478
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *p = "U*contact D*qq.com D*com"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for X-Amavis-PolicyBank = ""
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for X-Amavis-MessageSize = "174380"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for MIME-Version = ""
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *F = "U*shorton D*myvirt.com D*org U*yt5r4e3 D*qq.com D*com"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for To = "U*shorton D*myvirt.com D*org"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *c = "/html;"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for x-spam-relays-external = " [ ip=208.110.82.116 rdns= helo=qq.com
by=tn2.companypostoffice.com ident= envfrom= intl=0 id=1B93B3F5A auth= msa=0
] [ ip=127.0.0.1 rdns=localhost helo=localhost by=qq.com ident=
envfrom=contact@qq.com intl=0 id=hhi1tk16lt0l auth= msa=0 ]"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for x-spam-relays-internal = " "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *RT = " "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *RU = " [ ip=208.110.82.116 rdns= helo=qq.com
by=tn2.companypostoffice.com ident= envfrom= intl=0 id=1B93B3F5A auth= msa=0
] [ ip=127.0.0.1 rdns=localhost helo=localhost by=qq.com ident=
envfrom=contact@qq.com intl=0 id=hhi1tk16lt0l auth= msa=0 ]"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *r = " localhost (127.0.0 ip*127.0.0.1 ) by qq.com <sh...@myvirt.com>;
envelope- <co...@qq.com>)"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *r = " localhost (127.0.0 ip*127.0.0.1 ) by qq.com <sh...@myvirt.com>;
envelope- <co...@qq.com>) qq.com (unknown [208.110.82 ip*208.110.82.116 ])
by tn2.companypostoffice.com (Postfix) <sh...@myvirt.com>; "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token '4001' =>
0.999898854265489
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token
'H*F:U*shorton' => 0.999772898574472
<snip 200ish similar tokens>
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg:
bayes: token 'corresponde' => 0.998847880299252
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token
'newcastle' => 0.998847880299252
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: score = 1
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: DB expiry:
tokens in DB: 166030, Expiry max size: 150000, Oldest atime: 1501594564,
Newest atime: 1502289189, Last expire: 1502304550, Current time: 1502381019
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: opportunistic
call found expiry due
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: bayes journal
sync starting
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: bayes journal
sync completed
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: expiry starting
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: expiry
completed
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: untie-ing


Then

Aug 10 11:03:44 mail2 amavis[377]: (00377-01) KmK8jCUCqcuq(KmK8jCUCqcuq)
SEND from <> -> <sp...@myvirt.com>, ENVID=671416;675610;322132;sachin2
BODY=7BIT 250 2.0.0 from MTA(smtp:[127.0.0.1]:10025): 250 2.0.0 Ok: queued
as 9592B70
Aug 10 11:03:44 mail2 amavis[377]: (00377-01) Blocked SPAM
{DiscardedInbound,Quarantined}, [208.110.82.116]:48315 [208.110.82.116]
ESMTP/LMTP <co...@qq.com> -> <sh...@myvirt.com>,
(ESMTP://[208.110.82.116]:48315), quarantine: spam06@myvirt.com, Queue-ID:
1B93B3F5A, mail_id: KmK8jCUCqcuq, b: LxuIyLspX, Hits: 23.904, size: 174380,
Subject: "Your order no #562-4581 has arrived", From:
<sh...@qq.com>, helo=qq.com, Tests:
[BAYES_999=0.2,BAYES_99=3.5,DCC_CHECK=3.2,DIGEST_MULTIPLE=0.293,HEADER_FROM_DIFFERENT_DOMAINS=0.001,HTML_MESSAGE=0.001,HTML_MIME_NO_HTML_TAG=0.377,MIME_HTML_ONLY=0.723,MISSING_MID=0.497,NORMAL_HTTP_TO_IP=0.001,OBFUSCATING_COMMENT=0.723,RAZOR2_CF_RANGE_51_100=0.5,RAZOR2_CF_RANGE_E8_51_100=1.886,RAZOR2_CHECK=2.5,RCVD_IN_BRBL_LASTEXT=1.449,RDNS_NONE=0.793,SPF_HELO_SOFTFAIL=3,SPF_SOFTFAIL=3,T_HTML_TAG_BALANCE_CENTER=0.01,URIBL_ABUSE_SURBL=1.25],
autolearn=unavailable autolearn_force=no, autolearnscore=21.255, 5305 ms
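
One thing that can be pulled out of a debug log like this is how the Bayes files were opened. The Python sketch below lists every "tie-ing to DB file" entry with its mode. The log above only shows R/O opens; the sketch assumes a writable open would be logged the same way with R/W in that position, which is an assumption about the log format rather than something shown in this thread. Whitespace is collapsed first because the archive wraps long lines.

```python
import re

TIE = re.compile(r"bayes: tie-ing to DB file (R/O|R/W) (\S+)")

def tie_modes(log_text):
    """Return (mode, path) pairs for each Bayes DB open found in the log text."""
    flat = " ".join(log_text.split())  # undo line wrapping
    return TIE.findall(flat)
```

If no R/W entry ever appears for messages that score past the autolearn thresholds, that would be consistent with the database never being opened for writing.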









RE: Bayes auto-learn - not happening

Posted by Scott Techlist <te...@msws.org>.
>surely, it makes no sense blow up the database with already 100%
>classified samples - you even don't do that uncnditional with a
>hand-trained database (at least not forever, at the begin it makes sense
>to get additional tokens)

I think you misunderstood my question.  I meant that as I look at messages to see if I think they should have learned, or not, would one that Bayes already rates as 90-100% likely spam be one that is being skipped due to already being recognized?  I am not asking why isn't it learning one like that.  Or maybe I misunderstood your answer.

>but train every single message which is already classified as expected
>would leat to a lot of useless load. blows up the database and makes
>bayes-poisioning and the need to purge the whole database and start from
>scratch (with thanks to autotraining no available corpus) then
>autolearning on it's down does

Agree.  And I understand that is not how it is designed.  


>the question of bayes-poisioning is not "if", it's "when and how often"
>and hence after 10 years expierience i stopped that nonsense and keep a
>currently 120000 messages large corpus of eml-files (HAM AND SPAM)

Not arguing the pros and cons of IF one should use it.

I only want to make it work, or better said, verify that it IS working.  Then I can decide if I want to keep using it.  Right now, I've never seen it work.  Thus my strong suspicion that it is not working.  One thing for sure, it hasn't found a single spam or ham to auto-learn, yet.  Which seems unlikely if it were functioning properly.

The output of "unavailable" is too ambiguous for me to devise a way to troubleshoot.  But I'm not an expert with SA.  Thus the plea for assistance in seeing if it is working.  If auto-learn isn't working, my expectation is that auto-anything-else isn't working either.  Journal maint, etc.





Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
If a particular message shows
 *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
 *      [score: 1.0000]

is it safe to assume that this spam, or one close to it, has already been
learned, and so it would not be a candidate for auto-learn?

Maybe I'm not being patient enough.






--
View this message in context: http://spamassassin.1065346.n5.nabble.com/Bayes-auto-learn-not-happening-tp138065p138255.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
FYI, here are the verbose headers for that same message that flowed through above:

X-Spam-Flag: YES
X-Spam-Score: 23.904
X-Spam-Level: ***********************
X-Spam-Status: Yes, score=23.904 tag=-9999 tag2=5 kill=6.4
        tests=[BAYES_999=0.2, BAYES_99=3.5, DCC_CHECK=3.2,
        DIGEST_MULTIPLE=0.293, HEADER_FROM_DIFFERENT_DOMAINS=0.001,
        HTML_MESSAGE=0.001, HTML_MIME_NO_HTML_TAG=0.377,
        MIME_HTML_ONLY=0.723, MISSING_MID=0.497, NORMAL_HTTP_TO_IP=0.001,
        OBFUSCATING_COMMENT=0.723, RAZOR2_CF_RANGE_51_100=0.5,
        RAZOR2_CF_RANGE_E8_51_100=1.886, RAZOR2_CHECK=2.5,
        RCVD_IN_BRBL_LASTEXT=1.449, RDNS_NONE=0.793, SPF_HELO_SOFTFAIL=3,
        SPF_SOFTFAIL=3, T_HTML_TAG_BALANCE_CENTER=0.01,
        URIBL_ABUSE_SURBL=1.25] autolearn=unavailable autolearn_force=no
X-Spam-Report:
 *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
 *      [score: 1.0000]
 *  1.4 RCVD_IN_BRBL_LASTEXT RBL: No description available.
 *      [208.110.82.116 listed in bb.barracudacentral.org]
 *  1.2 URIBL_ABUSE_SURBL Contains an URL listed in the ABUSE SURBL
 *      blocklist
 *      [URIs: 154.16.37.73]
 *  0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail
 *      domains are different
 *  3.0 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail)
 *  3.0 SPF_HELO_SOFTFAIL SPF: HELO does not match SPF record (softfail)
 *  0.0 NORMAL_HTTP_TO_IP URI: URI host has a public dotted-decimal IPv4
 *      address
 *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
 *      [score: 1.0000]
 *  0.0 HTML_MESSAGE BODY: HTML included in message
 *  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
 *  3.2 DCC_CHECK Detected as bulk mail by DCC (dcc-servers.net)
 *  2.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
 *  1.9 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
 *      above 50%
 *      [cf: 100]
 *  0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
 *      [cf: 100]
 *  0.3 DIGEST_MULTIPLE Message hits more than one network digest check
 *  0.4 HTML_MIME_NO_HTML_TAG HTML-only message, but there is no HTML tag
 *  0.5 MISSING_MID Missing Message-Id: header
 *  0.8 RDNS_NONE Delivered to internal network by a host with no rDNS
 *  0.7 OBFUSCATING_COMMENT HTML comments which obfuscate text
 *  0.0 T_HTML_TAG_BALANCE_CENTER Malformatted HTML
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
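
As a cross-check on these headers, the X-Spam-Status value can be split into its per-test scores and summed. This is a rough Python sketch: the function name is hypothetical, and the regular expressions assume only the "score=", "tests=[...]" and "autolearn=" layout visible in the header above.

```python
import re


def parse_spam_status(value):
    """Return (score, {test: points}, autolearn status) from an X-Spam-Status value."""
    # Unfold the header: continuation lines only contribute whitespace.
    flat = " ".join(value.split())
    score = float(re.search(r"\bscore=(-?[\d.]+)", flat).group(1))
    tests = {}
    blob = re.search(r"tests=\[([^\]]*)\]", flat).group(1)
    for item in blob.split(","):
        name, _, points = item.strip().partition("=")
        if name:
            tests[name] = float(points)
    status = re.search(r"\bautolearn=(\w+)", flat)
    return score, tests, status.group(1) if status else None


# Header value copied from the message above.
STATUS = """Yes, score=23.904 tag=-9999 tag2=5 kill=6.4
        tests=[BAYES_999=0.2, BAYES_99=3.5, DCC_CHECK=3.2,
        DIGEST_MULTIPLE=0.293, HEADER_FROM_DIFFERENT_DOMAINS=0.001,
        HTML_MESSAGE=0.001, HTML_MIME_NO_HTML_TAG=0.377,
        MIME_HTML_ONLY=0.723, MISSING_MID=0.497, NORMAL_HTTP_TO_IP=0.001,
        OBFUSCATING_COMMENT=0.723, RAZOR2_CF_RANGE_51_100=0.5,
        RAZOR2_CF_RANGE_E8_51_100=1.886, RAZOR2_CHECK=2.5,
        RCVD_IN_BRBL_LASTEXT=1.449, RDNS_NONE=0.793, SPF_HELO_SOFTFAIL=3,
        SPF_SOFTFAIL=3, T_HTML_TAG_BALANCE_CENTER=0.01,
        URIBL_ABUSE_SURBL=1.25] autolearn=unavailable autolearn_force=no"""

score, tests, autolearn = parse_spam_status(STATUS)
total = sum(tests.values())
non_bayes = sum(p for name, p in tests.items() if not name.startswith("BAYES_"))
print(f"header score={score} sum of tests={total:.3f} "
      f"without BAYES_*={non_bayes:.3f} autolearn={autolearn}")
```

The listed tests add up to the reported 23.904, and to 20.204 once the two BAYES_* hits are left out. That is not the autolearnscore=21.255 amavisd logged for this message, which is at least consistent with auto-learn scoring the message under a different score set than the normal check; treat that reading as an assumption, not something these headers prove.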






Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
Here is a debug log for one that just flowed.  I don't see anything about why
auto-learn was unavailable.  But it shows it's talking to the db anyway I
think.

Is there a way to set autolearn_force to yes?  The log format makes one
think it's a global setting, but from all I can find it looks like a
per-rule setting.  It would be easier to troubleshoot if I could relax the
check to look for any 6 points instead of 3/3.
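
To make the "any 6 points instead of 3/3" distinction concrete, here is a simplified Python model of the spam-side check as described in this thread. It is a sketch, not SpamAssassin's code: the function and parameter names are hypothetical, the 10.0 default is the bayes_auto_learn_threshold_spam value from the local.cf at the top of the thread, and how points are attributed to header or body tests is left to the caller.

```python
def would_autolearn_as_spam(autolearn_score, header_points, body_points,
                            spam_threshold=10.0, force=False):
    """Simplified model of the spam-side auto-learn check discussed above."""
    # Assumption: the score must reach the configured spam threshold first.
    if autolearn_score < spam_threshold:
        return False
    if force:
        # Relaxed form: 6 points in total, regardless of where they come from.
        return header_points + body_points >= 6
    # Default form: at least 3 points each from header tests and body tests.
    return header_points >= 3 and body_points >= 3


# Hypothetical split of points: plenty in total, but lopsided.
print(would_autolearn_as_spam(21.255, header_points=5.0, body_points=1.0))
print(would_autolearn_as_spam(21.255, header_points=5.0, body_points=1.0, force=True))
```

Under this model a high-scoring message can still be skipped when its points are lopsided, which is the case the relaxed 6-point form would let through.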


Aug 10 11:03:39 mail2 amavis[377]: (00377-01) LMTP :10024
/var/spool/amavisd/tmp/amavis-20170810T110339-00377-JQiRqEtF:
<co...@qq.com> -> <my...@myvirt.com> SIZE=175613 BODY=8BITMIME
ENVID=671416;675610;322132;sachin2 Received: from tn2.myserver.com
([127.0.0.1]) by localhost (tn2.myserver.com [127.0.0.1]) (amavisd-new, port
10024) with LMTP for <my...@myvirt.com>; Thu, 10 Aug 2017 11:03:39 -0500
(CDT)
Aug 10 11:03:39 mail2 postfix/smtpd[450]: disconnect from
unknown[208.110.82.116] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) Checking: KmK8jCUCqcuq
[208.110.82.116] <co...@qq.com> -> <my...@myvirt.com>
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: tie-ing to DB
file R/O /etc/mail/bayes/bayes_toks
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: tie-ing to DB
file R/O /etc/mail/bayes/bayes_seen
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: found bayes db
version 3
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: DB journal
sync: last sync: 0
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: corpus size:
nspam = 349, nham = 478
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *p = "U*contact D*qq.com D*com"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for X-Amavis-PolicyBank = ""
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for X-Amavis-MessageSize = "174380"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for MIME-Version = ""
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *F = "U*myuser D*myvirt.com D*org U*yt5r4e3 D*qq.com D*com"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for To = "U*myuser D*myvirt.com D*org"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *c = "/html;"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for x-spam-relays-external = " [ ip=208.110.82.116 rdns= helo=qq.com
by=tn2.myserver.com ident= envfrom= intl=0 id=1B93B3F5A auth= msa=0 ] [
ip=127.0.0.1 rdns=localhost helo=localhost by=qq.com ident=
envfrom=contact@qq.com intl=0 id=hhi1tk16lt0l auth= msa=0 ]"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for x-spam-relays-internal = " "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *RT = " "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *RU = " [ ip=208.110.82.116 rdns= helo=qq.com by=tn2.myserver.com ident=
envfrom= intl=0 id=1B93B3F5A auth= msa=0 ] [ ip=127.0.0.1 rdns=localhost
helo=localhost by=qq.com ident= envfrom=contact@qq.com intl=0
id=hhi1tk16lt0l auth= msa=0 ]"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *r = " localhost (127.0.0 ip*127.0.0.1 ) by qq.com <my...@myvirt.com>;
envelope- <co...@qq.com>)"
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: header tokens
for *r = " localhost (127.0.0 ip*127.0.0.1 ) by qq.com <my...@myvirt.com>;
envelope- <co...@qq.com>) qq.com (unknown [208.110.82 ip*208.110.82.116 ])
by tn2.myserver.com (Postfix) <my...@myvirt.com>; "
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token '4001' =>
0.999898854265489
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token
'H*F:U*myuser' => 0.999772898574472
<snip 200ish similar tokens>
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg:
bayes: token 'corresponde' => 0.998847880299252
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: token
'newcastle' => 0.998847880299252
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: score = 1
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: DB expiry:
tokens in DB: 166030, Expiry max size: 150000, Oldest atime: 1501594564,
Newest atime: 1502289189, Last expire: 1502304550, Current time: 1502381019
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: opportunistic
call found expiry due
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: bayes journal
sync starting
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: bayes journal
sync completed
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: expiry starting
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: expiry
completed
Aug 10 11:03:39 mail2 amavis[377]: (00377-01) SA dbg: bayes: untie-ing


Then

Aug 10 11:03:44 mail2 amavis[377]: (00377-01) KmK8jCUCqcuq(KmK8jCUCqcuq)
SEND from <> -> <sp...@myvirt.com>, ENVID=671416;675610;322132;sachin2
BODY=7BIT 250 2.0.0 from MTA(smtp:[127.0.0.1]:10025): 250 2.0.0 Ok: queued
as 9592B70
Aug 10 11:03:44 mail2 amavis[377]: (00377-01) Blocked SPAM
{DiscardedInbound,Quarantined}, [208.110.82.116]:48315 [208.110.82.116]
ESMTP/LMTP <co...@qq.com> -> <my...@myvirt.com>,
(ESMTP://[208.110.82.116]:48315), quarantine: spam06@myvirt.com, Queue-ID:
1B93B3F5A, mail_id: KmK8jCUCqcuq, b: LxuIyLspX, Hits: 23.904, size: 174380,
Subject: "Your order no #562-4581 has arrived", From:
<my...@qq.com>, helo=qq.com, Tests:
[BAYES_999=0.2,BAYES_99=3.5,DCC_CHECK=3.2,DIGEST_MULTIPLE=0.293,HEADER_FROM_DIFFERENT_DOMAINS=0.001,HTML_MESSAGE=0.001,HTML_MIME_NO_HTML_TAG=0.377,MIME_HTML_ONLY=0.723,MISSING_MID=0.497,NORMAL_HTTP_TO_IP=0.001,OBFUSCATING_COMMENT=0.723,RAZOR2_CF_RANGE_51_100=0.5,RAZOR2_CF_RANGE_E8_51_100=1.886,RAZOR2_CHECK=2.5,RCVD_IN_BRBL_LASTEXT=1.449,RDNS_NONE=0.793,SPF_HELO_SOFTFAIL=3,SPF_SOFTFAIL=3,T_HTML_TAG_BALANCE_CENTER=0.01,URIBL_ABUSE_SURBL=1.25],
autolearn=unavailable autolearn_force=no, autolearnscore=21.255, 5305 ms
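
The "DB expiry" debug line earlier in this log is easier to read with the epoch timestamps converted. A small Python sketch follows, assuming only the field layout of that one line; the function and field names are hypothetical.

```python
import re
from datetime import datetime, timezone

FIELDS = ("tokens", "max_size", "oldest_atime", "newest_atime",
          "last_expire", "current_time")
EXPIRY_RE = re.compile(
    r"tokens in DB: (\d+), Expiry max size: (\d+), Oldest atime: (\d+), "
    r"Newest atime: (\d+), Last expire: (\d+), Current time: (\d+)"
)


def parse_expiry(text):
    """Extract the numeric fields of a (possibly line-wrapped) DB expiry message."""
    flat = " ".join(text.split())  # undo the mail client's hard wrapping
    match = EXPIRY_RE.search(flat)
    if match is None:
        return None
    return dict(zip(FIELDS, (int(g) for g in match.groups())))


def as_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Values copied from the debug log above.
LINE = """SA dbg: bayes: DB expiry:
tokens in DB: 166030, Expiry max size: 150000, Oldest atime: 1501594564,
Newest atime: 1502289189, Last expire: 1502304550, Current time: 1502381019"""

info = parse_expiry(LINE)
print("tokens over max:", info["tokens"] - info["max_size"])
print("hours since last expire:",
      round((info["current_time"] - info["last_expire"]) / 3600, 1))
print("current time (UTC):", as_utc(info["current_time"]).isoformat())
```

For this line the database holds 16030 tokens more than the expiry max size, and the last expiry ran about 21 hours earlier, which fits the "opportunistic call found expiry due" message that follows it. So journal sync and expiry do appear to run from amavisd, even though auto-learn reports "unavailable".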


















Re: Bayes auto-learn - not happening

Posted by Scott <te...@msxc.com>.
>why this?
>When you run from amavisd, you only need permission for amavis user, not for
>anyone.

To be sure that is not the problem.  I can tighten it up once it is working.
I understand this is what one would normally use in a multi-user
environment.  But it can't hurt for testing, right?

> Is /etc/mail/bayes writeable by amavisd? 
Yes, from "3b" in my lists above:  /etc/mail/bayes is wide open right now.

[root@mail2 amavisd]# ls -la /etc/mail/bayes
total 4196
drwxrwxrwx 2 amavis amavis    4096 Aug  9 13:49 .
drwxr-xr-x 4 amavis amavis    4096 Aug  3 13:02 ..
-rwxrwxrwx 1 amavis amavis   86016 Aug  9 09:51 bayes_seen
-rwxrwxrwx 1 amavis amavis 5246976 Aug  9 13:49 bayes_toks
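
The same check can be scripted so it can be repeated under the account that does the scanning. A minimal Python sketch, assuming the Bayes files are named <prefix>_toks, <prefix>_seen and so on as in the listing above; the function name is hypothetical.

```python
import glob
import os
import stat


def describe_bayes_files(path_prefix):
    """Report mode bits, owner uid and write access for files named <prefix>_*."""
    report = []
    pattern = glob.escape(path_prefix) + "_*"
    for path in sorted(glob.glob(pattern)):
        info = os.stat(path)
        report.append({
            "path": path,
            "mode": oct(stat.S_IMODE(info.st_mode)),
            "uid": info.st_uid,
            # os.access answers for the user running this script, so run it
            # as the scanning user to get a meaningful result.
            "writable": os.access(path, os.W_OK),
        })
    return report


# bayes_path from local.cf in this thread; prints nothing if no files match.
for entry in describe_bayes_files("/etc/mail/bayes/bayes"):
    print(entry)
```

Run as the amavis user (for example through su amavis -c, as with the sa-learn commands earlier in the thread), this shows whether that account can actually write bayes_toks and bayes_seen, rather than inferring it from the mode string alone.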




