Posted to users@spamassassin.apache.org by joe a <jo...@j4computers.com> on 2023/02/13 22:42:19 UTC

BAYES_00 BODY. Negative score?

Have some annoying SPAM that consistently shows a negative score on 
BAYES.  Is this the default scoring, or is it influenced by BAYES in some way?

*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*      [score: 0.0000]

SpamAssassin 3.4.5

Thanks for any pointers.

Re: BAYES_00 BODY. Negative score?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>>On 13.02.23 17:42, joe a wrote:
>>>Have some annoying SPAM that consistently shows a negative score 
>>>on BAYES.  Is this the default scoring, or is it influenced by BAYES 
>>>in some way?
>>>
>>>*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>>>*      [score: 0.0000]

>On 2/14/2023 2:56 AM, Matus UHLAR - fantomas wrote:
>>This indicates a mistrained database, which means you have trained 
>>too many spams or spam-like messages (commercial messages) as ham.
>>
>>Proper training of spams should help. Just keep your spam (and 
>>optionally ham) corpora for retraining in case you would drop the 
>>database.
>>
>>I also recommend abstaining from training commercial mail (notices 
>>from e-shops, companies you have done business with, etc.) as ham, unless 
>>they generate a BAYES_999 score and you want it lower.  I often train 
>>them as spam so that they give an uncertain BAYES_50 result.
>>
>>Those mails resemble spam too much to be used for training.

On 14.02.23 17:37, joe a wrote:
>The term "proper training" has always seemed a bit problematic to me. 
>That aside, I am getting an error when attempting:
>
>sa-learn -D --spam /var/mail/spamd/Cabinet.saved-spam

just FYI, there are multiple ways to train:

spamassassin -r < mail
- will train single message as spam. 

spamc -C spam < mail
- will tell spamd to train message as spam. spamd must run with -l 
(--allow-tell) option to do that

sa-learn --spam mail
- will train a single message as spam.

sa-learn --mbox --spam mbox
- will train multiple messages in single file in mbox format.

spamd must run as root with -H option in order to train your own database, 
unless you use sql/redis for bayes storage.

when using amavis, spamd is not used and the database is stored under amavis 
users' home directory (unless you changed DB to sql/redis).

you can still use spamassassin or sa-learn, but either run it under su/sudo 
or point it at the amavis database with --dbpath:

su amavisd -c "spamassassin -r" < message

sa-learn --dbpath /var/lib/amavis/.spamassassin/ --mbox --spam mbox


when you scan messages larger than the standard 500K limit, you must also 
raise the size limit for trained messages.
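
A retraining run built from the commands above might look like the sketch 
below. The names run and retrain_bayes, the DRY_RUN variable and the mbox 
paths are placeholders invented for illustration. Note that sa-learn --clear 
wipes the existing Bayes database, so the sketch only prints the commands 
unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 (hypothetical knob).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Drop the Bayes database, then relearn from a spam mbox and a ham mbox.
# Both file names are supplied by the caller; nothing here is site-specific.
retrain_bayes() {
    spam_mbox=$1
    ham_mbox=$2
    run sa-learn --clear &&
    run sa-learn --mbox --spam "$spam_mbox" &&
    run sa-learn --mbox --ham "$ham_mbox"
}
```

Calling retrain_bayes spam.mbox ham.mbox first in dry-run mode shows exactly 
what would be executed. Run it as the user that owns the database, or add 
--dbpath as described above.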


>The last line shows:
>
>***************
>Learned tokens from 0 message(s) (1 message(s) examined)
>ERROR: the Bayes learn function returned an error, please re-run with 
>-D for more information at /usr/bin/sa-learn line 500.
>***************
>
>Which may be permissions related.  However, there seem to be some 
>errors/warning at the beginning, starting with:
>
>***************
>Feb 14 17:26:14.956 [2855] dbg: plugin: loading 
>Mail::SpamAssassin::Plugin::Razor2 from @INC
>Feb 14 17:26:14.959 [2855] dbg: razor2: razor2 is not available
>Feb 14 17:26:14.959 [2855] dbg: plugin: loading 
>Mail::SpamAssassin::Plugin::SpamCop from @INC
>plugin: failed to parse plugin (from @INC): Can't locate 
>Mail/SpamAssassin/Plugin/SpamCop.pm: 
>lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 
>44) line 1.

these have nothing to do with training, although spamcop.pm can be used to 
report mail to spamcop.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Despite the cost of living, have you noticed how popular it remains?

Re: BAYES_00 BODY. Negative score?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>On Fri, 2023-02-17 at 10:54 -0500, joe a wrote:
>> Could it have been that simple?

On 17.02.23 16:44, Martin Gregorie wrote:
>If, like myself, you find reference books useful, you may want to get a
>copy of "Linux in a Nutshell" - an O'Reilly book.
>
>It tends to assume you know at least one other OS fairly well, is well
>organised and concise. I've also found "Debian Reference"
>
> http://www.debian.org/doc/manuals/debian-reference/
>
>useful for most flavours of Linux (I use Fedora and Raspbian)

reading such a book is a good idea, but I think this is more of a SA bug: @INC 
contains something that references "." or "..", which it should not, and 
which causes perl to fail when it can't open a directory in @INC 
(and perl has documented this behaviour, iirc).
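
One way to test this theory is to look for @INC entries that are not absolute 
paths, since those are resolved against the current working directory. The 
sketch below is only an illustration: relative_entries is a made-up helper 
name, and it shows only what is piped into it. Entries that the sa-learn 
script adds to @INC at run time would not appear in perl's own list.

```shell
# Read one directory per line on stdin and print those that are not absolute.
relative_entries() {
    grep -v '^/'
}

# Possible usage, run as the affected user:
#   perl -e 'print "$_\n" for @INC' | relative_entries
```

An empty result means perl's built-in list is clean, and the relative "lib/..." 
path in the error message must come from somewhere else.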


Re: BAYES_00 BODY. Negative score?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>On 2/17/2023 8:24 PM, joe a wrote:
>>Did a simple test today sending an email from a gmail account to two 
>>email accounts on my system.   The only difference was the email 
>>address, both were on the same "To:" line in the composed messages.
>>
>>They receive wildly different BAYES scores.

as was mentioned, they were apparently tested under different users.

I want to add that multiple similar messages can have invisible differences 
which can result in different BAYES results.

Just this week I noticed (at least) two phishing waves: training on one e-mail 
pushed some other e-mails' scores up to BAYES_999, while others 
still had BAYES_50 or BAYES_80.

Simply put, we need more training.

On 17.02.23 23:46, Jared Hall wrote:
>Try rattling off another Gmail message, but this time switch the two 
>Email addresses on the "To:" line around. Maybe a case where only the 
>first Email address is looked at by SA?

if needed, scan the same mail under a different user, if possible.


Re: BAYES_00 BODY. Negative score?

Posted by Jared Hall <ja...@jaredsec.com>.
On 2/17/2023 8:24 PM, joe a wrote:
> On 2/17/2023 3:25 PM, joe a wrote:
>
> Did a simple test today sending an email from a gmail account to two 
> email accounts on my system.   The only difference was the email 
> address, both were on the same "To:" line in the composed messages.
>
> They receive wildly different BAYES scores.
>
Try rattling off another Gmail message, but this time switch the two 
Email addresses on the "To:" line around. Maybe a case where only the 
first Email address is looked at by SA?

Thanks,

Jared Hall


Re: BAYES_00 BODY. Negative score?

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 2023-02-17 at 22:41:05 UTC-0500 (Fri, 17 Feb 2023 19:41:05 -0800)
Loren Wilton <lw...@earthlink.net>
is rumored to have said:

>> They receive wildly different BAYES scores.
>> * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>> *      [score: 0.0002]
>> *  2.2 BAYES_20 BODY: Bayes spam probability is 5 to 20%
>> *      [score: 0.0881]
>
> This looks like you have per-user Bayes databases, and the message 
> type has been trained differently in each.
>
> Also, it looks like there are per-user rules, since BAYES_50 has a 
> normal score of 0.2, and there is no reason BAYES_20 (indicating much 
> less spammy) should have a score of 2.2.

Absolutely correct.

However, that does not prove definitively that there are per-user Bayes 
DBs & rules, just that the BAYES_20 score is insane. The difference 
between 8.81% and 0.02% isn't very meaningful. It isn't accidental that 
SA doesn't have finer categories of Bayes scores.
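
For orientation, the coarse buckets can be sketched as a small lookup. Only 
the BAYES_00 (0 to 1%) and BAYES_20 (5 to 20%) ranges are confirmed by the 
reports quoted in this thread; the other boundaries follow the stock rule 
names as I understand them and should be treated as assumptions, as should 
the handling of values that sit exactly on a boundary. bayes_bucket is a 
made-up helper name, not part of SpamAssassin.

```shell
# Map a Bayes probability (0..1) to the name of the rule bucket it falls in.
# BAYES_999 (above 99.9%) fires in addition to BAYES_99 and is not modelled.
bayes_bucket() {
    awk -v p="$1" 'BEGIN {
        if      (p <= 0.01) print "BAYES_00"
        else if (p <= 0.05) print "BAYES_05"
        else if (p <= 0.20) print "BAYES_20"
        else if (p <= 0.40) print "BAYES_40"
        else if (p <= 0.60) print "BAYES_50"
        else if (p <= 0.80) print "BAYES_60"
        else if (p <= 0.95) print "BAYES_80"
        else if (p <= 0.99) print "BAYES_95"
        else                print "BAYES_99"
    }'
}
```

With the two values from the reports, bayes_bucket 0.0002 lands in BAYES_00 
and bayes_bucket 0.0881 in BAYES_20, so a small shift in probability is 
enough to cross into a different rule.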


-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

Re: BAYES_00 BODY. Negative score?

Posted by hg user <me...@gmail.com>.
Please run

spamassassin -D bayes -t file.eml 2>/tmp/z

and in /tmp/z you will have the score assigned to the "tokens"... from
those points you will understand what created the different totals.
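
To pull just the per-token lines out of that debug file, something like the 
following might help. It assumes the debug lines have the form 
"dbg: bayes: token '<token>' => <probability>", which should be checked 
against your own /tmp/z before relying on it; bayes_tokens is a made-up 
helper name.

```shell
# Print "probability token" pairs from a SpamAssassin debug log, lowest first.
# Assumes plain decimal probabilities; other number formats may sort oddly.
bayes_tokens() {
    grep "bayes: token '" "$1" |
        sed "s/.*bayes: token '\(.*\)' => \([0-9.]*\).*/\2 \1/" |
        sort -n
}

# Possible usage:
#   bayes_tokens /tmp/z
```

Running it on the debug output of both copies of a message and comparing 
the two lists should show which tokens differ.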

If you can, you may relearn all the messages, both ham and spam, using the
tip suggested a couple of days ago: removing all the headers. It may lower
the score of some spam, but it's probably better.

On Sat, Feb 18, 2023 at 3:37 PM joe a <jo...@j4computers.com> wrote:

> On 2/17/2023 10:41 PM, Loren Wilton wrote:
> >> They receive wildly different BAYES scores.
> >> * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> >> *      [score: 0.0002]
> >> *  2.2 BAYES_20 BODY: Bayes spam probability is 5 to 20%
> >> *      [score: 0.0881]
> >
> > This looks like you have per-user Bayes databases, and the message type
> > has been trained differently in each.
> >
> > Also, it looks like there are per-user rules, since BAYES_50 has a
> > normal score of 0.2, and there is no reason BAYES_20 (indicating much
> > less spammy) should have a score of 2.2.
> >
>
> Per-user is not set up.
>
> This morning I sent the message again, with users reversed in the TO:
> field and the scores are identical.  This may prove nothing as I
> thoughtlessly added the high score message to my "HAM" folder and it was
> processed.
>
> While the scores are identical the X-Spam-Report lists them in different
> order, while X-Spam-Status shows them identically, "RCVD_IN_MSPIKE_H2
> RBL" being listed near the top in one and near the bottom in the other.
>
> Perhaps that is meaningless, but it pings my curiosity.
>
>
>
>
>

Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/17/2023 10:41 PM, Loren Wilton wrote:
>> They receive wildly different BAYES scores.
>> * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>> *      [score: 0.0002]
>> *  2.2 BAYES_20 BODY: Bayes spam probability is 5 to 20%
>> *      [score: 0.0881]
> 
> This looks like you have per-user Bayes databases, and the message type 
> has been trained differently in each.
> 
> Also, it looks like there are per-user rules, since BAYES_50 has a 
> normal score of 0.2, and there is no reason BAYES_20 (indicating much 
> less spammy) should have a score of 2.2.
> 

Per-user is not set up.

This morning I sent the message again, with users reversed in the TO: 
field and the scores are identical.  This may prove nothing as I 
thoughtlessly added the high score message to my "HAM" folder and it was 
processed.

While the scores are identical the X-Spam-Report lists them in different 
order, while X-Spam-Status shows them identically, "RCVD_IN_MSPIKE_H2 
RBL" being listed near the top in one and near the bottom in the other.

Perhaps that is meaningless, but it pings my curiosity.





Re: BAYES_00 BODY. Negative score?

Posted by Loren Wilton <lw...@earthlink.net>.
> They receive wildly different BAYES scores.
> * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> *      [score: 0.0002]
> *  2.2 BAYES_20 BODY: Bayes spam probability is 5 to 20%
> *      [score: 0.0881]

This looks like you have per-user Bayes databases, and the message type has 
been trained differently in each.

Also, it looks like there are per-user rules, since BAYES_50 has a normal 
score of 0.2, and there is no reason BAYES_20 (indicating much less spammy) 
should have a score of 2.2.


Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/17/2023 3:25 PM, joe a wrote:

Did a simple test today sending an email from a gmail account to two 
email accounts on my system.   The only difference was the email 
address, both were on the same "To:" line in the composed messages.

They receive wildly different BAYES scores.
------------------------------
X-Spam-Checker-Version: SpamAssassin 3.4.5 (2021-03-20) on myserver
X-Spam-Level: *
X-Spam-Status: No, score=1.1 required=4.9 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,HTML_MESSAGE,
	IXHASH_X1,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_SOFTFAIL
	autolearn=disabled version=3.4.5
X-Spam-Report:
	* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
	*      [score: 0.0002]
------------------------------

X-Spam-Checker-Version: SpamAssassin 3.4.5 (2021-03-20) on myserver
X-Spam-Flag: YES
X-Spam-Level: *****
X-Spam-Status: Yes, score=5.2 required=4.9 tests=BAYES_20,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,HTML_MESSAGE,
	IXHASH_X1,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_SOFTFAIL
	autolearn=disabled version=3.4.5
X-Spam-Report:
	*  2.2 BAYES_20 BODY: Bayes spam probability is 5 to 20%
	*      [score: 0.0881]
------------------------------

Just another sign of BAYES wackiness? More evidence of need for rebuild?




Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/17/2023 11:44 AM, Martin Gregorie wrote:
> On Fri, 2023-02-17 at 10:54 -0500, joe a wrote:
> 
>> Could it have been that simple?
>>
> If, like myself, you find reference books useful, you may want to get a
> copy of "Linux in a Nutshell" - an O'Reilly book.
> 
> It tends to assume you know at least one other OS fairly well, is well
> organised and concise. I've also found "Debian Reference"
> 
>   http://www.debian.org/doc/manuals/debian-reference/
> 
> useful for most flavours of Linux (I use Fedora and Raspbian)
> 
> Martin
> 

There was also a "Unix in a Nutshell".  I found it amusing, in my 
NetWare days, to have a copy on my desk and offer it to the Unix-oids 
that meandered in from time to time and liked to scoff at "security by 
obscurity" and those "Puny PC's you call Servers".  (That from folks 
that swore sendmail was forever king and operated the email server as an 
open relay).

A bit of an issue when I offered that the book should be called "Nuts, 
in a Unix Shell". . . Ah, the memories . . .



Re: BAYES_00 BODY. Negative score?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Fri, 2023-02-17 at 10:54 -0500, joe a wrote:

> Could it have been that simple?
> 
If, like myself, you find reference books useful, you may want to get a
copy of "Linux in a Nutshell" - an O'Reilly book.

It tends to assume you know at least one other OS fairly well, is well
organised and concise. I've also found "Debian Reference"

 http://www.debian.org/doc/manuals/debian-reference/

useful for most flavours of Linux (I use Fedora and Raspbian)

Martin


Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/17/2023 4:42 AM, Matus UHLAR - fantomas wrote:
> On 16.02.23 15:57, joe a wrote:
>> Re-energized having recently heroically wrestled an elusive issue (to 
>> me) into surrender . . . we now turn to another issue.
>>
>> Probably I need to retrain BAYES "From scratch".  I have a mess 
>> (years?) of stored sample emails that can be relearned.
>>
>> I understand that sa-learn should be run as the same user as spamd, 
>> however I find it has always been run as root and when running as the 
>> spamassassin user results in errors, such as:
>>
>> ~su -c "sa-learn --spam /var/mail/spamd/Cabinet.Missed-SPAM" spamfilter
>>
>> results in errors, starting with:
>>
>> plugin: failed to parse plugin (from @INC): Can't locate 
>> Mail/SpamAssassin/Plugin/SpamCop.pm: 
>> lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 
>> 44) line 1.
>>
>> plugin: failed to parse plugin (from @INC): Can't locate 
>> Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: 
>> lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: Permission denied 
>> at (eval 45) line 1.
> 
> try first changing current working directory into one readable by user 
> "spamfilter", perhaps root (/).
> 

Could it have been that simple?

Yes, apparently it was.

Many thanks.

joe a.

Re: BAYES_00 BODY. Negative score?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 16.02.23 15:57, joe a wrote:
>Re-energized having recently heroically wrestled an elusive issue (to 
>me) into surrender . . . we now turn to another issue.
>
>Probably I need to retrain BAYES "From scratch".  I have a mess 
>(years?) of stored sample emails that can be relearned.
>
>I understand that sa-learn should be run as the same user as spamd, 
>however I find it has always been run as root and when running as the 
>spamassassin user results in errors, such as:
>
>~su -c "sa-learn --spam /var/mail/spamd/Cabinet.Missed-SPAM" spamfilter
>
>results in errors, starting with:
>
>plugin: failed to parse plugin (from @INC): Can't locate 
>Mail/SpamAssassin/Plugin/SpamCop.pm: 
>lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 
>44) line 1.
>
>plugin: failed to parse plugin (from @INC): Can't locate 
>Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: 
>lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: Permission denied 
>at (eval 45) line 1.

try first changing current working directory into one readable by user 
"spamfilter", perhaps root (/).
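
The relative "lib/..." path in the error suggests the lookup was resolved 
against the current working directory, which the restricted user could not 
read. A minimal wrapper for running any command from "/" could look like 
this; run_from_root_dir is a made-up name used only for illustration.

```shell
# Run the given command from "/" in a subshell, so the caller's own
# working directory is left unchanged.
run_from_root_dir() {
    ( cd / && "$@" )
}

# Possible usage with the command from this thread (needs root):
#   run_from_root_dir su -c "sa-learn --spam /var/mail/spamd/Cabinet.Missed-SPAM" spamfilter
```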


Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
. . .
>> it also runs with another environment, so it may miss PATHes or @INC 
>> directories.
> 
> That throws me a curve.  What is an @INC directory?  SA specific?
> I do not find any with the locate command, but if they are actual 
> directories I may need to escape the @ sign somehow.  \ does not seem to do it.
> 

I begin to see.  It is a perl thing.  I knew I should not have left that 
camel at the oasis.


Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/16/2023 5:32 PM, hg user wrote:
> 
> 
> On Thu, Feb 16, 2023 at 9:57 PM joe a <joea-lists@j4computers.com> wrote:
> 
> 
>     plugin: failed to parse plugin (from @INC): Can't locate
>     Mail/SpamAssassin/Plugin/SpamCop.pm:
>     lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 44)
>     line 1.
> 
> 
> root can do anything. a restricted user can't: it's only allowed to do 
> what others allowed it.
> 
> it also runs with another environment, so it may miss PATHes or @INC 
> directories.

That throws me a curve.  What is an @INC directory?  SA specific?
I do not find any with the locate command, but if they are actual 
directories I may need to escape the @ sign somehow.  \ does not seem to do it.

> You should locate the SpamCop.pm file and list the owner and ACL.

This I have done, with no change, even to the point of using the -R 
option starting at /usr/lib/perl5/vendor_perl/5.26.1/Mail


> As user spamfilter run spamassassin with -D and see in the first lines 
> if you have similar errors.

Done that.  It is impressively more verbose, but I did not detect any 
more errors.

> Also check permission of /var/mail/spamd/Cabinet.Missed-SPAM. I had 
> permission problems trying to sa-learn files owned by root.
> 

That I found and fixed some time back.

> 
>     Running with the -D option does produce more, after that list of
>     permission denied items
> 
>     Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set
>     for
>     STOX_REPLY_TYPE_WITHOUT_QUOTES
> 
> 
> These are not permission errors but warnings about the rules having no 
> text descriptions. It's ok.
> 
> 
> 

Re: BAYES_00 BODY. Negative score?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Thu, 2023-02-16 at 23:32 +0100, hg user wrote:
> root can do anything. a restricted user can't: it's only allowed to do
> what
> others allowed it.
> 
> it also runs with another environment, so it may miss PATHes or @INC
> directories.
> 
You can check this by running 

env | less

from a command line under the appropriate user and making sure that all
the environment variables you expect to see defined are, and have the
values you expect.

Martin
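
Rather than eyeballing two env listings, one could dump each to a file (for 
example "env > /tmp/env.root" as root and "su -c env spamfilter > 
/tmp/env.spamfilter") and list the variable names that one user has and the 
other lacks. The helper name missing_vars and the file names are placeholders, 
and values that span several lines would confuse this simple sketch.

```shell
# Print names of variables present in env dump $1 but absent from dump $2.
missing_vars() {
    a=$(mktemp)
    b=$(mktemp)
    cut -d= -f1 "$1" | sort -u > "$a"
    cut -d= -f1 "$2" | sort -u > "$b"
    comm -23 "$a" "$b"   # lines that appear only in the first list
    rm -f "$a" "$b"
}

# Possible usage:
#   missing_vars /tmp/env.root /tmp/env.spamfilter
```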



Re: BAYES_00 BODY. Negative score?

Posted by hg user <me...@gmail.com>.
On Thu, Feb 16, 2023 at 9:57 PM joe a <jo...@j4computers.com> wrote:

>
> plugin: failed to parse plugin (from @INC): Can't locate
> Mail/SpamAssassin/Plugin/SpamCop.pm:
> lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 44)
> line 1.
>

root can do anything. a restricted user can't: it's only allowed to do what
others allowed it.

it also runs with another environment, so it may miss PATHes or @INC
directories.

You should locate the SpamCop.pm file and list the owner and ACL.

As user spamfilter run spamassassin with -D and see in the first lines if
you have similar errors.

Also check permission of /var/mail/spamd/Cabinet.Missed-SPAM. I had
permission problems trying to sa-learn files owned by root.



> Running with the -D option does produce more, after that list of
> permission denied items
>
> Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for
> STOX_REPLY_TYPE_WITHOUT_QUOTES
>

These are not permission errors but warnings about the rules having no text
descriptions. It's ok.


>
>

Re: BAYES_00 BODY. Negative score?

Posted by Jared Hall <ja...@jaredsec.com>.
On 2/16/2023 9:13 PM, joe a wrote:
>
> Well, I am in unfamiliar waters.
>
> picking one error message as typical:
>
> plugin: failed to parse plugin (from @INC): Can't locate 
> Mail/SpamAssassin/Plugin/iXhash2.pm: 
> lib/Mail/SpamAssassin/Plugin/iXhash2.pm: Permission denied at (eval 
> 1746) line 1.
>
> The file locations shown do not exist, as explicitly as shown. What I 
> find using "locate iXhash2.pm" is:
>
> /usr/lib/perl5/vendor_perl/5.26.1/Mail/SpamAssassin/Plugin/iXhash2.pm
> which the SA user can access, at least see via ll. The others I've 
> checked are also visible, and directories are x (executable).
>
> The sense I am getting is there is a perl file that contains these 
> paths that is referred to as @INC.
>
> I don't have the knowledge at this point to see if, somehow, root sees 
> the files as shown in the error or if the path is somehow altered for 
> the SA user.
>
> Thanks for any guidance.

Sounds like you've got a case of discombobulated PERL.  @INC is a list 
of directories to search for modules to INCLUDE at a PERL script's 
compile time.  It is usually baked into the PERL executable by the 
package builder.  Who really knows why they put things where they do?

perl -V gives verbose output.  @INC paths should be listed near, or at, 
the bottom.
To simplify that with a one-liner:

perl -e 'printf "%d %s\n", $i++, $_ for @INC'

How you got into that state is a mystery to me.

I see a few symbolic links in your future.  For example:
ln -s /usr/lib/perl5/vendor_perl/5.26.1/Mail /lib/Mail

Or, if you have to be more specific (say, /lib/Mail exists already), 
something like:
ln -s /usr/lib/perl5/vendor_perl/5.26.1/Mail/SpamAssassin /lib/Mail/SpamAssassin

etc...

--Jared Hall






Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/16/2023 8:28 PM, Matija Nalis wrote:
> 
> On Thu, Feb 16, 2023 at 05:34:37PM -0500, joe a wrote:
>> Oh, of course.  I installed as root initially, being foolish perhaps, but
>> did create a specific user "later" and adjusted permissions as needed.  Or,
>> so I thought.
> 
> well, installing as root (especially with restrictive umask) manually
> (e.g. "make install" or "cpan" vs. "yum/rpm/dpkg") may often make
> problems, even if you later switch to packages (you need to look not
> only at final file permissions, but at directories leading up to it
> too).
> 
> namei -l /path/to/file.pm is often helpful to quickly check ALL
> permissions needed to access file (+x on directories is a must)
> 
>> Permissions are (almost) certainly the issue.  Now having the impressive
>> locate/mlocate creature at my command, I might actually make progress.
> 
> I usually troubleshoot those (if log is insufficient) with:
> 
> strace -efile -o /tmp/sa.log spamassassin foobar
> 
> then look at /tmp/sa.log to see which open/stat/access returned -1 EPERM
> or EACCES error.  Then check all path components for that file using
> "namei -l" (or multiple "ls -ld"). Then try to su to that user and
> "cat" that file manually.
> 
> If not regular DAC (chmod/chown) permissions, it might also be SELINUX
> restrictions or more rarely ACL (getfacl(1)).
> 

Well, I am in unfamiliar waters.

picking one error message as typical:

plugin: failed to parse plugin (from @INC): Can't locate 
Mail/SpamAssassin/Plugin/iXhash2.pm: 
lib/Mail/SpamAssassin/Plugin/iXhash2.pm: Permission denied at (eval 
1746) line 1.

The file locations shown do not exist, as explicitly as shown.  What I 
find using "locate iXhash2.pm" is:

/usr/lib/perl5/vendor_perl/5.26.1/Mail/SpamAssassin/Plugin/iXhash2.pm
which the SA user can access, at least see via ll. The others I've 
checked are also visible, and directories are x (executable).

The sense I am getting is there is a perl file that contains these paths 
that is referred to as @INC.

I don't have the knowledge at this point to see if, somehow, root sees 
the files as shown in the error or if the path is somehow altered for 
the SA user.

Thanks for any guidance.

Re: BAYES_00 BODY. Negative score?

Posted by Matija Nalis <mn...@voyager.hr>.
On Thu, Feb 16, 2023 at 05:34:37PM -0500, joe a wrote:
> Oh, of course.  I installed as root initially, being foolish perhaps, but
> did create a specific user "later" and adjusted permissions as needed.  Or,
> so I thought.

well, installing as root (especially with restrictive umask) manually
(e.g. "make install" or "cpan" vs. "yum/rpm/dpkg") may often make
problems, even if you later switch to packages (you need to look not
only at final file permissions, but at directories leading up to it
too).

namei -l /path/to/file.pm is often helpful to quickly check ALL
permissions needed to access file (+x on directories is a must)

> Permissions are (almost) certainly the issue.  Now having the impressive
> locate/mlocate creature at my command, I might actually make progress.

I usually troubleshoot those (if log is insufficient) with:

strace -efile -o /tmp/sa.log spamassassin foobar

then look at /tmp/sa.log to see which open/stat/access returned -1 EPERM
or EACCES error.  Then check all path components for that file using
"namei -l" (or multiple "ls -ld"). Then try to su to that user and
"cat" that file manually.

If not regular DAC (chmod/chown) permissions, it might also be SELINUX
restrictions or more rarely ACL (getfacl(1)).
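
The two steps above (find the denied paths in the strace log, then inspect 
every directory leading to them) could be scripted roughly as follows. The 
helper names denied_paths and walk_path are made up, the sed expression 
assumes the usual strace line layout in which the first quoted string is the 
path, and walk_path is only a rough stand-in for "namei -l" on systems that 
lack it.

```shell
# List the paths whose open/stat/access calls failed with EACCES or EPERM.
denied_paths() {
    grep -E '= -1 (EACCES|EPERM)' "$1" |
        sed 's/^[^"]*"\([^"]*\)".*/\1/' |
        sort -u
}

# Show "ls -ld" for every component of an absolute path, from / downwards.
walk_path() {
    rest=${1#/}
    current=""
    ls -ld /
    while [ -n "$rest" ]; do
        part=${rest%%/*}
        current="$current/$part"
        ls -ld "$current" || return 1
        case $rest in
            */*) rest=${rest#*/} ;;
            *)   rest="" ;;
        esac
    done
}

# Possible usage:
#   denied_paths /tmp/sa.log
#   walk_path /usr/lib/perl5/vendor_perl/5.26.1/Mail/SpamAssassin/Plugin/SpamCop.pm
```

A relative path in the denied_paths output (one that does not start with "/") 
is itself a hint: it was resolved against the working directory of the 
process, not against the perl library tree.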

-- 
Opinions above are GNU-copylefted.

Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/17/2023 7:37 AM, Reindl Harald wrote:
> 
> 
> Am 16.02.23 um 23:34 schrieb joe a:
>>>> I have no idea what you refer to when you state "don't user proper 
>>>> packages".  "Proper" in what sense? A rhetorical question.
>>>
>>> i have no idea how you installed SA but rpm packages or debs usually 
>>> have correct permissions
>>
>> Oh, of course.  I installed as root initially, being foolish perhaps
> 
> you *must* install software as root because the service *must not* have 
> write permissions to its own binary files
> 
>> but did create a specific user "later" and adjusted permissions as 
>> needed.  Or, so I thought
> 
> the real question was HOW DID YOU INSTALL it
> 
> from the first day i maintained production servers i learnt to build my 
> own rpm packages - no matter if it's software written in C, PHP or Perl
> 
> why?
> 
> * because you get rid of leftover files over the years
> * permissions are part of the package
> * the package manager detects many conflicts

One of the first things I learned when assembling things or attempting 
to learn something new is to follow the instructions and only attempt 
to vary from them once you absolutely understand what you are doing. 
Or, suffer the consequences along with the (rare) accolades for 
improving a process.

That said, I would never "build my own rpm package" in this context.

This is almost entirely a "home/office" system that seems low traffic.

So, I installed postfix and spamassassin initially from the OS vendor 
supplied packages. Over the years I applied updates from outside the OS 
vendor channel, from packages from "authors" sites, as the versions 
diverged enough to be a concern.  There have been some OS updates as 
well and at least one transfer from one VM to another.

All this appears to be a digression to me.  The issue seems to be 
why root sees the stuff in this @INC entity differently from how the SA 
user sees it.

With the insights and pointers gained in this thread, I hope to solve 
that sometime soon.



Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
. . .
>>
>> I have no idea what you refer to when you state "don't use proper 
>> packages".  "Proper" in what sense? A rhetorical question.
> 
> i have no idea how you installed SA but rpm packages or debs usually 
> have correct permissions

Oh, of course.  I installed as root initially, being foolish perhaps, 
but did create a specific user "later" and adjusted permissions as 
needed.  Or, so I thought.

>> Mlocate is (was) not installed in this particular system but promises 
>> to be useful in the future, regardless of your intent.  "find" has 
>> always been my go to tool.  Such as it is.
>>
>> Still it remains to be determined why root user can run sa-learn 
>> without error while another whose permissions are more constrained, 
>> cannot.
>>
>> And that, regardless of root (!) cause, would seem to be an SA topic
> 
> because the file permissions are obviously wrong which isn't a SA topic 
> - SA can't do anything when you mess your local permissions
> 

Permissions are (almost) certainly the issue.  Now having the impressive 
locate/mlocate creature at my command, I might actually make progress.

Thanks for the help.




Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/16/2023 4:30 PM, Reindl Harald wrote:
> 
> 
> Am 16.02.23 um 21:57 schrieb joe a:
>> I understand that sa-learn should be run as the same user as spamd, 
>> however I find it has always been run as root and when running as the 
>> spamassassin user results in errors, such as:
>>
>> ~su -c "sa-learn --spam /var/mail/spamd/Cabinet.Missed-SPAM" spamfilter
>>
>> results in errors, starting with:
>>
>> plugin: failed to parse plugin (from @INC): Can't locate 
>> Mail/SpamAssassin/Plugin/SpamCop.pm: 
>> lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 
>> 44) line 1.
>>
>> plugin: failed to parse plugin (from @INC): Can't locate 
>> Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: 
>> lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: Permission denied 
>> at (eval 45) line 1.
>>
>> One might presume this to be a permissions issue (where would I get 
>> THAT idea?) but permissions to what?  As I cannot seem to find the 
>> items mentioned even as root.
> 
> when you don't use proper packages and can't even update your mlocate 
> database so that "locate SpamAssassin/Plugin/AutoLearnThreshold" works, 
> that's hardly a SA topic
> 
> [root@mail-gw:~]$ rpm -q --file /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
> spamassassin-3.4.6-5.fc36.x86_64
> 
> [root@mail-gw:~]$ rpm -q --file /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/SpamCop.pm
> spamassassin-3.4.6-5.fc36.x86_64

I have no idea what you refer to when you state "don't use proper 
packages".  "Proper" in what sense? A rhetorical question.

Mlocate is (was) not installed in this particular system but promises to 
be useful in the future, regardless of your intent.  "find" has always 
been my go to tool.  Such as it is.

Still it remains to be determined why root user can run sa-learn without 
error while another whose permissions are more constrained, cannot.

And that, regardless of root (!) cause, would seem to be an SA topic.
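
One possible explanation, offered as an assumption rather than something confirmed in this thread: the failing path in the error (lib/Mail/SpamAssassin/Plugin/SpamCop.pm) is relative, so it is resolved against the current working directory. If su is started from a directory the target user cannot enter (root's home, for example), that lookup can fail with "Permission denied", while root is never refused. A minimal sketch that changes to "/" before running sa-learn; the helper name learn_as and the DRY_RUN variable are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): run sa-learn as another
# user from "/", so relative @INC entries such as "lib" are not resolved
# against a directory that user cannot read.
# Usage: learn_as USER spam|ham PATH
learn_as() {
    user=$1
    kind=$2      # "spam" or "ham"
    corpus=$3    # file or directory to learn from
    cmd="cd / && sa-learn --$kind '$corpus'"
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print the command that would be run.
        printf '%s\n' "su -c \"$cmd\" $user"
    else
        su -c "$cmd" "$user"
    fi
}
```

For example, `DRY_RUN=1 learn_as spamfilter spam /var/mail/spamd/Cabinet.Missed-SPAM` prints the su command without running it. If the errors persist when run from "/", the cause lies elsewhere.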


Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/14/2023 6:09 PM, joe a wrote:
> Please let this sit for a while, I've discovered a fundamental issue 
> with my scheme of feeding messages to BAYES.  Unfortunately I was 
> remiss, apparently, it setting up logging for some bits, so have no idea 
> how long this has been failing.
> 
> Sorry for the clutter.
> 
> joe a.
> 

Re-energized, having recently and heroically wrestled an elusive (to me) 
issue into surrender . . . we now turn to another issue.

I probably need to retrain BAYES "from scratch".  I have a mess (years' 
worth?) of stored sample emails that can be relearned.

I understand that sa-learn should be run as the same user as spamd, 
however I find it has always been run as root and when running as the 
spamassassin user results in errors, such as:

~su -c "sa-learn --spam /var/mail/spamd/Cabinet.Missed-SPAM" spamfilter

results in errors, starting with:

plugin: failed to parse plugin (from @INC): Can't locate 
Mail/SpamAssassin/Plugin/SpamCop.pm: 
lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 44) 
line 1.

plugin: failed to parse plugin (from @INC): Can't locate 
Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: 
lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: Permission denied at 
(eval 45) line 1.

One might presume this to be a permissions issue (where would I get THAT 
idea?) but permissions to what?  As I cannot seem to find the items 
mentioned even as root.

Running with the -D option does produce more output after that list of 
permission-denied items:

Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for STOX_REPLY_TYPE_WITHOUT_QUOTES
Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for MSOE_MID_WRONG_CASE
Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for HELO_FRIEND
Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for STOX_AND_PRICE
Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for L_SPAM_TOOL_13
Feb 16 15:55:30.885 [10384] dbg: config: warning: no description set for FSL_FAKE_HOTMAIL_RVCD

Means something to someone I guess.



Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
Please let this sit for a while; I've discovered a fundamental issue 
with my scheme of feeding messages to BAYES.  Unfortunately I was 
remiss, apparently, in setting up logging for some bits, so I have no idea 
how long this has been failing.

Sorry for the clutter.

joe a.


Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/14/2023 2:56 AM, Matus UHLAR - fantomas wrote:
> On 13.02.23 17:42, joe a wrote:
>> Have some annoying SPAM that consistently shows a negative score on 
>> BAYES.  Is the default scoring or influenced by BAYES in some way?
>>
>> *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>> *      [score: 0.0000]
> 
> This indicates a mistrained database, which means you have trained too 
> many spams or spam-like messages (commercial messages) as ham.
> 
> Proper training of spams should help. Just keep your spam (and 
> optionally ham) corpora for retraining in case you would drop the database.
> 
> I also recommend to abstain from training commercial mail (notices from 
> e-shops, companies you done business with etc) as ham, unless they 
> generate BAYES_999 score and you want it lower.  I often train them as 
> spam so those give uncertain BAYES_50 result.
> 
> Those mails resemble spam too much to be used for training.
> 

All,

The term "proper training" has always seemed a bit problematic to me. 
That aside, I am getting an error when attempting:

sa-learn -D --spam /var/mail/spamd/Cabinet.saved-spam

The last line shows:

***************
Learned tokens from 0 message(s) (1 message(s) examined)
ERROR: the Bayes learn function returned an error, please re-run with -D 
for more information at /usr/bin/sa-learn line 500.
***************

This may be permissions related.  However, there seem to be some 
errors/warnings at the beginning, starting with:

***************
Feb 14 17:26:14.956 [2855] dbg: plugin: loading Mail::SpamAssassin::Plugin::Razor2 from @INC
Feb 14 17:26:14.959 [2855] dbg: razor2: razor2 is not available
Feb 14 17:26:14.959 [2855] dbg: plugin: loading Mail::SpamAssassin::Plugin::SpamCop from @INC
plugin: failed to parse plugin (from @INC): Can't locate Mail/SpamAssassin/Plugin/SpamCop.pm: lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 44) line 1.
***************

While this also suggests a permissions issue, the only place I find 
SpamCop.pm (even as root) is at: 
"/usr/lib/perl5/vendor_perl/5.26.1/Mail/SpamAssassin/Plugin/SpamCop.pm", 
which is not in the path sa-learn concocted when invoked.

Sorry if the formatting is weird or if this is useless information.

Re: BAYES_00 BODY. Negative score?

Posted by hg user <me...@gmail.com>.
He need not compare all the tokens, but a quick survey of the tokens
derived from headers can tell him how the Bayes result was formed.

A couple of weeks ago some phishing reached our inboxes. Our custom rule
gave the message 5 points, but I was surprised that the message was
categorized as BAYES_00, -1.9.

I ran the Bayes debug and found that clearly spammy words were not recognized
as spammy. Then I discovered that one admin had enabled auto-learning by
mistake and the database was full of garbage...

I cleared the db, reloaded it with our hand-selected corpus, and the message
was then BAYES_50.
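
The clear-and-reload step described above could look roughly like this with sa-learn. This is a sketch, not the poster's actual procedure: the function name, the SA_LEARN override, and the corpus locations are assumptions, while --backup, --clear, --spam, --ham and --dump magic are standard sa-learn options.

```shell
#!/bin/sh
# Hypothetical helper: back up, wipe, and retrain the Bayes database from
# hand-sorted corpora. Run it as the user that owns the Bayes database.
# SA_LEARN can be overridden (for example with "echo") to preview the calls.
# Usage: retrain_bayes SPAM_DIR HAM_DIR BACKUP_FILE
retrain_bayes() {
    spam_dir=$1
    ham_dir=$2
    backup_file=$3
    sa_learn=${SA_LEARN:-sa-learn}

    "$sa_learn" --backup > "$backup_file" || return 1   # keep a way back
    "$sa_learn" --clear || return 1                     # drop all learned tokens
    "$sa_learn" --spam "$spam_dir" || return 1          # relearn known spam
    "$sa_learn" --ham "$ham_dir" || return 1            # relearn known ham
    "$sa_learn" --dump magic                            # show the new counters
}
```

If the corpora are mbox files rather than directories of single messages, sa-learn needs the --mbox option as well.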




Re: BAYES_00 BODY. Negative score?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>> However, many of tokens in even Forbes and WP newsletters may occure in
>> different spamy newsletters, so be careful when traning even these.

On 15.02.23 09:51, Alex wrote:
>This is exactly what I was thinking. When going through the quarantine,
>it's also very difficult to always not only identify which newsletters may
>have been miscategorized or trained incorrectly, but also ever being able
>to correct an improperly trained newsletter (or email in general).

This is why I recommend not doing any training on newsletters, or at least 
no HAM training unless they are known.

>> If you get the score down enough not to be classified as spam, you've won
>> and should not contine (unless you are willing to check all BAYES_0 mail
>> for
>> suspicious newsletters and train those as spam, seeing how much it affects
>> mentioned Forbes and WP newsletters.

>Too bad it wasn't possible to build a shared list of trusted
>newsletters/senders to compensate for these mistakes.

I wouldn't trust such a list; too many organizations send their newsletters 
to anyone they (n)ever communicated with...

>On a related note, how about emails with only an image attachment? People
>use email to send pictures, screenshots and other emails with nothing in
>the body and sometimes even no subject, but aren't spam. The ones I see in
>the quarantine are almost always ham, and despite training them as ham
>(even with --max-size 0), they continue to be tagged as spam.

There are a few rules supposed to catch short/empty messages with image 
attachment.

There is also the ExtractText plugin, which supports OCR scanning with 
tesseract and should be able to extract any text in those images. But note 
that OCR takes time.

>I've always also had difficulty with marking them so DCC ignores them.

yes, from DCC's point of view they are empty messages and it's hard to score 
anything besides EMPTY_MESSAGE and rules I mentioned above.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Microsoft dick is soft to do no harm

Re: BAYES_00 BODY. Negative score?

Posted by Alex <my...@gmail.com>.
Hi,

>
> However, many of tokens in even Forbes and WP newsletters may occure in
> different spamy newsletters, so be careful when traning even these.
>

This is exactly what I was thinking. When going through the quarantine,
it's very difficult not only to identify which newsletters may have been
miscategorized or trained incorrectly, but also to correct an improperly
trained newsletter (or email in general).


> If you get the score down enough not to be classified as spam, you've won
> and should not contine (unless you are willing to check all BAYES_0 mail
> for
> suspicious newsletters and train those as spam, seeing how much it affects
> mentioned Forbes and WP newsletters.
>

Too bad it isn't possible to build a shared list of trusted
newsletters/senders to compensate for these mistakes.

On a related note, how about emails with only an image attachment? People
use email to send pictures, screenshots and other emails with nothing in
the body and sometimes even no subject, but aren't spam. The ones I see in
the quarantine are almost always ham, and despite training them as ham
(even with --max-size 0), they continue to be tagged as spam.

I've always also had difficulty with marking them so DCC ignores them.

Re: BAYES_00 BODY. Negative score?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 15.02.23 14:53, hg user wrote:
>If you run spamassassin with -D bayes -t xxx  2>debug.log
>
>in debug.log you will see all the "tokens" the bayes system extracts
>from the headers and you will probably find a lot of them related to
>mailing lists.
>
>If you teach SA that those tokens are spam and they are present both
>in WP or Forbes, their emails will be flagged. It's normal.

Don't expect anyone to manually compare tokens, unless they are deeply 
debugging Bayes functionality.

Simply put, Bayes DOES gather all possible tokens and compares their 
occurrence with remarkable effectiveness: if you train Forbes and WP 
newsletters as ham, and other newsletters as spam, Bayes should be able to 
distinguish them quite nicely.

However, many of the tokens in even Forbes and WP newsletters may occur in 
various spammy newsletters, so be careful when training even these.

If you get the score down enough for the mail not to be classified as spam, 
you've won and should not continue (unless you are willing to check all 
BAYES_00 mail for suspicious newsletters and train those as spam, watching 
how much it affects the mentioned Forbes and WP newsletters).

Bayes training is great, but one should be careful with it.


>If you want you can use bayes_ignore_header to ignore some headers.

this rarely helps.


>On 2/15/23, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
>>>>*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>>>> >*      [score: 0.0000]
>>>>
>>>> This indicates a mistrained database, which means you have trained too
>>>> many
>>>> spams or spam-like messages (commercial messages) as ham.
>>>>
>>>> Proper training of spams should help. Just keep your spam (and
>>>> optionally
>>>> ham) corpora for retraining in case you would drop the database.
>>>>
>>>> I also recommend to abstain from training commercial mail (notices from
>>>> e-shops, companies you done business with etc) as ham, unless they
>>>> generate
>>>> BAYES_999 score and you want it lower.  I often train them as spam so
>>>> those
>>>> give uncertain BAYES_50 result.
>>
>> On 14.02.23 23:05, Alex wrote:
>>>Is there any ability to distinguish a legitimate newsletter from a spam
>>>newsletter?
>>
>> Very hard.
>>
>> That's why I recommend not to train newsletters unless you know you/users
>> want them and they produce BAYES_99 result.
>>
>>
>>>In other words, if I train emails from Forbes or Washington Post as ham,
>>>then train similar newsletter emails from other other providers that are
>>>more suspect, will bayes still be able to distinguish Forbes and WP as
>>> ham?
>>
>>>The problem is that if I avoid training newsletters or bulk email
>>>altogether, then I'm also left with spam newsletters still only hitting
>>>bayes50.
>>
>> If you only do this for Forbes or Washington Post, bayes will likely be able
>>
>> to distinguish other newsletters, if you train those as spam.
>>
>>>I'm actually in a situation now where Forbes and WP newsletters are being
>>>marked as spam, so considering retraining, but wondering what
>>> approach/best
>>>practices I should be following.
>>
>> This should be safe. There are many types of newsletters, the problem would
>>
>> only be if you started training them as ham unless you really know they are
>>
>> welcome.
>>
>> --
>> Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
>> Warning: I wish NOT to receive e-mail advertising to this address.
>> Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
>> WinError #99999: Out of error messages.
>>

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Save the whales. Collect the whole set.

Re: BAYES_00 BODY. Negative score?

Posted by hg user <me...@gmail.com>.
If you run spamassassin with -D bayes -t xxx  2>debug.log

in debug.log you will see all the "tokens" the Bayes system extracts
from the headers, and you will probably find a lot of them related to
mailing lists.

If you teach SA that those tokens are spam and they are present in both
WP and Forbes mail, their emails will be flagged. That's normal.

If you want you can use bayes_ignore_header to ignore some headers.
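
To survey debug.log without reading every token, one could filter for the tokens Bayes considers most ham-like. The sketch below is hedged: hammy_tokens is a made-up name, and the assumed line format (dbg: bayes: token '...' => probability) should be checked against your own debug output.

```shell
#!/bin/sh
# Hypothetical filter: read a SpamAssassin debug log on stdin and print the
# Bayes token lines whose probability is below a cutoff (default 0.1).
# Assumes token lines look like:  ... dbg: bayes: token 'xyz' => 0.0123
hammy_tokens() {
    awk -v cutoff="${1:-0.1}" '
        /dbg: bayes: token / {
            n = split($0, parts, " => ")   # probability follows the last " => "
            if (n > 1 && parts[n] + 0 < cutoff + 0) print
        }'
}
```

A possible use, with message.eml standing in for a saved spam sample: `spamassassin -D bayes -t < message.eml 2> debug.log`, then `hammy_tokens 0.05 < debug.log`. Tokens in a known spam that show very low probabilities point at what was mislearned as ham.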




Re: BAYES_00 BODY. Negative score?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>>*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>> >*      [score: 0.0000]
>>
>> This indicates a mistrained database, which means you have trained too
>> many
>> spams or spam-like messages (commercial messages) as ham.
>>
>> Proper training of spams should help. Just keep your spam (and optionally
>> ham) corpora for retraining in case you would drop the database.
>>
>> I also recommend to abstain from training commercial mail (notices from
>> e-shops, companies you done business with etc) as ham, unless they
>> generate
>> BAYES_999 score and you want it lower.  I often train them as spam so
>> those
>> give uncertain BAYES_50 result.

On 14.02.23 23:05, Alex wrote:
>Is there any ability to distinguish a legitimate newsletter from a spam
>newsletter?

Very hard.

That's why I recommend not to train newsletters unless you know you/users 
want them and they produce BAYES_99 result.


>In other words, if I train emails from Forbes or Washington Post as ham,
>then train similar newsletter emails from other other providers that are
>more suspect, will bayes still be able to distinguish Forbes and WP as ham?

>The problem is that if I avoid training newsletters or bulk email
>altogether, then I'm also left with spam newsletters still only hitting
>bayes50.

If you only do this for Forbes or Washington Post, bayes will likely be able 
to distinguish other newsletters, if you train those as spam.

>I'm actually in a situation now where Forbes and WP newsletters are being
>marked as spam, so considering retraining, but wondering what approach/best
>practices I should be following.

This should be safe. There are many types of newsletters; the problem would 
only arise if you started training them as ham without really knowing they 
are welcome.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
WinError #99999: Out of error messages.

Re: BAYES_00 BODY. Negative score?

Posted by Alex <my...@gmail.com>.
Hi,

>*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> >*      [score: 0.0000]
>
> This indicates a mistrained database, which means you have trained too
> many
> spams or spam-like messages (commercial messages) as ham.
>
> Proper training of spams should help. Just keep your spam (and optionally
> ham) corpora for retraining in case you would drop the database.
>
> I also recommend to abstain from training commercial mail (notices from
> e-shops, companies you done business with etc) as ham, unless they
> generate
> BAYES_999 score and you want it lower.  I often train them as spam so
> those
> give uncertain BAYES_50 result.
>

Is there any ability to distinguish a legitimate newsletter from a spam
newsletter?

In other words, if I train emails from Forbes or Washington Post as ham,
then train similar newsletter emails from other providers that are
more suspect, will bayes still be able to distinguish Forbes and WP as ham?

The problem is that if I avoid training newsletters or bulk email
altogether, then I'm also left with spam newsletters still only hitting
bayes50.

I'm actually in a situation now where Forbes and WP newsletters are being
marked as spam, so considering retraining, but wondering what approach/best
practices I should be following.

 # sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      97002          0  non-token data: nspam
0.000          0      90173          0  non-token data: nham
0.000          0   11581565          0  non-token data: ntokens
0.000          0 1054224948          0  non-token data: oldest atime
0.000          0 1676433889          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1648164856          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
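
The nspam and nham counters in that dump are the quickest sanity check on a Bayes database. As a small sketch (bayes_counts is a made-up name), they can be pulled out of the dump like this:

```shell
#!/bin/sh
# Hypothetical helper: read "sa-learn --dump magic" output on stdin and
# print the learned spam and ham message counts plus the spam share.
bayes_counts() {
    awk '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham  = $3 }
        END {
            total = nspam + nham
            if (total == 0) { print "no messages learned"; exit }
            printf "nspam=%d nham=%d spam_share=%.1f%%\n", nspam, nham, 100 * nspam / total
        }'
}
```

Used as `sa-learn --dump magic | bayes_counts`. Balanced counts only show that both classes were trained; they say nothing about whether the right messages went into each class.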

Re: BAYES_00 BODY. Negative score?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 13.02.23 17:42, joe a wrote:
>Have some annoying SPAM that consistently shows a negative score on 
>BAYES.  Is the default scoring or influenced by BAYES in some way?
>
>*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>*      [score: 0.0000]

This indicates a mistrained database, which means you have trained too many 
spams or spam-like messages (commercial messages) as ham.

Proper training of spam should help. Just keep your spam (and optionally 
ham) corpora for retraining in case you ever drop the database.

I also recommend abstaining from training commercial mail (notices from 
e-shops, companies you have done business with, etc.) as ham, unless it 
generates a BAYES_999 score and you want it lower.  I often train such mail 
as spam so that it gives an uncertain BAYES_50 result.

Those mails resemble spam too much to be used for training.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I don't have lysdexia. The Dog wouldn't allow that.

Re: BAYES_00 BODY. Negative score?

Posted by Benny Pedersen <me...@junc.eu>.
joe a skrev den 2023-02-14 00:12:
> On 2/13/2023 5:51 PM, Benny Pedersen wrote:
>> joe a skrev den 2023-02-13 23:42:
>>> Have some annoying SPAM that consistently shows a negative score on
>>> BAYES.  Is the default scoring or influenced by BAYES in some way?
>>> 
>>> *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>>> *      [score: 0.0000]
>>> 
>>> SpamAssassin 3.4.5
>> 
>> time to upgrade imho :=)
>> 
>> or train bayes to know what is spam or not spam, if it fails turn off 
>> autolearn, make a burdon what is autolearned
>> 
>> in local.cf
>> 
>> bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)
>> The score threshold below which a mail has to score, to be fed into 
>> SpamAssassin's learning systems automatically as a non-spam message.
>> bayes_auto_learn_threshold_spam n.nn (default: 12.0)
>> The score threshold above which a mail has to score, to be fed into 
>> SpamAssassin's learning systems automatically as a spam message.
>> 
>> i have changed scores on this 2 :)
>> 
>> now i dont need manuely training
>> 
>> above is a plugin that need to be enabled for this to work
>> 
>> remember to do a spamassassin --lint on changes of config files
> 
> So, what did you change them to, may I ask?  Not sure I really
> understand those limits.

bayes_auto_learn_threshold_nonspam -5
bayes_auto_learn_threshold_spam 5

This means everything scoring under -5 is autolearned as ham (non-spam), 
and everything scoring above 5 is autolearned as spam.

But this is just a suggestion, not a recommendation; what counts as spam 
and ham differs per recipient.
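
To make the effect of the two thresholds concrete, here is a deliberately simplified model. The function name is made up, and real autolearning applies further conditions beyond the two thresholds, so treat this only as an illustration of the comparison itself.

```shell
#!/bin/sh
# Hypothetical, simplified model of the two thresholds: prints "ham",
# "spam", or "none" for a given message score.
# Usage: autolearn_decision SCORE NONSPAM_THRESHOLD SPAM_THRESHOLD
autolearn_decision() {
    awk -v s="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        s += 0; lo += 0; hi += 0       # force numeric comparison
        if (s < lo)      print "ham"
        else if (s > hi) print "spam"
        else             print "none"
    }'
}
```

With the values above, `autolearn_decision 6.1 -5 5` reports spam, while the defaults quoted earlier (`autolearn_decision 6.1 0.1 12.0`) would leave the same message unlearned. Lowering the spam threshold therefore feeds many more messages into Bayes automatically, including any that were scored wrongly. Where training is done by hand, setting `bayes_auto_learn 0` in local.cf turns autolearning off altogether.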

> 
> In any case, I feed new SPAM and HAM into BAYES twice a day. via
> scripts, etc. so I really should have autolearn off, yes?
> 
> Maybe I need to retrain BAYES?  IIRC last time took "a long time".

yes

Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/13/2023 5:51 PM, Benny Pedersen wrote:
> joe a skrev den 2023-02-13 23:42:
>> Have some annoying SPAM that consistently shows a negative score on
>> BAYES.  Is the default scoring or influenced by BAYES in some way?
>>
>> *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>> *      [score: 0.0000]
>>
>> SpamAssassin 3.4.5
> 
> time to upgrade imho :=)
> 
> or train bayes to know what is spam or not spam, if it fails turn off 
> autolearn, make a burdon what is autolearned
> 
> in local.cf
> 
> bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)
> The score threshold below which a mail has to score, to be fed into 
> SpamAssassin's learning systems automatically as a non-spam message.
> bayes_auto_learn_threshold_spam n.nn (default: 12.0)
> The score threshold above which a mail has to score, to be fed into 
> SpamAssassin's learning systems automatically as a spam message.
> 
> i have changed scores on this 2 :)
> 
> now i dont need manuely training
> 
> above is a plugin that need to be enabled for this to work
> 
> remember to do a spamassassin --lint on changes of config files

So, what did you change them to, may I ask?  Not sure I really 
understand those limits.

In any case, I feed new SPAM and HAM into BAYES twice a day via 
scripts, etc., so I really should have autolearn off, yes?

Maybe I need to retrain BAYES?  IIRC last time took "a long time".


Re: BAYES_00 BODY. Negative score?

Posted by joe a <jo...@j4computers.com>.
On 2/13/2023 5:51 PM, Benny Pedersen wrote:
> joe a skrev den 2023-02-13 23:42:
>> Have some annoying SPAM that consistently shows a negative score on
>> . . .
> 
> time to upgrade imho :=)
> . . .

And, yes, I should upgrade.


Re: BAYES_00 BODY. Negative score?

Posted by Benny Pedersen <me...@junc.eu>.
joe a skrev den 2023-02-13 23:42:
> Have some annoying SPAM that consistently shows a negative score on
> BAYES.  Is the default scoring or influenced by BAYES in some way?
> 
> *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> *      [score: 0.0000]
> 
> SpamAssassin 3.4.5

Time to upgrade, IMHO :=)

Or train Bayes to know what is spam and what is not. If that fails, turn 
off autolearn, or restrict what gets autolearned,

in local.cf:

bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)
The score threshold below which a mail has to score, to be fed into 
SpamAssassin's learning systems automatically as a non-spam message.
bayes_auto_learn_threshold_spam n.nn (default: 12.0)
The score threshold above which a mail has to score, to be fed into 
SpamAssassin's learning systems automatically as a spam message.

I have changed the values of these two :)

Now I don't need manual training.

These settings belong to a plugin that needs to be enabled for this to work.

Remember to run spamassassin --lint after changing config files.

Re: BAYES_00 BODY. Negative score?

Posted by Loren Wilton <lw...@earthlink.net>.
> Have some annoying SPAM that consistently shows a negative score on BAYES. 
> Is the default scoring or influenced by BAYES in some way?
>
> *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> *      [score: 0.0000]

The score is reasonable for guaranteed ham, which is what your Bayes thinks 
this spam email is.
Of course the score isn't reasonable for spam, but Bayes thinks it is ham.

In addition to being cautious of autolearn as Benny described, yes, you need 
to retrain your Bayes, because it is very clearly confused on this point.

        Loren
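
For readers unsure how the "[score: 0.0000]" line relates to the rule name: the Bayes probability is bucketed into BAYES_* rules, and each rule carries its own score (here -1.9 for BAYES_00). The sketch below maps a probability to a rule name. The function is made up, and the ranges are recalled from the stock rule set, so verify them against the rules installed on your system.

```shell
#!/bin/sh
# Hypothetical helper: map a Bayes probability (0..1) to the name of the
# BAYES_* rule expected to fire. Ranges are recalled from the stock rule
# set and should be verified against the installed rules.
bayes_bucket() {
    awk -v p="$1" 'BEGIN {
        p += 0
        if      (p < 0.01) print "BAYES_00"
        else if (p < 0.05) print "BAYES_05"
        else if (p < 0.20) print "BAYES_20"
        else if (p < 0.40) print "BAYES_40"
        else if (p < 0.60) print "BAYES_50"
        else if (p < 0.80) print "BAYES_60"
        else if (p < 0.95) print "BAYES_80"
        else if (p < 0.99) print "BAYES_95"
        else               print "BAYES_99"
    }'
}
```

`bayes_bucket 0.0000` gives BAYES_00, matching the report in the original question. BAYES_999, mentioned elsewhere in this thread, is an additional rule for the very top of the range and fires alongside BAYES_99.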