Posted to users@spamassassin.apache.org by Alex Regan <my...@gmail.com> on 2014/10/22 03:29:27 UTC

General rules for training bayes

Hi all,

I'm having some trouble with my bayes database, and thought it would be 
a good time to just rebuild it. I'm wondering if anyone has any good 
suggestions for the type of mail that should be used for training.

I understand individually-crafted emails would make the best ham, but do 
you train mail from mass-mailers? Staples? Facebook? Banks?

The main problem I'd like to avoid is the emails that are questionable 
as to whether they were opt-in and something the user actually wants, or 
those that are probably spam.

I have the database in a replicated mysql database for now. I'd like to 
go to redis, but it's not quite ready for distributed configurations, 
correct?

Thanks,
Alex

Re: General rules for training bayes

Posted by Axb <ax...@gmail.com>.

On 10/22/2014 03:29 AM, Alex Regan wrote:
> I have the database in a replicated mysql database for now. I'd like to
> go to redis, but it's not quite ready for distributed configurations,
> correct?

What do you mean by "distributed configurations"?

- many clients querying a central Redis DB?

- real clustering?

- something like mysql dual master?

Redis *does* support master/slave replication,
plus failover handling via Sentinel
(http://redis.io/topics/sentinel)

Redis does *not* support full clustering yet (Redis Cluster is at RC1).

You can read all of this at http://redis.io
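
For the first case in the list above (many clients querying a central
Redis DB), a minimal sketch of the SpamAssassin side might look like the
following. It assumes SpamAssassin 3.4.0 or later with the Redis Bayes
store. The server address, database number, and TTL values are
placeholders, not recommendations from this thread.

```
# local.cf on each scanning host (sketch; address and database number are placeholders)
bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
bayes_sql_dsn       server=192.0.2.10:6379;database=2
# Expiry is handled by Redis key TTLs rather than by SpamAssassin's own expiry run
bayes_token_ttl     21d
bayes_seen_ttl      8d
```

Replication and Sentinel failover are configured on the Redis side and
are not shown here.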

h2h
Axb

Re: General rules for training bayes

Posted by Reindl Harald <h....@thelounge.net>.

Am 22.10.2014 um 13:15 schrieb Benny Pedersen:
> On October 22, 2014 1:08:45 PM Matus UHLAR - fantomas:
>
>> be careful about forwarded mail, if possible. if you get a lot of spam
>> from your old account, it may start to classify ALL mail forwarded
>> through that
>
> This is only correct if internal networks and/or trusted networks are
> not configured correctly

what does forwarding from @gmx.net or similar have to do with
trusted_networks?

the topic was training bayes on the Received headers; nobody said a
single word about internal hops

Re: General rules for training bayes

Posted by RW <rw...@googlemail.com>.

On Wed, 22 Oct 2014 13:30:44 +0200
Matus UHLAR - fantomas wrote:

> >>be careful about forwarded mail, if possible. if you get a lot of spam
> >>from your old account, it may start to classify ALL mail forwarded
> >>through that
> 
> On 22.10.14 13:15, Benny Pedersen wrote:
> >This is only correct if internal networks and/or trusted networks are
> >not configured correctly
> 
> oh, does BAYES take care about these?

To a limited extent. It affects the contents of some metadata, but
I don't think it affects which headers are tokenized.

Re: General rules for training bayes

Posted by Benny Pedersen <me...@junc.eu>.

On October 22, 2014 3:05:56 PM Matus UHLAR - fantomas <uh...@fantomas.sk> 
wrote:

> >>On October 22, 2014 1:30:44 PM Matus UHLAR - fantomas
> >><uh...@fantomas.sk> wrote:
> >>>oh, does BAYES take care about these?
> >>>we are still talking about manually feeding BAYES, aren't we?
>
> >Am 22.10.2014 um 14:30 schrieb Benny Pedersen:
> >>Sorry, yes, bayes can ignore all headers if one doesn't want it to track
> >>origin senders or IPs
>
> On 22.10.14 14:44, Reindl Harald wrote:
> >again: what has that to do with trusted_networks?
>
> seems that Benny just missed the fact that we are talking about BAYES.
> I think it's clear now...

It's all independent but related

Re: General rules for training bayes

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

>>On October 22, 2014 1:30:44 PM Matus UHLAR - fantomas
>><uh...@fantomas.sk> wrote:
>>>oh, does BAYES take care about these?
>>>we are still talking about manually feeding BAYES, aren't we?

>Am 22.10.2014 um 14:30 schrieb Benny Pedersen:
>>Sorry, yes, bayes can ignore all headers if one doesn't want it to track
>>origin senders or IPs

On 22.10.14 14:44, Reindl Harald wrote:
>again: what has that to do with trusted_networks?

seems that Benny just missed the fact that we are talking about BAYES.
I think it's clear now...

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The 3 biggest disasters: Hiroshima 45, Tschernobyl 86, Windows 95

Re: General rules for training bayes

Posted by RW <rw...@googlemail.com>.

On Wed, 22 Oct 2014 14:44:24 +0200
Reindl Harald wrote:

> 
> Am 22.10.2014 um 14:30 schrieb Benny Pedersen:
> > On October 22, 2014 1:30:44 PM Matus UHLAR - fantomas
> > <uh...@fantomas.sk> wrote:
> >
> >> oh, does BAYES take care about these?
> >> we are still talking about manually feeding BAYES, aren't we?
> >
> > Sorry, yes, bayes can ignore all headers if one doesn't want it to
> > track origin senders or IPs
> 
> again: what has that to do with trusted_networks?

His original point was not irrelevant. The trusted and internal network
settings affect how the Received headers are normalized into
metadata. Extending the trusted networks doesn't eliminate
tokens from irrelevant headers added in the trusted path, but it does
cause them to produce separate tokens.
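
As a sketch of the settings being discussed, assuming a local mail
infrastructure on 192.0.2.0/24 and a known forwarding relay at
198.51.100.25 (both are placeholder addresses, not taken from this
thread), the relevant local.cf lines could look like this:

```
# local.cf (sketch; all addresses are placeholders)
# Hosts that belong to your own mail infrastructure (MX, internal relays)
internal_networks  192.0.2.0/24
# Hosts trusted not to forge Received headers, e.g. a known forwarder;
# this list should include everything listed in internal_networks
trusted_networks   192.0.2.0/24 198.51.100.25
```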

Re: General rules for training bayes

Posted by Reindl Harald <h....@thelounge.net>.

Am 22.10.2014 um 14:30 schrieb Benny Pedersen:
> On October 22, 2014 1:30:44 PM Matus UHLAR - fantomas
> <uh...@fantomas.sk> wrote:
>
>> oh, does BAYES take care about these?
>> we are still talking about manually feeding BAYES, aren't we?
>
> Sorry, yes, bayes can ignore all headers if one doesn't want it to track
> origin senders or IPs

again: what has that to do with trusted_networks?

back to what you said above: it doesn't by default, so your response
was completely OT, as was yesterday's "Fokus should just be reversed
to allow ip ranges not deny ip ranges" in the context of fail2ban

if you want to do that, just remove fail2ban and open your ports only for
specific IPs and you are done - but please try to stay on topic

Re: General rules for training bayes

Posted by Benny Pedersen <me...@junc.eu>.

On October 22, 2014 1:30:44 PM Matus UHLAR - fantomas <uh...@fantomas.sk> 
wrote:

> oh, does BAYES take care about these?
> we are still talking about manually feeding BAYES, aren't we?

Sorry, yes, bayes can ignore all headers if one doesn't want it to track
origin senders or IPs
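
The option that matches this description is probably
bayes_ignore_header, which excludes a named header from Bayes
tokenization. It takes one header name per line, so it does not switch
off all headers at once. A sketch, where the header names are examples
only:

```
# local.cf (sketch; header names are illustrative)
# Keep headers added by an upstream filter out of the Bayes tokens
bayes_ignore_header  X-Upstream-Spamfilter
bayes_ignore_header  X-Upstream-SomethingElse
```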

Re: General rules for training bayes

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

>>be careful about forwarded mail, if possible. if you get a lot of spam from
>>your old account, it may start to classify ALL mail forwarded through that

On 22.10.14 13:15, Benny Pedersen wrote:
>This is only correct if internal networks and/or trusted networks are not 
>configured correctly

oh, does BAYES take care about these?

we are still talking about manually feeding BAYES, aren't we?
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
M$ Win's are shit, do not use it !

Re: General rules for training bayes

Posted by Benny Pedersen <me...@junc.eu>.

On October 22, 2014 1:08:45 PM Matus UHLAR - fantomas <uh...@fantomas.sk> 
wrote:

> be careful about forwarded mail, if possible. if you get a lot of spam from
> your old account, it may start to classify ALL mail forwarded through that

This is only correct if internal networks and/or trusted networks are not 
configured correctly

Re: General rules for training bayes

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

>>>but do you train mail from mass-mailers?  Staples?  Facebook?  Banks?
>>
>>why not? of course I train if I want such mail to be properly classified
>>later.

On 22.10.14 14:36, Alex Regan wrote:
>The problem I've had with doing this is that it's often so difficult 
>to determine which bulk messages should be considered ham and which 
>should not. This would somewhat raise the burden on the sender instead 
>of automatically giving them a -1.9 pass.

oh, this is the problem... train only on mail you are sure about.
And keep your corpora somewhere for later re-training.
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux IS user friendly, it's just selective who its friends are...

Re: General rules for training bayes

Posted by Alex Regan <my...@gmail.com>.

Hi,

>> I'm having some trouble with my bayes database, and thought it would
>> be a good time to just rebuild it. I'm wondering if anyone has any
>> good suggestions for the type of mail that should be used for training.
>
> be careful about forwarded mail, if possible. if you get a lot of spam
> from your old account, it may start to classify ALL mail forwarded
> through that account as spam.

After reading the rest of the comments on this, the point is to make 
sure trusted_networks is properly configured, correct? That's been done 
for me long ago.

>> I understand individually-crafted emails would make the best ham,
>
> crafted?

I meant specifically business-related email. Correspondence between 
co-workers, clients/customers, etc, as the main focus of what should be 
bayes00.

>> but do you train mail from mass-mailers?  Staples?  Facebook?  Banks?
>
> why not? of course I train if I want such mail to be properly classified
> later.

The problem I've had with doing this is that it's often so difficult to 
determine which bulk messages should be considered ham and which should 
not. Not training them would somewhat raise the burden on the sender 
instead of automatically giving them a -1.9 pass.

Thanks,
Alex

Re: General rules for training bayes

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

On 21.10.14 21:29, Alex Regan wrote:
>I'm having some trouble with my bayes database, and thought it would 
>be a good time to just rebuild it. I'm wondering if anyone has any 
>good suggestions for the type of mail that should be used for 
>training.

be careful about forwarded mail, if possible. if you get a lot of spam from
your old account, it may start to classify ALL mail forwarded through that
account as spam.

>I understand individually-crafted emails would make the best ham, 

crafted?

> but do you train mail from mass-mailers?  Staples?  Facebook?  Banks?

why not? of course I train if I want such mail to be properly classified
later.

>The main problem I'd like to avoid is the emails that are 
>questionable as to whether they were opt-in and something the user 
>actually wants, or those that are probably spam.

agreed.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I'm not interested in your website anymore.
If you need cookies, bake them yourself.