Posted to users@spamassassin.apache.org by Joe Zitnik <JZ...@hfcc.net> on 2005/11/29 23:39:37 UTC

Bayes feeding

I apologize if this has been addressed before, but is there a consensus
on feeding bayes ham that is outbound from your organization?  It seems
to make sense to me.  You can almost guarantee the words bayes will be
"learning" are related to your organization's business function.  Even if
the messages are personal e-mail, it seems to be an excellent source of ham.  Is
there a problem with this, or a flaw in my reasoning?  Part of the
reason this is so attractive is that I am having problems matching the
amount of ham I feed bayes with the amount of spam I have access to. 
Right now, about 80% of my inbound mail is spam.

Re: Bayes feeding

Posted by Matt Kettler <mk...@evi-inc.com>.
Joe Zitnik wrote:
> I apologize if this has been addressed before, but is there a consensus
> on feeding bayes ham that is outbound from your organization?  It seems
> to make sense to me.  You can almost guarantee the words bayes will be
> "learning" are related to your organization's business function.  Even if
> the messages are personal e-mail, it seems to be an excellent source of ham.  Is
> there a problem with this, or a flaw in my reasoning?

No, I don't see any general flaw, but you need to be sure your internal systems
won't be sending any spam/viruses. This may be more difficult than you think,
even if you trust all your users.
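
If you do go ahead, one partial safeguard is sketched below under assumptions. It assumes your outbound mail already passes through SpamAssassin and so carries an X-Spam-Status header beginning with "Yes" or "No". The function name select_outbound_ham is hypothetical. The idea is to skip anything SA itself flagged, or never scanned, before training the rest as ham. This only catches spam SA already recognizes. A freshly compromised host can still slip messages past it, which is exactly the risk described above.

```python
from email.message import Message
from email.parser import Parser
from typing import Iterable, List


def select_outbound_ham(messages: Iterable[Message]) -> List[Message]:
    """Return outbound messages that look safe to train as ham.

    Assumption (hypothetical setup): outbound mail was already scanned by
    SpamAssassin and carries an X-Spam-Status header starting "Yes" or "No".
    """
    kept = []
    for msg in messages:
        status = msg.get("X-Spam-Status")
        if status is None:
            continue  # never scanned, so we can't vouch for it
        if status.strip().lower().startswith("yes"):
            continue  # SA flagged it: possibly an infected internal host
        kept.append(msg)
    return kept


if __name__ == "__main__":
    raw = [
        "X-Spam-Status: No, score=0.1\n\nQuarterly report attached.\n",
        "X-Spam-Status: Yes, score=12.3\n\nCheap meds!\n",
        "Subject: never scanned\n\nHello\n",
    ]
    msgs = [Parser().parsestr(text) for text in raw]
    print(len(select_outbound_ham(msgs)))  # prints 1
```

The surviving messages could then be written out and handed to `sa-learn --ham`. How you export and batch them depends on your MTA setup.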

All it takes is one good trojan with a backdoor. Even if you trust your users
not to open email attachments, what about one that loads via an unpatched browser
vulnerability (such as the current one for IE that has no patch) that gets
exploited by a malicious server after a user mistypes a domain name? It takes a
highly security-savvy user to be protected against such things. Do any of your
users use IE today? Have they disabled JavaScript entirely?

A properly constructed backdoor is rather difficult to detect until it starts
sending spam or doing other misdeeds at the behest of its controller. It's also
damn near impossible to prevent an outsider from controlling a good backdoor
once it's infected a PC with any kind of Internet access.

And before you mention your firewall protecting you from backdoors, will it
protect you against a reverse-shell backdoor?

(For reference, here's a paper on a reverse-shell backdoor over HTTP:
http://www.thc.org/papers/fw-backd.htm. It's not an uncommon trick, and it will get
past most stateful inspection and application-layer firewalls.)



>  Part of the
> reason this is so attractive is that I am having problems matching the
> amount of ham I feed bayes with the amount of spam I have access to. 

Although 1:1 is a good ideal, SA's Bayes implementation uses chi-squared
combining, which makes it quite tolerant of considerable deviation from that
ratio. Don't kill yourself trying to get a 1:1 ratio.
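
The sketch below shows roughly what chi-squared combining does. It is Gary Robinson's chi-square (Fisher) combining in the style popularized by SpamBayes. SA's Bayes code is built on a variant of this idea, but its token selection, constants and exact code may differ, so treat this as illustrative only. The names chi2q and chi_square_combine are hypothetical. Each input is a per-token spam probability strictly between 0 and 1. Two chi-square tests measure how surprising the token set would be under "all ham" and under "all spam", and the two results are folded into a single score.

```python
import math
from typing import Sequence


def chi2q(x2: float, v: int) -> float:
    """P(chi-square >= x2) for an even number of degrees of freedom v."""
    if v % 2 != 0:
        raise ValueError("v must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_square_combine(probs: Sequence[float]) -> float:
    """Combine per-token spam probabilities (each in (0, 1)) into a 0..1 score."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # Sums of logs stand in for products of probabilities, avoiding underflow.
    ln_spam = sum(math.log(1.0 - p) for p in probs)  # very negative if tokens look spammy
    ln_ham = sum(math.log(p) for p in probs)         # very negative if tokens look hammy
    s = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)  # spamminess indicator
    h = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)   # hamminess indicator
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    print(chi_square_combine([0.99, 0.95, 0.98]))  # spammy tokens: close to 1
    print(chi_square_combine([0.01, 0.05, 0.02]))  # hammy tokens: close to 0
```

Mixed or weak evidence lands near 0.5 instead of being forced to an extreme.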

My current spam:ham ratio is 8.3:1, but I've had ratios as high as 30:1 with no
problem.
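
A related reason an uneven corpus is tolerable can be sketched as follows, assuming SA's per-token estimate works like Robinson's f(w). Under that estimator, a token's probability is built from the fraction of trained spam and the fraction of trained ham containing it, not from raw counts. The same token therefore gets essentially the same estimate at 1:1 and at 8.3:1. The function names and the smoothing constants s and x are hypothetical, illustrative values, not SA's configured ones. The sketch also converts "80% of inbound mail is spam" into a spam:ham ratio of 4:1, well inside the range above.

```python
def spam_ham_ratio(spam_fraction: float) -> float:
    """Turn 'fraction of mail that is spam' (e.g. 0.8) into a spam:ham ratio."""
    return spam_fraction / (1.0 - spam_fraction)


def token_spam_prob(spam_hits: int, n_spam: int, ham_hits: int, n_ham: int,
                    s: float = 1.0, x: float = 0.5) -> float:
    """Robinson-style token probability f(w), built from per-class frequencies.

    spam_hits/ham_hits: trained spam/ham messages containing the token.
    n_spam/n_ham: total trained spam/ham messages.
    s and x are illustrative smoothing constants, not SpamAssassin's values.
    """
    b = spam_hits / n_spam  # fraction of spam containing the token
    g = ham_hits / n_ham    # fraction of ham containing the token
    if b + g == 0:
        return x            # unseen token: fall back to the prior
    p = b / (b + g)         # depends on the fractions, not on n_spam:n_ham
    n = spam_hits + ham_hits
    return (s * x + n * p) / (s + n)  # shrink toward x when evidence is thin


if __name__ == "__main__":
    print(f"80% spam -> {spam_ham_ratio(0.8):.1f}:1")  # prints 80% spam -> 4.0:1
    # One token, present in 10% of spam and 1% of ham, trained at 1:1 and 8.3:1.
    print(round(token_spam_prob(100, 1000, 10, 1000), 3))
    print(round(token_spam_prob(830, 8300, 10, 1000), 3))
```

Both token estimates come out at roughly 0.91. A naive estimate from raw counts would drift instead, giving 100/110 (about 0.91) at 1:1 but 830/840 (about 0.99) at 8.3:1.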

> Right now, about 80% of my inbound mail is spam.