Posted to users@spamassassin.apache.org by Jason Bertoch <ja...@i6ix.com> on 2010/02/26 04:20:00 UTC

Bayes and Time of Day

Although I grasp the concept of Bayes in the SA system, I don't fully 
understand how it picks tokens, and which ones, from mails passed through SA.  
Many servers deal with 24-hour customers, but mine is 98% business 
only, 8 AM to 5 PM.  Does the SA Bayes system even look at time of day for 
tokens?  I wonder if I'd be able to scrutinize mails received 
outside of normal business hours more strongly.  Thoughts?

/Jason

Re: Bayes and Time of Day

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2010-02-25 at 22:20 -0500, Jason Bertoch wrote:
> Although I grasp the concept of Bayes in the SA system, I don't fully 
> understand how it picks tokens, and which ones, from mails passed through SA.  
> Many servers deal with 24-hour customers, but mine is 98% business 
> only, 8 AM to 5 PM.  Does the SA Bayes system even look at time of day for 
> tokens?  I wonder if I'd be able to scrutinize mails received 
> outside of normal business hours more strongly.  Thoughts?

Are you talking about generating different tokens for a given word based
on the time of receipt? For example, "fuck" might be a valid term in
daytime tech and sales language, but indicate porn during the night. That
would horribly bloat the Bayes DB and result in too many tokens with very
low occurrence counts.
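
To put rough numbers on that bloat, here is a toy sketch in Python. It is not
how SA's Bayes tokenizer works; the corpus, the whitespace splitting, and the
function names are all made up for illustration. It only compares counting
plain words against counting (hour, word) pairs.

```python
from collections import Counter

# Hypothetical mini-corpus: (hour received, message text). Not real data.
MESSAGES = [
    (9, "quarterly sales report attached"),
    (14, "sales meeting moved"),
    (23, "cheap pills sales offer"),
    (3, "cheap offer report"),
]

def plain_tokens(messages):
    """Count every word, ignoring the time of day."""
    return Counter(word for _, text in messages for word in text.split())

def hourly_tokens(messages):
    """Count every (hour, word) pair as a separate token."""
    return Counter(
        (hour, word) for hour, text in messages for word in text.split()
    )

plain = plain_tokens(MESSAGES)
hourly = hourly_tokens(MESSAGES)

print("distinct plain tokens: ", len(plain))
print("distinct hourly tokens:", len(hourly))
print("highest plain count:   ", max(plain.values()))
print("highest hourly count:  ", max(hourly.values()))
```

Even in this tiny example the per-hour variant has more distinct tokens, and
each of them has been seen fewer times, which is exactly what makes the
statistics weak.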

Or are you talking about merely tokenizing date strings somehow for
Bayes? In that case, a simple, plain rule would be much more effective
and would not depend on Bayes.
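
The check such a rule would perform can be sketched as follows. This is plain
Python, not SA rule syntax, and it rests on assumptions: the function name
is_off_hours is hypothetical, business hours are taken as 08:00 to 17:00 at a
fixed UTC-5 offset, and weekends and daylight saving time are ignored.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: the business runs 08:00 to 17:00 at a fixed UTC-5 offset.
BUSINESS_TZ = timezone(timedelta(hours=-5))
OPEN_HOUR, CLOSE_HOUR = 8, 17

def is_off_hours(date_header):
    """Return True if an RFC 2822 date string falls outside business hours."""
    # Parse the header value and shift it into the business time zone.
    stamp = parsedate_to_datetime(date_header).astimezone(BUSINESS_TZ)
    return not (OPEN_HOUR <= stamp.hour < CLOSE_HOUR)

print(is_off_hours("Thu, 25 Feb 2010 22:20:00 -0500"))
print(is_off_hours("Thu, 25 Feb 2010 10:00:00 -0500"))
```

Keep in mind that a Date header is written by the sender and may be wrong or
forged, so the time stamp added by your own server is the more trustworthy
input for a check like this.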


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}