Posted to users@spamassassin.apache.org by "C. Bensend" <be...@bennyvision.com> on 2006/12/09 02:39:42 UTC

user_bayes_sql_custom_query ?

Hey folks,

   So, I've been giving this some thought in the last week, as I'm
running into the old "either site bayes or per-user bayes, nothing
in between" issue.  I'm using simscan, which passes the first email
address to spamc, so for me it's a per-email-address limitation.

   For a majority of my users, that's fine - they only have _one_
email address.  For me, it's a problem, as I have dozens of email
addresses that are delivered to me, and sorted via maildrop.  Many
of these secondary addresses get tons of spam, but because they're
delivered to aliases, SA never applies bayes scoring, because the
"user" doesn't match the user my bayes database uses (using SQL,
of course).

   I would _love_ to have a bayes equivalent of
user_score_sql_custom_query, where spamd would query a table
consisting of something like so:

email_alias  CHAR(64)
email_user   CHAR(64)

or something similar.  That way, I could populate it with data like:

benny@bennyvision.com       benny@bennyvision.com
alias1@bennyvision.com      benny@bennyvision.com
alias2@bennyvision.com      benny@bennyvision.com
alias93@bennyvision.com     theotheraccount@bennyvision.com
alias210@bennyvision.com    benny@bennyvision.com

etc...

   So, in this scenario, an email comes in destined to one of the
many secondary email addresses.  spamd makes a query ("SELECT
email_user FROM aliases WHERE email_alias = '$user'").  If spamd
gets a hit, great, try to initialize the bayes database for that
user.  If not, skip bayes and go on with life.
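To make the proposed lookup concrete, here is a minimal sketch in Python. It is not SpamAssassin code: sqlite3 stands in for whatever SQL backend is in use, the helper name resolve_bayes_user is hypothetical, and the table and column names are simply the ones suggested above. It uses a bound parameter instead of interpolating '$user' into the query string.

```python
import sqlite3

def resolve_bayes_user(conn, address):
    """Return the Bayes owner for an address, or None if there is no alias row.

    Hypothetical helper: mirrors the proposed
    SELECT email_user FROM aliases WHERE email_alias = '$user'
    but binds the address as a parameter.
    """
    row = conn.execute(
        "SELECT email_user FROM aliases WHERE email_alias = ?",
        (address,),
    ).fetchone()
    return row[0] if row else None

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aliases ("
    " email_alias CHAR(64) PRIMARY KEY,"
    " email_user  CHAR(64) NOT NULL)"
)
conn.executemany(
    "INSERT INTO aliases (email_alias, email_user) VALUES (?, ?)",
    [
        ("benny@bennyvision.com", "benny@bennyvision.com"),
        ("alias1@bennyvision.com", "benny@bennyvision.com"),
        ("alias93@bennyvision.com", "theotheraccount@bennyvision.com"),
    ],
)

# A hit means "use this user's Bayes data"; None means "skip Bayes".
owner = resolve_bayes_user(conn, "alias93@bennyvision.com")
missing = resolve_bayes_user(conn, "unknown@bennyvision.com")
```

A None result corresponds to the "skip bayes and go on with life" case.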

   Just a thought.  It would certainly help me in my situation,
but perhaps I'm just spending a little too much quality time with
the crackpipe.

Good idea?  Bad idea?  Dumb idea?

Benny


-- 
"The faster you finish the fight, the less shot you will get."
                                            -- Marine Corps Rules for
                                               Gunfighting



Re: user_bayes_sql_custom_query ?

Posted by "C. Bensend" <be...@bennyvision.com>.
> Why not modify simscan to do this kind of lookup for you, and pass the
> "correct" username to SA?

Yes, absolutely, that would be another solution to the issue.  :)

The reason I ask here is because SA already does almost exactly
this sort of lookup for userpref.  Maybe some of the code could be
reused, but maybe not...  I'm not a developer, and you'd weep
yourself to sleep for weeks on end if I tried to come up with a
patch.  ;)

If there's no interest/resources, no problem.  It would be nice to
have, though.  :)

Benny





Re: user_bayes_sql_custom_query ?

Posted by Theo Van Dinter <fe...@apache.org>.
On Fri, Dec 08, 2006 at 07:39:42PM -0600, C. Bensend wrote:
> in between" issue.  I'm using simscan, which passes the first email
> address to spamc, so for me it's a per-email-address limitation.
[...]
>    I would _love_ to have a bayes equivalent of
> user_score_sql_custom_query, where spamd would query a table
> consisting of something like so:
> 
> email_alias  CHAR(64)
> email_user   CHAR(64)

Why not modify simscan to do this kind of lookup for you, and pass the
"correct" username to SA?
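
As a rough illustration of that approach (not simscan code, which is written in C), the caller could resolve the alias first and then hand the result to spamc with its -u option. The function name spamc_command and the in-memory alias_map dict are assumptions made for this sketch; a real setup would read the mapping from wherever the aliases live.

```python
def spamc_command(envelope_rcpt, alias_map):
    """Build the spamc argv for one recipient.

    alias_map is a hypothetical dict of alias -> real user. If the
    recipient is not listed, no -u is passed and spamc uses its default
    user (an assumption about the desired fallback).
    """
    user = alias_map.get(envelope_rcpt.lower())
    if user is None:
        return ["spamc"]
    return ["spamc", "-u", user]

alias_map = {
    "benny@bennyvision.com": "benny@bennyvision.com",
    "alias1@bennyvision.com": "benny@bennyvision.com",
    "alias93@bennyvision.com": "theotheraccount@bennyvision.com",
}

hit = spamc_command("Alias1@bennyvision.com", alias_map)
miss = spamc_command("stranger@bennyvision.com", alias_map)
```

The resulting list could then be run with the message on stdin, for example through subprocess.run(argv, input=message_bytes).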

-- 
Randomly Selected Tagline:
"Your computer hasn't been returning all the bits it gets from the
 Internet." - Today's BOFH Excuse

Re: user_bayes_sql_custom_query ?

Posted by Quinn Comendant <qu...@strangecode.com>.
And as far as I understand it, user aliases are only half the problem. On my simscan installation (simscan 1.2 from qmailtoaster.com), if an incoming message has multiple recipients, simscan doesn't know which one to use, and the username that is passed to spamc is just the user simscan is running as (clamav). I think it was *designed* to run like this because simscan, at SMTP transaction time, keeps the connection open until scanning is complete. Theoretically, you could change simscan to execute spamc once for each recipient (resolving aliases too), but that would hold up the SMTP connection for a long time if there are lots of recipients.

This design is a compromise between performance and configuration granularity. 
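
To put a number on that cost, here is a small sketch of what per-recipient scanning with alias resolution would involve. Everything in it is hypothetical (scan_plan, the alias_map dict, and the choice to fall back to the address itself when no alias is listed); it only shows that after resolving aliases, one spamc run per distinct user would be enough, rather than one per recipient.

```python
def scan_plan(recipients, alias_map):
    """Group envelope recipients by the user whose Bayes data applies.

    Returns a dict of resolved user -> list of original recipients.
    Addresses with no alias entry are treated as their own user
    (an assumption for this sketch).
    """
    plan = {}
    for rcpt in recipients:
        key = rcpt.lower()
        user = alias_map.get(key, key)
        plan.setdefault(user, []).append(rcpt)
    return plan

alias_map = {
    "alias1@bennyvision.com": "benny@bennyvision.com",
    "alias2@bennyvision.com": "benny@bennyvision.com",
    "alias93@bennyvision.com": "theotheraccount@bennyvision.com",
}

plan = scan_plan(
    ["alias1@bennyvision.com", "alias2@bennyvision.com", "alias93@bennyvision.com"],
    alias_map,
)
# len(plan) is the number of spamc runs needed for this message.
runs = len(plan)
```

Each run would still happen while the SMTP connection is held open, so the trade-off described above remains.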

The only workable solutions I can think of are:

- Run spamassassin at the mail delivery level (maildrop).
- Run two instances of spamassassin: once via simscan (which blocks the bulk of spam), then again at the user level.

And a dirty idea that really goes against the whole idea of simscan:

- Run two instances of qmail: one on port 25 receives mail, breaks messages apart into individual recipients, and delivers each message one by one (the default qmail behavior, I think); then another qmail on port 2500, running simscan, receives mail from the first one. Actually, this doesn't solve the user aliases problem.

Anybody else have any other ideas?

Quinn


---------------------------------------------------------------------
Strangecode :: Internet Consultancy
http://www.strangecode.com/


