Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2006/11/13 12:10:34 UTC

Re: BayesStore/SQL.pm proposed update.

hi Cedric --

I'm not the SQL/Bayes guru though, so I can't guarantee it'll get in, but
in my opinion it certainly sounds interesting.  thanks!

In terms of procedure -- it would be best to open a "bug" at our bug
tracker (see website) and attach the patch there.  We track all code
contributions through that, as it provides an easy way to upload
attachments and find the discussion later, and it's also easy to
track them and ensure they don't get forgotten.

--j.

Cedric Lejeune writes:
> Hi list!
> 
> First, this is my first submission to a project, so please excuse me if
> I do something that does not follow the rules.
> 
> I'm in charge of SpamAssassin (SA), and after an update from 3.0 to
> 3.1, users started to complain that SA had become too aggressive (more
> false positives). After some investigation, the problem seems to come
> from the Bayesian filter. I use a global Bayesian database (using
> bayes_sql_override_username), but with customers coming from all over
> the world, finer-grained filtering would probably work better. However,
> this does not seem to be possible at the moment. I use the SQL Bayes
> database setup, and it would be great if it offered the same kind of
> feature as user_scores_sql_custom_query. What I want is for the Bayesian
> filter to look up a database first for the current user, then for the
> routing domain the user belongs to, and finally fall back to the global
> database. For instance:
> 
> toto@foo.bar -> *@foo.bar -> *
> 
> This way, users can have their own personal Bayes database, and if they
> do not want to create one or do not have one yet, they can still
> benefit from others' databases. Grouping users by routing domain is like
> grouping them by area of interest: if one user declares a message as
> spam, there is little chance that others would consider it ham.
> 
> Since this feature was not implemented yet, I started writing it
> myself. I should warn you that I had never written a line of Perl before,
> and "my" code is made of doc quotes and cut & paste. I've added a new
> configuration option, bayes_sql_custom_query, to the SA configuration.
> It works the same way as bayes_sql_override_username.
> 
> Please let me know if this feature could be useful to others and/or if
> it needs some rewriting.
> 
> Please find the diffs attached. They apply against SA 3.1.4, because
> that is the SA version shipped with Debian testing at this time.
> 
> Best regards =)
> 
> cedric.
> --- SpamAssassin/Conf.pm	2006-08-12 18:08:44.000000000 +0200
> +++ Conf.pm	2006-11-13 10:37:16.000000000 +0100
> @@ -2330,6 +2330,52 @@
>      type => $CONF_TYPE_STRING
>    });
>  
> +#### Start of modification (last modified on 20061109).
> +=item bayes_sql_custom_query query 
> +
> +This option gives you the ability to create a custom SQL query to
> +retrieve the Bayes username.  Your query should return a single column;
> +if it returns several rows, the value in the last row is used.  In
> +addition, there are several "variables" that you can use as part of
> +your query; these variables will be replaced with the current values
> +right before the query is run.  The current allowed variables are:
> +
> +=over 2
> +
> +=item _USERNAME_
> +
> +The current user's username.
> +
> +=item _DOMAIN_
> +
> +The portion after the @ as derived from the current user's username, this
> +value may be null.
> +
> +=back
> +
> +The query must be one continuous line in order to parse correctly.
> +
> +Here is an example query implementing a user, then domain, then global
> +fallback; in your config it must likewise be one continuous line.
> +
> +=over 1
> +
> +=item Example query:
> +
> +C<SELECT username FROM bayes_vars WHERE username = '*' OR username = CONCAT('*@',_DOMAIN_) OR username = _USERNAME_ ORDER BY username ASC>
> +
> +=back
> +
> +=cut
> +
> +  push (@cmds, {
> +    setting => 'bayes_sql_custom_query',
> +    is_admin => 1,
> +    type => $CONF_TYPE_STRING
> +  });
> +
> +#### End of modification.
> +
>  =item bayes_sql_username_authorized ( 0 | 1 )  (default: 0)
>  
>  Whether to call the services_authorized_for_username plugin hook in BayesSQL.
> --- SpamAssassin/BayesStore/SQL.pm	2005-08-11 09:00:37.000000000 +0200
> +++ SQL.pm.bayesstore	2006-11-13 11:15:43.000000000 +0100
> @@ -85,15 +85,70 @@
>    if ($self->{bayes}->{conf}->{bayes_sql_override_username}) {
>      $self->{_username} = $self->{bayes}->{conf}->{bayes_sql_override_username};
>    }
> +#### Start of modification (last modified on 20061113).
>    else {
> -    $self->{_username} = $self->{bayes}->{main}->{username};
> +    if ($self->{bayes}->{conf}->{bayes_sql_custom_query}) {
>  
> -    # Need to make sure that a username is set, so just in case there is
> -    # no username set in main, set one here.
> -    unless ($self->{_username}) {
> -      $self->{_username} = "GLOBALBAYES";
> -    }
> +      # Connect to the database.
> +      return 0 unless ($self->_connect_db());
> +
> +      # Retrieve the current username and split off its domain part.
> +      my $username = $self->{bayes}->{main}->{username};
> +      my ($mailbox, $domain) = split('@', $username);
> +
> +      my $quoted_username = $self->{_dbh}->quote($username);
> +      my $quoted_domain = $self->{_dbh}->quote($domain);
> +
> +      my $custom_query = $self->{bayes}->{conf}->{bayes_sql_custom_query};
> +      $custom_query =~ s/_USERNAME_/$quoted_username/g;
> +      $custom_query =~ s/_DOMAIN_/$quoted_domain/g;
> +
> +      dbg("bayes: new: quoted_username = ".$quoted_username);
> +      dbg("bayes: new: quoted_domain = ".$quoted_domain);
> +      dbg("bayes: new: custom_query = ".$custom_query);
> +
> +      # Prepare the query.
> +      my $sth = $self->{_dbh}->prepare($custom_query);
> +      unless (defined($sth)) {
> +        dbg("bayes: new: SQL error: ".$self->{_dbh}->errstr());
> +        return 0;
> +      }
> +
> +      # Execute the query.
> +      my $rc = $sth->execute();
> +      unless ($rc) {
> +        dbg("bayes: new: SQL error: ".$self->{_dbh}->errstr());
> +        return 0;
> +      }
> +
> +      # Keep the value from the last row; no rows leaves _username unset.
> +      my $ary_ref = $sth->fetchall_arrayref();
> +      $self->{_username} = @$ary_ref ? $ary_ref->[-1]->[-1] : undef;
> +
> +      dbg("bayes: new: _username = ".(defined $self->{_username} ? $self->{_username} : '(none)'));
> +
> +      # Tell the database server to free the buffer allocated to the query.
> +      $sth->finish();
> +
> +      # Close the database connection.
> +      $self->{_dbh}->disconnect();
> +
> +      # Reset _dbh to its initial state.
> +      $self->{_dbh} = undef;
> +
> +    }
> +#### End of modification.
> +    else {
> +      $self->{_username} = $self->{bayes}->{main}->{username};
> +    }
> +  }
> +
> +  # Need to make sure that a username is set, so just in case there is
> +  # no username set in main, set one here.
> +  unless ($self->{_username}) {
> +    $self->{_username} = "GLOBALBAYES";
>    }
> +
>    dbg("bayes: using username: ".$self->{_username});
>  
>    return $self;
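The example query depends on sort order rather than on returning a single row: with ORDER BY username ASC, `*` sorts before `*@foo.bar`, which sorts before an ordinary address such as `toto@foo.bar`, and the patch keeps the last row. Below is a minimal in-memory Perl sketch of that selection, not part of the patch; the helper `pick_bayes_username` and the sample usernames are hypothetical, and it assumes the database compares usernames in plain byte order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: mimics the example bayes_sql_custom_query in memory.
# It keeps the candidate names (user, "*@domain", "*") that exist, sorts them
# ascending like ORDER BY username ASC (plain string order, i.e. assuming a
# binary collation) and returns the last one, as the patch does with
# $ary_ref->[-1]->[-1].
sub pick_bayes_username {
  my ($username, @existing) = @_;
  my (undef, $domain) = split('@', $username, 2);

  my %wanted = ('*' => 1, $username => 1);
  $wanted{"*\@$domain"} = 1 if defined $domain;

  my @rows = sort { $a cmp $b } grep { $wanted{$_} } @existing;
  return @rows ? $rows[-1] : undef;   # undef: the patch then uses GLOBALBAYES
}

# Sample bayes_vars contents, made up for illustration.
my @existing = ('*', '*@foo.bar', 'toto@foo.bar', 'titi@other.org');

# Prints:
#   toto@foo.bar -> toto@foo.bar
#   tata@foo.bar -> *@foo.bar
#   jo@else.net -> *
for my $user ('toto@foo.bar', 'tata@foo.bar', 'jo@else.net') {
  printf "%s -> %s\n", $user, pick_bayes_username($user, @existing);
}
```

Under byte order, an address whose first character sorts before `*` (for example one starting with `!` or `(`) would break this ordering, so a query that ranks the candidates explicitly may be more robust.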