Posted to dev@spamassassin.apache.org by Cedric Lejeune <ce...@arcelor.com> on 2006/11/13 11:58:19 UTC
BayesStore/SQL.pm proposed update.
Hi list!
First, this is my first submission to a project, so please excuse me if
I do something that does not follow the rules.
I'm in charge of SpamAssassin (SA), and after an update from 3.0 to
3.1 users started to complain about SA's "improved" effectiveness (too
many false positives). After some investigation, the problem seems to
come from the Bayesian filter. I use a global Bayesian database (using
bayes_sql_override_username), but with customers coming from all over
the world, finer-grained filtering should work better. That currently
seems to be impossible. I use the SQL Bayesian database setup, and it
would have been great if it offered the same kind of feature as
user_scores_sql_custom_query. What I want is for the Bayesian filter to
look up the database for the current user first, then for the routing
domain the user belongs to, and finally fall back to the global
database. For instance:
toto@foo.bar -> *@foo.bar -> *
This way, users can have their own personal Bayes database, and if
they do not want to create one or do not have one yet, they can still
benefit from others' databases. Grouping users per routing domain is
like grouping users by center of interest: if one of them declares a
mail as spam, there is little chance that the others consider it ham.
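A minimal sketch of the lookup order described above, in plain Perl. The
helper names (bayes_candidates, pick_bayes_user) and the %existing hash
standing in for the stored Bayes usernames are assumptions made for
illustration only; they are not part of SpamAssassin.

```perl
use strict;
use warnings;

# Hypothetical helper: build the ordered list of Bayes usernames to try,
# most specific first (user, then routing domain, then global).
sub bayes_candidates {
    my ($address) = @_;
    my @candidates = ($address);
    # Everything after the last '@' is treated as the routing domain.
    if ($address =~ /\@([^\@]+)\z/) {
        push @candidates, '*@' . $1;
    }
    push @candidates, '*';
    return @candidates;
}

# Hypothetical helper: return the first candidate that has a database.
# $existing is a hash ref of known usernames (a stand-in for the SQL table).
sub pick_bayes_user {
    my ($address, $existing) = @_;
    for my $candidate (bayes_candidates($address)) {
        return $candidate if $existing->{$candidate};
    }
    return;    # nothing matched; the caller decides on a default
}

my %existing = map { $_ => 1 } ('*', '*@foo.bar');
print join(' -> ', bayes_candidates('toto@foo.bar')), "\n";
print pick_bayes_user('toto@foo.bar', \%existing), "\n";
```

Here toto@foo.bar has no personal database, so the domain-wide entry is
picked before the global one.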
Since this feature was not implemented yet, I have started writing it
myself. I warn you that I had never written a single line of Perl
before, and "my" code is made of doc quotes and cut & paste. I have
added a new configuration option, bayes_sql_custom_query, to the SA
configuration file. It acts the same way bayes_sql_override_username does.
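To illustrate how the _USERNAME_ and _DOMAIN_ placeholders of such a
query could be expanded, here is a small sketch. It is an assumption-laden
stand-in: sql_quote only imitates the quoting normally done by the
database handle, and expand_query is a hypothetical helper, not the
patch itself.

```perl
use strict;
use warnings;

# Stand-in for database-side quoting (assumption): wrap the value in single
# quotes, double any embedded quote, and map undef to NULL.
sub sql_quote {
    my ($value) = @_;
    return 'NULL' unless defined $value;
    (my $escaped = $value) =~ s/'/''/g;
    return "'$escaped'";
}

# Hypothetical helper: substitute the placeholders in a query template.
sub expand_query {
    my ($template, $username) = @_;
    # The domain is whatever follows the first '@'; it may be undefined.
    my (undef, $domain) = split /\@/, $username, 2;
    my $q_user   = sql_quote($username);
    my $q_domain = sql_quote($domain);
    (my $query = $template) =~ s/_USERNAME_/$q_user/g;
    $query =~ s/_DOMAIN_/$q_domain/g;
    return $query;
}

my $template = "SELECT username FROM bayes_vars WHERE username = '*' "
             . "OR username = CONCAT('*\@', _DOMAIN_) "
             . "OR username = _USERNAME_ ORDER BY username ASC";
print expand_query($template, 'toto@foo.bar'), "\n";
```

With an ascending sort, taking the last returned row yields the most
specific match only if '*' sorts before the first character of ordinary
usernames, which depends on the database collation.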
Please let me know if this feature could be useful to others and/or if
it requires some rewriting.
Please find the diffs attached. They apply against SA 3.1.4 because
it is the SA version shipped with Debian testing at this time.
Best regards =)
cedric.
Re: BayesStore/SQL.pm proposed update.
Posted by Cedric Lejeune <ce...@arcelor.com>.
Hi Michael,
As explained before, the problem is that I am a complete Perl noob. I have never written a single line of Perl before. I understand
what you want, but I have no idea how to do it, sorry. I changed SQL.pm and added a configuration option
because it seemed to be the simplest solution to me, and I tried not to alter the way SQL.pm worked before. That is, I tried
to keep this processing order:
bayes_sql_override_username -> bayes_sql_custom_query -> current user running spamassassin -> default
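A minimal sketch of that precedence, assuming it works the way the
attached SQL.pm diff does. resolve_bayes_username and the $run_query
callback are hypothetical names used only for illustration; the callback
stands in for the database lookup.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the order above:
# override -> custom query -> current user -> default.
sub resolve_bayes_username {
    my ($conf, $current_user, $run_query) = @_;

    # 1. An explicit override always wins.
    return $conf->{bayes_sql_override_username}
        if $conf->{bayes_sql_override_username};

    my $username;
    if ($conf->{bayes_sql_custom_query}) {
        # 2. Otherwise ask the custom query (callback is an assumption).
        $username = $run_query->($conf->{bayes_sql_custom_query}, $current_user);
    }
    else {
        # 3. Otherwise use the user running spamassassin.
        $username = $current_user;
    }

    # 4. Make sure some username is always set.
    return $username || 'GLOBALBAYES';
}

my $conf = { bayes_sql_custom_query => 'SELECT ...' };
print resolve_bayes_username($conf, 'toto@foo.bar', sub { '*@foo.bar' }), "\n";
```

Note that in this sketch, as in the diff, an empty result from the custom
query falls through to the default rather than to the current user.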
I would really like to do more, but I do not have enough time to "learn" Perl. I had a need and I tried to solve it. I just
wanted to contribute to the SpamAssassin project by providing the resulting patches.
By the way, you can use "my" code if you want. The SpamAssassin team and contributors should be credited for this piece of code, as I
only did cut & paste and some writing based on the Perl docs.
If I find some free time, I may try to do what you ask, the subclass and the like, but I cannot guarantee anything.
Thanks everyone,
cedric.
Michael Parker wrote:
> Hi Cedric,
>
> The BayesStore API is designed in such a way that implementing a
> separate store for this sort of thing would be super easy. I suggest
> you subclass SQL and make the changes and call it something new.
>
> I'm probably -1 on changing the top level, fairly generic, SQL.pm to do
> what you're asking, but would have to examine the change a little more
> before I made a final determination.
>
> Michael
>
>
> Cedric Lejeune wrote:
>> ------------------------------------------------------------------------
>>
>> --- SpamAssassin/Conf.pm 2006-08-12 18:08:44.000000000 +0200
>> +++ Conf.pm 2006-11-13 10:37:16.000000000 +0100
>> @@ -2330,6 +2330,52 @@
>> type => $CONF_TYPE_STRING
>> });
>>
>> +#### Start of modification (last modified on 20061109).
>> +=item bayes_sql_custom_query query
>> +
>> +This option gives you the ability to create a custom SQL query to
>> +retrieve username. In order to work correctly your query should
>> +return only one value, the desired username. In addition, there
>> +are several "variables" that you can use as part of your query,
>> +these variables will be substituted for the current values right
>> +before the query is run. The current allowed variables are:
>> +
>> +=over 2
>> +
>> +=item _USERNAME_
>> +
>> +The current user's username.
>> +
>> +=item _DOMAIN_
>> +
>> +The portion after the @ as derived from the current user's username, this
>> +value may be null.
>> +
>> +=back
>> +
>> +The query must be one continuous line in order to parse correctly.
>> +
>> +Here is an example query, please note that it is broken up for easy
>> +reading, in your config it should be one continuous line.
>> +
>> +=over 1
>> +
>> +=item Current default query:
>> +
>> +C<SELECT username FROM bayes_vars WHERE username = '*' OR Username = CONCAT('*@',_DOMAIN_) OR Username = _USERNAME_ ORDER BY username ASC>
>> +
>> +=back
>> +
>> +=cut
>> +
>> + push (@cmds, {
>> + setting => 'bayes_sql_custom_query',
>> + is_admin => 1,
>> + type => $CONF_TYPE_STRING
>> + });
>> +
>> +#### End of modification.
>> +
>> =item bayes_sql_username_authorized ( 0 | 1 ) (default: 0)
>>
>> Whether to call the services_authorized_for_username plugin hook in BayesSQL.
>>
>>
>> ------------------------------------------------------------------------
>>
>> --- SpamAssassin/BayesStore/SQL.pm 2005-08-11 09:00:37.000000000 +0200
>> +++ SQL.pm.bayesstore 2006-11-13 11:15:43.000000000 +0100
>> @@ -85,15 +85,70 @@
>> if ($self->{bayes}->{conf}->{bayes_sql_override_username}) {
>> $self->{_username} = $self->{bayes}->{conf}->{bayes_sql_override_username};
>> }
>> +#### Start of modification (last modified on 20061113).
>> else {
>> - $self->{_username} = $self->{bayes}->{main}->{username};
>> + if ($self->{bayes}->{conf}->{bayes_sql_custom_query}) {
>>
>> - # Need to make sure that a username is set, so just in case there is
>> - # no username set in main, set one here.
>> - unless ($self->{_username}) {
>> - $self->{_username} = "GLOBALBAYES";
>> - }
>> + # Connect to database.
>> + return 0 unless ($self->_connect_db());
>> +
>> + # Retrieve current username and play with it.
>> + my $username = $self->{bayes}->{main}->{username};
>> + my ($mailbox, $domain) = split('@', $username);
>> +
>> + my $quoted_username = $self->{_dbh}->quote($username);
>> + my $quoted_domain = $self->{_dbh}->quote($domain);
>> +
>> + my $custom_query = $self->{bayes}->{conf}->{bayes_sql_custom_query};
>> + $custom_query =~ s/_USERNAME_/$quoted_username/g;
>> + $custom_query =~ s/_DOMAIN_/$quoted_domain/g;
>> +
>> + dbg("bayes: new: quoted_username = ".$quoted_username);
>> + dbg("bayes: new: quoted_domain = ".$quoted_domain);
>> + dbg("bayes: new: custom_query = ".$custom_query);
>> +
>> + # Prepare query.
>> + my $sth = $self->{_dbh}->prepare($custom_query);
>> + unless (defined($sth)) {
>> + dbg("bayes: new: SQL error: ".$self->{_dbh}->errstr());
>> + return 0;
>> + }
>> +
>> + # Execute query.
>> + my $rc = $sth->execute();
>> + unless ($rc) {
>> + dbg("bayes: new: SQL error: ".$self->{_dbh}->errstr());
>> + return 0;
>> + }
>> +
>> + # Retrieve _username.
>> + my $ary_ref = $sth->fetchall_arrayref();
>> + $self->{_username} = $ary_ref->[-1]->[-1];
>> +
>> + dbg("bayes: new: _username = ".$self->{_username});
>> +
>> + # Tell database server to free buffer allocated to query.
>> + $sth->finish();
>> +
>> + # Close database connection.
>> + $self->{_dbh}->disconnect();
>> +
>> + # Set _dbh to initial state.
>> + $self->{_dbh} = undef;
>> +
>> + }
>> +#### End of modification.
>> + else {
>> + $self->{_username} = $self->{bayes}->{main}->{username};
>> + }
>> + }
>> +
>> + # Need to make sure that a username is set, so just in case there is
>> + # no username set in main, set one here.
>> + unless ($self->{_username}) {
>> + $self->{_username} = "GLOBALBAYES";
>> }
>> +
>> dbg("bayes: using username: ".$self->{_username});
>>
>> return $self;
>
Re: BayesStore/SQL.pm proposed update.
Posted by Michael Parker <pa...@pobox.com>.
Hi Cedric,
The BayesStore API is designed in such a way that implementing a
separate store for this sort of thing would be super easy. I suggest
you subclass SQL and make the changes and call it something new.
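A rough sketch of the subclassing idea, kept runnable without SpamAssassin
installed. Everything here is an assumption for illustration:
Sketch::BaseStore is only a stand-in for the real SQL store (whose
constructor takes different arguments), and Sketch::DomainStore and the
lookup callback are hypothetical names.

```perl
use strict;
use warnings;

# Stand-in for the generic SQL store (assumption): it only mimics the
# part that settles on a username.
package Sketch::BaseStore;

sub new {
    my ($class, %args) = @_;
    return bless { _username => $args{username} || 'GLOBALBAYES' }, $class;
}

# Hypothetical subclass: leave the base class untouched and adjust the
# username after the parent constructor has run.
package Sketch::DomainStore;
our @ISA = ('Sketch::BaseStore');

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    my $lookup = $args{lookup};    # callback choosing the final username
    $self->{_username} = $lookup->($self->{_username}) if $lookup;
    return $self;
}

package main;

my $store = Sketch::DomainStore->new(
    username => 'toto@foo.bar',
    # Illustrative lookup: always map a user to the domain-wide entry.
    lookup   => sub { my ($u) = @_; (my $d = $u) =~ s/^.*\@//; return "*\@$d" },
);
print $store->{_username}, "\n";
```

The point of the design is that the generic store keeps its behaviour and
the site-specific lookup lives in its own module.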
I'm probably -1 on changing the top level, fairly generic, SQL.pm to do
what you're asking, but would have to examine the change a little more
before I made a final determination.
Michael