You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2006/05/11 17:21:10 UTC

[Spamassassin Wiki] Update of "BetterDocumentation/SqlReadmeBayes" by MichaelParker

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by MichaelParker:
http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes

------------------------------------------------------------------------------
  NOTE: sa-learn must be run as the user who's data you are loading, or
        you must make use of the bayes_sql_override_username config
        option.
- 
- 
- Apache Home Apache EmUsing A SQL Database for Bayesian Storage Module
- -------------------------------------------------------
- 
- SpamAssassin can now store users' bayesian filter data in a SQL
- database. The  most common use for a system like this would be for
- users to be able to have per user bayesian filter data on systems
- where users may not have a home directory to store the data.
- 
- In order to activate the SQL based bayesian storage you have to
- configure spamassassin and spamd to use a different bayes storage
- module.  This can be done via a setting in the global configuration
- file.
- 
- The directives required to turn on the SQL based bayesian storage are:
- 
- bayes_store_module		   Mail::SpamAssassin::BayesStore::SQL
- 
- This directive is used by the Bayes module to determine which storage
- module should be used.  If not set it will default to:
- Mail::SpamAssassin::BayesStore::DBM
- 
- PostgreSQL users will want to use the PostgreSQL specific storage
- module:
- bayes_store_module               Mail::SpamAssassin::BayesStore::PgSQL
- This module provides a slightly different interface to makes better
- use of the resources that PostgreSQL offers.  In addition, please make
- sure that you follow the instructions below for loading the proper
- procedural language and installing the tables and stored procedure.
- 
- There is also a MySQL specific storage driver available to provides a
- small boost in performance.  It requires version 4.1 or above of the
- MySQL database software to work properly.  In addition, it provides
- rollback on error functionality if you create your bayes database
- table using the InnoDB storage engine (ie s/MyISAM/InnoDB/ on the
- bayes_mysql.sql file).  WARNING: Using this module with a version of
- MySQL < 4.1 could have unexpected results.  To use the MySQL 4.1+
- specific module set your bayes_store_module directive accordingly:
- bayes_store_module               Mail::SpamAssassin::BayesStore::MySQL
- 
- Additional configuration directives provided by BayesSQL:
- 
- bayes_sql_dsn			   DBI:driver:database:hostname[:port]
- bayes_sql_username		   dbusername
- bayes_sql_password		   dbpassword
- 
- The bayes_sql_dsn directive describes the data source name that will
- be used to create the connection to your SQL server.  It MUST be in
- the format as listed above.  <driver> should be the DBD driver that
- you have installed to access your database (initially tested with
- MySQL, PostgreSQL, and SQLite).  <database> must be the name of the
- database that you created to store the bayes data tables. <hostname>
- is the name of the host that contains the SQL database  server.
- <port> is the optional port number where your database server is
- listening.
- 
- For an example of connection to PostgreSQL, see the main README file.
- 
- In addition to the global configuration directives there is a user
- preference:
- 
- bayes_sql_override_username	   someusername
- 
- This directive, if used, will override the username used for storing
- data in the database.  This could be used to group users together to
- share bayesian filter data.  You can also use this config option to
- trick sa-learn to learn data as a specific user.
- 
- 
- Requirements
- ------------
- 
- In order for SpamAssassin to work with your SQL database, you must
- have the perl DBI module installed, AS WELL AS the DBD driver/module
- for your specific database.  For example, if using MySQL as your
- RDBMS, you must have the DBD::mysql module installed.  Check CPAN for
- the latest versions of DBI and your database driver/module. 
- 
- The BayesStore::SQL module was tested with:
- 
- DBI-1.38
- DBD-mysql-2.9002
- perl v5.8.0
- 
- But older versions should work fine as the SQL code in SpamAssassin is as 
- simple as could be.
- 
- NOTE: Some users have reported problems using DBD::Pg v1.22. There appears
- to be a bug in how parameters are quoted in that version, we recommend you
- upgrade to at least v1.31.
- 
- In addition, there appears to be a quote bug in some versions of PostgreSQL.
- It's unclear when the bug was fixed.  7.4.2 seems to work just fine, however
- some have reported problems with 7.3.6.  Your milage may vary with versions
- less than 7.4.2.  If you happen to know when the specific bug was fixed please
- feel free to notify the dev team so they can update this documentation.
- 
- Database Schema
- ---------------
- 
- The database schema for storage of the bayesian filter data contains
- several different tables.  Several sample SQL schemas have been
- included in to help in setting up your database.  The schemas contain
- the minimum tables and columns necessary to work with the code as
- written.  You are free to add other columns as needed for your local
- implementation.  Presently there is no way to override the table and
- column names used by the BayesStore::SQL code, this feature may be added
- in the future.
- 
- Example setup of bayes tables for MySQL:
- 
- This assumes that you have already created a database for use with
- spamassassin and setup a username/password that can access that database.
- (See "Creating A Database", in "sql/README", if you don't have a suitable
- database ready.)
- 
- To install the tables using the included example, use the following command:
- 
- mysql -h <hostname> -u <adminusername> -p databasename < bayes_mysql.sql
- Enter password: <adminpassword>
- 
- To install the tables for PostgreSQL, use:
- 
- You need to install the plpgsql procedural language (assuming it
- hasn't already been installed in template1). You can do this by
- running the following command:
- 
- createlang plpgsql <databasename>
- 
- and then:
- 
- psql -U <username> -f bayes_pg.sql <databasename>
- 
- From a security viewpoint, it is wise to grant the minimum privileges
- needed to the user on the bayes tables.  These grants largely depend
- on your system policies but here are some example grants to get you
- started, please review carefully before putting into production:
- 
- GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE bayes_token TO <username>;
- GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE bayes_vars TO <username>;
- GRANT SELECT, DELETE, INSERT ON TABLE bayes_seen TO <username>;
- GRANT SELECT, DELETE, INSERT ON TABLE bayes_expire TO <username>;
- GRANT SELECT ON TABLE bayes_global_vars TO <username>;
- GRANT UPDATE, SELECT, INSERT ON bayes_vars_id_seq TO <username>;
- 
- Once you have created the database and added the tables, just add the
- required lines to your global configuration file (local.cf).
- 
- Testing SpamAssassin/SQL
- ------------------------
- 
- To test your SQL setup, and debug any possible problems, you should
- start spamd with the -D option, which will keep spamd in the
- foreground, and will output debug message to the terminal. You should
- then test spamd with a message by calling spamc.  You can use the
- sample-spam.txt file with the following command:
- 
- cat sample-spam.txt | spamc
- 
- Watch the debug output from spamd and look for the following debug
- line:
- 
- debug: bayes: Database connection established
- debug: bayes: Using username: <username>
- 
- If you do not see the above text, then the SQL query was not
- successful, and you should see any error messages reported.
- 
- This code has been tested using MySQL as the RDMS, with basic tests
- against PostgreSQL and SQLite.  It does require a database that allows
- you to refer to a column on the right hand side of an expression (ie
- update foo set bar = bar + 1).  Any database driver that allows for
- that usage should work with the BayesStore::SQL code.  NOTE: You may
- find that some implementations do not provide a significant advantage
- over using the default DBM implementation.  If you find a driver that
- should work and has issues, please report them to the SADev list.
- 
- ******
- NB:  This should be considered BETA, and the interface, schema, or
- overall operation of SQL support may change at any time with future
- releases of SA.
- ******
- 
- Converting Bayes Data From a DBM Database
- -----------------------------------------
- 
- Converting your bayes database data from Berkeley (DBM) based storage
- to SQL based storage is as simple as a backup and then restore.
- 
- If you are upgrading from a previous version of SpamAssassin you
- should first follow any recommended upgrade instructions for that
- release, in most cases this will be as simple as running an
- sa-learn --sync
- 
- Once you have performed this upgrade, for each bayes database follow
- this procedure:
- 
- o Run 'sa-learn --backup > backup.txt' which will backup your bayes
-   data into a text file.
- o Optionally you can run 'sa-learn --clear' to remove the DBM based
-   bayes files.
- o Modify your local.cf file according to the directions above.
- o Run 'sa-learn --restore backup.txt' to restore your bayes data to
-   the SQL database.
- 
- NOTE: sa-learn must be run as the user who's data you are loading, or
-       you must make use of the bayes_sql_override_username config
-       option.
  }}}
  
  [[Anchor(comments)]]