You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@spamassassin.apache.org by "Gerald V. Livingston II" <ge...@sysmatrix.net> on 2005/05/01 07:18:35 UTC

SA + SQL + per-user prefs

OK, this is probably just an over-cautious MySQL question.

All of the examples I look at for setting up per-user prefs using SQL show
creating a table that looks like:

username  pref  value

So, if I want to allow users to control 5 values I would have a table that
looks like thsi:

user1  pref1  value1
user1  pref2  value2
user1  pref3  value3
user1  pref4  value4
user1  pref5  value5
user2  pref1  value1
user2  pref2  value2
user2  pref3  value3
user2  pref4  value4
user2  pref5  value5
user3 ..... etc.

When talking about importing a userbase of 6000+ that's gonna be a TALL
table really fast.

Is there no way to do it with a table that looks like this?:

username pref1 value1 pref2 value2 pref3 value3 pref4 value4 pref5 value5

So we'd end up with:

user1 pref1 value1 pref2 value2 pref3 value3 pref4 value4 pref5 value5
user2 pref1 value1 pref2 value2 pref3 value3 pref4 value4 pref5 value5
user3 pref1 value1 pref2 value2 pref3 value3 pref4 value4 pref5 value5
user4 pref1 value1 pref2 value2 pref3 value3 pref4 value4 pref5 value5
etc.

Only one row per username rather than one row for each pref for each
username.

The only prefs I'm not sure about working in this fashion are things like
white/black lists and allowed language choices. Things that can have
multiple entries that should be additive and can have a different number of
entries for each user.

And to add to the fun, I'm going to have to create a flat file with all of
the info to import current customer data into each table. I'm exporting to
comma delimited from an Access database. Massaging in Excel to remove extra
columns and add default data to newly inserted columns. Moving to the linux
box and importing to MySql with LOAD DATA LOCAL INFILE.

And I need to have this operational by the week of May 30th. Cool beans. I
haven't even figured out how to do Bayes in SQL yet -- or how to have my
users be able to TRAIN bayes since most will be downloading with POP3. How
the heck are they going to get the messages back into a place where
sa-learn can hit them?

Postfix 2.1.5
SA 3.0.2
MySQL 4.0.24

and, to make life difficult on the "using SQL" end

Amavisd-new 20030616p10

Ideas or a more complete walkthrough for setup.

TIA

Gerald


Re: SA + SQL + per-user prefs

Posted by JamesDR <ro...@bellsouth.net>.
Gerald V. Livingston II wrote:
> OK, this is probably just an over-cautious MySQL question.
> 
> All of the examples I look at for setting up per-user prefs using SQL show
> creating a table that looks like:
> 
> username  pref  value
> 
> So, if I want to allow users to control 5 values I would have a table that
> looks like thsi:
> 
> user1  pref1  value1
> user1  pref2  value2
> user1  pref3  value3
> user1  pref4  value4
> user1  pref5  value5
> user2  pref1  value1
> user2  pref2  value2
> user2  pref3  value3
> user2  pref4  value4
> user2  pref5  value5
> user3 ..... etc.
> 
> When talking about importing a userbase of 6000+ that's gonna be a TALL
> table really fast.
> 
> Is there no way to do it with a table that looks like this?:
> 
> username pref1 value1 pref2 value2 pref3 value3 pref4 value4 pref5 value5
> 
> So we'd end up with:
> 
> user1 pref1 value1 pref2 value2 pref3 value3 pref4 value4 pref5 value5
> user2 pref1 value1 pref2 value2 pref3 value3 pref4 value4 pref5 value5
> user3 pref1 value1 pref2 value2 pref3 value3 pref4 value4 pref5 value5
> user4 pref1 value1 pref2 value2 pref3 value3 pref4 value4 pref5 value5
> etc.
> 
> Only one row per username rather than one row for each pref for each
> username.
> 
> The only prefs I'm not sure about working in this fashion are things like
> white/black lists and allowed language choices. Things that can have
> multiple entries that should be additive and can have a different number of
> entries for each user.
> 
> And to add to the fun, I'm going to have to create a flat file with all of
> the info to import current customer data into each table. I'm exporting to
> comma delimited from an Access database. Massaging in Excel to remove extra
> columns and add default data to newly inserted columns. Moving to the linux
> box and importing to MySql with LOAD DATA LOCAL INFILE.
> 
> And I need to have this operational by the week of May 30th. Cool beans. I
> haven't even figured out how to do Bayes in SQL yet -- or how to have my
> users be able to TRAIN bayes since most will be downloading with POP3. How
> the heck are they going to get the messages back into a place where
> sa-learn can hit them?
> 
> Postfix 2.1.5
> SA 3.0.2
> MySQL 4.0.24
> 
> and, to make life difficult on the "using SQL" end
> 
> Amavisd-new 20030616p10
> 
> Ideas or a more complete walkthrough for setup.
> 
> TIA
> 
> Gerald
> 
> 
check out (watch for word wrap)
http://wiki.apache.org/spamassassin/UsingSQL

As far as the white/black lists, they work in sql, however, you need to 
do this:
| %nmgi.com        | whitelist_from          | *@hp.com               |
per the wiki.
I use the sql userprefs, and know several people who use the userprefs 
in the sql as well. The reason the prefs are in 'long' opposed to 'wide' 
is because think of how many options to SA there are to preferences. 
Almost (not all) settings can be set via the sql table. It is a bit more 
efficient to have a long format than it is wide, because you will only 
have a few of the total configurable options in sa in use at any one 
time. For the domain at work, there are about 200rows of prefs for 
black/whitelist. etc.
It does seem like a huge pain at first, but then you could grab one of 
the prepackaged web interfaces (watch for wrap):
http://wiki.apache.org/spamassassin/WebUserInterfaces

As far as bayes training, if you are using sql, you have the -u (user) 
flag in sa-learn to learn a message as a specified user. This, has made 
my life much easier. My users are on a windows mail server, and I have 
created a script that first removes added markup from their mail clients 
(one client in my case) from the forward. Then after this has run (all 
scheduled tasks/cron jobs) the Linux side grabs the mail, trains as 
spam/ham then compresses and sends back to a different folder (in case i 
need to re-train again later.)

http://wiki.apache.org/spamassassin/BayesFeedbackViaForwarding


-- 
Thanks,
JamesDR

Re: SA + SQL + per-user prefs

Posted by Mike Grice <mg...@plus.net>.
On Mon, 2005-05-02 at 10:49 -0500, Michael Parker wrote:
> On Mon, May 02, 2005 at 04:33:28PM +0100, Mike Grice wrote:
> > 
> > Nope, but think of how it would scale.  The design above is bad because
> > there is no unique data in there, so the table will get slow.  A better
> > design would be this:
> > 
> 
> Howdy,
> 
> SpamAssassin is an open source project that welcomes contributions
> from the community.  If you see a particular itch that you would like
> to scratch I highly encourage you to scratch it.  Once you've got some
> working code feel free to post it here or on the wiki to get feedback
> from folks.  If it's a widespread and useful feature then it may
> eventually make it's way into the source base.

See that's my problem right there, I can't code for toffee, so my
wishlists are just there to suggest things that I know would be good
based on the things I've implemented before.  Once I am up to speed I
will give back, I'm not frightened of that but I'm a way off yet :)

> This is exactly how I got my start working with SpamAssassin, I wanted
> to be able to store the bayes and AWL data in SQL.  I spent many many
> months working and perfecting the code and now it's in widespread use
> by many SpamAssassin users.

Yup, that's the best thing about projects being open!

Cheers
Mike
-- 
| Mike Grice                  Broadband Solutions for
| Systems Engineer                  Home & Business @
| PlusNet plc.                           www.plus.net
+ ----- PlusNet - The smarter way to broadband ------



Re: SA + SQL + per-user prefs

Posted by Michael Parker <pa...@pobox.com>.
On Mon, May 02, 2005 at 04:33:28PM +0100, Mike Grice wrote:
> 
> Nope, but think of how it would scale.  The design above is bad because
> there is no unique data in there, so the table will get slow.  A better
> design would be this:
> 

Howdy,

SpamAssassin is an open source project that welcomes contributions
from the community.  If you see a particular itch that you would like
to scratch I highly encourage you to scratch it.  Once you've got some
working code feel free to post it here or on the wiki to get feedback
from folks.  If it's a widespread and useful feature then it may
eventually make it's way into the source base.

This is exactly how I got my start working with SpamAssassin, I wanted
to be able to store the bayes and AWL data in SQL.  I spent many many
months working and perfecting the code and now it's in widespread use
by many SpamAssassin users.

Michael

Re: SA + SQL + per-user prefs

Posted by Mike Grice <mg...@plus.net>.
On Mon, 2005-05-02 at 09:34 +0200, Arvinn L?kebakken wrote:
> 
> Gerald V. Livingston II wrote:
> 
> >OK, this is probably just an over-cautious MySQL question.
> >
> >All of the examples I look at for setting up per-user prefs using SQL show
> >creating a table that looks like:
> >
> >username  pref  value
> >
> >So, if I want to allow users to control 5 values I would have a table that
> >looks like thsi:
> >
> >user1  pref1  value1
> >user1  pref2  value2
> >user1  pref3  value3
> >user1  pref4  value4
> >user1  pref5  value5
> >user2  pref1  value1
> >user2  pref2  value2
> >user2  pref3  value3
> >user2  pref4  value4
> >user2  pref5  value5
> >user3 ..... etc.
> >
> >When talking about importing a userbase of 6000+ that's gonna be a TALL
> >table really fast.
> >
> >  
> >
> 30.000, 5 * 6.000, rows isn't a tall SQL table at all IMHO.

Nope, but think of how it would scale.  The design above is bad because
there is no unique data in there, so the table will get slow.  A better
design would be this:

1.  A table with just users on there, each with their unique user ID,
eg:

UsersTable

UID	Friendlyname
1	bob
2	joe

2.  A table for each preference, linked back by the UID in the first
table:

pref1Table

UID	Value
1	10

SA can then join the tables based on the UID, and the application only
needs to be passed the UID to get all the values.  You can also gain
efficiencies with these smaller tables because you can optimise what
fields are in there (eg on your SpamCutoffTable will only have integer
and tinyint as field types).  Your only problem would be perhaps passing
to the application what values the user has got customised, but you
could fix that up in two(four) ways which would alter the number of
select statements needed:

UsersPrefsTable

UID	Preferences
1	pref1, pref2, pref3

A different way of doing this is multiple fields with booleans:

UsersPrefsTable

UID	pref1	pref2	pref3
1	1	1	0

Or you can build it into your original users table:

UsersTable

UID	Friendlyname	Preferences
1	bob		pref1, pref2, pref3

The other way:

UsersTable

UID	Friendlyname	pref1	pref2	pref3	pref4
1	bob		1	0	0	1

I'm looking into integrating user prefs this quarter where I work, and I
do have some concerns on how it will scale (e.g., with mysql replication
you need to send writes to a different machine from reads if you need to
have seperate databases, like one on each machine for reads and a master
for writes).  I wish more apps could be more db-aware :)

Cheers
Mike

-- 
| Mike Grice                  Broadband Solutions for
| Systems Engineer                  Home & Business @
| PlusNet plc.                           www.plus.net
+ ----- PlusNet - The smarter way to broadband ------



Re: SA + SQL + per-user prefs

Posted by Arvinn Løkkebakken <ar...@whitebird.no>.

Gerald V. Livingston II wrote:

>OK, this is probably just an over-cautious MySQL question.
>
>All of the examples I look at for setting up per-user prefs using SQL show
>creating a table that looks like:
>
>username  pref  value
>
>So, if I want to allow users to control 5 values I would have a table that
>looks like thsi:
>
>user1  pref1  value1
>user1  pref2  value2
>user1  pref3  value3
>user1  pref4  value4
>user1  pref5  value5
>user2  pref1  value1
>user2  pref2  value2
>user2  pref3  value3
>user2  pref4  value4
>user2  pref5  value5
>user3 ..... etc.
>
>When talking about importing a userbase of 6000+ that's gonna be a TALL
>table really fast.
>
>  
>
30.000, 5 * 6.000, rows isn't a tall SQL table at all IMHO.

Arvinn