Posted to dev@spamassassin.apache.org by Jan Hejl <jh...@excello.cz> on 2012/03/26 18:26:00 UTC

BayesStore in Redis DB

Hello everyone,

For the last few months I've been working on a new BayesStore module.
This module uses an in-memory DB called Redis (more at redis.io).

I've put the code on sf.net http://sourceforge.net/projects/bayesredis/

It's usable (at least for me) but it's still under heavy development. I
hope this is interesting to someone, and I would be glad of any help in
making it work properly and better.

Have a nice day
Jan Hejl


Re: BayesStore in Redis DB

Posted by Benny Pedersen <me...@junc.org>.
On 2012-03-27 12:11, Axb wrote:
> On 03/27/2012 11:56 AM, Benny Pedersen wrote:
>> On 2012-03-27 11:43, Axb wrote:
>>
>>> SIMPLE:
>>> - mysql doesn't scale under high traffic.
>>
>> so what does ?
>
> probably anything which doesn't try to be ACID compliant

http://nosql.mypopescu.com/post/1085685966/mysql-is-not-acid-compliant

so what ? :)


Re: BayesStore in Redis DB

Posted by Axb <ax...@gmail.com>.
On 03/27/2012 11:56 AM, Benny Pedersen wrote:
> On 2012-03-27 11:43, Axb wrote:
>
>> SIMPLE:
>> - mysql doesn't scale under high traffic.
>
> so what does ?

probably anything which doesn't try to be ACID compliant


Re: BayesStore in Redis DB

Posted by Benny Pedersen <me...@junc.org>.
On 2012-03-27 11:43, Axb wrote:

> SIMPLE:
> - mysql doesn't scale under high traffic.

so what does ?


Re: BayesStore in Redis DB

Posted by Axb <ax...@gmail.com>.
On 03/27/2012 11:36 AM, Benny Pedersen wrote:
> On 2012-03-26 18:26, Jan Hejl wrote:
>> Hello everyone,
>>
>> last few months i've been working on new BayesStore module. This
>> module uses all-in-memory DB called Redis - more at redis.io
>
> so why not use mysql memory engine ?
>
> will your plugin store data on shutdown of server ?
>
>> I've put the code on sf.net http://sourceforge.net/projects/bayesredis/
>>
>> It's usable (at least for me) but it's still under high development.
>
> remove loadplugin in cf files and move this line into pre file
>
>> I hope that this could be interesting for someone and if you could
>> help me to make it work properly and better I'll be glad.
>
> here i just try to use mysql with less then 30% ram, its a hard work
> with both bayes, and dspam, but usaly i find what can be scaled down in
> mem usage without make to much collective damage :)

SIMPLE:
- mysql doesn't scale under high traffic.



Re: BayesStore in Redis DB

Posted by Jan Hejl <jh...@excello.cz>.

On 27.3.2012 13:29, Jan Hejl wrote:
> There are few reasons for implementing Redis DB for me (my company) 
> may be not all of it is objective because i like these key value 
> storages.
>
> 1) Simplicity - Redis DB is simple to use and to maintain and is much 
> more complex then other key value storages - remember that first 
> storages were key value (BDB, etc.).
> 2) Scaleability - Redis DB is pretty highly scalable, and the purpose 
> of this plugin is tu use Bayes for high performance nodes. In future 
> there should be lots of new things about Redis functionality, so I 
> thought, that i would be nice to give it a try.
> 3) Resources - i did some short tests about memory consuption and i 
> get to the 10% or less of memory consuption with comparsion to MySQL 
> engines. This reason may be odd, i have to make much more tests.
> 4) Autoexpire - SEEN table is not expiring and this table grows fast. 
> If you learn your bayes with 50000 emails per day, it grows into 
> pretty big monster after few years, and there's no mechanism keeping 
> time signature of seen entries. With Redis DB you can set EXPIRE time 
> of SEEN key (it's also implemented in this plugin) and you don't have 
> to care about anything else.
>
> These are my reasons, but I understand if you rather use simplier way 
> with MySQL memory engine. The point is that the plugin is almost done 
> so why not make it better?
>
> Dne 27.3.2012 11:36, Benny Pedersen napsal(a):
>> Den 2012-03-26 18:26, Jan Hejl skrev:
>>> Hello everyone,
>>>
>>> last few months i've been working on new BayesStore module. This
>>> module uses all-in-memory DB called Redis - more at redis.io
>>
>> so why not use mysql memory engine ?
>>
>> will your plugin store data on shutdown of server ?
> Which one will do?
Sorry about that. Yes, it will. Redis regularly saves a dump of the DB
to the hard drive.
>>
>>> I've put the code on sf.net http://sourceforge.net/projects/bayesredis/
>>>
>>> It's usable (at least for me) but it's still under high development.
>>
>> remove loadplugin in cf files and move this line into pre file
> Sorry, this line is deprecated. I use this as you say inside pre 
> files. Thanks for pointing
>>
>>> I hope that this could be interesting for someone and if you could
>>> help me to make it work properly and better I'll be glad.
>>
>> here i just try to use mysql with less then 30% ram, its a hard work 
>> with both bayes, and dspam, but usaly i find what can be scaled down 
>> in mem usage without make to much collective damage :)
>>
>>
>>


Re: BayesStore in Redis DB

Posted by Jan Hejl <jh...@excello.cz>.

On 27.3.2012 15:52, Benny Pedersen wrote:
> On 2012-03-27 13:29, Jan Hejl wrote:
>> There are few reasons for implementing Redis DB for me (my company)
>> may be not all of it is objective because i like these key value
>> storages.
>
> okay
>
>> 1) Simplicity - Redis DB is simple to use and to maintain and is much
>> more complex then other key value storages - remember that first
>> storages were key value (BDB, etc.).
>
> berkdb is on its way out on gentoo, that include the mysql support for 
> it aswell, redis db is not currently in gentoo portage so i cant test 
> it atm
Redis is in Gentoo Portage. I wrote this plugin on a Gentoo system :-)
Portage also contains the redis-py client for Python.
>
>> 2) Scaleability - Redis DB is pretty highly scalable, and the purpose
>> of this plugin is tu use Bayes for high performance nodes. In future
>> there should be lots of new things about Redis functionality, so I
>> thought, that i would be nice to give it a try.
>
> yep, hope your work will be part of spamassassin if it turns out good, 
> mysqltuner is helpfull for me, since my server only have 1.2G ram, yes 
> ram is cheap, but not on old servers
>
>> 3) Resources - i did some short tests about memory consuption and i
>> get to the 10% or less of memory consuption with comparsion to MySQL
>> engines. This reason may be odd, i have to make much more tests.
>
> super i would like to change here alone for this reason
We have planned a few test scenarios for the next two months, so I'll
let you know then.
>
>> 4) Autoexpire - SEEN table is not expiring and this table grows fast.
>> If you learn your bayes with 50000 emails per day, it grows into
>> pretty big monster after few years, and there's no mechanism keeping
>> time signature of seen entries. With Redis DB you can set EXPIRE time
>> of SEEN key (it's also implemented in this plugin) and you don't have
>> to care about anything else.
>
> yes the seen table can be modified to support expire or simply cronned 
> to be deleted, i do it as here with only holds 24 hours last records
Sure, I agree, but you have to run a cron task, which can lead to a
crash on a system with a large DB and a high load caused by heavy
traffic. With Redis, expiry is native.
>
>> These are my reasons, but I understand if you rather use simplier way
>> with MySQL memory engine. The point is that the plugin is almost done
>> so why not make it better?
>
> sure, if ram speed was demended one could make startup init for mysql 
> to alter table engine memory, and on shutdown alter table engine myisam
>
> but according to my reading you do more in the perl code ? :)
Sorry, I don't understand this question. Do you mean more inside the
Perl module code?


Re: BayesStore in Redis DB

Posted by Quanah Gibson-Mount <qu...@zimbra.com>.
--On Tuesday, March 27, 2012 3:52 PM +0200 Benny Pedersen <me...@junc.org> 
wrote:

> On 2012-03-27 13:29, Jan Hejl wrote:
>> There are few reasons for implementing Redis DB for me (my company)
>> may be not all of it is objective because i like these key value
>> storages.
>
> okay
>
>> 1) Simplicity - Redis DB is simple to use and to maintain and is much
>> more complex then other key value storages - remember that first
>> storages were key value (BDB, etc.).
>
> berkdb is on its way out on gentoo, that include the mysql support for it
> aswell, redis db is not currently in gentoo portage so i cant test it atm

If SA is going to look at alternatives, I suggest looking at the BSD 
licensed MDB library from OpenLDAP.org.  I for one vote for an alternative 
to using BDB. ;)  Note: OpenLDAP's MDB library has nothing to do with MS's 
MDB.

You can read more about it at:

<http://www.daasi.de/ldapcon2011/index.php?site=memory-mapped>
<http://www.daasi.de/ldapcon2011/downloads/chu-paper.pdf>
<http://www.daasi.de/ldapcon2011/downloads/Chu-slides.pdf>

Or watch the presentation at:

<http://youtu.be/SrKQNed7KK8>

--Quanah


--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Re: BayesStore in Redis DB

Posted by Benny Pedersen <me...@junc.org>.
On 2012-03-27 13:29, Jan Hejl wrote:
> There are few reasons for implementing Redis DB for me (my company)
> may be not all of it is objective because i like these key value
> storages.

okay

> 1) Simplicity - Redis DB is simple to use and to maintain and is much
> more complex then other key value storages - remember that first
> storages were key value (BDB, etc.).

BerkDB is on its way out on Gentoo, and that includes the MySQL support
for it as well. Redis is not currently in Gentoo Portage, so I can't
test it at the moment.

> 2) Scaleability - Redis DB is pretty highly scalable, and the purpose
> of this plugin is tu use Bayes for high performance nodes. In future
> there should be lots of new things about Redis functionality, so I
> thought, that i would be nice to give it a try.

Yep, I hope your work becomes part of SpamAssassin if it turns out well.
mysqltuner is helpful for me, since my server only has 1.2G of RAM. Yes,
RAM is cheap, but not for old servers.

> 3) Resources - i did some short tests about memory consuption and i
> get to the 10% or less of memory consuption with comparsion to MySQL
> engines. This reason may be odd, i have to make much more tests.

Super, I would switch for this reason alone.

> 4) Autoexpire - SEEN table is not expiring and this table grows fast.
> If you learn your bayes with 50000 emails per day, it grows into
> pretty big monster after few years, and there's no mechanism keeping
> time signature of seen entries. With Redis DB you can set EXPIRE time
> of SEEN key (it's also implemented in this plugin) and you don't have
> to care about anything else.

Yes, the seen table can be modified to support expiry, or simply be
purged from cron. I do the latter here and keep only the last 24 hours
of records.

> These are my reasons, but I understand if you rather use simplier way
> with MySQL memory engine. The point is that the plugin is almost done
> so why not make it better?

Sure. If RAM speed were needed, one could add a startup init step for
MySQL that alters the table engine to MEMORY, and on shutdown alters it
back to MyISAM.

But from my reading, you do more in the Perl code? :)


Re: BayesStore in Redis DB

Posted by Jan Hejl <jh...@excello.cz>.
There are a few reasons why I (and my company) implemented Redis
support. Maybe not all of them are objective, because I like these
key-value stores.

1) Simplicity - Redis is simple to use and to maintain, yet much more
capable than other key-value stores. Remember that the first stores
were key-value (BDB, etc.).
2) Scalability - Redis is highly scalable, and the purpose of this
plugin is to use Bayes on high-performance nodes. Redis should gain
lots of new functionality in the future, so I thought it would be nice
to give it a try.
3) Resources - I ran some short tests of memory consumption, and Redis
used 10% or less of the memory that the MySQL engines used. This figure
may be off; I have to run many more tests.
4) Autoexpire - The SEEN table does not expire and it grows fast. If
you train Bayes with 50000 emails per day, it grows into a pretty big
monster after a few years, and there's no mechanism that keeps a
timestamp for seen entries. With Redis you can set an EXPIRE time on
each SEEN key (this is also implemented in the plugin) and you don't
have to care about anything else.

These are my reasons, but I understand if you would rather use the
simpler way with the MySQL memory engine. The point is that the plugin
is almost done, so why not make it better?
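
The auto-expire idea in point 4 can be sketched in Python as follows.
This is only a minimal sketch, not the plugin's actual code (the plugin
is a Perl module). The "seen:" key prefix, the 30-day retention, the
's'/'h' flags and the function names are assumptions for illustration.
InMemoryStore is a stand-in that imitates only the
set(name, value, ex=seconds) and get(name) calls of the redis-py client,
so that the example runs without a server.

```python
import time


class InMemoryStore:
    """Stand-in for a Redis client (assumption: it mimics only the
    set(name, value, ex=seconds) and get(name) calls of redis-py)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry deadline or None)

    def set(self, name, value, ex=None):
        # Store the value together with its absolute expiry deadline.
        deadline = None if ex is None else self._clock() + ex
        self._data[name] = (value, deadline)
        return True

    def get(self, name):
        entry = self._data.get(name)
        if entry is None:
            return None
        value, deadline = entry
        if deadline is not None and self._clock() >= deadline:
            del self._data[name]  # lazily drop the expired key
            return None
        return value


SEEN_TTL_SECONDS = 30 * 24 * 3600  # hypothetical retention: 30 days


def seen_put(client, msgid, flag, ttl=SEEN_TTL_SECONDS):
    """Record that a message was learned as spam ('s') or ham ('h').

    The key carries its own time to live, so no separate cleanup job
    is needed to keep the SEEN data from growing without bound.
    """
    client.set("seen:" + msgid, flag, ex=ttl)


def seen_get(client, msgid):
    """Return the stored flag, or None if unknown or already expired."""
    return client.get("seen:" + msgid)
```

With a real server, a redis-py client created with
redis.Redis(decode_responses=True) should be usable in place of
InMemoryStore. Redis then drops each SEEN key by itself once its TTL
has passed.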

On 27.3.2012 11:36, Benny Pedersen wrote:
> On 2012-03-26 18:26, Jan Hejl wrote:
>> Hello everyone,
>>
>> last few months i've been working on new BayesStore module. This
>> module uses all-in-memory DB called Redis - more at redis.io
>
> so why not use mysql memory engine ?
>
> will your plugin store data on shutdown of server ?
Which one will do?
>
>> I've put the code on sf.net http://sourceforge.net/projects/bayesredis/
>>
>> It's usable (at least for me) but it's still under high development.
>
> remove loadplugin in cf files and move this line into pre file
Sorry, that line is outdated. I load the plugin from a pre file, as you
say. Thanks for pointing it out.
>
>> I hope that this could be interesting for someone and if you could
>> help me to make it work properly and better I'll be glad.
>
> here i just try to use mysql with less then 30% ram, its a hard work 
> with both bayes, and dspam, but usaly i find what can be scaled down 
> in mem usage without make to much collective damage :)
>
>
>


Re: BayesStore in Redis DB

Posted by Benny Pedersen <me...@junc.org>.
On 2012-03-26 18:26, Jan Hejl wrote:
> Hello everyone,
>
> last few months i've been working on new BayesStore module. This
> module uses all-in-memory DB called Redis - more at redis.io

So why not use the MySQL MEMORY engine?

Will your plugin save its data when the server shuts down?

> I've put the code on sf.net 
> http://sourceforge.net/projects/bayesredis/
>
> It's usable (at least for me) but it's still under high development.

Remove loadplugin from the cf files and move that line into a pre file.

> I hope that this could be interesting for someone and if you could
> help me to make it work properly and better I'll be glad.

Here I just try to run MySQL with less than 30% of RAM. It's hard work
with both Bayes and DSPAM, but usually I find something whose memory
usage can be scaled down without doing too much collateral damage :)