Posted to users@spamassassin.apache.org by Paweł Tęcza <pt...@uw.edu.pl> on 2007/08/16 10:36:16 UTC

Spam kills my MySQL with Bayes

Hi,

It's my first post to this mailing list, so I would like to say hello
to all of you! :)

We manage a fairly large mail system for our university (~100k messages
per day).  It includes:

- a few front-ends (Courier SMTP/IMAP/POP3/webmail/maildrop) which
  connect to a random SpamAssassin host via the spamc utility,
- a cluster of a few servers with SpamAssassin (3.2.1-1ubuntu1),
- one more server with MySQL (5.0.38-0ubuntu1) to store the Bayes
  and FuzzyOcr databases, etc.

All our servers run Ubuntu 7.04 as OpenVZ virtual environments.

Recently we've been bombarded with spam like the one below:

Date: Mon, 13 Aug 2007 18:43:18 -0300
From: dinca Klarenbeek <Kl...@kuak.com>
To: webmaster@net.icm.edu.pl
Subject: That programmer knows exactly what he or she is doing,
    and his or her intentions are malefic (or at least, not altruistic).

H*u_g e N_e'w,s To Im-pact C*Y-T,V


C+hina YouT,V C.o,r.p*.
Symbo,l: C-Y'T V


We h+a_v*e alr_eady s.e-e n CYTV''s marke,t i mpact bef_ore climbi n.g to o'v+e_r $2.0_0 w_i't*h n+e'w.s,.
Pr.ess R-elease:
C+hina YouT-V's Cn+Boo W*e+b S_i't.e Ran.ks N.o_._1 on M_icroso+ft L_i v-e Se-arch E.ngine
Cn'Boo Traf-fic In.creas_es 4_9*% O*v+e r T-w.o Month_s


R'e'a_d t'h_e n,ews, thi+nk ab*out t,h-e impac*t, and

j.u+m_p on t*h+i+s fir,st thin-g Tomorr, ow mornin-.g!  $-0.42 is a g*i*f*t at t.h.i,s p*r_ice.....

Do y o.u,r hom*e,work a'n+d wat.ch t.h*i_s t*rade Monda*y mor'ning.


T.h*e R'ewri,teEngine dir*e'ctive enab+les or di*sabl es t h*e runti me rewri_tin-g engin_e.

By midd.ay, th.ough we w e r_e w+e_l_l up in t-h_e northe _rly lati.*tudes, t-h'e h,e,a.t w*a's sick+ening+.
Y'o+u l*o o_k as if y+o-u h-a.v*e b.e e*n tr a_veling.

Predic't_able law-a bidin*g beh+avi.or lull*s dri_vers.

By s'o.m*e mira_cle wort*hy of ev'ery H,o.l,y B_o'o.k e.v*e.r writt_e n, b,o-t,h rocke-ts misse_d h+e+r+.

I hope that my message won't be rejected by the SpamAssassin which guards
this mailing list and that you can see it ;)

Is this a new kind of spam that SpamAssassin should be improved
to fight?  I'm not sure...

The result was that the spam was killing our MySQL database, because we
had ~50k queries per minute with INSERTs and UPDATEs of many tokens.
The only solution was to disable Bayes.
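
To put those figures in perspective, here is a rough back-of-the-envelope
calculation in Python. It assumes the ~50k queries per minute coincided with
the average rate implied by ~100k messages per day, which is only an
assumption: the actual message rate during the spam run is not given.

```python
# Figures quoted in the post above.
QUERIES_PER_MINUTE = 50_000
MESSAGES_PER_DAY = 100_000

queries_per_second = QUERIES_PER_MINUTE / 60
# Assumption: traffic is spread evenly over the day.
messages_per_minute = MESSAGES_PER_DAY / (24 * 60)
queries_per_message = QUERIES_PER_MINUTE / messages_per_minute

print(round(queries_per_second))   # about 833 queries per second
print(round(queries_per_message))  # about 720 queries per message
```

Under that assumption each message accounts for several hundred queries,
which fits a workload dominated by per-token INSERTs and UPDATEs.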

Have you also had problems with spam like that?  If so, how do
you handle it?

My best regards,

Pawel Tecza

Re: Spam kills my MySQL with Bayes

Posted by SM <sm...@resistor.net>.
Hi Pawel,
At 04:48 17-08-2007, Paweł Tęcza wrote:
>My hardware seems to be good enough.  It's Sun Fire x4100 M2 server
>with 2 x Dual-Core AMD Opteron 2220 SE CPUs and 8GB RAM on the board
>and it's bored with its job ;)  I think I rather need faster disks.

That should be fast.  Disk I/O is generally the bottleneck for a 
database. MySQL provides several sample configuration files.  Use the 
one which is appropriate for your installation.  Adjust the buffer 
size (engine specific).  See how effective your query cache is.  Turn 
on the slow query log and monitor it.
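
One way to see how effective the query cache is: compare the counters
reported by SHOW STATUS. The Python sketch below uses the commonly quoted
ratio Qcache_hits / (Qcache_hits + Com_select); the helper name and the
sample numbers are made up for illustration. Keep in mind that the query
cache only serves SELECTs, so it does not help with the INSERT and UPDATE
load itself.

```python
def query_cache_hit_ratio(status):
    """Return the share of SELECTs answered from the query cache.

    `status` maps MySQL status variable names to their values, for
    example as copied from SHOW STATUS output (values may be strings).
    """
    hits = int(status["Qcache_hits"])
    selects = int(status["Com_select"])  # SELECTs that were not cache hits
    total = hits + selects
    return hits / total if total else 0.0


# Made-up sample counters, not taken from the server in this thread.
sample = {"Qcache_hits": "300", "Com_select": "100"}
print(query_cache_hit_ratio(sample))
```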

Regards,
-sm 


Re: Spam kills my MySQL with Bayes

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
Pawel Sasin <ps...@wp-sa.pl> writes:
[...]
> You said you have several servers running spamd - if updates are
> causing you much trouble then you could disable bayes_auto_learn on
> most of the servers, so that only some of them (down to 1) would
> update your bayes DB, while the others would just query it.

Thanks for another hint, Pawel!  I hadn't thought of that :)
I agree it's a better solution than disabling Bayes everywhere.

Pawel

Re: Spam kills my MySQL with Bayes

Posted by Pawel Sasin <ps...@wp-sa.pl>.
>>> My hardware seems to be good enough.  It's Sun Fire x4100 M2 server
>>> with 2 x Dual-Core AMD Opteron 2220 SE CPUs and 8GB RAM on the board
>>> and it's bored with its job ;)  I think I rather need faster disks.
>>>       
>> With that amount memory you won't see much disk activity. You can happily
>> increase mysql buffer cache sizes to a GB or two. It's all basic mysql
>> tuning.
>>     
You said you have several servers running spamd - if updates are causing 
you much trouble then you could disable bayes_auto_learn on most of the 
servers, so that only some of them (down to 1) would update your bayes 
DB, while the others would just query it.
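
A minimal sketch of that split, in Python: it maps each spamd host to the
local.cf line it would carry, with only the first few hosts allowed to
auto-learn. The host names and the helper function are hypothetical;
bayes_auto_learn is the SpamAssassin configuration option being toggled.

```python
def autolearn_settings(hosts, learners=1):
    """Map each spamd host to a local.cf line.

    Only the first `learners` hosts keep auto-learning enabled; the rest
    would still query the shared Bayes database but not auto-learn.
    """
    return {
        host: f"bayes_auto_learn {1 if index < learners else 0}"
        for index, host in enumerate(hosts)
    }


# Hypothetical host names, used only for illustration.
for host, line in autolearn_settings(["sa1", "sa2", "sa3"]).items():
    print(host, "->", line)
```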

-- 
Pawel Sasin

WIRTUALNA  POLSKA  SA, ul. Traugutta 115c, 80-226 Gdansk; NIP: 957-07-51-216; 
Sad Rejonowy Gdansk-Polnoc KRS 0000068548, kapital zakladowy 62.880.024 zlotych (w calosci wplacony)

Re: Spam kills my MySQL with Bayes

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
Henrik Krohns <he...@hege.li> writes:
[...]
>> My hardware seems to be good enough.  It's Sun Fire x4100 M2 server
>> with 2 x Dual-Core AMD Opteron 2220 SE CPUs and 8GB RAM on the board
>> and it's bored with its job ;)  I think I rather need faster disks.
>
> With that amount memory you won't see much disk activity. You can happily
> increase mysql buffer cache sizes to a GB or two. It's all basic mysql
> tuning.

Hi Henrik,

It's a good suggestion. Thanks a lot! :)

My best regards,

Pawel

Re: Spam kills my MySQL with Bayes

Posted by Henrik Krohns <he...@hege.li>.
On Fri, Aug 17, 2007 at 01:48:30PM +0200, Paweł Tęcza wrote:
> SM <sm...@resistor.net> writes:
> [...]
> >>Now I use the MyISAM storage backend, because I just created the Bayesian
> >>database using the SpamAssassin sql/bayes_mysql.sql file :)
> >
> > The recommendations in the sql/bayes_mysql.sql file are for the
> > average setup.  It doesn't cover MySQL optimization techniques as that
> > a MySQL specific issue.
> >
> > You can change the engine from MyISAM to InnoDB (see ALTER TABLE).
> > That should improve performance for INSERTs.  With the amount of mail
> > your server handles, you either have to improve MySQL performance,
> > switch to more powerful hardware or disable Bayes.  If you disable
> > Bayes, the punctuation spam would still be caught in your setup as it
> > scored over 19 points.
> 
> Hello again! :)
> 
> I'm working on conversion of storage engine from MyISAM to InnoDB.
> 
> My hardware seems to be good enough.  It's Sun Fire x4100 M2 server
> with 2 x Dual-Core AMD Opteron 2220 SE CPUs and 8GB RAM on the board
> and it's bored with its job ;)  I think I rather need faster disks.

With that amount of memory you won't see much disk activity. You can happily
increase the MySQL buffer cache sizes to a GB or two. It's all basic MySQL
tuning.


Re: Spam kills my MySQL with Bayes

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
SM <sm...@resistor.net> writes:
[...]
>>Now I use the MyISAM storage backend, because I just created the Bayesian
>>database using the SpamAssassin sql/bayes_mysql.sql file :)
>
> The recommendations in the sql/bayes_mysql.sql file are for the
> average setup.  It doesn't cover MySQL optimization techniques as that
> a MySQL specific issue.
>
> You can change the engine from MyISAM to InnoDB (see ALTER TABLE).
> That should improve performance for INSERTs.  With the amount of mail
> your server handles, you either have to improve MySQL performance,
> switch to more powerful hardware or disable Bayes.  If you disable
> Bayes, the punctuation spam would still be caught in your setup as it
> scored over 19 points.

Hello again! :)

I'm working on converting the storage engine from MyISAM to InnoDB.

My hardware seems to be good enough.  It's a Sun Fire X4100 M2 server
with 2 x Dual-Core AMD Opteron 2220 SE CPUs and 8GB RAM on board,
and it's bored with its job ;)  I think I need faster disks instead.

My best regards,

Pawel

Re: Spam kills my MySQL with Bayes

Posted by SM <sm...@resistor.net>.
At 03:09 16-08-2007, Paweł Tęcza wrote:
>Here are the Spamassassin headers for one of a spam mail we received:
>
>X-Spam-Flag: YES
>X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on
>         anubis3.poczta.uw.edu.pl
>X-Spam-Level: xxxxxxxxxxxxxxxxxxx
>X-Spam-Status: Yes, score=19.3 required=5.0
>         tests=FH_HELO_EQ_D_D_D_D,FRT_PRICE,FRT_STRONG1,FRT_SYMBOL,
>         HTML_MESSAGE,MIME_QP_LONG_LINE,RCVD_IN_BL_SPAMCOP_NET,
>         RCVD_IN_PBL,TVD_FUZZY_SYMBOL,TVD_STOCK1 autolearn=disabled
>         version=3.2.1

The message is detected as spam.

>Thanks for the hint!  I'll try a look at them.

You don't need those rules then.

>Now I use the MyISAM storage backend, because I just created the Bayesian
>database using the SpamAssassin sql/bayes_mysql.sql file :)

The recommendations in the sql/bayes_mysql.sql file are for the 
average setup.  It doesn't cover MySQL optimization techniques as 
that is a MySQL-specific issue.

You can change the engine from MyISAM to InnoDB (see ALTER 
TABLE).  That should improve performance for INSERTs.  With the 
amount of mail your server handles, you either have to improve MySQL 
performance, switch to more powerful hardware or disable Bayes.  If 
you disable Bayes, the punctuation spam would still be caught in your 
setup as it scored over 19 points.
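
As a sketch of the engine change, the Python snippet below only prints the
ALTER TABLE statements; it does not connect to any database. The table
names are assumed to be the ones created by the stock sql/bayes_mysql.sql
schema, so check them against your own database first, and take a backup
before converting anything.

```python
# Assumption: default table names from SpamAssassin's sql/bayes_mysql.sql.
BAYES_TABLES = [
    "bayes_expire",
    "bayes_global_vars",
    "bayes_seen",
    "bayes_token",
    "bayes_vars",
]


def engine_conversion_sql(tables, engine="InnoDB"):
    """Build one ALTER TABLE statement per table for the given engine."""
    return [f"ALTER TABLE {name} ENGINE={engine};" for name in tables]


for statement in engine_conversion_sql(BAYES_TABLES):
    print(statement)
```

The printed statements can then be reviewed and run by hand in the mysql
client against the Bayes database.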

Regards,
-sm 


Re: Spam kills my MySQL with Bayes

Posted by Kai Schaetzl <ma...@conactive.com>.
Paweł Tęcza wrote on Thu, 16 Aug 2007 15:46:49 +0200:

> but what about a configurable
> plugin option for the maximum number of tokens per message to store?

if there isn't one already ...
You could put this up on bugzilla as a feature request.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com




Re: Spam kills my MySQL with Bayes

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
Kai Schaetzl <ma...@conactive.com> writes:

> Paweł Tęcza wrote on Thu, 16 Aug 2007 14:28:05 +0200:
[...]
>> But the second
>> is rather Spamassassin's job.
>> 
>> I'm thinking whether it's really necessary to keep *all* tokens
>> for that kind of spam...  Maybe Spamassassin could save only
>> some part of them?  What's your opinion about it?
>
> I really don't know enough about Bayes and SA to say much about it. I 
> think it would be difficult for SA to determine what are "good" and "bad" 
> tokens.

Yes, it can be difficult to determine, but what about a configurable
plugin option for the maximum number of tokens per message to store?
If it doesn't exist yet, of course ;)
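
To make the idea concrete, here is a purely hypothetical Python sketch of
such a cap. It is not an existing SpamAssassin option or API; the function
name and the "keep the first N distinct tokens" policy are assumptions, and
choosing which tokens to keep is exactly the hard part Kai mentions.

```python
def cap_tokens(tokens, max_tokens):
    """Keep at most `max_tokens` distinct tokens, in first-seen order."""
    kept = []
    seen = set()
    for token in tokens:
        if token in seen:
            continue  # duplicates never count against the cap
        if len(kept) >= max_tokens:
            break  # cap reached, ignore the rest of the message
        seen.add(token)
        kept.append(token)
    return kept


print(cap_tokens(["a", "b", "a", "c", "d"], 3))  # ['a', 'b', 'c']
```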

Pawel

Re: Spam kills my MySQL with Bayes

Posted by Kai Schaetzl <ma...@conactive.com>.
Paweł Tęcza wrote on Thu, 16 Aug 2007 14:28:05 +0200:

> 1. try to speed up my MySQL server
> 2. decrease a number of unique tokens for "punctuation spam"
> 
> The first of them is a task for me, of course.

Well, it's nevertheless a good question for this list as others may have 
the same problem. I just wanted to stress that your problem is not 
detection as others seem to have overlooked this and give hints on better 
detection. But you are detecting just fine.

> But the second
> is rather Spamassassin's job.
> 
> I'm thinking whether it's really necessary to keep *all* tokens
> for that kind of spam...  Maybe Spamassassin could save only
> some part of them?  What's your opinion about it?

I really don't know enough about Bayes and SA to say much about it. I 
think it would be difficult for SA to determine what are "good" and "bad" 
tokens.

Kai

Re: Spam kills my MySQL with Bayes

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
Kai Schaetzl <ma...@conactive.com> writes:

> Paweł Tęcza wrote on Thu, 16 Aug 2007 12:25:48 +0200:
>
>> What can you tell about it now?
>
> I think that's not the point, or is it? You don't seem to have a problem 
> with detection but with token storage slowness on SQL as these "fuzzy" 
> mails seem to generate a lot of unique tokens. Is that what you wanted to 
> get fixed?

Hi Kai,

I would like to do two things:

1. speed up my MySQL server
2. decrease the number of unique tokens for "punctuation spam"

The first of them is a task for me, of course.  But the second
is rather SpamAssassin's job.

I'm wondering whether it's really necessary to keep *all* tokens
for that kind of spam...  Maybe SpamAssassin could save only
some of them?  What's your opinion about it?

My best regards,

Pawel

Re: Spam kills my MySQL with Bayes

Posted by Kai Schaetzl <ma...@conactive.com>.
Paweł Tęcza wrote on Thu, 16 Aug 2007 12:25:48 +0200:

> What can you tell about it now?

I think that's not the point, or is it? You don't seem to have a problem 
with detection but with token storage slowness on SQL as these "fuzzy" 
mails seem to generate a lot of unique tokens. Is that what you wanted to 
get fixed?
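
To illustrate why these mails generate so many unique tokens, here is a
minimal Python sketch. It uses a naive whitespace tokenizer, which is an
assumption and not SpamAssassin's actual Bayes tokenizer. The first
obfuscated line is taken from the spam sample in this thread; the other two
are made-up variants of the same phrase.

```python
def unique_tokens(messages):
    """Collect the distinct lowercase whitespace-separated tokens."""
    seen = set()
    for body in messages:
        seen.update(word.lower() for word in body.split())
    return seen


clean = ["read the news", "read the news", "read the news"]
obfuscated = [
    "R'e'a_d t'h_e n,ews",   # from the spam sample in this thread
    "R.e+a d t*h.e ne-ws",   # made-up variant
    "Re_a*d t-h+e n'e.ws",   # made-up variant
]

print(len(unique_tokens(clean)))       # 3
print(len(unique_tokens(obfuscated)))  # 10
```

The clean text keeps hitting the same three tokens, while every obfuscated
copy adds new ones, and each new token means another row to store.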

Kai

Re: Spam kills my MySQL with Bayes

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
"Paweł Tęcza" <pt...@uw.edu.pl> writes:
[...]
> Here are the Spamassassin headers for one of a spam mail we received:

Oops!  That was spam received after we disabled Bayes. Below are the
headers of spam we scanned before that:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on
        anubis4.poczta.uw.edu.pl
X-Spam-Level: xxxxxxxxxxxxxxxxxxxxx
X-Spam-Status: Yes, score=21.5 required=5.0
        tests=AXB_XMID_1212,BAYES_99,FH_HELO_EQ_D_D_D_D,FRT_PRICE,
        FRT_STRONG1,HELO_DYNAMIC_IPADDR,RCVD_IN_PBL,RCVD_IN_SORBS_DUL,
        RDNS_DYNAMIC,STOX_REPLY_TYPE,TVD_STOCK1
        autolearn=spam
        version=3.2.1
X-Spam-Report:
        *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
        *      [score: 1.0000]
        *  0.0 STOX_REPLY_TYPE STOX_REPLY_TYPE
        *  0.0 FH_HELO_EQ_D_D_D_D Helo is d-d-d-d
        *  2.4 HELO_DYNAMIC_IPADDR Relay HELO'd using suspicious hostname
        *      (IP addr 1)
        *  3.5 AXB_XMID_1212 Barbera Fingerprint
        *  0.9 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
        *      [90.14.168.63 listed in zen.spamhaus.org]
        *  0.9 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP
        *      address
        *      [90.14.168.63 listed in dnsbl.sorbs.net]
        *  3.5 FRT_PRICE BODY: ReplaceTags: Price
        *  3.0 FRT_STRONG1 BODY: ReplaceTags: Strong (1)
        *  3.8 TVD_STOCK1 BODY: TVD_STOCK1
        *  0.1 RDNS_DYNAMIC Delivered to trusted network by host with
        *      dynamic-looking rDNS

What can you say about it now? :)

Regards,

Pawel

Re: Spam kills my MySQL with Bayes

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
SM <sm...@resistor.net> writes:

> Hi Pawel,
> At 01:36 16-08-2007, Paweł Tęcza wrote:
[...]
>>Is it not a new kind of spam and Spamassassin should be improved
>>to fight it?  I'm not sure...
>
> No, it is not new.  I posted the following reply a few days back regarding this
> type of message referred to as "punctuation spam".
>
> The message hits BAYES_99 and FRT_PRICE.  As you did not include the
> headers, it's not possible to tell whether it would hit some of the "DYNAMIC"
> rules as well.

Hello mysterious SM! ;)

Thanks a lot for the reply and the explanation!

Here are the SpamAssassin headers for one of the spam mails we received:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on
        anubis3.poczta.uw.edu.pl
X-Spam-Level: xxxxxxxxxxxxxxxxxxx
X-Spam-Status: Yes, score=19.3 required=5.0
        tests=FH_HELO_EQ_D_D_D_D,FRT_PRICE,FRT_STRONG1,FRT_SYMBOL,
        HTML_MESSAGE,MIME_QP_LONG_LINE,RCVD_IN_BL_SPAMCOP_NET,
        RCVD_IN_PBL,TVD_FUZZY_SYMBOL,TVD_STOCK1 autolearn=disabled
        version=3.2.1
X-Spam-Report:
        *  0.5 FH_HELO_EQ_D_D_D_D Helo is d-d-d-d
        *  2.5 FRT_PRICE BODY: ReplaceTags: Price
        *  3.6 FRT_SYMBOL BODY: ReplaceTags: Symbol
        *  1.4 TVD_FUZZY_SYMBOL BODY: TVD_FUZZY_SYMBOL
        *  2.9 FRT_STRONG1 BODY: ReplaceTags: Strong (1)
        *  3.8 TVD_STOCK1 BODY: TVD_STOCK1
        *  0.0 HTML_MESSAGE BODY: Message contains HTML code
        *  1.8 MIME_QP_LONG_LINE RAW: QP line longer than 76 characters
        *  2.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay listed in
        *      bl.spamcop.net
        *      [Blocked - see
        *      <http://www.spamcop.net/bl.shtml?89.191.164.221>]
        *  0.5 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
        *      [89.191.164.221 listed in zen.spamhaus.org]

> Bill Landry suggested using chickenpox.cf and mangled.cf rules from SARE.

Thanks for the hint!  I'll take a look at them.

>>The results is that spam was killing our MySQL database, because we
>>had ~50k queries per minute with INSERTs and UPDATEs of a many tokens.
>>The only one solution was to disable Bayes.
>
> MySQL can be optimized to handle such a load.  If you aren't using InnoDB for
> Bayesian storage, switch to it.

Now I use the MyISAM storage backend, because I just created the Bayesian
database using the SpamAssassin sql/bayes_mysql.sql file :)

Have a nice day,

Pawel

Re: Spam kills my MySQL with Bayes

Posted by SM <sm...@resistor.net>.
Hi Pawel,
At 01:36 16-08-2007, Paweł Tęcza wrote:
>We manage a not little mail system for our university (~100k messages
>per day).  In includes:
>
>- a few front-ends (Courier SMTP/IMAP/POP3/webmail/maildrop) which
>   connect to random Spamassassin host via spamc utility,
>- a cluster of a few servers with Spamassassin (3.2.1-1ubuntu1),
>- another one server with MySQL (5.0.38-0ubuntu1) to store Bayes
>   and FuzzyOcr database, etc.
>
>All our servers work under control Ubuntu 7.04 as OpenVZ virtual
>environments.
>
>Recently we've been bombed by spam like below:
>
>Date: Mon, 13 Aug 2007 18:43:18 -0300
>From: dinca Klarenbeek <Kl...@kuak.com>
>To: webmaster@net.icm.edu.pl
>Subject: That programmer knows exactly what he or she is doing,
>     and his or her intentions are malefic (or at least, not altruistic).
>
>H*u_g e N_e'w,s To Im-pact C*Y-T,V

[snip]

>Is it not a new kind of spam and Spamassassin should be improved
>to fight it?  I'm not sure...

No, it is not new.  I posted the following reply a few days back 
regarding this type of message referred to as "punctuation spam".

The message hits BAYES_99 and FRT_PRICE.  As you did not include 
the headers, it's not possible to tell whether it would hit some of 
the "DYNAMIC" rules as well.

Bill Landry suggested using chickenpox.cf and mangled.cf rules from SARE.

>The results is that spam was killing our MySQL database, because we
>had ~50k queries per minute with INSERTs and UPDATEs of a many tokens.
>The only one solution was to disable Bayes.

MySQL can be optimized to handle such a load.  If you aren't using 
InnoDB for Bayesian storage, switch to it.

Regards,
-sm 


Re: Spam kills my MySQL with Bayes

Posted by Paweł Tęcza <pt...@uw.edu.pl>.
Pawel Sasin <ps...@wp-sa.pl> writes:
[...]
> Have you tried this on your SA servers?
> http://wiki.apache.org/spamassassin/DBIPlugin

Hello Pawel! :D

Thank you very much for the message about DBIPlugin!  I've never
used it before.  It looks interesting to me, so I've just
downloaded the plugin and I'm testing it on one of my SA nodes
right now :)

> AFAIK spawning many connections to mysql servers causes quite a big
> load on them.

I didn't notice a big load on my MySQL server during the "punctuation
spam" bombing.  Yes, it increased, but only from 0.1 to 1.1 :)  I think
the problem wasn't many connections, but many SQL queries.

Greetings from Warsaw! :)

Pawel

Re: Spam kills my MySQL with Bayes

Posted by Pawel Sasin <ps...@wp-sa.pl>.
Hi,
> We manage a not little mail system for our university (~100k messages
> per day).  In includes:
>
> - a few front-ends (Courier SMTP/IMAP/POP3/webmail/maildrop) which
>   connect to random Spamassassin host via spamc utility,
> - a cluster of a few servers with Spamassassin (3.2.1-1ubuntu1),
> - another one server with MySQL (5.0.38-0ubuntu1) to store Bayes
>   and FuzzyOcr database, etc.
>
> The results is that spam was killing our MySQL database, because we
> had ~50k queries per minute with INSERTs and UPDATEs of a many tokens.
> The only one solution was to disable Bayes.
>   
Have you tried this on your SA servers?
http://wiki.apache.org/spamassassin/DBIPlugin

AFAIK spawning many connections to mysql servers causes quite a big load 
on them.

-- 
Pawel Sasin

