Posted to users@spamassassin.apache.org by Michelle Konzack <li...@tamay-dogan.net> on 2008/09/04 09:33:04 UTC

spamassassin takes ten minutes for a message

Hello,

I download my messages with my laptop in an Internet cafe, transfer
them to my server at home, and then let a filter run over them...

Note:  I am working Off-Line (No Internet @home)

Last weekend I took my server to a friend's place, connected it to the
Internet over ADSL, and downloaded around 30.000 messages from the
roughly 10 days I was away from Strasbourg.

The first 18.000 messages went fine, but then "spamassassin" began to
misbehave: it took around 10 minutes for each message, and by the time
I noticed the problem it had already been running like this for several
hours...

How can this be?

----[ command 'cd .spamassassin && ls -Al' ]----------------------------
total 25528
-rw-------  1 michelle.konzack private  1327104 2008-08-31 18:29 auto-whitelist
-rw-------  1 michelle.konzack private    93840 2008-08-31 18:29 bayes_journal
-rw-------  1 michelle.konzack private  2629632 2008-08-31 18:29 bayes_seen
-rw-------  1 michelle.konzack private 20377600 2008-08-31 18:29 bayes_toks
-rw-------  1 michelle.konzack private  4718592 2008-08-30 19:46 bayes_toks.expire30302
-rw-------  1 michelle.konzack private  4513792 2008-08-30 20:42 bayes_toks.expire32331
-rw-------  1 michelle.konzack private  4517888 2008-08-31 18:29 bayes_toks.expire7360
-rw-r--r--  1 michelle.konzack private     1510 2008-08-30 21:52 user_prefs
------------------------------------------------------------------------

As you can see, I had to stop spamassassin on 2008-08-31.  And even if I
move the files out of the way, "spamassassin" refuses to run at normal
speed (4-6 messages per second).

Note:  The Server is a Quad-Xeon with plenty of memory and
       10.000 RpM SCSI drives in Raid-1.

Thanks, Greetings and nice Day/Evening
    Michelle Konzack
    Systemadministrator
    24V Electronic Engineer
    Tamay Dogan Network
    Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack   Apt. 917                  ICQ #328449886
+49/177/9351947    50, rue de Soultz         MSN LinuxMichi
+33/6/61925193     67100 Strasbourg/France   IRC #Debian (irc.icq.com)

Re: spamassassin takes ten minutes for a message

Posted by Michelle Konzack <li...@tamay-dogan.net>.
Hello John,

On 2008-09-21 09:40:38, John Hardin wrote:
> Some questions:
> 
> (1) How are you passing messages to spamassassin for scoring?

In procmail with:

    :0fw
    |spamc

> (2) Exactly what command line options are you using for 
> spamc/spamassassin? Are network tests enabled?

Standard Debian installation without network tests, since I am off-line.

> (3) Do you have bayes auto-expire enabled?

It seems to be the default when you install spamassassin. I have now set

    bayes_auto_expire	0

in my ~/.spamassassin/user_prefs and am waiting to see whether the error
occurs again. I have also set up a cron job with

    0 6 * * * /usr/bin/sa-learn --force-expire

> (4) Does it exhibit the same poor performance when you run one message 
> through spamassassin manually?

Yes

> Please run one message through spamassassin with debugging turned on, 
> capture the results, and post them to a website somewhere (e.g. pastebin) 
> and send the URL for that to the list so we can see timing and such.

I think it is not necessary (see my other message), but if the error
happens again, I will come back.

Thanks, Greetings and nice Day/Evening
    Michelle Konzack
    Systemadministrator
    24V Electronic Engineer
    Tamay Dogan Network
    Debian GNU/Linux Consultant



Re: spamassassin takes ten minutes for a message

Posted by John Hardin <jh...@impsec.org>.
On Thu, 4 Sep 2008, Michelle Konzack wrote:

> The first 18.000 messages went fine, but then "spamassassin" began to
> misbehave: it took around 10 minutes for each message, and by the time
> I noticed the problem it had already been running like this for several
> hours...
>
> How can this be?
>
> As you can see, I had to stop spamassassin on 2008-08-31.  And even if I
> move the files out of the way, "spamassassin" refuses to run at normal
> speed (4-6 messages per second).

Some questions:

(1) How are you passing messages to spamassassin for scoring?

(2) Exactly what command line options are you using for 
spamc/spamassassin? Are network tests enabled?

(3) Do you have bayes auto-expire enabled?

(4) Does it exhibit the same poor performance when you run one message 
through spamassassin manually?

Please run one message through spamassassin with debugging turned on, 
capture the results, and post them to a website somewhere (e.g. pastebin) 
and send the URL for that to the list so we can see timing and such.


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Democrats '63: Ask not what your country can do for you,
    ask what you can do for your country.
   Democrats '07: Ask not what your country can do for you,
    demand it!
-----------------------------------------------------------------------
  44 days until the Presidential Election

Re: spamassassin takes ten minutes for a message

Posted by Bob Proulx <bo...@proulx.com>.
Michelle Konzack wrote:
> I have filtered over 800.000 messages in the last 4 months and it was
> working perfectly without any flaws, and then it stopped from one minute
> to the next.

Well, something in your environment has changed.  You might not ever
determine how things used to be but you will need to understand how
they are now and react to them.

> Since I am Off-Line, I have had NO updates for the system for 4 months,
> which means absolutely nothing has changed.

It is probably "Bit Rot".  :-)

> Since online checks are too slow, I would like to see a solution for very
> reliable RBL checks and such.

You would probably benefit by keeping statistics about which DNSBLs
are triggering on which messages.

>       ################ list.dsbl.org ###################################
> ...
>             { REV2CHECKIP=`host ${RECEIVIP2REV}.list.dsbl.org 2>&1 | grep -v 'not found.'` }
> ...
>     host ${RECEIVIP2REV}.list.dsbl.org
> 
> are very slow...

Note that dsbl.org is gone.  Please see http://www.dsbl.org/ and update
your configuration.

Bob

Re: spamassassin takes ten minutes for a message

Posted by mouss <mo...@netoyen.net>.
Michelle Konzack wrote:
> [snip]
> 
> but unfortunately the two/four lookups with
> 
>     host ${RECEIVIP2REV}.zen.spamhaus.org
>     host ${RECEIVIP2REV}.list.dsbl.org
> 
> are very slow...
> 
> My idea was that if I do not filter directly, I could collect the IPs,
> put them into a cache file, sort and deduplicate it, and use an
> independent process which fetches the status and writes out a file that
> I can easily import into my own DNS server (bind9) @home, and then do the
> final filtering.
> 
> On my <samba3> with the Quad-Xeon I have enough resources to install
> some instances of bind9 as VHosts which could be set up as
> <zen.spamhaus.org> and <list.dsbl.org> and which would then be deactivated
> if <samba3> gets an internet connection...
> 
> Question: Is it possible to get (FTP) the lists from the two servers for
>           private non-public use?  If yes, how big are they?

dsbl was rsync-able but is now gone. For spamhaus, you would have to pay 
a fee (too expensive if you don't receive a lot of mail).

>           Since I am on-line only 2-3 times per week, it would be nice
>           if I could fetch the whole list.  (I assume this takes fewer
>           resources than making several thousand lookups on the DNS)

It will reduce the latency of your "real time" checks, but will 
certainly increase the overall bandwidth usage (if you add up the sizes 
of the dns packets, the result will be much smaller than that of the list).

Re: spamassassin takes ten minutes for a message

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> On 2008-09-20 18:22:25, Bob Proulx wrote:
> > I don't really know and hopefully others will have better
> > suggestions.  But the first thing I would try is to run spamassassin
> > in local mode.
> > 
> >        Options:
> >         -L, --local                       Local tests only (no online tests)

On 23.09.08 13:04, Michelle Konzack wrote:
> I have been using this since I re-installed my Intranet Server 4 months ago.

[...]

> Since online checks are too slow, I would like to see a solution for very
> reliable RBL checks and such.
> 
> I have a procmail recipe which catches the first and second IP from the
> Received header, reverses them and makes DNS lookups like:

SA does lookups in parallel. You can even set a timeout for them. I guess
lookups in procmail take longer...
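
The difference between sequential and parallel lookups can be sketched in
plain shell. This is only an illustration, not how SpamAssassin handles DNS
internally: the function name parallel_lookup, the LOOKUP variable and the
2-second wait are assumptions made for the example. The -W option of "host"
sets how long it waits for a reply.

```shell
#!/bin/sh
# Hypothetical helper: query several DNSBL zones for one reversed IP at the
# same time instead of one after another.
# LOOKUP defaults to "host -W 2" (give up after 2 seconds per query); it is
# a variable only so the sketch can be tried without a network connection.
LOOKUP=${LOOKUP:-"host -W 2"}

parallel_lookup() {
    rev_ip=$1; shift
    tmpdir=$(mktemp -d) || return 1
    for zone in "$@"; do
        # Each lookup runs in the background and writes to its own file.
        $LOOKUP "${rev_ip}.${zone}" >"${tmpdir}/${zone}" 2>&1 &
    done
    wait    # total time is roughly the slowest lookup, not the sum of all
    for zone in "$@"; do
        printf '%s: ' "$zone"
        cat "${tmpdir}/${zone}"
    done
    rm -rf "$tmpdir"
}
```

Called as "parallel_lookup 4.3.2.1 zen.spamhaus.org", it prints one result
block per zone once all lookups have returned or timed out.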
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Enter any 12-digit prime number to continue.

Re: spamassassin takes ten minutes for a message

Posted by Michelle Konzack <li...@tamay-dogan.net>.
Hi Bob,

On 2008-09-20 18:22:25, Bob Proulx wrote:
> I don't really know and hopefully others will have better
> suggestions.  But the first thing I would try is to run spamassassin
> in local mode.
> 
>        Options:
>         -L, --local                       Local tests only (no online tests)

I have been using this since I re-installed my Intranet Server 4 months ago.

> Since you are running it offline I am guessing that SA is trying to do
> network lookups and this is taking the extra time.

I have filtered over 800.000 messages in the last 4 months and it was
working perfectly without any flaws, and then it stopped from one minute
to the next.

Since I am Off-Line, I have had NO updates for the system for 4 months,
which means absolutely nothing has changed.

> Why did this start?  I will make a second guess that something on your
> laptop is different in the networking system.  The first file I would
> check would be /etc/resolv.conf to see if dns name lookup is different
> than you expect when offline.  DNS lookups are "blocking" calls and
> can cause processes to wait during lookup.  Double check everything
> and make sure that dns lookups fail quickly when offline.

Spamassassin is on <samba3.private.tamay-dogan.net> and my laptop is
<tp570.private.tamay-dogan.net>. That means I download the messages in
an Internet cafe onto my laptop, sorted hourly, and when I connect my
laptop @home, the folders are transferred automatically to my <samba3>,
where a script starts, reads one message after another and passes it to
procmail, which does the filtering (including "spamc").

This setup has been working for over 8 years...

But when spamassassin stopped, I had over 30.000 messages in the
queue and it stopped after 12.000 or so...

I should note that I use a global lock file for procmail, which means
it will handle only one file at a time, so there can be no problem with
several spamc requests tripping up spamassassin...

> I actually do my own spamassassin online before getting to the laptop
> where I read mail offline.  The online tests and DNSBLs are much more
> effective than the offline tests.  I fear that offline spam testing

From 2008-09-01 to 2008-09-18 I was not in Strasbourg and got 78.000
messages in the mailboxes...  With a small TP570 it is not possible to
do any spamassassin stuff...

It only runs fetchmail and procmail (which sorts the messages into hourly
folders), and that way I get around 3200 messages per hour.

If I installed spamassassin on my TP570, I would get less than 1000
per hour.

> isn't good enough.  If you can get the spamassassin part running
> online before getting to your laptop I am sure you will have a
> superior result.

Since online checks are too slow, I would like to see a solution for very
reliable RBL checks and such.

I have a procmail recipe which catches the first and second IP from the
Received header, reverses them and makes DNS lookups like:

----[ '/usr/share/tdtools-procmail/FLT_spamhaus' ]----------------------
<snip>
    :0
    * ? test -f "`which host`"
    {
      SUB1=`formail -zxSubject:`
      DATE1=`date +"%d/%m/%Y %T"`

      ########## first IP ##########
      :0 H
      * Received:.*\[\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+
      { 
        RECEIVIP=${MATCH} 
        :0
        * ! RECEIVIP ?? 127.0.0.1
        {
          :0
          * RECEIVIP ?? ()\/[0-9]+
          {
            QUAD1=${MATCH}
            :0
            * RECEIVIP ?? [0-9]+\.\/[0-9]+
            {
              QUAD2=${MATCH}
              :0
              * RECEIVIP ?? [0-9]+\.[0-9]+\.\/[0-9]+
              {
                QUAD3=${MATCH}
                :0
                * RECEIVIP ?? [0-9]+\.[0-9]+\.[0-9]+\.\/[0-9]+
                {
                  RECEIVIPREV="${MATCH}.${QUAD3}.${QUAD2}.${QUAD1}"
                }
              }
            }
      ################ sbl-xbl.spamhaus.org ##############################
            :0
            { REVCHECKIP=`host ${RECEIVIPREV}.zen.spamhaus.org 2>&1 | grep -v 'not found.'` }
            :0
            * $ REVCHECKIP ?? 127\.0\.0\.(2|4)
            { IP=`echo $RECEIVIP >>$HOME/log/spamhaus/\`date +%Y-%m\`.log`
              :0fhw
              | formail -i "Subject: ***zen.spamhaus.org*** $SUB1" -i "X-TDSpamHaus: $RECEIVIP"
              :0
              * ^Subject:.*(\*\*\*zen.spamhaus.org\*\*\*)
              ${TDTP_SPAM_PREFIX}${MSG_DATE}${SPAMTAG}.FLT_spamhaus.zen_spamhaus_org/
            }
      ################ list.dsbl.org #####################################
            :0
            { REVCHECKIP=`host ${RECEIVIPREV}.list.dsbl.org 2>&1 | grep -v 'not found.'` }
            :0
            * $ REVCHECKIP ?? 127\.0\.0\.(2|4)
            { IP=`echo $RECEIVIP >>$HOME/log/spamhaus/\`date +%Y-%m\`.log`
              :0fhw
              | formail -i "Subject: ***list.dsbl.org*** $SUB1" -i "X-TDSpamHaus: $RECEIVIP"
              :0
              * ^Subject:.*(\*\*\*list.dsbl.org\*\*\*)
              ${TDTP_SPAM_PREFIX}${MSG_DATE}${SPAMTAG}.FLT_spamhaus.list_dsbl_org/
            }
          }
        }
      }
  
      ########## second IP ##########
      :0 H
      * Received: from.*\[.*\](.*$)+Received:.*\[\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+
      { 
        RECEIVIP2=${MATCH} 
        :0
        * ! RECEIVIP2 ?? 127.0.0.1
        {
          :0
          * RECEIVIP2 ?? ()\/[0-9]+
          {
            QUAD1=${MATCH}
            :0
            * RECEIVIP2 ?? [0-9]+\.\/[0-9]+
            {
              QUAD2=${MATCH}
              :0
              * RECEIVIP2 ?? [0-9]+\.[0-9]+\.\/[0-9]+
              {
                QUAD3=${MATCH}
                :0
                * RECEIVIP2 ?? [0-9]+\.[0-9]+\.[0-9]+\.\/[0-9]+
                {
                  RECEIVIP2REV="${MATCH}.${QUAD3}.${QUAD2}.${QUAD1}"
                }
              }
            }
      ################ sbl-xbl.spamhaus.org ##############################
            :0
            { REV2CHECKIP=`host ${RECEIVIP2REV}.zen.spamhaus.org 2>&1 | grep -v 'not found.'` }
            :0
            * $ REV2CHECKIP ?? 127\.0\.0\.(2|4)
            { IP=`echo $RECEIVIP2 >>$HOME/log/spamhaus/\`date +%Y-%m\`.log`
              :0fhw
              | formail -i "Subject: ***zen.spamhaus.org*** $SUB1" -i "X-TDSpamHaus: $RECEIVIP2"
              :0
              * ^Subject:.*(\*\*\*zen.spamhaus.org\*\*\*)
              ${TDTP_SPAM_PREFIX}${MSG_DATE}${SPAMTAG}.FLT_spamhaus.zen_spamhaus_org/
            }
      ################ list.dsbl.org ###################################
            :0
            { REV2CHECKIP=`host ${RECEIVIP2REV}.list.dsbl.org 2>&1 | grep -v 'not found.'` }
            :0
            * $ REV2CHECKIP ?? 127\.0\.0\.(2|4)
            { IP=`echo $RECEIVIP2 >>$HOME/log/spamhaus/\`date +%Y-%m\`.log`
              :0fhw
              | formail -i "Subject: ***list.dsbl.org*** $SUB1" -i "X-TDSpamHaus: $RECEIVIP2"
              :0
              * ^Subject:.*(\*\*\*list.dsbl.org\*\*\*)
              ${TDTP_SPAM_PREFIX}${MSG_DATE}${SPAMTAG}.FLT_spamhaus.list_dsbl_org/
            }
          }
        }
      }
    }
    :0E
    { LOG="${SHOW_FILTER}executable \"host\" not found.${NL}" }
------------------------------------------------------------------------

but unfortunately the two/four lookups with

    host ${RECEIVIP2REV}.zen.spamhaus.org
    host ${RECEIVIP2REV}.list.dsbl.org

are very slow...
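
The octet reversal that the recipe above performs with nested procmail
conditions can also be sketched as a small shell function. This is a minimal
sketch, not a drop-in replacement for the recipe: the names reverse_ipv4 and
dnsbl_name are hypothetical, and only dotted-quad IPv4 input is handled.

```shell
#!/bin/sh
# Hypothetical helper: turn 192.0.2.10 into 10.2.0.192.
reverse_ipv4() {
    # Split the argument on dots into the positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # Anything that is not exactly four fields is rejected.
    [ $# -eq 4 ] || return 1
    printf '%s.%s.%s.%s\n' "$4" "$3" "$2" "$1"
}

# Hypothetical helper: build the name to look up in a DNSBL zone.
dnsbl_name() {
    rev=$(reverse_ipv4 "$1") || return 1
    printf '%s.%s\n' "$rev" "$2"
}
```

The result of "dnsbl_name 192.0.2.10 zen.spamhaus.org" is the name that
would then be passed to "host".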

My idea was that if I do not filter directly, I could collect the IPs,
put them into a cache file, sort and deduplicate it, and use an
independent process which fetches the status and writes out a file that
I can easily import into my own DNS server (bind9) @home, and then do the
final filtering.

On my <samba3> with the Quad-Xeon I have enough resources to install
some instances of bind9 as VHosts which could be set up as
<zen.spamhaus.org> and <list.dsbl.org> and which would then be deactivated
if <samba3> gets an internet connection...

Question: Is it possible to get (FTP) the lists from the two servers for
          private non-public use?  If yes, how big are they?
          Since I am on-line only 2-3 times per week, it would be nice
          if I could fetch the whole list.  (I assume this takes fewer
          resources than making several thousand lookups on the DNS)
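
The "collect, sort and deduplicate" step of this idea could look roughly
like the sketch below. It is an assumption-laden illustration: the function
name extract_unique_ips is hypothetical, it only sees addresses on the first
line of each Received: header (folded continuation lines are ignored), and
it relies on "grep -o", which GNU and BSD grep provide.

```shell
#!/bin/sh
# Hypothetical helper: read a message (or several) on stdin and print each
# bracketed IPv4 address found in Received: lines exactly once.
extract_unique_ips() {
    grep -i '^Received:' |
        grep -oE '\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\]' |
        sed 's/[][]//g' |      # strip the surrounding brackets
        grep -v '^127\.' |     # loopback addresses are never listed
        sort -u
}
```

Appending the output to a cache file while off-line and running the DNSBL
queries for that file later would keep the slow lookups out of the delivery
path.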

Thanks, Greetings and nice Day/Evening
    Michelle Konzack
    Systemadministrator
    24V Electronic Engineer
    Tamay Dogan Network
    Debian GNU/Linux Consultant



Re: spamassassin takes ten minutes for a message

Posted by Bob Proulx <bo...@proulx.com>.
Michelle Konzack wrote:
> I download my messages with my laptop in an Internet cafe, transfer
> them to my server at home, and then let a filter run over them...
> 
> Note:  I am working Off-Line (No Internet @home)

I do something very similar.

> misbehave: it took around 10 minutes for each message, and by the time
> ...
> move the files out of the way, "spamassassin" refuses to run at normal
> speed (4-6 messages per second).

I don't really know and hopefully others will have better
suggestions.  But the first thing I would try is to run spamassassin
in local mode.

       Options:
        -L, --local                       Local tests only (no online tests)

Since you are running it offline I am guessing that SA is trying to do
network lookups and this is taking the extra time.

Why did this start?  I will make a second guess that something on your
laptop is different in the networking system.  The first file I would
check would be /etc/resolv.conf to see if dns name lookup is different
than you expect when offline.  DNS lookups are "blocking" calls and
can cause processes to wait during lookup.  Double check everything
and make sure that dns lookups fail quickly when offline.
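
One way to check how long a lookup blocks is to time it. The sketch below is
a minimal illustration; the function name elapsed_seconds is hypothetical,
and it uses "date +%s", which GNU and BSD date support but POSIX does not
require.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output and print how many
# whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}
```

Running "elapsed_seconds host example.org" while off-line shows whether the
resolver gives up quickly or keeps the calling process waiting.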

I actually do my own spamassassin online before getting to the laptop
where I read mail offline.  The online tests and DNSBLs are much more
effective than the offline tests.  I fear that offline spam testing
isn't good enough.  If you can get the spamassassin part running
online before getting to your laptop I am sure you will have a
superior result.

Hope this helps,
Bob

Re: spamassassin takes ten minutes for a message

Posted by John Hardin <jh...@impsec.org>.
On Tue, 23 Sep 2008, Michelle Konzack wrote:

> On 2008-09-21 08:56:15, Matt Kettler wrote:
>> It looks like spamassassin is attempting to perform a bayes expiry, and
>> you keep killing it before it can finish. It does need to do that once
>> in a while, and it is slow.
>
> I was not killing it; I was only watching the logfiles using tail
> instead of a nice film on TV.  At the beginning Spamassassin took
> several minutes, and then after several hundred messages over 20 minutes
> per message...

SA does have an internal time limit for how long it is willing to wait on
the expiry. SA was probably killing the expiry process.

>> If you want to, you can run sa-learn --force-expire in order to make
>> expiry run manually. If no expiry has been run recently, SA will attempt
>> to do so during mail delivery.
>
> Oops... running...
>
> Hmmm, I have found a bunch of "bayes_toks.expireNNNNN" in the folder...

Yes, that is a clear sign of interrupted bayes expiry attempts.

At this point you should probably turn off auto-expire and run a manual 
expiry from cron daily.

> I assume the "bayes_toks.expireNNNNN" files were made by previous runs of
> "expire" and left over...  I deleted them...

They remain when an expiry has been interrupted.
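
A quick way to spot such leftovers is to list them. This is a minimal
sketch: the function name list_stale_expire_files is hypothetical, and the
default directory is the ~/.spamassassin location shown earlier in the
thread.

```shell
#!/bin/sh
# Hypothetical helper: print every bayes_toks.expire* file in the given
# directory (default: ~/.spamassassin). It prints nothing if there are none.
list_stale_expire_files() {
    dir=${1:-"$HOME/.spamassassin"}
    for f in "$dir"/bayes_toks.expire*; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

If files are listed and no expiry is currently running, they can be removed
and a manual "sa-learn --force-expire" run afterwards.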

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   You cannot bring about prosperity by discouraging thrift. You
   cannot help small men by tearing down big men. You cannot
   strengthen the weak by weakening the strong. You cannot lift the
   wage-earner by pulling down the wage-payer. You cannot help the
   poor man by destroying the rich. You cannot keep out of trouble by
   spending more than your income. You cannot further the brotherhood
   of man by inciting class hatred. You cannot establish security on
   borrowed money. You cannot build character and courage by taking
   away men's initiative and independence. You cannot help men
   permanently by doing for them what they could and should do for
   themselves.                               -- William J. H. Boetcker
-----------------------------------------------------------------------
  41 days until the Presidential Election

Re: spamassassin takes ten minutes for a message

Posted by Michelle Konzack <li...@tamay-dogan.net>.
On 2008-09-21 08:56:15, Matt Kettler wrote:
> It looks like spamassassin is attempting to perform a bayes expiry, and
> you keep killing it before it can finish. It does need to do that once
> in a while, and it is slow.

I was not killing it; I was only watching the logfiles using tail
instead of a nice film on TV.  At the beginning Spamassassin took
several minutes, and then after several hundred messages over 20 minutes
per message...

> If you want to, you can run sa-learn --force-expire in order to make
> expiry run manually. If no expiry has been run recently, SA will attempt
> to do so during mail delivery.

Oops... running...

Hmmm, I have found a bunch of "bayes_toks.expireNNNNN" in the folder...

[michelle.konzackm@samba3:~] sa-learn --force-expire
expired old bayes database entries in 113 seconds
122163 entries kept, 12157 deleted
token frequency: 1-occurrence tokens: 48.69%
token frequency: less than 8 occurrences: 27.96%

I assume the "bayes_toks.expireNNNNN" files were made by previous runs of
"expire" and left over...  I deleted them...

Thanks, Greetings and nice Day/Evening
    Michelle Konzack
    Systemadministrator
    24V Electronic Engineer
    Tamay Dogan Network
    Debian GNU/Linux Consultant



Re: spamassassin takes ten minutes for a message

Posted by John Hardin <jh...@impsec.org>.
On Sun, 21 Sep 2008, Matt Kettler wrote:

> Michelle Konzack wrote:
>
>> The first 18.000 messages went fine, but then "spamassassin" began to
>> misbehave: it took around 10 minutes for each message, and by the time
>> I noticed the problem it had already been running like this for several
>> hours...
>
> It looks like spamassassin is attempting to perform a bayes expiry, and 
> you keep killing it before it can finish. It does need to do that once 
> in a while, and it is slow.

That's what I thought of first as well, but...

>> -rw-------  1 michelle.konzack private 20377600 2008-08-31 18:29 bayes_toks

20MB of tokens doesn't seem all that large to me.

>> As you can see, I had to stop spamassassin on 2008-08-31.  And even if
>> I move the files out of the way, "spamassassin" refuses to run at
>> normal speed (4-6 messages per second).

...and if the bayes files disappear, shouldn't expiry-related problems 
stop? (granted, they got a lot better, and a manual expiry might help a 
_lot_...)

I'd like to see a little more data on this one first. But if she does a 
manual expiry and says "It is working now!" I won't complain. :)


Re: spamassassin takes ten minutes for a message

Posted by Matt Kettler <mk...@verizon.net>.
Michelle Konzack wrote:
> Hello,
>
> I download my messages with my laptop in an Internet cafe, transfer
> them to my server at home, and then let a filter run over them...
>
> Note:  I am working Off-Line (No Internet @home)
>
> Last weekend I took my server to a friend's place, connected it to the
> Internet over ADSL, and downloaded around 30.000 messages from the
> roughly 10 days I was away from Strasbourg.
>
> The first 18.000 messages went fine, but then "spamassassin" began to
> misbehave: it took around 10 minutes for each message, and by the time
> I noticed the problem it had already been running like this for several
> hours...
>   
It looks like spamassassin is attempting to perform a bayes expiry, and
you keep killing it before it can finish. It does need to do that once
in a while, and it is slow.

If you want to, you can run sa-learn --force-expire in order to make
expiry run manually. If no expiry has been run recently, SA will attempt
to do so during mail delivery.

> How can this be?
>
> ----[ command 'cd .spamassassin && ls -Al' ]----------------------------
> total 25528
> -rw-------  1 michelle.konzack private  1327104 2008-08-31 18:29 auto-whitelist
> -rw-------  1 michelle.konzack private    93840 2008-08-31 18:29 bayes_journal
> -rw-------  1 michelle.konzack private  2629632 2008-08-31 18:29 bayes_seen
> -rw-------  1 michelle.konzack private 20377600 2008-08-31 18:29 bayes_toks
> -rw-------  1 michelle.konzack private  4718592 2008-08-30 19:46 bayes_toks.expire30302
> -rw-------  1 michelle.konzack private  4513792 2008-08-30 20:42 bayes_toks.expire32331
> -rw-------  1 michelle.konzack private  4517888 2008-08-31 18:29 bayes_toks.expire7360
> -rw-r--r--  1 michelle.konzack private     1510 2008-08-30 21:52 user_prefs
> ------------------------------------------------------------------------
>
> As you can see, I had to stop spamassassin on 2008-08-31.  And even if I
> move the files out of the way, "spamassassin" refuses to run at normal
> speed (4-6 messages per second).
>
> Note:  The Server is a Quad-Xeon with plenty of memory and
>        10.000 RpM SCSI drives in Raid-1.
>
> Thanks, Greetings and nice Day/Evening
>     Michelle Konzack
>     Systemadministrator
>     24V Electronic Engineer
>     Tamay Dogan Network
>     Debian GNU/Linux Consultant
>
>
>