Posted to users@spamassassin.apache.org by "Brian J. Murrell" <br...@interlinx.bc.ca> on 2009/01/19 16:33:11 UTC

excessive scan time

I'm running version 3.2.4(-1ubuntu1) of spamassassin here and have been
noticing some excessive scan times, e.g.:

Jan 18 19:07:28 linux spamd[30216]: spamd: result: Y 14 - AWL,BAYES_99,DCC_CHECK,DIGEST_MULTIPLE,HTML_IMAGE_ONLY_20,HTML_IMAGE_RATIO_06,HTML_MESSAGE,HTML_SHORT_LINK_IMG_3,MIME_HTML_ONLY,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RDNS_NONE,TVD_APPROVED,URIBL_BLACK scantime=604.3,size=3325,user=brian,uid=1001,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=49135,mid=<20...@66v.uwp30.uDelmarva.com>,bayes=1.000000,autolearn=spam

The result of this (604 second) scan time is that the MTA ends up giving 
up waiting after 600 seconds and the scan result is essentially wasted.  
No doubt some kind of "remote" test is taking an excessive amount of time.
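
One way to find every affected message after the fact is to scan the spamd log for result lines with a large scantime. The following is only a minimal sketch, assuming the log lines look like the one quoted above; the function name `slow_scans` and the 300-second threshold are illustrative choices, not part of SpamAssassin.

```python
import re

# Fields as they appear in the "spamd: result:" line quoted above.
SCANTIME_RE = re.compile(r"\bscantime=(\d+(?:\.\d+)?)")
MID_RE = re.compile(r"\bmid=(<[^>]*>)")


def slow_scans(lines, threshold=300.0):
    """Yield (seconds, message_id, line) for result lines at or above threshold."""
    for line in lines:
        if "spamd: result:" not in line:
            continue  # not a scan result line
        match = SCANTIME_RE.search(line)
        if not match:
            continue
        seconds = float(match.group(1))
        if seconds >= threshold:
            mid = MID_RE.search(line)
            yield seconds, (mid.group(1) if mid else None), line.rstrip("\n")
```

It can be fed any iterable of lines, for example an open log file (the log location depends on the syslog setup). The timestamps of the slow entries can then be compared against other activity on the machine.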

How can I:

a) determine why the scan time is so long, after the fact (i.e. I could
   try to run the same spam through a "spamassassin -D [-t]" but there is
   no guarantee that whatever took so long the first time through will
   again take so long)?
b) reduce some timeouts of some particular tests so that the total test
   time does not exceed a reasonable threshold?

Thanx,
b.


Re: excessive scan time

Posted by "Brian J. Murrell" <br...@interlinx.bc.ca>.
On Thu, 22 Jan 2009 12:37:09 +0000, Justin Mason wrote:

> you should definitely investigate ways to avoid doing NFS reads/writes
> of the bayes files -- that is extremely I/O intensive, and NFS deals
> with it very badly.

OK.  Noted.  Maybe I will push the bayes database into MySQL as 
previously suggested.

Thanx!

b.



Re: excessive scan time

Posted by Justin Mason <jm...@jmason.org>.
you should definitely investigate ways to avoid doing NFS reads/writes
of the bayes files -- that is extremely I/O intensive, and NFS deals
with it very badly.

--j.

On Thu, Jan 22, 2009 at 12:27, Jonas Eckerman <jo...@frukt.org> wrote:
> Brian J. Murrell wrote:
>
>> One thing worth noting is that I have spamassassin using ~/.spamassassin
>> here and people's home dirs can be (i.e. NFS) mounted from remote machines
>> (i.e. their primary workstations), which do occasionally get shut down.
>
> If you're not already using a SQL database for bayes and AWL I'd suggest
> you do that.
>
> I'd also suggest using SQL for user preferences.
>
>> I wonder what happens in the MTA->SA->local delivery process chain when
>> ~/.spamassassin is unavailable, or worse, on a stale mount.
>
> With bayes, AWL and user prefs in a SQL database that problem ought to be
> avoided. (Maybe there's more than those that should be moved from
> ~/.spamassassin though).
>
> /Jonas
> --
> Jonas Eckerman, FSDB & Fruktträdet
> http://whatever.frukt.org/
> http://www.fsdb.org/
> http://www.frukt.org/
>
>

Re: excessive scan time

Posted by LuKreme <kr...@kreme.com>.
On 22-Jan-2009, at 13:57, Brian J. Murrell wrote:
> Now users need to know how to edit SQL records, or I
> need to install a web interface for that.  The ROI here for that is just
> not high enough.


Really?  A webface to edit user configuration options in an SQL
database is trivial.  I know it's trivial because *I* can do it.

-- 
"Whose motorcycle is this?" "It's a chopper, baby." "Whose chopper
	is this?" "It's Zed's." "Who's Zed?" "Zed's dead, baby. Zed's
	dead."


Re: excessive scan time

Posted by Jonas Eckerman <jo...@frukt.org>.
Brian J. Murrell wrote:

>> I'd also suggest using SQL for user preferences.

> The user interface (i.e. editing a file) for user preferences is a 
> different story.  Now users need to know how to edit SQL records, or I 
> need to install a web interface for that.

Or you use a small script that reads the user's preferences from
file (when the file has been modified) and updates the SQL database.
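
A minimal sketch of such a script follows. It makes several assumptions: the preferences live in a table named `userpref` with `username`, `preference` and `value` columns, each line of the file is a setting name followed by its value, and sqlite3 stands in for whatever SQL server is really used. The function names are made up for illustration.

```python
import os
import sqlite3


def parse_user_prefs(text):
    """Return (preference, value) pairs, skipping blank and comment lines."""
    prefs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # setting name, then the rest of the line
        prefs.append((parts[0], parts[1].strip() if len(parts) > 1 else ""))
    return prefs


def sync_user(conn, username, prefs_path, last_mtime=None):
    """Replace the user's rows if the file changed; return the file's mtime."""
    mtime = os.path.getmtime(prefs_path)
    if last_mtime is not None and mtime <= last_mtime:
        return mtime  # file not modified since the last sync
    with open(prefs_path, encoding="utf-8", errors="replace") as handle:
        prefs = parse_user_prefs(handle.read())
    with conn:  # one transaction: delete the old rows, insert the new ones
        conn.execute("DELETE FROM userpref WHERE username = ?", (username,))
        conn.executemany(
            "INSERT INTO userpref (username, preference, value) VALUES (?, ?, ?)",
            [(username, name, value) for name, value in prefs],
        )
    return mtime
```

Run from cron for each user, with the returned mtime remembered between runs, this would let users keep editing a file while spamd reads only from the database.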

Regards
/Jonas
-- 
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/


Re: excessive scan time

Posted by "Brian J. Murrell" <br...@interlinx.bc.ca>.
On Thu, 22 Jan 2009 13:27:57 +0100, Jonas Eckerman wrote:
> 
> If you're not already using a SQL database for bayes and AWL I'd
> suggest you do that.

Those two I might be willing to consider, however...
 
> I'd also suggest using SQL for user preferences.

The user interface (i.e. editing a file) for user preferences is a 
different story.  Now users need to know how to edit SQL records, or I 
need to install a web interface for that.  The ROI here for that is just 
not high enough.

> With bayes, AWL and user prefs in a SQL database that problem ought to
> be avoided. (Maybe there's more than those that should be moved from
> ~/.spamassassin though).

Yeah.  I tend to doubt those are the real culprits.  I think I have
identified a backup process, running on the same server that handles NFS
and mail, as being quite expensive in both disk and memory, and it's
probably what is contending with the spamd processes for resources.

b.



Re: excessive scan time

Posted by Jonas Eckerman <jo...@frukt.org>.
Brian J. Murrell wrote:

> One thing worth noting is that I have spamassassin using ~/.spamassassin 
> here and people's home dirs can be (i.e. NFS) mounted from remote 
> machines (i.e. their primary workstations), which do occasionally get 
> shut down.

If you're not already using a SQL database for bayes and AWL I'd 
suggest you do that.

I'd also suggest using SQL for user preferences.

> I wonder what happens in the MTA->SA->local delivery process 
> chain when ~/.spamassassin is unavailable, or worse, on a stale mount.

With bayes, AWL and user prefs in a SQL database that problem 
ought to be avoided. (Maybe there's more than those that should 
be moved from ~/.spamassassin though).

/Jonas
-- 
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/


Re: excessive scan time

Posted by "Brian J. Murrell" <br...@interlinx.bc.ca>.
On Mon, 19 Jan 2009 16:47:24 +0100, Matus UHLAR - fantomas wrote:
>  
> When did you last run sa-update?

Ubuntu appears to install a cron.daily cron job which does this amongst 
other things.

> How many processes are you running
> in parallel?

I have a pretty low volume system but I did just up it from 5 to 8 
yesterday.

> Aren't you running out of memory?

No.
 
>> a) determine why the scan time is so long, after the fact (i.e. I could
>>    try to run the same spam through a "spamassassin -D [-t]" but there
>>    is no guarantee that whatever took so long the first time through
>>    will again take so long)?
> 
> try running spamassassin with the -L option

How will -L (local tests only) help me determine which remote tests are 
taking so long?

>> b) reduce some timeouts of some particular tests so that the total test
>>    time does not exceed a reasonable threshold?
> 
> razor, pyzor, dcc, spf, dkim and rbl each have their own timeouts
> (*_timeout); see their (or SpamAssassin's) docs.

Indeed.  "dpkg -L spamassassin | xargs grep _timeout" shows some very 
interesting results.
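
To see only the timeouts that are actually set in the configuration, rather than every mention of `_timeout` in the package, something along these lines could be used. This is a rough sketch: `find_timeouts` is a made-up helper, and it assumes each setting sits on its own line as a name followed by a value.

```python
import re

# An uncommented "<something>_timeout <value>" configuration line.
TIMEOUT_RE = re.compile(r"^\s*([A-Za-z0-9_]*_timeout)\s+(\S+)")


def find_timeouts(paths):
    """Return (path, setting, value) for each uncommented *_timeout line."""
    found = []
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line in handle:
                    match = TIMEOUT_RE.match(line)
                    if match:
                        found.append((path, match.group(1), match.group(2)))
        except OSError:
            continue  # missing or unreadable file: skip it
    return found
```

Unlike the grep above, this skips commented-out lines and documentation, so it lists only values that are in effect in the files given.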

Now that I think about it, I wonder if I am barking up the wrong tree.  
One thing worth noting is that I have spamassassin using ~/.spamassassin 
here and people's home dirs can be (i.e. NFS) mounted from remote 
machines (i.e. their primary workstations), which do occasionally get 
shut down.  I wonder what happens in the MTA->SA->local delivery process 
chain when ~/.spamassassin is unavailable, or worse, on a stale mount.

Is there a reasonable timeout built in to trying to read from that dir?
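
Whatever the answer for SpamAssassin itself, a stale mount can be detected from outside before mail is handed over. The sketch below is only an illustration of that idea and is not something SpamAssassin provides: `path_responds` is a hypothetical helper that runs the stat in a child process, so that a hung mount blocks the child rather than the caller.

```python
import subprocess
import sys


def path_responds(path, timeout=5.0):
    """Return True if `path` can be stat()ed within `timeout` seconds."""
    # The stat happens in a child so the caller never blocks on the mount.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a process stuck in uninterruptible I/O may linger.
        child.kill()
        return False
```

A delivery wrapper could call this on the user's home directory and skip per-user files when it returns False.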

Thots?

b.



Re: excessive scan time

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 19.01.09 15:33, Brian J. Murrell wrote:
> I'm running version 3.2.4(-1ubuntu1) of spamassassin here and have been
> noticing some excessive scan times, e.g.:
> 
> Jan 18 19:07:28 linux spamd[30216]: spamd: result: Y 14 -
> AWL,BAYES_99,DCC_CHECK,DIGEST_MULTIPLE,HTML_IMAGE_ONLY_20,HTML_IMAGE_RATIO_06,HTML_MESSAGE,HTML_SHORT_LINK_IMG_3,MIME_HTML_ONLY,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RDNS_NONE,TVD_APPROVED,URIBL_BLACK
> scantime=604.3,size=3325,user=brian,uid=1001,required_score=5.5,rhost=localhost,raddr=127.0.0.1,rport=49135,mid=<20...@66v.uwp30.uDelmarva.com>,bayes=1.000000,autolearn=spam
> 
> The result of this (604 second) scan time is that the MTA ends up giving 
> up waiting after 600 seconds and the scan result is essentially wasted.  
> No doubt some kind of "remote" test is taking an excessive amount of time.

When did you last run sa-update?
How many processes are you running in parallel? Aren't you running out of
memory?

> a) determine why the scan time is so long, after the fact (i.e. I could
>    try to run the same spam through a "spamassassin -D [-t]" but there is
>    no guarantee that whatever took so long the first time through will
>    again take so long)?

try running spamassassin with the -L option

> b) reduce some timeouts of some particular tests so that the total test
>    time does not exceed a reasonable threshold?

razor, pyzor, dcc, spf, dkim and rbl each have their own timeouts
(*_timeout); see their (or SpamAssassin's) docs.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Microsoft dick is soft to do no harm