Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/08/10 20:01:32 UTC

[Bug 5590] New: Scantime is very long unless "use bytes" hack is used

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590

           Summary: Scantime is very long unless "use bytes" hack is used
           Product: Spamassassin
           Version: 3.2.3
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: rosenbaumlm@ornl.gov


# spamd -V
SpamAssassin Server version 3.2.3
  running on Perl 5.8.8
  with zlib support (Compress::Zlib 2.004)
# uname -a
SunOS ornl71 5.9 Generic_118558-39 sun4u sparc SUNW,UltraAX-i2

Problem has been present since SA v3.2.1.

Some messages take a very long time to scan (on the order of several minutes). 
This time is reduced substantially if "use bytes" is added to Message.pm.  For 
a test case, I did the following to minimize the effects of network lookups 
and local rules:
Added -L and -D to spamd startup
Turned off Bayes, AWL, razor, pyzor, DCC.
Set dns_available to "no".
Removed SARE and locally-defined rules.
Scanned the test message with spamc.
Results (stock rules only):
With "use bytes" hack:  4 seconds
Without the hack:      23 seconds
Adding in SARE and local rules makes the gap even wider:
With "use bytes" hack:  16 seconds
Without the hack:      122 seconds
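
To put the figures above in relative terms, the slowdown is just the time without the hack divided by the time with it. The following is a minimal sketch of that arithmetic, assuming a POSIX shell with awk; the helper name `slowdown` is made up for illustration and is not part of SpamAssassin.

```shell
# slowdown SLOW FAST: print how many times longer the slow run took.
# Hypothetical helper for illustration only, not part of SpamAssassin.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

slowdown 23 4      # stock rules: without vs. with "use bytes"
slowdown 122 16    # SARE and local rules added
```

On the reported numbers this works out to a little under 6x for the stock rules and about 7.6x once SARE and local rules are added, so the gap grows with the size of the rule set.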



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5590] Scantime is very long unless "use bytes" hack is used

Posted by bu...@bugzilla.spamassassin.org.

------- Additional Comments From felicity@apache.org  2007-08-11 20:18 -------
fwiw, I would just change your LANG environment variable.




[Bug 5590] Scantime is very long unless "use bytes" hack is used

Posted by bu...@bugzilla.spamassassin.org.

------- Additional Comments From rosenbaumlm@ornl.gov  2007-08-10 11:02 -------
Created an attachment (id=4082)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4082&action=view)
Test case





Re: [Bug 5590] Scantime is very long unless "use bytes" hack is used

Posted by Matt Kettler <mk...@verizon.net>.
Matt Kettler wrote:
> Mark Martinec wrote:
>   
>>> However, in mine the difference when using a "stock" 3.2.3 is barely
>>> noticeable, going from 9 seconds to 8 seconds.
>>>
>>> Adding in a good handful of SARE rules (1365 extra rules, counting "score"
>>> lines) makes the difference quite significant.
>>>
>>> Without "use bytes" and the SARE rules:
>>> real    0m21.626s
>>> user    0m20.873s
>>> sys     0m0.264s
>>>
>>> With "use bytes" added to Message.pm, and the SARE rules:
>>> real    0m9.012s
>>> user    0m6.468s
>>> sys     0m0.240s
>>
>> Do you have a UTF-8 -based locale in the environment?
>> (what does command 'locale' tell?)
>>
>> If so, turning it off (set to "C") is very desirable.
>>
>>   Mark
>>
> Good catch, mine is UTF-8.. Not sure about the original reporter.
>
Actually, not such a good catch, even after changing it in i18n and
rebooting, I still get long scan times without use bytes.

# locale
LANG=en_US
LC_CTYPE="en_US"

Without use bytes:
real    0m22.611s
user    0m20.944s
sys     0m0.305s

With use bytes:
real    0m11.572s
user    0m6.626s
sys     0m0.262s
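
For reference, the locale can also be overridden for a single command instead of system-wide, which avoids the edit-and-reboot cycle. Note that "en_US" above is not the same as the "C" locale Mark suggested, and that LC_ALL takes precedence over both LC_CTYPE and LANG. The following is a minimal sketch, assuming a POSIX shell and the standard `env` utility; `with_c_locale` is a hypothetical helper name, and whether the "C" locale changes the timings on this box is not something this sketch establishes.

```shell
# Run a command with the "C" locale forced for that command only.
# LC_ALL overrides LC_CTYPE and LANG, so the calling shell's settings
# are left untouched. Hypothetical helper, for illustration only.
with_c_locale() {
    env LC_ALL=C LANG=C "$@"
}

# Show what the child process actually sees:
with_c_locale sh -c 'echo "LC_ALL=$LC_ALL LANG=$LANG"'

# Assumed usage for the timing test in this thread (not run here):
#   time with_c_locale spamassassin -t < testcase.eml
```

Going through `env` keeps the assignments from leaking into the calling shell, so runs with and without the override can be timed back to back in the same session.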



Re: [Bug 5590] Scantime is very long unless "use bytes" hack is used

Posted by Loren Wilton <lw...@earthlink.net>.
> Good catch, mine is UTF-8.. Not sure about the original reporter.

Probably not, or they wouldn't be seeing the large difference they see.

        Loren



Re: [Bug 5590] Scantime is very long unless "use bytes" hack is used

Posted by Matt Kettler <mk...@verizon.net>.
Mark Martinec wrote:
>> However, in mine the difference when using a "stock" 3.2.3 is barely
>> noticeable, going from 9 seconds to 8 seconds.
>>
>> Adding in a good handful of SARE rules (1365 extra rules, counting "score"
>> lines) makes the difference quite significant.
>>
>> Without "use bytes" and the SARE rules:
>> real    0m21.626s
>> user    0m20.873s
>> sys     0m0.264s
>>
>> With "use bytes" added to Message.pm, and the SARE rules:
>> real    0m9.012s
>> user    0m6.468s
>> sys     0m0.240s
>>     
>
> Do you have a UTF-8 -based locale in the environment?
> (what does command 'locale' tell?)
>
> If so, turning it off (set to "C") is very desirable.
>
>   Mark
>
>   
Good catch, mine is UTF-8.. Not sure about the original reporter.

Re: [Bug 5590] Scantime is very long unless "use bytes" hack is used

Posted by Mark Martinec <Ma...@ijs.si>.
> However, in mine the difference when using a "stock" 3.2.3 is barely
> noticeable, going from 9 seconds to 8 seconds.
>
> Adding in a good handful of SARE rules (1365 extra rules, counting "score"
> lines) makes the difference quite significant.
>
> Without "use bytes" and the SARE rules:
> real    0m21.626s
> user    0m20.873s
> sys     0m0.264s
>
> With "use bytes" added to Message.pm, and the SARE rules:
> real    0m9.012s
> user    0m6.468s
> sys     0m0.240s

Do you have a UTF-8 -based locale in the environment?
(what does command 'locale' tell?)

If so, turning it off (set to "C") is very desirable.

  Mark
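
As a rough way to answer the question above without reading the `locale` output by eye, the locale name governing character handling can be derived from the environment. The sketch below assumes a POSIX shell and the usual precedence (LC_ALL, then LC_CTYPE, then LANG); both function names are hypothetical, and the check is a heuristic on the locale name only.

```shell
# Print the locale name that governs character handling, following the
# POSIX precedence LC_ALL > LC_CTYPE > LANG; falls back to "C" if unset.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# Succeed (exit status 0) if that locale name mentions a UTF-8 codeset.
is_utf8_locale() {
    case "$(effective_ctype)" in
        *[Uu][Tt][Ff]-8* | *[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale; then
    echo "UTF-8 locale in effect: $(effective_ctype)"
else
    echo "non-UTF-8 locale in effect: $(effective_ctype)"
fi
```

Where the `locale` utility is available, `locale charmap` reports the codeset actually in use and is the more reliable check; the name match above only catches the common "xx_YY.UTF-8" spelling.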

[Bug 5590] Scantime is very long unless "use bytes" hack is used

Posted by bu...@bugzilla.spamassassin.org.

------- Additional Comments From mkettler_sa@verizon.net  2007-08-10 19:06 -------
Confirmed my test box can replicate the results using a crude:

  time spamassassin -t <testcase.eml

However, in mine the difference when using a "stock" 3.2.3 is barely noticeable,
going from 9 seconds to 8 seconds. 

Adding in a good handful of SARE rules (1365 extra rules, counting "score"
lines) makes the difference quite significant.

Without "use bytes" and the SARE rules:
real    0m21.626s
user    0m20.873s
sys     0m0.264s

With "use bytes" added to Message.pm, and the SARE rules:
real    0m9.012s
user    0m6.468s
sys     0m0.240s




[Bug 5590] Scantime is very long unless "use bytes" hack is used

Posted by bu...@bugzilla.spamassassin.org.

------- Additional Comments From rosenbaumlm@ornl.gov  2007-08-10 11:04 -------
Created an attachment (id=4083)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4083&action=view)
spamd debug log from test case





[Bug 5590] Scantime is very long unless "use bytes" hack is used

Posted by bu...@bugzilla.spamassassin.org.

------- Additional Comments From mkettler_sa@verizon.net  2007-08-11 19:41 -------
Side note: Mark Martinec suggested this might be due to UTF-8 encoding in the
locale. 

While my test system did have en_US.UTF-8 as the LANG, resetting
/etc/sysconfig/i18n to use "en_US" and rebooting did not change the results in
any significant way.




[Bug 5590] Scantime is very long unless "use bytes" hack is used

Posted by bu...@bugzilla.spamassassin.org.

------- Additional Comments From rosenbaumlm@ornl.gov  2008-01-15 12:22 -------
This problem still exists in v3.2.4.


