Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2008/04/07 01:06:11 UTC

[Bug 5876] New: MEMORY: Lower memory usage by up to one meg.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5876

           Summary: MEMORY: Lower memory usage by up to one meg.
           Product: Spamassassin
           Version: 3.2.4
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: spamc/spamd
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: nick@cpanel.net


SA loads all the rules through an eval.  The eval'd code contains about 1 meg of
whitespace and comments, which could be removed to save some memory.
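A minimal Perl sketch of the kind of stripping described above. The helper name compact_generated_code is hypothetical, not SpamAssassin's actual code. It only removes indentation, blank lines and whole-line comments. It deliberately leaves trailing "#" text alone, because the generated rule code quotes strings with q#...#, so a naive "strip everything after #" would corrupt it. It also assumes the generated code has no heredocs or multi-line string literals.

```perl
use strict;
use warnings;

# Hypothetical helper: shrink generated Perl source before it is eval'd.
# Assumes no heredocs or multi-line string literals in the input, since
# their indentation and '#' lines would be altered too.
sub compact_generated_code {
    my ($code) = @_;
    my @kept;
    for my $line (split /\n/, $code) {
        $line =~ s/^\s+//;       # drop indentation
        $line =~ s/\s+$//;       # drop trailing whitespace
        next if $line eq '';     # drop blank lines
        next if $line =~ /^#/;   # drop whole-line comments only; q#...# mid-line is kept
        push @kept, $line;
    }
    return join("\n", @kept) . "\n";
}

my $generated = <<'END_CODE';
sub demo_rule {
    # generated comment that nobody reads
    my $prepend2desc = q#BODY: #;

    return $prepend2desc;
}
END_CODE

my $compact = compact_generated_code($generated);
eval $compact;
die $@ if $@;
print length($generated), " -> ", length($compact), " bytes\n";
```

Trailing comments are kept on purpose; removing them safely would need a real Perl tokenizer rather than a regex.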


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

--- Comment #17 from Mark Martinec <Ma...@ijs.si>  2009-04-01 07:13:43 PST ---
So this is now resolved as a side-product of a Bug 6060 solution
(for 3.3 only). I guess we can close this one now.


Justin Mason <jm...@jmason.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




--- Comment #18 from Justin Mason <jm...@jmason.org>  2009-04-01 07:27:32 PST ---
cool.  Nick, feel free to reopen if there are further bits that you want to
work on.


--- Comment #9 from J. Nick Koston <ni...@cpanel.net>  2008-04-08 06:44:49 PST ---
Prelaunching to build the .pm files saves about 302k.


Also, I changed the code_file in the eval block to this:
  my $code_file = $self->{evalpath} . '/' . $methodname . '.pm';  # should use File::Spec?
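For comparison, the two ways of building that path, sketched in Perl. $evalpath and $methodname below are example values (the directory is made up). On Unix both forms give the same string, so File::Spec would only matter for portability to non-Unix systems.

```perl
use strict;
use warnings;
use File::Spec;

# Example values only; the directory is hypothetical.
my $evalpath   = '/var/lib/spamassassin/compiled';
my $methodname = '_eval_tests_type11_pri0_set1';

# Plain string concatenation, as in the line above.
my $concat = $evalpath . '/' . $methodname . '.pm';

# File::Spec uses the directory separator of the running OS.
my $portable = File::Spec->catfile($evalpath, $methodname . '.pm');

print "$concat\n$portable\n";
```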


--- Comment #8 from J. Nick Koston <ni...@cpanel.net>  2008-04-07 08:55:56 PST ---
You could probably also prelaunch to build the eval .pm files and then relaunch
and just load them as you won't end up with all the garbage in memory from
building them.
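A rough Perl sketch of the "build in a separate process, then just load" idea. It uses fork() as a compact stand-in for the separate prelaunch/relaunch described above, and the file name and builder are hypothetical. Whether this saves memory in practice depends on how much garbage the build really leaves behind (comment #16 reports trying this without a useful reduction).

```perl
use strict;
use warnings;
use POSIX ();
use File::Spec;
use File::Temp qw(tempdir);

# Build the code in a short-lived child so the strings allocated while
# generating it are freed when the child exits, not kept in the daemon.
sub build_in_child {
    my ($file, $builder) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        my $ok = 0;
        if (open my $fh, '>', $file) {
            print {$fh} $builder->();
            $ok = close $fh;
        }
        # _exit skips END blocks and destructors inherited from the parent
        POSIX::_exit($ok ? 0 : 1);
    }
    waitpid($pid, 0);
    die "building $file failed (status $?)" if $?;
    return $file;
}

# The long-running process only loads the finished file.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'prebuilt_rules.pm');
build_in_child($file, sub { "sub prebuilt_rules { return 'ok' }\n1;\n" });
my $loaded = do $file;
die "could not load $file: ", ($@ || $!) unless $loaded;
```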


--- Comment #10 from J. Nick Koston <ni...@cpanel.net>  2008-04-08 06:59:39 PST ---
There are some other things that could be done to make this more memory
efficient:

perhaps use preset variables, and never pass arguments as a hash to got_hit:

$self->got_hit(q#__BLAH_BLAH#,"BODY: ",ruletype=>"body");

would become

$self->got_hit(q#__REPTO_OVERQUOTE#,$prepend2desc,$body);

Doing some testing with this shows a bit of memory recovery.   I just changed
the body/head rules to constants and got back about 32k of RAM.  I'm thinking
Perl is just storing lots of copies of the strings "body" and "ruletype" in
memory.  I'm not sure how it tries to optimize this internally.  I wonder if it
would be any faster if stored as a hash of qrs for the head and one-line body
stuff?  Maybe you have already tried that, though.


Something like this is already done with the _eval tests

sub _eval_tests_type11_pri0_set1 {
  my ($self, @extraevalargs) = @_;
  my $scoresptr = $self->{conf}->{scores};
  my $prepend2desc = q#BODY: #;
  my $rulename;
  my $result;

  if ($scoresptr->{q#__HTML_LENGTH_1024_1536#}) {
    $rulename = q#__HTML_LENGTH_1024_1536#;
    $self->{test_log_msgs} = ();
    $self->{current_rule_name} = $rulename;
    $self->register_plugin_eval_glue(q#html_range#);
    eval { $result = $self->html_range(@extraevalargs, q#length#, q#1024#, q#1536#); };
    if ($@) { $self->handle_eval_rule_errors($rulename); }
    if ($result) {
      $self->got_hit($rulename, $prepend2desc, ruletype => "eval", value => $result);
    }
  }
...


--- Comment #2 from J. Nick Koston <ni...@cpanel.net>  2008-04-06 16:08:59 PST ---
Before
USER       PID %CPU %MEM   VSZ   RSS TTY      STAT START   TIME COMMAND
root      1232 18.1  1.0 25832 22604 ?        Ss   17:21   0:01 /usr/bin/spamd -d --allowed-ips=127.0.0.1 --pidfile=/var/run/spamd.pid --max-children=3 --max-spare=1
root      1234  0.0  1.0 25832 21624 ?        S    17:21   0:00 spamd child


After
USER       PID %CPU %MEM   VSZ   RSS TTY      STAT START   TIME COMMAND
root     10511 18.1  1.0 25144 21576 ?        Ss   18:04   0:01 /usr/bin/spamd -d --allowed-ips=127.0.0.1 --pidfile=/var/run/spamd.pid --max-children=3 --max-spare=1
root     10513  0.0  0.9 25144 20596 ?        S    18:04   0:00 spamd child
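The saving is easiest to read from the RSS column (the 6th field of ps aux output, in KiB). A small Perl sketch that pulls that field out of the before/after lines above (command lines shortened here) and subtracts them:

```perl
use strict;
use warnings;

# RSS is the 6th whitespace-separated field of `ps aux` output, in KiB.
sub rss_kib {
    my @fields = split ' ', $_[0];
    return $fields[5];
}

my %before = (
    parent => 'root 1232 18.1 1.0 25832 22604 ? Ss 17:21 0:01 /usr/bin/spamd -d',
    child  => 'root 1234 0.0 1.0 25832 21624 ? S 17:21 0:00 spamd child',
);
my %after = (
    parent => 'root 10511 18.1 1.0 25144 21576 ? Ss 18:04 0:01 /usr/bin/spamd -d',
    child  => 'root 10513 0.0 0.9 25144 20596 ? S 18:04 0:00 spamd child',
);

my %saved;
for my $proc (sort keys %before) {
    $saved{$proc} = rss_kib($before{$proc}) - rss_kib($after{$proc});
    printf "%-6s saved %d KiB\n", $proc, $saved{$proc};
}
```

Both processes come out 1028 KiB smaller, i.e. about the one meg in the bug summary. RSS counts pages shared between the parent and its forked children in each process, so per-process figures should not simply be added up.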


--- Comment #15 from J. Nick Koston <ni...@cpanel.net>  2008-04-16 20:52:57 PST ---
I haven't had time to finish working on this.  

The changesets at http://koston.org/SA_LOWER_MEM_v3/ are still valid.

However, I need to set up a better working environment for this, so that I can
revert and test changes more efficiently, before I can move past these; this
endeavor has become too large to keep track of without some kind of revision
control.


--- Comment #1 from J. Nick Koston <ni...@cpanel.net>  2008-04-06 16:07:57 PST ---
Created an attachment (id=4289)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4289)
Proof of concept


--- Comment #14 from Justin Mason <jm...@jmason.org>  2008-04-16 15:13:51 PST ---
Nick, are you still working on this?


--- Comment #11 from Justin Mason <jm...@jmason.org>  2008-04-08 07:20:19 PST ---
I don't think there's a need to use File::Spec for that btw.

(In reply to comment #10)
> There are some other things that could be done to make this more memory
> efficient
> 
> perhaps use preset variables, and never pass arguments as a hash to got_hit
> 
> $self->got_hit(q#__BLAH_BLAH#,"BODY: ",ruletype=>"body");
> 
> would become
> 
> $self->got_hit(q#__REPTO_OVERQUOTE#,$prepend2desc,$body);

unfortunately this is a public API :( but we could define a per-rule-type wrapper
for got_hit() for our own code to use, which does this:

sub _got_body_hit { $_[0]->got_hit($_[1], "BODY: ",ruletype=>"body"); }

if that will save memory...
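A Perl sketch of that wrapper idea, using a hypothetical stand-in class (Mock::PerMsgStatus) rather than the real one: the public got_hit() keeps its signature, and generated code would call the short private helper instead. Whether this actually saves memory is the open question here; the sketch only shows the call shape.

```perl
use strict;
use warnings;

package Mock::PerMsgStatus;   # hypothetical stand-in, not SpamAssassin's class

sub new { return bless { hits => [] }, shift }

# Public API, unchanged: rule name, description prefix, then key/value options.
sub got_hit {
    my ($self, $rule, $area, %params) = @_;
    push @{ $self->{hits} }, [ $rule, $area, $params{ruletype} ];
    return 1;
}

# Private shorthand for generated code: the "BODY: " prefix and the
# ruletype => "body" pair appear once here instead of once per rule.
sub _got_body_hit { $_[0]->got_hit($_[1], "BODY: ", ruletype => "body"); }

package main;

my $pms = Mock::PerMsgStatus->new;
$pms->_got_body_hit(q#__REPTO_OVERQUOTE#);   # what one generated call would look like
```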

> Doing some testing with this shows a bit of memory recovery.   I just changed
> the body/head rules to constants and got back about 32k of RAM.  I'm thinking
> Perl is just storing lots of copies of the strings "body" and "ruletype" in
> memory.  I'm not sure how it tries to optimize this internally. 

bizarre.  I was certain that was stored as 1 string internally.

> I wonder if it
> would be any faster if stored as a hash of qrs for the head and one line body
> stuff ?  Maybe you have already tried that though.

yes -- it's slower.  it may be more memory efficient, but I doubt the speed hit
will be worth it.

> Something like this is already done with the _eval tests
> 
> sub _eval_tests_type11_pri0_set1 {
>   my ($self, @extraevalargs) = @_;
>   my $scoresptr = $self->{conf}->{scores};
>   my $prepend2desc = q#BODY: #;
>   my $rulename;
>   my $result;
> 
>   if ($scoresptr->{q#__HTML_LENGTH_1024_1536#}) {
>     $rulename = q#__HTML_LENGTH_1024_1536#;
>     $self->{test_log_msgs} = ();
>     $self->{current_rule_name} = $rulename;
>     $self->register_plugin_eval_glue(q#html_range#);
>     eval { $result = $self->html_range(@extraevalargs, q#length#, q#1024#, q#1536#); };
>     if ($@) { $self->handle_eval_rule_errors($rulename); }
>     if ($result) {
>       $self->got_hit($rulename, $prepend2desc, ruletype => "eval", value => $result);
>     }
>   }
> ...

I don't quite get what you mean here...


--- Comment #7 from J. Nick Koston <ni...@cpanel.net>  2008-04-07 08:39:26 PST ---
Multiple Options

http://koston.org/SA_LOWER_MEM_v3/


Option 1 = Compress Perl code and still use evals of scalars
Option 2 = Write perl code out to a file and 'do' it in
Option 3 = Combination of Option1 & Option2
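Option 2 as a minimal Perl sketch, assuming the generated code ends with a true value (1;) and the target directory is writable. write_and_do and the temporary directory are illustrative, not the code from the changesets. The error checks follow the idiom documented for do FILE in perlfunc.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write generated code to <dir>/<methodname>.pm and compile it with do(),
# instead of keeping the whole source in a scalar and eval'ing it.
sub write_and_do {
    my ($dir, $methodname, $code) = @_;
    my $file = File::Spec->catfile($dir, "$methodname.pm");
    open my $fh, '>', $file or die "cannot write $file: $!";
    print {$fh} $code;
    close $fh or die "cannot close $file: $!";

    my $ret = do $file;
    die "cannot compile $file: $@" if $@;
    die "cannot read $file: $!" unless defined $ret;
    die "$file did not return a true value" unless $ret;
    return $file;
}

my $dir = tempdir(CLEANUP => 1);
write_and_do($dir, '_eval_tests_demo', "sub _eval_tests_demo { return 42 }\n1;\n");
my $answer = _eval_tests_demo();   # the sub now exists in package main
print "$answer\n";
```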



USER       PID %CPU %MEM   VSZ   RSS TTY      STAT START   TIME COMMAND

baseline (control)
root      4099 34.6  1.1 26836 22844 ?        Ss   10:20   0:01 /usr/bin/spamd -d --allowed-ips=127.0.0.1 --pidfile=/var/run/spamd.pid --max-children=3 --max-spare=1
root      4117  0.0  1.0 26836 21872 ?        S    10:20   0:00 spamd child

eval'ing in created .pm files (method two)
root      1004 55.5  1.0 25072 22340 ?        Ss   10:18   0:01 /usr/bin/spamd -d --allowed-ips=127.0.0.1 --pidfile=/var/run/spamd.pid --max-children=3 --max-spare=1
root      1006  0.0  1.0 25072 21364 ?        S    10:18   0:00 spamd child

compressed Perl code (method one)
root      7039 50.5  1.0 26404 22148 ?        Ss   10:24   0:01 /usr/bin/spamd -d --allowed-ips=127.0.0.1 --pidfile=/var/run/spamd.pid --max-children=3 --max-spare=1
root      7041  0.0  1.0 26404 21176 ?        S    10:24   0:00 spamd child

eval'ing in created .pm files && compressed Perl code (method three = method one & method two)
root      7833 34.6  1.0 25312 21972 ?        Ss   10:31   0:01 /usr/bin/spamd -d --allowed-ips=127.0.0.1 --pidfile=/var/run/spamd.pid --max-children=3 --max-spare=1
root      7837  0.0  1.0 25312 20996 ?        S    10:31   0:00 spamd child
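Comparing the RSS column (KiB) of each method against the baseline makes the numbers easier to read. A small Perl sketch of that arithmetic, using only the figures above:

```perl
use strict;
use warnings;

# RSS in KiB for [parent, child], copied from the ps lines above.
my %rss = (
    'baseline'               => [ 22844, 21872 ],
    'method one (compress)'  => [ 22148, 21176 ],
    'method two (.pm files)' => [ 22340, 21364 ],
    'method three (both)'    => [ 21972, 20996 ],
);

# KiB saved relative to the baseline, for parent and child.
sub savings {
    my ($method) = @_;
    my ($bp, $bc) = @{ $rss{baseline} };
    my ($p, $c)   = @{ $rss{$method} };
    return ($bp - $p, $bc - $c);
}

for my $method (grep { $_ ne 'baseline' } sort keys %rss) {
    printf "%-24s parent %4d KiB  child %4d KiB\n", $method, savings($method);
}
```

By these figures method three saves about 870 KiB per process, method one about 700 KiB and method two about 500 KiB.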


--- Comment #12 from Theo Van Dinter <fe...@apache.org>  2008-04-08 08:39:55 PST ---
Just one thing to keep in mind here...  Saving memory and reducing resource
usage is good and all, but keeping readable, maintainable, and extensible code
is also important.

For example, no one cares about 32k of memory.  So discussions about changing
APIs, etc., for the sake of shaving off that tiny amount of memory usage don't
make a lot of sense.   Saving O(MB) of memory by removing whitespace and
formatting that only a computer is ever going to read -- +1. :)


--- Comment #4 from J. Nick Koston <ni...@cpanel.net>  2008-04-06 16:10:08 PST ---
Created an attachment (id=4291)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4291)
Proof of Concept part 3


--- Comment #5 from J. Nick Koston <ni...@cpanel.net>  2008-04-06 19:33:27 PST ---
Another thought:

Perhaps it's better to write out modules to /var/lib/spamassassin and then

eval { require "FULLPATH/TO/MOD/HERE"; };

Let perl do the work of opening/loading the file, as it's bound to be more
efficient than eval'ing a scalar full of code.


This could also make startup time faster as the modules could serve as caches
instead of constructing them in memory every time.
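A Perl sketch of the caching idea, under stated assumptions: load_cached and the builder are hypothetical names, and staleness checks (e.g. rules that changed since the file was written) are deliberately left out, although a real implementation would need them.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Reuse a previously written module file if one exists; otherwise build the
# code, write it out, and then require it. No staleness check is done.
sub load_cached {
    my ($dir, $name, $build) = @_;
    my $file = File::Spec->catfile($dir, "$name.pm");
    unless (-e $file) {
        open my $fh, '>', $file or die "cannot write $file: $!";
        print {$fh} $build->();
        close $fh or die "cannot close $file: $!";
    }
    # require records the path in %INC, so loading it twice is a no-op
    eval { require $file; 1 } or die "cannot load $file: $@";
    return $file;
}

my $builds  = 0;
my $dir     = tempdir(CLEANUP => 1);
my $builder = sub { $builds++; return "sub cached_rules { return 7 }\n1;\n" };

load_cached($dir, 'cached_rules', $builder);   # first call: builds and writes the file
load_cached($dir, 'cached_rules', $builder);   # second call: file exists, builder skipped
```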


--- Comment #3 from J. Nick Koston <ni...@cpanel.net>  2008-04-06 16:09:53 PST ---
Created an attachment (id=4290)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4290)
Proof of Concept


Justin Mason <jm...@jmason.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.3.0




--- Comment #6 from Justin Mason <jm...@jmason.org>  2008-04-07 01:38:17 PST ---
Excellent! thanks for this, that had _never_ occurred to me, even in 5 years of
heavy optimization of that code! ;)

(In reply to comment #5)
> Let perl do the work of opening/loading the file, as it's bound to be more
> efficient than eval'ing a scalar full of code.
> This could also make startup time faster as the modules could serve as caches
> instead of constructing them in memory every time.

feel free to benchmark this ;)   If it works well, I'll gladly get it in.


--- Comment #16 from Justin Mason <jm...@jmason.org>  2008-05-15 13:48:41 PST ---
(In reply to comment #8)
> You could probably also prelaunch to build the eval .pm files and then relaunch
> and just load them as you won't end up with all the garbage in memory from
> building them.

ah, I never followed up on this point.  I tried this at one point,
but failed to get any useful memory reduction.


--- Comment #13 from J. Nick Koston <ni...@cpanel.net>  2008-04-08 09:27:59 PST ---
32k was only what I got back from doing one section and it wasn't the best way
to do it for sure.  Still doing some benchmarking here ...  32k certainly isn't
worth changing all that.

Going to something like

sub _got_body_hit { $_[0]->got_hit($_[1], "BODY: ",ruletype=>"body"); }

might be though.  More testing is certainly needed.

