Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2009/03/31 21:02:15 UTC

[Bug 6060] Perl5.8.9 crashes while compiling long code from generic rules

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6060





--- Comment #5 from Mark Martinec <Ma...@ijs.si>  2009-03-31 12:02:14 PST ---
So far I have avoided the Perl 5.8.9 crashes (5.8.9 was the current
version in FreeBSD ports) by running SpamAssassin under a
hand-compiled 5.10.0.

A few days ago the FreeBSD ports collection was updated to Perl 5.10.0,
and after upgrading, this same old 'Bus error' reared its ugly head
again. So for the last two days I had to disable any use of Berkeley DB
to survive.

It seems the difference between my hand-compiled Perl 5.10.0 and the
official one is that mine used 32-bit integers, while the one from ports
used 64-bit integers, which causes the program to occupy somewhat more
memory, and the compilation hits the stack limit sooner.

So it seems the much heavier stack usage by the Perl compiler was
introduced between 5.8.8 and 5.8.9, and 5.10.0 is no better than 5.8.9
in this respect, contrary to what I initially thought.

> Here is a count of calls to got_hit in the eval-ed code on our installation,
> by type and priority (as given as arguments to eval in the current code):
> [...]
> rawbody_0.pm   431
> meta_500.pm    736
> head_0.pm     1855
> body_0.pm     5224

I repeated the exercise with body_0.pm, pruning it down to a size
which the compiler still managed to compile without a crash. The
limit is:
  body_0.pm     3268 rules

> Now that I think of it, a combined approach similar to use_rule_subs
> would satisfy both needs: not give Perl too large chunks of code
> to compile, and save on memory footprint in a parent process
> (inherited by child processes) for source code, as Bug 5876 is
> trying to solve.
> 
> But instead of one rule per subroutine (as with use_rule_subs)
> or all rules in one subroutine (as without use_rule_subs),
> perhaps 100 rules per sub would be a good compromise, cleanly
> satisfying both needs, and without going into complications
> with temporary files.

Not having much of a choice, I embarked on modifying the Check.pm
plugin to implement the above idea: compile not more than about 60kB
of source code at a time, and if necessary produce multiple subroutines,
one for each chunk of code, and provide a master subroutine which calls
each of the chunk subroutines in turn.
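
To illustrate the chunking idea described above, here is a minimal
sketch in Python. It is not the Perl code from Check.pm; all names
(split_into_chunks, compile_chunk, build_master) and the toy rule
snippets are hypothetical and serve only to show the technique:
group per-rule source into chunks of bounded size, compile each chunk
as its own subroutine, and call them in turn from a master routine.

```python
# Illustrative sketch only: Python, not the Perl code in Check.pm.
# All function names and the toy rules below are hypothetical.

MAX_CHUNK_BYTES = 60 * 1024  # approximate per-chunk source size limit


def split_into_chunks(rule_sources, max_bytes=MAX_CHUNK_BYTES):
    """Group per-rule source snippets into chunks of about max_bytes.

    A single snippet larger than max_bytes still gets its own chunk.
    """
    chunks, current, size = [], [], 0
    for src in rule_sources:
        n = len(src.encode("utf-8"))
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(src)
        size += n
    if current:
        chunks.append(current)
    return chunks


def compile_chunk(name, snippets):
    """Compile one chunk of snippets into a function called `name`."""
    lines = [f"def {name}(msg, hits):"]
    for snippet in snippets:
        lines.extend("    " + line for line in snippet.splitlines())
    lines.append("    return hits")
    namespace = {}
    exec("\n".join(lines), namespace)  # one small compilation per chunk
    return namespace[name]


def build_master(rule_sources, max_bytes=MAX_CHUNK_BYTES):
    """Return (master, number_of_chunks); master runs every chunk in turn."""
    chunks = split_into_chunks(rule_sources, max_bytes)
    subs = [compile_chunk(f"chunk_{i}", c) for i, c in enumerate(chunks)]

    def master(msg, hits):
        for sub in subs:
            sub(msg, hits)
        return hits

    return master, len(subs)


# Toy usage with a deliberately tiny limit so several chunks are produced.
rules = [f"if 'word{i}' in msg: hits.append('RULE_{i}')" for i in range(5)]
master, n_chunks = build_master(rules, max_bytes=100)
result = master("word1 and word3", [])
print(n_chunks, result)
```

The compiler never sees more than one chunk at a time, so the size of
any single compilation is bounded regardless of how many rules exist.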

A side effect is a noticeable reduction in memory footprint.
I modified the 'spamassassin' command to sleep just before finishing,
and checked the process memory size by a 'ps' command. The set of rules
is what I normally use on a production mailer (updates.spamassassin.org,
sought.rules.yerp.org, some SARE rules and a handful of local ones).

original 3.3 trunk:
   VSZ   RSS
 99692 91752

3.3 trunk with my modified Check.pm:
 82080 79584

The reduction is 17.2 MB in virtual memory size,
and 11.9 MB of resident memory size.  
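
As a quick check of the arithmetic, the following Python sketch
recomputes the savings from the two 'ps' readings. It assumes that
'ps' reports VSZ and RSS in kilobytes and that 1 MB = 1024 KB; the
variable names are only illustrative.

```python
# Readings copied from the 'ps' output above, assumed to be in KB.
original = {"VSZ": 99692, "RSS": 91752}   # original 3.3 trunk
chunked = {"VSZ": 82080, "RSS": 79584}    # with the modified Check.pm


def saved_mb(before_kb, after_kb):
    """Difference between two KB readings, expressed in MB (1024 KB)."""
    return (before_kb - after_kb) / 1024


savings = {k: round(saved_mb(original[k], chunked[k]), 1) for k in original}
print(savings)
```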

As an experiment I also eliminated compiling of eval rules (substituted
by direct calls), as there seems little point in optimizing the outer
loop. I keep these diffs separate. It yields some additional memory
reduction:
  80780 78468
i.e. 1.3 MB in VSZ and 1.1 MB in RSS. It is not much, but it makes the
code much simpler. I doubt there is any noticeable performance penalty
from not optimizing an outer loop; it is hard to measure with all the
timing noise.

Anyway, I'd like to bring in at least my chunking changes in Check.pm.
It does pass the tests, and it does produce identical results on a
couple of messages that I tried manually with old and new code (each
with many and varied hits), and it does make our production mailer+SA
run again with Berkeley DB enabled under 5.10.0.

I would appreciate a second opinion on the approach and code, and any
feedback in case I broke some corner case which I didn't try (user rules?,
use_rule_subs?, mass checks?).

  Bug 6060: let the Check.pm plugin produce smaller chunks
  of source code (60 kB) to avoid Perl compiler crashing
  on exceeding stack size, and to reduce memory footprint
  of SpamAssassin.
  Sending        lib/Mail/SpamAssassin/Plugin/Check.pm
  Committed revision 760568 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=760568 ).

I can also commit the other half (direct execution of eval rules)
- it can be reverted later if it doesn't feel right.

