Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/09/24 03:57:00 UTC

Re: use embedded perl for SA.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


(cc'ing dev@spamassassin)

David F. Skoll writes:
> On Thu, 23 Sep 2004, Justin Mason wrote:
> 
> > > How can you tell how much is being shared between the child processes?
> > > I know on Linux at least, there's no easy or accurate way to tell.
> 
> > Pretty much all the doco I can find via google (the true linux
> > authority! ;) seems to indicate that top's "SHARE" field is
> > actually reasonably accurate.
> 
> OK.  I'm looking at a very busy MIMEDefang installation right
> now, and according to "top", we have 11M shared out of a VSIZE of 44M.
> 
> So it looks like MIMEDefang isn't doing any better than you. :-(
> 
> I bet I know what's going on: Sure, SpamAssassin has tons of
> "read-only" data once it has loaded the rules.  However, in Perl,
> *everything* is reference-counted, so every time you look at something
> by adding a reference to it, its reference count gets updated.  Over
> time, you're pretty much guaranteed to touch every single page of
> memory, causing data-page sharing to be useless.

hmm.  I suspect it may be possible to avoid that though.  it may
be the act of taking references to the data somewhere -- simply
accessing a hash shouldn't cause a refcount increment.

There are a few mod_perl discussions on Google about this.

> The only gain we get from a forking or pre-forked architecture is
> sharing of code pages.  Without a major overhaul of Perl's internals,
> we'll never share data pages effectively.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFBU37sQTcbUG5Y7woRAri4AJ9ReH6SotiudjpYJYa1lO854qlKZgCeK27c
EP1lAfQoCUCGvZpfKz8/pXk=
=KoMH
-----END PGP SIGNATURE-----
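
On the question of how to tell how much memory is shared between child processes: Linux kernels newer than the ones discussed in this thread expose per-mapping counters in /proc/<pid>/smaps (Shared_Clean, Shared_Dirty, Private_Clean, Private_Dirty, all in kB). The following is a minimal sketch, not something either poster used, that sums those counters. The function names summarize_smaps and shared_fraction are made up for illustration, and the script assumes a Linux system with smaps available when run directly.

```python
import os

# Counter names as they appear in /proc/<pid>/smaps on Linux.
FIELDS = ("Shared_Clean", "Shared_Dirty", "Private_Clean", "Private_Dirty")


def summarize_smaps(text):
    """Sum the shared/private counters (in kB) found in smaps-format text."""
    totals = dict.fromkeys(FIELDS, 0)
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in totals:
            # Counter lines look like "Shared_Clean:       1234 kB".
            totals[key] += int(rest.split()[0])
    return totals


def shared_fraction(totals):
    """Return shared / (shared + private), or 0.0 if nothing was counted."""
    shared = totals["Shared_Clean"] + totals["Shared_Dirty"]
    private = totals["Private_Clean"] + totals["Private_Dirty"]
    total = shared + private
    return shared / total if total else 0.0


if __name__ == "__main__":
    # Assumption: Linux only. Use /proc/<pid>/smaps to inspect another process.
    path = "/proc/self/smaps"
    if os.path.exists(path):
        with open(path) as f:
            totals = summarize_smaps(f.read())
        print(totals, "shared fraction: %.2f" % shared_fraction(totals))
```

Running this against each child of a pre-forked daemon and watching the Private_Dirty total grow over time would be one way to check whether pages that started out shared are being copied.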


Re: use embedded perl for SA.

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Thu, 23 Sep 2004, Justin Mason wrote:

> hmm.  I suspect it may be possible to avoid that though.  it may
> be the act of taking references to the data somewhere -- simply
> accessing a hash shouldn't cause a refcount increment.

It will increment the hash's refcount.  And if you iterate through the
elements of the hash, you'll increment (then probably soon decrement)
their reference counts.  I don't think it's avoidable; I've run into
similar problems before with other reference-counted systems.

I'm not extremely familiar with SA's code, but I'd guess it has a big
list of rules somewhere that it iterates through.  That probably touches
memory all over the place.

Regards,

David.
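
The mechanism both posters are arguing about can be illustrated in any reference-counted runtime. The sketch below uses CPython rather than Perl, purely as an analogy: it shows that storing one more reference to an otherwise untouched object changes the count kept with that object. Because the count is stored in the object's own memory, that change is a write to the page holding the object, and after a fork a write like this is what forces the kernel to give the child its own copy of the page. The helper name refcount_delta_when_referenced and the rules table are hypothetical, and the example does not settle which Perl operations actually take a reference.

```python
import sys


def refcount_delta_when_referenced(obj):
    """Return how much obj's reference count changes when one more
    reference to it is stored (assumes CPython's reference counting)."""
    before = sys.getrefcount(obj)
    holder = [obj]  # the list stores one new reference to obj
    after = sys.getrefcount(obj)
    del holder      # drop the extra reference again
    return after - before


# Hypothetical stand-in for a table of rules that is never modified.
rules = {"RULE_A": 1.5, "RULE_B": 0.3}

# The dictionary's contents are unchanged, yet its stored count was written.
print(refcount_delta_when_referenced(rules))
```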