Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/09/28 22:23:21 UTC

[Bug 3839] New: perl's refcounting may be causing us trouble with copy-on-write

http://bugzilla.spamassassin.org/show_bug.cgi?id=3839

           Summary: perl's refcounting may be causing us trouble with copy-on-write
           Product: Spamassassin
           Version: 3.0.0
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: jm@jmason.org


This came up on the users list:
http://issues.apache.org/eyebrowse/ReadMsg?listId=275&msgNo=16398 . Basically,
we have this situation:

1. spamd starts up and parses a lot of configuration (rules, rule names, rule
descriptions, etc.), and allocates SVs for all the data it keeps in RAM.

2. spamd forks N subprocesses to do scanning

3a. *the plan*: that those N subprocesses never copy that memory; they can share
the read-only config data with the parent process.

3b. *reality*: shared RAM is very low; each subprocess has its own copy of these
pages, as this "top -b1 -n -c" output from linux 2.6.6 demonstrates:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
25963 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 /usr/bin/perl -T
25969 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 spamd child
25970 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 spamd child
25971 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 spamd child
25972 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 spamd child
25973 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 spamd child

"SHARE" is actually an accurate illustration of how much of that memory is being
shared between those processes; as you can see, the vast majority of the SIZE is
*not* being shared.
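
One way to cross-check figures like these, assuming a Linux /proc filesystem,
is to read /proc/<pid>/statm, whose first three fields are total size, resident
pages and shared pages. The sketch below is illustrative only: parse_statm is a
hypothetical helper, the 4 KiB page size is an assumption, and the sample line
is made up to resemble the numbers above. What the kernel counts as "shared"
varies between versions (on many kernels it covers only file-backed pages), so
treat the result as a rough indicator.

```perl
use strict;
use warnings;

# Parse one line of /proc/<pid>/statm (Linux). All fields are in pages:
#   size resident shared text lib data dt
# parse_statm is a hypothetical helper, not part of spamd.
sub parse_statm {
    my ($line, $page_kb) = @_;
    $page_kb ||= 4;    # assumption: 4 KiB pages
    my ($size, $resident, $shared) = split ' ', $line;
    return {
        size_kb     => $size * $page_kb,
        resident_kb => $resident * $page_kb,
        shared_kb   => $shared * $page_kb,
        private_kb  => ($resident - $shared) * $page_kb,
    };
}

# Made-up sample line; on a live system, read /proc/$$/statm instead.
my $statm = parse_statm('6104 5250 1035 249 0 4900 0');
printf "resident=%dK shared=%dK private=%dK\n",
    @{$statm}{qw(resident_kb shared_kb private_kb)};
```

Running this in the parent and in each child, and comparing private_kb, gives a
per-process estimate of how much memory is not being shared.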

I believe this is due to a behaviour of ref-counted strings, as David Skoll
suggested. For example:

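    # stand-in data; the real arrays hold rules, rule names and code
    my @massive_array_of_strings = map { "RULE_$_" } 1 .. 1000;
    my $result = '';
    foreach my $string (@massive_array_of_strings) {
      # read-only access: no string is modified here
      $result .= substr($string,0,1);
    }
    print $result,"\n";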

@massive_array_of_strings is a large array of short strings, like the multiple
arrays and hashes we maintain of SpamAssassin rules, rule names, and their code.
Even though you never modify any string from @massive_array_of_strings, the act
of iterating over them does this:

    - increment refcount of SV at $massive_array_of_strings[0]
    - get substr, add to $result
    - decrement refcount of SV at $massive_array_of_strings[0]
    - increment refcount of SV at $massive_array_of_strings[1]
    - get substr, add to $result
    - decrement refcount of SV at $massive_array_of_strings[1]
    - ...

Every refcount change writes to the SV structure, so the page holding that SV
is marked "dirty" for the purposes of copy-on-write. The VM then copies those
pages into the child process's RSS.
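
The refcount change during iteration can be observed with the core B module.
This is only a sketch: refcnt_of is a hypothetical helper, @strings stands in
for the rule data, and the absolute numbers are not meaningful because the
temporary reference passed to the helper is counted as well. What matters is
the difference between readings taken the same way.

```perl
use strict;
use warnings;
use B ();

# Refcount of the SV behind a reference, read via the core B module.
# The temporary reference itself is counted too, so compare readings
# against each other rather than against a fixed number.
sub refcnt_of { return B::svref_2object($_[0])->REFCNT }

my @strings = ('alpha', 'beta', 'gamma');    # stand-in for the rule data

my $outside = refcnt_of(\$strings[0]);
my $inside;
foreach my $string (@strings) {
    # $string is aliased to $strings[0] on this first iteration
    $inside = refcnt_of(\$strings[0]);
    last;
}
my $after = refcnt_of(\$strings[0]);

printf "outside=%d inside=%d after=%d\n", $outside, $inside, $after;
```

If the explanation above holds, the reading taken inside the loop is one higher
than the other two, even though the loop body never modifies the string.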

See "Are My Variables Shared?"  at
http://www.perl.com/lpt/a/2002/07/30/mod_perl.html .

Another question: is this actually the problem we're running into? Some
grovelling through Devel::Peek output may be required. :(

Possible fixes:

1. Use "Readonly":
http://mirror.eunet.fi/CPAN/modules/by-module/Readonly/Readonly-1.03.readme .
However, this may not work at all.

2. Store memory-hungry array/hash structures in a single massive packed string!
Yes, this should work, since the refcounting for a string is a single counter in
that string's SV, in the first page.  All the pages containing the string's data
would never be written.

3. Avoid looking at some of the arrays/hashes in the child processes at all.
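
A minimal sketch of what fix 2 could look like, under the assumption that the
strings are plain bytes. build_packed and packed_fetch are hypothetical names,
not SpamAssassin code: all the strings are concatenated into one scalar, and a
second scalar holds their start offsets as 32-bit integers, so a lookup touches
only two large SVs instead of one SV per string.

```perl
use strict;
use warnings;

# Concatenate all strings into one scalar, and record each start offset
# (plus a final end offset) as packed 32-bit big-endian integers.
# Assumption: the strings are byte strings.
sub build_packed {
    my @strings = @_;
    my ($data, $offsets) = ('', '');
    for my $s (@strings) {
        $offsets .= pack('N', length $data);
        $data    .= $s;
    }
    $offsets .= pack('N', length $data);    # sentinel: end of the last string
    return ($data, $offsets);
}

# Fetch string number $i (0-based) by reading two adjacent offsets.
sub packed_fetch {
    my ($data_ref, $offsets_ref, $i) = @_;
    my ($start, $end) = unpack('NN', substr($$offsets_ref, 4 * $i, 8));
    return substr($$data_ref, $start, $end - $start);
}

my ($data, $offsets) = build_packed('RULE_A', 'BODY_TEST', 'X');
print packed_fetch(\$data, \$offsets, 1), "\n";    # prints "BODY_TEST"
```

Whether this actually keeps the data pages clean in the children would still
need to be confirmed by measurement.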


btw, Nutssh on IRC was looking for these -- so here they are for anyone
else who wants them.

/proc/*/maps for parent:

08048000-08141000 r-xp 00000000 03:07 5900534    /usr/bin/perl
08141000-0814a000 rw-p 000f9000 03:07 5900534    /usr/bin/perl
0814a000-09384000 rwxp 00000000 00:00 0
40000000-40016000 r-xp 00000000 03:07 3211322    /lib/ld-2.3.2.so
40016000-40017000 rw-p 00015000 03:07 3211322    /lib/ld-2.3.2.so
40017000-40018000 rw-p 00000000 00:00 0
4002e000-40030000 r-xp 00000000 03:07 3031109    /lib/tls/libdl-2.3.2.so
40030000-40031000 rw-p 00001000 03:07 3031109    /lib/tls/libdl-2.3.2.so
40031000-40053000 r-xp 00000000 03:07 3031117    /lib/tls/libm-2.3.2.so
40053000-40054000 rw-p 00022000 03:07 3031117    /lib/tls/libm-2.3.2.so
40054000-40055000 rw-p 00000000 00:00 0
40055000-40061000 r-xp 00000000 03:07 3031132    /lib/tls/libpthread-0.60.so
40061000-40062000 rw-p 0000c000 03:07 3031132    /lib/tls/libpthread-0.60.so
40062000-40064000 rw-p 00000000 00:00 0
40064000-40194000 r-xp 00000000 03:07 3031105    /lib/tls/libc-2.3.2.so
40194000-4019d000 rw-p 0012f000 03:07 3031105    /lib/tls/libc-2.3.2.so
4019d000-4019f000 rw-p 00000000 00:00 0
4019f000-401a3000 r-xp 00000000 03:07 3031106    /lib/tls/libcrypt-2.3.2.so
401a3000-401a4000 rw-p 00004000 03:07 3031106    /lib/tls/libcrypt-2.3.2.so
401a4000-401ed000 rw-p 00000000 00:00 0
401ed000-401f1000 r-xp 00000000 03:07 345268     /usr/lib/perl/5.8.4/auto/IO/IO.so
401f1000-401f2000 rw-p 00004000 03:07 345268     /usr/lib/perl/5.8.4/auto/IO/IO.so
401f2000-401f8000 r-xp 00000000 03:07 426408     /usr/lib/perl/5.8.4/auto/Socket/Socket.so
401f8000-401f9000 rw-p 00005000 03:07 426408     /usr/lib/perl/5.8.4/auto/Socket/Socket.so
401f9000-401fb000 r-xp 00000000 03:07 1262348    /usr/lib/perl/5.8.4/auto/Sys/Hostname/Hostname.so
401fb000-401fc000 rw-p 00001000 03:07 1262348    /usr/lib/perl/5.8.4/auto/Sys/Hostname/Hostname.so
401fc000-401ff000 r-xp 00000000 03:07 7422476    /usr/lib/perl/5.8.4/auto/Fcntl/Fcntl.so
401ff000-40200000 rw-p 00003000 03:07 7422476    /usr/lib/perl/5.8.4/auto/Fcntl/Fcntl.so
40200000-4021c000 r-xp 00000000 03:07 7029299    /usr/lib/perl/5.8.4/auto/POSIX/POSIX.so
4021c000-4021d000 rw-p 0001b000 03:07 7029299    /usr/lib/perl/5.8.4/auto/POSIX/POSIX.so
4021d000-40220000 r-xp 00000000 03:07 6914140    /usr/lib/perl/5.8.4/auto/MIME/Base64/Base64.so
40220000-40221000 rw-p 00002000 03:07 6914140    /usr/lib/perl/5.8.4/auto/MIME/Base64/Base64.so
40221000-40223000 r-xp 00000000 03:07 3784793    /usr/lib/perl5/auto/Net/DNS/DNS.so
40223000-40224000 rw-p 00001000 03:07 3784793    /usr/lib/perl5/auto/Net/DNS/DNS.so
40224000-40228000 r-xp 00000000 03:07 3784710    /usr/lib/perl5/auto/Digest/SHA1/SHA1.so
40228000-40229000 rw-p 00004000 03:07 3784710    /usr/lib/perl5/auto/Digest/SHA1/SHA1.so
40229000-40237000 r-xp 00000000 03:07 3784853    /usr/lib/perl/5.8.4/auto/DB_File/DB_File.so
40237000-40238000 rw-p 0000d000 03:07 3784853    /usr/lib/perl/5.8.4/auto/DB_File/DB_File.so
4023a000-40249000 r-xp 00000000 03:07 3031133    /lib/tls/libresolv-2.3.2.so
40249000-4024a000 rw-p 0000f000 03:07 3031133    /lib/tls/libresolv-2.3.2.so
4024a000-4024c000 rw-p 00000000 00:00 0
4024c000-40255000 r-xp 00000000 03:07 1474596    /usr/lib/perl5/auto/HTML/Parser/Parser.so
40255000-40256000 rw-p 00008000 03:07 1474596    /usr/lib/perl5/auto/HTML/Parser/Parser.so
40256000-40259000 r-xp 00000000 03:07 5406739    /usr/lib/perl/5.8.4/auto/Cwd/Cwd.so
40259000-4025a000 rw-p 00002000 03:07 5406739    /usr/lib/perl/5.8.4/auto/Cwd/Cwd.so
4025a000-4025e000 r-xp 00000000 03:07 1802278    /usr/lib/perl/5.8.4/auto/Time/HiRes/HiRes.so
4025e000-4025f000 rw-p 00003000 03:07 1802278    /usr/lib/perl/5.8.4/auto/Time/HiRes/HiRes.so
40262000-40336000 r-xp 00000000 03:07 5931583    /usr/lib/libdb-4.2.so
40336000-40338000 rw-p 000d4000 03:07 5931583    /usr/lib/libdb-4.2.so
40338000-4033c000 r-xp 00000000 03:07 1278045    /usr/lib/perl/5.8.4/auto/Sys/Syslog/Syslog.so
4033c000-4033d000 rw-p 00003000 03:07 1278045    /usr/lib/perl/5.8.4/auto/Sys/Syslog/Syslog.so
4033d000-40345000 r-xp 00000000 03:07 6882044    /usr/lib/perl/5.8.4/auto/List/Util/Util.so
40345000-40346000 rw-p 00007000 03:07 6882044    /usr/lib/perl/5.8.4/auto/List/Util/Util.so
40353000-4035c000 r-xp 00000000 03:07 3031124    /lib/tls/libnss_files-2.3.2.so
4035c000-4035d000 rw-p 00008000 03:07 3031124    /lib/tls/libnss_files-2.3.2.so
4035d000-40364000 r-xp 00000000 03:07 3031120    /lib/tls/libnss_compat-2.3.2.so
40364000-40365000 rw-p 00007000 03:07 3031120    /lib/tls/libnss_compat-2.3.2.so
40365000-40377000 r-xp 00000000 03:07 3031119    /lib/tls/libnsl-2.3.2.so
40377000-40378000 rw-p 00011000 03:07 3031119    /lib/tls/libnsl-2.3.2.so
40378000-4037a000 rw-p 00000000 00:00 0
4037a000-40383000 r-xp 00000000 03:07 3031129    /lib/tls/libnss_nis-2.3.2.so
40383000-40384000 rw-p 00008000 03:07 3031129    /lib/tls/libnss_nis-2.3.2.so
40384000-4044c000 rw-p 00000000 00:00 0
404bc000-40524000 rw-p 00138000 00:00 0
bfff0000-c0000000 rw-p ffff1000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0



/proc/*/maps for child:

08048000-08141000 r-xp 00000000 03:07 5900534    /usr/bin/perl
08141000-0814a000 rw-p 000f9000 03:07 5900534    /usr/bin/perl
0814a000-093a5000 rwxp 00000000 00:00 0 
40000000-40016000 r-xp 00000000 03:07 3211322    /lib/ld-2.3.2.so
40016000-40017000 rw-p 00015000 03:07 3211322    /lib/ld-2.3.2.so
40017000-40018000 rw-p 00000000 00:00 0 
4002e000-40030000 r-xp 00000000 03:07 3031109    /lib/tls/libdl-2.3.2.so
40030000-40031000 rw-p 00001000 03:07 3031109    /lib/tls/libdl-2.3.2.so
40031000-40053000 r-xp 00000000 03:07 3031117    /lib/tls/libm-2.3.2.so
40053000-40054000 rw-p 00022000 03:07 3031117    /lib/tls/libm-2.3.2.so
40054000-40055000 rw-p 00000000 00:00 0 
40055000-40061000 r-xp 00000000 03:07 3031132    /lib/tls/libpthread-0.60.so
40061000-40062000 rw-p 0000c000 03:07 3031132    /lib/tls/libpthread-0.60.so
40062000-40064000 rw-p 00000000 00:00 0 
40064000-40194000 r-xp 00000000 03:07 3031105    /lib/tls/libc-2.3.2.so
40194000-4019d000 rw-p 0012f000 03:07 3031105    /lib/tls/libc-2.3.2.so
4019d000-4019f000 rw-p 00000000 00:00 0 
4019f000-401a3000 r-xp 00000000 03:07 3031106    /lib/tls/libcrypt-2.3.2.so
401a3000-401a4000 rw-p 00004000 03:07 3031106    /lib/tls/libcrypt-2.3.2.so
401a4000-401ed000 rw-p 00000000 00:00 0 
401ed000-401f1000 r-xp 00000000 03:07 345268     /usr/lib/perl/5.8.4/auto/IO/IO.so
401f1000-401f2000 rw-p 00004000 03:07 345268     /usr/lib/perl/5.8.4/auto/IO/IO.so
401f2000-401f8000 r-xp 00000000 03:07 426408     /usr/lib/perl/5.8.4/auto/Socket/Socket.so
401f8000-401f9000 rw-p 00005000 03:07 426408     /usr/lib/perl/5.8.4/auto/Socket/Socket.so
401f9000-401fb000 r-xp 00000000 03:07 1262348    /usr/lib/perl/5.8.4/auto/Sys/Hostname/Hostname.so
401fb000-401fc000 rw-p 00001000 03:07 1262348    /usr/lib/perl/5.8.4/auto/Sys/Hostname/Hostname.so
401fc000-401ff000 r-xp 00000000 03:07 7422476    /usr/lib/perl/5.8.4/auto/Fcntl/Fcntl.so
401ff000-40200000 rw-p 00003000 03:07 7422476    /usr/lib/perl/5.8.4/auto/Fcntl/Fcntl.so
40200000-4021c000 r-xp 00000000 03:07 7029299    /usr/lib/perl/5.8.4/auto/POSIX/POSIX.so
4021c000-4021d000 rw-p 0001b000 03:07 7029299    /usr/lib/perl/5.8.4/auto/POSIX/POSIX.so
4021d000-40220000 r-xp 00000000 03:07 6914140    /usr/lib/perl/5.8.4/auto/MIME/Base64/Base64.so
40220000-40221000 rw-p 00002000 03:07 6914140    /usr/lib/perl/5.8.4/auto/MIME/Base64/Base64.so
40221000-40223000 r-xp 00000000 03:07 3784793    /usr/lib/perl5/auto/Net/DNS/DNS.so
40223000-40224000 rw-p 00001000 03:07 3784793    /usr/lib/perl5/auto/Net/DNS/DNS.so
40224000-40228000 r-xp 00000000 03:07 3784710    /usr/lib/perl5/auto/Digest/SHA1/SHA1.so
40228000-40229000 rw-p 00004000 03:07 3784710    /usr/lib/perl5/auto/Digest/SHA1/SHA1.so
40229000-40237000 r-xp 00000000 03:07 3784853    /usr/lib/perl/5.8.4/auto/DB_File/DB_File.so
40237000-40238000 rw-p 0000d000 03:07 3784853    /usr/lib/perl/5.8.4/auto/DB_File/DB_File.so
4023a000-40249000 r-xp 00000000 03:07 3031133    /lib/tls/libresolv-2.3.2.so
40249000-4024a000 rw-p 0000f000 03:07 3031133    /lib/tls/libresolv-2.3.2.so
4024a000-4024c000 rw-p 00000000 00:00 0
4024c000-40255000 r-xp 00000000 03:07 1474596    /usr/lib/perl5/auto/HTML/Parser/Parser.so
40255000-40256000 rw-p 00008000 03:07 1474596    /usr/lib/perl5/auto/HTML/Parser/Parser.so
40256000-40259000 r-xp 00000000 03:07 5406739    /usr/lib/perl/5.8.4/auto/Cwd/Cwd.so
40259000-4025a000 rw-p 00002000 03:07 5406739    /usr/lib/perl/5.8.4/auto/Cwd/Cwd.so
4025a000-4025e000 r-xp 00000000 03:07 1802278    /usr/lib/perl/5.8.4/auto/Time/HiRes/HiRes.so
4025e000-4025f000 rw-p 00003000 03:07 1802278    /usr/lib/perl/5.8.4/auto/Time/HiRes/HiRes.so
40262000-40336000 r-xp 00000000 03:07 5931583    /usr/lib/libdb-4.2.so
40336000-40338000 rw-p 000d4000 03:07 5931583    /usr/lib/libdb-4.2.so
40338000-4033c000 r-xp 00000000 03:07 1278045    /usr/lib/perl/5.8.4/auto/Sys/Syslog/Syslog.so
4033c000-4033d000 rw-p 00003000 03:07 1278045    /usr/lib/perl/5.8.4/auto/Sys/Syslog/Syslog.so
4033d000-40345000 r-xp 00000000 03:07 6882044    /usr/lib/perl/5.8.4/auto/List/Util/Util.so
40345000-40346000 rw-p 00007000 03:07 6882044    /usr/lib/perl/5.8.4/auto/List/Util/Util.so
40353000-4035c000 r-xp 00000000 03:07 3031124    /lib/tls/libnss_files-2.3.2.so
4035c000-4035d000 rw-p 00008000 03:07 3031124    /lib/tls/libnss_files-2.3.2.so
4035d000-40364000 r-xp 00000000 03:07 3031120    /lib/tls/libnss_compat-2.3.2.so
40364000-40365000 rw-p 00007000 03:07 3031120    /lib/tls/libnss_compat-2.3.2.so
40365000-40377000 r-xp 00000000 03:07 3031119    /lib/tls/libnsl-2.3.2.so
40377000-40378000 rw-p 00011000 03:07 3031119    /lib/tls/libnsl-2.3.2.so
40378000-4037a000 rw-p 00000000 00:00 0 
4037a000-40383000 r-xp 00000000 03:07 3031129    /lib/tls/libnss_nis-2.3.2.so
40383000-40384000 rw-p 00008000 03:07 3031129    /lib/tls/libnss_nis-2.3.2.so
40384000-4044c000 rw-p 00000000 00:00 0 
404bc000-40524000 rw-p 00138000 00:00 0 
bfff0000-c0000000 rw-p ffff1000 00:00 0 
ffffe000-fffff000 ---p 00000000 00:00 0
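
For anyone comparing the two listings, here is a rough sketch that parses a
maps line and computes the size of the mapping. parse_maps_line is a
hypothetical helper; the two input lines are the anonymous rwxp mappings that
follow /usr/bin/perl in the parent and child listings above, the only lines
that differ between them.

```perl
use strict;
use warnings;

# Parse one /proc/<pid>/maps line into a hash with numeric start/end
# addresses and the mapping size in KiB. Returns nothing if the line
# does not look like a maps entry.
sub parse_maps_line {
    my ($line) = @_;
    my ($start, $end, $perms, $offset, $dev, $inode, $path) =
        $line =~ /^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+([0-9a-f]+)\s+(\S+)\s+(\d+)\s*(.*)$/
        or return;
    return {
        start   => hex($start),
        end     => hex($end),
        perms   => $perms,
        path    => $path,
        size_kb => (hex($end) - hex($start)) / 1024,
    };
}

my $parent = parse_maps_line('0814a000-09384000 rwxp 00000000 00:00 0');
my $child  = parse_maps_line('0814a000-093a5000 rwxp 00000000 00:00 0');

printf "parent=%dK child=%dK difference=%dK\n",
    $parent->{size_kb}, $child->{size_kb},
    $child->{size_kb} - $parent->{size_kb};
# prints "parent=18664K child=18796K difference=132K"
```

Note that maps only shows the address ranges; it does not say which pages in a
range have been copied, so it cannot by itself confirm or rule out the
copy-on-write theory.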




Re: [Bug 3839] New: perl's refcounting may be causing us trouble with copy-on-write

Posted by Matt Sergeant <ms...@messagelabs.com>.
On 28 Sep 2004, at 21:23, bugzilla-daemon@bugzilla.spamassassin.org 
wrote:

> 3a. *the plan*: that those N subprocesses never copy that memory; they can share
> the read-only config data with the parent process.
>
> 3b. *reality*: shared RAM is very low; each subprocess has its own copy of these
> pages, as this "top -b1 -n -c" output from linux 2.6.6 demonstrates:
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
> 25963 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 /usr/bin/perl -T
> 25969 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 spamd child
> 25970 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 spamd child
> 25971 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 spamd child
> 25972 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 spamd child
> 25973 jm        25   0 1032 24416  21m 4140    0  0.0  2.5   0:00 spamd child
>
> "SHARE" is actually an accurate illustration of how much of that memory is being
> shared between those processes; as you can see, the vast majority of the SIZE is
> *not* being shared.

Your analysis is wrong. RSS is the memory being used by the process, 
not SIZE. In the above, most of the memory *is* shared. Maybe this 
isn't a problem at all?

See the section "Process Memory Measurements" in the mod_perl guide 
(not sure where it is - I'm referring to the book version).

Matt.

