You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2006/11/27 18:18:02 UTC

[Bug 5210] New: fd leak in MIME parser temp file usage

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210

           Summary: fd leak in MIME parser temp file usage
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: major
          Priority: P5
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: jm@jmason.org


Noticed this in a mass-check of a load of bounce messages.  Here's a
snippet from an strace log of "mass-check --debug ham:dir:/path/to/folder":

22242 open("/home/jm/ftp/emerg/18761", O_RDONLY|O_LARGEFILE) = 249
22242 ioctl(249, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf9ee728) = -1 ENOTTY (Inappropria
te ioctl for device)
22242 _llseek(249, 0, [0], SEEK_CUR)    = 0
22242 fstat64(249, {st_mode=S_IFREG|0664, st_size=20559, ...}) = 0
22242 fcntl64(249, F_SETFD, FD_CLOEXEC) = 0
22242 fstat64(249, {st_mode=S_IFREG|0664, st_size=20559, ...}) = 0
22242 read(249, "From MAILER-DAEMON  Sun Nov 26 1"..., 4096) = 4096
22242 read(249, "ent =\nsoldiers. Pew amp american"..., 4096) = 4096
22242 read(249, "\n08+fnt95//0FJqBJlhVoIHHCAWdbbcA"..., 4096) = 4096
22242 read(249, "hksaeEeqiQYaiICnBz/9qojvqokBqpkj"..., 4096) = 4096
22242 read(249, "6ZeXeuWhNa+Q380u22xOWlsycqWdJhUG"..., 4096) = 4096
22242 read(249, "rt_000_0006_01C71130.E3ACE610--\n"..., 4096) = 79
22242 read(249, "", 4096)               = 0
22242 close(249)                        = 0
22242 write(2, "[22242] dbg: message: main messa"..., 58) = 58
22242 write(2, "[22242] dbg: message: ---- MIME "..., 50) = 50
22242 write(2, "[22242] dbg: message: parsing mu"..., 95) = 95
22242 write(2, "[22242] dbg: message: found part"..., 103) = 103
22242 write(2, "[22242] dbg: message: added part"..., 51) = 51
22242 write(2, "[22242] dbg: message: found part"..., 116) = 116
22242 write(2, "[22242] dbg: message: added part"..., 64) = 64
22242 write(2, "[22242] dbg: message: found part"..., 107) = 107
22242 write(2, "[22242] dbg: message: added part"..., 55) = 55
22242 write(2, "[22242] dbg: message: parsing no"..., 42) = 42
22242 write(2, "[22242] dbg: message: parsing no"..., 42) = 42
22242 write(2, "[22242] dbg: message: no encodin"..., 43) = 43
22242 write(2, "[22242] dbg: message: main messa"..., 52) = 52
22242 write(2, "[22242] dbg: message: added part"..., 51) = 51
22242 write(2, "[22242] dbg: message: parsing no"..., 42) = 42
.....
22242 write(2, "[22242] dbg: message: parsing mu"..., 97) = 97
22242 write(2, "[22242] dbg: message: found part"..., 105) = 105
22242 write(2, "[22242] dbg: message: added part"..., 51) = 51
22242 write(2, "[22242] dbg: message: found part"..., 104) = 104
22242 write(2, "[22242] dbg: message: added part"..., 50) = 50
22242 write(2, "[22242] dbg: message: parsing no"..., 42) = 42
22242 umask(077)                        = 02
22242 open("/tmp/.spamassassin22242ogVHk7tmp", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 06
00) = 249
22242 ioctl(249, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf9ee708) = -1 ENOTTY (Inappropria
te ioctl for device)
22242 _llseek(249, 0, [0], SEEK_CUR)    = 0
22242 fstat64(249, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
22242 fcntl64(249, F_SETFD, FD_CLOEXEC) = 0
22242 umask(02)                         = 077
22242 unlink("/tmp/.spamassassin22242ogVHk7tmp") = 0
22242 write(249, "R0lGODlhzAHQAYfgAAAGAnIADQp/AIh0"..., 4096) = 4096
22242 write(249, "rHSo+dz3kY4941Dce7krP+tibzvrHH57"..., 4096) = 4096
22242 write(249, "8/+ZSfF5F/+Zif+Zp/GpXf+Z7/+aAf+h"..., 4096) = 4096
22242 write(2, "[22242] dbg: message: parsing no"..., 42) = 42
22242 write(2, "[22242] dbg: message: parsing no"..., 42) = 42
22242 write(2, "[22242] dbg: message: ---- MIME "..., 48) = 48
22242 time(NULL)                        = 1164647410
22242 write(2, "[22242] dbg: received-header: pa"..., 157) = 157
22242 write(2, "[22242] dbg: received-header: re"..., 75) = 75
22242 write(2, "[22242] dbg: metadata: X-Spam-Re"..., 47) = 47
22242 write(2, "[22242] dbg: metadata: X-Spam-Re"..., 165) = 165
22242 write(2, "[22242] dbg: metadata: X-Spam-Re"..., 48) = 48
22242 write(2, "[22242] dbg: metadata: X-Spam-Re"..., 164) = 164
....

22242 write(2, "[22242] dbg: rules: running body"..., 55) = 55
22242 write(2, "[22242] dbg: rules: ran body rul"..., 77) = 77
22242 write(2, "[22242] dbg: rules: ran body rul"..., 71) = 71
22242 write(2, "[22242] dbg: rules: ran body rul"..., 70) = 70
22242 write(2, "[22242] dbg: rules: running uri "..., 54) = 54
22242 write(2, "[22242] dbg: rules: ran uri rule"..., 68) = 68
22242 write(2, "[22242] dbg: rules: ran uri rule"..., 71) = 71
22242 write(2, "[22242] dbg: rules: ran uri rule"..., 94) = 94
22242 write(249, "zzP8sm1s0qhDMrQCsbFo0h5PEsePEuF\n"..., 1447) = 1447
22242 _llseek(249, 0, [0], SEEK_SET)    = 0
22242 _llseek(249, 0, [0], SEEK_CUR)    = 0
22242 fstat64(249, {st_mode=S_IFREG|0600, st_size=13735, ...}) = 0
22242 read(249, "R0lGODlhzAHQAYfgAAAGAnIADQp/AIh0"..., 4096) = 4096
22242 read(249, "rHSo+dz3kY4941Dce7krP+tibzvrHH57"..., 4096) = 4096
22242 read(249, "8/+ZSfF5F/+Zif+Zp/GpXf+Z7/+aAf+h"..., 4096) = 4096
22242 read(249, "zzP8sm1s0qhDMrQCsbFo0h5PEsePEuF\n"..., 4096) = 1447
22242 read(249, "", 4096)               = 0
22242 write(2, "[22242] dbg: message: decoding b"..., 38) = 38
22242 write(2, "[22242] dbg: imageinfo: gif imag"..., 105) = 105
22242 write(2, "[22242] dbg: imageinfo: image na"..., 50) = 50
22242 write(2, "[22242] dbg: imageinfo: adding 4"..., 51) = 51
22242 _llseek(249, 0, [0], SEEK_SET)    = 0
22242 _llseek(249, 0, [0], SEEK_CUR)    = 0
22242 read(249, "R0lGODlhzAHQAYfgAAAGAnIADQp/AIh0"..., 4096) = 4096
22242 read(249, "rHSo+dz3kY4941Dce7krP+tibzvrHH57"..., 4096) = 4096
22242 read(249, "8/+ZSfF5F/+Zif+Zp/GpXf+Z7/+aAf+h"..., 4096) = 4096
22242 read(249, "zzP8sm1s0qhDMrQCsbFo0h5PEsePEuF\n"..., 4096) = 1447
22242 read(249, "", 4096)               = 0
22242 write(2, "[22242] dbg: https_http_mismatch"..., 44) = 44
22242 _llseek(249, 0, [0], SEEK_SET)    = 0
22242 _llseek(249, 0, [0], SEEK_CUR)    = 0
22242 read(249, "R0lGODlhzAHQAYfgAAAGAnIADQp/AIh0"..., 4096) = 4096
22242 read(249, "rHSo+dz3kY4941Dce7krP+tibzvrHH57"..., 4096) = 4096
22242 read(249, "8/+ZSfF5F/+Zif+Zp/GpXf+Z7/+aAf+h"..., 4096) = 4096
22242 read(249, "zzP8sm1s0qhDMrQCsbFo0h5PEsePEuF\n"..., 4096) = 1447
22242 read(249, "", 4096)               = 0
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 73) = 73
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 73) = 73
22242 write(2, "[22242] dbg: imageinfo: image ra"..., 76) = 76
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 72) = 72
22242 write(2, "[22242] dbg: eval: text words: 1"..., 52) = 52
22242 write(2, "[22242] dbg: eval: madiff: left:"..., 69) = 69
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 69) = 69
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 66) = 66
22242 _llseek(249, 0, [0], SEEK_SET)    = 0
22242 _llseek(249, 0, [0], SEEK_CUR)    = 0
22242 read(249, "R0lGODlhzAHQAYfgAAAGAnIADQp/AIh0"..., 4096) = 4096
22242 read(249, "rHSo+dz3kY4941Dce7krP+tibzvrHH57"..., 4096) = 4096
22242 read(249, "8/+ZSfF5F/+Zif+Zp/GpXf+Z7/+aAf+h"..., 4096) = 4096
22242 read(249, "zzP8sm1s0qhDMrQCsbFo0h5PEsePEuF\n"..., 4096) = 1447
22242 read(249, "", 4096)               = 0
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 67) = 67
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 67) = 67
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 72) = 72
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 71) = 71
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 73) = 73
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 73) = 73
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 70) = 70
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 78) = 78
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 72) = 72
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 72) = 72
22242 write(2, "[22242] dbg: rules: running rawb"..., 62) = 62
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 68) = 68
22242 write(2, "[22242] dbg: imageinfo: image ra"..., 76) = 76
22242 write(2, "[22242] dbg: rules: ran eval rul"..., 64) = 64
22242 write(2, "[22242] dbg: rules: running full"..., 59) = 59
22242 write(2, "[22242] dbg: rules: running meta"..., 59) = 59
22242 write(2, "[22242] dbg: check: running test"..., 52) = 52
22242 write(2, "[22242] dbg: rules: running head"..., 59) = 59
22242 write(2, "[22242] dbg: rules: running body"..., 59) = 59
22242 write(2, "[22242] dbg: rules: running uri "..., 58) = 58
22242 write(2, "[22242] dbg: rules: running rawb"..., 62) = 62
22242 write(2, "[22242] dbg: rules: running full"..., 59) = 59
22242 write(2, "[22242] dbg: rules: running meta"..., 59) = 59
22242 write(2, "[22242] dbg: check: running test"..., 53) = 53
22242 write(2, "[22242] dbg: rules: running head"..., 59) = 59
22242 write(2, "[22242] dbg: rules: running body"..., 59) = 59
22242 write(2, "[22242] dbg: rules: running uri "..., 58) = 58
22242 write(2, "[22242] dbg: rules: running rawb"..., 62) = 62
22242 write(2, "[22242] dbg: rules: running full"..., 59) = 59
22242 write(2, "[22242] dbg: rules: running meta"..., 59) = 59
22242 write(2, "[22242] dbg: check: is spam? sco"..., 52) = 52
22242 write(2, "[22242] dbg: check: tests=DC_GIF"..., 128) = 128
22242 write(2, "[22242] dbg: check: subtests=__A"..., 600) = 600
22242 time(NULL)                        = 1164647410
22242 write(9, "Y  6 /home/jm/ftp/emerg/18761 DC"..., 748) = 748


Eventually mass-check starts spewing "too many open files" and
skips the remaining messages, then exits.

The file is a 20KB bounce.  I can attach it, but I don't think it
is actually relevant.

its mass-check line:

Y  6 /home/jm/ftp/emerg/18761
DC_GIF_UNO_LARGO,DC_IMAGE_SPAM_HTML,HTML_IMAGE_ONLY_20,HTML_MESSAGE,PART_CID_STOCK,TVD_FW_GRAPHIC_ID1,__ANY_IMAGE_ATTACH,__AUTO_GEN_AS,__BOUNCE_AUTO_GENERATED,__BOUNCE_CTYPE,__BOUNCE_RPATH_NULL,__BOUNCE_SYMANTEC,__CT,__CTYPE_HAS_BOUNDARY,__CTYPE_ONETAB_GIF,__GIF_AREA_180K,__GIF_ATTACH,__GIF_ATTACH_1,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HTML_IMG_ONLY,__HTML_LENGTH_1536_2048,__KAM_NUMBER2,__LOCAL_PP_NONPPURL,__MIME_BASE64,__MIME_HTML,__MIME_QP,__MIME_VERSION,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__NONEMPTY_BODY,__PART_STOCK_CID,__SANE_MSGID,__TAG_EXISTS_BODY,__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__TAG_EXISTS_META,__TOCC_EXISTS,__TVD_INT_CID
time=1164524403,scantime=0,format=f,reuse=no

will attach debug log snippet now.   I also have the full strace and
debug log output, fwiw, but I think this probably illustrates enough...



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5210] fd leak in MIME parser temp file usage

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.2.0




------- Additional Comments From jm@jmason.org  2006-11-27 09:20 -------
oh, needless to say, fd 249 is not closed anywhere else before the EOF of the
strace log, after that.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5210] fd leak in MIME parser temp file usage

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210





------- Additional Comments From felicity@apache.org  2006-11-27 11:47 -------
(In reply to comment #6)
> assuming the tempfile code is saving us RAM, I vote #2.   I have a theory that
> low RAM usage should be something we aim for, given that modern CPUs are
> insanely fast (and getting faster) compared to RAM bandwidth.

Ok, see how the latest trunk (r479738 or later) works for you. :)



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5210] fd leak in MIME parser temp file usage

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210





------- Additional Comments From jm@jmason.org  2006-11-27 09:19 -------
Created an attachment (id=3756)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=3756&action=view)
mass-check --debug snippet




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5210] fd leak in MIME parser temp file usage

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |5166




------- Additional Comments From jm@jmason.org  2006-11-27 09:22 -------
oops, possibly dup of 5166



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5210] fd leak in MIME parser temp file usage

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210


Bug 5210 depends on bug 5166, which changed state.

Bug 5166 Summary: mass-check: Too many open files
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5166

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5210] fd leak in MIME parser temp file usage

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210





------- Additional Comments From felicity@apache.org  2006-11-27 11:01 -------
So there's 2 possibilities here imo.

1) we just drop the code.  I don't know how useful it actually is, and with more
and more people wanting to look at the binary content of images/etc, there's no
real RAM savings.

2) when I added the code in, I had already removed the finish() code, on the
assumption that perl's garbage collection would go ahead and remove the whole
tree, and also close any of the temp files.  it appears that at a minimum, we
could add back in the old finish() code which attempts to undef most of the node
data.  If perl isn't closing the temp files, maybe it's not freeing up the node
memory either (though I'd think we'd hear about that much quicker than the temp
files).

I can put in a patch pretty quickly for either one, so what would people prefer?  



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5210] fd leak in MIME parser temp file usage

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210





------- Additional Comments From jm@jmason.org  2006-11-27 09:23 -------
Created an attachment (id=3757)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=3757&action=view)
msg 18761 from the mass-check

well, since bug 5166 indicates that it may be message-data-related, here's the
msg.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5210] fd leak in MIME parser temp file usage

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210


felicity@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From felicity@apache.org  2006-11-27 19:28 -------
I'm going to call it resolved. :)



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5210] fd leak in MIME parser temp file usage

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210





------- Additional Comments From jm@jmason.org  2006-11-27 12:19 -------
looks fixed, afaics. cheers!



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5210] fd leak in MIME parser temp file usage

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5210





------- Additional Comments From jm@jmason.org  2006-11-27 11:23 -------
thanks Theo.

assuming the tempfile code is saving us RAM, I vote #2.   I have a theory that
low RAM usage should be something we aim for, given that modern CPUs are
insanely fast (and getting faster) compared to RAM bandwidth.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.