Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/09/29 00:56:48 UTC

[Bug 4606] New: Misparsing of message causes binary attachment to be treated as text

http://bugzilla.spamassassin.org/show_bug.cgi?id=4606

           Summary: Misparsing of message causes binary attachment to be
                    treated as text
           Product: Spamassassin
           Version: 3.1.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: tech2@i-is.com


I've tried to describe this as best I can.  I'll attach a collection of messages
which are being incorrectly parsed, causing SA 3.1 to treat the binary
attachment as plain text.  These are all multi-layered MIME messages; they look
like the typical forward of a forward of a forward.  My question is: are all
these messages broken, or is SA not parsing them correctly?



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4606] Misparsing of message causes binary attachment to be treated as text

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4606


tech2@i-is.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE




------- Additional Comments From tech2@i-is.com  2005-09-29 17:31 -------
OK, after further research, this is a duplicate of Bug 3069, so I'll mark it
as a duplicate of that.
I was thinking it would be simple to not parse the base64 text, regardless of
whether it's inside a message/rfc822 part or not; I wasn't familiar with the
internals and how this is worked out.

*** This bug has been marked as a duplicate of 3069 ***



------- Additional Comments From tech2@i-is.com  2005-09-29 15:57 -------
I was seeing some of the fuzzy rules hitting the binary text, and also a few other
obfu-type rules.  I have to admit there were SARE rules involved in the FP, and
also Tripwire, but the issue I see here is that misparsing ham messages like this
could create a false impression of a particular rule when it's not really the
rule's fault that it hit.  I hope I'm making sense here; what I mean is that if
you have messages like this in your corpus, and the binary is being treated as
text, that could be messing up your mass-check results.
I created a workaround, so if you don't feel any need to do anything about this,
I am happy to close this ticket in any fashion you see fit.  The main point was
that I thought this was incorrect behaviour.  Thanks for your time and help!



------- Additional Comments From tech2@i-is.com  2005-09-29 08:38 -------
Well this issue is causing me 100s of false positives each day.  I could write
custom nice rules to counter the misfiring rules but I was hoping for a better
solution, one that would work for everyone!  Is there anything that can be done
in SA to work around this issue?



------- Additional Comments From felicity@apache.org  2005-09-29 03:08 -------
Subject: Re:   New: Misparsing of message causes binary attachment to be treated as text

On Wed, Sep 28, 2005 at 03:56:48PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> I've tried to describe this as best I can.  I'll attach a collection of messages
> which are being incorrectly parsed, causing SA 3.1 to treat the binary
> attachment as plain text.  These are all multi-layered MIME messages; they look
> like the typical forward of a forward of a forward.  My question is: are all
> these messages broken, or is SA not parsing them correctly?

If I understand your concern correctly, the messages aren't broken, and SA isn't
parsing them incorrectly either.  SA subparses the first-level message/rfc822
part, but any message/rfc822 parts nested inside it aren't subparsed, so their
contents are treated as text in the message.





------- Additional Comments From jgmyers@proofpoint.com  2005-09-29 16:44 -------
The test cases show a succession of forwardings by MUAs that forward messages as
message/rfc822 entities.  This type of forwarding is a common MUA behavior.  It
is also the standard MTA behavior for DSNs.

If SpamAssassin didn't scan message/rfc822, that would give spammers an easy way
to bypass scanning.




------- Additional Comments From schulz@adi.com  2005-09-30 11:05 -------
Well, bug 4103 is certainly unavailable.  It seems that the other bugs can be
found in the above comments and by following one reference.  The ones I find
are: bug 3069, bug 3271 and bug 3367.  Could bug 4103 be unblocked now that
fixes are out to everybody?



------- Additional Comments From jm@jmason.org  2005-09-30 11:25 -------
bug 4103 is now readable.



------- Additional Comments From tech2@i-is.com  2005-09-30 09:13 -------
So the parsing of the mime structure is a problem here?  I was hoping or
thinking we could just drop parts like this:

Content-Type: image/jpeg
Content-Transfer-Encoding: base64

I understand that it's not always obvious what the Content-Type is, but when it
does show up, can't we just drop it from body?

The current behaviour was decided on in Bug 3069; that's why I marked this as a
duplicate.  If a dev disagrees with me, they are free to undo what I've done :)



------- Additional Comments From lwilton@earthlink.net  2005-09-30 15:04 -------
Subject: Re:  Misparsing of message causes binary attachment to be treated as text

I wonder if there is a way to do "cheap subparsing" that might solve both
issues.

As I think I understand the problem with subparsing, I'm assuming that you
were basically calling some largish mime parser module recursively, and this
was probably upsetting perl at the stack size, leading to the deep recursion
error.

What about the possibility of simply hacking the mime parser by adding a
string array used as a stack of mime boundaries?  Every time you recognize the
introduction of a new boundary while parsing linearly down the message, you
push it onto the stack and make it the active boundary.  You run with it until
you see its end boundary flag, at which point you pop it off, reverting to the
previous boundary.

That has some potential holes with malformed mime introductions or missing
end boundaries, but I don't see that recursively calling the parser would in
any way be immune to those faults either.  You could always keep the outer
boundary handy and compare text to it to make sure you don't overrun the
overall message.  (Assuming that the outer boundary isn't reused as an
internal boundary.)

The main point of this hack would be to simply detect and delete internal
non-text mime parts, not to actually create separate recursive mime parts
for all the found text parts.  Thus, in the case of a message that had a
bunch of embedded messages as the reply history, all of that would still
show up as the outer-level text part by the time SA was scanning things.
However, any gif that was embedded in someone's signature three levels in
would have been correctly stripped from that one agglomerated text part.
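
A minimal sketch of the boundary-stack idea above, written in Python rather
than SpamAssassin's Perl and not taken from the SpamAssassin code base.  The
function name strip_non_text_parts and the sample message are hypothetical.
It assumes one header per line (folded lines are unfolded crudely), does not
decode transfer encodings, and only drops the bodies of non-text parts while
leaving all text in one agglomerated list of lines.

```python
import re

CTYPE_RE = re.compile(r"content-type:\s*([^;\s]+)", re.IGNORECASE)
BOUNDARY_RE = re.compile(r'boundary\s*=\s*"?([^";\s]+)"?', re.IGNORECASE)


def strip_non_text_parts(raw):
    """Scan a MIME message line by line and keep only text/* part bodies."""
    stack = []         # active boundaries, innermost last
    kept = []          # body lines that end up in the agglomerated text
    headers = []       # header lines of the entity currently being read
    in_headers = True  # a message starts with its own header block
    keep_body = False

    for line in raw.splitlines():
        stripped = line.rstrip()

        # Compare against every boundary on the stack, innermost first, so a
        # missing inner end boundary cannot run past an enclosing part.
        hit = None
        for depth in range(len(stack) - 1, -1, -1):
            if stripped in ("--" + stack[depth], "--" + stack[depth] + "--"):
                hit = depth
                break
        if hit is not None:
            is_end = stripped == "--" + stack[hit] + "--"
            del stack[hit + 1:]        # drop unterminated inner boundaries
            if is_end:
                stack.pop()            # revert to the previous boundary
                in_headers, keep_body = False, False  # epilogue is ignored
            else:
                in_headers, headers = True, []        # a new part begins
            continue

        if in_headers:
            if stripped:
                headers.append(stripped)
                continue
            # Blank line: the header block is complete.  Joining the lines is
            # a crude way to unfold continuation lines.
            block = " ".join(headers)
            ctype_match = CTYPE_RE.search(block)
            ctype = ctype_match.group(1).lower() if ctype_match else "text/plain"
            boundary_match = BOUNDARY_RE.search(block)
            in_headers = False
            if ctype.startswith("multipart/") and boundary_match:
                stack.append(boundary_match.group(1))  # new active boundary
                keep_body = False                      # preamble is not text
            elif ctype == "message/rfc822":
                in_headers, headers = True, []  # embedded headers follow
            else:
                keep_body = ctype.startswith("text/")
            continue

        if keep_body:
            kept.append(line)

    return "\n".join(kept)


# Hypothetical forwarded message with a GIF one level inside.
SAMPLE = "\n".join([
    "Subject: Fwd: Fwd: hello",
    'Content-Type: multipart/mixed; boundary="outer"',
    "",
    "--outer",
    "Content-Type: text/plain",
    "",
    "see below",
    "--outer",
    "Content-Type: message/rfc822",
    "",
    "Subject: Fwd: hello",
    "Content-Type: multipart/mixed;",
    ' boundary="inner"',
    "",
    "--inner",
    "Content-Type: text/plain",
    "",
    "original text",
    "--inner",
    "Content-Type: image/gif",
    "Content-Transfer-Encoding: base64",
    "",
    "R0lGODlhAQABAAAAACw=",
    "--inner--",
    "--outer--",
])

result = strip_non_text_parts(SAMPLE)
```

Because the scan is a single loop with an explicit stack, nesting depth costs
list entries rather than call-stack frames, which is the property the
proposal relies on to avoid the deep recursion error.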





------- Additional Comments From jgmyers@proofpoint.com  2005-09-29 16:14 -------
A MIME recursion limit of 1 seems inordinately low.  I'd think it should be at
least 5.



------- Additional Comments From felicity@apache.org  2005-09-30 10:07 -------
Subject: Re:  Misparsing of message causes binary attachment to be treated as text

On Fri, Sep 30, 2005 at 09:13:55AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> So the parsing of the mime structure is a problem here?  I was hoping or
> thinking we could just drop parts like this:
> 
> Content-Type: image/jpeg
> Content-Transfer-Encoding: base64
> 
> I understand that it's not always obvious what the Content-Type is, but when it
> does show up, can't we just drop it from body?

The message parser works fine; it just intentionally doesn't subparse
message/rfc822 parts past the first level.  Parts that aren't text/* or
message/* are already ignored.  The specific issues here are 1) only subparsing
one level and 2) treating non-parsed message/* parts as text.

Both of those behaviors come from other tickets and were decided upon as the
correct course of action to fix the issues from those tickets.  We can't do it
both ways: either we solve one set of problems or the reverse set of problems.
If you think you know how to solve both sets, please feel free to explain.
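
To make the two behaviors concrete, here is a rough Python illustration using
the standard library email package.  It is not SpamAssassin's parser: the
helper names forward and body_text_with_fallback, the max_depth parameter and
the sample message are all hypothetical.  The stdlib has already parsed every
level, so the depth check only imitates the "subparse one level" policy, and
the raw-text fallback imitates "treat non-parsed message/* parts as text".

```python
from email.mime.image import MIMEImage
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def forward(inner, note):
    """Wrap `inner` the way many MUAs forward: a note plus message/rfc822."""
    outer = MIMEMultipart()
    outer.attach(MIMEText(note))
    outer.attach(MIMEMessage(inner))
    return outer


def body_text_with_fallback(part, depth=0, max_depth=1):
    """Collect scanned text; unparsed message/rfc822 parts become raw text."""
    if part.get_content_type() == "message/rfc822":
        inner = part.get_payload(0)
        if depth >= max_depth:
            # Past the limit: the whole embedded message, headers and
            # base64 lines included, is handed over as if it were text.
            return [inner.as_string()]
        return body_text_with_fallback(inner, depth + 1, max_depth)
    if part.is_multipart():
        texts = []
        for sub in part.get_payload():
            texts.extend(body_text_with_fallback(sub, depth, max_depth))
        return texts
    if part.get_content_maintype() == "text":
        return [part.get_payload(decode=True).decode("ascii", errors="replace")]
    return []  # images and other binary leaves are ignored when parsed


# A message with a GIF attachment, forwarded twice.
original = MIMEMultipart()
original.attach(MIMEText("original text"))
original.attach(MIMEImage(b"GIF89a", _subtype="gif"))
twice_forwarded = forward(forward(original, "first forward"), "second forward")

scanned = body_text_with_fallback(twice_forwarded)
leaked = "R0lGODlh" in scanned[-1]  # base64 of the GIF bytes shows up as text
```

With max_depth=1 the innermost message is never walked, so its base64 image
data lands in the scanned text, which is the false-positive source reported
in this ticket.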





------- Additional Comments From jm@jmason.org  2005-09-29 16:32 -------
we've been here before, and boy, were there fireworks. ;)

My position was that if a message contains a message/rfc822 MIME part, that part
should be considered a (non-scanned) binary part, not a (scanned) text part. 
That would avoid this bug.

However IIRC some MUAs *did* treat it as a text part, which is why we have the
current behaviour...



------- Additional Comments From tech2@i-is.com  2005-09-28 15:57 -------
Created an attachment (id=3150)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=3150&action=view)
Zip of sample messages which are misparsed




------- Additional Comments From jgmyers@proofpoint.com  2005-09-30 10:15 -------
As those other tickets are either unreferenced or unavailable, it is not
reasonable to expect people to address the unknown problems therein.




------- Additional Comments From felicity@apache.org  2005-09-29 09:15 -------
Subject: Re:  Misparsing of message causes binary attachment to be treated as text

On Thu, Sep 29, 2005 at 08:38:27AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Well this issue is causing me 100s of false positives each day.  I could write
> custom nice rules to counter the misfiring rules but I was hoping for a better
> solution, one that would work for everyone!  Is there anything that can be done
> in SA to work around this issue?

Not that I can think of.  We originally would have parsed the whole
thing, but certain messages would then cause perl to die with a deep
recursion error (bug 4103).  So now the code only subparses the first
level message/rfc822 to avoid the issue.

The messages don't necessarily look like spam, and SA doesn't flag any of them
as spam with the default 3.1 rule set, so I'm not sure where you're getting FPs.





jgmyers@proofpoint.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dev@spamassassin.apache.org
         AssignedTo|dev@spamassassin.apache.org |jgmyers@proofpoint.com
             Status|REOPENED                    |NEW




------- Additional Comments From jgmyers@proofpoint.com  2005-10-19 20:33 -------
Taking bug.



------- Additional Comments From jm@jmason.org  2005-09-29 11:11 -------
Fred -- you could use the MIMEHeader plugin in 3.1.0, and write a custom "nice"
rule to match message/rfc822 parts...



------- Additional Comments From jgmyers@proofpoint.com  2005-09-29 18:07 -------
This is not a duplicate of bug 3069.  Bug 3069 was fixed by recursing down a
single level of message/rfc822.  This bug complains that message/rfc822 messages
at more than one level of recursion are parsed as text, generating FPs.

A single level of recursion is not sufficient.  Even if it were, the current
handling of messages beyond the recursion limit is undesirable.





jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |4609






jgmyers@proofpoint.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|Undefined                   |3.1.1




------- Additional Comments From jgmyers@proofpoint.com  2006-02-24 19:54 -------
3.1 branch: Committed revision 380772.

Trunk: Committed revision 380783.



jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|DUPLICATE                   |




------- Additional Comments From jm@jmason.org  2005-10-14 13:37 -------
reopening, as per discussion in bug 4609.  once that bug is resolved, we can fix
this one pretty easily as per Theo's idea in that bug: 'still only subparse the
one time, and then not treat message/* parts as part of the body after that.'

In other words, treat subparts that common MUAs do not display by default the
same way we currently treat binary attachments, i.e. hidden.
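
The idea quoted above can be sketched as follows, again in Python with the
standard library email package rather than SpamAssassin's own parser.  The
names forward, visible_text and max_depth and the sample message are
hypothetical.  The assumption is that message/rfc822 parts beyond the
subparse limit contribute nothing to the scanned body, just like an image
attachment.

```python
from email.mime.image import MIMEImage
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def forward(inner, note):
    """Wrap `inner` in a new message as a message/rfc822 attachment."""
    outer = MIMEMultipart()
    outer.attach(MIMEText(note))
    outer.attach(MIMEMessage(inner))
    return outer


def visible_text(part, depth=0, max_depth=1):
    """Collect text parts, hiding message/rfc822 parts past max_depth."""
    if part.get_content_type() == "message/rfc822":
        if depth >= max_depth:
            return []  # hidden, like a binary attachment
        texts = []
        for inner in part.get_payload():
            texts.extend(visible_text(inner, depth + 1, max_depth))
        return texts
    if part.is_multipart():
        texts = []
        for sub in part.get_payload():
            texts.extend(visible_text(sub, depth, max_depth))
        return texts
    if part.get_content_maintype() == "text":
        charset = part.get_content_charset() or "ascii"
        return [part.get_payload(decode=True).decode(charset, errors="replace")]
    return []  # non-text leaves are never part of the body


# A message with a GIF attachment, forwarded twice.
original = MIMEMultipart()
original.attach(MIMEText("original text"))
original.attach(MIMEImage(b"GIF89a", _subtype="gif"))
twice_forwarded = forward(forward(original, "first forward"), "second forward")

one_level = visible_text(twice_forwarded, max_depth=1)
five_levels = visible_text(twice_forwarded, max_depth=5)
```

Raising max_depth exposes more of the forwarded text to scanning, while the
image data stays hidden at every setting; where to put the limit is the
trade-off discussed in this thread and in bug 4609.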



------- Additional Comments From jm@jmason.org  2005-09-29 17:05 -------
'If SpamAssassin didn't scan message/rfc822, that would give spammers an easy way
to bypass scanning.'

we really have been here before.  seriously.  see:

http://bugzilla.spamassassin.org/show_bug.cgi?id=3367
http://bugzilla.spamassassin.org/show_bug.cgi?id=3069


