You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2014/08/22 09:00:07 UTC

[Bug 7078] New: Mail::Spamassassin::Message::Node::header() error.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

            Bug ID: 7078
           Summary: Mail::Spamassassin::Message::Node::header() error.
           Product: Spamassassin
           Version: 3.4.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: spamassassin
          Assignee: dev@spamassassin.apache.org
          Reporter: liu31380@gmail.com

Created attachment 5231
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5231&action=edit
DATE_IN_PAST_96_XX

I take one mail (attachment), report following rule.

"2.1 DATE_IN_PAST_96_XX     Date: is 96 hours or more before Received: date"

I try to debug  the code , traced in Mail::Spamassassin::Message::Node::header
procedure , I insert print line to debug "received" field's pasering
result(modifyed code).

The result display:
---------------
 from smtp.test.com ([120.1.1.120])
        by mail.cutlass.com (IceWarp 9.4.2) with SMTP id IPY03927
        for <ad...@cutlass.com>; Tue, 22 Jul 2014 16:54:27 +0800

---------------
---------------
 for <ad...@cutlass.com>; Tue, 22 Jul 2014 16:54:27 +08007

---------------
---------------
 from yangzhengchu (192.168.179.243) by smtp.isof.com
 (192.168.2.142) with Microsoft SMTP Server (TLS) id 14.2.247.3; Sat, 27 Apr
 2013 15:38:17 +0800

---------------
---------------
 2013 15:38:17 +0800h Microsoft SMTP Server (TLS) id 14.2.247.3; Sat, 27 Apr

---------------

================modifyed code================
sub header {
  my $self   = shift;
  my $rawkey = shift;

  return unless defined $rawkey;

  # we're going to do things case insensitively
  my $key    = lc($rawkey);

  # Trim whitespace off of the header keys
  $key       =~ s/^\s+//;
  $key       =~ s/\s+$//;

  if (@_) {
    my $raw_value = shift;
    return unless defined $raw_value;

    push @{ $self->{'header_order'} }, $rawkey;
    if ( !exists $self->{'headers'}->{$key} ) {
      $self->{'headers'}->{$key} = [];
      $self->{'raw_headers'}->{$key} = [];
    }

    my $dec_value = $raw_value;
        if($key eq "received") {                                                
        print "---------------\n";                                              
        print $dec_value."\n";                                                  
        print "---------------\n";                                              
    }
    $dec_value =~ s/\n[ \t]+/ /gs;
        if($key eq "received") {                                                
        print "---------------\n";                                              
        print $dec_value."\n";                                                  
        print "---------------\n";                                              
    }
    $dec_value =~ s/\s+$//s;
    $dec_value =~ s/^\s+//s;
    push @{ $self->{'headers'}->{$key} },
$self->_decode_header($dec_value,$key);

    push @{ $self->{'raw_headers'}->{$key} }, $raw_value;

    return $self->{'headers'}->{$key}->[-1];
  }

  if (wantarray) {
    return unless exists $self->{'headers'}->{$key};
    return @{ $self->{'headers'}->{$key} };
  }
  else {
    return '' unless exists $self->{'headers'}->{$key};
    return $self->{'headers'}->{$key}->[-1];
  }
}

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

liu31380@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liu31380@gmail.com

--- Comment #11 from liu31380@gmail.com ---
Sorry, I‘m offline past tow days.
My problem just as Mark said.
There is only one date had been extracted, and the correct results should be
two. 
Using the the same received content , just let the fold line location does not
break the date, the extraction result is correct two.
Finally, thanks very much for detailed analysis.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #12 from Mark Martinec <Ma...@ijs.si> ---
Closing, the immediate problem is fixed.


(In reply to Mark Martinec from comment #9)
> It would make more sense to me to have 'pristine' representation of
> a message also in a normalized line-ending form, so the rest of the
> code would not need to worry about LF vs. CRLF.  Such a change would
> deserve a wider consensus though.

This can be a topic of another PR, or re-opening at some later time.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

Philip Prindeville <ph...@redfish-solutions.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |philipp@redfish-solutions.c
                   |                            |om

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

--- Comment #3 from Mark Martinec <Ma...@ijs.si> ---
Created attachment 5232
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5232&action=edit
Suggested patch - normalize line endings in header too

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

--- Comment #2 from Mark Martinec <Ma...@ijs.si> ---
> I wonder, aren't the <CR><LF> line endings supposed to be normalized
> to <LF> somewhere in the pre-parsing stage. I don't think the culprit
> is the sub header().

I see, the Mail::SpamAssassin::Message::new() is normalizing line endings
in a message body, but not in a header section. I wonder why is that so
and why did not more things break up. Probably because most of us are
feeding messages with <NL> line endings to SpamAssassin.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

--- Comment #7 from Mark Martinec <Ma...@ijs.si> ---
Sorry, a missing detail:

-         $value =~ s/\015\012\z/\012/  if $squash_crlf;
+         $value =~ s/\015\012/\012/sg  if $squash_crlf;

Sending lib/Mail/SpamAssassin/Message.pm
Committed revision 1619952.

( I remembered it on my way home, but Karsten was too quick :)
(more on the next comment...)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

--- Comment #9 from Mark Martinec <Ma...@ijs.si> ---
While investigating the case, my initial instinct was to do line-ending
normalization once-and-for-all in MS::Message::new(), affecting also the
pristine_headers and pristine_body. As it turned out, the report-generating
code in PerMsgStatus relies on original line endings in these two strings,
so a (small-ish) change would also be needed there, so I decide to revert
the attempted fix and keep the change to a least-effect (which also resulted
in my last-minute breakage).

So the current situation remains unchanged: the 'cooked' representation
of the message has line endings normalized to a single <LF>, so 'body'
rules dealing with this text need not worry about different line-ending
representations. On the other hand, raw rules (like 'rawbody') see the
original line endings and need to deal with them. It would not surprise
me that some of these rules do not work correctly on CRLF messages.

It would make more sense to me to have 'pristine' representation of
a message also in a normalized line-ending form, so the rest of the
code would not need to worry about LF vs. CRLF.  Such a change would
deserve a wider consensus though.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.4.1

--- Comment #4 from Mark Martinec <Ma...@ijs.si> ---
Bug 7078 - Mail::Spamassassin::Message::Node::header() error -
normalize line endings in header, not just in body
  Sending lib/Mail/SpamAssassin/Message.pm
Committed revision 1619844.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com

--- Comment #5 from Kevin A. McGrail <km...@pccc.com> ---
+1 for me.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

--- Comment #1 from Mark Martinec <Ma...@ijs.si> ---
Unfortunately you captured your output from screen instead of catching
it on a file - which makes it look like a mystery, while in fact there
are nonprintable <CR> characters in the printout.

I wonder, aren't the <CR><LF> line endings supposed to be normalized
to <LF> somewhere in the pre-parsing stage. I don't think the culprit
is the sub header().

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

--- Comment #8 from Mark Martinec <Ma...@ijs.si> ---
> What is the initial issue for this bug report?

Yes, the issue was that the Message::Node::header assumes that
line endings have already been normalized, so the following
unfolding did not accomplish its task:

  $dec_value =~ s/\n[ \t]+/ /gs;

leading to parse error on parsing the date in a Received header field.

So, ether line ending normalization needed to be fixed before
these lines reach Message::Node::header(), or this routine
needs to deal with both forms, independently of the decision
already made (about a need for normalization in MS::Message::new).

Considering that MS::Message::new already already does the
line ending normalizations for a body, and already decided
($squash_crlf) whether such normalization is needed or not,
it seems more logical to do the normalization there and not
in Node::header().

> This is NOT because of header() incorrectly parsing and breaking lines,
> though. With the patch attachment 5232 [details] applied, the result is the
> same and the rule still hits.

This is because of the last-minute missing patch, just applied.
It does parse it correctly now.

> If the original un-modified message  (a) has CRLF line endings 
> (b) causing parsing issues,  wouldn't that have come up long ago?

I suppose it would, but nobody noticed it.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

--- Comment #6 from Karsten Bräckelmann <gu...@rudersport.de> ---
What is the initial issue for this bug report?

Clearly, there was only a need for debugging the Node::header() sub, due to a
message triggering DATE_IN_PAST_96_XX. Which is no surprise given the headers
of the sample attachment 5231.

  Received: Tue, 22 Jul 2014 16:54:27 +0800
  Received: Sat, 27 Apr 2013 15:38:17 +0800
  Date: Sat, 27 Apr 2013 15:37:53 +0800

And indeed, running the given sample through SA results in that rule hit. This
is NOT because of header() incorrectly parsing and breaking lines, though. With
the patch attachment 5232 applied, the result is the same and the rule still
hits.

It is obvious the sample has been modified. Which might explain the Windows
style CRLF line endings in the headers.

I am not opposing the patch. I am just not convinced it actually fixes
anything. If the original un-modified message  (a) has CRLF line endings  (b)
causing parsing issues,  wouldn't that have come up long ago?


Liu, can you add some detail about the initial issue, please? What was the
reason for debugging SA code in the first place?

Also, can you access the original, raw message and check its headers' line
endings?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7078] Mail::Spamassassin::Message::Node::header() error.

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7078

--- Comment #10 from Mark Martinec <Ma...@ijs.si> ---
(In reply to Karsten Bräckelmann from comment #6)
> [...] a message triggering DATE_IN_PAST_96_XX. Which is no surprise
> given the headers of the sample attachment 5231 [details].
> 
>   Received: Tue, 22 Jul 2014 16:54:27 +0800
>   Received: Sat, 27 Apr 2013 15:38:17 +0800
>   Date: Sat, 27 Apr 2013 15:37:53 +0800

The answer to that specific case lies in the folding of
the header field:

Received: from yangzhengchu (192.168.179.243) by smtp.isof.com<CR><LF>
 (192.168.2.142) with Microsoft SMTP Server (TLS) id 14.2.247.3; Sat, 27
Apr<CR><LF>
 2013 15:38:17 +0800<CR><LF>

There is a line fold within the date (which is perfectly valid):
  Sat, 27 Apr<CR><LF>2013 15:38:17 +0800

Given the unfortunate
  $dec_value =~ s/\n[ \t]+/ /gs;
this resulted in a <CR> (without a <LF>) being left in the
half-unfolded header field, which then failed to match a regexp

      if ($rcvd =~ m/(\s.?\d+ \S\S\S \d+ \d+:\d+:\d+ \S+)/) {

in MS::SpamAssassin/Plugin/HeaderEval::_get_received_header_times() .

So of the two dates in Received header field:
     Tue, 22 Jul 2014 16:54:27 +0800
and  Sat, 27 Apr<CR> 2013 15:38:17 +0800

in the broken case only the first was recognized, which was
a year apart from a date in a Date header field:
  Date: Sat, 27 Apr 2013 15:37:53 +0800
which resulted in a rule firing.

Now after a fix, both dates in Received header fields are recognized,
and the second is very close to a Date, so the DATE_IN_PAST
does not fire.

-- 
You are receiving this mail because:
You are the assignee for the bug.