You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by "Malte S. Stretz" <ms...@gmx.net> on 2004/08/17 00:33:28 UTC

Weird code path for EvalTests::check_dcc()

While I tracked another possible bug (see [1]), I had a look how check_dcc 
works.  And was completely puzzled by the code.

check_dcc() calls
  $self->get('X-DCC-(?:[^:]+-)?Metrics')
to shortcut some ocde path.  That's obviously intended to be a RE.  But on 
the whole path through the code I can't find the point where the header is 
matched via a RE.

Follow me through the code, maybe you can enlighten me:

1. EvalTests.pm:2599: check_dcc($full)

-> EvalTests.pm:2611: calls PerMsgStatus::get()

2. PerMsgStatus.pm:1175: get('X-DCC-(?:[^:]+-)?Metrics', undef)

-> PerMsgStatus.pm:1179: assume header cache isn't hit
-> PerMsgStatus.pm:1186: it's not a raw header
-> PerMsgStatus.pm:1226: no special header
-> PerMsgStatus.pm:1227: calls Message::Node::get_header()

3. Message/Node.pm:549: get_header('X-DCC-(?:[^:]+-)?Metrics', 0)

-> Message/Node.pm:564: not raw
-> Message/Node.pm:565: calls Message::Node::get()

4. Message/Node.pm:150: header('X-DCC-(?:[^:]+-)?Metrics', undef)

-> Message/Node.pm:154: key given, all right (apply some normalization)
-> Message/Node.pm:163: no more values in @_, skip block
-> Message/Node.pm:180: return something from the pre-parsed mail


Ok, so where's the RE applied?  Even if other code paths were taken I can't 
find any point where the string is made an RE and I can't believe that the 
mail parser puts the "Header" check_dcc() wants into the cache-hash.


Another weird thing:  The header cache in PerMsgStatus::get():
 - it doesn't compare case-insensitive (probably intended as it seems like
   ALL and ToCc are case sensitive, too)
 - it caches the original $request, including the weird RE (probably
   intended)

Cheers,
Malte


[1] http://bugs.gentoo.org/show_bug.cgi?id=58057#c12

-- 
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
      <http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
      <http://www.catb.org/~esr/faqs/smart-questions.html>

Re: Weird code path for EvalTests::check_dcc()

Posted by "Malte S. Stretz" <ms...@gmx.net>.
On Tuesday 17 August 2004 05:56 CET Daniel Quinlan wrote:
>[...]
> This is worth looking into, but I wouldn't delay -final *at all* to
> change how get() caching works.

Ok, it's now bug 3692 [1].

Both issues are IMO no reason to delay -final.  They are just optimizations 
which don't (always) work, definitels something we can fix for 3.1.0 and 
backport to 3.0.1.

Cheers,
Malte

[1] http://bugzilla.spamassassin.org/show_bug.cgi?id=3692

-- 
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
      <http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
      <http://www.catb.org/~esr/faqs/smart-questions.html>

Re: Weird code path for EvalTests::check_dcc()

Posted by Nick Leverton <nj...@leverton.org>.
On Mon, Aug 16, 2004 at 09:32:11PM -0700, Loren Wilton wrote:
> 
> Does that RE actually make any sense?  Is it really expecting to find either
> X-DCC-Metrics:
> or
> X-DCC-[^:]+-Metrics:  ?
> 
> I can imagine the first header existing, but something that has a dash, a
> something, and a +- in sequence seems rather strange for a header name.

[^:]+ means "a character other than colon, occurring one or more times".

I have one like that in my current mailbox, with a header saying
X-DCC-IECC-Metrics: xuxa.iecc.com 1107; Body=1 Fuz1=2 Fuz2=2

Nick
-- 
"And we will be restoring neurotypicality just as soon as we are sure
what is normal anyway. Thank you".  -- not quite DNA

Re: Weird code path for EvalTests::check_dcc()

Posted by Loren Wilton <lw...@earthlink.net>.
> >   return 1 if grep(/^X-DCC-(?:[^:]+-)?Metrics:/ && /bulk/,
@{$self->{msg}->=

Does that RE actually make any sense?  Is it really expecting to find either
X-DCC-Metrics:
or
X-DCC-[^:]+-Metrics:  ?

I can imagine the first header existing, but something that has a dash, a
something, and a +- in sequence seems rather strange for a header name.


Re: Weird code path for EvalTests::check_dcc()

Posted by Theo Van Dinter <fe...@kluge.net>.
On Mon, Aug 16, 2004 at 08:56:58PM -0700, Dan Quinlan wrote:
> >   return 1 if grep(/^X-DCC-(?:[^:]+-)?Metrics:/ && /bulk/, @{$self->{msg}->=
> > get_all_headers()});
> 
> If we can expand out the list of headers, we should just go with that.

Sure.  I was reproducing the initial behavior, but if we know the list
of headers, that'd be better.  The issue, however, is:

[...]
     DCC SMTP header lines are of the form:

     X-DCC-brand-Metrics: chost server-ID; bulk cknm1=count cknm2=count ...
     where
        brand   is the "brand name" of the DCC server, such as "RHYOLITE".
[...]

So "brand" is open to how things are configured and which DCC server is
being used.  I think we have to do the get_all_headers() call there.

> Actually, that test has to be done via get('ALL') because you can't get
> the raw header name via get($header);

Ah...  You're right.  I should learn not to respond to mails when I have
17 other things going on at the same time. <sigh>

-- 
Randomly Generated Tagline:
Life gets boring, someone invents another necessity, and once again we
 turn the crank on the screwjack of progress hoping that nobody gets
 screwed.
              -- Larry Wall in <19...@wall.org>

Re: Weird code path for EvalTests::check_dcc()

Posted by Daniel Quinlan <qu...@pathname.com>.
Theo Van Dinter <fe...@kluge.net> writes:

> with:
> 
>   return 1 if grep(/^X-DCC-(?:[^:]+-)?Metrics:/ && /bulk/, @{$self->{msg}->=
> get_all_headers()});

If we can expand out the list of headers, we should just go with that.
Fetching the entire list of headers is definitely inefficient unless we
really need it.

>> Another weird thing:  The header cache in PerMsgStatus::get():
>>  - it doesn't compare case-insensitive (probably intended as it seems like
>>    ALL and ToCc are case sensitive, too)

Hmmm... that might not be optimal.

> Yeah, because you may want to check MIME-Version and MiME-Version, for inst=
> ance.

Actually, that test has to be done via get('ALL') because you can't get
the raw header name via get($header);

This is worth looking into, but I wouldn't delay -final *at all* to
change how get() caching works.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: Weird code path for EvalTests::check_dcc()

Posted by Theo Van Dinter <fe...@kluge.net>.
On Tue, Aug 17, 2004 at 12:33:28AM +0200, Malte S. Stretz wrote:
> While I tracked another possible bug (see [1]), I had a look how check_dcc 
> works.  And was completely puzzled by the code.

Heh.

> check_dcc() calls
>   $self->get('X-DCC-(?:[^:]+-)?Metrics')
> to shortcut some ocde path.  That's obviously intended to be a RE.  But on 
> the whole path through the code I can't find the point where the header is 
> matched via a RE.

Well, now that you mention it...  PMS::get() doesn't accept an RE.

> Follow me through the code, maybe you can enlighten me:
> 2. PerMsgStatus.pm:1175: get('X-DCC-(?:[^:]+-)?Metrics', undef)

Yeah, that's not going to work.

> -> PerMsgStatus.pm:1227: calls Message::Node::get_header()

doesn't do an RE either.

> Ok, so where's the RE applied?  Even if other code paths were taken I can't 
> find any point where the string is made an RE and I can't believe that the 
> mail parser puts the "Header" check_dcc() wants into the cache-hash.

Yeah, so we ought to replace:

  $_ = $self->get('X-DCC-(?:[^:]+-)?Metrics');
  return 1 if /bulk/;

with:

  return 1 if grep(/^X-DCC-(?:[^:]+-)?Metrics:/ && /bulk/, @{$self->{msg}->get_all_headers()});

Time for a ticket... :(

> Another weird thing:  The header cache in PerMsgStatus::get():
>  - it doesn't compare case-insensitive (probably intended as it seems like
>    ALL and ToCc are case sensitive, too)

Yeah, because you may want to check MIME-Version and MiME-Version, for instance.

-- 
Randomly Generated Tagline:
"Never make any mistaeks."
 (Anonymous, in a mail discussion about to a kernel bug report.)