You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by "Malte S. Stretz" <ms...@gmx.net> on 2004/08/17 00:33:28 UTC
Weird code path for EvalTests::check_dcc()
While I tracked another possible bug (see [1]), I had a look how check_dcc
works. And was completely puzzled by the code.
check_dcc() calls
$self->get('X-DCC-(?:[^:]+-)?Metrics')
to shortcut some ocde path. That's obviously intended to be a RE. But on
the whole path through the code I can't find the point where the header is
matched via a RE.
Follow me through the code, maybe you can enlighten me:
1. EvalTests.pm:2599: check_dcc($full)
-> EvalTests.pm:2611: calls PerMsgStatus::get()
2. PerMsgStatus.pm:1175: get('X-DCC-(?:[^:]+-)?Metrics', undef)
-> PerMsgStatus.pm:1179: assume header cache isn't hit
-> PerMsgStatus.pm:1186: it's not a raw header
-> PerMsgStatus.pm:1226: no special header
-> PerMsgStatus.pm:1227: calls Message::Node::get_header()
3. Message/Node.pm:549: get_header('X-DCC-(?:[^:]+-)?Metrics', 0)
-> Message/Node.pm:564: not raw
-> Message/Node.pm:565: calls Message::Node::get()
4. Message/Node.pm:150: header('X-DCC-(?:[^:]+-)?Metrics', undef)
-> Message/Node.pm:154: key given, all right (apply some normalization)
-> Message/Node.pm:163: no more values in @_, skip block
-> Message/Node.pm:180: return something from the pre-parsed mail
Ok, so where's the RE applied? Even if other code paths were taken I can't
find any point where the string is made an RE and I can't believe that the
mail parser puts the "Header" check_dcc() wants into the cache-hash.
Another weird thing: The header cache in PerMsgStatus::get():
- it doesn't compare case-insensitive (probably intended as it seems like
ALL and ToCc are case sensitive, too)
- it caches the original $request, including the weird RE (probably
intended)
Cheers,
Malte
[1] http://bugs.gentoo.org/show_bug.cgi?id=58057#c12
--
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
<http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
<http://www.catb.org/~esr/faqs/smart-questions.html>
Re: Weird code path for EvalTests::check_dcc()
Posted by "Malte S. Stretz" <ms...@gmx.net>.
On Tuesday 17 August 2004 05:56 CET Daniel Quinlan wrote:
>[...]
> This is worth looking into, but I wouldn't delay -final *at all* to
> change how get() caching works.
Ok, it's now bug 3692 [1].
Both issues are IMO no reason to delay -final. They are just optimizations
which don't (always) work, definitels something we can fix for 3.1.0 and
backport to 3.0.1.
Cheers,
Malte
[1] http://bugzilla.spamassassin.org/show_bug.cgi?id=3692
--
[SGT] Simon G. Tatham: "How to Report Bugs Effectively"
<http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>
[ESR] Eric S. Raymond: "How To Ask Questions The Smart Way"
<http://www.catb.org/~esr/faqs/smart-questions.html>
Re: Weird code path for EvalTests::check_dcc()
Posted by Nick Leverton <nj...@leverton.org>.
On Mon, Aug 16, 2004 at 09:32:11PM -0700, Loren Wilton wrote:
>
> Does that RE actually make any sense? Is it really expecting to find either
> X-DCC-Metrics:
> or
> X-DCC-[^:]+-Metrics: ?
>
> I can imagine the first header existing, but something that has a dash, a
> something, and a +- in sequence seems rather strange for a header name.
[^:]+ means "a character other than colon, occurring one or more times".
I have one like that in my current mailbox, with a header saying
X-DCC-IECC-Metrics: xuxa.iecc.com 1107; Body=1 Fuz1=2 Fuz2=2
Nick
--
"And we will be restoring neurotypicality just as soon as we are sure
what is normal anyway. Thank you". -- not quite DNA
Re: Weird code path for EvalTests::check_dcc()
Posted by Loren Wilton <lw...@earthlink.net>.
> > return 1 if grep(/^X-DCC-(?:[^:]+-)?Metrics:/ && /bulk/,
@{$self->{msg}->=
Does that RE actually make any sense? Is it really expecting to find either
X-DCC-Metrics:
or
X-DCC-[^:]+-Metrics: ?
I can imagine the first header existing, but something that has a dash, a
something, and a +- in sequence seems rather strange for a header name.
Re: Weird code path for EvalTests::check_dcc()
Posted by Theo Van Dinter <fe...@kluge.net>.
On Mon, Aug 16, 2004 at 08:56:58PM -0700, Dan Quinlan wrote:
> > return 1 if grep(/^X-DCC-(?:[^:]+-)?Metrics:/ && /bulk/, @{$self->{msg}->=
> > get_all_headers()});
>
> If we can expand out the list of headers, we should just go with that.
Sure. I was reproducing the initial behavior, but if we know the list
of headers, that'd be better. The issue, however, is:
[...]
DCC SMTP header lines are of the form:
X-DCC-brand-Metrics: chost server-ID; bulk cknm1=count cknm2=count ...
where
brand is the "brand name" of the DCC server, such as "RHYOLITE".
[...]
So "brand" is open to how things are configured and which DCC server is
being used. I think we have to do the get_all_headers() call there.
> Actually, that test has to be done via get('ALL') because you can't get
> the raw header name via get($header);
Ah... You're right. I should learn not to respond to mails when I have
17 other things going on at the same time. <sigh>
--
Randomly Generated Tagline:
Life gets boring, someone invents another necessity, and once again we
turn the crank on the screwjack of progress hoping that nobody gets
screwed.
-- Larry Wall in <19...@wall.org>
Re: Weird code path for EvalTests::check_dcc()
Posted by Daniel Quinlan <qu...@pathname.com>.
Theo Van Dinter <fe...@kluge.net> writes:
> with:
>
> return 1 if grep(/^X-DCC-(?:[^:]+-)?Metrics:/ && /bulk/, @{$self->{msg}->=
> get_all_headers()});
If we can expand out the list of headers, we should just go with that.
Fetching the entire list of headers is definitely inefficient unless we
really need it.
>> Another weird thing: The header cache in PerMsgStatus::get():
>> - it doesn't compare case-insensitive (probably intended as it seems like
>> ALL and ToCc are case sensitive, too)
Hmmm... that might not be optimal.
> Yeah, because you may want to check MIME-Version and MiME-Version, for inst=
> ance.
Actually, that test has to be done via get('ALL') because you can't get
the raw header name via get($header);
This is worth looking into, but I wouldn't delay -final *at all* to
change how get() caching works.
Daniel
--
Daniel Quinlan
http://www.pathname.com/~quinlan/
Re: Weird code path for EvalTests::check_dcc()
Posted by Theo Van Dinter <fe...@kluge.net>.
On Tue, Aug 17, 2004 at 12:33:28AM +0200, Malte S. Stretz wrote:
> While I tracked another possible bug (see [1]), I had a look how check_dcc
> works. And was completely puzzled by the code.
Heh.
> check_dcc() calls
> $self->get('X-DCC-(?:[^:]+-)?Metrics')
> to shortcut some ocde path. That's obviously intended to be a RE. But on
> the whole path through the code I can't find the point where the header is
> matched via a RE.
Well, now that you mention it... PMS::get() doesn't accept an RE.
> Follow me through the code, maybe you can enlighten me:
> 2. PerMsgStatus.pm:1175: get('X-DCC-(?:[^:]+-)?Metrics', undef)
Yeah, that's not going to work.
> -> PerMsgStatus.pm:1227: calls Message::Node::get_header()
doesn't do an RE either.
> Ok, so where's the RE applied? Even if other code paths were taken I can't
> find any point where the string is made an RE and I can't believe that the
> mail parser puts the "Header" check_dcc() wants into the cache-hash.
Yeah, so we ought to replace:
$_ = $self->get('X-DCC-(?:[^:]+-)?Metrics');
return 1 if /bulk/;
with:
return 1 if grep(/^X-DCC-(?:[^:]+-)?Metrics:/ && /bulk/, @{$self->{msg}->get_all_headers()});
Time for a ticket... :(
> Another weird thing: The header cache in PerMsgStatus::get():
> - it doesn't compare case-insensitive (probably intended as it seems like
> ALL and ToCc are case sensitive, too)
Yeah, because you may want to check MIME-Version and MiME-Version, for instance.
--
Randomly Generated Tagline:
"Never make any mistaeks."
(Anonymous, in a mail discussion about to a kernel bug report.)