Posted to users@spamassassin.apache.org by Franz Schwartau <fr...@electromail.org> on 2015/06/17 23:08:55 UTC

Problem with TxRep's HELO handling

Hi!

A few days ago I replaced AWL with TxRep. Since then I have occasionally
seen unusually high scores caused by TxRep.

So I started debugging the TxRep plugin a bit. The high scores are
caused by "HELO: localhost" after sa-learn of a spam mailbox.

In check_senders_reputation() line 1252 reads:

foreach my $rly ( @{$pms->{relays_trusted}}, @{$pms->{relays_untrusted}} ) {

Thus every relay parsed from Received headers is used. This leads to
$helo = 'localhost' (line 1256) if there is no from in a Received header.
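
A minimal sketch of this effect, written in Python rather than the
plugin's Perl. The relay dictionaries and the last_helo() function are
hypothetical stand-ins for the parsed Received data, and treating both
relays as untrusted is an assumption; only the "overwrite on every pass"
pattern is taken from the loop quoted above.

```python
def last_helo(relays_trusted, relays_untrusted):
    """Return the HELO left over after visiting every relay in order."""
    helo = None
    for relay in relays_trusted + relays_untrusted:
        # Each relay that carries a HELO overwrites the previous value,
        # so the relay visited last decides the result.
        if relay.get("helo"):
            helo = relay["helo"]
    return helo


# Assumption: no trusted relays, two untrusted relays (newest first).
trusted = []
untrusted = [
    {"helo": "mail-wi0-f175.google.com", "ip": "209.85.212.175"},
    {"helo": "localhost", "ip": "188.95.50.54"},
]

print(last_helo(trusted, untrusted))
```

With relays ordered newest first, the value that survives the loop comes
from the oldest relay, which is how 'localhost' ends up as the HELO.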

Please see the attached log for details. Please note that the high
scores don't show up in this example. Don't get confused... ;-)

I don't get why TxRep evaluates every relay in line 1252 and following.
Shouldn't it just use the latest relay?

	Best regards
		Franz

Re: Problem with TxRep's HELO handling

Posted by RW <rw...@googlemail.com>.
On Thu, 18 Jun 2015 12:29:44 +0200
Matus UHLAR - fantomas wrote:

> On 18.06.15 09:11, Franz Schwartau wrote:

> >The last (third) Received header causes $helo to be set to
> >'localhost'.
> >
> >It would make more sense if TxRep used the latest (first) Received
> >header, setting $helo to 'mail-wi0-f175.google.com'.
> 
> shouldn't that logically be the last-trusted header instead?

last-trusted is the correct generic way of putting it (often the top
header is purely internal). But that's not what TxRep is trying to do.

The last-trusted helo is under the control of the sender, so there's no
strong reason to prefer last-trusted on grounds of trust. And
last-trusted is better tracked through IP address or rdns anyway.

What TxRep is trying to do is track the helo from the original sender, as
this can sometimes track a sending device across multiple services and
IP addresses. Aside from deliberate forgery, this is going to fail in all
kinds of cases (webmail, for one), and I doubt there's any good way of
fixing it.

If I were you I'd just weight it at zero. IMO TxRep is a bit of a mixed
bag: it is a better AWL, but I'm sceptical about some of its additional
features.
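
To show what weighting the helo at zero does, here is a rough Python
sketch. It assumes the per-identifier reputations are combined as a
plain weighted mean, which is a simplification and not necessarily
TxRep's exact arithmetic. The identifier names, weights and reputation
values are invented for illustration. TxRep's documentation describes
per-identifier weights (txrep_weight_helo for the helo); check that name
against your installed version before relying on it.

```python
def combined_reputation(reputations, weights):
    """Weighted mean of per-identifier reputations (assumed formula)."""
    total_weight = sum(weights[name] for name in reputations)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(reputations[name] * weights[name] for name in reputations)
    return weighted_sum / total_weight


# Hypothetical reputations: the shared 'localhost' helo has a bad history.
reputations = {"email_ip": 1.0, "domain": 0.5, "helo": 12.0}

with_helo = {"email_ip": 10, "domain": 2, "helo": 0.5}
without_helo = {"email_ip": 10, "domain": 2, "helo": 0}

print(combined_reputation(reputations, with_helo))
print(combined_reputation(reputations, without_helo))
```

With the helo weight at zero, the helo reputation no longer contributes
to the combined value, whatever history has accumulated under it.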


The reason score averaging is appealing is that it doesn't require
knowing whether an email is spam or ham. It is important, though, to
partition the mail so that you're averaging either spam or ham
together, but not both. The average score of a mix of spam and ham is
a pretty meaningless apples-and-oranges average. Forgery aside, AWL's
email and IP address combination does this pretty well. TxRep fixes the
IP address forgery problem, and fixes some other minor problems, but
then it introduces some additional things into the average that are
either mixed-source, forgeable or unreliable.
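
The partitioning point can be illustrated with a small Python sketch.
The sender keys and scores below are made up, and average_by() is a
hypothetical helper, not part of AWL or TxRep. It compares averaging per
email-and-IP key with averaging under one key shared by unrelated
senders, such as a 'localhost' helo.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history of (email|ip-prefix key, message score) pairs.
history = [
    ("friend@example.org|192.0.2", -1.5),
    ("friend@example.org|192.0.2", -0.5),
    ("spammer@example.net|198.51.100", 15.0),
    ("spammer@example.net|198.51.100", 17.0),
]


def average_by(history, key_func):
    """Group scores by key_func(key) and return the mean of each group."""
    buckets = defaultdict(list)
    for key, score in history:
        buckets[key_func(key)].append(score)
    return {name: mean(scores) for name, scores in buckets.items()}


# Partitioned: each sender gets its own average.
per_sender = average_by(history, lambda key: key)

# Mixed: every message lands under the same shared identifier.
shared = average_by(history, lambda key: "localhost")

print(per_sender)
print(shared)
```

The per-sender averages stay close to each sender's typical score, while
the shared key yields a figure that describes neither the ham nor the
spam sender.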


Re: Problem with TxRep's HELO handling

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 18.06.15 09:11, Franz Schwartau wrote:
>Yes, TxRep right now uses the _last_ Received header. IMHO it should use
>only the _latest_ Received header.
>
>Let's have a look at the following series of Received headers, which
>causes the problem:
>
>Received: from mail-wi0-f175.google.com (mail-wi0-f175.google.com
>[209.85.212.175])
>        by mx1.domain.com (envelope-sender <al...@price2spy.com>)
>(MIMEDefang) with ESMTP id t5HJ97If029681
>        for <re...@domain.com>; Wed, 17 Jun 2015 21:09:10 +0200
>Received: by wiwd19 with SMTP id d19so1242876wiw.0
>        for <re...@domain.com>; Wed, 17 Jun 2015 12:09:07 -0700 (PDT)
>Received: from localhost ([188.95.50.54])
>        by mx.google.com with ESMTPSA id
>ka7sm8287084wjc.36.2015.06.17.12.09.06
>        (version=TLSv1 cipher=RC4-SHA bits=128/128);
>        Wed, 17 Jun 2015 12:09:07 -0700 (PDT)
>
>The last (third) Received header causes $helo to be set to 'localhost'.
>
>It would make more sense if TxRep used the latest (first) Received
>header, setting $helo to 'mail-wi0-f175.google.com'.

shouldn't that logically be the last-trusted header instead?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Chernobyl was an Windows 95 beta test site.

Re: Problem with TxRep's HELO handling

Posted by Franz Schwartau <fr...@electromail.org>.
Hi!

On 18.06.2015 00:39, RW wrote:
> On Wed, 17 Jun 2015 23:08:55 +0200
> Franz Schwartau wrote:
> 
>> Hi!
>>
>> A few days ago I replaced AWL with TxRep. Since then I have
>> occasionally seen unusually high scores caused by TxRep.
>>
>> So I started debugging the TxRep plugin a bit. The high scores are
>> caused by "HELO: localhost" after sa-learn of a spam mailbox.
>>
>> In check_senders_reputation() line 1252 reads:
>>
>> foreach my $rly ( @{$pms->{relays_trusted}},
>> @{$pms->{relays_untrusted}} ) {
>>
>> Thus every relay parsed from Received headers is used. This leads to
>> $helo = 'localhost' (line 1256) if there is no from in a Received
>> header.
> ...
>> I don't get why TxRep evaluates every relay in line 1252 and
>> following. Shouldn't it just use the latest relay?
> 
> As far as I can see it does. It works its way back through the headers
> setting $helo as it goes, so $helo ends up being set from the last
> Received header tested that matches the criteria (the last tested being
> the lowest Received header in the email).

Yes, TxRep right now uses the _last_ Received header. IMHO it should use
only the _latest_ Received header.

Let's have a look at the following series of Received headers, which
causes the problem:

Received: from mail-wi0-f175.google.com (mail-wi0-f175.google.com
[209.85.212.175])
        by mx1.domain.com (envelope-sender <al...@price2spy.com>)
(MIMEDefang) with ESMTP id t5HJ97If029681
        for <re...@domain.com>; Wed, 17 Jun 2015 21:09:10 +0200
Received: by wiwd19 with SMTP id d19so1242876wiw.0
        for <re...@domain.com>; Wed, 17 Jun 2015 12:09:07 -0700 (PDT)
Received: from localhost ([188.95.50.54])
        by mx.google.com with ESMTPSA id
ka7sm8287084wjc.36.2015.06.17.12.09.06
        (version=TLSv1 cipher=RC4-SHA bits=128/128);
        Wed, 17 Jun 2015 12:09:07 -0700 (PDT)

The last (third) Received header causes $helo to be set to 'localhost'.

It would make more sense if TxRep used the latest (first) Received
header, setting $helo to 'mail-wi0-f175.google.com'.
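
The difference between the two choices can be sketched in Python. The
header strings below are shortened versions of the three headers above,
and the regular expression is a deliberate simplification: it is not how
SpamAssassin parses Received headers, and helo_of() is a hypothetical
helper.

```python
import re

# Shortened copies of the three Received headers, newest first.
RECEIVED = [
    "from mail-wi0-f175.google.com (mail-wi0-f175.google.com "
    "[209.85.212.175]) by mx1.domain.com with ESMTP",
    "by wiwd19 with SMTP id d19so1242876wiw.0",
    "from localhost ([188.95.50.54]) by mx.google.com with ESMTPSA",
]


def helo_of(received):
    """Return the host named after a leading 'from', or None if absent."""
    match = re.match(r"from\s+(\S+)", received)
    return match.group(1) if match else None


# Keep only headers that name a 'from' host.
helos = [h for h in map(helo_of, RECEIVED) if h is not None]

topmost = helos[0]   # latest (first) header with a 'from'
bottom = helos[-1]   # last (third) header, where the loop ends up

print(topmost)
print(bottom)
```

The second header has no 'from' part and is skipped, so the choice is
between the first header's host and the third header's 'localhost'.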

	Best regards
		Franz

Re: Problem with TxRep's HELO handling

Posted by RW <rw...@googlemail.com>.
On Wed, 17 Jun 2015 23:08:55 +0200
Franz Schwartau wrote:

> Hi!
> 
> A few days ago I replaced AWL with TxRep. Since then I have
> occasionally seen unusually high scores caused by TxRep.
> 
> So I started debugging the TxRep plugin a bit. The high scores are
> caused by "HELO: localhost" after sa-learn of a spam mailbox.
> 
> In check_senders_reputation() line 1252 reads:
> 
> foreach my $rly ( @{$pms->{relays_trusted}},
> @{$pms->{relays_untrusted}} ) {
> 
> Thus every relay parsed from Received headers is used. This leads to
> $helo = 'localhost' (line 1256) if there is no from in a Received
> header.
...
> I don't get why TxRep evaluates every relay in line 1252 and
> following. Shouldn't it just use the latest relay?

As far as I can see it does. It works its way back through the headers
setting $helo as it goes, so $helo ends up being set from the last
Received header tested that matches the criteria (the last tested being
the lowest Received header in the email).