Posted to dev@spamassassin.apache.org by Cedric Knight <ce...@gn.apc.org> on 2012/04/04 12:14:37 UTC

TVD_FROM_1 false positive

Hi

This rule has been mentioned here before by flo@rfc822.org back in 2009,
when it scored a mere 1.0.  In the 3.3.1 update channel active.cf has:

##{ TVD_FROM_1
header TVD_FROM_1       From:addr =~ /[^\@0-9]{2}\d{3}\.(?:com|net|org|info|biz)$/i
##} TVD_FROM_1
score TVD_FROM_1                            2.799 2.799 2.799 2.799

I've noticed it hitting the domain of a concerned user.  Off the top of
my head, I can think of other reputable domains ending in at least 1 or
2 digits, and don't personally see 3 digits as an essentially spammy
characteristic (although many domains ending in 360 or 365 are indeed
associated with spam or dirty lists).

In my humble opinion:

(a) the high and variable score may be a result of an insufficiently
diverse ham corpus for the rescore mass check.  (I'd contribute myself
in a small way, but what puts me off is less the amount of work
involved than the fact that it's time-critical and I don't see any
announcements.)

(b) it might be better if rules like this, which presumably hit a large
amount of spam over a short period, were combined with other
characteristics of the same spam in a meta rule.  The components could
be written as subrules or held to a score of at most 0.1, but merely
allowing the scorer to choose between the meta rule and its components
could have a similar effect.  This might not only reduce the adverse
effect of potential false positives but also, in the absence of a
description, clarify the intention of the rule or the type of spam it
is aimed at.
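As a rough sketch of (b), the domain test could become an unscored
subrule, with only the combination scored.  The names __TVD_FROM_1 and
TVD_FROM_1_META are hypothetical, and the stock URIBL_BLACK and
RCVD_IN_PBL tests are only placeholder companions; which characteristics
really co-occur with this spam would have to come from the corpus:

```
# Hypothetical reformulation, not the rule as shipped.
# A leading "__" makes the header test an unscored subrule.
header   __TVD_FROM_1     From:addr =~ /[^\@0-9]{2}\d{3}\.(?:com|net|org|info|biz)$/i

# Only fire when another spam sign is present; URIBL_BLACK and
# RCVD_IN_PBL are placeholders for whatever really co-occurs.
meta     TVD_FROM_1_META  __TVD_FROM_1 && (URIBL_BLACK || RCVD_IN_PBL)
describe TVD_FROM_1_META  From domain ends in 3 digits, plus another spam sign

# Placeholder score; a rescore run would assign the real value.
score    TVD_FROM_1_META  1.0
```

Because the subrule itself carries no score, a legitimate sender whose
domain happens to end in three digits would not be penalised by this
test alone.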

What's to be done?

-- 
All best wishes,

Cedric Knight


Re: TVD_FROM_1 false positive

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 4/4/2012 6:14 AM, Cedric Knight wrote:
> What's to be done?
At the moment, I would recommend a ticket in Bugzilla.  I'm always a fan
of meta rules as well, but this does seem to be scored too high.

However, until masscheck has enough corpora to start producing rule
updates again, you'll have to score this locally.
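For example, a line like the following in the site's local.cf (often
/etc/mail/spamassassin/local.cf, though the path varies by installation)
overrides the update-channel score, since site configuration is read
after the default rules.  The value 0.5 is only an illustration; pick
whatever suits local mail:

```
# local.cf: site override for the update-channel rule.
# A single value applies to all four score sets.
score TVD_FROM_1 0.5
```

Running "spamassassin --lint" afterwards checks the file for syntax
errors.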

Regards,
KAM