Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2017/02/21 03:40:01 UTC

top and other spammy TLDs

Hi,

Some time ago I had put together a rule based on comments from this
list, and I've identified a FP that I hoped someone could help me to
correct.

The full domain in the email was http://www.top-1.biz. However, it's
being tagged as if it's "top" as the TLD in one of KAM's rules and one
of mine:

Feb 20 22:34:25.988 [31215] dbg: rules: ran uri rule __KAM_TINYDOMAIN
======> got hit: "-1.biz/"
Feb 20 22:34:25.988 [31215] dbg: rules: ran uri rule LOC_URI_RARE_TLD
======> got hit: "://www.top"

uri        LOC_URI_RARE_TLD m;://[^/]+\.(?:work|space|club|science|pub|red|blue|green|link|ninja|lol|xyz|faith|review|download|top|global|(?:web)?site|tech|party|pro|bid|trade|win|moda|news|online)(?:/|\b);i
describe   LOC_URI_RARE_TLD     URI refers to rarely-nonspam TLD
score      LOC_URI_RARE_TLD     0.400

How can this be corrected to specifically only catch top as a TLD?

Re: top and other spammy TLDs

Posted by John Hardin <jh...@impsec.org>.
On Mon, 20 Feb 2017, Alex wrote:

> Hi,
>
> Some time ago I had put together a rule based on comments from this
> list, and I've identified a FP that I hoped someone could help me to
> correct.
>
> The full domain in the email was http://www.top-1.biz. However, it's
> being tagged as if it's "top" as the TLD in one of KAM's rules and one
> of mine:
>
> Feb 20 22:34:25.988 [31215] dbg: rules: ran uri rule __KAM_TINYDOMAIN
> ======> got hit: "-1.biz/"
> Feb 20 22:34:25.988 [31215] dbg: rules: ran uri rule LOC_URI_RARE_TLD
> ======> got hit: "://www.top"
>
> uri        LOC_URI_RARE_TLD m;://[^/]+\.(?:work|space|club|science|pub|red|blue|green|link|ninja|lol|xyz|faith|review|download|top|global|(?:web)?site|tech|party|pro|bid|trade|win|moda|news|online)(?:/|\b);i
>
> How can this be corrected to specifically only catch top as a TLD?

Re LOC_URI_RARE_TLD:

It's a URI rule, so anchor the end with (?:/|$) - if it's a bare domain 
the TLD will be at the end of the URI. If it's got a path part the domain 
will be followed by a slash.
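
To see why the trailing \b misfires on www.top-1.biz while (?:/|$) does not, here is a small Python sketch. It is only an illustration: Python's re module stands in for SpamAssassin's Perl regex engine (an assumption, though the two agree on these constructs), the TLD list is shortened, and the names TLDS, loose, strict and hit are made up for the example.

```python
import re

# Shortened TLD alternation for illustration; the real rule lists many more.
TLDS = r"(?:work|space|club|top|xyz)"

# Original anchoring: \b also matches between "p" and "-" in "top-1".
loose = re.compile(r"://[^/]+\." + TLDS + r"(?:/|\b)", re.I)

# Suggested anchoring: the TLD must be followed by "/" or the end of the URI.
strict = re.compile(r"://[^/]+\." + TLDS + r"(?:/|$)", re.I)

def hit(pattern, uri):
    """Return the matched text, or None when the rule would not fire."""
    m = pattern.search(uri)
    return m.group(0) if m else None

if __name__ == "__main__":
    for uri in ("http://www.top-1.biz/", "http://example.top",
                "http://example.top/offer"):
        print(uri, "loose:", hit(loose, uri), "strict:", hit(strict, uri))
```

The loose pattern reproduces the "://www.top" hit from the debug output, because a word boundary exists between "p" and "-". The strict pattern leaves that URI alone and still matches a bare .top domain or one followed by a path.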

Thanks for bringing that up, fixed here too.

Dunno about __KAM_TINYDOMAIN

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Homeland Security: Specializing in Tactical Band-aids
   for Strategic Problems.         -- Eric K. in Bruce Schneier's blog
-----------------------------------------------------------------------
  2 days until George Washington's 285th Birthday

Re: top and other spammy TLDs

Posted by Paul Stead <pa...@zeninternet.co.uk>.

On 21/02/2017, 23:15, "Paul Stead" <pa...@zeninternet.co.uk> wrote:

    I can’t see how this can be the same for the check_from_in_list calls, however?

Apologies – it is not possible to add custom addrlists in SA - https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7354

--
Paul Stead
Systems Engineer
Zen Internet

Re: top and other spammy TLDs

Posted by Paul Stead <pa...@zeninternet.co.uk>.

On 21/02/2017, 18:41, "RW" <rw...@googlemail.com> wrote:

    On Tue, 21 Feb 2017 17:57:13 +0000
    Paul Stead wrote:

    > I’ve posted this before, this is how I manage these nasty TLDs:
    >
    > Make sure WLBLEval is enabled:
    >
    > loadplugin Mail::SpamAssassin::Plugin::WLBLEval
    >
    > Then add the TLDs to a URI_HOST list:
    >
    > enlist_uri_host (NEWSPAMMY) top

    The other way is safer, because it only matches on actual
    URLs and not typos, filenames etc.

I can agree with this statement for check_uri_host_listed – possibly worth noting that check_uri_host_in_blacklist uses the same underlying code – which is scored +100 by SA internal scoring. Is there a justification for having a uri list that has been vetted to ensure it is an “actual URL” and not a typo?

I can’t see how this can be the same for the check_from_in_list calls, however?

Paul

--
Paul Stead
Systems Engineer
Zen Internet

Re: top and other spammy TLDs

Posted by RW <rw...@googlemail.com>.
On Tue, 21 Feb 2017 18:41:08 +0000
RW wrote:

> On Tue, 21 Feb 2017 17:57:13 +0000
> Paul Stead wrote:
> 
> > I’ve posted this before, this is how I manage these nasty TLDs:
> > 
> > Make sure WLBLEval is enabled:
> > 
> > loadplugin Mail::SpamAssassin::Plugin::WLBLEval
> > 
> > Then add the TLDs to a URI_HOST list:
> > 
> > enlist_uri_host (NEWSPAMMY) top  
> 
> The other way is safer, because it only matches on actual
> URLs and not typos, filenames etc.

Actually I was wrong about that. It seems that in URI tests http:// is
automatically prepended to any host or domain name that doesn't have a
protocol.

Re: top and other spammy TLDs

Posted by RW <rw...@googlemail.com>.
On Tue, 21 Feb 2017 17:57:13 +0000
Paul Stead wrote:

> I’ve posted this before, this is how I manage these nasty TLDs:
> 
> Make sure WLBLEval is enabled:
> 
> loadplugin Mail::SpamAssassin::Plugin::WLBLEval
> 
> Then add the TLDs to a URI_HOST list:
> 
> enlist_uri_host (NEWSPAMMY) top

The other way is safer, because it only matches on actual
URLs and not typos, filenames etc.

Re: top and other spammy TLDs

Posted by Paul Stead <pa...@zeninternet.co.uk>.

On 25/02/2017, 00:39, "Alex" <my...@gmail.com> wrote:

    header   PDS_FROM_OTHER_BAD_TLD eval:check_from_in_list('NEWSPAMMY')


This particular check will not work as the current release of SA does not include the improvement in the BZ report. If you have the patch included (I can’t support you patching your production SA) you can use:

enlist_addrlist (NEWSPAMMY) *@*.top

to create the NEWSPAMMY addrlist, which check_from_in_list and the associated evals can then use.


Paul

--
Paul Stead
Systems Engineer
Zen Internet

Re: top and other spammy TLDs

Posted by Alex <my...@gmail.com>.
Hi,

On Fri, Feb 24, 2017 at 7:33 PM, Benny Pedersen <me...@junc.eu> wrote:
> Alex skrev den 2017-02-25 01:18:
>
>> Is there something more that needs to be done than the above?
>
>
> what sa version ?
>
> i know it works with 3.4.1
>
> but have disabled my own rules again

This is a relatively recent svn release, but I've just searched a
3.4.1 tree and there's no occurrence of NEWSPAMMY there either. This
is the config I'm using to start:

# (this is loaded in v320.pre)
# loadplugin Mail::SpamAssassin::Plugin::WLBLEval
#Then add the TLDs to a URI_HOST list:
enlist_uri_host (NEWSPAMMY) top
enlist_uri_host (NEWSPAMMY) date
enlist_uri_host (NEWSPAMMY) faith
enlist_uri_host (NEWSPAMMY) racing
#These can then be used with eval rules:
#To check all URIs:
header   PDS_OTHER_BAD_TLD eval:check_uri_host_listed('NEWSPAMMY')
score    PDS_OTHER_BAD_TLD 0.1
describe PDS_OTHER_BAD_TLD Other untrustworthy TLDs
#if you just want to check From address:
header   PDS_FROM_OTHER_BAD_TLD eval:check_from_in_list('NEWSPAMMY')

Thanks,
Alex

Re: top and other spammy TLDs

Posted by Benny Pedersen <me...@junc.eu>.
Alex skrev den 2017-02-25 01:18:

> Is there something more that needs to be done than the above?

what sa version ?

i know it works with 3.4.1

but have disabled my own rules again

Re: top and other spammy TLDs

Posted by Alex <my...@gmail.com>.
Hi,

On Tue, Feb 21, 2017 at 12:57 PM, Paul Stead
<pa...@zeninternet.co.uk> wrote:
> I’ve posted this before, this is how I manage these nasty TLDs:
>
> Make sure WLBLEval is enabled:
>
> loadplugin Mail::SpamAssassin::Plugin::WLBLEval
>
> Then add the TLDs to a URI_HOST list:
>
> enlist_uri_host (NEWSPAMMY) top
> enlist_uri_host (NEWSPAMMY) date
> enlist_uri_host (NEWSPAMMY) faith
> enlist_uri_host (NEWSPAMMY) racing
>
> These can then be used with eval rules:
>
> To check all URIs:
>
> header   PDS_OTHER_BAD_TLD eval:check_uri_host_listed('NEWSPAMMY')
> score    PDS_OTHER_BAD_TLD 0.1
> describe PDS_OTHER_BAD_TLD Other untrustworthy TLDs
>
> if you just want to check From address:
>
> header   PDS_FROM_OTHER_BAD_TLD eval:check_from_in_list('NEWSPAMMY')

I thought I would try and get this going, and despite not fully
understanding the comments you made in the bugreport, it doesn't seem
to work:

# spamassassin --lint
Feb 24 19:14:50.396 [14090] warn: eval: could not find list NEWSPAMMY
at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/WLBLEval.pm
line 112.

Is there something more that needs to be done than the above?

Re: top and other spammy TLDs

Posted by Paul Stead <pa...@zeninternet.co.uk>.
I’ve posted this before, this is how I manage these nasty TLDs:

Make sure WLBLEval is enabled:

loadplugin Mail::SpamAssassin::Plugin::WLBLEval

Then add the TLDs to a URI_HOST list:

enlist_uri_host (NEWSPAMMY) top
enlist_uri_host (NEWSPAMMY) date
enlist_uri_host (NEWSPAMMY) faith
enlist_uri_host (NEWSPAMMY) racing

These can then be used with eval rules:

To check all URIs:

header   PDS_OTHER_BAD_TLD eval:check_uri_host_listed('NEWSPAMMY')
score    PDS_OTHER_BAD_TLD 0.1
describe PDS_OTHER_BAD_TLD Other untrustworthy TLDs

if you just want to check From address:

header   PDS_FROM_OTHER_BAD_TLD eval:check_from_in_list('NEWSPAMMY')
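
As a rough picture of what the URI host list above is meant to do, here is a minimal Python sketch. It is not SpamAssassin's code: it assumes a listed entry matches the URI's host or any parent domain of it, and the names NEWSPAMMY (reused as a plain Python set) and host_listed are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for a list built with enlist_uri_host.
NEWSPAMMY = {"top", "date", "faith", "racing"}

def host_listed(uri, listed=NEWSPAMMY):
    """True if the URI's host, or any parent domain of it, is in the list."""
    host = (urlsplit(uri).hostname or "").lower()
    labels = host.split(".") if host else []
    # Check "www.example.top", then "example.top", then "top".
    return any(".".join(labels[i:]) in listed for i in range(len(labels)))

if __name__ == "__main__":
    for uri in ("http://example.top/x", "http://www.top-1.biz/"):
        print(uri, host_listed(uri))
```

Because the comparison is made on whole dot-separated labels rather than with a regex, a host such as www.top-1.biz is not confused with the top TLD.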

Paul

On 21/02/2017, 03:40, "Alex" <my...@gmail.com> wrote:

    Hi,

    Some time ago I had put together a rule based on comments from this
    list, and I've identified a FP that I hoped someone could help me to
    correct.

    The full domain in the email was http://www.top-1.biz. However, it's
    being tagged as if it's "top" as the TLD in one of KAM's rules and one
    of mine:

    Feb 20 22:34:25.988 [31215] dbg: rules: ran uri rule __KAM_TINYDOMAIN
    ======> got hit: "-1.biz/"
    Feb 20 22:34:25.988 [31215] dbg: rules: ran uri rule LOC_URI_RARE_TLD
    ======> got hit: "://www.top"

    uri        LOC_URI_RARE_TLD m;://[^/]+\.(?:work|space|club|science|pub|red|blue|green|link|ninja|lol|xyz|faith|review|download|top|global|(?:web)?site|tech|party|pro|bid|trade|win|moda|news|online)(?:/|\b);i
    describe   LOC_URI_RARE_TLD     URI refers to rarely-nonspam TLD
    score      LOC_URI_RARE_TLD     0.400

    How can this be corrected to specifically only catch top as a TLD?


--
Paul Stead
Systems Engineer
Zen Internet