Posted to users@spamassassin.apache.org by Karsten Bräckelmann <gu...@rudersport.de> on 2014/09/09 03:45:33 UTC

Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Some discussion of the underlying issue.

On Tue, 2014-09-09 at 02:59 +0200, Karsten Bräckelmann wrote:
> At the time of the 3.3.2 release, the .club TLD simply didn't exist. It
> has been accepted by IANA just recently. Of course I was conveniently
> using a trunk checkout for testing and kind of shrugged off that TLD in
> question.
> 
> FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes,
> that is a *recent* TLD addition... *sigh*

Unlike the util_rb_[23]tld options, the set of valid TLDs is actually
hard-coded. It would not be a problem to make that an option, too.
Which, on the plus side, would make it possible to propagate new TLDs
via sa-update. Not only 3.3.x would benefit from that, but also 3.4.0
instances. Plus, it would be generally faster anyway.

There is one downside: a new dependency on Regexp::List [1]. The RE
pre-compile one-time start-up penalty should be negligible.

The question is: Is it worth it?  WILL it be worth it?

This incident is part of the initial round of IANA accepting generic
TLDs. There's hundreds in this wave, and some are abused early. This is
moonshine registration, nothing like new TLDs being accepted in the
coming years.

Or is it? Will new generic TLDs in the future be abused like that, too?
How frequently will that happen? Is it worth being able to react to it
quickly? How long will URIBLs take to list them? How long will it take
for the average MUA to even linki-fy them?

Opinions? Discussion in here, or should I move this to dev?

I guess I'd be happy to introduce to you... util_rb_tld.
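Here is a minimal sketch of what a configurable Valid TLDs list could look like. It assumes the proposed `util_rb_tld` option takes whitespace-separated TLDs, the same way the existing `util_rb_2tld`/`util_rb_3tld` options do. It is written in Python for illustration only. The sample channel text, the parsing and the function names are hypothetical; none of this is SpamAssassin's Perl code.

```python
import re

# Hypothetical excerpt of a rules channel file delivered via sa-update.
SAMPLE_CF = """
util_rb_tld com net org
util_rb_tld club   # newly delegated TLD, added without a code release
"""

def load_valid_tlds(cf_text):
    """Collect TLDs from hypothetical 'util_rb_tld' lines; each line adds to the set."""
    tlds = set()
    for line in cf_text.splitlines():
        words = line.split("#", 1)[0].split()  # strip trailing comment, split on whitespace
        if words and words[0] == "util_rb_tld":
            tlds.update(w.lower() for w in words[1:])
    return tlds

def build_tld_regex(tlds):
    """Compile one alternation matching a host name that ends in a listed TLD."""
    # Longest alternatives first, so an unanchored reuse of this alternation
    # prefers the longer of two overlapping TLDs.
    alternation = "|".join(re.escape(t) for t in sorted(tlds, key=len, reverse=True))
    return re.compile(r"\.(?:%s)\Z" % alternation, re.IGNORECASE)

valid_tld_re = build_tld_regex(load_valid_tlds(SAMPLE_CF))
print(bool(valid_tld_re.search("example.club")))       # True
print(bool(valid_tld_re.search("example.nosuchtld")))  # False
```

Under this assumption, a new TLD such as .club would reach 3.3.x and 3.4.0 installations as one more `util_rb_tld` line via sa-update, with no code change.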


[1] Well, or a really, really f*cking ugly option that takes a
    pre-optimized qr// blob containing the VALID_TLDS_RE.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Valid TLDs

Posted by Henrik K <he...@hege.li>.
On Tue, Sep 09, 2014 at 07:10:34AM +0200, Axb wrote:
> On 09/09/2014 07:04 AM, Henrik K wrote:
> >On Tue, Sep 09, 2014 at 03:45:33AM +0200, Karsten Bräckelmann wrote:
> >>
> >>There is one downside: a new dependency on Regexp::List [1]. The RE
> >>pre-compile one-time start-up penalty should be negligible.
> >>
> >>[1] Well, or a really, really f*cking ugly option that takes a
> >>     pre-optimized qr// blob containing the VALID_TLDS_RE.
> >
> >I think it's even mentioned on the bug. There's NO need for any deps.  Perl
> >5.10+ already trie optimizes a|b|c lists internally.  Just concatenate and
> >be done with it.  If you _really_ want to cater for 5.8 losers, the simple
> >shuffling "Regexp::List" does could be implemented internally in a few lines of
> >code.
> >
> 
> Your Freemail plugin seems to be the only one accessing the
> VALID_TLDS_RE list...

How about just continue at:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6985

:-)

Re: Valid TLDs

Posted by Axb <ax...@gmail.com>.
On 09/09/2014 07:04 AM, Henrik K wrote:
> On Tue, Sep 09, 2014 at 03:45:33AM +0200, Karsten Bräckelmann wrote:
>>
>> There is one downside: a new dependency on Regexp::List [1]. The RE
>> pre-compile one-time start-up penalty should be negligible.
>>
>> [1] Well, or a really, really f*cking ugly option that takes a
>>      pre-optimized qr// blob containing the VALID_TLDS_RE.
>
> I think it's even mentioned on the bug. There's NO need for any deps.  Perl
> 5.10+ already trie optimizes a|b|c lists internally.  Just concatenate and
> be done with it.  If you _really_ want to cater for 5.8 losers, the simple
> shuffling "Regexp::List" does could be implemented internally in a few lines of
> code.
>

Your Freemail plugin seems to be the only one accessing the
VALID_TLDS_RE list...


Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by Henrik K <he...@hege.li>.
On Tue, Sep 09, 2014 at 03:45:33AM +0200, Karsten Bräckelmann wrote:
> 
> There is one downside: a new dependency on Regexp::List [1]. The RE
> pre-compile one-time start-up penalty should be negligible.
> 
> [1] Well, or a really, really f*cking ugly option that takes a
>     pre-optimized qr// blob containing the VALID_TLDS_RE.

I think it's even mentioned on the bug. There's NO need for any deps.  Perl
5.10+ already trie optimizes a|b|c lists internally.  Just concatenate and
be done with it.  If you _really_ want to cater for 5.8 losers, the simple
shuffling "Regexp::List" does could be implemented internally in a few lines of
code.
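Henrik's point can be illustrated with a short sketch. With Perl 5.10+, the regex engine trie-optimizes a plain `a|b|c` concatenation by itself. For older Perls, the prefix factoring that Regexp::List performs can be approximated in a few lines. The Python below shows that factoring technique only. It is not Regexp::List's actual algorithm, and `trie_regex` is a hypothetical helper. Its output string is the kind of "pre-optimized blob" that could also be generated once and shipped.

```python
import re

def trie_regex(words):
    """Factor common prefixes of a non-empty word list into one regex string."""
    # Build a character trie; the "" key marks that a word ends at this node.
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}

    def render(node):
        # One branch per child character, each followed by its own factored suffix.
        branches = [re.escape(ch) + render(child)
                    for ch, child in sorted(node.items()) if ch != ""]
        if not branches:
            return ""
        body = branches[0] if len(branches) == 1 else "(?:" + "|".join(branches) + ")"
        # If a word also ends here, everything below this node is optional.
        return "(?:" + body + ")?" if "" in node else body

    return render(trie)

TLDS = ["com", "co", "club", "cf", "net"]
blob = trie_regex(TLDS)
print(blob)  # (?:c(?:f|lub|o(?:m)?)|net)
tld_re = re.compile(r"\A(?:%s)\Z" % blob)
print([t for t in ["co", "com", "comm", "club", "c"] if tld_re.match(t)])  # ['co', 'com', 'club']
```

The factored form tests the shared leading "c" once instead of once per alternative. This is the saving that the Perl 5.10+ trie provides automatically for the naive concatenation.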


Re: Valid TLDs

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2014-09-09 at 11:10 +0200, Antony Stone wrote:
> On Tuesday 09 September 2014 at 11:04:04 (EU time), Martin Gregorie wrote:
> 
> > I use Fedora, which is roughly equivalent to Debian unstable, i.e. fairly
> > cutting edge as it's the next step back toward stability from 'testing'.
> 
> Hm, that's not quite the right way round - "testing" is the step in between 
> "stable" and "unstable".
> 
Thanks for the correction: I think that makes the comparison look like
this:

Debian: stable, testing, unstable
RedHat: RHEL, Fedora, testing

... with CentOS being equivalent to RHEL.


Martin





Re: Valid TLDs

Posted by Antony Stone <An...@spamassassin.open.source.it>.
On Tuesday 09 September 2014 at 11:04:04 (EU time), Martin Gregorie wrote:

> I use Fedora, which is roughly equivalent to Debian unstable, i.e. fairly
> cutting edge as it's the next step back toward stability from 'testing'.

Hm, that's not quite the right way round - "testing" is the step in between 
"stable" and "unstable".

So the sequence is:

 - unstable - bleeding edge, development

 - testing - not officially stable, but should be good enough if you know what 
you're doing

 - stable - thoroughly tested, stable, starting to get out of date :)

https://www.debian.org/releases/


Regards,


Antony.

-- 
Wanted: telepath.   You know where to apply.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Valid TLDs

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2014-09-09 at 10:41 +0200, Axb wrote:
> On 09/09/2014 10:27 AM, Martin Gregorie wrote:
> > On Mon, 2014-09-08 at 23:05 -0600, Amir Caspi wrote:
> >> An automated method would prevent a number of problems, and since the
> >> allowed TLDs are evolving, I think it makes the most sense. I can't
> >> speak to a specific implementation, but -something- automated...
> >>
> > Same here: I pick up new SA versions via the RedHat package distribution
> > system. This sometimes lags quite a long way behind SA version upgrades
> > and releases, so distributing the TLD list via sa-update sounds like a
> > good idea.
> >
> > Martin
> 
> So with RHEL 7 you're stuck with SA 3.3.2?
> 
I can't comment: I use Fedora, which is roughly equivalent to Debian
unstable, i.e. fairly cutting edge as it's the next step back toward
stability from 'testing'. Since it's generally pretty much up to date
with application packages, I was quite surprised that it got so far
behind with SA 3.4.0. It has never done that before, AFAICR.

> IF SA moves to a .cf based TLD list, you'll still need to update the SA 
> core to support the change.
> 
Understood.


Martin




Re: Valid TLDs

Posted by Axb <ax...@gmail.com>.
On 09/09/2014 10:27 AM, Martin Gregorie wrote:
> On Mon, 2014-09-08 at 23:05 -0600, Amir Caspi wrote:
>> An automated method would prevent a number of problems, and since the
>> allowed TLDs are evolving, I think it makes the most sense. I can't
>> speak to a specific implementation, but -something- automated...
>>
> Same here: I pick up new SA versions via the RedHat package distribution
> system. This sometimes lags quite a long way behind SA version upgrades
> and releases, so distributing the TLD list via sa-update sounds like a
> good idea.
>
> Martin

So with RHEL 7 you're stuck with SA 3.3.2?

I'll never understand this approach of sticking to old versions instead
of cooking your own updated RPMs
(and expecting the rest of the world to accommodate it).


IF SA moves to a .cf based TLD list, you'll still need to update the SA 
core to support the change.


Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by Martin Gregorie <ma...@gregorie.org>.
On Mon, 2014-09-08 at 23:05 -0600, Amir Caspi wrote:
> An automated method would prevent a number of problems, and since the
> allowed TLDs are evolving, I think it makes the most sense. I can't
> speak to a specific implementation, but -something- automated...
> 
Same here: I pick up new SA versions via the RedHat package distribution
system. This sometimes lags quite a long way behind SA version upgrades
and releases, so distributing the TLD list via sa-update sounds like a
good idea.

Martin





Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by Amir Caspi <ce...@3phase.com>.
On Sep 8, 2014, at 7:45 PM, Karsten Bräckelmann <gu...@rudersport.de> wrote:
> 
> Opinions? Discussion in here, or should I move this to dev?

Given that TLDs can and do change on a timescale more frequent than many people update their version of SA (myself included), I would vote for a method that treats this as a configuration or whatever that could receive automatic updates.

My feeling is: primary functionality, i.e. code, should be fixed and tied to versioning; variables, i.e. settings, rules, and other things that need to evolve or get tweaked, should be changeable either automatically or manually. For this kind of back-end variable that must evolve but isn't something an end-user would normally touch, an automatic update like for the main rules block is appropriate.

If there's a way to do that without an ugly hack, I think it's the best option. The one-time speed hit during initial load is negligible, and it would prevent exactly the problem I had to bother you guys about, and spend both my time and yours, trying to debug. I am reasonably tech-savvy and could resolve this reasonably quickly, but it could have easily gone on much longer and been more frustrating for all involved.

An automated method would prevent a number of problems, and since the allowed TLDs are evolving, I think it makes the most sense. I can't speak to a specific implementation, but -something- automated...

Thanks.

--- Amir
thumbed via iPhone

Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2014-09-08 at 22:37 -0400, listsb-spamassassin@bitrate.net wrote:
> On Sep 8, 2014, at 21.45, Karsten Bräckelmann <gu...@rudersport.de> wrote:
> 
> > Some discussion of the underlying issue.
> > 
> > On Tue, 2014-09-09 at 02:59 +0200, Karsten Bräckelmann wrote:
> >> At the time of the 3.3.2 release, the .club TLD simply didn't exist. It
> >> has been accepted by IANA just recently. Of course I was conveniently
> >> using a trunk checkout for testing and kind of shrugged off that TLD in
> >> question.
> >> 
> >> FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes,
> >> that is a *recent* TLD addition... *sigh*
> > 
> > Unlike the util_rb_[23]tld options, the set of valid TLDs is actually
> > hard-coded. It would not be a problem to make that an option, too.
> > Which, on the plus side, would make it possible to propagate new TLDs
> > via sa-update. Not only 3.3.x would benefit from that, but also 3.4.0
> > instances. Plus, it would be generally faster anyway.
> > 
> > There is one downside: a new dependency on Regexp::List [1]. The RE
> > pre-compile one-time start-up penalty should be negligible.
> > 
> > The question is: Is it worth it?  WILL it be worth it?
> 
> pardon my possible technical ignorance here - could this potentially be
> a network test, rather than a list propagated by sa-update?  e.g.
> query dns for existence of delegation?

This cannot be queried for, because the Valid TLDs (code|option) is what
is used to identify URIs in the first place, even in plain-text links
that any normal MUA would linki-fy.
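The following Python sketch is deliberately simplified. In plain text there is no scheme to anchor on, so the list of valid TLDs is what decides whether a token like `cheap-pills.club` counts as a host name at all. Until that decision is made, there is nothing to look up in DNS. The regex, the function name and the sample text are hypothetical. SpamAssassin's real URI parsing is considerably more involved.

```python
import re

# Two or more dot-separated labels; group(1) captures the last label (the TLD candidate).
HOST_CANDIDATE = re.compile(r"\b(?:[a-z0-9-]+\.)+([a-z0-9-]+)\b", re.IGNORECASE)

def schemeless_hosts(text, valid_tlds):
    """Return dotted tokens whose last label is in the given set of valid TLDs."""
    return [m.group(0) for m in HOST_CANDIDATE.finditer(text)
            if m.group(1).lower() in valid_tlds]

text = "Great deals at cheap-pills.club, see also example.com and notes.txt"
print(schemeless_hosts(text, {"com", "net", "org"}))          # ['example.com']
print(schemeless_hosts(text, {"com", "net", "org", "club"}))  # ['cheap-pills.club', 'example.com']
```

If the list predates .club, the first host is never extracted. As a result, no URIBL lookup or other network test is ever attempted for it.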

Apart from that, the list of generic TLDs is not going to change *that*
frequently. The few days sa-update takes to get from IANA acceptance, to
SA incorporating a new TLD, to its first occurrence in mail would not
make a difference.

And as I hinted at before, (new) generic TLD owners have a vital
interest in their TLD not being mostly abused. If it is, it's not worth
the investment.




Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by li...@bitrate.net.
On Sep 8, 2014, at 21.45, Karsten Bräckelmann <gu...@rudersport.de> wrote:

> Some discussion of the underlying issue.
> 
> On Tue, 2014-09-09 at 02:59 +0200, Karsten Bräckelmann wrote:
>> At the time of the 3.3.2 release, the .club TLD simply didn't exist. It
>> has been accepted by IANA just recently. Of course I was conveniently
>> using a trunk checkout for testing and kind of shrugged off that TLD in
>> question.
>> 
>> FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes,
>> that is a *recent* TLD addition... *sigh*
> 
> Unlike the util_rb_[23]tld options, the set of valid TLDs is actually
> hard-coded. It would not be a problem to make that an option, too.
> Which, on the plus side, would make it possible to propagate new TLDs
> via sa-update. Not only 3.3.x would benefit from that, but also 3.4.0
> instances. Plus, it would be generally faster anyway.
> 
> There is one downside: a new dependency on Regexp::List [1]. The RE
> pre-compile one-time start-up penalty should be negligible.
> 
> The question is: Is it worth it?  WILL it be worth it?

pardon my possible technical ignorance here - could this potentially be a network test, rather than a list propagated by sa-update?  e.g. query dns for existence of delegation?

-ben

Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by Dave Pooser <da...@pooserville.com>.
>>embedded in the rules.
>                  ^^^^^
>Code, not rules. Which basically is the issue here...

Just read what I *mean* and not what I type. ;-)
-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
"...Life is not a journey to the grave with the intention of arriving
safely in one pretty and well-preserved piece, but to slide across the
finish line broadside, thoroughly used up, worn out, leaking oil, and
shouting GERONIMO!!!" -- Bill McKenna





Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2014-09-08 at 21:45 -0500, Dave Pooser wrote:
> On 9/8/14 8:45 PM, "Karsten Bräckelmann" <gu...@rudersport.de> wrote:
> 
> >There is one downside: a new dependency on Regexp::List [1]. The RE
> >pre-compile one-time start-up penalty should be negligible.
> >
> >[1] Well, or a really, really f*cking ugly option that takes a
> >    pre-optimized qr// blob containing the VALID_TLDS_RE.
> 
> I may be biased as I've been dealing with a different CPAN dependency
> flustercluck recently (love maintainers who can't be bothered to update
> the version info so CPAN doesn't realize there's an update and I have to
> manually un/re install), but I'm a vote for the hideously ugly
> preoptimized blob over adding a new dependency.
> 
> That said, I'd rather have the new dependency than keep the configuration
> embedded in the rules.
                  ^^^^^
Code, not rules. Which basically is the issue here...

> So, in order of preference:
> 1) Pre-optimized blob
> 2) Regexp::List dependency
> 3) Current method

Got ya. Both (1) and (2) would require code changes, so it's 3.4.1+ only
anyway.

Thanks.




Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by Dave Pooser <da...@pooserville.com>.
On 9/8/14 8:45 PM, "Karsten Bräckelmann" <gu...@rudersport.de> wrote:

>There is one downside: a new dependency on Regexp::List [1]. The RE
>pre-compile one-time start-up penalty should be negligible.
>
>[1] Well, or a really, really f*cking ugly option that takes a
>    pre-optimized qr// blob containing the VALID_TLDS_RE.

I may be biased as I've been dealing with a different CPAN dependency
flustercluck recently (love maintainers who can't be bothered to update
the version info so CPAN doesn't realize there's an update and I have to
manually un/re install), but I'm a vote for the hideously ugly
preoptimized blob over adding a new dependency.

That said, I'd rather have the new dependency than keep the configuration
embedded in the rules.

So, in order of preference:
1) Pre-optimized blob
2) Regexp::List dependency
3) Current method





Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by Benny Pedersen <me...@junc.eu>.
On 9. sep. 2014 04.29.55 Karsten Bräckelmann <gu...@rudersport.de> wrote:

> Apart from that nitpick, I understand you would be in favor of a Valid
> TLD option, rather than hard-coded. Noted.

Perl programmers make their signatures in Perl code.

Well, I am still thinking about URL reputation, but since nearly every kind of
silly TLD will come in the future, it will be a hard task to list and update
them. At the same time, removing TLDs would make a nice whitelist; nah, not funny.

The good thing is that the old ones do not die as fast as new throwaway TLDs
could; the risk is that it becomes a spam/ham race without any winners :(

Also, separate mailto vs. http tracking tests are something at least AWL
should handle, and possibly TxRep as well, which is more focused on the From
header than on URL domains?

As the old AWL code stands, it lists URL domains under the sender IP, oops.

If I just wrote Perl code, I could have a nice signature as well.

Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2014-09-08 at 22:15 -0400, Daniel Staal wrote:
> --As of September 9, 2014 3:45:33 AM +0200, Karsten Bräckelmann is alleged 
> to have said:
> 
> > This incident is part of the initial round of IANA accepting generic
> > TLDs. There's hundreds in this wave, and some are abused early. This is
> > moonshine registration, nothing like new TLDs being accepted in the
> > coming years.
> >
> > Or is it? Will new generic TLDs in the future be abused like that, too?
> > How frequently will that happen? Is it worth being able to react to it
> > quickly? How long will URIBLs take to list them? How long will it take
> > for the average MUA to even linki-fy them?
> >
> > Opinions? Discussion in here, or should I move this to dev?
> 
> --As for the rest, it is mine.
> 
> New TLDs will always be abused...

And old ones. "TK, re-naming the web." Yes, sometimes it is valid to add
a point or two for the mere occurrence of a TLD in a URI.

For how long? Whoever applied for a new generic $tld put up about 180
grand. How much is it worth to them to prevent spammers from tasting
domains, and to actually turn their investment into serious, paying
customers?


> Anyway, personal opinion: SpamAssassin is currently structured to have code
> and rules as separate things.  Putting this in the code blurs that - it's a 
> rule.  Unless there is a major performance penalty, I would move it to be 
> with the rest of the rules.  It should make maintenance easier and clearer.

It is not, and would not be, "a rule" as you stated, but configuration.

Apart from that nitpick, I understand you would be in favor of a Valid
TLD option, rather than hard-coded. Noted.




Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

Posted by Daniel Staal <DS...@usa.net>.
--As of September 9, 2014 3:45:33 AM +0200, Karsten Bräckelmann is alleged 
to have said:

> This incident is part of the initial round of IANA accepting generic
> TLDs. There's hundreds in this wave, and some are abused early. This is
> moonshine registration, nothing like new TLDs being accepted in the
> coming years.
>
> Or is it? Will new generic TLDs in the future be abused like that, too?
> How frequently will that happen? Is it worth being able to react to it
> quickly? How long will URIBLs take to list them? How long will it take
> for the average MUA to even linki-fy them?
>
> Opinions? Discussion in here, or should I move this to dev?

--As for the rest, it is mine.

New TLDs will always be abused...

Anyway, personal opinion: SpamAssassin is currently structured to have code
and rules as separate things.  Putting this in the code blurs that - it's a 
rule.  Unless there is a major performance penalty, I would move it to be 
with the rest of the rules.  It should make maintenance easier and clearer.

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------

Re: Valid TLDs

Posted by Reindl Harald <h....@thelounge.net>.
Am 09.09.2014 um 03:45 schrieb Karsten Bräckelmann:
> This incident is part of the initial round of IANA accepting generic
> TLDs. There's hundreds in this wave, and some are abused early. This is
> moonshine registration, nothing like new TLDs being accepted in the
> coming years.
> 
> Or is it? Will new generic TLDs in the future be abused like that, too?
> How frequently will that happen? Is it worth being able to react to it
> quickly? How long will URIBLs take to list them? How long will it take
> for the average MUA to even linki-fy them?
> 
> Opinions? Discussion in here, or should I move this to dev?

half-OT

During the migration away from a commercial appliance, I noticed
a lot of messages wrongly blocked as spam because of TLDs
and domain names ending in .cf

That resulted in blocking any message, even ones quoting my own
content, if a dumb MUA wrapped link tags around filenames
like "local.cf", "main.cf", "master.cf"...

There is some danger of false positives in that context.
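The false-positive mechanism can be shown with a small hypothetical Python sketch. Once `cf` (the country-code TLD of the Central African Republic) is in the valid list, config file names quoted in a message turn into host-like tokens that TLD-based scoring would see. The regex, the function and the ignore list are illustrative assumptions, not SpamAssassin behavior or options.

```python
import re

# Two or more dot-separated labels; group(1) captures the last label.
HOST_CANDIDATE = re.compile(r"\b(?:[a-z0-9-]+\.)+([a-z0-9-]+)\b", re.IGNORECASE)

# Hypothetical local exception list of file names that only look like hosts.
CONFIG_FILE_NAMES = frozenset({"local.cf", "main.cf", "master.cf"})

def host_like_tokens(text, valid_tlds, ignore=frozenset()):
    """Lower-cased dotted tokens ending in a valid TLD, minus explicitly ignored names."""
    hits = []
    for m in HOST_CANDIDATE.finditer(text):
        token = m.group(0).lower()
        if m.group(1).lower() in valid_tlds and token not in ignore:
            hits.append(token)
    return hits

quoted = "Put the rule in local.cf, then check main.cf and master.cf in Postfix."
print(host_like_tokens(quoted, {"com", "cf"}))
# ['local.cf', 'main.cf', 'master.cf']
print(host_like_tokens(quoted, {"com", "cf"}, ignore=CONFIG_FILE_NAMES))
# []
```

A static exception list like this only papers over the problem. It shows why a TLD that collides with a common file extension is prone to false positives.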