Posted to dev@spamassassin.apache.org by Henrik K <he...@hege.li> on 2022/05/23 09:45:33 UTC

DecodeShortURLs rules

Can we provide working default rules in sa-update / xx_decodeshorturls.cf ?

The plugin can and should be left disabled by default, with a mention that
enabling it will result in HTTP requests from the mail server.

But as it's now under our maintenance, we should provide some rules for it,
or it's mostly useless; the url_shorteners list will keep on living.

Does anyone have a working url_shorteners list?  I'm currently filtering all
non-resolving and non-http-200-returning services from the legacy lists.
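The filtering described above could be sketched roughly as follows.  This is
only an illustration, not the script actually used: the function names
(resolves, returns_200, filter_live) are hypothetical, and it assumes that
"working" means the domain resolves in DNS and a plain HTTP GET of its front
page ends in a 200 response.

```python
import http.client
import socket
import urllib.request
from typing import Callable, Iterable, List


def resolves(domain: str) -> bool:
    """Return True if the domain resolves to at least one address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def returns_200(domain: str, timeout: float = 10.0) -> bool:
    """Return True if GET http://<domain>/ ends in HTTP 200 (redirects followed)."""
    try:
        with urllib.request.urlopen(f"http://{domain}/", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError are OSError subclasses; non-200 errors land here.
        return False


def filter_live(
    domains: Iterable[str],
    resolver: Callable[[str], bool] = resolves,
    checker: Callable[[str], bool] = returns_200,
) -> List[str]:
    """Keep only domains that pass both checks (hypothetical helper)."""
    live = []
    for raw in domains:
        domain = raw.strip().lower()
        # Skip blank lines and comment lines in the input list.
        if not domain or domain.startswith("#"):
            continue
        if resolver(domain) and checker(domain):
            live.append(domain)
    return live
```

The resolver and checker are passed in as parameters so the filtering logic
can be tried without network access.  A 200 response alone does not rule out
parked domains, so manual review would still be needed.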


Re: DecodeShortURLs rules

Posted by Henrik K <he...@hege.li>.
On Wed, May 25, 2022 at 08:16:23AM +0300, Henrik K wrote:
> On Tue, May 24, 2022 at 03:12:33PM -0400, Kevin A. McGrail wrote:
> > While you may say it's a stretch, it is the best list I know of that exists and
> > we've kept it up adding additions and removing things when needed.  
> 
> I think it would be generally helpful to mark and date any new additions.
> 
> Anyway, no one has commented about committing to stock rules?  At least no
> objections are heard, so I'll proceed to whip up something.

Committed rules/25_url_shortener.cf.  Please note the documentation on how
to update the list (url_shortener + regex made from it).  All rules are now
modified to use __URL_SHORTENER.
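
As an illustration of "regex made from it", a regex could be generated from a
shortener domain list along these lines.  This is a minimal sketch under
assumptions: the helper name shortener_regex and the exact pattern shape are
hypothetical, and the procedure documented in 25_url_shortener.cf is the
authoritative one.

```python
import re
from typing import Iterable


def shortener_regex(domains: Iterable[str]) -> str:
    """Build one alternation regex from a list of shortener domains.

    Hypothetical helper; the pattern shape is an assumption for illustration.
    """
    # Normalize, drop blanks, de-duplicate, and sort for stable output.
    unique = sorted({d.strip().lower() for d in domains if d.strip()})
    # Escape each domain so that "." matches only a literal dot.
    alternation = "|".join(re.escape(d) for d in unique)
    return rf"^https?://(?:{alternation})/"


pattern = shortener_regex(["bit.ly", "Bit.ly", "t.co"])
print(pattern)
```

Sorting and de-duplicating keeps the generated pattern stable between runs,
which makes changes to the list easier to review in version control.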

There are still SHORTENED_URL_SRC and SHORTENED_URL_HREF, which use hardcoded
lists in their regexes.  The *SRC one seems to have a good S/O; I don't know
if it should be updated to use the generated shortener regex as well.  Maybe
the shortener regex should be turned into replacetags so it's commonly usable.

Not sure why the rule below exists and why it's nopublish?  Shouldn't it be
removed or published properly in 25_url_shortener.cf?  Not sure if it has any
value as a general rule anyway.

sandbox/khopesh/20_khop_experimental.cf:meta     URL_SHORTENER  __URL_SHORTENER
sandbox/khopesh/20_khop_experimental.cf:describe URL_SHORTENER  Has a shortened URL (can hide a blacklisted link)
sandbox/khopesh/20_khop_experimental.cf:tflags   URL_SHORTENER  nopublish


Re: DecodeShortURLs rules

Posted by Henrik K <he...@hege.li>.
On Wed, May 25, 2022 at 09:47:10AM -0400, Kevin A. McGrail wrote:
> Thanks Henrik,
> 
> No one has ever asked a single question about a single URL shortener domain
> nor have I seen a FP/FN from it so I'm not sure the date info is useful.
> However, please do feel free to grab that file and put it in stock rules.  It
> won't cause any issues if it's in two places that I can think of.  Do you
> agree?

As just posted, I grabbed the original list, yours, and the PDS one from
sandbox, found some new ones via Google, and did a massive manual cleanup on
the now official SA list.

Duplicates or unnecessary entries do not pose a real problem aside from
annoying me.  As such I don't really care what KAM.cf contains, as I don't
use it.


Re: DecodeShortURLs rules

Posted by "Kevin A. McGrail" <km...@apache.org>.
Thanks Henrik,

No one has ever asked a single question about a single URL
shortener domain nor have I seen a FP/FN from it so I'm not sure the date
info is useful.
However, please do feel free to grab that file and put it in stock rules.
It won't cause any issues if it's in two places that I can think of.  Do
you agree?

Regards,
KAM
--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171



Re: DecodeShortURLs rules

Posted by Henrik K <he...@hege.li>.
On Tue, May 24, 2022 at 03:12:33PM -0400, Kevin A. McGrail wrote:
> While you may say it's a stretch, it is the best list I know of that exists and
> we've kept it up adding additions and removing things when needed.  

I think it would be generally helpful to mark and date any new additions.

Anyway, no one has commented about committing to stock rules?  At least no
objections are heard, so I'll proceed to whip up something.


Re: DecodeShortURLs rules

Posted by "Kevin A. McGrail" <km...@apache.org>.
While you may say it's a stretch, it is the best list I know of that exists
and we've kept it up adding additions and removing things when needed.

Unfortunately, I have no genesis information on lat.ms being listed.

Regards,
KAM



Re: DecodeShortURLs rules

Posted by Henrik K <he...@hege.li>.
On Mon, May 23, 2022 at 01:27:55PM -0400, Kevin A. McGrail wrote:
> 
> https://mcgrail.com/downloads/KAM_urlshorteners.cf is the list the KAM
> project maintains.

"Maintained" is a bit of a stretch..  that's mostly the same old legacy list
that everyone uses.  20% of them can be immediately discarded as not even
resolving from DNS.  I already did a lot of work to further filter away
parked domains, closed down services etc.

Also I don't understand why it originally contained stuff like "lat.ms". 
Did latimes.com provide a shortening service for any custom URL?  If not,
why should we care that a message contains "short URL" if it's not abusable
and used in spam?


Re: DecodeShortURLs rules

Posted by "Kevin A. McGrail" <km...@apache.org>.
Hi Henrik,

Take a look at the KAM URL shorteners file and rules in KAM.cf:

https://mcgrail.com/downloads/KAM.cf
search Mail::SpamAssassin::Plugin::DecodeShortURLs for example

https://mcgrail.com/downloads/KAM_urlshorteners.cf is the list the KAM
project maintains.

Regards,
KAM

