Posted to users@spamassassin.apache.org by buy <bu...@netcasters.com> on 2019/04/17 12:44:32 UTC

Whitespace in urls

Hi,

I've been encountering spammers putting whitespace in the
domain area of a url.  My rule is not catching them.  An
equivalent pattern match in perl does catch them.

The spam email contains urls that look like this:
-------------------------------------------------
<a href="https://www. miwilurt. 
com/mKC7AeJAmPT5duDOp6rh_aOmQfdpzd_Ewgbm87h8By6313NSjVfHM10dT8MhiBk0XUB4g9vTUZrRs2U1fJUYCA~~/">click 
here</a>

Spamassassin rule looks like this (NO MATCH):
--------------------------------------------
uri       NC_SPAM292  /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
score     NC_SPAM292  50


Perl check looks like this (MATCH):
-----------------------------------
$str = 'https://www. miwilurt. com/';
if ($str =~ /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//) {
     print "Match\n";
}
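
A slightly fuller version of the check above, as a sketch only. It runs the same pattern against the href value both on one line and with the line break after "miwilurt. " as it appears in the mail. The helper name matches_rule and the shortened path are made up for illustration. Because \s also matches a newline, both forms match, which suggests the regex itself is not what keeps the uri rule from firing.

```perl
use strict;
use warnings;

# Same pattern as the NC_SPAM292 rule.
my $rule = qr/https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//;

# Hypothetical helper: returns 1 if the rule pattern matches, else 0.
sub matches_rule {
    my ($text) = @_;
    return $text =~ $rule ? 1 : 0;
}

my $one_line = 'https://www. miwilurt. com/';
# Shortened path; the line break sits where it does in the mail.
my $wrapped  = "https://www. miwilurt. \ncom/mKC7/";

print "one line: ", (matches_rule($one_line) ? "Match" : "No match"), "\n";
print "wrapped:  ", (matches_rule($wrapped)  ? "Match" : "No match"), "\n";
```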


Thanks for your time,
Ted

Re: Whitespace in urls

Posted by Henrik K <he...@hege.li>.
On Wed, Apr 17, 2019 at 02:00:26PM +0100, RW wrote:
> On Wed, 17 Apr 2019 08:44:32 -0400
> buy wrote:
> 
> > Hi,
> > 
> > I've been encountering spammers putting whitespace in the
> > domain area of a url.  My rule is not catching them.
> > ...
> > Spamassassin rule looks like this (NO MATCH):
> > --------------------------------------------
> > uri       NC_SPAM292  /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
> > score     NC_SPAM292  50
> 
> presumably it either hasn't been parsed as a uri or the spaces have
> been removed. Try a body or rawbody rule.

To check if it's seen at all:
spamassassin --cf 'uri ALLURIS /.+/' --cf 'tflags ALLURIS multiple' -t -D -L < testmsg 2>&1 | egrep 'ALLURIS.*hit:'


Re: Whitespace in urls

Posted by buy <bu...@netcasters.com>.
On 4/17/2019 9:24 AM, RW wrote:
> On Wed, 17 Apr 2019 14:00:26 +0100
> RW wrote:
> 
>> On Wed, 17 Apr 2019 08:44:32 -0400
>> buy wrote:
>>
>>> Hi,
>>>
>>> I've been encountering spammers putting whitespace in the
>>> domain area of a url.  My rule is not catching them.
>>> ...
>>> Spamassassin rule looks like this (NO MATCH):
>>> --------------------------------------------
>>> uri       NC_SPAM292  /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
>>> score     NC_SPAM292  50
>>
>> presumably it either hasn't been parsed as a uri or the spaces have
>> been removed.
> 
> I see it uses \s* so it's not going to be the latter
> 
> 
>> Try a body or rawbody rule.
> 

The URL exists in the plain text version of the mail message,
but not in the HTML version.  I thought I had checked that :(  Thanks
for all of the suggestions.

Re: Whitespace in urls

Posted by RW <rw...@googlemail.com>.
On Wed, 17 Apr 2019 14:00:26 +0100
RW wrote:

> On Wed, 17 Apr 2019 08:44:32 -0400
> buy wrote:
> 
> > Hi,
> > 
> > I've been encountering spammers putting whitespace in the
> > domain area of a url.  My rule is not catching them.
> > ...
> > Spamassassin rule looks like this (NO MATCH):
> > --------------------------------------------
> > uri       NC_SPAM292  /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
> > score     NC_SPAM292  50  
> 
> presumably it either hasn't been parsed as a uri or the spaces have
> been removed. 

I see it uses \s*, so it's not going to be the latter.


> Try a body or rawbody rule.

Re: Whitespace in urls

Posted by John Hardin <jh...@impsec.org>.
On Wed, 17 Apr 2019, RW wrote:

> On Wed, 17 Apr 2019 08:44:32 -0400
> buy wrote:
>
>> Hi,
>>
>> I've been encountering spammers putting whitespace in the
>> domain area of a url.  My rule is not catching them.
>> ...
>> Spamassassin rule looks like this (NO MATCH):
>> --------------------------------------------
>> uri       NC_SPAM292  /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
>> score     NC_SPAM292  50
>
> presumably it either hasn't been parsed as a uri or the spaces have
> been removed. Try a body or rawbody rule.

This should help with troubleshooting in debug mode with rule-hit
logging enabled:

   uri     __ALL_URI   /.+/
   tflags  __ALL_URI   multiple


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

Re: Whitespace in urls

Posted by RW <rw...@googlemail.com>.
On Wed, 17 Apr 2019 08:44:32 -0400
buy wrote:

> Hi,
> 
> I've been encountering spammers putting whitespace in the
> domain area of a url.  My rule is not catching them.
> ...
> Spamassassin rule looks like this (NO MATCH):
> --------------------------------------------
> uri       NC_SPAM292  /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
> score     NC_SPAM292  50

Presumably it either hasn't been parsed as a URI or the spaces have
been removed. Try a body or rawbody rule.

Re: Whitespace in urls

Posted by Martin Gregorie <ma...@gregorie.org>.
On Wed, 2019-04-17 at 08:44 -0400, buy wrote:
> The spam email contains urls that look like this:
> -------------------------------------------------
> <a href="https://www. miwilurt. 
> com/mKC7AeJAmPT5duDOp6rh_aOmQfdpzd_Ewgbm87h8By6313NSjVfHM10dT8MhiBk0X
> UB4g9vTUZrRs2U1fJUYCA~~/">click 
> here</a>
> 
> Spamassassin rule looks like this (NO MATCH):
> --------------------------------------------
> uri       NC_SPAM292  /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
> score     NC_SPAM292  50
> 
Untested, but...

The way my MUA (Evolution) highlights links in your message shows the
reason your rule fails: the only 'URI' it sees there is https://www. and
that is malformed because it ends with a '.'.

So, try NC_SPAM292 again, but as a body rule rather than a uri rule,
using the same regex.

However, the rule looks too specific to do anything except play
whack-a-mole. It may be more useful if you can generalise it to accept
any string that starts with http: or https: followed by a string that
contains at least two instances of a dot followed by a space.
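
One way the generalisation described above could look, as an untested-in-SpamAssassin sketch. It reads "at least two instances of a dot followed by a space" as two or more consecutive host labels, each followed by a dot and whitespace, which is a narrower interpretation than the wording strictly requires. The names $spaced_host and has_spaced_host are made up for illustration, and whether the text SpamAssassin hands to a body or rawbody rule still contains the whitespace in this form is an assumption.

```perl
use strict;
use warnings;

# Sketch of a generalised pattern: scheme, then two or more host labels
# that are each followed by a dot and whitespace, then a final label.
my $spaced_host = qr{https?://(?:[\w-]+\.\s+){2,}[\w-]+}i;

# Hypothetical helper: returns 1 if the text contains such a URL, else 0.
sub has_spaced_host {
    my ($text) = @_;
    return $text =~ $spaced_host ? 1 : 0;
}

for my $sample (
    "https://www. miwilurt. \ncom/",   # obfuscated form from the spam
    'https://www.example.com/',        # ordinary URL, no whitespace
    'https://example. com/',           # only one dot followed by a space
) {
    # Show the newline visibly in the report line.
    (my $shown = $sample) =~ s/\n/\\n/g;
    printf "%-32s %s\n", $shown, has_spaced_host($sample) ? 'hit' : 'no hit';
}
```

Only the first sample is reported as a hit. A pattern this loose can still produce false positives on ordinary prose that happens to follow a URL, so it would need testing against real mail before being given a high score.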


Martin