Posted to users@spamassassin.apache.org by buy <bu...@netcasters.com> on 2019/04/17 12:44:32 UTC
Whitespace in urls
Hi,
I've been encountering spammers putting whitespace in the
domain area of a url. My rule is not catching them. An
equivalent pattern match in perl does catch them.
The spam email contains urls that look like this:
-------------------------------------------------
<a href="https://www. miwilurt.
com/mKC7AeJAmPT5duDOp6rh_aOmQfdpzd_Ewgbm87h8By6313NSjVfHM10dT8MhiBk0XUB4g9vTUZrRs2U1fJUYCA~~/">click
here</a>
Spamassassin rule looks like this (NO MATCH):
--------------------------------------------
uri NC_SPAM292 /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
score NC_SPAM292 50
Perl check looks like this (MATCH):
-----------------------------------
$str = 'https://www. miwilurt. com/';
if ($str =~ /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//) {
    print "Match\n";
}
Thanks for your time,
Ted
Re: Whitespace in urls
Posted by Henrik K <he...@hege.li>.
On Wed, Apr 17, 2019 at 02:00:26PM +0100, RW wrote:
> On Wed, 17 Apr 2019 08:44:32 -0400
> buy wrote:
>
> > Hi,
> >
> > I've been encountering spammers putting whitespace in the
> > domain area of a url. My rule is not catching them.
> > ...
> > Spamassassin rule looks like this (NO MATCH):
> > --------------------------------------------
> > uri NC_SPAM292 /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
> > score NC_SPAM292 50
>
> presumably it either hasn't been parsed as a uri or the spaces have
> been removed. Try a body or rawbody rule.
To check if it's seen at all:
spamassassin --cf 'uri ALLURIS /.+/' --cf 'tflags ALLURIS multiple' -t -D -L < testmsg 2>&1 | egrep 'ALLURIS.*hit:'
Re: Whitespace in urls
Posted by buy <bu...@netcasters.com>.
On 4/17/2019 9:24 AM, RW wrote:
> On Wed, 17 Apr 2019 14:00:26 +0100
> RW wrote:
>
>> On Wed, 17 Apr 2019 08:44:32 -0400
>> buy wrote:
>>
>>> Hi,
>>>
>>> I've been encountering spammers putting whitespace in the
>>> domain area of a url. My rule is not catching them.
>>> ...
>>> Spamassassin rule looks like this (NO MATCH):
>>> --------------------------------------------
>>> uri NC_SPAM292 /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
>>> score NC_SPAM292 50
>>
>> presumably it either hasn't been parsed as a uri or the spaces have
>> been removed.
>
> I see it uses \s* so it's not going to be the latter
>
>
>> Try a body or rawbody rule.
>
The url exists in the plain text version of the mail message,
but not in the html version. Thought I checked that:( Thanks
for all of the suggestions.
Re: Whitespace in urls
Posted by RW <rw...@googlemail.com>.
On Wed, 17 Apr 2019 14:00:26 +0100
RW wrote:
> On Wed, 17 Apr 2019 08:44:32 -0400
> buy wrote:
>
> > Hi,
> >
> > I've been encountering spammers putting whitespace in the
> > domain area of a url. My rule is not catching them.
> > ...
> > Spamassassin rule looks like this (NO MATCH):
> > --------------------------------------------
> > uri NC_SPAM292 /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
> > score NC_SPAM292 50
>
> presumably it either hasn't been parsed as a uri or the spaces have
> been removed.
I see it uses \s* so it's not going to be the latter
> Try a body or rawbody rule.
Re: Whitespace in urls
Posted by John Hardin <jh...@impsec.org>.
On Wed, 17 Apr 2019, RW wrote:
> On Wed, 17 Apr 2019 08:44:32 -0400
> buy wrote:
>
>> Hi,
>>
>> I've been encountering spammers putting whitespace in the
>> domain area of a url. My rule is not catching them.
>> ...
>> Spamassassin rule looks like this (NO MATCH):
>> --------------------------------------------
>> uri NC_SPAM292 /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
>> score NC_SPAM292 50
>
> presumably it either hasn't been parsed as a uri or the spaces have
> been removed. Try a body or rawbody rule.
This should help with troubleshooting in debug mode with rule-hit
logging enabled:
uri __ALL_URI /.+/
tflags __ALL_URI multiple
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: Whitespace in urls
Posted by RW <rw...@googlemail.com>.
On Wed, 17 Apr 2019 08:44:32 -0400
buy wrote:
> Hi,
>
> I've been encountering spammers putting whitespace in the
> domain area of a url. My rule is not catching them.
> ...
> Spamassassin rule looks like this (NO MATCH):
> --------------------------------------------
> uri NC_SPAM292 /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
> score NC_SPAM292 50
presumably it either hasn't been parsed as a uri or the spaces have
been removed. Try a body or rawbody rule.
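A minimal sketch of that suggestion, reusing the regex from the original post. The rule names NC_SPAM292_BODY and NC_SPAM292_RAW and the describe text are hypothetical, and nothing here has been tested against the actual spam. As documented, body rules run against the text with HTML tags and line breaks removed, while rawbody keeps both. Depending on the SpamAssassin version, rawbody may also be matched in smaller pieces, so a URL split across a line break might only hit in the body form. If the URL only appears inside an href attribute of an HTML part, the rawbody form is probably the one to try.

```
# Untested sketch; rule names and describe text are hypothetical.
# Enable one variant only, otherwise the score is counted twice.

# Variant 1: body text (HTML tags and line breaks removed)
body      NC_SPAM292_BODY  /https?:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
describe  NC_SPAM292_BODY  miwilurt.com URL with whitespace in the domain
score     NC_SPAM292_BODY  50

# Variant 2: raw body (HTML tags and line breaks still present)
# rawbody   NC_SPAM292_RAW   /https?:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
# describe  NC_SPAM292_RAW   miwilurt.com URL with whitespace in the domain
# score     NC_SPAM292_RAW   50
```

The score of 50 is simply carried over from the original uri rule.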
Re: Whitespace in urls
Posted by Martin Gregorie <ma...@gregorie.org>.
On Wed, 2019-04-17 at 08:44 -0400, buy wrote:
> The spam email contains urls that look like this:
> -------------------------------------------------
> <a href="https://www. miwilurt.
> com/mKC7AeJAmPT5duDOp6rh_aOmQfdpzd_Ewgbm87h8By6313NSjVfHM10dT8MhiBk0XUB4g9vTUZrRs2U1fJUYCA~~/">click
> here</a>
>
> Spamassassin rule looks like this (NO MATCH):
> --------------------------------------------
> uri NC_SPAM292 /https?\:\/\/(?:\w*\.)*\s*miwilurt\.\s*com\//
> score NC_SPAM292 50
>
Untested, but...
The way my MUA (Evolution) highlights links in your message shows why
your rule fails: the only 'URI' it recognises there is https://www. -
and it's malformed because it ends with a '.'.
So, try NC_SPAM292 again, but as a body rule rather than a uri rule,
using the same regex.
That said, it looks too specific to do anything except play
whack-a-mole. It may be more useful if you can generalise it to match
any string that starts with http: or https: and is followed by text
containing at least two instances of a dot followed by a space.
Martin
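One possible reading of that generalisation, as an untested sketch. The name __NC_URI_DOT_WS is hypothetical; the double underscore makes it an unscored subrule (as with __ALL_URI earlier in the thread), so hits can be watched in debug output before any score is attached. The pattern is stricter than the description above: it requires at least two consecutive "label, dot, whitespace" groups right after the scheme, then a final label and a slash. That extra strictness is an assumption made here to cut down on false positives, not something proposed in the thread.

```
# Untested sketch; __NC_URI_DOT_WS is a hypothetical name.
# Intended to match text such as "https://www. miwilurt. com/".
body  __NC_URI_DOT_WS  /https?:\/\/(?:[\w-]+\.\s+){2,}[\w-]+\//
```

Because body rules see the text with line breaks removed, the \s+ should cover both the space after "www." and the line break after "miwilurt." in the sample message.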