Posted to users@spamassassin.apache.org by Joseph Acquisto <jo...@j4computers.com> on 2012/05/16 23:05:23 UTC

regex needed for http link

I have been unsuccessful in creating a rule to detect and weight http links in the message body, such as the one below:

http://boguslink.ru 

The ones I have created get "hits" when tested on the command line, but don't seem to work in local.cf.  Maybe that's the wrong place?



Re: ***Possible SPAM*** Re: regex needed for http link

Posted by Joseph Acquisto <jo...@j4computers.com>.
>>> On 5/17/2012 at 6:16 PM, John Hardin <jh...@impsec.org> wrote:
> On Thu, 17 May 2012, Joseph Acquisto wrote:
> 
>> I attempted to adapt something from a similar regex provided by a vendor
>> of a commercial product.  It was to detect country codes we do not want
>> to accept mail from.   No doubt my ignorance of SA and regex in general
>> will be on display for the amusement of many.
>>
>> rawbody            URI_RU              m,^https?://[^.\.][ru]/,i
> 
> heh. Yeah, that won't work. "[]" means a character class, one character 
> that matches anything within the square brackets.
> 
> What the above RE says is:
> 
> blah blah blah // (one character that is NOT a period; the leading ^ negates the class) (r OR u) /
> 
> ...so it would match, for example:
> 
>  	https://xr/ 
>  	https://xu/ 
> 
> but never:
> 
>  	https://{anything}.ru/ 
> 
> And you actually had success testing that from the command line?
> 
> -- 
>   John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/ 
>   jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org 
>   key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

I believe so.  It was weeks ago that I did that (and then commented it out, intending to get back to it).

I won't be able to focus on this for a while.  I forgot we are having a social gathering tonight.
Sigh.  Sometimes that sort of thing has to happen.

joe a.



Re: ***Possible SPAM*** Re: regex needed for http link

Posted by John Hardin <jh...@impsec.org>.
On Thu, 17 May 2012, Joseph Acquisto wrote:

> I attempted to adapt something from a similar regex provided by a vendor
> of a commercial product.  It was to detect country codes we do not want
> to accept mail from.   No doubt my ignorance of SA and regex in general
> will be on display for the amusement of many.
>
> rawbody            URI_RU              m,^https?://[^.\.][ru]/,i

heh. Yeah, that won't work. "[]" means a character class, one character 
that matches anything within the square brackets.

What the above RE says is:

blah blah blah // (one character that is NOT a period; the leading ^ negates the class) (r OR u) /

...so it would match, for example:

 	https://xr/
 	https://xu/

but never:

 	https://{anything}.ru/

And you actually had success testing that from the command line?
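
For anyone who wants to check what that pattern really accepts, here is a small sketch in Python. It assumes Python's re module treats this particular pattern the same way Perl does (it uses no Perl-only syntax), and the sample strings are made up for illustration. Inside the brackets the leading "^" negates the class and both "." and "\." are a literal period, so the class is "any one character except a period".

```python
import re

# Joseph's pattern, transcribed from the Perl m,...,i form.
# re.IGNORECASE stands in for the trailing "i" modifier.
BROKEN = re.compile(r"^https?://[^.\.][ru]/", re.IGNORECASE)


def hits(pattern, text):
    """Return True if the pattern matches somewhere in text."""
    return pattern.search(text) is not None


# Made-up sample strings, not taken from real mail.
samples = [
    "https://xr/",           # one non-period character, then "r", then "/"
    "http://Au/",            # case-insensitive, "u" is also accepted
    "https://.r/",           # a period is excluded by the negated class
    "http://boguslink.ru/",  # more than two characters before the slash
    "http://boguslink.ru",   # same, and no trailing slash either
]

for s in samples:
    print(f"{s!r:28} {hits(BROKEN, s)}")
```

Only the first two samples should report True. The rule can never fire on a real hostname ending in .ru, because it allows exactly two characters between "//" and the next "/".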

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

***Possible SPAM*** Re: regex needed for http link

Posted by Joseph Acquisto <jo...@j4computers.com>.
>>> On 5/17/2012 at 9:55 AM, John Hardin <jh...@impsec.org> wrote:
> On Wed, 16 May 2012, Joseph Acquisto wrote:
> 
>>>>> On 5/16/2012 at 8:53 PM, "Joseph Acquisto" <jo...@j4computers.com> wrote:
>>>>>> On 5/16/2012 at 5:18 PM, Brent Gardner <bg...@gmail.com> wrote:
>>>>
>>>> How about:
>>>>
>>>> /\.ru\b/i
>>>
>>> I will give that a try.
>>
>> That worked.  But I imagine it may trigger on innocuous instances of .ru as
>> well, so it should also include check for http:// and wildcard for domain.
> 
> What were you doing that _didn't_ detect that? The "proper" way is this:
> 
>     uri   URI_DOT_RU    /\.ru\b/i
> 
> ...and let the body parser figure out the "link" context.
> 
> Is there some reason that won't work?
> 
> Could you post the rule you were originally using?
> 
> -- 
>   John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/ 
>

I attempted to adapt something from a similar regex provided by a vendor
of a commercial product.  It was to detect country codes we do not want
to accept mail from.   No doubt my ignorance of SA and regex in general
will be on display for the amusement of many.

rawbody            URI_RU              m,^https?://[^.\.][ru]/,i


joe a.


Re: regex needed for http link

Posted by John Hardin <jh...@impsec.org>.
On Wed, 16 May 2012, Joseph Acquisto wrote:

>>>> On 5/16/2012 at 8:53 PM, "Joseph Acquisto" <jo...@j4computers.com> wrote:
>>>>> On 5/16/2012 at 5:18 PM, Brent Gardner <bg...@gmail.com> wrote:
>>>
>>> How about:
>>>
>>> /\.ru\b/i
>>
>> I will give that a try.
>
> That worked.  But I imagine it may trigger on innocuous instances of .ru as well, so it should also include check for http:// and wildcard for domain.

What were you doing that _didn't_ detect that? The "proper" way is this:

    uri   URI_DOT_RU    /\.ru\b/i

...and let the body parser figure out the "link" context.

Is there some reason that won't work?

Could you post the rule you were originally using?
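
To illustrate the difference between a loose match on a URI and one tied to the hostname, here is a rough Python sketch. The URIs are invented, and HOST_RU is a hypothetical tighter pattern of my own, not a stock SpamAssassin rule. It assumes the URIs have already been extracted from the message, which is the part a uri rule leaves to the parser.

```python
import re

# The pattern from the uri rule above.
LOOSE = re.compile(r"\.ru\b", re.IGNORECASE)

# Hypothetical tighter variant: the host part itself must end in ".ru",
# optionally followed by a port, then a path, query, fragment, or the end.
HOST_RU = re.compile(
    r"^https?://[^/?#]*\.ru(?::\d+)?(?:[/?#]|$)", re.IGNORECASE
)

# Invented URIs, standing in for what the parser would extract.
uris = [
    "http://boguslink.ru",
    "https://shop.example.ru:8080/x",
    "http://example.com/download.ru",    # ".ru" only in the path
    "http://boguslink.ru.example.com/",  # ".ru" is not the top-level domain
]


def classify(uri):
    """Return (loose_hit, host_hit) for one URI string."""
    return (LOOSE.search(uri) is not None,
            HOST_RU.search(uri) is not None)


for u in uris:
    print(f"{u:36} {classify(u)}")
```

The loose pattern hits all four, while the host-anchored one should hit only the first two. If something like HOST_RU were carried over into a .cf file it would need adapting. In particular, as far as I recall, "#" starts a comment there and has to be escaped inside a rule regex.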

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

Re: regex needed for http link

Posted by Joseph Acquisto <jo...@j4computers.com>.
>>> On 5/16/2012 at 8:53 PM, "Joseph Acquisto" <jo...@j4computers.com> wrote:
>>>> On 5/16/2012 at 5:18 PM, Brent Gardner <bg...@gmail.com> wrote:
>> On 05/16/2012 02:15 PM, Joseph Acquisto wrote:
>>>>>> On 5/16/2012 at 5:05 PM, "Joseph Acquisto"<jo...@j4computers.com>  wrote:
>>>> I have been unsuccessful creating a rule to detect and weight http links in
>>>> message body, such as this one below:
>>>>
>>>> http://boguslink.xx 
>>>>
>>>> The ones I have created get "hits" when tested on the command line, but
>>>> don't seem to work in local.cf.  Maybe that's the wrong place?
>>> I should have said, to detect the two character country code.
>>>
>> What are you using now?
>> 
>> How about:
>> 
>> /\.ru\b/i
>> 
>> 
>> 
>> Brent Gardner
> 
> I will give that a try.

That worked.  But I imagine it may also trigger on innocuous instances of .ru, so the rule should also check for http:// and allow a wildcard for the domain.
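
A quick way to see the concern, sketched in Python on the assumption that its re module handles this pattern the same way Perl does. The body lines are made up. When the pattern is run against plain body text rather than extracted links, anything containing ".ru" followed by a word boundary counts as a hit.

```python
import re

# Brent's suggestion, with re.IGNORECASE standing in for the /i modifier.
DOT_RU = re.compile(r"\.ru\b", re.IGNORECASE)


def hits(text):
    """Return True if ".ru" followed by a word boundary appears in text."""
    return DOT_RU.search(text) is not None


# Invented body lines for illustration only.
body_lines = [
    "Click http://boguslink.ru now",       # the link we want to catch
    "See attached report.ru for details",  # not a link, still a hit
    "We sell .rubber gaskets",             # no hit: no boundary after "ru"
    "Results for the run are in",          # no hit: no ".ru" at all
]

for line in body_lines:
    print(f"{hits(line)!s:6} {line}")
```

The second line is the kind of innocuous hit I mean: it is not a link at all, yet the pattern fires on it.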

joe a.


Re: regex needed for http link

Posted by Joseph Acquisto <jo...@j4computers.com>.
>>> On 5/16/2012 at 5:18 PM, Brent Gardner <bg...@gmail.com> wrote:
> On 05/16/2012 02:15 PM, Joseph Acquisto wrote:
>>>>> On 5/16/2012 at 5:05 PM, "Joseph Acquisto"<jo...@j4computers.com>  wrote:
>>> I have been unsuccessful creating a rule to detect and weight http links in
>>> message body, such as this one below:
>>>
>>> http://boguslink.ru 
>>>
>>> The ones I have created get "hits" when tested on the command line, but
>>> don't seem to work in local.cf.  Maybe that's the wrong place?
>> I should have said, to detect the two character country code.
>>
> What are you using now?
> 
> How about:
> 
> /\.ru\b/i
> 
> 
> 
> Brent Gardner

I will give that a try.


Re: regex needed for http link

Posted by Brent Gardner <bg...@gmail.com>.
On 05/16/2012 02:15 PM, Joseph Acquisto wrote:
>>>> On 5/16/2012 at 5:05 PM, "Joseph Acquisto"<jo...@j4computers.com>  wrote:
>> I have been unsuccessful creating a rule to detect and weight http links in
>> message body, such as this one below:
>>
>> http://boguslink.ru
>>
>> The ones I have created get "hits" when tested on the command line, but
>> don't seem to work in local.cf.  Maybe that's the wrong place?
> I should have said, to detect the two character country code.
>
What are you using now?

How about:

/\.ru\b/i



Brent Gardner



Re: regex needed for http link

Posted by Joseph Acquisto <jo...@j4computers.com>.
>>> On 5/16/2012 at 5:05 PM, "Joseph Acquisto" <jo...@j4computers.com> wrote:
> I have been unsuccessful creating a rule to detect and weight http links in 
> message body, such as this one below:
> 
> http://boguslink.ru 
> 
> The ones I have created get "hits" when tested on the command line, but 
> don't seem to work in local.cf.  Maybe that's the wrong place?

I should have said: to detect the two-character country code.


Re: regex needed for http link

Posted by Joseph Acquisto <jo...@j4computers.com>.
>>> On 5/16/2012 at 8:28 PM, John Hardin <jh...@impsec.org> wrote:
> On Wed, 16 May 2012, Joseph Acquisto wrote:
> 
>> I have been unsuccessful creating a rule to detect and weight http links in
>> message body, such as this one below:
>>
>> http://boguslink.ru 
>>
>> The ones I have created get "hits" when tested on the command line, but 
>> don't seem to work in local.cf.  Maybe that's the wrong place?
> 
> Are you restarting the spamd or amavisd daemon after you make your change?
> 
> -- 
>   John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/ 
>   jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org 
>   key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

Yes, I'm restarting.  "rcspamd restart"


Re: regex needed for http link

Posted by John Hardin <jh...@impsec.org>.
On Wed, 16 May 2012, Joseph Acquisto wrote:

> I have been unsuccessful creating a rule to detect and weight http links in message body, such as this one below:
>
> http://boguslink.ru
>
> The ones I have created get "hits" when tested on the command line, but 
> don't seem to work in local.cf.  Maybe that's the wrong place?

Are you restarting the spamd or amavisd daemon after you make your change?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79