Posted to users@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2013/06/06 04:30:51 UTC

Re: Question about T_KHOP_FOREIGN_CLICK

On 05/31/2013 06:51 AM, Bowie Bailey wrote:
>
> On 5/31/2013 8:30 AM, Matteo Vannucchi - TeamEnterprise wrote:
>> Hello, my name is Matteo.
>>
>> I do not manage a spamassassin installation, but I would like to ask
>> this simple question, because I saw it is a rule which is used to
>> evaluate spam score.
>> I tried searching Google, the users forum, the Wiki and the Docs page
>> in the site, but did not find any information. The simple question
>> is: how does T_KHOP_FOREIGN_CLICK rule work?
>>
>> Hope the answer is as simple.
>
> It's a fairly complex regex rule.  Without spending too much time
> analyzing it, I think it is looking for a link that says "click here"
> in a language other than English.

You are correct, though it also matches English.  I've placed a
syntactical explanation of this regex at http://regex101.com/r/qS8nF4
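
For anyone who would rather poke at it locally than read the breakdown,
here is a rough Python sketch of the alternation at the core of the rule
(the uri_detail form quoted further down in this thread). It assumes
Python's re module behaves closely enough to Perl for this pattern, and
the sample link texts are made up for illustration, not taken from real
mail.

```python
import re

# Pattern copied from the uri_detail form of the rule quoted in this thread.
# Perl's /i flag is expressed as re.IGNORECASE (an approximation of Perl semantics).
CLICK_HERE = re.compile(
    r"\b(?:cli(?:quez\W|ck\Wa)ici\b"
    r"|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]"
    r"|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)",
    re.IGNORECASE,
)

def matches(text):
    """Return True if the link text contains one of the 'click here' phrases."""
    return CLICK_HERE.search(text) is not None

# Hypothetical link texts, chosen only to exercise each branch of the pattern.
samples = [
    "Cliquez ici",      # first branch: cliquez + ici
    "clicca qui",       # second branch: clicca + qu?
    "Clique aqui",      # second branch: clique a + qu?
    "klik hier",        # third branch: klik + hier
    "kliknij tutaj",    # third branch: kliknij + tutaj
    "read the manual",  # no branch applies
]

for text in samples:
    print(f"{text!r}: {matches(text)}")
```

Swapping in your own strings is a quick way to see which branch of the
alternation a given phrase hits.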

> A related question is why is this rule name duplicated?  My guess is
> that it was changed at some point from a rawbody rule to a uri_detail
> rule and the old one was left in there.  One of them should be removed
> to avoid confusion.
>
> from 72_active.cf:
>
> rawbody    T_KHOP_FOREIGN_CLICK m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
>
> uri_detail T_KHOP_FOREIGN_CLICK text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
>
The sandbox promotion system does make this a bit more confusing than it
should be (using a double negative), but it is assembling the two
versions of the rule correctly:

##{ T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)

if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
  rawbody    T_KHOP_FOREIGN_CLICK       m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
endif
##} T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)

##{ if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox

if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))
  uri_detail T_KHOP_FOREIGN_CLICK       text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
endif
##} if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox

This means that the rawbody version is used only if URIDetail isn't
loaded and the uri_detail version only if the URIDetail plugin is
loaded, so exactly one definition of the rule is active at a time.
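
If the double negative is hard to read, the two conditionals can be
modelled as plain booleans. This is only a sketch of the logic in
Python, not how SpamAssassin evaluates its config; the function name
active_variants and the set of loaded plugin names are assumptions made
up for the illustration.

```python
URIDETAIL = "Mail::SpamAssassin::Plugin::URIDetail"

def active_variants(loaded_plugins):
    """Return which definitions of the rule survive the two conditionals.

    loaded_plugins is a hypothetical set of plugin class names; it stands
    in for whatever the real config parser knows about loaded plugins.
    """
    plugin_loaded = URIDETAIL in loaded_plugins
    variants = []

    # if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
    if not plugin_loaded:
        variants.append("rawbody")

    # if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))
    if not (not plugin_loaded):
        variants.append("uri_detail")

    return variants

print(active_variants(set()))        # plugin not loaded
print(active_variants({URIDETAIL}))  # plugin loaded
```

The two conditions are exact complements, which is why the same rule
name can appear twice in the file without conflict.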

Re: Question about T_KHOP_FOREIGN_CLICK

Posted by Bowie Bailey <Bo...@BUC.com>.
On 6/5/2013 10:30 PM, Adam Katz wrote:
> On 05/31/2013 06:51 AM, Bowie Bailey wrote:
>>
>> On 5/31/2013 8:30 AM, Matteo Vannucchi - TeamEnterprise wrote:
>>> Hello, my name is Matteo.
>>>
>>> I do not manage a spamassassin installation, but I would like to ask 
>>> this simple question, because I saw it is a rule which is used to 
>>> evaluate spam score.
>>> I tried searching Google, the users forum, the Wiki and the Docs 
>>> page in the site, but did not find any information. The simple 
>>> question is: how does T_KHOP_FOREIGN_CLICK rule work?
>>>
>>> Hope the answer is as simple.
>>
>> It's a fairly complex regex rule.  Without spending too much time 
>> analyzing it, I think it is looking for a link that says "click here" 
>> in a language other than English.
>
> You are correct, though it also matches English.  I've placed a 
> syntactical explanation of this regex at http://regex101.com/r/qS8nF4

Ah... That makes it perfectly clear!   ;)

Nice site though...  I'll have to bookmark that one for the next time
one of my regexes isn't doing what I expect.  I can never remember those
sites when I need them.

>
>> A related question is why is this rule name duplicated?  My guess is 
>> that it was changed at some point from a rawbody rule to a uri_detail 
>> rule and the old one was left in there.  One of them should be 
>> removed to avoid confusion.
>>
>> from 72_active.cf:
>>
>> rawbody    T_KHOP_FOREIGN_CLICK m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
>>
>> uri_detail T_KHOP_FOREIGN_CLICK text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
>>
> The sandbox promotion system does make this a bit more confusing than 
> it should be (using a double negative), but it is assembling the two 
> versions of the rule correctly:
>
> ##{ T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
>
> if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
>    rawbody    T_KHOP_FOREIGN_CLICK       m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
> endif
> ##} T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
>
> ##{ if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox
>
> if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))
>    uri_detail T_KHOP_FOREIGN_CLICK       text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
> endif
> ##} if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox
>
> This means that the rawbody version is used if URIDetail isn't loaded 
> and the uri_detail version is used if the URIDetail plugin is loaded.

That explains it.  I was grepping the file and didn't think to look for 
conditionals around the rules.
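
For next time, something like the following Python sketch would show
each definition of a rule together with the conditionals wrapped around
it, which a plain grep hides. It is a simplification under stated
assumptions: it only tracks if / ifplugin / endif nesting, ignores any
else branches, and the function name rules_with_conditions is made up
for the example. The regexes in the sample are abbreviated to "...".

```python
def rules_with_conditions(lines, rule_name):
    """Return (rule type, enclosing conditions) for each definition of rule_name.

    Simplified: tracks if/ifplugin/endif nesting only; 'else' is not handled.
    """
    stack = []   # currently open conditional lines, outermost first
    found = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments (including the ##{ ... markers)
        if line.startswith(("if ", "if(", "ifplugin ")):
            stack.append(line)
        elif line == "endif":
            if stack:
                stack.pop()
        else:
            parts = line.split(None, 2)
            if len(parts) >= 2 and parts[1] == rule_name:
                found.append((parts[0], list(stack)))
    return found

# Abbreviated copy of the two blocks from 72_active.cf quoted above.
sample = """\
if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
  rawbody    T_KHOP_FOREIGN_CLICK       m{...}si
endif
if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))
  uri_detail T_KHOP_FOREIGN_CLICK       text =~ /.../i
endif
""".splitlines()

for rule_type, conditions in rules_with_conditions(sample, "T_KHOP_FOREIGN_CLICK"):
    print(rule_type, "inside", conditions)
```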

-- 
Bowie