Posted to users@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2013/06/06 04:30:51 UTC
Re: Question about T_KHOP_FOREIGN_CLICK
On 05/31/2013 06:51 AM, Bowie Bailey wrote:
>
> On 5/31/2013 8:30 AM, Matteo Vannucchi - TeamEnterprise wrote:
>> Hello, my name is Matteo.
>>
>> I do not manage a spamassassin installation, but I would like to ask
>> this simple question, because I saw it is a rule which is used to
>> evaluate spam score.
>> I tried searching Google, the users forum, the Wiki and the Docs page
>> in the site, but did not find any information. The simple question
>> is: how does T_KHOP_FOREIGN_CLICK rule work?
>>
>> Hope the answer is as simple.
>
> It's a fairly complex regex rule. Without spending too much time
> analyzing it, I think it is looking for a link that says "click here"
> in a language other than English.
You are correct, though it also matches English. I've placed a
syntactical explanation of this regex at http://regex101.com/r/qS8nF4
> A related question is why is this rule name duplicated? My guess is
> that it was changed at some point from a rawbody rule to a uri_detail
> rule and the old one was left in there. One of them should be removed
> to avoid confusion.
>
> from 72_active.cf:
>
> rawbody T_KHOP_FOREIGN_CLICK m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
>
> uri_detail T_KHOP_FOREIGN_CLICK text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
>
The sandbox promotion system does make this a bit more confusing than it
should be (using a double negative), but it is assembling the two
versions of the rule correctly:
##{ T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
rawbody T_KHOP_FOREIGN_CLICK m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
endif
##} T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
##{ if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox
if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))
uri_detail T_KHOP_FOREIGN_CLICK text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
endif
##} if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox
This means that the rawbody version is used if URIDetail isn't loaded
and the uri_detail version is used if the URIDetail plugin is loaded.
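
The double negative is easier to see when written out as plain boolean logic. The following is only a sketch of the two conditions above, not SpamAssassin's actual config parser; the function name and the uridetail_loaded flag are made up for illustration.

```python
def active_definitions(uridetail_loaded):
    """List which version of the rule survives the two 'if' blocks (hypothetical helper)."""
    active = []
    # if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
    if not uridetail_loaded:
        active.append("rawbody")
    # if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))
    if not (not uridetail_loaded):
        active.append("uri_detail")
    return active


if __name__ == "__main__":
    for loaded in (False, True):
        print("URIDetail loaded:", loaded, "->", active_definitions(loaded))
```

The two conditions are exact complements, so exactly one definition of T_KHOP_FOREIGN_CLICK is ever active for a given configuration.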
Re: Question about T_KHOP_FOREIGN_CLICK
Posted by Bowie Bailey <Bo...@BUC.com>.
On 6/5/2013 10:30 PM, Adam Katz wrote:
> On 05/31/2013 06:51 AM, Bowie Bailey wrote:
>>
>> On 5/31/2013 8:30 AM, Matteo Vannucchi - TeamEnterprise wrote:
>>> Hello, my name is Matteo.
>>>
>>> I do not manage a spamassassin installation, but I would like to ask
>>> this simple question, because I saw it is a rule which is used to
>>> evaluate spam score.
>>> I tried searching Google, the users forum, the Wiki and the Docs
>>> page in the site, but did not find any information. The simple
>>> question is: how does T_KHOP_FOREIGN_CLICK rule work?
>>>
>>> Hope the answer is as simple.
>>
>> It's a fairly complex regex rule. Without spending too much time
>> analyzing it, I think it is looking for a link that says "click here"
>> in a language other than English.
>
> You are correct, though it also matches English. I've placed a
> syntactical explanation of this regex at http://regex101.com/r/qS8nF4
Ah... That makes it perfectly clear! ;)
Nice site though... I'll have to bookmark that one for the next time
one of my regexes isn't doing what I expect. I can never remember those
sites when I need them.
>
>> A related question is why is this rule name duplicated? My guess is
>> that it was changed at some point from a rawbody rule to a uri_detail
>> rule and the old one was left in there. One of them should be
>> removed to avoid confusion.
>>
>> from 72_active.cf:
>>
>> rawbody T_KHOP_FOREIGN_CLICK m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
>>
>> uri_detail T_KHOP_FOREIGN_CLICK text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
>>
> The sandbox promotion system does make this a bit more confusing than
> it should be (using a double negative), but it is assembling the two
> versions of the rule correctly:
>
> ##{ T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
>
> if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
> rawbody T_KHOP_FOREIGN_CLICK m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
> endif
> ##} T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
>
> ##{ if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox
>
> if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))
> uri_detail T_KHOP_FOREIGN_CLICK text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
> endif
> ##} if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox
> This means that the rawbody version is used if URIDetail isn't loaded
> and the uri_detail version is used if the URIDetail plugin is loaded.
That explains it. I was grepping the file and didn't think to look for
conditionals around the rules.
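
Since a plain grep hides the surrounding if/endif lines, a small script can report them alongside each match. This is a rough sketch under simplifying assumptions: it only tracks lines beginning with if, ifplugin and endif, it ignores any else branches, and it treats every non-comment line whose second token is the rule name as a hit. The function name is hypothetical, and this is not how SpamAssassin itself parses its config.

```python
def enclosing_conditions(cf_text, rule_name):
    """Return (first token, enclosing conditions) for each line naming the rule."""
    stack = []   # currently open if/ifplugin lines, outermost first
    found = []
    for raw in cf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments, including the ##{ ... ##} markers
        tokens = line.split()
        if tokens[0] in ("if", "ifplugin"):
            stack.append(line)
        elif tokens[0] == "endif":
            if stack:
                stack.pop()
        elif len(tokens) >= 2 and tokens[1] == rule_name:
            found.append((tokens[0], list(stack)))
    return found


if __name__ == "__main__":
    import sys
    # Usage (assumed): python find_conditions.py 72_active.cf T_KHOP_FOREIGN_CLICK
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for kind, conditions in enclosing_conditions(handle.read(), sys.argv[2]):
            print(kind, "inside", conditions or "no conditional")
```

Run against the snippet quoted above, this would list the rawbody definition under the "if ! plugin" line and the uri_detail definition under the "if !(! plugin" line.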
--
Bowie