Posted to users@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2015/01/09 01:23:18 UTC

SARE RULEGEN, Re: Rule updates....

Ran these against my corpus.  Here are the worst performers (lots in
common with RW's complaints):

*SPAM%   HAM%    S/O  NAME*
0.013  0.153  0.080  __RULEGEN_PHISH_BLR6YY
0.006  0.286  0.022  __RULEGEN_PHISH_0ATBRI
0.008  0.334  0.023  __RULEGEN_PHISH_L3I0Z5
0.002  0.300  0.006  __RULEGEN_PHISH_LGYG7Q
0.017  1.387  0.012  __RULEGEN_PHISH_QVS6GE
0.045  2.490  0.018  __RULEGEN_PHISH_UNQ4VP
0.027  2.011  0.013  __RULEGEN_PHISH_B9HL3A
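
For anyone unfamiliar with the S/O column: as I understand it, it is the fraction of a rule's hits that land on spam, roughly SPAM% / (SPAM% + HAM%). A value near 1.0 suggests a good spam indicator, and a value near 0.0 means the rule mostly hits ham. Here is a minimal Python sketch that recomputes it from the rounded percentages above. The function name so_ratio and the ROWS list are mine, purely for illustration, and the results differ slightly from the table because the printed percentages are already rounded.

```python
# Sketch only: recompute S/O from the rounded SPAM% and HAM% columns.
# so_ratio and ROWS are illustrative names, not part of any SpamAssassin tool.

def so_ratio(spam_pct: float, ham_pct: float) -> float:
    """Return the share of hits that fall on spam (assumed S/O definition)."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule has no hits, S/O is undefined")
    return spam_pct / total

# (SPAM%, HAM%, rule name) copied from the table above.
ROWS = [
    (0.013, 0.153, "__RULEGEN_PHISH_BLR6YY"),
    (0.006, 0.286, "__RULEGEN_PHISH_0ATBRI"),
    (0.008, 0.334, "__RULEGEN_PHISH_L3I0Z5"),
    (0.002, 0.300, "__RULEGEN_PHISH_LGYG7Q"),
    (0.017, 1.387, "__RULEGEN_PHISH_QVS6GE"),
    (0.045, 2.490, "__RULEGEN_PHISH_UNQ4VP"),
    (0.027, 2.011, "__RULEGEN_PHISH_B9HL3A"),
]

RATIOS = {name: so_ratio(spam, ham) for spam, ham, name in ROWS}

for name, ratio in RATIOS.items():
    print(f"{ratio:.3f}  {name}")
```

Every rule listed comes out below 0.1, i.e. more than 90% of its hits are on ham.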

body __RULEGEN_PHISH_UNQ4VP  / may contain information that is /
body __RULEGEN_PHISH_QVS6GE  / or entity to which it is addressed/
body __RULEGEN_PHISH_B9HL3A  /The information contained in this /
body __RULEGEN_PHISH_0ATBRI  / it is addressed\. If you are n/
body __RULEGEN_PHISH_LGYG7Q  / you have received it in error. /
body __RULEGEN_PHISH_BLR6YY  /uthorised and regulated by the /
body __RULEGEN_PHISH_L3I0Z5  / is intended solely for the ..d/
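
To illustrate why these hit ham, here is a rough Python sketch that runs six of the patterns against a made-up confidentiality footer. The SAMPLE_HAM text is hypothetical (I wrote it for this example, it is not from my corpus), and Python's re.search is only a stand-in for SpamAssassin's body matching, which works on the rendered, whitespace-normalized message.

```python
import re

# Patterns copied from the rules above; Python regex syntax is close enough
# to Perl's for these simple expressions.
RULES = {
    "__RULEGEN_PHISH_UNQ4VP": r" may contain information that is ",
    "__RULEGEN_PHISH_QVS6GE": r" or entity to which it is addressed",
    "__RULEGEN_PHISH_B9HL3A": r"The information contained in this ",
    "__RULEGEN_PHISH_0ATBRI": r" it is addressed\. If you are n",
    "__RULEGEN_PHISH_LGYG7Q": r" you have received it in error. ",
    "__RULEGEN_PHISH_L3I0Z5": r" is intended solely for the ..d",
}

# Hypothetical legal-disclaimer footer of the kind found in ordinary ham.
SAMPLE_HAM = (
    "The information contained in this email is intended solely for the "
    "individual or entity to which it is addressed. If you are not the "
    "intended recipient and you have received it in error, please notify "
    "the sender. This message may contain information that is privileged."
)

# Collect the names of all rules whose pattern occurs in the sample.
hits = sorted(name for name, pattern in RULES.items()
              if re.search(pattern, SAMPLE_HAM))

for name in hits:
    print("hit:", name)
```

All six fire on this one footer. Note that the unescaped "." in LGYG7Q matches the comma after "error" here, so that rule is even looser than it looks.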

A large number of the FPs come from Paypal and similar services.

Even controlling for those, I haven't found the phishing ruleset useful
at all.  The fraud rules do have limited utility.

What relationship does this have to the 10+ year-old SARE stuff?


On 12/20/2014 03:35 AM, Axb wrote:
> On 12/18/2014 06:27 PM, RW wrote:
>> On Tue, 16 Dec 2014 13:10:05 +0100
>> Axb wrote:
>>
>>> https://sourceforge.net/projects/sare/files/
>>>
>>> replaces any older version.
>>>
>>> leech while it lasts....
>>>
>>> adjust scores if needed..
>>
>>
>> There are some rules that shouldn't be there. (I only tested a few that
>> looked the most dubious)
>>
>> The first is a common phrase in mail from UK banks and other financial
>> services companies. Note the "ise" spelling which is common outside
>> the US.
>>
>> body __RULEGEN_PHISH_BLR6YY  /uthorised and regulated by the /
>>
>>
>> The following are common in legal disclaimer signatures:
>>
>> body __RULEGEN_PHISH_UNQ4VP  / may contain information that is /
>> body __RULEGEN_PHISH_B9HL3A  /The information contained in this /
>> body __RULEGEN_PHISH_C6URDE  / do not necessarily represent those of /
>> body __RULEGEN_PHISH_L3I0Z5  / is intended solely for the ..d/
>>
>>
>> This hits some of my ham:
>>
>> body __RULEGEN_PHISH_SRX3XZ  / apologize for any inconvenience/
>>
>>
>> Unless there's a bug, the fact that those disclaimer phrases got through
>> suggests that these rules are either intended to be very much more
>> aggressive than the SOUGHT rules,  or the ham corpus isn't good enough.
>
>
> as the rules were generated with donated corpus data, you're more than
> welcome to send me an archive of ham samples to avoid these potential
> issues.


Re: SARE RULEGEN, Re: Rule updates....

Posted by Axb <ax...@gmail.com>.
On 01/09/2015 01:23 AM, Adam Katz wrote:
> Ran these against my corpus.  Here are the worst performers (lots in
> common with RW's complaints):
>
> *SPAM%   HAM%    S/O  NAME*
> 0.013  0.153  0.080  __RULEGEN_PHISH_BLR6YY
> 0.006  0.286  0.022  __RULEGEN_PHISH_0ATBRI
> 0.008  0.334  0.023  __RULEGEN_PHISH_L3I0Z5
> 0.002  0.300  0.006  __RULEGEN_PHISH_LGYG7Q
> 0.017  1.387  0.012  __RULEGEN_PHISH_QVS6GE
> 0.045  2.490  0.018  __RULEGEN_PHISH_UNQ4VP
> 0.027  2.011  0.013  __RULEGEN_PHISH_B9HL3A
>
> body __RULEGEN_PHISH_UNQ4VP  / may contain information that is /
> body __RULEGEN_PHISH_QVS6GE  / or entity to which it is addressed/
> body __RULEGEN_PHISH_B9HL3A  /The information contained in this /
> body __RULEGEN_PHISH_0ATBRI  / it is addressed\. If you are n/
> body __RULEGEN_PHISH_LGYG7Q  / you have received it in error. /
> body __RULEGEN_PHISH_BLR6YY  /uthorised and regulated by the /
> body __RULEGEN_PHISH_L3I0Z5  / is intended solely for the ..d/
>
> A large number of the FPs come from Paypal and similar services.

Agreed, the rules are not close to ideal.
The spam corpus is ancient and the ham corpus is too small.


> Even controlling for those, I haven't found the phishing ruleset useful
> at all.  The fraud rules do have limited utility.

Agreed - blame bad & stale data.

> What relationship does this have to the 10+ year-old SARE stuff?

I was part of the SARE group, and saved the rules (for historical
reasons) to SF before the web site was shut down for good.

As I don't have the means to set up a SA update channel, putting the 
RULEGEN rules on SF was the only option I had left.