Posted to users@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2015/01/09 01:23:18 UTC
SARE RULEGEN, Re: Rule updates....
Ran these against my corpus. Here are the worst performers (lots in
common with RW's complaints):
SPAM%   HAM%    S/O    NAME
0.013   0.153   0.080  __RULEGEN_PHISH_BLR6YY
0.006   0.286   0.022  __RULEGEN_PHISH_0ATBRI
0.008   0.334   0.023  __RULEGEN_PHISH_L3I0Z5
0.002   0.300   0.006  __RULEGEN_PHISH_LGYG7Q
0.017   1.387   0.012  __RULEGEN_PHISH_QVS6GE
0.045   2.490   0.018  __RULEGEN_PHISH_UNQ4VP
0.027   2.011   0.013  __RULEGEN_PHISH_B9HL3A
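For anyone reading the S/O column in the table above: a rough sketch of how that ratio is usually derived, assuming the common definition S/O = SPAM% / (SPAM% + HAM%). The function names `s_o` and `ranked` are made up for illustration, and because the inputs are the rounded percentages from the table, the recomputed values can differ from the posted ones in the last digit.

```python
# Hit rates copied from the table above: (SPAM%, HAM%) per rule.
RATES = {
    "__RULEGEN_PHISH_BLR6YY": (0.013, 0.153),
    "__RULEGEN_PHISH_0ATBRI": (0.006, 0.286),
    "__RULEGEN_PHISH_L3I0Z5": (0.008, 0.334),
    "__RULEGEN_PHISH_LGYG7Q": (0.002, 0.300),
    "__RULEGEN_PHISH_QVS6GE": (0.017, 1.387),
    "__RULEGEN_PHISH_UNQ4VP": (0.045, 2.490),
    "__RULEGEN_PHISH_B9HL3A": (0.027, 2.011),
}


def s_o(spam_pct: float, ham_pct: float) -> float:
    """Spam/overall ratio (assumed definition): share of the hit rate that is spam."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


def ranked(rates: dict) -> list:
    """Rule names ordered from lowest S/O (hits almost only ham) to highest."""
    return sorted(rates, key=lambda name: s_o(*rates[name]))


for name in ranked(RATES):
    spam, ham = RATES[name]
    print(f"{spam:6.3f} {ham:6.3f} {s_o(spam, ham):6.3f}  {name}")
```

An S/O near 1.0 means a rule hits almost nothing but spam. Every rule listed here comes out below 0.1, so each one hits ham far more often than spam.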
body __RULEGEN_PHISH_UNQ4VP / may contain information that is /
body __RULEGEN_PHISH_QVS6GE / or entity to which it is addressed/
body __RULEGEN_PHISH_B9HL3A /The information contained in this /
body __RULEGEN_PHISH_0ATBRI / it is addressed\. If you are n/
body __RULEGEN_PHISH_LGYG7Q / you have received it in error. /
body __RULEGEN_PHISH_BLR6YY /uthorised and regulated by the /
body __RULEGEN_PHISH_L3I0Z5 / is intended solely for the ..d/
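To illustrate why these patterns hit ham, here is a small sketch that runs the seven expressions above against an invented legal disclaimer. The sample text, the "Example Bank" and "Example Regulator" names, and the `matching_rules` helper are all hypothetical. Python's `re` stands in for SpamAssassin's Perl regexes, and collapsing whitespace is only an approximation of how body text is rendered before matching.

```python
import re

# The seven body patterns quoted above, written as Python regexes.
RULES = {
    "__RULEGEN_PHISH_UNQ4VP": r" may contain information that is ",
    "__RULEGEN_PHISH_QVS6GE": r" or entity to which it is addressed",
    "__RULEGEN_PHISH_B9HL3A": r"The information contained in this ",
    "__RULEGEN_PHISH_0ATBRI": r" it is addressed\. If you are n",
    "__RULEGEN_PHISH_LGYG7Q": r" you have received it in error. ",
    "__RULEGEN_PHISH_BLR6YY": r"uthorised and regulated by the ",
    "__RULEGEN_PHISH_L3I0Z5": r" is intended solely for the ..d",
}

# Hypothetical ham: a generic disclaimer footer, not taken from any real mail.
HAM_SAMPLE = (
    "The information contained in this email is intended solely for the "
    "individual or entity to which it is addressed. If you are not the "
    "intended recipient, note that it may contain information that is "
    "privileged. If you have received it in error, please delete it. "
    "Example Bank is authorised and regulated by the Example Regulator."
)


def matching_rules(text: str) -> list:
    """Return the names of all rules whose pattern occurs in the text."""
    normalized = " ".join(text.split())  # collapse runs of whitespace
    return [name for name, pat in RULES.items() if re.search(pat, normalized)]


hits = matching_rules(HAM_SAMPLE)
print(f"{len(hits)} of {len(RULES)} rules hit the sample disclaimer")
for name in hits:
    print(" ", name)
```

A single ordinary footer like this one can trip every listed rule at once, which is consistent with the ham hit rates reported above.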
A large number of the FPs come from Paypal and similar services.
Even controlling for those, I haven't found the phishing ruleset useful
at all. The fraud rules do have limited utility.
What relationship does this have to the 10+ year-old SARE stuff?
On 12/20/2014 03:35 AM, Axb wrote:
> On 12/18/2014 06:27 PM, RW wrote:
>> On Tue, 16 Dec 2014 13:10:05 +0100
>> Axb wrote:
>>
>>> https://sourceforge.net/projects/sare/files/
>>>
>>> replaces any older version.
>>>
>>> leech while it lasts....
>>>
>>> adjust scores if needed..
>>
>>
>> There are some rules that shouldn't be there. (I only tested a few that
>> looked the most dubious)
>>
>> The first is a common phrase in mail from UK banks and other financial
>> services companies. Note the "ise" spelling which is common outside
>> the US.
>>
>> body __RULEGEN_PHISH_BLR6YY /uthorised and regulated by the /
>>
>>
>> The following are common in legal disclaimer signatures:
>>
>> body __RULEGEN_PHISH_UNQ4VP / may contain information that is /
>> body __RULEGEN_PHISH_B9HL3A /The information contained in this /
>> body __RULEGEN_PHISH_C6URDE / do not necessarily represent those of /
>> body __RULEGEN_PHISH_L3I0Z5 / is intended solely for the ..d/
>>
>>
>> This hits some of my ham:
>>
>> body __RULEGEN_PHISH_SRX3XZ / apologize for any inconvenience/
>>
>>
>> Unless there's a bug, the fact that those disclaimer phrases got through
>> suggests that these rules are either intended to be very much more
>> aggressive than the SOUGHT rules, or the ham corpus isn't good enough.
>
>
> as the rules were generated with donated corpus data, you're more than
> welcome to send me an archive of ham samples to avoid these potential
> issues.
>
Re: SARE RULEGEN, Re: Rule updates....
Posted by Axb <ax...@gmail.com>.
On 01/09/2015 01:23 AM, Adam Katz wrote:
> Ran these against my corpus. Here are the worst performers (lots in
> common with RW's complaints):
>
> SPAM%   HAM%    S/O    NAME
> 0.013   0.153   0.080  __RULEGEN_PHISH_BLR6YY
> 0.006   0.286   0.022  __RULEGEN_PHISH_0ATBRI
> 0.008   0.334   0.023  __RULEGEN_PHISH_L3I0Z5
> 0.002   0.300   0.006  __RULEGEN_PHISH_LGYG7Q
> 0.017   1.387   0.012  __RULEGEN_PHISH_QVS6GE
> 0.045   2.490   0.018  __RULEGEN_PHISH_UNQ4VP
> 0.027   2.011   0.013  __RULEGEN_PHISH_B9HL3A
>
> body __RULEGEN_PHISH_UNQ4VP / may contain information that is /
> body __RULEGEN_PHISH_QVS6GE / or entity to which it is addressed/
> body __RULEGEN_PHISH_B9HL3A /The information contained in this /
> body __RULEGEN_PHISH_0ATBRI / it is addressed\. If you are n/
> body __RULEGEN_PHISH_LGYG7Q / you have received it in error. /
> body __RULEGEN_PHISH_BLR6YY /uthorised and regulated by the /
> body __RULEGEN_PHISH_L3I0Z5 / is intended solely for the ..d/
>
> A large number of the FPs come from Paypal and similar services.
Agreed, the rules are not close to ideal.
The spam corpus is ancient and the ham corpus is too small.
> Even controlling for those, I haven't found the phishing ruleset useful
> at all. The fraud rules do have limited utility.
Agreed - blame bad and stale data.
> What relationship does this have to the 10+ year-old SARE stuff?
I was part of the SARE group, and saved the rules (for historical
reasons) to SF before the web site was shut down for good.
As I don't have the means to set up a SA update channel, putting the
RULEGEN rules on SF was the only option I had left.