Posted to users@spamassassin.apache.org by Marc Perkel <su...@junkemailfilter.com> on 2016/02/03 02:55:25 UTC

Question about spam report header

Normally SA creates a header that has a list of the names of rules that 
matched. It skips the listing of hidden rules that start with __ .

Is there a command where I can easily tell SA to include the hidden 
rules in the report in the headers so I can see all of it?

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400


Re: Question about spam report header

Posted by Marc Perkel <su...@junkemailfilter.com>.
perl -p -i -e 's/__/T_/g' /usr/share/spamassassin/updates_spamassassin_org/*

This converts the rules. I'm doing something very interesting. It's 
going to take a few days to see if it works.
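
The perl one-liner above replaces every "__" in the files, including any 
that sit inside a longer name or a regex. A narrower rename could look like 
the Python sketch below. It assumes hidden rule names always begin at a word 
boundary; the function name is made up for illustration, and it works on a 
string rather than editing files in place.

```python
import re

# Matches an identifier that begins with a double underscore. The \b keeps
# the pattern from touching "__" in the middle of a longer name.
HIDDEN_RULE = re.compile(r"\b__(\w+)")

def expose_hidden_rules(text: str) -> str:
    """Rename __NAME to T_NAME wherever it appears as a whole identifier."""
    return HIDDEN_RULE.sub(r"T_\1", text)

sample = "meta      __CT_VICTIM     __BENEFICIARY || CT_LATE_PRESIDENT\n"
print(expose_hidden_rules(sample), end="")
```

This is still a blunt tool: it renames the rule everywhere it is referenced, 
which is what keeps meta rules pointing at the right names, but it will also 
rewrite any other identifier in the text that starts with "__".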

I'm applying the same techniques of my evolution filter to the SA rule 
names.

I extract the names and then run them through a program that creates all 
combinations up to 4 levels deep and learns those combos as either spam or ham.

Then, after building ham and spam corpus sets, I take the test message, 
create a set of rule combinations, and then do set compares against the 
ham and spam sets.

What I'm looking for is combos matching ham and NOT matching spam - or - 
combinations matching spam and NOT matching ham.
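
As a rough illustration of the idea described above (not the actual filter, 
whose code is not shown in this thread), the combination and set-compare 
steps could be sketched in Python as follows. The function names and the 
way results are counted are assumptions made for the example.

```python
from itertools import combinations

MAX_LEVEL = 4  # "up to 4 levels" from the description above

def rule_combos(rule_names, max_level=MAX_LEVEL):
    """Return every combination of 1..max_level rule names as a set of tuples."""
    names = sorted(set(rule_names))  # sorted, so each combo has one canonical form
    combos = set()
    for size in range(1, max_level + 1):
        combos.update(combinations(names, size))
    return combos

def classify(message_rules, ham_combos, spam_combos):
    """Count this message's combos seen only in ham and only in spam."""
    combos = rule_combos(message_rules)
    ham_only = (combos & ham_combos) - spam_combos
    spam_only = (combos & spam_combos) - ham_combos
    return len(ham_only), len(spam_only)

# Toy corpora built from one ham and one spam message (rule names are examples).
ham_combos = rule_combos(["T_HAS_FROM", "T_MIME_VERSION"])
spam_combos = rule_combos(["T_HAS_FROM", "T_CT_DYING", "T_BENEFICIARY"])
print(classify(["T_HAS_FROM", "T_CT_DYING"], ham_combos, spam_combos))
```

Combos that appear in both corpora cancel out, so only the ones unique to 
one side count as evidence. With N rule names per message the number of 
combos grows quickly (5 names already give 30), which is why the sets in a 
real corpus get large fast.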

In theory I should be able to create thousands of combination rules for 
both ham and spam that all have a very high probability of being accurate. 
It's just an interesting experiment to see how well it works.

Right now I have 151728 ham combinations, 113632 spam combinations. Of 
those only 22933 are in both sets. It's only been learning for one day. 
I want to see where it is after a week.

By changing the rules from __ to T_ I exposed a lot more rule names. 
The way this works is that I don't need to know what rules are ham rules 
or spam rules in advance. And I don't need to score them. The filter 
figures it all out on its own. So the rule names are just information.

I think this trick will make SA far more accurate. We'll see. I want to 
give it till at least Friday for the system to learn. I'm also storing 
hit counts so that I could pick out maybe the best 1000 rules and 
publish them.

Anyhow - that's what I'm up to and so far results are good. But because 
it's early in the learning cycle most messages are not yet producing 
significant scores. The ones that are producing scores are making the 
right call however.



On 02/02/16 20:19, Dave Funk wrote:
> You can do that, but it requires editing all your rule files, although 
> then you see those matches in all your reports.
>
> If you just want to test one particular message, just use the -D 
> option to spamassassin and grep for ' got hit: '
>
> Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"
> Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TO_HEADER_EXISTS ======> got hit: "<"
> Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TOCC_EXISTS ======> got hit: "<YES>"
> Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ======> got hit: "negative match"
> Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_JURY3 ======> got hit: "negative match"
> Mar 11 21:51:44.205 [5074] dbg: rules: ran header rule __HAS_FROM ======> got hit: "<YES>"
>
> (Yes, Marc, you probably already know this, this is for the other 
> people who might be following this thread ;)
>



Re: Question about spam report header

Posted by Dave Funk <db...@engineering.uiowa.edu>.
You can do that, but it requires editing all your rule files, although then 
you see those matches in all your reports.

If you just want to test one particular message, just use the -D option to 
spamassassin and grep for ' got hit: '

Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TO_HEADER_EXISTS ======> got hit: "<"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TOCC_EXISTS ======> got hit: "<YES>"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ======> got hit: "negative match"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_JURY3 ======> got hit: "negative match"
Mar 11 21:51:44.205 [5074] dbg: rules: ran header rule __HAS_FROM ======> got hit: "<YES>"
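
For anyone who wants the rule names rather than raw grep output, the debug 
lines above can be parsed with a few lines of Python. This is a sketch only: 
the pattern is derived from the sample lines in this message, so other 
SpamAssassin versions or rule types may need adjustments, and the function 
name is made up for the example.

```python
import re

# Pattern based only on the sample lines above.
HIT_LINE = re.compile(r"rules: ran (\w+) rule (\S+) =+> got hit: (.*)$")

def hits_from_debug(lines):
    """Yield (rule_type, rule_name, matched_text) for each 'got hit' line."""
    for line in lines:
        m = HIT_LINE.search(line)
        if m:
            yield m.group(1), m.group(2), m.group(3).strip('"')

sample = [
    'Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"',
    'Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ======> got hit: "negative match"',
    "placeholder for a debug line that is not a rule hit",
]
hidden = [name for _, name, _ in hits_from_debug(sample) if name.startswith("__")]
print(hidden)  # ['__MIME_VERSION', '__KAM_UPS2']
```

Feeding it the captured debug output of a single test message gives the 
list of hidden rules that fired, without touching any rule files.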

(Yes, Marc, you probably already know this, this is for the other people 
who might be following this thread ;)

On Tue, 2 Feb 2016, Marc Perkel wrote:

> Never mind ....
>
> I found that if I change __ to T_ that it does what I want.
>

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Question about spam report header

Posted by RW <rw...@googlemail.com>.
On Wed, 03 Feb 2016 05:48:00 +0100
Benny Pedersen wrote:


> note:
> 
> __ have no score need
> T_ must have score, if not defined it defaults to 1.0

An ordinary rule defaults to 1.0; a rule that starts with T_ defaults to 0.01.
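
The naming convention discussed in this thread can be summarised in a small 
Python helper. It only restates what is said here about rules that have no 
explicit "score" line; it is not taken from the SpamAssassin source, and the 
function name is hypothetical.

```python
def default_score(rule_name: str):
    """Default score for a rule with no explicit 'score' line, per this thread."""
    if rule_name.startswith("__"):
        return None   # sub-rule: not scored and not listed in the report
    if rule_name.startswith("T_"):
        return 0.01   # test rule
    return 1.0        # ordinary rule

for name in ("__CT_DYING", "T_CT_DYING", "CT_GOD_BEGGER"):
    print(name, default_score(name))
```

One consequence: after renaming __ rules to T_, each exposed rule that hits 
would add a small default score to the message unless it is scored 
explicitly.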

Re: Question about spam report header

Posted by Benny Pedersen <me...@junc.eu>.
On 2016-02-03 04:16, Marc Perkel wrote:
> Never mind ....

are you elvis fan ? :=)

> I found that if I change __ to T_ that it does what I want.

it does ?

note:

__ have no score need
T_ must have score, if not defined it defaults to 1.0

so did you try reading one of elvis records here:

perldoc Mail::SpamAssassin::Conf

see section about add header tags

hopefully your working licenses to kill allow you to still read terminal 
docs ? :=)


Re: Question about spam report header

Posted by Marc Perkel <su...@junkemailfilter.com>.
Never mind ....

I found that if I change __ to T_ that it does what I want.





Re: Question about spam report header

Posted by Marc Perkel <su...@junkemailfilter.com>.
On 02/02/16 17:55, Marc Perkel wrote:
> Normally SA creates a header that has a list of the names of rules 
> that matched. It skips the listing of hidden rules that start with __ .
>
> Is there a command where I can easily tell SA to include the hidden 
> rules in the report in the headers so I can see all of it?
>

I'm also, I suppose, asking it to list rules that match but produce no 
scores.

body      __LATE_RICH_RELATIVE     /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i

body      __CT_CLICK               /\b(click(ing)? (here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i

body      __BENEFICIARY            /\bbeneficiary\b/i

body      __CT_BEGGER              /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i

body      __CT_CONTACT             /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i

body      __CT_REPLY_TO_ME         /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i

body      __CT_DYING               /\b(diagnosed with|months to live|dying of|transplant)\b/i

body      __CT_UNITED_NATIONS      /\bUnited Nations?\b/i

meta      __CT_STRANGER            CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE

meta      __CT_MONEY               CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$

meta      __CT_VICTIM              __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING

meta      __CT_FORM                FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT

meta      __CT_CONFIDENTIAL        CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDENTIAL_SCAM1 || CONFIDENTIAL_SCAM2

meta      __CT_NOW                 CT_ACT_NOW || CT_DO_IT_TODAY || CT_URGENT_RESPOND

meta      CT_GOD_BENEFICIARY       __CT_GOD && __CT_VICTIM
describe  CT_GOD_BENEFICIARY       God and Beneficiary
score     CT_GOD_BENEFICIARY       4

meta      CT_GOD_BEGGER            __CT_GOD && __CT_BEGGER
describe  CT_GOD_BEGGER            Begging in Religious Language
score     CT_GOD_BEGGER            3
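
To make the relationship between the hidden body rules and the meta rules 
concrete, here is a small Python sketch that tries two of the patterns above 
against a sample sentence and then combines the hits the way a meta rule 
would. It is an approximation for experimenting outside SpamAssassin: 
Python's re module stands in for Perl regexes, the sample text is invented, 
and the helper names are hypothetical.

```python
import re

# Two body patterns copied from the rules above; re.I stands in for the /i flag.
BODY_RULES = {
    "__BENEFICIARY": re.compile(r"\bbeneficiary\b", re.I),
    "__CT_DYING": re.compile(
        r"\b(diagnosed with|months to live|dying of|transplant)\b", re.I
    ),
}

def body_hits(text):
    """Return the set of body rule names whose pattern matches the text."""
    return {name for name, rx in BODY_RULES.items() if rx.search(text)}

def ct_victim(hits):
    """Part of the __CT_VICTIM meta: an OR over sub-rule hits.

    The CT_LATE_* terms are left out because their definitions are not shown.
    """
    return "__BENEFICIARY" in hits or "__CT_DYING" in hits

text = "I was diagnosed with cancer and have only months to live."
hits = body_hits(text)
print(sorted(hits), ct_victim(hits))  # ['__CT_DYING'] True
```

None of these sub-rule hits would show up in the report header by 
themselves; only a scored rule such as CT_GOD_BENEFICIARY that builds on 
them would.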

