Posted to users@spamassassin.apache.org by Marc Perkel <su...@junkemailfilter.com> on 2016/02/03 02:55:25 UTC
Question about spam report header
Normally SA creates a header that has a list of the names of rules that
matched. It skips the listing of hidden rules that start with __ .
Is there a command where I can easily tell SA to include the hidden
rules in the report in the headers so I can see all of it?
--
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
Re: Question about spam report header
Posted by Marc Perkel <su...@junkemailfilter.com>.
perl -p -i -e 's/__/T_/g' /usr/share/spamassassin/updates_spamassassin_org/*
This converts the rules. I'm doing something very interesting. It's
going to take a few days to see if it works.
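The one-liner above rewrites every "__" it finds in those files, in place. A narrower rename can be sketched in Python, under the assumption that hidden rule names only ever appear as tokens that begin with "__". The function name expose_hidden_rules is hypothetical, and file handling is left out on purpose. A regex body that happens to contain a token-initial "__" would still be rewritten, so this is a sketch, not a safe migration tool.

```python
import re

# Match "__" only where it begins a token: not preceded by a word character
# and followed by a letter or digit. "FOO__BAR" is therefore left alone.
HIDDEN_PREFIX = re.compile(r'(?<![A-Za-z0-9_])__(?=[A-Za-z0-9])')

def expose_hidden_rules(text):
    """Return text with a leading "__" on rule-name tokens changed to "T_"."""
    return HIDDEN_PREFIX.sub('T_', text)

line = 'meta CT_GOD_BEGGER __CT_GOD && __CT_BEGGER'
renamed = expose_hidden_rules(line)
```

Running it on a copy of the rule files, rather than on the update directory itself, keeps the original rules available for comparison.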
I'm applying the same techniques of my evolution filter to the SA rule
names.
I extract the names and then run them into a program that creates all
combinations up to 4 levels and learns those combos as either spam or ham.
Then, after building the ham and spam corpus sets, I take the test message,
create its set of rule combinations, and then do set compares against the
ham and spam sets.
What I'm looking for is combos matching ham and NOT matching spam - or -
combinations matching spam and NOT matching ham.
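The combination idea described above can be sketched in Python. This is a minimal illustration, not the actual filter: the function names (rule_combos, learn, classify) are hypothetical, the corpora are plain in-memory sets, and returning raw counts instead of a score is an assumption.

```python
from itertools import combinations

MAX_LEVEL = 4  # "up to 4 levels" in the description above

def rule_combos(rule_names, max_level=MAX_LEVEL):
    """Return every combination of 1..max_level rule names as a set of tuples."""
    names = sorted(set(rule_names))  # sorting gives each combo one canonical form
    combos = set()
    for size in range(1, max_level + 1):
        combos.update(combinations(names, size))
    return combos

def learn(corpus, rule_names):
    """Add all combinations from one training message to a ham or spam set."""
    corpus |= rule_combos(rule_names)

def classify(rule_names, ham_combos, spam_combos):
    """Count the message's combos seen only in ham and only in spam."""
    msg = rule_combos(rule_names)
    ham_only = (msg & ham_combos) - spam_combos
    spam_only = (msg & spam_combos) - ham_combos
    return len(ham_only), len(spam_only)

ham, spam = set(), set()
learn(ham, ["__HAS_FROM", "__MIME_VERSION", "__TOCC_EXISTS"])
learn(spam, ["__HAS_FROM", "__CT_MONEY", "__CT_DYING"])
result = classify(
    ["__HAS_FROM", "__CT_MONEY", "__CT_DYING", "__BENEFICIARY"], ham, spam
)
```

In this toy run the single-name combo ("__HAS_FROM",) is in both sets, so it counts for neither side; only combos exclusive to one corpus contribute.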
In theory I should be able to create thousands of combination rules for
both ham and spam that all have a very high probability of being accurate.
It's just an interesting experiment to see how well it works.
Right now I have 151728 ham combinations and 113632 spam combinations. Of
those only 22933 are in both sets. It's only been learning for one day.
I want to see where it is after a week.
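The counts quoted above imply how many combinations are exclusive to each side. A short Python worked example, using only the three numbers from the message (the variable names are illustrative):

```python
ham_total = 151728   # ham combinations learned so far
spam_total = 113632  # spam combinations learned so far
in_both = 22933      # combinations present in both sets

ham_only = ham_total - in_both                  # seen in ham, never in spam
spam_only = spam_total - in_both                # seen in spam, never in ham
distinct = ham_total + spam_total - in_both     # size of the union of both sets
shared_fraction = in_both / distinct            # share of combos that are ambiguous
```

That gives 128795 ham-only and 90699 spam-only combinations out of 242427 distinct ones.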
By changing the rules from __ to T_ I exposed a lot more rule names.
The way this works is that I don't need to know what rules are ham rules
or spam rules in advance. And I don't need to score them. The filter
figures it all out on its own. So the rule names are just information.
I think this trick will make SA far more accurate. We'll see. I want to
give it till at least Friday for the system to learn. I'm also storing
hit counts so that I could pick out maybe the best 1000 rules and
publish them.
Anyhow - that's what I'm up to and so far results are good. But because
it's early in the learning cycle most messages are not yet producing
significant scores. The ones that are producing scores are making the
right call, however.
Re: Question about spam report header
Posted by Dave Funk <db...@engineering.uiowa.edu>.
You can do that, but it requires editing all your rule files, although then
you see those matches in all your reports.
If you just want to test one particular message, just use the -D option to
spamassassin and grep for ' got hit: '
Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TO_HEADER_EXISTS ======> got hit: "<"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TOCC_EXISTS ======> got hit: "<YES>"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ======> got hit: "negative match"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_JURY3 ======> got hit: "negative match"
Mar 11 21:51:44.205 [5074] dbg: rules: ran header rule __HAS_FROM ======> got hit: "<YES>"
(Yes, Marc, you probably already know this, this is for the other people
who might be following this thread ;)
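For anyone who wants the rule names rather than the raw grep output, the debug lines above can be parsed with a few lines of Python. This is a sketch that assumes the debug text has already been captured into a string and that the lines look exactly like the sample shown here; the function name rules_hit is hypothetical.

```python
import re

# Pattern modeled on the sample lines above:
#   ... dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"
HIT_LINE = re.compile(r'dbg: rules: ran (\S+) rule (\S+) =+> got hit: "(.*)"')

def rules_hit(debug_output):
    """Return the rule names, in order, from every ' got hit: ' debug line."""
    hits = []
    for line in debug_output.splitlines():
        match = HIT_LINE.search(line)
        if match:
            hits.append(match.group(2))  # group 2 is the rule name
    return hits
```

Filtering the result with name.startswith("__") would leave only the hidden rules.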
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: Question about spam report header
Posted by RW <rw...@googlemail.com>.
On Wed, 03 Feb 2016 05:48:00 +0100
Benny Pedersen wrote:
> note:
>
> __ rules need no score
> T_ must have score, if not defined it defaults to 1.0
An ordinary rule defaults to 1.0; a rule that starts with T_ defaults to 0.01.
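The defaults described in this exchange can be written out as a small Python function. It only restates what the thread says (no score for __ rules, 0.01 for T_ rules, 1.0 otherwise); the function name default_score and the use of None for unscored rules are illustrative choices, not SpamAssassin code.

```python
def default_score(rule_name):
    """Return the default score for a rule with no explicit score line."""
    if rule_name.startswith("__"):
        return None   # hidden sub-rule: carries no score of its own
    if rule_name.startswith("T_"):
        return 0.01   # test rule: near-zero default
    return 1.0        # ordinary rule
```

So renaming __CT_GOD to T_CT_GOD changes it from unscored to a rule that adds 0.01 on every hit unless a score line says otherwise.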
Re: Question about spam report header
Posted by Benny Pedersen <me...@junc.eu>.
On 2016-02-03 04:16, Marc Perkel wrote:
> Never mind ....
are you elvis fan ? :=)
> I found that if I change __ to T_ that it does what I want.
it does ?
note:
__ rules need no score
T_ must have score, if not defined it defaults to 1.0
so did you try reading one of elvis records here:
perldoc Mail::SpamAssassin::Conf
see section about add header tags
hopefully your working licenses to kill allow you to still read terminal
docs ? :=)
Re: Question about spam report header
Posted by Marc Perkel <su...@junkemailfilter.com>.
Never mind ....
I found that if I change __ to T_ that it does what I want.
Re: Question about spam report header
Posted by Marc Perkel <su...@junkemailfilter.com>.
On 02/02/16 17:55, Marc Perkel wrote:
> Normally SA creates a header that has a list of the names of rules
> that matched. It skips the listing of hidden rules that start with __ .
>
> Is there a command where I can easily tell SA to include the hidden
> rules in the report in the headers so I can see all of it?
>
I suppose I'm also asking it to list rules that match but produce no
scores.
body __LATE_RICH_RELATIVE /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i
body __CT_CLICK /\b(click(ing)? (here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i
body __BENEFICIARY /\bbeneficiary\b/i
body __CT_BEGGER /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i
body __CT_CONTACT /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i
body __CT_REPLY_TO_ME /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i
body __CT_DYING /\b(diagnosed with|months to live|dying of|transplant)\b/i
body __CT_UNITED_NATIONS /\bUnited Nations?\b/i
meta __CT_STRANGER CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE
meta __CT_MONEY CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$
meta __CT_VICTIM __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING
meta __CT_FORM FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT
meta __CT_CONFIDENTIAL CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDENTIAL_SCAM1 || CONFIDENTIAL_SCAM2
meta __CT_NOW CT_ACT_NOW || CT_DO_IT_TODAY || CT_URGENT_RESPOND
meta CT_GOD_BENEFICIARY __CT_GOD && __CT_VICTIM
describe CT_GOD_BENEFICIARY God and Beneficiary
score CT_GOD_BENEFICIARY 4
meta CT_GOD_BEGGER __CT_GOD && __CT_BEGGER
describe CT_GOD_BEGGER Begging in Religious Language
score CT_GOD_BEGGER 3