Posted to users@spamassassin.apache.org by "marcin@mejor.pl" <ma...@mejor.pl> on 2017/03/07 14:33:24 UTC

seek-phrases-in-log - does it work correctly?

Hi!
I'm trying to use
masses/rule-dev/seek-phrases-in-log --reqpatlength <X>

I'm not sure whether it works correctly; please take a look:
$ /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log --ham /home/masscheck/auto/tmp/all_w.h --spam /home/masscheck/auto/tmp/all_w.s --rules --ruleprefix __SEEK_FRAUD_ --reqpatlength 0
Tue Mar  7 15:28:32 2017: reading /home/masscheck/auto/tmp/all_w.s...
Tue Mar  7 15:28:32 2017: n-grams active: 637
Tue Mar  7 15:28:32 2017: reading /home/masscheck/auto/tmp/all_w.h...
Tue Mar  7 15:28:32 2017: n-grams active: 626
Tue Mar  7 15:28:32 2017: filtering into message subsets...
Tue Mar  7 15:28:32 2017: message subsets found: 10
Tue Mar  7 15:28:32 2017: deduping and assembling regexps...
Tue Mar  7 15:28:32 2017: working on message subset 1 (0)...
#  1.000  73.333   0.000
body __SEEK_FRAUD_8PS1M3  / interested in /
body __SEEK_FRAUD_VMFZAX  / looking for /
#  1.000  66.667   0.000
body __SEEK_FRAUD_MUS7GX  /Dear /
body __SEEK_FRAUD_Y2S6AV  /My name is .{0,20}, I am the personnel manager of a large International company\. Most of the work you can do from home, that is, at a distance\. Salary is \$2.00-\$5.00\./
#  1.000  40.000   0.000
body __SEEK_FRAUD_KPQM4S  /Re: /
#  1.000  26.667   0.000
body __SEEK_FRAUD_U38RDU  /Best regards\!/
#  1.000  26.667   0.000
body __SEEK_FRAUD_P9TXHY  /Good day/
#  1.000  20.000   0.000
body __SEEK_FRAUD_GDWWR6  /Have a nice day\!/
#  1.000  13.333   0.000
body __SEEK_FRAUD_LN5SMR  /hi\!/


but:

$ /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log --ham /home/masscheck/auto/tmp/all_w.h --spam /home/masscheck/auto/tmp/all_w.s --rules --ruleprefix __SEEK_FRAUD_ --reqpatlength 1
Tue Mar  7 15:32:14 2017: reading /home/masscheck/auto/tmp/all_w.s...
Tue Mar  7 15:32:14 2017: n-grams active: 637
Tue Mar  7 15:32:14 2017: reading /home/masscheck/auto/tmp/all_w.h...
Tue Mar  7 15:32:14 2017: n-grams active: 626
Tue Mar  7 15:32:14 2017: filtering into message subsets...
Tue Mar  7 15:32:14 2017: message subsets found: 10
Tue Mar  7 15:32:14 2017: deduping and assembling regexps...
Tue Mar  7 15:32:14 2017: working on message subset 1 (0)...
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.


perl-5.22.3

Marcin

Re: seek-phrases-in-log - does it work correctly?

Posted by Axb <ax...@gmail.com>.
On 03/10/2017 10:55 AM, marcin@mejor.pl wrote:
<snipped>

>>>>>
>>>>> Pls open a bug to track the changes for the future.
>>>>>
>>>>> And I've got good news :)
>>>>> I'll rename the one we now have in SVN and commit my working version as
>>>>> a replacement.
>>>>
>>>> Hmm, could it be that
>>>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6640 isn't properly
>>>> fixed?
>>>
>>> Ooops, something went very belly up with that one.
>>>
>>> I'll replace and close that bug
>>> Thanks for catching this one.
>>>
>>> Axb
>>
>>
>> Please test the latest version of seek-phrases-in-log.
>
> Works. Thanks for your help and patience; I was sure that you would give up
> with a "SOA 1" answer :)

Glad to help.

I can't remember anybody else doing something with the SOUGHT routine
and posting on the list, so I was intrigued.
Please note that this part of the code was written by Justin Mason, who,
sadly, has left the SA project. He committed the bits and pieces so they
don't get lost, but the whole thing should be considered "unsupported".

Axb


Re: seek-phrases-in-log - does it work correctly?

Posted by "marcin@mejor.pl" <ma...@mejor.pl>.
On 09.03.2017 at 17:08, Axb wrote:
> On 03/09/2017 05:03 PM, Axb wrote:
>> On 03/09/2017 04:58 PM, marcin@mejor.pl wrote:
>>> On 09.03.2017 at 16:05, Axb wrote:
>>>> On 03/09/2017 03:11 PM, marcin@mejor.pl wrote:
>>>>> On 09.03.2017 at 15:05, marcin@mejor.pl wrote:
>>>>>> On 09.03.2017 at 14:42, Axb wrote:
>>>>>>> On 03/09/2017 02:31 PM, marcin@mejor.pl wrote:
>>>>>>>> On 08.03.2017 at 17:30, Axb wrote:
>>>>>>>>> On 03/08/2017 04:55 PM, marcin@mejor.pl wrote:
>>>>>>>>>> On 08.03.2017 at 16:33, Axb wrote:
>>>>>>>>>>> On 03/08/2017 04:16 PM, marcin@mejor.pl wrote:
>>>>>>>>>>>> On 08.03.2017 at 16:06, Axb wrote:
>>>>>>>>>>>>> On 03/08/2017 03:58 PM, marcin@mejor.pl wrote:
>>>>>>>>>>>>>> On 08.03.2017 at 15:27, Axb wrote:
>>>>>>>>>>>>>>> As your command below shows you're using --reqpatlength 0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Start off with something sane, for example --reqpatlength 40
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You may also want to play with --maxtextread
>>>>>>>>>>>>>>> (I use --maxtextread 8192 for FRAUD rules)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But with --reqpatlength 10, 40, 100 or 1000 I get no hits. Reading
>>>>>>>>>>>>>> the help ("--reqpatlength: required pattern length, in characters
>>>>>>>>>>>>>> (default: 0)"), I understand that the pattern in a generated rule
>>>>>>>>>>>>>> will be longer than reqpatlength (shorter strings will be ignored).
>>>>>>>>>>>>>> Am I correct about how the parameter works?
>>>>>>>>>>>>>
>>>>>>>>>>>>> --reqpatlength 40 tells seekphrases to ignore any phrases which are
>>>>>>>>>>>>> shorter than 40 chars.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just checked my line, which is using
>>>>>>>>>>>>>  --reqpatlength 37
>>>>>>>>>>>>
>>>>>>>>>>>> Any value > 0 results in no rules being generated.
>>>>>>>>>>>>
>>>>>>>>>>>>> body __AXB_FRAUD_LAF076  /It has come to our attention that you /
>>>>>>>>>>>>> body __AXB_FRAUD_UPVTRT  / in order to confirm your disbursement\./
>>>>>>>>>>>>> body __AXB_FRAUD_NOFUX2  / approval, your funds will be deposited directly into your /
>>>>>>>>>>>>> body __AXB_FRAUD_Z4ZZ7D  / in order to accept your disbursement\./
>>>>>>>>>>>>> body __AXB_FRAUD_CUXJ6X  / approval, your funds will be direct deposited into your /
>>>>>>>>>>>>> body __AXB_FRAUD_NHWXKL  /: You Are Eligible to Receive Funds up to \$.,000\. /
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hard to guess what is not working on your side without full insight.
>>>>>>>>>>>>
>>>>>>>>>>>> What can I do to help more? Should I share the all_w.h and all_w.s files?
>>>>>>>>>>>
>>>>>>>>>>> Before we go that way, pls answer these questions:
>>>>>>>>>>>
>>>>>>>>>>> How many spams/hams are you processing?
>>>>>>>>>>
>>>>>>>>>> ham: ~1400
>>>>>>>>>> spam: ~8200
>>>>>>>>>>
>>>>>>>>>>> Do you have a file named assemble.state? If yes, how large is it?
>>>>>>>>>>
>>>>>>>>>> Yes, I have this file; it is ~9MB in size.
>>>>>>>>>>
>>>>>>>>>>> And pls zip & send me the full script you're using to generate the
>>>>>>>>>>> rules, OFFLIST! Do NOT post it to the list.
>>>>>>>>>>
>>>>>>>>>> Ok, I'll choose tar.bz2 ;)
>>>>>>>>>> Thanks for the help.
>>>>>>>>>
>>>>>>>>> Replying on list as much as I can so it's archived FTR.
>>>>>>>>>
>>>>>>>>> The first thing I see is that your logs do not contain a list of the
>>>>>>>>> rules which hit on each message.
>>>>>>>>>
>>>>>>>>> For example, my "w.s" file has lines which look like:
>>>>>>>>>
>>>>>>>>>  53
>>>>>>>>> /home/mc/Maildir/cur/1487823401.M695422P29583.ruler,S=7602,W=7747:2,
>>>>>>>>> ADVANCE_FEE_2_NEW_MONEY,ADVANCE_FEE_3_NEW,ADVANCE_FEE_3_NEW_MONEY,ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,AXB_XM2600,AXB_XMAILER_MIMEOLE_OL_024C2,CM_XRCVD_VOOZER4,DEAR_WINNER,FORGED_MUA_OUTLOOK,FROM_MISSPACED,FROM_MISSP_MSFT,FROM_MISSP_REPLYTO,FROM_MISSP_URI,FSL_419_FP1,FSL_CTYPE_WIN1251,FSL_MISSP_REPLYTO,FSL_NEW_HELO_USER,FSL_RCVD_USER,FSL_UA,FSL_XM_419,HK_NAME_MR_MRS,LOTS_OF_MONEY,LOTTO_DEPT,MONEY_FRAUD_3,MONEY_FRAUD_5,MONEY_FROM_MISSP,MSOE_MID_WRONG_CASE,NSL_RCVD_HELO_USER,TO_NO_BRKTS_FROM_MSSP,T_AXB_XM2600,T_BIG_HEADERS_5K,T_CM_XRCVD_VOOZER4,T_FSL_FREEMAIL_1,T_FSL_HELO_NON_FQDN_2,T_HK_MUCHMONEY,T_LOTTO_AGENT,T_SINGLE_HEADER_1K,T_TO_NO_BRKTS_MSFT,__419_FROM_SIG,__ADVANCE_FEE_2_NEW,__ADVANCE_FEE_2_NEW_MONEY,__ADVANCE_FEE_3_NEW,__ADVANCE_FEE_3_NEW_MONEY,__ADVANCE_FEE_4_NEW,__ADVANCE_FEE_4_NEW_MONEY,__ADVANCE_FEE_5_NEW,__ADVANCE_FEE_5_NEW_MONEY,__AFF_LOTTERY,__ANY_OUTLOOK_MUA,__ANY_TEXT_ATTACH,__ANY_TEXT_ATTACH_DOC,__AXB_MO_OL_024C2,__AXB_MO_OL_D8ACC,__AXB_XM_OL_024C2,__AXB_XM_OL_080C4,__AXB_XM_OL_424A6,__AXB_XM_OL_B9D6C,__BOUNCE_RPATH_NULL,__CONGRADULAT,__CT,__CTE,__CTYPE_CHARSET_QUOTED,__CT_TEXT_PLAIN,__DOS_HAS_ANY_URI,__DOS_RCVD_THU,__DOS_RCVD_WED,__DOS_RELAYED_EXT,__FB_CONGRADS,__FH_HAS_XMSMAIL,__FH_HAS_XPRIORITY,__FORGED_OE,__FRAUD_DBI,__FRAUD_FCW,__FROM_FULL_NAME,__FROM_MISSPACED,__FROM_MISSP_REPLYTO,__FROM_MISSP_URI,__FROM_RUNON,__FSL_419_1,__FSL_419_2,__FSL_419_3,__FSL_419_4,__FSL_419_5,__FSL_HELO_USER_1,__FSL_HELO_USER_3,__FSL_UA_2,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_DATE,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MIMEOLE,__HAS_MSGID,__HAS_MSMAIL_PRI,__HAS_RCVD,__HAS_REPLY_TO,__HAS_SUBJECT,__HAS_TO,__HAS_URI,__HAS_XMAIL,__HAS_X_MAILER,__HK_NAME_MR_MRS,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_RELAY_NO_AUTH,__LOTSA_MONEY_04,__LOTTO_ADMITS,__LOTTO_ADMITS_1,__LOTTO_WIN_01,__MIMEOLE_MS,__MIME_VERSION,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__MONEY_FRAUD,__MONEY_FRAUD_3,__MONEY_FRAUD_5,__MONEY_LOTTERY,__MSGID_OK_DIGITS,__MSOE_MID_WRONG_CASE,__M_NOTIFIC,__NAKED_TO,__NONEMPTY_BODY,__NO_INR_YES_REF,__OE_MUA,__RCVD_VIA_APNIC_E,__RCVD_VIA_ARIN_E,__RCVD_VIA_RIPE,__RCVD_VIA_RIPE_E,__RDNS_SHORT,__REPLYTO_EXISTS,__REPLY_FREEMAIL,__SANE_MSGID,__SARE_FRAUD_BARRISTER,__SINGLE_HEADER_1K,__SUBJ_2UPPER,__SUBJ_4LOWER,__SUBJ_HAS_WORDS,__SUBJ_NOT_SHORT,__TOCC_EXISTS,__TO_NO_ARROWS_R,__TO_NO_BRKTS_FROM_MSSP,__TO_NO_BRKTS_FROM_RUNON,__TO_NO_BRKTS_MSFT,__TO_NO_BRKTS_NOTLIST,__TVD_BODY,__TVD_MIME_ATT_TP,__URI_MAILTO,__XM_MSOE6,__XM_MS_IN_GENERAL,__XM_OUTLOOK_EXPRESS,__XPRIO,__YOU_WON,__YOU_WON_01,__YOU_WON_02,__YOU_WON_SOMTIN,__hk_million,__hk_win_1,__hk_win_5,__hk_win_6,__hk_win_b
>>>>>>>>> time=0,scantime=0,format=f,reuse=no,set=0
>>>>>>>>>
>>>>>>>>> So apparently your masschecker is not seeing rules.
>>>>>>>>>
>>>>>>>>> I don't use --cache & --cachedir (don't remember why); for starters,
>>>>>>>>> maybe remove them.
>>>>>>>>
>>>>>>>> I started without cache.
>>>>>>>>
>>>>>>>>> I have --cf='use_bayes 0' (speeds up processing), and make sure you use
>>>>>>>>>   --cf='required_score 5'
>>>>>>>>>
>>>>>>>>> You'll have to play with your setup till your logs show SA rule hits.
>>>>>>>>
>>>>>>>> There are no SA rules because the parameter "-C=/dev/null" is set.
>>>>>>>>
>>>>>>>> I don't understand something. Why do I need to check
>>>>>>>> mails-that-I-classified-as-spam-or-ham against rules? If I understand
>>>>>>>> correctly how creating auto rules works, masscheck only dumps strings
>>>>>>>> from ham and spam.
>>>>>>>
>>>>>>> The routine is supposed to create rules based on msgs in your spam
>>>>>>> folder, and needs the ham folder as a counterweight against potential
>>>>>>> FPs so that, for example, you don't start producing rules based on
>>>>>>> phrases in disclaimers.
>>>>>>>
>>>>>>> In the log, each line starts with Y/N and a score. Not sure how
>>>>>>> necessary it is; I've always had it that way and it "works for me".
>>>>>>>
>>>>>>>> And next, seek-phrases-in-log should create rules using the found
>>>>>>>> strings. I'm using the script from svn with some path changes, so I
>>>>>>>> assumed that it should be more or less working :)
>>>>>>>
>>>>>>> A wise man once said: "to assume is not to know".
>>>>>>> Why not try avoiding modifications till you get some useful results,
>>>>>>> and then start doing mods, one at a time?
>>>>>>
>>>>>> I only modified the "run" script; the other perl scripts are untouched.
>>>>>>
>>>>>>>> Btw, I removed -C=/dev/null; rule hits are in the logs now, but
>>>>>>>> seek-phrases-in-log still returns no rules if I set --reqpatlength to
>>>>>>>> a non-zero value.
>>>>>>>
>>>>>>> I have no idea.
>>>>>>> I'll send you a modified seek-phrases-in-log (offlist) for you to try...
>>>>>>
>>>>>> I've got two pieces of news, bad and good.
>>>>>> The good news is that your version of the script works!
>>>>>> The bad news is that the script in the official repo doesn't work.
>>>>>> Bugzilla?
>>>>>
>>>>> I see what is going on. The variable maxreqpatlength isn't initialized in
>>>>> the original script...
>>>>>
>>>>
>>>>
>>>> Pls open a bug to track the changes for the future.
>>>>
>>>> And I've got good news :)
>>>> I'll rename the one we now have in SVN and commit my working version as
>>>> a replacement.
>>>
>>> Hmm, could it be that
>>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6640 isn't properly
>>> fixed?
>>
>> Ooops, something went very belly up with that one.
>>
>> I'll replace and close that bug
>> Thanks for catching this one.
>>
>> Axb
> 
> 
> Please test the latest version of seek-phrases-in-log.

Works. Thanks for your help and patience; I was sure that you would give up
with a "SOA 1" answer :)

Marcin
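Axb's first diagnosis above hinged on whether each mass-check log line lists the rules that hit. Below is a minimal Python sketch for checking that automatically. It assumes the whitespace-separated layout suggested by his sample and his description: a Y/N verdict, the score, the message path, a comma-separated list of rule names, and a trailing time=... field. That layout is inferred from this thread, not taken from mass-check documentation, and parse_masscheck_line and lines_without_rules are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class LogEntry:
    verdict: str                  # first column, "Y"/"N" per the thread
    score: float
    path: str
    rules: List[str] = field(default_factory=list)
    meta: Optional[str] = None    # e.g. "time=0,scantime=0,format=f,reuse=no,set=0"


def parse_masscheck_line(line: str) -> Optional[LogEntry]:
    """Parse one line, assuming: VERDICT SCORE PATH [RULE,RULE,...] [time=...].

    Returns None for blank lines and '#' comment lines (an assumption).
    Assumes the message path contains no whitespace.
    """
    tokens = line.split()
    if not tokens or tokens[0].startswith("#"):
        return None
    if len(tokens) < 3:
        raise ValueError(f"too few fields: {line!r}")
    verdict, score, path, rest = tokens[0], float(tokens[1]), tokens[2], tokens[3:]
    meta = None
    if rest and rest[-1].startswith("time="):
        meta = rest.pop()
    # Whatever remains is the comma-separated rule list (possibly empty).
    rules = [r for r in ",".join(rest).split(",") if r]
    return LogEntry(verdict, score, path, rules, meta)


def lines_without_rules(lines: Iterable[str]) -> Iterator[str]:
    """Yield the path of every message whose log line lists no rule hits."""
    for line in lines:
        entry = parse_masscheck_line(line)
        if entry is not None and not entry.rules:
            yield entry.path
```

If lines_without_rules reports nearly every message in all_w.s and all_w.h, the logs are in the state Axb described ("your masschecker is not seeing rules"), for example because of -C=/dev/null.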



Re: seek-phrases-in-log - does it work correctly?

Posted by Axb <ax...@gmail.com>.
On 03/09/2017 05:03 PM, Axb wrote:
<snipped>

> Ooops, something went very belly up with that one.
>
> I'll replace and close that bug
> Thanks for catching this one.
>
> Axb


Please test the latest version of seek-phrases-in-log.

Thanks

Axb
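The failure was traced to maxreqpatlength never being initialized. The code around line 675 of seek-phrases-in-log is not shown in this thread, so the following is only a hypothetical Python model of that failure mode. In Perl, an undefined scalar is treated as 0 in numeric context, and under warnings each such use prints "Use of uninitialized value in numeric lt (<)". A check like "length < maxreqpatlength" therefore can never be true, and every phrase is dropped. The function names, the upper-bound check and the skip-when-zero behaviour are assumptions chosen to reproduce the reported symptoms: rules with --reqpatlength 0, and no rules plus a stream of warnings with any positive value.

```python
import warnings
from typing import Iterable, List, Optional


def perl_numeric(value: Optional[int]) -> int:
    """Mimic Perl: an undefined value (None here) is 0 in numeric context."""
    if value is None:
        warnings.warn("Use of uninitialized value in numeric lt (<)")
        return 0
    return value


def filter_by_length(phrases: Iterable[str], reqpatlength: int,
                     maxreqpatlength: Optional[int]) -> List[str]:
    """Hypothetical length filter; the real script's logic may differ."""
    phrases = list(phrases)
    if reqpatlength == 0:
        # Assumed: the length check is skipped entirely at the default of 0,
        # which would explain why the first run in this thread produced rules.
        return phrases
    # With maxreqpatlength undefined, the upper bound is 0, so nothing passes.
    return [p for p in phrases
            if len(p) >= reqpatlength and len(p) < perl_numeric(maxreqpatlength)]
```

With the upper bound initialized, for example filter_by_length(phrases, 10, 1000), only phrases of at least 10 characters survive, which matches Axb's description of --reqpatlength as a minimum phrase length.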




Re: seek-phrases-in-log - does it work correctly?

Posted by Axb <ax...@gmail.com>.
On 03/09/2017 04:58 PM, marcin@mejor.pl wrote:
> W dniu 09.03.2017 o 16:05, Axb pisze:
>> On 03/09/2017 03:11 PM, marcin@mejor.pl wrote:
>>> W dniu 09.03.2017 o 15:05, marcin@mejor.pl pisze:
>>>> W dniu 09.03.2017 o 14:42, Axb pisze:
>>>>> On 03/09/2017 02:31 PM, marcin@mejor.pl wrote:
>>>>>> W dniu 08.03.2017 o 17:30, Axb pisze:
>>>>>>> On 03/08/2017 04:55 PM, marcin@mejor.pl wrote:
>>>>>>>> W dniu 08.03.2017 o 16:33, Axb pisze:
>>>>>>>>> On 03/08/2017 04:16 PM, marcin@mejor.pl wrote:
>>>>>>>>>> W dniu 08.03.2017 o 16:06, Axb pisze:
>>>>>>>>>>> On 03/08/2017 03:58 PM, marcin@mejor.pl wrote:
>>>>>>>>>>>> W dniu 08.03.2017 o 15:27, Axb pisze:
>>>>>>>>>>>>> As your command below shows you're using --reqpatlength 0
>>>>>>>>>>>>>
>>>>>>>>>>>>> Start off with some sane as for example --reqpatlength 40
>>>>>>>>>>>>>
>>>>>>>>>>>>> you may also want to play with --maxtextread
>>>>>>>>>>>>> ( I use --maxtextread 8192  for FRAUD rules)
>>>>>>>>>>>>
>>>>>>>>>>>> But with --reqpatlength 10, 40, 100 or 1000 I've go no hit.
>>>>>>>>>>>> Reading
>>>>>>>>>>>> help
>>>>>>>>>>>> ( "--reqpatlength: required pattern length, in characters
>>>>>>>>>>>> (default: 0)"
>>>>>>>>>>>> ) I understand that pattern in generated rule will be longer
>>>>>>>>>>>> than
>>>>>>>>>>>> reqpatlength (shorter strings will be ignored). Do I correctly
>>>>>>>>>>>> assume
>>>>>>>>>>>> how the parameter works?
>>>>>>>>>>>
>>>>>>>>>>> --reqpatlength 40  tells seekphrases to ignore any phrases
>>>>>>>>>>> which are
>>>>>>>>>>> smaller than 40 chars
>>>>>>>>>>>
>>>>>>>>>>> just checked by line which is using
>>>>>>>>>>>  --reqpatlength 37
>>>>>>>>>>
>>>>>>>>>> Any value>0 makes that no rule is generated.
>>>>>>>>>>
>>>>>>>>>>> body __AXB_FRAUD_LAF076  /It has come to our attention that you /
>>>>>>>>>>> body __AXB_FRAUD_UPVTRT  / in order to confirm your
>>>>>>>>>>> disbursement\./
>>>>>>>>>>> body __AXB_FRAUD_NOFUX2  / approval, your funds will be deposited
>>>>>>>>>>> directly into your /
>>>>>>>>>>> body __AXB_FRAUD_Z4ZZ7D  / in order to accept your
>>>>>>>>>>> disbursement\./
>>>>>>>>>>> body __AXB_FRAUD_CUXJ6X  / approval, your funds will be direct
>>>>>>>>>>> deposited
>>>>>>>>>>> into your /
>>>>>>>>>>> body __AXB_FRAUD_NHWXKL  /: You Are Eligible to Receive Funds
>>>>>>>>>>> up to
>>>>>>>>>>> \$.,000\. /
>>>>>>>>>>>
>>>>>>>>>>> hard to guess what is not working on your side without full
>>>>>>>>>>> insight
>>>>>>>>>>
>>>>>>>>>> What can I do to help more? Should I share all_w.h and all_w.s
>>>>>>>>>> files?
>>>>>>>>>
>>>>>>>>> before we go that way pls answer these questions
>>>>>>>>>
>>>>>>>>> how many spams/hams are you processing?
>>>>>>>>
>>>>>>>> ham: ~1400
>>>>>>>> spam: ~8200
>>>>>>>>
>>>>>>>>> do you have a file named assemble.state ? if yes, how large?
>>>>>>>>
>>>>>>>> Yes, I've got this file, it has ~9MB size.
>>>>>>>>
>>>>>>>>> and pls zip & send me the full script you're using to generate the
>>>>>>>>> rules, OFFLIST! do NOT post to list
>>>>>>>>
>>>>>>>> Ok, I'll choose tar.bz2 ;)
>>>>>>>> Thanks for help.
>>>>>>>
>>>>>>> replying on list as much as I can so it's  archived FTR
>>>>>>>
>>>>>>> first thin I see is that your logs do not contain a list of rules
>>>>>>> which
>>>>>>> hit on each message.
>>>>>>>
>>>>>>> for example my "w.s" file has lines which look like:
>>>>>>>
>>>>>>>  53
>>>>>>> /home/mc/Maildir/cur/1487823401.M695422P29583.ruler,S=7602,W=7747:2,
>>>>>>> ADVANCE_FEE_2_NEW_MONEY,ADVANCE_FEE_3_NEW,ADVANCE_FEE_3_NEW_MONEY,ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,AXB_XM2600,AXB_XMAILER_MIMEOLE_OL_024C2,CM_XRCVD_VOOZER4,DEAR_WINNER,FORGED_MUA_OUTLOOK,FROM_MISSPACED,FROM_MISSP_MSFT,FROM_MISSP_REPLYTO,FROM_MISSP_URI,FSL_419_FP1,FSL_CTYPE_WIN1251,FSL_MISSP_REPLYTO,FSL_NEW_HELO_USER,FSL_RCVD_USER,FSL_UA,FSL_XM_419,HK_NAME_MR_MRS,LOTS_OF_MONEY,LOTTO_DEPT,MONEY_FRAUD_3,MONEY_FRAUD_5,MONEY_FROM_MISSP,MSOE_MID_WRONG_CASE,NSL_RCVD_HELO_USER,TO_NO_BRKTS_FROM_MSSP,T_AXB_XM2600,T_BIG_HEADERS_5K,T_CM_XRCVD_VOOZER4,T_FSL_FREEMAIL_1,T_FSL_HELO_NON_FQDN_2,T_HK_MUCHMONEY,T_LOTTO_AGENT,T_SINGLE_HEADER_1K,T_TO_NO_BRKTS_MSFT,__419_FROM_SIG,__ADVANCE_FEE_2_NEW,__ADVANCE_FEE_2_NEW_MONEY,__ADVANCE_FEE_3_NEW,__ADVANCE_FEE_3_NEW_MONEY,__ADVANCE_FEE_4_NEW,__ADVANCE_FEE_4_NEW_MONEY,__ADVANCE_FEE_5_NEW,__ADVANCE_FEE_5_NEW_MONEY,__AFF_LOTTERY,__ANY_OUTLOOK_MUA,__ANY_TEXT_ATTACH,__ANY_TEXT_ATTACH_DOC,__AXB_MO_OL_024C2,__AXB_MO_OL_D8ACC,__AXB_XM_OL_024C2,__AXB_XM_OL_080C4,__AXB_XM_OL_424A6,__AXB_XM_OL_B9D6C,__BOUNCE_RPATH_NULL,__CONGRADULAT,__CT,__CTE,__CTYPE_CHARSET_QUOTED,__CT_TEXT_PLAIN,__DOS_HAS_ANY_URI,__DOS_RCVD_THU,__DOS_RCVD_WED,__DOS_RELAYED_EXT,__FB_CONGRADS,__FH_HAS_XMSMAIL,__FH_HAS_XPRIORITY,__FORGED_OE,__FRAUD_DBI,__FRAUD_FCW,__FROM_FULL_NAME,__FROM_MISSPACED,__FROM_MISSP_REPLYTO,__FROM_MISSP_URI,__FROM_RUNON,__FSL_419_1,__FSL_419_2,__FSL_419_3,__FSL_419_4,__FSL_419_5,__FSL_HELO_USER_1,__FSL_HELO_USER_3,__FSL_UA_2,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_DATE,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MIMEOLE,__HAS_MSGID,__HAS_MSMAIL_PRI,__HAS_RCVD,__HAS_REPLY_TO,__HAS_SUBJECT,__HAS_TO,__HAS_URI,__HAS_XMAIL,__HAS_X_MAILER,__HK_NAME_MR_MRS,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_RELAY_NO_AUTH,__LOTSA_MONEY_04,__LOTTO_ADMITS,__LOTTO_ADMITS_1,__LOTTO_WIN_01,__MIMEOLE_MS,__MIME_VERSION,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__MONEY_FRAUD,__MONEY_FRAUD_3,__MONEY_FRAUD_5,__MONEY_LOTTERY,__MSGI
D_OK_DIGITS,__MSOE_MID_WRONG_CASE,__M_NOTIFIC,__NAKED_TO,__NONEMPTY_BODY,__NO_INR_YES_REF,__OE_MUA,__RCVD_VIA_APNIC_E,__RCVD_VIA_ARIN_E,__RCVD_VIA_RIPE,__RCVD_VIA_RIPE_E,__RDNS_SHORT,__REPLYTO_EXISTS,__REPLY_FREEMAIL,__SANE_MSGID,__SARE_FRAUD_BARRISTER,__SINGLE_HEADER_1K,__SUBJ_2UPPER,__SUBJ_4LOWER,__SUBJ_HAS_WORDS,__SUBJ_NOT_SHORT,__TOCC_EXISTS,__TO_NO_ARROWS_R,__TO_NO_BRKTS_FROM_MSSP,__TO_NO_BRKTS_FROM_RUNON,__TO_NO_BRKTS_MSFT,__TO_NO_BRKTS_NOTLIST,__TVD_BODY,__TVD_MIME_ATT_TP,__URI_MAILTO,__XM_MSOE6,__XM_MS_IN_GENERAL,__XM_OUTLOOK_EXPRESS,__XPRIO,__YOU_WON,__YOU_WON_01,__YOU_WON_02,__YOU_WON_SOMTIN,__hk_million,__hk_win_1,__hk_win_5,__hk_win_6,__hk_win_b
>>>>>>>
>>>>>>>
>>>>>>> time=0,scantime=0,format=f,reuse=no,set=0
>>>>>>>
>>>>>>> so apparently your masschecker is not seeing rules.
>>>>>>>
>>>>>>> I don't use --cache &  --cachedir (don't remember why) - for starters
>>>>>>> maybe remove
>>>>>>
>>>>>> I started without cache.
>>>>>>
>>>>>>> I have  --cf='use_bayes 0' (speeds up processing) and make sure
>>>>>>> you use
>>>>>>>   --cf='required_score 5'
>>>>>>>
>>>>>>> you'll have to play with your setup till your logs show SA rule hits.
>>>>>>
>>>>>> Therea are no SA rules because parameter "-C=/dev/null" is set.
>>>>>>
>>>>>> I don't understand something. Why do I need to check
>>>>>> mails-that-i-classified-as-spam-or-ham against rules? If I understand
>>>>>> how creating auto rules works masscheck only dumps strings from ham
>>>>>> and
>>>>>> spam.
>>>>>
>>>>> the routine is supposed to create rules based from msgs in your spam
>>>>> folder and needs the ham folder to counterweight against potential FPs
>>>>> so for example, you don't start producing rules based on phrases in
>>>>> disclaimers.
>>>>>
>>>>> in the log, each line starts with Y/N and a score - not sure how
>>>>> necessary it is, I've always had it that way and it "works for me"
>>>>>
>>>>>> And next seek-phrases-in-log should create rules using found strings.
>>>>>> I'm using script from svn with some changes in path. So I assumed that
>>>>>> it should be more or less working:)
>>>>>
>>>>> a wise man once said: "to assume is not to know"
>>>>> why not try avoiding modifications till you get some usefull results
>>>>> and
>>>>> the start doing mods, one at a time.
>>>>
>>>> I just modified "run" script, other perl scripts are untouched.
>>>>
>>>>>> Btw, I removed -C=/dev/null; rule hits are in the logs, but
>>>>>> seek-phrases-in-log still returns no rules if I set --reqpatlength to a
>>>>>> non-zero value.
>>>>>
>>>>> I have no idea.
>>>>> I'll send you a modified seek-phrases-in-log (offlist) for you to
>>>>> try...
>>>>
>>>> I've got two pieces of news, bad and good.
>>>> The good news is that your version of the script works!
>>>> The bad news is that the script in the official repo doesn't work.
>>>> Bugzilla?
>>>
>>> I see what is going on. The variable maxreqpatlength isn't initialized in
>>> the original script...
>>>
>>
>>
>> Pls open a bug to track the changes for the future.
>>
>> And I've got good news :)
>> I'll rename the one we now have in SVN and commit my working version as
>> a replacement.
>
> Hmm, could it be that
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6640 isn't properly
> fixed?

ooops something went very belly up with that one.

I'll replace and close that bug
Thanks for catching this one.

Axb




Re: seek-phrases-in-log - does it work correctly?

Posted by "marcin@mejor.pl" <ma...@mejor.pl>.
On 09.03.2017 at 16:05, Axb wrote:
> On 03/09/2017 03:11 PM, marcin@mejor.pl wrote:
>> I see what is going on. The variable maxreqpatlength isn't initialized in
>> the original script...
>>
> 
> 
> Pls open a bug to track the changes for the future.
> 
> And I've got good news :)
> I'll rename the one we now have in SVN and commit my working version as
> a replacement.

Hmm, could it be that
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6640 isn't properly
fixed?


Re: seek-phrases-in-log - does it work correctly?

Posted by Axb <ax...@gmail.com>.
On 03/09/2017 03:11 PM, marcin@mejor.pl wrote:
> I see what is going on. The variable maxreqpatlength isn't initialized in
> the original script...
>


Pls open a bug to track the changes for the future.

And I've got good news :)
I'll rename the one we now have in SVN and commit my working version as 
a replacement.

Thx

Re: seek-phrases-in-log - does it work correctly?

Posted by "marcin@mejor.pl" <ma...@mejor.pl>.
On 09.03.2017 at 15:05, marcin@mejor.pl wrote:
> I've got two pieces of news, bad and good.
> The good news is that your version of the script works!
> The bad news is that the script in the official repo doesn't work.
> Bugzilla?

I see what is going on. The variable maxreqpatlength isn't initialized in
the original script...
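The following is a minimal Python sketch of the failure mode described in this thread. It combines a minimum-length filter in the spirit of --reqpatlength ("ignore any phrases shorter than N chars") with an upper bound that was never initialized. The function names and the exact filtering logic are hypothetical because the actual seek-phrases-in-log code is not shown here. The sketch only mimics Perl, which treats an undefined scalar as 0 in a numeric comparison. That is one plausible way an uninitialized maxreqpatlength could reject every candidate once a non-zero --reqpatlength switches the filter on.

```python
def filter_phrases(phrases, reqpatlength=0, maxreqpatlength=None):
    """Hypothetical length filter with an uninitialized upper bound.

    maxreqpatlength=None stands in for an undefined Perl variable,
    which compares as 0 in numeric context.
    """
    if not reqpatlength:
        return list(phrases)            # 0 means "no length filtering"
    upper = maxreqpatlength or 0        # undef -> 0, as Perl would treat it
    return [p for p in phrases if reqpatlength <= len(p) <= upper]


def filter_phrases_fixed(phrases, reqpatlength=0, maxreqpatlength=None):
    """Same filter, but an unset upper bound means 'no upper limit'."""
    if not reqpatlength:
        return list(phrases)
    upper = maxreqpatlength if maxreqpatlength is not None else float("inf")
    return [p for p in phrases if reqpatlength <= len(p) <= upper]


# Phrases taken from the example rules posted in this thread.
phrases = [
    "Dear ",                                                         # 5 chars
    "It has come to our attention that you ",                        # 38 chars
    " approval, your funds will be deposited directly into your ",   # 59 chars
]

print(len(filter_phrases(phrases, 0)))         # 3: filter disabled
print(len(filter_phrases(phrases, 37)))        # 0: upper bound is effectively 0
print(len(filter_phrases_fixed(phrases, 37)))  # 2: only phrases of 37+ chars
```

Initializing the bound to "no limit" is the behaviour the thread's fix implies for non-zero values. With that change, phrases shorter than the requested length are dropped and everything else survives.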


Re: seek-phrases-in-log - does it work correctly?

Posted by "marcin@mejor.pl" <ma...@mejor.pl>.
On 09.03.2017 at 14:42, Axb wrote:
> On 03/09/2017 02:31 PM, marcin@mejor.pl wrote:
>> On 08.03.2017 at 17:30, Axb wrote:
>>> On 03/08/2017 04:55 PM, marcin@mejor.pl wrote:
>>>> On 08.03.2017 at 16:33, Axb wrote:
>>>>> On 03/08/2017 04:16 PM, marcin@mejor.pl wrote:
>>>>>> On 08.03.2017 at 16:06, Axb wrote:
>>>>>>> On 03/08/2017 03:58 PM, marcin@mejor.pl wrote:
>>>>>>>> On 08.03.2017 at 15:27, Axb wrote:
>>>>>>>>> As your command below shows you're using --reqpatlength 0
>>>>>>>>>
>>>>>>>>> Start off with something sane, for example --reqpatlength 40
>>>>>>>>>
>>>>>>>>> you may also want to play with --maxtextread
>>>>>>>>> ( I use --maxtextread 8192  for FRAUD rules)
>>>>>>>>
>>>>>>>> But with --reqpatlength 10, 40, 100 or 1000 I've got no hits. Reading
>>>>>>>> the help ("--reqpatlength: required pattern length, in characters
>>>>>>>> (default: 0)"), I understand that the pattern in a generated rule will
>>>>>>>> be longer than reqpatlength (shorter strings will be ignored). Am I
>>>>>>>> right about how the parameter works?
>>>>>>>
>>>>>>> --reqpatlength 40 tells seekphrases to ignore any phrases which are
>>>>>>> shorter than 40 chars
>>>>>>>
>>>>>>> just checked my line, which is using
>>>>>>>  --reqpatlength 37
>>>>>>
>>>>>> Any value > 0 results in no rules being generated.
>>>>>>
>>>>>>> body __AXB_FRAUD_LAF076  /It has come to our attention that you /
>>>>>>> body __AXB_FRAUD_UPVTRT  / in order to confirm your disbursement\./
>>>>>>> body __AXB_FRAUD_NOFUX2  / approval, your funds will be deposited directly into your /
>>>>>>> body __AXB_FRAUD_Z4ZZ7D  / in order to accept your disbursement\./
>>>>>>> body __AXB_FRAUD_CUXJ6X  / approval, your funds will be direct deposited into your /
>>>>>>> body __AXB_FRAUD_NHWXKL  /: You Are Eligible to Receive Funds up to \$.,000\. /
>>>>>>>
>>>>>>> hard to guess what is not working on your side without full insight
>>>>>>
>>>>>> What can I do to help more? Should I share all_w.h and all_w.s files?
>>>>>
>>>>> before we go that way pls answer these questions
>>>>>
>>>>> how many spams/hams are you processing?
>>>>
>>>> ham: ~1400
>>>> spam: ~8200
>>>>
>>>>> do you have a file named assemble.state ? if yes, how large?
>>>>
>>>> Yes, I've got this file; it's about 9MB in size.
>>>>
>>>>> and pls zip & send me the full script you're using to generate the
>>>>> rules, OFFLIST! do NOT post to list
>>>>
>>>> Ok, I'll choose tar.bz2 ;)
>>>> Thanks for help.
>>>
>>> replying on list as much as I can so it's archived FTR
>>>
>>> first thing I see is that your logs do not contain a list of rules which
>>> hit on each message.
>>>
>>> for example my "w.s" file has lines which look like:
>>>
>>>  53 /home/mc/Maildir/cur/1487823401.M695422P29583.ruler,S=7602,W=7747:2,
>>> ADVANCE_FEE_2_NEW_MONEY,ADVANCE_FEE_3_NEW,ADVANCE_FEE_3_NEW_MONEY,ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,AXB_XM2600,AXB_XMAILER_MIMEOLE_OL_024C2,CM_XRCVD_VOOZER4,DEAR_WINNER,FORGED_MUA_OUTLOOK,FROM_MISSPACED,FROM_MISSP_MSFT,FROM_MISSP_REPLYTO,FROM_MISSP_URI,FSL_419_FP1,FSL_CTYPE_WIN1251,FSL_MISSP_REPLYTO,FSL_NEW_HELO_USER,FSL_RCVD_USER,FSL_UA,FSL_XM_419,HK_NAME_MR_MRS,LOTS_OF_MONEY,LOTTO_DEPT,MONEY_FRAUD_3,MONEY_FRAUD_5,MONEY_FROM_MISSP,MSOE_MID_WRONG_CASE,NSL_RCVD_HELO_USER,TO_NO_BRKTS_FROM_MSSP,T_AXB_XM2600,T_BIG_HEADERS_5K,T_CM_XRCVD_VOOZER4,T_FSL_FREEMAIL_1,T_FSL_HELO_NON_FQDN_2,T_HK_MUCHMONEY,T_LOTTO_AGENT,T_SINGLE_HEADER_1K,T_TO_NO_BRKTS_MSFT,__419_FROM_SIG,__ADVANCE_FEE_2_NEW,__ADVANCE_FEE_2_NEW_MONEY,__ADVANCE_FEE_3_NEW,__ADVANCE_FEE_3_NEW_MONEY,__ADVANCE_FEE_4_NEW,__ADVANCE_FEE_4_NEW_MONEY,__ADVANCE_FEE_5_NEW,__ADVANCE_FEE_5_NEW_MONEY,__AFF_LOTTERY,__ANY_OUTLOOK_MUA,__ANY_TEXT_ATTACH,__ANY_TEXT_ATTACH_DOC,__AXB_MO_OL_024C2,__AXB_MO_OL_D8ACC,__AXB_XM_OL_024C2,__AXB_XM_OL_080C4,__AXB_XM_OL_424A6,__AXB_XM_OL_B9D6C,__BOUNCE_RPATH_NULL,__CONGRADULAT,__CT,__CTE,__CTYPE_CHARSET_QUOTED,__CT_TEXT_PLAIN,__DOS_HAS_ANY_URI,__DOS_RCVD_THU,__DOS_RCVD_WED,__DOS_RELAYED_EXT,__FB_CONGRADS,__FH_HAS_XMSMAIL,__FH_HAS_XPRIORITY,__FORGED_OE,__FRAUD_DBI,__FRAUD_FCW,__FROM_FULL_NAME,__FROM_MISSPACED,__FROM_MISSP_REPLYTO,__FROM_MISSP_URI,__FROM_RUNON,__FSL_419_1,__FSL_419_2,__FSL_419_3,__FSL_419_4,__FSL_419_5,__FSL_HELO_USER_1,__FSL_HELO_USER_3,__FSL_UA_2,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_DATE,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MIMEOLE,__HAS_MSGID,__HAS_MSMAIL_PRI,__HAS_RCVD,__HAS_REPLY_TO,__HAS_SUBJECT,__HAS_TO,__HAS_URI,__HAS_XMAIL,__HAS_X_MAILER,__HK_NAME_MR_MRS,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_RELAY_NO_AUTH,__LOTSA_MONEY_04,__LOTTO_ADMITS,__LOTTO_ADMITS_1,__LOTTO_WIN_01,__MIMEOLE_MS,__MIME_VERSION,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__MONEY_FRAUD,__MONEY_FRAUD_3,__MONEY_FRAUD_5,__MONEY_LOTTERY,__MSGID_OK_DIGITS,__MSOE_MID_WRONG_CASE,__M_NOTIFIC,__NAKED_TO,__NONEMPTY_BODY,__NO_INR_YES_REF,__OE_MUA,__RCVD_VIA_APNIC_E,__RCVD_VIA_ARIN_E,__RCVD_VIA_RIPE,__RCVD_VIA_RIPE_E,__RDNS_SHORT,__REPLYTO_EXISTS,__REPLY_FREEMAIL,__SANE_MSGID,__SARE_FRAUD_BARRISTER,__SINGLE_HEADER_1K,__SUBJ_2UPPER,__SUBJ_4LOWER,__SUBJ_HAS_WORDS,__SUBJ_NOT_SHORT,__TOCC_EXISTS,__TO_NO_ARROWS_R,__TO_NO_BRKTS_FROM_MSSP,__TO_NO_BRKTS_FROM_RUNON,__TO_NO_BRKTS_MSFT,__TO_NO_BRKTS_NOTLIST,__TVD_BODY,__TVD_MIME_ATT_TP,__URI_MAILTO,__XM_MSOE6,__XM_MS_IN_GENERAL,__XM_OUTLOOK_EXPRESS,__XPRIO,__YOU_WON,__YOU_WON_01,__YOU_WON_02,__YOU_WON_SOMTIN,__hk_million,__hk_win_1,__hk_win_5,__hk_win_6,__hk_win_b
>>>
>>> time=0,scantime=0,format=f,reuse=no,set=0
>>>
>>> so apparently your masschecker is not seeing rules.
>>>
>>> I don't use --cache & --cachedir (don't remember why), so for starters
>>> maybe remove them.
>>
>> I started without cache.
>>
>>> I have --cf='use_bayes 0' (speeds up processing), and make sure you use
>>>   --cf='required_score 5'
>>>
>>> you'll have to play with your setup till your logs show SA rule hits.
>>
>> There are no SA rules because the parameter "-C=/dev/null" is set.
>>
>> I don't understand something. Why do I need to check
>> mails-that-i-classified-as-spam-or-ham against rules? If I understand
>> correctly how auto rule creation works, masscheck only dumps strings from
>> ham and spam.
> 
> the routine is supposed to create rules based on msgs in your spam
> folder and needs the ham folder as a counterweight against potential FPs,
> so that, for example, you don't start producing rules based on phrases in
> disclaimers.
> 
> in the log, each line starts with Y/N and a score - not sure how
> necessary it is, I've always had it that way and it "works for me"
> 
>> And then seek-phrases-in-log should create rules from the strings found.
>> I'm using the script from svn with some path changes, so I assumed that
>> it should be more or less working :)
> 
> a wise man once said: "to assume is not to know"
> why not avoid modifications until you get some useful results, and
> then start doing mods, one at a time.

I only modified the "run" script; the other Perl scripts are untouched.

>> Btw, I removed -C=/dev/null; rule hits now show up in the logs, but
>> seek-phrases-in-log still returns no rules if I set --reqpatlength to a
>> non-zero value.
> 
> I have no idea.
> I'll send you a modified seek-phrases-in-log (offlist) for you to try...

I've got two pieces of news, a good one and a bad one.
The good news is that your version of the script works!
The bad news is that the script in the official repo doesn't.
Should I report it in Bugzilla?

Thanks


Re: seek-phrases-in-log - does it work correctly?

Posted by Axb <ax...@gmail.com>.
On 03/09/2017 02:31 PM, marcin@mejor.pl wrote:
> On 08.03.2017 at 17:30, Axb wrote:
>> [...]
>> I don't use --cache & --cachedir (don't remember why); for starters,
>> maybe remove them
>
> I started without cache.
>
>> I have --cf='use_bayes 0' (speeds up processing), and make sure you use
>> --cf='required_score 5'
>>
>> you'll have to play with your setup till your logs show SA rule hits.
>
> There are no SA rules because the parameter "-C=/dev/null" is set.
>
> There is something I don't understand. Why do I need to check
> mails-that-I-classified-as-spam-or-ham against rules? If I understand
> how creating auto rules works, masscheck only dumps strings from ham and
> spam.

the routine is supposed to create rules based on msgs in your spam
folder, and it needs the ham folder as a counterweight against potential
FPs, so that, for example, you don't start producing rules based on
phrases in disclaimers.
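The counterweight idea can be illustrated with a small sketch. It rests on an assumption: that the three numbers on the "#" lines of the seek-phrases-in-log output in the original post are an S/O ratio, the spam hit percentage, and the ham hit percentage, with S/O computed as spam% / (spam% + ham%) in the style of SpamAssassin's hit-frequencies reports. The helpers hit_stats and keep_phrase, the 0.95 cut-off, and the counts are all hypothetical and are not taken from the script.

```python
def hit_stats(spam_hits, n_spam, ham_hits, n_ham):
    """Return (S/O, spam %, ham %) for one candidate phrase.

    Assumption: S/O = spam% / (spam% + ham%), hit-frequencies style;
    this is not copied from seek-phrases-in-log.
    """
    spam_pct = 100.0 * spam_hits / n_spam
    ham_pct = 100.0 * ham_hits / n_ham
    total = spam_pct + ham_pct
    so = spam_pct / total if total else 0.0
    return so, spam_pct, ham_pct


def keep_phrase(spam_hits, n_spam, ham_hits, n_ham, min_so=0.95):
    """Hypothetical filter: drop phrases that also show up in ham."""
    so, _, _ = hit_stats(spam_hits, n_spam, ham_hits, n_ham)
    return so >= min_so


# Spam-only phrase: 11 of 15 spams, 0 of 1400 hams (made-up counts).
so, sp, hp = hit_stats(11, 15, 0, 1400)
print(f"# {so:6.3f} {sp:7.3f} {hp:7.3f}")

# Disclaimer-like phrase that also hits ham: S/O = 30 / (30 + 20) = 0.6
print(keep_phrase(30, 100, 20, 100))
print(keep_phrase(11, 15, 0, 1400))
```

A phrase that is frequent in spam but also common in ham, such as disclaimer text, ends up with a low S/O and is dropped. A phrase that never appears in ham keeps an S/O of 1.0.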

in the log, each line starts with Y/N and a score - not sure how 
necessary it is, I've always had it that way and it "works for me"

> Then seek-phrases-in-log should create rules from the strings it finds.
> I'm using the script from svn with some path changes, so I assumed that
> it should be more or less working :)

a wise man once said: "to assume is not to know"
why not avoid modifications until you get some useful results, and
then start doing mods, one at a time.

> Btw, I removed -C=/dev/null; rule hits now show up in the logs, but
> seek-phrases-in-log still returns no rules if I set --reqpatlength to a
> non-zero value.

I have no idea.
I'll send you a modified seek-phrases-in-log (offlist) for you to try...

Axb




Re: seek-phrases-in-log - does it work correctly?

Posted by "marcin@mejor.pl" <ma...@mejor.pl>.
On 08.03.2017 at 17:30, Axb wrote:
> On 03/08/2017 04:55 PM, marcin@mejor.pl wrote:
>> [...]
> 
> replying on list as much as I can so it's archived FTR
>
> the first thing I see is that your logs do not contain a list of rules
> which hit on each message.
> 
> for example my "w.s" file has lines which look like:
> 
>  53 /home/mc/Maildir/cur/1487823401.M695422P29583.ruler,S=7602,W=7747:2,
> ADVANCE_FEE_2_NEW_MONEY,ADVANCE_FEE_3_NEW,[...],__hk_win_6,__hk_win_b
> time=0,scantime=0,format=f,reuse=no,set=0
> 
> so apparently your masschecker is not seeing rules.
> 
> I don't use --cache & --cachedir (don't remember why); for starters,
> maybe remove them

I started without cache.

> I have --cf='use_bayes 0' (speeds up processing), and make sure you use
> --cf='required_score 5'
> 
> you'll have to play with your setup till your logs show SA rule hits.

There are no SA rules because the parameter "-C=/dev/null" is set.

There is something I don't understand. Why do I need to check
mails-that-I-classified-as-spam-or-ham against rules? If I understand
how creating auto rules works, masscheck only dumps strings from ham and
spam. Then seek-phrases-in-log should create rules from the strings it finds.
I'm using the script from svn with some path changes, so I assumed that
it should be more or less working :)
Btw, I removed -C=/dev/null; rule hits now show up in the logs, but
seek-phrases-in-log still returns no rules if I set --reqpatlength to a
non-zero value.

Thanks!



Re: seek-phrases-in-log - does it work correctly?

Posted by Axb <ax...@gmail.com>.
On 03/08/2017 04:55 PM, marcin@mejor.pl wrote:
> On 08.03.2017 at 16:33, Axb wrote:
>> On 03/08/2017 04:16 PM, marcin@mejor.pl wrote:
>>> [...]
>>> What can I do to help more? Should I share all_w.h and all_w.s files?
>>
>> before we go that way pls answer these questions
>>
>> how many spams/hams are you processing?
>
> ham: ~1400
> spam: ~8200
>
>> do you have a file named assemble.state ? if yes, how large?
>
> Yes, I have this file; it's about 9 MB.
>
>> and pls zip & send me the full script you're using to generate the
>> rules, OFFLIST! do NOT post to list
>
> Ok, I'll choose tar.bz2 ;)
> Thanks for help.

replying on list as much as I can so it's archived FTR

the first thing I see is that your logs do not contain a list of rules
which hit on each message.

for example my "w.s" file has lines which look like:

  53 /home/mc/Maildir/cur/1487823401.M695422P29583.ruler,S=7602,W=7747:2, ADVANCE_FEE_2_NEW_MONEY,ADVANCE_FEE_3_NEW,ADVANCE_FEE_3_NEW_MONEY,ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,AXB_XM2600,AXB_XMAILER_MIMEOLE_OL_024C2,CM_XRCVD_VOOZER4,DEAR_WINNER,FORGED_MUA_OUTLOOK,FROM_MISSPACED,FROM_MISSP_MSFT,FROM_MISSP_REPLYTO,FROM_MISSP_URI,FSL_419_FP1,FSL_CTYPE_WIN1251,FSL_MISSP_REPLYTO,FSL_NEW_HELO_USER,FSL_RCVD_USER,FSL_UA,FSL_XM_419,HK_NAME_MR_MRS,LOTS_OF_MONEY,LOTTO_DEPT,MONEY_FRAUD_3,MONEY_FRAUD_5,MONEY_FROM_MISSP,MSOE_MID_WRONG_CASE,NSL_RCVD_HELO_USER,TO_NO_BRKTS_FROM_MSSP,T_AXB_XM2600,T_BIG_HEADERS_5K,T_CM_XRCVD_VOOZER4,T_FSL_FREEMAIL_1,T_FSL_HELO_NON_FQDN_2,T_HK_MUCHMONEY,T_LOTTO_AGENT,T_SINGLE_HEADER_1K,T_TO_NO_BRKTS_MSFT,__419_FROM_SIG,__ADVANCE_FEE_2_NEW,__ADVANCE_FEE_2_NEW_MONEY,__ADVANCE_FEE_3_NEW,__ADVANCE_FEE_3_NEW_MONEY,__ADVANCE_FEE_4_NEW,__ADVANCE_FEE_4_NEW_MONEY,__ADVANCE_FEE_5_NEW,__ADVANCE_FEE_5_NEW_MONEY,__AFF_LOTTERY,__ANY_OUTLOOK_MUA,__ANY_TEXT_ATTACH,__ANY_TEXT_ATTACH_DOC,__AXB_MO_OL_024C2,__AXB_MO_OL_D8ACC,__AXB_XM_OL_024C2,__AXB_XM_OL_080C4,__AXB_XM_OL_424A6,__AXB_XM_OL_B9D6C,__BOUNCE_RPATH_NULL,__CONGRADULAT,__CT,__CTE,__CTYPE_CHARSET_QUOTED,__CT_TEXT_PLAIN,__DOS_HAS_ANY_URI,__DOS_RCVD_THU,__DOS_RCVD_WED,__DOS_RELAYED_EXT,__FB_CONGRADS,__FH_HAS_XMSMAIL,__FH_HAS_XPRIORITY,__FORGED_OE,__FRAUD_DBI,__FRAUD_FCW,__FROM_FULL_NAME,__FROM_MISSPACED,__FROM_MISSP_REPLYTO,__FROM_MISSP_URI,__FROM_RUNON,__FSL_419_1,__FSL_419_2,__FSL_419_3,__FSL_419_4,__FSL_419_5,__FSL_HELO_USER_1,__FSL_HELO_USER_3,__FSL_UA_2,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_DATE,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MIMEOLE,__HAS_MSGID,__HAS_MSMAIL_PRI,__HAS_RCVD,__HAS_REPLY_TO,__HAS_SUBJECT,__HAS_TO,__HAS_URI,__HAS_XMAIL,__HAS_X_MAILER,__HK_NAME_MR_MRS,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_RELAY_NO_AUTH,__LOTSA_MONEY_04,__LOTTO_ADMITS,__LOTTO_ADMITS_1,__LOTTO_WIN_01,__MIMEOLE_MS,__MIME_VERSION,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__MONEY_FRAUD,__MONEY_FRAUD_3,__MONEY_FRAUD_5,__MONEY_LOTTERY,__MSGID_OK_DIGITS,__MSOE_MID_WRONG_CASE,__M_NOTIFIC,__NAKED_TO,__NONEMPTY_BODY,__NO_INR_YES_REF,__OE_MUA,__RCVD_VIA_APNIC_E,__RCVD_VIA_ARIN_E,__RCVD_VIA_RIPE,__RCVD_VIA_RIPE_E,__RDNS_SHORT,__REPLYTO_EXISTS,__REPLY_FREEMAIL,__SANE_MSGID,__SARE_FRAUD_BARRISTER,__SINGLE_HEADER_1K,__SUBJ_2UPPER,__SUBJ_4LOWER,__SUBJ_HAS_WORDS,__SUBJ_NOT_SHORT,__TOCC_EXISTS,__TO_NO_ARROWS_R,__TO_NO_BRKTS_FROM_MSSP,__TO_NO_BRKTS_FROM_RUNON,__TO_NO_BRKTS_MSFT,__TO_NO_BRKTS_NOTLIST,__TVD_BODY,__TVD_MIME_ATT_TP,__URI_MAILTO,__XM_MSOE6,__XM_MS_IN_GENERAL,__XM_OUTLOOK_EXPRESS,__XPRIO,__YOU_WON,__YOU_WON_01,__YOU_WON_02,__YOU_WON_SOMTIN,__hk_million,__hk_win_1,__hk_win_5,__hk_win_6,__hk_win_b time=0,scantime=0,format=f,reuse=no,set=0

so apparently your masschecker is not seeing rules.
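A small parser can quickly check whether a mass-check log carries any rule hits at all. This is only a sketch. The field layout it assumes (an optional Y/N marker, a score, the message path, comma-separated rule names, then time=... metadata) is inferred from the example line above and from the remark that each line starts with Y/N and a score. It is not taken from the mass-check documentation. The helper names and the "empty" sample entry are hypothetical.

```python
def parse_masscheck_line(line):
    """Split one mass-check log line into its fields.

    Assumed layout (inferred from this thread, not from mass-check docs):
        [Y|N|.] <score> <path> [RULE1,RULE2,...] time=...,scantime=...
    """
    tokens = line.split()
    flag = None
    if tokens and tokens[0] in ("Y", "N", "."):
        flag = tokens.pop(0)
    score = float(tokens[0])
    path = tokens[1]
    rest = tokens[2:]
    meta = {}
    if rest and rest[-1].startswith("time="):
        # "time=0,scantime=0,..." -> {"time": "0", "scantime": "0", ...}
        meta = dict(kv.partition("=")[::2] for kv in rest.pop().split(","))
    # Whatever is left between the path and the metadata is the rule list.
    rules = [r for r in ",".join(rest).split(",") if r]
    return {"flag": flag, "score": score, "path": path,
            "rules": rules, "meta": meta}


def lines_without_rules(lines):
    """Return the paths of log entries that list no rule hits."""
    missing = []
    for line in lines:
        if not line.strip():
            continue
        entry = parse_masscheck_line(line)
        if not entry["rules"]:
            missing.append(entry["path"])
    return missing


sample = ("  53 /home/mc/Maildir/cur/1487823401.M695422P29583.ruler,"
          "S=7602,W=7747:2, ADVANCE_FEE_2_NEW_MONEY,DEAR_WINNER,__YOU_WON "
          "time=0,scantime=0,format=f,reuse=no,set=0")
# Hypothetical entry from a run where the masschecker saw no rules.
empty = "Y 0 /tmp/msg1 time=0,scantime=0,format=f,reuse=no,set=0"

print(parse_masscheck_line(sample)["rules"])
print(lines_without_rules([sample, empty]))
```

If most paths in a w.s or w.h file come back from lines_without_rules, the mass-check run was most likely not loading any rules (for example with -C=/dev/null).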

I don't use --cache & --cachedir (don't remember why); for starters,
maybe remove them

I have --cf='use_bayes 0' (speeds up processing), and make sure you use
--cf='required_score 5'

you'll have to play with your setup till your logs show SA rule hits.







Re: seek-phrases-in-log - does it work correctly?

Posted by "marcin@mejor.pl" <ma...@mejor.pl>.
On 08.03.2017 at 16:33, Axb wrote:
> On 03/08/2017 04:16 PM, marcin@mejor.pl wrote:
>> [...]
>> What can I do to help more? Should I share all_w.h and all_w.s files?
> 
> before we go that way pls answer these questions
> 
> how many spams/hams are you processing?

ham: ~1400
spam: ~8200

> do you have a file named assemble.state ? if yes, how large?

Yes, I have this file; it's about 9 MB.

> and pls zip & send me the full script you're using to generate the
> rules, OFFLIST! do NOT post to list

Ok, I'll choose tar.bz2 ;)
Thanks for help.


Re: seek-phrases-in-log - does it work correctly?

Posted by Axb <ax...@gmail.com>.
On 03/08/2017 04:16 PM, marcin@mejor.pl wrote:
> On 08.03.2017 at 16:06, Axb wrote:
>> [...]
>
> Any value > 0 means that no rules are generated.
>
>> [...]
>
> What can I do to help more? Should I share all_w.h and all_w.s files?

before we go that way pls answer these questions

how many spams/hams are you processing?

do you have a file named assemble.state ? if yes, how large?

and pls zip & send me the full script you're using to generate the 
rules, OFFLIST! do NOT post to list



Re: seek-phrases-in-log - does it work correctly?

Posted by "marcin@mejor.pl" <ma...@mejor.pl>.
On 08.03.2017 at 16:06, Axb wrote:
> On 03/08/2017 03:58 PM, marcin@mejor.pl wrote:
>> On 08.03.2017 at 15:27, Axb wrote:
>>> As your command below shows you're using --reqpatlength 0
>>>
>>> Start off with a sane value, for example --reqpatlength 40
>>>
>>> you may also want to play with --maxtextread
>>> (I use --maxtextread 8192 for FRAUD rules)
>>
>> But with --reqpatlength 10, 40, 100 or 1000 I get no hits. Reading the
>> help ("--reqpatlength: required pattern length, in characters (default: 0)"),
>> I understand that the pattern in a generated rule will be longer than
>> reqpatlength (shorter strings will be ignored). Do I understand
>> correctly how the parameter works?
> 
> --reqpatlength 40 tells seekphrases to ignore any phrases which are
> shorter than 40 chars
>
> just checked my own command line, which uses
> --reqpatlength 37

Any value > 0 means that no rules are generated.

> body __AXB_FRAUD_LAF076  /It has come to our attention that you /
> body __AXB_FRAUD_UPVTRT  / in order to confirm your disbursement\./
> body __AXB_FRAUD_NOFUX2  / approval, your funds will be deposited directly into your /
> body __AXB_FRAUD_Z4ZZ7D  / in order to accept your disbursement\./
> body __AXB_FRAUD_CUXJ6X  / approval, your funds will be direct deposited into your /
> body __AXB_FRAUD_NHWXKL  /: You Are Eligible to Receive Funds up to \$.,000\. /
> 
> hard to guess what is not working on your side without full insight

What can I do to help more? Should I share all_w.h and all_w.s files?


Re: seek-phrases-in-log - does it work correctly?

Posted by Axb <ax...@gmail.com>.
On 03/08/2017 03:58 PM, marcin@mejor.pl wrote:
> On 08.03.2017 at 15:27, Axb wrote:
>> As your command below shows you're using --reqpatlength 0
>>
>> Start off with a sane value, for example --reqpatlength 40
>>
>> you may also want to play with --maxtextread
>> (I use --maxtextread 8192 for FRAUD rules)
>
> But with --reqpatlength 10, 40, 100 or 1000 I get no hits. Reading the
> help ("--reqpatlength: required pattern length, in characters (default: 0)"),
> I understand that the pattern in a generated rule will be longer than
> reqpatlength (shorter strings will be ignored). Do I understand
> correctly how the parameter works?

--reqpatlength 40 tells seekphrases to ignore any phrases which are
shorter than 40 chars
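Read this way, the option is a simple minimum-length filter on candidate phrases. The sketch below only illustrates that reading. filter_by_length is a hypothetical name, and this thread does not establish whether the real script measures the raw phrase or the escaped regex, or where it applies the check. The sample phrases come from the rules quoted in this thread, with the regex escaping removed.

```python
def filter_by_length(phrases, reqpatlength=0):
    """Keep phrases of at least `reqpatlength` characters.

    Hypothetical illustration of "ignore any phrases which are shorter
    than N chars"; this is not the code of seek-phrases-in-log.
    """
    return [p for p in phrases if len(p) >= reqpatlength]


candidates = [
    "Dear ",
    "Re: ",
    "Good day",
    "It has come to our attention that you ",   # 38 chars
    " in order to confirm your disbursement.",  # 39 chars
]

print(filter_by_length(candidates, 0))   # every phrase survives
print(filter_by_length(candidates, 37))  # only the two long phrases survive
```

Under this reading, --reqpatlength 1 should still let short phrases such as "Dear " through. Getting no rules at all for every non-zero value, together with the "uninitialized value" warnings, therefore suggests a problem in the script rather than in the option itself.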

just checked my own command line, which uses
--reqpatlength 37

body __AXB_FRAUD_LAF076  /It has come to our attention that you /
body __AXB_FRAUD_UPVTRT  / in order to confirm your disbursement\./
body __AXB_FRAUD_NOFUX2  / approval, your funds will be deposited directly into your /
body __AXB_FRAUD_Z4ZZ7D  / in order to accept your disbursement\./
body __AXB_FRAUD_CUXJ6X  / approval, your funds will be direct deposited into your /
body __AXB_FRAUD_NHWXKL  /: You Are Eligible to Receive Funds up to \$.,000\. /

hard to guess what is not working on your side without full insight

Re: seek-phrases-in-log - does it work correctly?

Posted by "marcin@mejor.pl" <ma...@mejor.pl>.
On 08.03.2017 at 15:27, Axb wrote:
> As your command below shows you're using --reqpatlength 0
> 
> Start off with a sane value, for example --reqpatlength 40
>
> you may also want to play with --maxtextread
> (I use --maxtextread 8192 for FRAUD rules)

But with --reqpatlength 10, 40, 100 or 1000 I get no hits. Reading the
help ("--reqpatlength: required pattern length, in characters (default: 0)"),
I understand that the pattern in a generated rule will be longer than
reqpatlength (shorter strings will be ignored). Do I understand
correctly how the parameter works?



Re: seek-phrases-in-log - does it work correctly?

Posted by Axb <ax...@gmail.com>.
As your command below shows, you're using --reqpatlength 0

Start off with a sane value, for example --reqpatlength 40

you may also want to play with --maxtextread
(I use --maxtextread 8192 for FRAUD rules)

On 03/07/2017 03:33 PM, marcin@mejor.pl wrote:
> Hi!
> I'm trying to use
> masses/rule-dev/seek-phrases-in-log --reqpatlength <X>


>
> I'm not sure if it works correctly, please look:
> $ /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log --ham /home/masscheck/auto/tmp/all_w.h --spam /home/masscheck/auto/tmp/all_w.s --rules --ruleprefix __SEEK_FRAUD_ --reqpatlength 0
> Tue Mar  7 15:28:32 2017: reading /home/masscheck/auto/tmp/all_w.s...
> Tue Mar  7 15:28:32 2017: n-grams active: 637
> Tue Mar  7 15:28:32 2017: reading /home/masscheck/auto/tmp/all_w.h...
> Tue Mar  7 15:28:32 2017: n-grams active: 626
> Tue Mar  7 15:28:32 2017: filtering into message subsets...
> Tue Mar  7 15:28:32 2017: message subsets found: 10
> Tue Mar  7 15:28:32 2017: deduping and assembling regexps...
> Tue Mar  7 15:28:32 2017: working on message subset 1 (0)...
> #  1.000  73.333   0.000
> body __SEEK_FRAUD_8PS1M3  / interested in /
> body __SEEK_FRAUD_VMFZAX  / looking for /
> #  1.000  66.667   0.000
> body __SEEK_FRAUD_MUS7GX  /Dear /
> body __SEEK_FRAUD_Y2S6AV  /My name is .{0,20}, I am the personnel manager of a large International company\. Most of the work you can do from home, that is, at a distance\. Salary is \$2.00-\$5.00\./
> #  1.000  40.000   0.000
> body __SEEK_FRAUD_KPQM4S  /Re: /
> #  1.000  26.667   0.000
> body __SEEK_FRAUD_U38RDU  /Best regards\!/
> #  1.000  26.667   0.000
> body __SEEK_FRAUD_P9TXHY  /Good day/
> #  1.000  20.000   0.000
> body __SEEK_FRAUD_GDWWR6  /Have a nice day\!/
> #  1.000  13.333   0.000
> body __SEEK_FRAUD_LN5SMR  /hi\!/
>
>
> but:
>
> $ /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log --ham /home/masscheck/auto/tmp/all_w.h --spam /home/masscheck/auto/tmp/all_w.s --rules --ruleprefix __SEEK_FRAUD_ --reqpatlength 1
> Tue Mar  7 15:32:14 2017: reading /home/masscheck/auto/tmp/all_w.s...
> Tue Mar  7 15:32:14 2017: n-grams active: 637
> Tue Mar  7 15:32:14 2017: reading /home/masscheck/auto/tmp/all_w.h...
> Tue Mar  7 15:32:14 2017: n-grams active: 626
> Tue Mar  7 15:32:14 2017: filtering into message subsets...
> Tue Mar  7 15:32:14 2017: message subsets found: 10
> Tue Mar  7 15:32:14 2017: deduping and assembling regexps...
> Tue Mar  7 15:32:14 2017: working on message subset 1 (0)...
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
> Use of uninitialized value in numeric lt (<) at /home/masscheck/spamassassin-trunk//masses/rule-dev/seek-phrases-in-log line 675.
>
>
> perl-5.22.3
>
> Marcin
>