You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@spamassassin.apache.org by Henrik K <he...@hege.li> on 2019/06/07 15:33:07 UTC

The obviously different case of subtest debug flood

On Fri, Jun 07, 2019 at 07:48:56AM -0700, John Hardin wrote:
> On Fri, 7 Jun 2019, Henrik K wrote:
> 
> >Just committed a simple log suppressor for these kinds of spam..
> >
> >Jun  7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
> >Jun  7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
> >Jun  7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
> >Jun  7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
> >Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
> >Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
> >Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
> >Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
> >Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
> >Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
> >Jun  7 11:25:44.269 [1569] dbg: --- last message repeated 21 times ---
> 
> Veto doing that. That information is very useful when debugging rules.

Well the information is there.  In many places.  You are saying you are
consistently using things like spamassassin -t -D | grep __LOWER_E | wc -l
to debug your rules?

Re: The obviously different case of subtest debug flood

Posted by John Hardin <jh...@impsec.org>.

On Sat, 8 Jun 2019, Kevin A. McGrail wrote:

> On 6/8/2019 1:27 AM, Henrik K wrote:
>> On Fri, Jun 07, 2019 at 11:53:24AM -0700, John Hardin wrote:
>>> ...where we're not collapsing on solely the rule name, I'd accept that.
>> I guess this was the confusion then.  I've been talking about identical /
>> duplicate lines, as is the code.  Of course it won't collapse only on some
>> part of the line.  This same thing is a common default syslog feature, so I
>> thought it would be pretty clear.
>>
> Yes, I thought *immediately* of the syslog deduplication anti-flood
> feature and thought it was good.

Agreed. I was only stressing the point for clarity because of the 
different contexts.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Vista: because the audio experience is *far* more important than
   network throughput.
-----------------------------------------------------------------------
  2 days until the 52nd anniversary of Israel's victory in the Six-Day War

Re: The obviously different case of subtest debug flood

Posted by "Kevin A. McGrail" <km...@apache.org>.

On 6/8/2019 1:27 AM, Henrik K wrote:
> On Fri, Jun 07, 2019 at 11:53:24AM -0700, John Hardin wrote:
>> ...where we're not collapsing on solely the rule name, I'd accept that.
> I guess this was the confusion then.  I've been talking about identical /
> duplicate lines, as is the code.  Of course it won't collapse only on some
> part of the line.  This same thing is a common default syslog feature, so I
> thought it would be pretty clear.
>
Yes, I thought *immediately* of the syslog deduplication anti-flood
feature and thought it was good.

-- 
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

Re: The obviously different case of subtest debug flood

Posted by Henrik K <he...@hege.li>.

On Fri, Jun 07, 2019 at 11:53:24AM -0700, John Hardin wrote:
> 
> ...where we're not collapsing on solely the rule name, I'd accept that.

I guess this was the confusion then.  I've been talking about identical /
duplicate lines, as is the code.  Of course it won't collapse only on some
part of the line.  This same thing is a common default syslog feature, so I
thought it would be pretty clear.

Re: The obviously different case of subtest debug flood

Posted by Henrik K <he...@hege.li>.

On Fri, Jun 07, 2019 at 06:44:22PM +0300, Henrik K wrote:
> On Fri, Jun 07, 2019 at 11:39:12AM -0400, Kevin A. McGrail wrote:
> > On 6/7/2019 11:33 AM, Henrik K wrote:
> > > Well the information is there.  In many places.  You are saying you are
> > > consistently using things like spamassassin -t -D | grep __LOWER_E | wc -l
> > > to debug your rules?
> > 
> > Close.  I am consistently using spamassassin -t -D 2>&1 | grep -i -e
> > Content\ analysis -e KAM, for example and since I write mostly meta
> > rules, I get some really long and hard to read subtest debug lines.
> 
> Well the question was more for John.  I understand your previous problem,
> that's fine.
> 
> I don't understand how a line describing there are x more identical lines
> makes debugging any harder.  Just count 10+21, that's it.  It doesn't take
> more effort to read the whole spamassassin -D |less output manually than it
> does to create an exact grep | wc -l for the problem.

Ok let's take an example I can come up with...

spamassassin -t -D -L < testmsg 2>&1 |grep __LOWER_E
Jun  7 18:55:58.772 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:59.009 [8203] dbg: check: subtests=__BODY_LE_200,__BODY_TEXT_LINE(2),__DOS_RCVD_TUE,__ENV_AND_HDR_FROM_MATCH,__E_LIKE_LETTER(31),__HAS_DATE,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__KAM_BODY_LENGTH_LT_1024,__KAM_BODY_LENGTH_LT_128,__KAM_BODY_LENGTH_LT_256,__KAM_BODY_LENGTH_LT_512,__KAM_DROPBOX2,__KAM_FAKEDELIVER12,__KAM_FAKEDELIVER4,__KAM_FAKEDELIVER6,__KAM_FAKEDELIVER8,__KAM_GOOGLE2_2,__KAM_HARP3,__KAM_HAS_0_URIS,__KAM_JURY3,__KAM_LOTSOFHASH,__KAM_MAILSPLOIT2,__KAM_MULTIPLE_FROM,__KAM_PAYPAL3B,__KAM_SUBJECT_SINGLEWORD,__KAM_UPS2,__KAM_WU1,__KHOP_NO_FULL_NAME,__LCL__ENV_AND_HDR_FROM_MATCH,__LCL__KAM_BODY_LENGTH_LT_1024,__LCL__KAM_BODY_LENGTH_LT_128,__LCL__KAM_BODY_LENGTH_LT_512,__LOWER_E(31),__MISSING_REF,__MISSING_REPLY,__MSGID_OK_HOST,__MSOE_MID_WRONG_CASE,__NONEMPTY_BODY,__NOT_SPOOFED,__RB_LE_200,__SANE_MSGID,__SINGLE_WORD_LINE,__SINGLE_WORD_SUBJ,__SUBJ_SHORT,__TO_NO_ARROWS_R (Total Subtest Hits: 110 / Deduplicated Total Hits: 49)

Perhaps you are expecting to see much more 'ran body rule' lines, which is a
bit questionable practise.  Do you really want to go browse 200 identical
lines?

Would a compatible solution be mentioning later duplicates in the same debug line so it can be seen in grep? (notice the last one)

Jun  7 18:55:58.772 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Jun  7 18:55:58.773 [8203] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" [... message repeated 21 more times]
Jun  7 18:55:59.009 [8203] dbg: check: subtests=__BODY_LE_200,__BODY_TEXT_LINE(2),__DOS_RCVD_TUE,__ENV_AND_HDR_FROM_MATCH,__E_LIKE_LETTER(31),__HAS_DATE,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__KAM_BODY_LENGTH_LT_1024,__KAM_BODY_LENGTH_LT_128,__KAM_BODY_LENGTH_LT_256,__KAM_BODY_LENGTH_LT_512,__KAM_DROPBOX2,__KAM_FAKEDELIVER12,__KAM_FAKEDELIVER4,__KAM_FAKEDELIVER6,__KAM_FAKEDELIVER8,__KAM_GOOGLE2_2,__KAM_HARP3,__KAM_HAS_0_URIS,__KAM_JURY3,__KAM_LOTSOFHASH,__KAM_MAILSPLOIT2,__KAM_MULTIPLE_FROM,__KAM_PAYPAL3B,__KAM_SUBJECT_SINGLEWORD,__KAM_UPS2,__KAM_WU1,__KHOP_NO_FULL_NAME,__LCL__ENV_AND_HDR_FROM_MATCH,__LCL__KAM_BODY_LENGTH_LT_1024,__LCL__KAM_BODY_LENGTH_LT_128,__LCL__KAM_BODY_LENGTH_LT_512,__LOWER_E(31),__MISSING_REF,__MISSING_REPLY,__MSGID_OK_HOST,__MSOE_MID_WRONG_CASE,__NONEMPTY_BODY,__NOT_SPOOFED,__RB_LE_200,__SANE_MSGID,__SINGLE_WORD_LINE,__SINGLE_WORD_SUBJ,__SUBJ_SHORT,__TO_NO_ARROWS_R (Total Subtest Hits: 110 / Deduplicated Total Hits: 49)

Re: The obviously different case of subtest debug flood

Posted by Henrik K <he...@hege.li>.

On Fri, Jun 07, 2019 at 11:39:12AM -0400, Kevin A. McGrail wrote:
> On 6/7/2019 11:33 AM, Henrik K wrote:
> > Well the information is there.  In many places.  You are saying you are
> > consistently using things like spamassassin -t -D | grep __LOWER_E | wc -l
> > to debug your rules?
> 
> Close.  I am consistently using spamassassin -t -D 2>&1 | grep -i -e
> Content\ analysis -e KAM, for example and since I write mostly meta
> rules, I get some really long and hard to read subtest debug lines.

Well the question was more for John.  I understand your previous problem,
that's fine.

I don't understand how a line describing there are x more identical lines
makes debugging any harder.  Just count 10+21, that's it.  It doesn't take
more effort to read the whole spamassassin -D |less output manually than it
does to create an exact grep | wc -l for the problem.

Re: The obviously different case of subtest debug flood

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 7 Jun 2019, at 14:53, John Hardin wrote:

> Now if the hits were duplicates, and we logged something like:
>
> Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> 
> got hit: "e" (100)
>
> ...where we're not collapsing on solely the rule name, I'd accept 
> that.

FWIW, __LOWER_E specifically is this:

body            __LOWER_E       /e/

So there's no issue of hits varying at all. It's evil twin 
__E_LIKE_LETTER will hit on anything that looks like an 'e' so it does 
have diverse hits.

-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

Re: The obviously different case of subtest debug flood

Posted by John Hardin <jh...@impsec.org>.

On Fri, 7 Jun 2019, Henrik K wrote:

> On Fri, Jun 07, 2019 at 07:48:56AM -0700, John Hardin wrote:
>> On Fri, 7 Jun 2019, Henrik K wrote:
>>
>>> Just committed a simple log suppressor for these kinds of spam..
>>>
>>> Jun  7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
>>> Jun  7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
>>> Jun  7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
>>> Jun  7 11:25:44.264 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
>>> Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
>>> Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
>>> Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
>>> Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
>>> Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
>>> Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
>>> Jun  7 11:25:44.269 [1569] dbg: --- last message repeated 21 times ---
>>
>> Veto doing that. That information is very useful when debugging rules.
>
> Well the information is there.  In many places.  You are saying you are
> consistently using things like spamassassin -t -D | grep __LOWER_E | wc -l
> to debug your rules?

If I'm working on a multiple rule that is something complex (like the text 
variations in the bitcoin extortion and fraud rules) then I want to see 
all the hits and, more importantly, what hit on each.

Now if the hits were duplicates, and we logged something like:

Jun  7 11:25:44.265 [1569] dbg: rules: ran body rule __LOWER_E ======> got hit: "e" (100)

...where we're not collapsing on solely the rule name, I'd accept that.


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The ["assault weapons"] ban is the moral equivalent of banning red
   cars because they look too fast.  -- Steve Chapman, Chicago Tribune
-----------------------------------------------------------------------
  3 days until the 52nd anniversary of Israel's victory in the Six-Day War

Re: The obviously different case of subtest debug flood

Posted by "Kevin A. McGrail" <km...@apache.org>.

On 6/7/2019 11:33 AM, Henrik K wrote:
> Well the information is there.  In many places.  You are saying you are
> consistently using things like spamassassin -t -D | grep __LOWER_E | wc -l
> to debug your rules?

Close.  I am consistently using spamassassin -t -D 2>&1 | grep -i -e
Content\ analysis -e KAM, for example and since I write mostly meta
rules, I get some really long and hard to read subtest debug lines.

-- 
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171