Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2017/07/13 01:04:37 UTC

"bout u" campaign

Hi all,

Has anyone else experienced a spam campaign with any one of the
following subjects:

- sometimes enjoy it wild, how bout you?
- sometimes like it ruff, what bout you?
- sumtimes enjoy it ruff, wat bout you?

The body contains something like "wild hukups" then a phone number.

https://pastebin.com/X5xNn9RZ

It comes from AOL and other freemails, but doesn't hit much, and hits
bayes50 or lower here.

Is this a snowshoe thing? Ideas on how to stop them? I've now trained
them but I thought someone might like to see them for their own
benefit, and perhaps had ideas on a more general way of blocking
these.

What is even the point of spam with a phone number?

The ones originating from AOL are all in the
204.29.186.0/24 block. None of those IPs are on any meaningful blacklist,
and they all have a 90+ senderscore.

I'm sure the campaign will change soon, but I thought there was
something more general we could look for the next time...

Re: "bout u" campaign

Posted by Alex <my...@gmail.com>.

Hi,

On Thu, Jul 13, 2017 at 12:01 PM, Dianne Skoll <df...@roaringpenguin.com> wrote:
> On Wed, 12 Jul 2017 21:04:37 -0400
> Alex <my...@gmail.com> wrote:
>
>> Has anyone else experienced a spam campaign with any one of the
>> following subjects:
>
>> - sometimes enjoy it wild, how bout you?
>> - sometimes like it ruff, what bout you?
>> - sumtimes enjoy it ruff, wat bout you?
>
> 144 hits, all of them except one on Tuesday, 11 July.  All
> whacked very handily by Bayes.

Thank you. We also saw a variation with "irs collection" that was
subsequently caught by bayes.

Re: "bout u" campaign

Posted by Dianne Skoll <df...@roaringpenguin.com>.

On Wed, 12 Jul 2017 21:04:37 -0400
Alex <my...@gmail.com> wrote:

> Has anyone else experienced a spam campaign with any one of the
> following subjects:

> - sometimes enjoy it wild, how bout you?
> - sometimes like it ruff, what bout you?
> - sumtimes enjoy it ruff, wat bout you?

144 hits, all of them except one on Tuesday, 11 July.  All
whacked very handily by Bayes.

Regards,

Dianne.

Re: "bout u" campaign

Posted by "Kevin A. McGrail" <ke...@mcgrail.com>.

On 7/12/2017 9:04 PM, Alex wrote:
> Has anyone else experienced a spam campaign with any one of the
> following subjects:

0 hits today on this, nothing that's gotten through for me on our servers.

RE: "bout u" campaign

Posted by Charles Amstutz <ch...@infinitesys.com>.

As a follow up, it says how to do the DNS, just not how to list it in the .cf files. Maybe I can copy another blacklist's syntax?

                Infinite Systems
                Charles Amstutz | Systems Administrator
                charlesa@infinitesys.com 402.477.2474
                134 S 13th Street, Suite 302 | Lincoln, NE 68508

-----Original Message-----
From: David Jones [mailto:djones@ena.com] 
Sent: Thursday, July 13, 2017 8:17 AM
To: users@spamassassin.apache.org
Subject: Re: "bout u" campaign

On 07/12/2017 09:50 PM, Alex wrote:
> Hi,
> 
>> pretty high mainly due to DCC and BAYES_99.
> 
> Are you paying for DCC? I think we're over their limit and they 
> blacklisted us long ago, lol.

I have my own DCC server joined into the DCC network.

https://www.dcc-servers.net/dcc/

> 
>> I guess I have well trained Bayes.
> 
> I think you just don't have many one-liner emails as a regular course 
> of business?

I am classifying about 10K ham and 8K spam each day which I also use in the masscheck processing (currently on hold).  Since I have started doing this about a month or so ago, my BAYES scores seem to be more accurate.  Maybe I wasn't training enough ham/spam before?  I don't know for sure yet.

> 
>>   1.2 RCVD_IN_LASHBACK       RBL: Received is listed in Lashback
>>                              usb.unsubscore.com
>>                              [204.29.186.60 listed in 
>> ubl.unsubscore.com]
> 
> I forgot about this. I have it in postscreen (+1) but now also added it in SA.
> 
>>   2.2 RCVD_IN_SORBS_SPAM     RBL: SORBS: sender is a spam source
> 
> We do have some in SORBS, but only score it 0.5.  Do you really 
> recommend scoring it so high?
> 
Obviously I do because it's working well on my platform.  I have other WL rules that subtract points to offset this one.  If there are no other WL (e.g. list.dnswl.org) hits then this will stand out more.

Do some analysis of your emails that hit this rule and what the scores were.  My threshold for blocking is 6.0 (default for MailScanner).  If your threshold is 5.0 and your ham that hits this rule is scoring below 3.3 (5.0 - 1.7), then you would be fine setting this to score 2.2.
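
The headroom check described above can be sketched in a few lines of Python. This is only an illustration: it assumes the 1.7 is the increase from the 0.5 score mentioned in the question to 2.2, the helper name safe_to_raise is hypothetical, and the ham scores are made-up numbers, not data from anyone's mail flow.

```python
# Hypothetical helper: would raising one rule's score push existing ham
# over the blocking threshold?
def safe_to_raise(ham_scores, old_score, new_score, threshold):
    """Return True if every ham message that hit the rule would still
    score below the threshold after the rule's score is changed."""
    increase = new_score - old_score
    return all(score + increase < threshold for score in ham_scores)


# Made-up total scores of ham messages that hit the rule while it was 0.5.
ham_hits = [0.4, 1.9, 3.1]

# All of these are below 3.3, so raising the rule to 2.2 keeps them under 5.0.
ok_to_raise = safe_to_raise(ham_hits, 0.5, 2.2, 5.0)

# One ham message at 3.6 would end up above 5.0 after the change.
not_ok = safe_to_raise(ham_hits + [3.6], 0.5, 2.2, 5.0)

print(ok_to_raise, not_ok)
```

In practice the ham scores would come from your own logs for messages that hit the rule in question.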

>>   0.0 OS_UNKNOWN             Relay runs on unknown OS
> 
> That's an interesting one. Fingerprinting?
> 
Yeh.  I thought it might be a useful data point for making meta rules but it turns out to not be.  I will probably leave this out when I rebuild my filters in the next couple of months on CentOS 7.

>>   1.2 FREEMAIL_FROM          Sender email is commonly abused enduser mail
> 
> This is also scored *much* lower here - we have many freemail senders.
> The default score is 0.001, so you must have changed it.
> 
Yep.  Again my block threshold is 6.0 in MailScanner and I have less default trust for FREEMAIL senders.  I also have meta rules based on FREEMAIL and other hits that add to the score based on combinations I have seen over the years.

FREEMAIL senders are very difficult to accurately filter but I feel like my rules are pretty good.  I have to postwhite exclude most freemail providers since they are listed on some RBLs which makes no sense to me.  You can't block the big ones like Yahoo, Hotmail, Comcast, etc. just because they are so large and there are many legit senders in the middle of the spammers.

>> -2.2 RCVD_IN_SENDERSCORE_90_100 Senderscore.org score of 90 to 100
> 
> For 90_100, I think we're only subtracting -0.2.
> 
For my mail flow, I have noticed that senders in the 90's are normally very trustworthy.

If you separate your rules into 2 main categories, then you can set up scores based on their category to balance out the other category.

1. IP and domain reputation
2. Message content

Good IP reputation can offset questionable message content and vice versa.  I tend to go heavy on the reputation side at the MTA and in SA, which has served me well over the past several years.  Before that, I was constantly adjusting content rule scores and writing custom rules to react to the latest spam campaign, where I was always behind.

I have a huge list of whitelist_auth based on domain reputation which allows me to crank up some content scores and not let Bayes block good reputation senders based on content.

>>   2.2 ENA_DIGEST_FREEMAIL    Freemail account hitting message digest spam
>> seen by the Internet (DCC, Pyzor, or Razor).
> 
> The problem I always had with pyzor/dcc was that it works on very 
> small blocks of text, no? Perhaps it works well for small messages, 
> but isn't it problematic for larger messages?
> 
I have no idea.  I just analyzed my mail scoring and noticed combinations like DCC and FREEMAIL are common in my spam.

>>   1.2 ENA_DIGEST_MULTIPLE_MSPIKE_H2 Dcc, Razor, or Pyzor hits from servers
>>                              listed in MSPIKE_H2 so add back points.
>>   0.0 ENA_BAD_SPAM           Spam hitting really bad rules.
>>   2.2 ENA_BAD_SPAM_FREEMAIL  Bad spam from freemail (hotmail, gmail, msn,
>>                              yahoo).
> 
> These are interesting, but I suppose privileged...
> 
The ENA_BAD_SPAM rule is a combination of 2 different types of rules (reputation and content) with an AND between them.  For example (this is about one-third of the rule):

meta            ENA_BAD_SPAM            (DCC_CHECK || PYZOR_CHECK || RAZOR2_CHECK || RAZOR2_CF_RANGE_E8_51_100 || BAYES_999 || BAYES_99 || BAYES_95 || RCVD_IN_BL_SPAMCOP_NET || RCVD_IN_SORBS_WEB || RCVD_IN_SENDERSCORE_60_69 || RCVD_IN_SENDERSCORE_50_59 || RCVD_IN_SENDERSCORE_30_49 || RCVD_IN_SENDERSCORE_0_29 || RCVD_IN_SORBS_SPAM) && (URI_PHISH || URIBL_IVMURI || FREEMAIL_FROM || FREEMAIL_REPLYTO || FREEMAIL_FORGED_REPLYTO || MISSING_SUBJECT || MISSING_DATE || KAM_REALLYHUGEIMGSRC || KAM_HUGEIMGSRC || KAM_MANYTO || HTML_FONT_LOW_CONTRAST || ADVANCE_FEE_2_NEW_MONEY || ADVANCE_FEE_2_NEW_FORM || ADVANCE_FEE_3_NEW || ADVANCE_FEE_3_NEW_MONEY || ADVANCE_FEE_3_NEW_FORM || ADVANCE_FEE_4_NEW || TVD_RCVD_SINGLE)
describe        ENA_BAD_SPAM            Spam hitting really bad rules.
score           ENA_BAD_SPAM            0.001

/etc/mail/spamassassin/99_mailspike.cf
shortcircuit RCVD_IN_MSPIKE_H5 on

score RCVD_IN_MSPIKE_H4 -3.2
score RCVD_IN_MSPIKE_H3 -2.2
score RCVD_IN_MSPIKE_H2 -1.2
score RCVD_IN_MSPIKE_WL -0.82
score RCVD_IN_MSPIKE_BL 1.2
score RCVD_IN_MSPIKE_L2 0.2
score RCVD_IN_MSPIKE_L3 1.2
score RCVD_IN_MSPIKE_L4 2.2
score RCVD_IN_MSPIKE_L5 3.2

meta		ENA_DIGEST_FREEMAIL	FREEMAIL_FROM && (DCC_CHECK || PYZOR_CHECK || RAZOR2_CHECK)
describe	ENA_DIGEST_FREEMAIL	Freemail account hitting message digest spam seen by the Internet (DCC, Pyzor, or Razor).
score		ENA_DIGEST_FREEMAIL	2.2

meta		ENA_DIGEST_MULTIPLE_DNSWL_MED	(DIGEST_MULTIPLE || ENA_DIGEST_FREEMAIL) && RCVD_IN_DNSWL_MED
describe	ENA_DIGEST_MULTIPLE_DNSWL_MED	Dcc, Razor, or Pyzor hits from servers listed in DNSWL so add back points.
score		ENA_DIGEST_MULTIPLE_DNSWL_MED	2.2

meta		ENA_DIGEST_MULTIPLE_MSPIKE_H4	(DIGEST_MULTIPLE || ENA_DIGEST_FREEMAIL) && RCVD_IN_MSPIKE_H4
describe	ENA_DIGEST_MULTIPLE_MSPIKE_H4	Dcc, Razor, or Pyzor hits from servers listed in MSPIKE_H4 so add back points.
score		ENA_DIGEST_MULTIPLE_MSPIKE_H4	3.2

meta		ENA_DIGEST_MULTIPLE_MSPIKE_H3	(DIGEST_MULTIPLE || ENA_DIGEST_FREEMAIL) && RCVD_IN_MSPIKE_H3
describe	ENA_DIGEST_MULTIPLE_MSPIKE_H3	Dcc, Razor, or Pyzor hits from servers listed in MSPIKE_H3 so add back points.
score		ENA_DIGEST_MULTIPLE_MSPIKE_H3	2.2

meta		ENA_DIGEST_MULTIPLE_MSPIKE_H2	(DIGEST_MULTIPLE || ENA_DIGEST_FREEMAIL) && RCVD_IN_MSPIKE_H2
describe	ENA_DIGEST_MULTIPLE_MSPIKE_H2	Dcc, Razor, or Pyzor hits from servers listed in MSPIKE_H2 so add back points.
score		ENA_DIGEST_MULTIPLE_MSPIKE_H2	1.2
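
To make the "add back points" idea concrete, here is a rough Python sketch of how the MSPIKE_H2 pair above nets out. It is not how SpamAssassin evaluates metas internally; it just mirrors the boolean logic of the two rules. DIGEST_MULTIPLE is treated here as "at least two of the three digest checks hit", which is an assumption for the sketch, and the function names are made up.

```python
DIGEST_RULES = ("DCC_CHECK", "PYZOR_CHECK", "RAZOR2_CHECK")

# Scores copied from the rules quoted above.
SCORES = {
    "RCVD_IN_MSPIKE_H2": -1.2,
    "ENA_DIGEST_MULTIPLE_MSPIKE_H2": 1.2,
}


def digest_multiple(hits):
    # Assumption for this sketch: two or more digest checks hit.
    return sum(rule in hits for rule in DIGEST_RULES) >= 2


def ena_digest_freemail(hits):
    # FREEMAIL_FROM && (DCC_CHECK || PYZOR_CHECK || RAZOR2_CHECK)
    return "FREEMAIL_FROM" in hits and any(rule in hits for rule in DIGEST_RULES)


def ena_digest_multiple_mspike_h2(hits):
    # (DIGEST_MULTIPLE || ENA_DIGEST_FREEMAIL) && RCVD_IN_MSPIKE_H2
    return (digest_multiple(hits) or ena_digest_freemail(hits)) and "RCVD_IN_MSPIKE_H2" in hits


def mspike_h2_net(hits):
    """Net contribution of the H2 reputation credit plus the add-back meta."""
    total = 0.0
    if "RCVD_IN_MSPIKE_H2" in hits:
        total += SCORES["RCVD_IN_MSPIKE_H2"]
    if ena_digest_multiple_mspike_h2(hits):
        total += SCORES["ENA_DIGEST_MULTIPLE_MSPIKE_H2"]
    return round(total, 1)


# Good-reputation sender, no digest hits: the -1.2 credit stands.
clean = mspike_h2_net({"RCVD_IN_MSPIKE_H2"})

# Freemail sender whose message also hits DCC: the credit is cancelled.
digest_hit = mspike_h2_net({"RCVD_IN_MSPIKE_H2", "FREEMAIL_FROM", "DCC_CHECK"})

print(clean, digest_hit)
```

The same pattern applies to the H3, H4 and DNSWL_MED variants, each with its own score.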

Hope this is helpful.

--
David Jones

RE: "bout u" campaign

Posted by Charles Amstutz <ch...@infinitesys.com>.

I'm starting mine out at 0.5 until I see what happens.

                Infinite Systems
                Charles Amstutz | Systems Administrator
                charlesa@infinitesys.com 402.477.2474
                134 S 13th Street, Suite 302 | Lincoln, NE 68508

-----Original Message-----
From: David Jones [mailto:djones@ena.com] 
Sent: Thursday, July 13, 2017 11:13 AM
To: users@spamassassin.apache.org
Subject: Re: "bout u" campaign

On 07/13/2017 10:56 AM, RW wrote:
> On Thu, 13 Jul 2017 09:33:04 -0400
> Alex wrote:
> 
>> On Thu, Jul 13, 2017 at 9:29 AM, Charles Amstutz 
>> <ch...@infinitesys.com> wrote:
>>> How do you use lashback? It says that it is free to use for 
>>> commercial and non commercial use. How do I set it up?
>>
>> Drop this into your local.cf or similar:
>>
>> header   RCVD_IN_LASHBACK eval:check_rbl('LASHBACK',
>> 'ubl.unsubscore.com')
> 
> I have it as lastexternal:
> 
> header RCVD_IN_UNSUBBL  eval:check_rbl('ubl-lastexternal', 
> 'ubl.unsubscore.com')
> 
> I've found there to be quite a lot of ISP pool addresses in it, so 
> deep checks are probably unsafe.
> 

I started mine with lastexternal and didn't find much added value over other major RBLs, since my MTA was already blocking mostly with the IVM and Spamhaus RBLs, which overlapped Lashback.  I also wanted to check outbound mail where the second or a later hop was an infected device, most likely under botnet control.  Deep checks would have helped with the OP's spam.

> I've also found it has quite a high FP rate of ~2%.
> 

I am working with them to fix these FPs (they include major mail providers like Comcast, Microsoft and Google, which are pointless to list) and potentially get the list included in the default SA rules.  It's still a valuable RBL to help with an overall score even with a ~2% FP rate.  I wouldn't score it as high as you can with Spamhaus and IVM.  I also have it at 1.2.

-- 
David Jones

Re: "bout u" campaign

Posted by David Jones <dj...@ena.com>.

On 07/13/2017 10:56 AM, RW wrote:
> On Thu, 13 Jul 2017 09:33:04 -0400
> Alex wrote:
> 
>> On Thu, Jul 13, 2017 at 9:29 AM, Charles Amstutz
>> <ch...@infinitesys.com> wrote:
>>> How do you use lashback? It says that it is free to use for
>>> commercial and non commercial use. How do I set it up?
>>
>> Drop this into your local.cf or similar:
>>
>> header   RCVD_IN_LASHBACK eval:check_rbl('LASHBACK',
>> 'ubl.unsubscore.com')
> 
> I have it as lastexternal:
> 
> header RCVD_IN_UNSUBBL  eval:check_rbl('ubl-lastexternal', 'ubl.unsubscore.com')
> 
> I've found there to be quite a lot of ISP pool addresses in it, so deep
> checks are probably unsafe.
> 

I started mine with lastexternal and didn't find much added value over 
other major RBLs, since my MTA was already blocking mostly with the IVM 
and Spamhaus RBLs, which overlapped Lashback.  I also wanted to check 
outbound mail where the second or a later hop was an infected device, 
most likely under botnet control.  Deep checks would have helped with 
the OP's spam.

> I've also found it has quite a high FP rate of ~2%.
> 

I am working with them to fix these FPs (they include major mail 
providers like Comcast, Microsoft and Google, which are pointless to 
list) and potentially get the list included in the default SA rules.  
It's still a valuable RBL to help with an overall score even with a ~2% 
FP rate.  I wouldn't score it as high as you can with Spamhaus and IVM.  
I also have it at 1.2.

-- 
David Jones

RE: "bout u" campaign

Posted by John Hardin <jh...@impsec.org>.

On Thu, 13 Jul 2017, Charles Amstutz wrote:

> Hello,
>
> For the inexperienced, what is the difference between lashback and lastexternal?

"lashback" is a DNSBL

"lastexternal" is which MTA gets checked against that DNSBL. In this case, 
the last MTA external to your network - the MTA that handed the message 
over to your MTA. This is versus, for example, the MTA that first accepted 
the message from an email program.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Microsoft is not a standards body.
-----------------------------------------------------------------------
  3 days until the 72nd anniversary of the dawn of the Atomic Age

Re: "bout u" campaign

Posted by David Jones <dj...@ena.com>.

On 07/13/2017 11:00 AM, Charles Amstutz wrote:
> Hello,
> 
> For the inexperienced, what is the difference between lashback and lastexternal?
> 
> 
>                  Infinite Systems
>                  Charles Amstutz | Systems Administrator
>                  charlesa@infinitesys.com 402.477.2474
>                  134 S 13th Street, Suite 302 | Lincoln, NE 68508
>   
> 
> 

Search for "lastexternal" on this page:

https://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html

Basically you should have internal_networks set up with your 
internal network space, plus trusted_networks set up with networks 
that you trust never to send spam (or with your provider's servers if 
you pull mail via POP/IMAP).  Then SA can determine the last-external 
IP and check only that, not all hops.

For example, I check score.senderscore.com as lastexternal since I 
score it so high:

header          RCVD_IN_SENDERSCORE_0_29        eval:check_rbl('senderscore0-lastexternal','score.senderscore.com.','^127\.0\.4\.([1-2]?[0-9])$')
describe        RCVD_IN_SENDERSCORE_0_29        Senderscore.org score of 0 to 29
score           RCVD_IN_SENDERSCORE_0_29        5.2
tflags          RCVD_IN_SENDERSCORE_0_29        net
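
For anyone unsure what the third argument in that check_rbl call selects, here is a small Python sketch that applies the same regular expression to a few sample DNS answers. It assumes, going by the rule name, that the last octet of the returned 127.0.4.x address carries the score; the sample answers are invented for illustration.

```python
import re

# Same pattern as the sub-test in the rule above.
SENDERSCORE_0_29 = re.compile(r'^127\.0\.4\.([1-2]?[0-9])$')


def matches_0_29(answer):
    """Return True if a DNS answer string falls in the 0 to 29 band."""
    return SENDERSCORE_0_29.match(answer) is not None


# Invented sample answers covering both sides of the band.
samples = ["127.0.4.0", "127.0.4.7", "127.0.4.29", "127.0.4.30", "127.0.4.95"]
results = {answer: matches_0_29(answer) for answer in samples}

for answer, hit in results.items():
    print(answer, hit)
```

The optional leading [1-2] is what caps the match at 29: a last octet of 30 or higher never matches.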

-- 
David Jones

Re: "bout u" campaign

Posted by RW <rw...@googlemail.com>.

On Thu, 13 Jul 2017 16:00:14 +0000
Charles Amstutz Top-Posted:

> Hello,
> 
> For the inexperienced, what is the difference between lashback and
> lastexternal?

lashback is just a label, the difference is between

  eval:check_rbl('LASHBACK', ...

and

  eval:check_rbl('LASHBACK-lastexternal', ...


The former checks all the IP addresses outside your trusted network,
the latter only checks the last-external IP address, i.e. the MX
handover.

If a dynamic IP address gets into a blocklist, it may get
reassigned to many other devices before it's delisted.  Most blocklists
are lastexternal because a device with a dynamic address shouldn't be
delivering direct to MX in the first place.
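
As a rough illustration of the difference, the Python sketch below picks which relay IPs would be looked up in each mode and builds the DNS name a DNSBL query uses (octets reversed, zone appended). It is a simplification, not SpamAssassin's actual trust-path logic: the relay list, the trusted network and the function names are all made up for the example, and no DNS lookup is performed.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets plus the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def ips_to_check(relay_ips, trusted_nets, lastexternal):
    """relay_ips is ordered from the most recent hop to the oldest.
    Simplification: anything outside trusted_nets counts as external."""
    nets = [ipaddress.ip_network(net) for net in trusted_nets]
    external = [
        ip for ip in relay_ips
        if not any(ipaddress.ip_address(ip) in net for net in nets)
    ]
    # lastexternal: only the hop that handed the message to our network.
    return external[:1] if lastexternal else external


# Hypothetical relay chain: our own MX, the handover host, an earlier hop.
relays = ["192.0.2.10", "204.29.186.60", "198.51.100.7"]
trusted = ["192.0.2.0/24"]

deep = ips_to_check(relays, trusted, lastexternal=False)
last_only = ips_to_check(relays, trusted, lastexternal=True)
name = dnsbl_query_name(last_only[0], "ubl.unsubscore.com")

print(deep, last_only, name)
```

With a list that contains many dynamic or ISP pool addresses, the deep variant is the one that risks hitting an innocent earlier hop.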

RE: "bout u" campaign

Posted by Charles Amstutz <ch...@infinitesys.com>.

Hello,

For the inexperienced, what is the difference between lashback and lastexternal?


                Infinite Systems
                Charles Amstutz | Systems Administrator
                charlesa@infinitesys.com 402.477.2474
                134 S 13th Street, Suite 302 | Lincoln, NE 68508
 


-----Original Message-----
From: RW [mailto:rwmaillists@googlemail.com] 
Sent: Thursday, July 13, 2017 10:57 AM
To: users@spamassassin.apache.org
Subject: Re: "bout u" campaign

On Thu, 13 Jul 2017 09:33:04 -0400
Alex wrote:

> On Thu, Jul 13, 2017 at 9:29 AM, Charles Amstutz 
> <ch...@infinitesys.com> wrote:
> > How do you use lashback? It says that it is free to use for 
> > commercial and non commercial use. How do I set it up?
> 
> Drop this into your local.cf or similar:
> 
> header   RCVD_IN_LASHBACK eval:check_rbl('LASHBACK',
> 'ubl.unsubscore.com')

I have it as lastexternal:

header RCVD_IN_UNSUBBL  eval:check_rbl('ubl-lastexternal', 'ubl.unsubscore.com')

I've found there to be quite a lot of ISP pool addresses in it, so deep checks are probably unsafe. 

I've also found it has quite a high FP rate of ~2%.

Re: "bout u" campaign

Posted by RW <rw...@googlemail.com>.

On Thu, 13 Jul 2017 09:33:04 -0400
Alex wrote:

> On Thu, Jul 13, 2017 at 9:29 AM, Charles Amstutz
> <ch...@infinitesys.com> wrote:
> > How do you use lashback? It says that it is free to use for
> > commercial and non commercial use. How do I set it up?  
> 
> Drop this into your local.cf or similar:
> 
> header   RCVD_IN_LASHBACK eval:check_rbl('LASHBACK',
> 'ubl.unsubscore.com') 

I have it as lastexternal:

header RCVD_IN_UNSUBBL  eval:check_rbl('ubl-lastexternal', 'ubl.unsubscore.com')

I've found there to be quite a lot of ISP pool addresses in it, so deep
checks are probably unsafe. 

I've also found it has quite a high FP rate of ~2%.

RE: "bout u" campaign

Posted by Charles Amstutz <ch...@infinitesys.com>.

Thanks


                Infinite Systems
                Charles Amstutz | Systems Administrator
                charlesa@infinitesys.com 402.477.2474
                134 S 13th Street, Suite 302 | Lincoln, NE 68508
 


-----Original Message-----
From: Alex [mailto:mysqlstudent@gmail.com] 
Sent: Thursday, July 13, 2017 8:33 AM
To: Charles Amstutz <ch...@infinitesys.com>; SA Mailing list <us...@spamassassin.apache.org>
Subject: Re: "bout u" campaign

On Thu, Jul 13, 2017 at 9:29 AM, Charles Amstutz <ch...@infinitesys.com> wrote:
> How do you use lashback? It says that it is free to use for commercial and non commercial use. How do I set it up?

Drop this into your local.cf or similar:

header   RCVD_IN_LASHBACK eval:check_rbl('LASHBACK', 'ubl.unsubscore.com')
describe RCVD_IN_LASHBACK LashBack Unsubscribe Blacklist
tflags   RCVD_IN_LASHBACK net
score    RCVD_IN_LASHBACK 1.2

I've scored it at 1.2. You may wish to change that, perhaps lower for a while, while you see how it works in your organization.

Re: "bout u" campaign

Posted by RW <rw...@googlemail.com>.

On Thu, 13 Jul 2017 15:56:59 +0000
Charles Amstutz wrote:

> Thanks,
> 
> I was looking at the default RBL lists
> 
> https://wiki.apache.org/spamassassin/DnsBlocklists
> 
> But I was looking for other lists that are free for commercial use. I
> found this one as a possibility:
> 
> http://0spam.fusionzero.com/
> 
> but I don't know if anyone has had experience with it, or could make other
> recommendations. 

You might try this one. It may be well suited to postscreen or outright
rejection as it is supposed to be quite conservative - I haven't had any
FPs. 

header    RCVD_IN_GBUDB      eval:check_rbl('gbudb-lastexternal', 'truncate.gbudb.net.')
describe  RCVD_IN_GBUDB      Listed in truncate.gbudb.net
tflags    RCVD_IN_GBUDB      net
score     RCVD_IN_GBUDB      1.0 # adjust after testing

RE: "bout u" campaign

Posted by Charles Amstutz <ch...@infinitesys.com>.

Thanks,

I was looking at the default RBL lists

https://wiki.apache.org/spamassassin/DnsBlocklists

But I was looking for other lists that are free for commercial use. I found this one as a possibility:

http://0spam.fusionzero.com/

but I don't know if anyone has had experience with it, or could make other recommendations. 


>Drop this into your local.cf or similar:

>header   RCVD_IN_LASHBACK eval:check_rbl('LASHBACK', 'ubl.unsubscore.com')
>describe RCVD_IN_LASHBACK LashBack Unsubscribe Blacklist
>tflags   RCVD_IN_LASHBACK net
>score    RCVD_IN_LASHBACK 1.2

> I've scored it at 1.2. You may wish to change that, perhaps lower for a while, while you see how it works in your organization.

Re: "bout u" campaign

Posted by Alex <my...@gmail.com>.

On Thu, Jul 13, 2017 at 9:29 AM, Charles Amstutz
<ch...@infinitesys.com> wrote:
> How do you use lashback? It says that it is free to use for commercial and non commercial use. How do I set it up?

Drop this into your local.cf or similar:

header   RCVD_IN_LASHBACK eval:check_rbl('LASHBACK', 'ubl.unsubscore.com')
describe RCVD_IN_LASHBACK LashBack Unsubscribe Blacklist
tflags   RCVD_IN_LASHBACK net
score    RCVD_IN_LASHBACK 1.2

I've scored it at 1.2. You may wish to change that, perhaps lower for
a while, while you see how it works in your organization.

RE: "bout u" campaign

Posted by Charles Amstutz <ch...@infinitesys.com>.

How do you use lashback? It says that it is free to use for commercial and non commercial use. How do I set it up?

                Infinite Systems
                Charles Amstutz | Systems Administrator
                charlesa@infinitesys.com 402.477.2474
                134 S 13th Street, Suite 302 | Lincoln, NE 68508

Re: "bout u" campaign

Posted by David Jones <dj...@ena.com>.

On 07/15/2017 09:42 AM, RW wrote:
> On Thu, 13 Jul 2017 18:26:54 -0400
> Alex wrote:
> 
>> Hi,
>>
>>>> Are you paying for DCC? I think we're over their limit and they
>>>> blacklisted us long ago, lol.
>>>
>>> I have my own DCC server joined into the DCC network.
>>>
>>> https://www.dcc-servers.net/dcc/
>>
>> So you only provide spam services for your own users? Or do you pay?
>>
>>> I am classifying about 10K ham and 8K spam each day which I also
>>> use in the masscheck processing (currently on hold).  Since I have
>>> started doing this
>>
>> Through autolearn?
>>
>> It is otherwise extremely time-intensive.
>>
>>> Yep.  Again my block threshold is 6.0 in MailScanner and I have
>>> less default trust for FREEMAIL senders.  I also have meta rules
>>> based on FREEMAIL and other hits that add to the score based on
>>> combinations I have seen over the years.
>>
>> Adjusting many of the default rules disrupts the score balance created
>> by masschecks, no?
>>
>> I want to avoid having to juggle scores around, in addition to already
>> worrying about writing rules that ultimately have the same effect as
>> existing metas.
>>
>>>>>    2.2 ENA_DIGEST_FREEMAIL    Freemail account hitting message
>>>>> digest spam seen by the Internet (DCC, Pyzor, or Razor).
>>
>> Are you worried about overlap between the checksum systems?
>>
>> I've enabled DCC again today, and remembered what I don't like about
>> it. Do you have DCC_CHECK at its default 1.1 score? That's quite high
>> for something described as "bulk mail" when bulk mail is already
>> scored very close to 5.0.
> 
> And with  FREEMAIL_FROM plus DCC_CHECK (or any digest) you
> have
> 
> 1.2 FREEMAIL_FROM
> 2.2 DCC_CHECK
> 2.2 ENA_DIGEST_FREEMAIL
> 0.0 ENA_BAD_SPAM
> 
> which is 5.6 points. And judging by the name, at least in some cases,
> maybe all:
> 
> 2.2 ENA_BAD_SPAM_FREEMAIL
> 
> which makes  7.8 points. This is something that presumably works for
> him, but could cause problems in general.
> 

I was trying to give high-level information on the difference between 
reputation-based rules and content-based rules and how they can be used 
in combination.  For FREEMAIL, I have found that making the average 
message score just below the threshold gives the maximum reliability. 
Since my threshold for blocking is 6.0, I try to get the average 
FREEMAIL message to score in the 3.0 to 5.0 range.  With well-trained 
BAYES and a few other rules that subtract (BAYES_00, good reputation, 
etc.), this is working well.  When FREEMAIL messages hit DCC and a few 
other meta rules common in spam, then they will be over 6.0, as 
mentioned above.

Each person has to examine their mail flow and scoring to determine what 
will work in their environment, but the concepts should still apply.

1.  Create a large list of whitelist_auth and whitelist_from_rcvd for 
those senders that a) aren't FREEMAIL, b) aren't human mailboxes with 
potentially compromised passwords, and c) have a valid unsubscribe 
link/process.
Examples:
whitelist_auth *@*.wayfair.com
whitelist_auth *@*.dunkindonuts.com
whitelist_auth *@mktgdillards.com
whitelist_auth *@*.usaa.com
whitelist_auth *@*.citi.com
whitelist_auth *@*.sophos.com
whitelist_auth *@*.myfedloan.org
whitelist_auth *@*.hiltonhonors.com
whitelist_auth *@*.usatoday.com
whitelist_auth *@*.usbank.com

2.  Enable SHORTCIRCUIT'ing:
shortcircuit USER_IN_WHITELIST on
priority     USER_IN_WHITELIST -400
shortcircuit USER_IN_DEF_WHITELIST on
shortcircuit USER_IN_BLACKLIST on
shortcircuit USER_IN_DKIM_WHITELIST on
shortcircuit USER_IN_DEF_DKIM_WL on
shortcircuit USER_IN_SPF_WHITELIST on
shortcircuit USER_IN_DEF_SPF_WL on

shortcircuit RCVD_IN_RP_CERTIFIED on
shortcircuit RCVD_IN_RP_SAFE on
shortcircuit RCVD_IN_DNSWL_HI on
shortcircuit RCVD_IN_IADB_LISTED on
shortcircuit RCVD_IN_IADB_SPF on
shortcircuit RCVD_IN_IADB_DK on
shortcircuit RCVD_IN_IADB_RDNS on
shortcircuit RCVD_IN_IADB_SENDERID on
shortcircuit RCVD_IN_IADB_OPTIN on

3. Add in extra RBL rules that aren't included with SA.  Test these with 
low scores until comfortable.  Examples: Lashback, senderscore.org, 
Mailspike, and IVM if you have a subscription.

Once you tweak the above list to fit your mail flow, you should have the 
reputation side covered well, which lets content-based checks handle the 
rest of the spam.  Well-trained Bayes, ClamAV unofficial signatures, DCC, 
Razor, Pyzor, KAM.cf, custom meta rules, etc. all help there, and you 
won't have to constantly react to the latest spam campaign.  You will 
still have to tweak and tune a little, but not nearly as much as before.

-- 
David Jones

Re: "bout u" campaign

Posted by RW <rw...@googlemail.com>.

On Thu, 13 Jul 2017 18:26:54 -0400
Alex wrote:

> Hi,
> 
> >> Are you paying for DCC? I think we're over their limit and they
> >> blacklisted us long ago, lol.  
> >
> > I have my own DCC server joined into the DCC network.
> >
> > https://www.dcc-servers.net/dcc/  
> 
> So you only provide spam services for your own users? Or do you pay?
> 
> > I am classifying about 10K ham and 8K spam each day which I also
> > use in the masscheck processing (currently on hold).  Since I have
> > started doing this  
> 
> Through autolearn?
> 
> It is otherwise extremely time-intensive.
> 
> > Yep.  Again my block threshold is 6.0 in MailScanner and I have
> > less default trust for FREEMAIL senders.  I also have meta rules
> > based on FREEMAIL and other hits that add to the score based on
> > combinations I have seen over the years.  
> 
> Adjusting many of the default rules disrupts the score balance created
> by masschecks, no?
> 
> I want to avoid having to juggle scores around, in addition to already
> worrying about writing rules that ultimately have the same effect as
> existing metas.
> 
> >>>   2.2 ENA_DIGEST_FREEMAIL    Freemail account hitting message
> >>> digest spam seen by the Internet (DCC, Pyzor, or Razor).  
> 
> Are you worried about overlap between the checksum systems?
> 
> I've enabled DCC again today, and remembered what I don't like about
> it. Do you have DCC_CHECK at its default 1.1 score? That's quite high
> for something described as "bulk mail" when bulk mail is already
> scored very close to 5.0.

And with  FREEMAIL_FROM plus DCC_CHECK (or any digest) you
have 

1.2 FREEMAIL_FROM 
2.2 DCC_CHECK
2.2 ENA_DIGEST_FREEMAIL
0.0 ENA_BAD_SPAM

which is 5.6 points. And judging by the name, at least in some cases,
maybe all:

2.2 ENA_BAD_SPAM_FREEMAIL

which makes  7.8 points. This is something that presumably works for
him, but could cause problems in general.
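
As a quick check of that arithmetic, here is a minimal Python sketch.  The
scores are the ones listed above (local values quoted in this thread, not
SpamAssassin defaults), and 6.0 is the MailScanner block threshold mentioned
earlier in the thread.  The function and variable names are only illustrative.

```python
# Scores as listed in the message above (local scores, not SA defaults).
base_hits = {
    "FREEMAIL_FROM": 1.2,
    "DCC_CHECK": 2.2,
    "ENA_DIGEST_FREEMAIL": 2.2,
    "ENA_BAD_SPAM": 0.0,
}
BLOCK_THRESHOLD = 6.0  # block threshold quoted in the thread


def total(hits):
    """Sum the rule scores, rounded to one decimal like an SA report."""
    return round(sum(hits.values()), 1)


without_meta = total(base_hits)
with_meta = total({**base_hits, "ENA_BAD_SPAM_FREEMAIL": 2.2})

# Print each total and whether it reaches the block threshold.
print(without_meta, without_meta >= BLOCK_THRESHOLD)
print(with_meta, with_meta >= BLOCK_THRESHOLD)
```

The first total stays just under the threshold and the second goes over it,
which is the point being made: the extra meta is what pushes a freemail
message with a digest hit into the blocked range.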

Re: "bout u" campaign

Posted by David Jones <dj...@ena.com>.

On 07/14/2017 09:22 PM, Alex wrote:
> Hi,
> 
>>>> The ENA_BAD_SPAM rule is a combination of 2 different types (reputation
>>>> and
>>>> content) rules with an AND between them.  For example (this is is about
>>>> one-third of the rule):
>>>
>>> Is it usable like this?
>>
>> Try it out with a score of 0.001 and see what you think.  It should have
>> been valid.  Just drop it in and run:
>>
>> spamassassin -D --lint 2>&1 | /bin/grep -Ei '(failed|undefined
>> dependency|score set for non-existent rule)' | /bin/grep ENA_
> 
> By "usable" I meant have you included enough of the rule for it to
> really be effective?
> 
> I let it run for the day, and it's just not anchored well enough to
> provide any meaningful benefit. It's hitting on jcpenny, vresp.com,
> constantcontact, sendgrid, facebook, etc.
> 

I have all of those senders in whitelist_auth entries.  The ENA_BAD_SPAM 
rule has a score of 0.001 just as a placeholder for other meta rules based 
on it that have scores of 1.2 to 3.2.

Once you set up different tiers of senders and SHORTCIRCUIT all of the 
trusted senders that usually score very low, you will be able to handle 
regular and untrusted senders more aggressively.

As I have said before, I SHORTCIRCUIT as ham thousands of domains based 
on their envelope-from domain as long as they have legit unsubscribe/opt 
out processes/links.  Now I don't have to worry about these being 
falsely categorized as spam based on content.  I don't SHORTCIRCUIT any 
FREEMAIL domains or any domains that have user mailboxes that can be 
compromised.

My MTA blocks the majority of the junk so what passes through SA is 
mostly SHORTCIRCUIT'd as ham.  Less than 5 percent is spam blocked by 
SA.  I now only get the occasional customer report of spam from 
compromised accounts, which are very difficult to block based on 
reputation.  Content-based rules are really the only way since these 
spammers are crafting zero-hour emails that are designed to get through 
major mail filters.

-- 
David Jones

Re: "bout u" campaign

Posted by Alex <my...@gmail.com>.

Hi,

>>> The ENA_BAD_SPAM rule is a combination of 2 different types (reputation
>>> and
>>> content) rules with an AND between them.  For example (this is is about
>>> one-third of the rule):
>>
>> Is it usable like this?
>
> Try it out with a score of 0.001 and see what you think.  It should have
> been valid.  Just drop it in and run:
>
> spamassassin -D --lint 2>&1 | /bin/grep -Ei '(failed|undefined
> dependency|score set for non-existent rule)' | /bin/grep ENA_

By "usable" I meant have you included enough of the rule for it to
really be effective?

I let it run for the day, and it's just not anchored well enough to
provide any meaningful benefit. It's hitting on jcpenny, vresp.com,
constantcontact, sendgrid, facebook, etc.

Re: "bout u" campaign

Posted by David Jones <dj...@ena.com>.

On 07/13/2017 05:26 PM, Alex wrote:
> Hi,
> 
>>> Are you paying for DCC? I think we're over their limit and they
>>> blacklisted us long ago, lol.
>>
>> I have my own DCC server joined into the DCC network.
>>
>> https://www.dcc-servers.net/dcc/
> 
> So you only provide spam services for your own users? Or do you pay?
> 

Our DCC server was set up 6+ years ago by a previous mail sysadmin before 
I started working at my current job.  We don't budget or pay anything 
annually for DCC.  We are peered with another DCC server in their 
network.  All I know is that we must keep our current IP address the 
same to maintain the peering.  I have one DCC server that I point my 8 
mail filters to.

>> I am classifying about 10K ham and 8K spam each day which I also use in the
>> masscheck processing (currently on hold).  Since I have started doing this
> 
> Through autolearn?
> 
> It is otherwise extremely time-intensive.
> 

Actually I have found some rule combinations and score thresholds that 
are definitely ham/spam.  I have built an iRedMail VM with no RBLs, 
postscreen, or other MTA optimizations and disabled some things in 
amavisd-new so spam will get to SA.  Ham comes from a subset of my 
primary SA filters based on SHORTCIRCUIT rules and very low scoring 
messages.

I set up Inbox rules to move certain messages into ham/spam folders.  I 
have to log in once a day and spend a few minutes quickly reviewing the 
unread messages and marking them as read.  My masscheck and SA learning 
uses the read folder (Maildir cur directory).

>> Yep.  Again my block threshold is 6.0 in MailScanner and I have less default
>> trust for FREEMAIL senders.  I also have meta rules based on FREEMAIL and
>> other hits that add to the score based on combinations I have seen over the
>> years.
> 
> Adjusting many of the default rules disrupts the score balance created
> by masschecks, no?
> 

Correct.  Before I knew about the masscheck processing and what it does, 
I used to adjust the scores on most of the rules, which was time 
consuming, just like reactively creating rules for new spam campaigns. 
A few months ago I removed most of my custom scores on default SA rules, 
and I now use meta rules to combine hits on certain rules to add some points.

> I want to avoid having to juggle scores around, in addition to already
> worrying about writing rules that ultimately have the same effect as
> existing metas.
> 
>>>>    2.2 ENA_DIGEST_FREEMAIL    Freemail account hitting message digest spam
>>>> seen by the Internet (DCC, Pyzor, or Razor).
> 
> Are you worried about overlap between the checksum systems?
> 
> I've enabled DCC again today, and remembered what I don't like about
> it. Do you have DCC_CHECK at its default 1.1 score? That's quite high
> for something described as "bulk mail" when bulk mail is already
> scored very close to 5.0.
> 

If you follow my logical separation of rules into reputation-based and 
content-based then DCC, RAZOR, and PYZOR are going to fall into the 
content side.  You still have the reputation rules that will lower the 
score and offset these DIGEST rules.  Plus with many SHORTCIRCUIT'd 
senders based on whitelist_auth and whitelist_from_rcvd, the 
trusted/safe bulk senders with a valid unsubscribe process will pass 
through fine.

> How much more effective do you find DCC than PYZOR? That's already
> scored at 1.4.
> 

Haven't really had to worry about this with SHORTCIRCUIT'ing and 
whitelist_auth on the envelope-from domain (SPF_PASS + non-human account 
domains).

>> I have no idea.  I just analyzed my mail scoring and noticed combinations
>> like DCC and FREEMAIL are common in my spam.
> 
> That's a good combination.
> 
>> The ENA_BAD_SPAM rule is a combination of 2 different types (reputation and
>> content) rules with an AND between them.  For example (this is is about
>> one-third of the rule):
> 
> Is it usable like this?
>

Try it out with a score of 0.001 and see what you think.  It should have 
been valid.  Just drop it in and run:

spamassassin -D --lint 2>&1 | /bin/grep -Ei '(failed|undefined dependency|score set for non-existent rule)' | /bin/grep ENA_

You can also run just the first part (the spamassassin --lint command, 
before the pipes) and check for a zero return code.  I have a config 
distribution script that does this and will not send the config out if 
the return code is not zero.
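
A minimal Python sketch of that kind of gate, assuming only that
"spamassassin --lint" exits non-zero when the configuration has problems.
The helper name lint_ok is hypothetical, and the actual distribution step is
left out because the real script is not shown in this thread.

```python
import subprocess


def lint_ok(cmd=("spamassassin", "--lint")):
    """Return True only if the lint command runs and exits with status 0."""
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The command is not installed on this host: treat as a failed check.
        return False
    return result.returncode == 0


# Usage (hypothetical): only push the config to the other filters
# when lint_ok() returns True; otherwise stop and investigate.
```

Keeping the check in one small function makes it easy to call right before
whatever copies the .cf files out to the other mail filters.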

>> /etc/mail/spamassassin/99_mailspike.cf
>> shortcircuit RCVD_IN_MSPIKE_H5 on
>>
>> score RCVD_IN_MSPIKE_H4 -3.2
>> score RCVD_IN_MSPIKE_H3 -2.2
>> score RCVD_IN_MSPIKE_H2 -1.2
>> score RCVD_IN_MSPIKE_WL -0.82
>> score RCVD_IN_MSPIKE_BL 1.2
>> score RCVD_IN_MSPIKE_L2 0.2
>> score RCVD_IN_MSPIKE_L3 1.2
>> score RCVD_IN_MSPIKE_L4 2.2
>> score RCVD_IN_MSPIKE_L5 3.2
> 
> The default scores for these rules are all almost 0 when bayes and
> network tests are enabled. I've adjusted the L[2-5] rules from 0.2 to
> 1.2. Took a quick look at a handful of L5 mail and anything higher
> would be problematic.
> 
>> Hope this is helpful.
> 
> Thanks, as always.
> 
> 
>>
>> --
>> David Jones
-- 
David Jones

Re: "bout u" campaign

Posted by Alex <my...@gmail.com>.

Hi,

>> Are you paying for DCC? I think we're over their limit and they
>> blacklisted us long ago, lol.
>
> I have my own DCC server joined into the DCC network.
>
> https://www.dcc-servers.net/dcc/

So you only provide spam services for your own users? Or do you pay?

> I am classifying about 10K ham and 8K spam each day which I also use in the
> masscheck processing (currently on hold).  Since I have started doing this

Through autolearn?

It is otherwise extremely time-intensive.

> Yep.  Again my block threshold is 6.0 in MailScanner and I have less default
> trust for FREEMAIL senders.  I also have meta rules based on FREEMAIL and
> other hits that add to the score based on combinations I have seen over the
> years.

Adjusting many of the default rules disrupts the score balance created
by masschecks, no?

I want to avoid having to juggle scores around, in addition to already
worrying about writing rules that ultimately have the same effect as
existing metas.

>>>   2.2 ENA_DIGEST_FREEMAIL    Freemail account hitting message digest spam
>>> seen by the Internet (DCC, Pyzor, or Razor).

Are you worried about overlap between the checksum systems?

I've enabled DCC again today, and remembered what I don't like about
it. Do you have DCC_CHECK at its default 1.1 score? That's quite high
for something described as "bulk mail" when bulk mail is already
scored very close to 5.0.

How much more effective do you find DCC than PYZOR? That's already
scored at 1.4.

> I have no idea.  I just analyzed my mail scoring and noticed combinations
> like DCC and FREEMAIL are common in my spam.

That's a good combination.

> The ENA_BAD_SPAM rule is a combination of 2 different types (reputation and
> content) rules with an AND between them.  For example (this is is about
> one-third of the rule):

Is it usable like this?

> /etc/mail/spamassassin/99_mailspike.cf
> shortcircuit RCVD_IN_MSPIKE_H5 on
>
> score RCVD_IN_MSPIKE_H4 -3.2
> score RCVD_IN_MSPIKE_H3 -2.2
> score RCVD_IN_MSPIKE_H2 -1.2
> score RCVD_IN_MSPIKE_WL -0.82
> score RCVD_IN_MSPIKE_BL 1.2
> score RCVD_IN_MSPIKE_L2 0.2
> score RCVD_IN_MSPIKE_L3 1.2
> score RCVD_IN_MSPIKE_L4 2.2
> score RCVD_IN_MSPIKE_L5 3.2

The default scores for these rules are all almost 0 when bayes and
network tests are enabled. I've adjusted the L[2-5] rules from 0.2 to
1.2. Took a quick look at a handful of L5 mail and anything higher
would be problematic.

> Hope this is helpful.

Thanks, as always.


>
> --
> David Jones

Re: "bout u" campaign

Posted by David Jones <dj...@ena.com>.

On 07/12/2017 09:50 PM, Alex wrote:
> Hi,
> 
>> pretty high mainly due to DCC and BAYES_99.
> 
> Are you paying for DCC? I think we're over their limit and they
> blacklisted us long ago, lol.

I have my own DCC server joined into the DCC network.

https://www.dcc-servers.net/dcc/

> 
>> I guess I have well trained Bayes.
> 
> I think you just don't have many one-liner emails as a regular course
> of business?

I am classifying about 10K ham and 8K spam each day which I also use in 
the masscheck processing (currently on hold).  Since I started 
doing this about a month ago, my BAYES scores seem to be more 
accurate.  Maybe I wasn't training enough ham/spam before?  I don't know 
for sure yet.

> 
>>   1.2 RCVD_IN_LASHBACK       RBL: Received is listed in Lashback
>>                              usb.unsubscore.com
>>                              [204.29.186.60 listed in ubl.unsubscore.com]
> 
> I forgot about this. I have it in postscreen (+1) but now also added it in SA.
> 
>>   2.2 RCVD_IN_SORBS_SPAM     RBL: SORBS: sender is a spam source
> 
> We do have some in SORBS, but only score it 0.5.  Do you really
> recommend scoring it so high?
> 
Obviously I do because it's working well on my platform.  I have other 
WL rules that subtract points to offset this one.  If there are no other 
WL (e.g. list.dnswl.org) hits then this will stand out more.

Do some analysis of your emails that hit this rule and what the scores 
were.  My threshold for blocking is 6.0 (default for MailScanner).  If 
your threshold is 5.0 and your ham that hits this rule is scoring below 
3.3 (5.0 - 1.7), then you would be fine setting this to score 2.2.
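
The headroom calculation above can be sketched in a few lines of Python.
This assumes the 1.7 is the difference between the proposed score of 2.2 and
the 0.5 currently used at the other site; the function name is only
illustrative.

```python
def max_safe_ham_score(threshold, current_rule_score, proposed_rule_score):
    """Highest current ham total that still stays at or under the
    threshold after the rule is rescored."""
    increase = proposed_rule_score - current_rule_score
    return round(threshold - increase, 2)


# Threshold 5.0, rule currently scored 0.5, proposed score 2.2.
limit = max_safe_ham_score(5.0, 0.5, 2.2)
print(limit)
```

Any ham that hits the rule and currently totals less than this limit would
still be under the threshold after the change.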

>>   0.0 OS_UNKNOWN             Relay runs on unknown OS
> 
> That's an interesting one. Fingerprinting?
> 
Yeh.  I thought it might be a useful data point for making meta rules 
but it turns out to not be.  I will probably leave this out when I 
rebuild my filters in the next couple of months on CentOS 7.

>>   1.2 FREEMAIL_FROM          Sender email is commonly abused enduser mail
> 
> This is also scored *much* lower here - we have many freemail senders.
> The default score is 0.001, so you must have changed it.
> 
Yep.  Again my block threshold is 6.0 in MailScanner and I have less 
default trust for FREEMAIL senders.  I also have meta rules based on 
FREEMAIL and other hits that add to the score based on combinations I 
have seen over the years.

FREEMAIL senders are very difficult to accurately filter but I feel like 
my rules are pretty good.  I have to postwhite exclude most freemail 
providers since they are listed on some RBLs, which makes no sense to me. 
You can't block the big ones like Yahoo, Hotmail, Comcast, etc. because 
they are so large and there are many legit senders in among the 
spammers.

>> -2.2 RCVD_IN_SENDERSCORE_90_100 Senderscore.org score of 90 to 100
> 
> For 90_100, I think we're only subtracting -0.2.
> 
For my mail flow, I have noticed that senders in the 90's are normally 
very trustworthy.

If you separate your rules into 2 main categories, then you can setup 
scores based on their category to balance out the other category.

1. IP and domain reputation
2. Message content

Good IP reputation can offset questionable message content and vice 
versa.  I tend to go heavy on the reputation side at the MTA and in SA, 
which has served me well over the past several years.  Before that, I was 
constantly adjusting content rule scores and writing custom rules to 
react to the latest spam campaign where I was always behind.

I have a huge list of whitelist_auth based on domain reputation which 
allows me to crank up some content scores and not let Bayes block good 
reputation senders based on content.

>>   2.2 ENA_DIGEST_FREEMAIL    Freemail account hitting message digest spam
>> seen by the Internet (DCC, Pyzor, or Razor).
> 
> The problem I always had with pyzor/dcc was that it works on very
> small blocks of text, no? Perhaps it works well for small messages,
> but isn't it problematic for larger messages?
> 
I have no idea.  I just analyzed my mail scoring and noticed 
combinations like DCC and FREEMAIL are common in my spam.

>>   1.2 ENA_DIGEST_MULTIPLE_MSPIKE_H2 Dcc, Razor, or Pyzor hits from servers
>>                              listed in MSPIKE_H2 so add back points.
>>   0.0 ENA_BAD_SPAM           Spam hitting really bad rules.
>>   2.2 ENA_BAD_SPAM_FREEMAIL  Bad spam from freemail (hotmail, gmail, msn,
>>                              yahoo).
> 
> These are interesting, but I suppose privileged...
> 
The ENA_BAD_SPAM rule is a combination of 2 different types of rules 
(reputation and content) with an AND between them.  For example (this is 
about one-third of the rule):

meta            ENA_BAD_SPAM            (DCC_CHECK || PYZOR_CHECK || RAZOR2_CHECK || RAZOR2_CF_RANGE_E8_51_100 || BAYES_999 || BAYES_99 || BAYES_95 || RCVD_IN_BL_SPAMCOP_NET || RCVD_IN_SORBS_WEB || RCVD_IN_SENDERSCORE_60_69 || RCVD_IN_SENDERSCORE_50_59 || RCVD_IN_SENDERSCORE_30_49 || RCVD_IN_SENDERSCORE_0_29 || RCVD_IN_SORBS_SPAM) && (URI_PHISH || URIBL_IVMURI || FREEMAIL_FROM || FREEMAIL_REPLYTO || FREEMAIL_FORGED_REPLYTO || MISSING_SUBJECT || MISSING_DATE || KAM_REALLYHUGEIMGSRC || KAM_HUGEIMGSRC || KAM_MANYTO || HTML_FONT_LOW_CONTRAST || ADVANCE_FEE_2_NEW_MONEY || ADVANCE_FEE_2_NEW_FORM || ADVANCE_FEE_3_NEW || ADVANCE_FEE_3_NEW_MONEY || ADVANCE_FEE_3_NEW_FORM || ADVANCE_FEE_4_NEW || TVD_RCVD_SINGLE)
describe        ENA_BAD_SPAM            Spam hitting really bad rules.
score           ENA_BAD_SPAM            0.001
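
To make the shape of that meta concrete, here is a small Python sketch of
"(any rule in group A) AND (any rule in group B)".  The two sets are an
abbreviated subset of the rule names above, picked only for illustration;
this models the boolean logic and is not how SpamAssassin evaluates metas
internally.

```python
# Abbreviated subsets of the two OR-groups in the ENA_BAD_SPAM meta.
GROUP_A = {"DCC_CHECK", "PYZOR_CHECK", "RAZOR2_CHECK", "BAYES_99",
           "RCVD_IN_SORBS_SPAM"}
GROUP_B = {"URI_PHISH", "FREEMAIL_FROM", "FREEMAIL_REPLYTO",
           "MISSING_SUBJECT", "KAM_MANYTO"}


def ena_bad_spam(hits):
    """True when at least one rule from each group hit the message."""
    hits = set(hits)
    return bool(hits & GROUP_A) and bool(hits & GROUP_B)


print(ena_bad_spam({"DCC_CHECK", "FREEMAIL_FROM"}))  # one hit in each group
print(ena_bad_spam({"DCC_CHECK", "BAYES_99"}))       # group A only
```

Because a single hit on each side is enough, the meta fires on any mail that
trips one rule from each group, which is why it carries a 0.001 score and is
only used as a building block for other metas.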

/etc/mail/spamassassin/99_mailspike.cf
shortcircuit RCVD_IN_MSPIKE_H5 on

score RCVD_IN_MSPIKE_H4 -3.2
score RCVD_IN_MSPIKE_H3 -2.2
score RCVD_IN_MSPIKE_H2 -1.2
score RCVD_IN_MSPIKE_WL -0.82
score RCVD_IN_MSPIKE_BL 1.2
score RCVD_IN_MSPIKE_L2 0.2
score RCVD_IN_MSPIKE_L3 1.2
score RCVD_IN_MSPIKE_L4 2.2
score RCVD_IN_MSPIKE_L5 3.2

meta            ENA_DIGEST_FREEMAIL             FREEMAIL_FROM && (DCC_CHECK || PYZOR_CHECK || RAZOR2_CHECK)
describe        ENA_DIGEST_FREEMAIL             Freemail account hitting message digest spam seen by the Internet (DCC, Pyzor, or Razor).
score           ENA_DIGEST_FREEMAIL             2.2

meta            ENA_DIGEST_MULTIPLE_DNSWL_MED   (DIGEST_MULTIPLE || ENA_DIGEST_FREEMAIL) && RCVD_IN_DNSWL_MED
describe        ENA_DIGEST_MULTIPLE_DNSWL_MED   Dcc, Razor, or Pyzor hits from servers listed in DNSWL so add back points.
score           ENA_DIGEST_MULTIPLE_DNSWL_MED   2.2

meta            ENA_DIGEST_MULTIPLE_MSPIKE_H4   (DIGEST_MULTIPLE || ENA_DIGEST_FREEMAIL) && RCVD_IN_MSPIKE_H4
describe        ENA_DIGEST_MULTIPLE_MSPIKE_H4   Dcc, Razor, or Pyzor hits from servers listed in MSPIKE_H4 so add back points.
score           ENA_DIGEST_MULTIPLE_MSPIKE_H4   3.2

meta            ENA_DIGEST_MULTIPLE_MSPIKE_H3   (DIGEST_MULTIPLE || ENA_DIGEST_FREEMAIL) && RCVD_IN_MSPIKE_H3
describe        ENA_DIGEST_MULTIPLE_MSPIKE_H3   Dcc, Razor, or Pyzor hits from servers listed in MSPIKE_H3 so add back points.
score           ENA_DIGEST_MULTIPLE_MSPIKE_H3   2.2

meta            ENA_DIGEST_MULTIPLE_MSPIKE_H2   (DIGEST_MULTIPLE || ENA_DIGEST_FREEMAIL) && RCVD_IN_MSPIKE_H2
describe        ENA_DIGEST_MULTIPLE_MSPIKE_H2   Dcc, Razor, or Pyzor hits from servers listed in MSPIKE_H2 so add back points.
score           ENA_DIGEST_MULTIPLE_MSPIKE_H2   1.2

Hope this is helpful.

-- 
David Jones

Re: "bout u" campaign

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

On 12.07.17 22:50, Alex wrote:
>> pretty high mainly due to DCC and BAYES_99.
>
>Are you paying for DCC? I think we're over their limit and they
>blacklisted us long ago, lol.

Configure your own DCC server and connect it to their network. 
It is not a paid service (you pay only if you don't connect a server to 
their network).

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Fucking windows! Bring Bill Gates! (Southpark the movie)

Re: "bout u" campaign

Posted by Alex <my...@gmail.com>.

Hi,

> pretty high mainly due to DCC and BAYES_99.

Are you paying for DCC? I think we're over their limit and they
blacklisted us long ago, lol.

> I guess I have well trained Bayes.

I think you just don't have many one-liner emails as a regular course
of business?

>  1.2 RCVD_IN_LASHBACK       RBL: Received is listed in Lashback
>                             usb.unsubscore.com
>                             [204.29.186.60 listed in ubl.unsubscore.com]

I forgot about this. I have it in postscreen (+1) but now also added it in SA.

>  2.2 RCVD_IN_SORBS_SPAM     RBL: SORBS: sender is a spam source

We do have some in SORBS, but only score it 0.5.  Do you really
recommend scoring it so high?

>  0.0 OS_UNKNOWN             Relay runs on unknown OS

That's an interesting one. Fingerprinting?

>  1.2 FREEMAIL_FROM          Sender email is commonly abused enduser mail

This is also scored *much* lower here - we have many freemail senders.
The default score is 0.001, so you must have changed it.

> -2.2 RCVD_IN_SENDERSCORE_90_100 Senderscore.org score of 90 to 100

For 90_100, I think we're only subtracting -0.2.

>  2.2 ENA_DIGEST_FREEMAIL    Freemail account hitting message digest spam
> seen by the Internet (DCC, Pyzor, or Razor).

The problem I always had with pyzor/dcc was that it works on very
small blocks of text, no? Perhaps it works well for small messages,
but isn't it problematic for larger messages?

>  1.2 ENA_DIGEST_MULTIPLE_MSPIKE_H2 Dcc, Razor, or Pyzor hits from servers
>                             listed in MSPIKE_H2 so add back points.
>  0.0 ENA_BAD_SPAM           Spam hitting really bad rules.
>  2.2 ENA_BAD_SPAM_FREEMAIL  Bad spam from freemail (hotmail, gmail, msn,
>                             yahoo).

These are interesting, but I suppose privileged...

Re: "bout u" campaign

Posted by Alex <my...@gmail.com>.

Hi,

> I SHORTCIRCUIT any trustworthy sender with a legit unsubscribe process to
> put control back in the hands/mouse of the end user.  I also SHORTCIRCUIT
> with whitelist_auth any domains (primarily subdomains) that are
> system-generated and consistently score very low.

Just now received this one, and thought it was relevant given our
conversation today. Would you have (did you?) shortcircuited this?

https://pastebin.com/CXj0mgw1

This hit SENDERSCORE_90_100 as well as MSPIKE_H2 and BAYES_50. It also
didn't hit any FREEMAIL rules. It also hit LASHBACK.

I'm curious how your system would have blocked this. I've lowered
LASHBACK to 0.65 because it was hitting so much ham, but even at the
1.2+ we were discussing it would not have been marked as spam.

Thanks,
Alex

Re: "bout u" campaign

Posted by David Jones <dj...@ena.com>.

On 07/13/2017 12:56 PM, Dave Jones wrote:
> On 07/13/2017 12:39 PM, Alex wrote:
>> Hi,
>>
>>> header          RCVD_IN_SENDERSCORE_0_29
>>> eval:check_rbl('senderscore0-lastexternal','score.senderscore.com.','^127\.0\.4\.([1-2]?[0-9])$') 
>>>
>>> describe        RCVD_IN_SENDERSCORE_0_29        Senderscore.org score 
>>> of 0
>>> to 29
>>> score           RCVD_IN_SENDERSCORE_0_29        5.2
>>> tflags          RCVD_IN_SENDERSCORE_0_29        net
>>
>> At least in my environment, this one in particular would catch a ton
>> of legitimate mail. This also assumes a 6.0 score for you, correct?
>>
> 
> Correct.  My block threshold of 6.0 is the default in MailScanner.
> 
> The legit email should be SHORTCIRCUIT'd with whitelist_auth entries.
> 
> I SHORTCIRCUIT any trustworthy sender with a legit unsubscribe process 
> to put control back in the hands/mouse of the end user.  I also 
> SHORTCIRCUIT with whitelist_auth any domains (primarily subdomains) that 
> are system-generated and consistently score very low.
> 
>  From my own email analysis, the majority of my spam is from FREEMAIL 
> senders and compromised accounts with zero-hour spam campaigns that the 
> mail server is not yet on any RBLs.  Botnet controlled devices are 
> another source of spam but they seem to be sending through compromised 
> accounts these days.  They will phish a password, sit on it for days or 
> weeks, craft a zero-hour spam campaign to get through most mail filters, 
> then blast as much spam as they can until RBLs and DCC catch up to it in 
> about 30 minutes or so.  These compromised accounts from normally 
> trusted mail server IPs are they reason why some SA RBL rules need to go 
> beyond the lastexternal hop.
> 

Let me clarify a bit.  Don't put any FREEMAIL or domains with human 
accounts (potentially compromised) in your whitelist_auth unless you 
have to for some reason.  Some senders may not have SPF or DKIM set up 
properly, so you may have to put some of them in whitelist_from_rcvd 
to get the same result.

Doing this will separate out trustworthy senders from potential content 
pitfalls.  For example, legit eBay emails will get through while spoofed 
emails with identical email content can be blocked by Bayes or other 
content rules.

I am seeing a lot of email spoofing USAA insurance lately to phish 
accounts.  I whitelist_auth legit USAA emails then train the rest as 
spam so Bayes and other rules can block the phishing.

-- 
David Jones

Re: "bout u" campaign

Posted by Dave Jones <da...@apache.org>.

On 07/13/2017 12:39 PM, Alex wrote:
> Hi,
> 
>> header          RCVD_IN_SENDERSCORE_0_29
>> eval:check_rbl('senderscore0-lastexternal','score.senderscore.com.','^127\.0\.4\.([1-2]?[0-9])$')
>> describe        RCVD_IN_SENDERSCORE_0_29        Senderscore.org score of 0
>> to 29
>> score           RCVD_IN_SENDERSCORE_0_29        5.2
>> tflags          RCVD_IN_SENDERSCORE_0_29        net
> 
> At least in my environment, this one in particular would catch a ton
> of legitimate mail. This also assumes a 6.0 score for you, correct?
> 

Correct.  My block threshold of 6.0 is the default in MailScanner.

The legit email should be SHORTCIRCUIT'd with whitelist_auth entries.

I SHORTCIRCUIT any trustworthy sender with a legit unsubscribe process 
to put control back in the hands/mouse of the end user.  I also 
SHORTCIRCUIT with whitelist_auth any domains (primarily subdomains) that 
are system-generated and consistently score very low.

From my own email analysis, the majority of my spam is from FREEMAIL 
senders and compromised accounts running zero-hour spam campaigns, where 
the mail server is not yet on any RBLs.  Botnet-controlled devices are 
another source of spam but they seem to be sending through compromised 
accounts these days.  They will phish a password, sit on it for days or 
weeks, craft a zero-hour spam campaign to get through most mail filters, 
then blast as much spam as they can until RBLs and DCC catch up to it in 
about 30 minutes or so.  These compromised accounts on normally 
trusted mail server IPs are the reason why some SA RBL rules need to go 
beyond the lastexternal hop.

-- 
David Jones

Re: "bout u" campaign

Posted by Alex <my...@gmail.com>.

Hi,

> header          RCVD_IN_SENDERSCORE_0_29
> eval:check_rbl('senderscore0-lastexternal','score.senderscore.com.','^127\.0\.4\.([1-2]?[0-9])$')
> describe        RCVD_IN_SENDERSCORE_0_29        Senderscore.org score of 0
> to 29
> score           RCVD_IN_SENDERSCORE_0_29        5.2
> tflags          RCVD_IN_SENDERSCORE_0_29        net

At least in my environment, this one in particular would catch a ton
of legitimate mail. This also assumes a 6.0 score for you, correct?

Re: "bout u" campaign

Posted by Dave Jones <da...@apache.org>.

On 07/13/2017 12:03 PM, @lbutlr wrote:
> On Jul 12, 2017, at 8:18 PM, David Jones <dj...@ena.com> wrote:
>> -2.2 RCVD_IN_SENDERSCORE_90_100 Senderscore.org score of 90 to 100
> 
> I haven’t seen that before (or not that I’ve noticed). Is it part fo the base SA package or something that was added?
> 
> 

I posted something generic about score.senderscore.org a year or so ago 
but here are the rules that I have been using now for a couple of years:

/etc/mail/spamassassin/99_senderscore.cf

ifplugin Mail::SpamAssassin::Plugin::DNSEval

header          __RCVD_IN_SENDERSCORE_90_100 
eval:check_rbl('senderscore90-lastexternal','score.senderscore.com.','^1
27\.0\.4\.(9[0-9]|100)$')
meta            RCVD_IN_SENDERSCORE_90_100      SPF_PASS && 
__RCVD_IN_SENDERSCORE_90_100
describe        RCVD_IN_SENDERSCORE_90_100      Senderscore.org score of 
90 to 100
score           RCVD_IN_SENDERSCORE_90_100      -2.2
tflags          RCVD_IN_SENDERSCORE_90_100      net

header		__RCVD_IN_SENDERSCORE_80_89 
eval:check_rbl('senderscorer80-lastexternal','score.senderscore.com.','^127\.0\.4\.(8[0-9])$')
meta		RCVD_IN_SENDERSCORE_80_89	SPF_PASS && __RCVD_IN_SENDERSCORE_80_89
describe	RCVD_IN_SENDERSCORE_80_89	Senderscore.org score of 80 to 89
score		RCVD_IN_SENDERSCORE_80_89	-0.2
tflags		RCVD_IN_SENDERSCORE_80_89	net

header		RCVD_IN_SENDERSCORE_70_79 
eval:check_rbl('senderscorer70-lastexternal','score.senderscore.com.','^127\.0\.4\.(7[0-9])$')
describe	RCVD_IN_SENDERSCORE_70_79	Senderscore.org score of 70 to 79
score		RCVD_IN_SENDERSCORE_70_79	1.2
tflags		RCVD_IN_SENDERSCORE_70_79	net

header		RCVD_IN_SENDERSCORE_60_69	eval:check_rbl('senderscorer60-lastexternal','score.senderscore.com.','^127\.0\.4\.(6[0-9])$')
describe	RCVD_IN_SENDERSCORE_60_69	Senderscore.org score of 60 to 69
score		RCVD_IN_SENDERSCORE_60_69	2.2
tflags		RCVD_IN_SENDERSCORE_60_69	net

header		RCVD_IN_SENDERSCORE_50_59	eval:check_rbl('senderscorer50-lastexternal','score.senderscore.com.','^127\.0\.4\.(5[0-9])$')
describe	RCVD_IN_SENDERSCORE_50_59	Senderscore.org score of 50 to 59
score		RCVD_IN_SENDERSCORE_50_59	3.2
tflags		RCVD_IN_SENDERSCORE_50_59	net

header		RCVD_IN_SENDERSCORE_30_49	eval:check_rbl('senderscorer30-lastexternal','score.senderscore.com.','^127\.0\.4\.([3-4][0-9])$')
describe	RCVD_IN_SENDERSCORE_30_49	Senderscore.org score of 30 to 49
score		RCVD_IN_SENDERSCORE_30_49	4.2
tflags		RCVD_IN_SENDERSCORE_30_49	net

header		RCVD_IN_SENDERSCORE_0_29	eval:check_rbl('senderscore0-lastexternal','score.senderscore.com.','^127\.0\.4\.([1-2]?[0-9])$')
describe	RCVD_IN_SENDERSCORE_0_29	Senderscore.org score of 0 to 29
score		RCVD_IN_SENDERSCORE_0_29	5.2
tflags		RCVD_IN_SENDERSCORE_0_29	net

endif
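
For anyone who wants to sanity-check the bucket regexes before loading the file, here is a minimal Python sketch (not part of the ruleset, and not how SpamAssassin evaluates it). It assumes, as the rules above do, that the lookup answers with 127.0.4.N where N is the score; the function name bucket_for is made up for illustration. It only checks which pattern matches a given answer and ignores the SPF_PASS condition that the 90_100 and 80_89 meta rules add.

```python
import re

# (final rule name, regex applied to the returned A record), copied from the
# rules above. The SPF_PASS requirement of the two meta rules is NOT modelled.
BUCKETS = [
    ("RCVD_IN_SENDERSCORE_90_100", r"^127\.0\.4\.(9[0-9]|100)$"),
    ("RCVD_IN_SENDERSCORE_80_89", r"^127\.0\.4\.(8[0-9])$"),
    ("RCVD_IN_SENDERSCORE_70_79", r"^127\.0\.4\.(7[0-9])$"),
    ("RCVD_IN_SENDERSCORE_60_69", r"^127\.0\.4\.(6[0-9])$"),
    ("RCVD_IN_SENDERSCORE_50_59", r"^127\.0\.4\.(5[0-9])$"),
    ("RCVD_IN_SENDERSCORE_30_49", r"^127\.0\.4\.([3-4][0-9])$"),
    ("RCVD_IN_SENDERSCORE_0_29", r"^127\.0\.4\.([1-2]?[0-9])$"),
]


def bucket_for(answer):
    """Return the name of the rule whose pattern matches, or None."""
    for name, pattern in BUCKETS:
        if re.search(pattern, answer):
            return name
    return None


if __name__ == "__main__":
    for score in (0, 29, 30, 49, 50, 89, 90, 100):
        print(score, bucket_for(f"127.0.4.{score}"))
```

Every value from 0 to 100 lands in exactly one bucket, so the ranges neither overlap nor leave gaps.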


-- 
David Jones

Re: "bout u" campaign

Posted by "@lbutlr" <kr...@kreme.com>.

On Jul 12, 2017, at 8:18 PM, David Jones <dj...@ena.com> wrote:
> -2.2 RCVD_IN_SENDERSCORE_90_100 Senderscore.org score of 90 to 100

I haven’t seen that before (or not that I’ve noticed). Is it part of the base SA package or something that was added?


-- 
Apple broke AppleScripting signatures in Mail.app, so no random signatures.

Re: "bout u" campaign

Posted by David Jones <dj...@ena.com>.

On 07/12/2017 08:04 PM, Alex wrote:
> Hi all,
> 
> Has anyone else experienced a spam campaign with any one of the
> following subjects:
> 
> - sometimes enjoy it wild, how bout you?
> - sometimes like it ruff, what bout you?
> - sumtimes enjoy it ruff, wat bout you?
> 
> The body contains something like "wild hukups" then a phone number.
> 
> https://pastebin.com/X5xNn9RZ
> 
> It comes from AOL and other freemails, but doesn't hit much, and hits
> bayes50 or lower here.
> 
> Is this a snowshoe thing? Ideas on how to stop them? I've now trained
> them but I thought someone might like to see them for their own
> benefit, and perhaps had ideas on a more general way of blocking
> these.
> 
> What is even the point of spam with a phone number?
> 
> The IP range for the ones originating from AOL are all in the
> 204.29.186.0/24 block. None of them are in any meaningful blacklist
> and have a 90+ senderscore.
> 
> I'm sure the campaign will change soon, but I thought there was
> something more general we could look for the next time...
> 

Time has passed, so the RBLs and DCC may be hitting now where they did
not earlier, but my SA scored it pretty high, mainly due to DCC and
BAYES_99.  I guess I have a well-trained Bayes.  I have some meta rules
that add more points when FREEMAIL hits along with things like KAM_URI,
DIGEST_MULTIPLE and high BAYES.  ENA_BAD_SPAM matches a huge list of
combinations of bad rule hits, built up over years, and triggers other
rules that carry the points.


Content analysis details:   (14.1 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.2 RCVD_IN_LASHBACK       RBL: Received is listed in Lashback
                            usb.unsubscore.com
                            [204.29.186.60 listed in ubl.unsubscore.com]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 0.9993]
 2.2 RCVD_IN_SORBS_SPAM     RBL: SORBS: sender is a spam source
                            [204.29.186.60 listed in dnsbl.sorbs.net]
-0.2 RCVD_IN_DNSWL_NONE     RBL: Sender listed at http://www.dnswl.org/, no
                            trust
                            [204.29.186.60 listed in list.dnswl.org]
-1.2 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
                            [204.29.186.60 listed in wl.mailspike.net]
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.0 OS_UNKNOWN             Relay runs on unknown OS
 1.2 FREEMAIL_FROM          Sender email is commonly abused enduser mail
                            provider
                            (georgia32ce[at]aol.com)
 1.5 KAM_MXURI              URI: URI begins with a mail exchange prefix, i.e.
                            mx.[...]
 0.2 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 0.9993]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 2.2 DCC_CHECK              Detected as bulk mail by DCC (dcc-servers.net)
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not
                            necessarily valid
-2.2 RCVD_IN_SENDERSCORE_90_100 Senderscore.org score of 90 to 100
 0.0 T_DKIM_INVALID         DKIM-Signature header exists but is not valid
 2.2 ENA_DIGEST_FREEMAIL    Freemail account hitting message digest spam seen
                            by the Internet (DCC, Pyzor, or Razor).
 1.2 ENA_DIGEST_MULTIPLE_MSPIKE_H2 Dcc, Razor, or Pyzor hits from servers
                            listed in MSPIKE_H2 so add back points.
 0.0 ENA_BAD_SPAM           Spam hitting really bad rules.
 2.2 ENA_BAD_SPAM_FREEMAIL  Bad spam from freemail (hotmail, gmail, msn,
                            yahoo).
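
As a quick arithmetic check on the report above, the per-rule points can be added up to confirm the 14.1 total and to see what the message would have scored without the Senderscore credit. This is only a sketch: the values are copied from the report, the displayed points are rounded to one decimal (so in general a sum of displayed values can be off by a tenth), and Decimal is used just to avoid float noise.

```python
from decimal import Decimal

# (rule name, points) copied from the report above.
HITS = [
    ("RCVD_IN_LASHBACK", "1.2"),
    ("BAYES_99", "3.5"),
    ("RCVD_IN_SORBS_SPAM", "2.2"),
    ("RCVD_IN_DNSWL_NONE", "-0.2"),
    ("RCVD_IN_MSPIKE_H2", "-1.2"),
    ("SPF_PASS", "-0.0"),
    ("OS_UNKNOWN", "0.0"),
    ("FREEMAIL_FROM", "1.2"),
    ("KAM_MXURI", "1.5"),
    ("BAYES_999", "0.2"),
    ("HTML_MESSAGE", "0.0"),
    ("DCC_CHECK", "2.2"),
    ("DKIM_SIGNED", "0.1"),
    ("RCVD_IN_SENDERSCORE_90_100", "-2.2"),
    ("T_DKIM_INVALID", "0.0"),
    ("ENA_DIGEST_FREEMAIL", "2.2"),
    ("ENA_DIGEST_MULTIPLE_MSPIKE_H2", "1.2"),
    ("ENA_BAD_SPAM", "0.0"),
    ("ENA_BAD_SPAM_FREEMAIL", "2.2"),
]

# Total of all displayed points.
total = sum(Decimal(points) for _, points in HITS)

# Same total with the Senderscore rule left out.
without_senderscore = sum(
    Decimal(points)
    for name, points in HITS
    if name != "RCVD_IN_SENDERSCORE_90_100"
)

print(total)                # 14.1
print(without_senderscore)  # 16.3
```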

-- 
David Jones

Re: "bout u" campaign

Posted by Kevin Golding <kp...@caomhin.org>.

On Mon, 17 Jul 2017 18:38:24 +0100, David Jones <dj...@ena.com> wrote:

> It would be nice if there was a local tool that could be part of the SA  
> project that would extend the masscheck processing and help build  
> content and meta rules.

As John's already mentioned, there is a surprising array of tools already  
included:

https://svn.apache.org/repos/asf/spamassassin/trunk/masses/rule-dev/

It's less about creating and more about refining.

Re: "bout u" campaign

Posted by David Jones <dj...@ena.com>.

On 07/17/2017 12:03 PM, Jesse Norell wrote:
> This description:
> 
> On Thu, 2017-07-13 at 15:07 +0100, Martin Gregorie wrote:
>> I'm continuing to get good results from a multi-level approach:
>>
>> I use two or more subrules with low scores (0.01 or so) that are
>> combined by an AND relation in a meta-rule that triggers a suitably
>> spammy score when all subrules get hits.
>>
>> The subrules are typically automatically assembled lists of words or
>> phrases - automatically assembled because that makes maintenance
>> vastly easier. The list contents are typically words and phrases found
>> in spam, e.g. one list might be selling phrases such as "get your rocks
>> off with" that are unlikely to appear in personal or legit commercial
>> mail and another might be names or slang terms for less common
>> pharmaceuticals.
> 
> 
> and what David Jones has been describing in this thread of identifying
> specific combinations of rules (his based on reputation vs. content)
> both remind me of the description of Marc Perkel's "evolution filter",
> which from memory identified sets of rules which are very indicative of
> ham/spam.   Both David and Martin are reporting good success, as did
> Marc - maybe worth looking into implementing in spamassassin?
> 
> Does masscheck automate meta rule creation? (ie. not just generate
> scores)  Not the full "evolution filter" idea which would have to run on
> the endpoint, but that would benefit everyone via rule updates.
> 
> 

I have been working on rebuilding the SA project's server the past four 
months.  The first priority was getting the spamassassin.org hidden DNS 
master active again.  This was pretty easy.  The second priority was the 
masscheck processing which turned out to be pretty time intensive and 
still could have an open issue so SA updates are currently on hold.

From what I can tell, the masscheck is only meant to dynamically update
the rule scores in 72_scores.cf (manual scores are in 50_scores.cf) and
help validate new rules added by the SA developers.  It doesn't create
new rules.  It's not able to create new rules based on content since the
masscheck processing is run locally by each user.  The email content is
not uploaded to the SA server.  Only a special log file showing all of
the rule hits for each ham and spam message is sent to the SA server.

It would be nice if there was a local tool that could be part of the SA
project that would extend the masscheck processing and help build
content and meta rules.  This would create more interest in masschecking
and get more people involved.  (I also use my masscheck ham/spam to
train my Bayes DB; otherwise it might not have been useful enough for me
to set it up and understand its value.)  I suspect advanced SA users,
like Kevin with his KAM.cf rules and a few others on this list, have
something like this that they use to build custom rules in an automated
way.  Thankfully Kevin publishes his KAM.cf and allows public downloading.
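
To make the idea of such a local tool a bit more concrete, here is a rough Python sketch of one thing it could do with nothing but per-message rule hits: find pairs of rules that often hit spam together and never hit ham together, as candidates for a hand-reviewed meta rule. This is purely hypothetical. It is not part of the SA project, it does not read the real masscheck log format, and the function names and sample data are invented for illustration (only the rule names come from this thread).

```python
from collections import Counter
from itertools import combinations


def pair_counts(messages):
    """Count how often each unordered pair of rules hits the same message."""
    counts = Counter()
    for rules in messages:
        for pair in combinations(sorted(set(rules)), 2):
            counts[pair] += 1
    return counts


def spam_only_pairs(spam, ham, min_hits=2):
    """Pairs seen in at least min_hits spam messages and in no ham message."""
    spam_counts = pair_counts(spam)
    ham_counts = pair_counts(ham)
    return sorted(
        (pair, hits)
        for pair, hits in spam_counts.items()
        if hits >= min_hits and ham_counts[pair] == 0
    )


if __name__ == "__main__":
    # Invented sample data: one list of rule hits per message.
    spam = [
        ["FREEMAIL_FROM", "DCC_CHECK", "SPF_PASS"],
        ["FREEMAIL_FROM", "DCC_CHECK", "BAYES_99"],
        ["BAYES_99", "HTML_MESSAGE"],
    ]
    ham = [
        ["FREEMAIL_FROM", "SPF_PASS"],
        ["HTML_MESSAGE", "SPF_PASS"],
    ]
    for (first, second), hits in spam_only_pairs(spam, ham):
        print(f"candidate meta: {first} && {second} ({hits} spam hits)")
```

On a real corpus the ham side matters most: a pair that never hits ham in a small sample can still cause false positives, so anything this produces would need review before being scored.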

I know that Kevin wants to speed up rule development and SA updates
(an update could take up to ~40 hours today if they weren't currently on
hold) to react faster to new spam, but it will never be fast enough to
react to zero-hour spam the way other technologies can.  The best thing
you can do is selective greylisting, rate limiting, DCC, Razor, Pyzor,
and hope the RBLs catch up quickly.  I also have a local ruleset to which
I add zero-hour spam so it is shortcircuited as spam based on content.
It does a pretty good job on most new spam and phishing, but some still
gets through now and then from compromised accounts.

-- 
David Jones

Re: "bout u" campaign

Posted by John Hardin <jh...@impsec.org>.

On Mon, 17 Jul 2017, Jesse Norell wrote:

> Does masscheck automate meta rule creation? (ie. not just generate
> scores)  Not the full "evolution filter" idea which would have to run on
> the endpoint, but that would benefit everyone via rule updates.

No, it does not.

There were a couple of rule generation experiments (the Sought and 
Sought-Fraud rulesets) but they fell by the wayside. The code is there if 
someone would like to start generating rulesets, and some of the corpus 
contributors might be willing to provide classified corpora (I still have 
a separate maintained 419 spams folder even though sought-fraud went 
dark).

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Back in 1969 the technology to fake a Moon landing didn't exist,
   but the technology to actually land there did.
   Today, it is the opposite.                               -- unknown
-----------------------------------------------------------------------
  3 days until the 48th anniversary of Apollo 11 landing on the Moon

Re: "bout u" campaign

Posted by Jesse Norell <je...@kci.net>.

This description:

On Thu, 2017-07-13 at 15:07 +0100, Martin Gregorie wrote:
> I'm continuing to get good results from a multi-level approach:
> 
> I use two or more subrules with low scores (0.01 or so) that are
> combined by an AND relation in a meta-rule that triggers a suitably
> spammy score when all subrules get hits. 
> 
> The subrules are typically automatically assembled lists of words or
> phrases - automatically assembled because that makes maintenance
> vastly easier. The list contents are typically words and phrases found
> in spam, e.g. one list might be selling phrases such as "get your rocks
> off with" that are unlikely to appear in personal or legit commercial
> mail and another might be names or slang terms for less common
> pharmaceuticals.

and what David Jones has been describing in this thread of identifying
specific combinations of rules (his based on reputation vs. content)
both remind me of the description of Marc Perkel's "evolution filter",
which from memory identified sets of rules which are very indicative of
ham/spam.   Both David and Martin are reporting good success, as did
Marc - maybe worth looking into implementing in spamassassin?

Does masscheck automate meta rule creation? (ie. not just generate
scores)  Not the full "evolution filter" idea which would have to run on
the endpoint, but that would benefit everyone via rule updates.

-- 
Jesse Norell
Kentec Communications, Inc.
970-522-8107  -  www.kci.net

Re: "bout u" campaign

Posted by Martin Gregorie <ma...@gregorie.org>.

On Thu, 2017-07-13 at 13:26 -0400, Alex wrote:
> Would you be willing to share a few examples?
> 
You can download the script processor and documentation from here:
http://www.libelle-systems.com/free/

It's called 'portmanteau' and is a .tgz compressed tar archive.

Contact me offlist if you want copies of the definition files for my
MG_SALE and MG_PRODUCT rules.


Martin

Re: "bout u" campaign

Posted by Alex <my...@gmail.com>.

Hi,

On Thu, Jul 13, 2017 at 10:07 AM, Martin Gregorie <ma...@gregorie.org> wrote:
> On Thu, 2017-07-13 at 12:59 +0000, Charles Amstutz wrote:
>> I find it challenging to constantly keep up with campaigns.  My
>> guess with the phone number is to try to make it seem more
>> legitimate.
>> More recently, I try to look for general characteristics and go for
>> that, in order to futureproof rules. However, there are always
>> legitimate emails being sent that would trigger a potential rule
>> (depending on what you are matching on).
>>
> I'm continuing to get good results from a multi-level approach:
>
> I use two or more subrules with low scores (0.01 or so) that are
> combined by an AND relation in a meta-rule that triggers a suitably
> spammy score when all subrules get hits.
>
> The subrules are typically automatically assembled lists of words or
> phrases - automatically assembled because that makes maintenance vastly
> easier. The list contents are typically words and phrases found in
> spam, e.g. one list might be selling phrases such as "get your rocks off
> with" that are unlikely to appear in personal or legit commercial mail
> and another might be names or slang terms for less common
> pharmaceuticals.
>
> The basis of this idea, which works surprisingly well in practise, is
> that a hit on one list may be accidental but a message hitting on both
> lists is more likely than not to be spam. A side benefit of this
> approach is that it will also hit combinations that weren't used in any
> of the spam analysed to create the lists, and that this will not
> generate false positives if the list contents are carefully chosen.
>
> I use an awk script to turn easily edited definition files into valid
> SA rules and hand-write the combining meta-rules.

We have a local blocklist that generates rules based on strings
identified in the body, subject and sender. I don't think it's quite
the same, however.

Would you be willing to share a few examples?

We also have a system where we use some of the address collection
rules combined with some of our own rules for catching "list" spam
("Sports enthusiasts", etc).

Re: "bout u" campaign

Posted by Martin Gregorie <ma...@gregorie.org>.

On Thu, 2017-07-13 at 12:59 +0000, Charles Amstutz wrote:
> I find it challenging to constantly keep up with campaigns.  My
> guess with the phone number is to try to make it seem more
> legitimate.
> More recently, I try to look for general characteristics and go for
> that, in order to futureproof rules. However, there are always
> legitimate emails being sent that would trigger a potential rule
> (depending on what you are matching on).
> 
I'm continuing to get good results from a multi-level approach:

I use two or more subrules with low scores (0.01 or so) that are
combined by an AND relation in a meta-rule that triggers a suitably
spammy score when all subrules get hits. 

The subrules are typically automatically assembled lists of words or
phrases - automatically assembled because that makes maintenance vastly
easier. The list contents are typically words and phrases found in
spam, e.g. one list might be selling phrases such as "get your rocks off
with" that are unlikely to appear in personal or legit commercial mail
and another might be names or slang terms for less common
pharmaceuticals. 

The basis of this idea, which works surprisingly well in practise, is
that a hit on one list may be accidental but a message hitting on both
lists is more likely than not to be spam. A side benefit of this
approach is that it will also hit combinations that weren't used in any
of the spam analysed to create the lists, and that this will not
generate false positives if the list contents are carefully chosen.

I use an awk script to turn easily edited definition files into valid
SA rules and hand-write the combining meta-rules.
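
For readers who want to try the approach, here is a minimal Python sketch of the general idea. It is not Martin's awk script or his definition-file format; the function names, the rule names (LOCAL_SALE, LOCAL_PRODUCT, LOCAL_SALE_PRODUCT), the second phrase list and the 3.0 score are all made up for illustration. It turns a list of phrases into one low-scoring body subrule and writes a meta rule that only fires when every subrule hits.

```python
import re


def phrase_to_regex(phrase):
    """Escape each word and allow any run of whitespace between words."""
    words = phrase.split()
    return r"\s+".join(
        re.sub(r"([^A-Za-z0-9_])", r"\\\1", word) for word in words
    )


def subrule(name, phrases, score="0.01"):
    """One body rule matching any phrase in the list, with a tiny score."""
    alternation = "|".join(phrase_to_regex(p) for p in phrases)
    return [
        f"body      {name}  /\\b(?:{alternation})\\b/i",
        f"describe  {name}  Phrase list {name}",
        f"score     {name}  {score}",
    ]


def meta_rule(name, subrule_names, score):
    """Meta rule that requires all of the named subrules to hit."""
    return [
        f"meta      {name}  " + " && ".join(subrule_names),
        f"describe  {name}  All of: " + ", ".join(subrule_names),
        f"score     {name}  {score}",
    ]


if __name__ == "__main__":
    # Hypothetical list contents; real lists would come from analysed spam.
    lines = []
    lines += subrule("LOCAL_SALE", ["get your rocks off with"])
    lines += subrule("LOCAL_PRODUCT", ["example product name"])
    lines += meta_rule(
        "LOCAL_SALE_PRODUCT", ["LOCAL_SALE", "LOCAL_PRODUCT"], "3.0"
    )
    print("\n".join(lines))
```

Keeping the lists in plain files and regenerating the rules is what makes maintenance cheap; the generated output should still be checked with spamassassin --lint before it goes live.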

Martin

RE: "bout u" campaign

Posted by Charles Amstutz <ch...@infinitesys.com>.

I find it challenging to constantly keep up with campaigns.  My guess with the phone number is to try to make it seem more legitimate.
More recently, I try to look for general characteristics and go for that, in order to futureproof rules. However, there are always legitimate emails being sent that would trigger a potential rule (depending on what you are matching on).


>> What is even the point of spam with a phone number?