Posted to users@spamassassin.apache.org by Sebastian Arcus <s....@open-t.co.uk> on 2018/04/07 15:42:13 UTC

MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

I'm not entirely sure what is the cause of this - notification emails 
from The Pension Regulator in UK (a government body overseeing pensions) 
have the destination email in upper case as part of the Message-ID. I 
don't know if the user has input their email address in caps when 
creating the account with TPR, and the system at TPR just preserves caps 
- or maybe their email software does that on purpose somehow. In all 
events, all email notifications from them go straight to the Junk 
folder. Do the standards really require a message id to be in all lower 
case?

I've enclosed one of the messages received here:

https://pastebin.com/9Bmu3pj1

Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

Posted by Sebastian Arcus <s....@open-t.co.uk>.

On 07/04/18 21:20, Bill Cole wrote:
> On 7 Apr 2018, at 11:42 (-0400), Sebastian Arcus wrote:
> 
>> Do the standards really require a message id to be in all lower case?
> 
> Of course not, and that's also not an accurate description of 
> MSGID_SPAM_CAPS.
> 
> A small minority of rules in SA are based on any external standard. They 
> are empirical and pragmatic, not legalistic. There is a complex analysis 
> of multiple mail streams  used to generate scores for the rules and to 
> decide which rules are good enough to publish in updates, run on a daily 
> basis because it takes most of a day to run. The fact that 
> MSGID_SPAM_CAPS exists with that name (and not with a 'T_' or 
> developer's tag prefix) implies that at some point in the past it was 
> reliable enough as an indicator of spam to be part of the default set.

Thank you Bill. That is useful to know.

Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 7 Apr 2018, at 11:42 (-0400), Sebastian Arcus wrote:

> Do the standards really require a message id to be in all lower case?

Of course not, and that's also not an accurate description of 
MSGID_SPAM_CAPS.

A small minority of rules in SA are based on any external standard. They 
are empirical and pragmatic, not legalistic. There is a complex analysis 
of multiple mail streams  used to generate scores for the rules and to 
decide which rules are good enough to publish in updates, run on a daily 
basis because it takes most of a day to run. The fact that 
MSGID_SPAM_CAPS exists with that name (and not with a 'T_' or 
developer's tag prefix) implies that at some point in the past it was 
reliable enough as an indicator of spam to be part of the default set.

-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole

Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

Posted by RW <rw...@googlemail.com>.

On Sun, 8 Apr 2018 07:41:50 -0500
David Jones wrote:

> On 04/07/2018 10:42 AM, Sebastian Arcus wrote:

> > I've enclosed one of the messages received here:
> > 
> > https://pastebin.com/9Bmu3pj1  
> 
> I added this to the 60_whitelist_auth.cf to trust this sender:
> 
> def_whitelist_auth *@*.tpr.gov.uk
> 
> This will get pushed out in a couple of days by sa-update.
> 
> I know it's not directly addressing your question about the rule's
> high score 

FWIW with the defaults it would have scored only 1.04. Even with
BAYES_50 instead of BAYES_00 or without RCVD_IN_DNSWL_MED, it's still
comfortably under threshold.  

That said, perhaps someone could see how this compares with the existing
version:

  /^\s*<?[A-Z]+\@(?!(?:mailcity|whowhere)\.com|.*[\da-fA-F]{14})/

It excludes cases where the RHS has a long decimal number or hex
string. The 14 could be increased if the spam hits drop significantly. 

I don't have any hits on MSGID_SPAM_CAPS, but my guess is that
doing "clever" things with message-ids is indicative of ham, and most
spam hits will have something simpler.
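
To make the behaviour of the pattern above easier to see, here is a rough Python sketch that runs it against a few Message-ID values. The sample IDs are invented for illustration and are not taken from the pastebin message, and the helper name hits_proposed is hypothetical. The pattern is transcribed for Python's re module, which handles this expression the same way as far as I can tell, but SpamAssassin itself evaluates rules with Perl.

```python
import re

# The pattern posted above, transcribed for Python's re module
# (the backslash before '@' is not needed in Python).
PROPOSED = re.compile(
    r"^\s*<?[A-Z]+@(?!(?:mailcity|whowhere)\.com|.*[\da-fA-F]{14})"
)


def hits_proposed(message_id: str) -> bool:
    """Return True if the proposed pattern matches the Message-ID value."""
    return PROPOSED.search(message_id) is not None


# Hypothetical Message-ID values, made up for this example.
SAMPLES = [
    "<ABCDEF@example.com>",                   # all-caps left side, plain right side
    "<ABCDEF@mailcity.com>",                  # excluded domain
    "<ABCDEF@0123456789abcdef.example.com>",  # long hex string on the right side
    "<abcdef@example.com>",                   # lower-case left side
]

if __name__ == "__main__":
    for mid in SAMPLES:
        print(f"{mid}: {hits_proposed(mid)}")
```

Only the first sample matches. The second is excluded by the domain exception, the third by the 14-character hex/decimal exclusion on the right-hand side, and the fourth because the left-hand side is not all upper-case letters.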

Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

Posted by Sebastian Arcus <s....@open-t.co.uk>.

On 08/04/18 13:41, David Jones wrote:
> On 04/07/2018 10:42 AM, Sebastian Arcus wrote:
>> I'm not entirely sure what is the cause of this - notification emails 
>> from The Pension Regulator in UK (a government body overseeing 
>> pensions) have the destination email in upper case as part of the 
>> Message-ID. I don't know if the user has input their email address in 
>> caps when creating the account with TPR, and the system at TPR just 
>> preserves caps - or maybe their email software does that on purpose 
>> somehow. In all events, all email notifications from them go straight 
>> to the Junk folder. Do the standards really require a message id to be 
>> in all lower case?
>>
>> I've enclosed one of the messages received here:
>>
>> https://pastebin.com/9Bmu3pj1
> 
> I added this to the 60_whitelist_auth.cf to trust this sender:
> 
> def_whitelist_auth *@*.tpr.gov.uk
> 
> This will get pushed out in a couple of days by sa-update.
> 
> I know it's not directly addressing your question about the rule's high 
> score but this is how I address these types of issues.  If you create a 
> "fast lane" for trusted senders then this allows for more aggressive 
> tactics/scores for new and untrusted senders.

Thank you David. It sounds like a reasonable solution to me.

Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

Posted by David Jones <dj...@ena.com>.

On 04/07/2018 10:42 AM, Sebastian Arcus wrote:
> I'm not entirely sure what is the cause of this - notification emails 
> from The Pension Regulator in UK (a government body overseeing pensions) 
> have the destination email in upper case as part of the Message-ID. I 
> don't know if the user has input their email address in caps when 
> creating the account with TPR, and the system at TPR just preserves caps 
> - or maybe their email software does that on purpose somehow. In all 
> events, all email notifications from them go straight to the Junk 
> folder. Do the standards really require a message id to be in all lower 
> case?
> 
> I've enclosed one of the messages received here:
> 
> https://pastebin.com/9Bmu3pj1

I added this to the 60_whitelist_auth.cf to trust this sender:

def_whitelist_auth *@*.tpr.gov.uk

This will get pushed out in a couple of days by sa-update.

I know it's not directly addressing your question about the rule's high 
score but this is how I address these types of issues.  If you create a 
"fast lane" for trusted senders then this allows for more aggressive 
tactics/scores for new and untrusted senders.
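
As a rough illustration of which sender addresses a glob like the one above covers, here is a small Python sketch using the standard fnmatch module. It models only the address pattern, under the assumption that "*" behaves as an ordinary wildcard. As I understand it, whitelist_auth-style entries also require the message to pass sender authentication (SPF or DKIM), which this sketch does not attempt to model. The sample addresses and the helper name glob_matches are hypothetical.

```python
from fnmatch import fnmatchcase

# The glob from the def_whitelist_auth line above.
PATTERN = "*@*.tpr.gov.uk"


def glob_matches(address: str, pattern: str = PATTERN) -> bool:
    """Case-insensitive glob match of a sender address against the pattern.

    This models only the wildcard part; it says nothing about whether
    the message would pass SPF or DKIM checks.
    """
    return fnmatchcase(address.lower(), pattern.lower())


# Hypothetical sender addresses, made up for this example.
SAMPLES = [
    "alerts@mail.tpr.gov.uk",         # subdomain of tpr.gov.uk
    "alerts@tpr.gov.uk",              # bare domain, no subdomain
    "alerts@tpr.gov.uk.example.com",  # look-alike domain
]

if __name__ == "__main__":
    for addr in SAMPLES:
        print(f"{addr}: {glob_matches(addr)}")
```

Under this reading only the first address matches. The literal "." after the second "*" means the pattern covers subdomains of tpr.gov.uk but not an address directly at tpr.gov.uk.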

-- 
David Jones

Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

Posted by Sebastian Arcus <s....@open-t.co.uk>.

On 07/04/18 17:22, Antony Stone wrote:
> On Saturday 07 April 2018 at 18:10:18, Sebastian Arcus wrote:
> 
>> On 07/04/18 16:52, Reindl Harald wrote something.
> 
>> Thank you for answering, but really, in effect you haven't answered at
>> all my question.
> 
>> And the way I customise the scores are based on the type of emails
>> received at this particular site. It might seem "idiotic" to you, but
>> there are reasons for those scores. Not everyone receives the same mix
>> of email - so it isn't constructive to start calling other people's
>> scoring "idiotic" just because they are not the same as your own or the
>> defaults.
> 
> Please note that there are good reasons why you received only a private
> response from this person, and that he is no longer permitted to post to the
> list.
> 
> My personal recommendation is to consider carefully anything he says, judge
> whether you find it useful, and not to reply.

Hi Antony. Thank you kindly for the information. I didn't notice that 
the message was private and not from the list - as the message CC'ed the 
list - so it looked like a regular reply. I will take your advice - 
thank you.

Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

Posted by Antony Stone <An...@spamassassin.open.source.it>.

On Saturday 07 April 2018 at 18:10:18, Sebastian Arcus wrote:

> On 07/04/18 16:52, Reindl Harald wrote something.

> Thank you for answering, but really, in effect you haven't answered at
> all my question.

> And the way I customise the scores are based on the type of emails
> received at this particular site. It might seem "idiotic" to you, but
> there are reasons for those scores. Not everyone receives the same mix
> of email - so it isn't constructive to start calling other people's
> scoring "idiotic" just because they are not the same as your own or the
> defaults.

Please note that there are good reasons why you received only a private 
response from this person, and that he is no longer permitted to post to the 
list.

My personal recommendation is to consider carefully anything he says, judge 
whether you find it useful, and not to reply.

Regards,

Antony.

-- 
This sentence contains exacly three erors.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

Posted by Sebastian Arcus <s....@open-t.co.uk>.

On 07/04/18 17:14, Reindl Harald wrote:
> 
> 
> Am 07.04.2018 um 18:10 schrieb Sebastian Arcus:
>> And the way I customise the scores are based on the type of emails
>> received at this particular site. It might seem "idiotic" to you, but
>> there are reasons for those scores. Not everyone receives the same mix
>> of email - so it isn't constructive to start calling other people's
>> scoring "idiotic" just because they are not the same as your own or the
>> defaults
> if a single misfired rule make a BAYES_00 message to a spam message it's
> idiotic - it's that easy - with or without MSGID_SPAM_CAPS that can
> happen at every moment in time and when you trust your bayes -0.2 is not
> justified and if you don't trust your bayes train it

A default score of 3.1 for MSGID_SPAM_CAPS is pretty high - even 
compared with some of the DNS blacklists rules - and some of those are 
pretty powerful IMHO. Hence why I was trying to understand why this 
rule is assigned such a high score and what is the significance of it.

Secondly, I found in the past that a high negative score for BAYES_00 is 
counter-productive, because:

1. As soon as you receive a spam message with a new type of content, it 
essentially has a free ride until it gets put through the bayes training 
- as the high negative on BAYES_00 counteracts any other rule it hits - 
even pretty effective rules, such as Pyzor and blacklists.

2. Spammers have learned from the above, and I get a lot of spam which 
changes the wording all the time, so that bayes becomes essentially 
ineffective against it - but at the same time it stops other rules from 
working - because of the high negative scores on low BAYES.

3. Spammers have also learned from no. 1, and I see a lot of extremely 
short spam messages - just one short line of a few words. Bayes seems to 
be extremely ineffective on these very short messages, no matter how 
much you train it - because of the small amount of data to work on, and 
with a little bit of cunning and varying the words used - they all score 
as BAYES_00. Again, the high negative score gives these spammers a 
guaranteed free ride, as it overrides any other rules.

So at least from the type of spam that I see, BAYES_00 with a large 
negative score is really counter-productive and it makes SA far less 
efficient at picking spam.

BAYES_00 doesn't necessarily mean "I am sure this is not spam" - as a 
good quality whitelist rule would, for example. It merely means "I 
haven't really seen this type of spam before", or simply "this message 
is too short and I really can't say anything useful about it". For these 
reasons, I don't think low BAYES scores should be given large negative 
scores - and hence why I changed them on my systems - with really good 
results.
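
The arithmetic behind this argument can be sketched in a few lines of Python. The BAYES_00 scores (-0.2 and -3.5) and the required scores (4.0 and 5.5) are the figures mentioned in this thread. The two other rule names and their scores are hypothetical stand-ins for "a blacklist hit" and "a Pyzor hit", not real rule names or real default scores. The sketch assumes a message is flagged when its total reaches the required score.

```python
from typing import Mapping


def total_score(hits: Mapping[str, float]) -> float:
    """Sum the scores of all rules that hit, rounded to two decimals."""
    return round(sum(hits.values()), 2)


def is_spam(hits: Mapping[str, float], required: float) -> bool:
    """Assumption: a message is flagged once its total reaches `required`."""
    return total_score(hits) >= required


# Hypothetical hits for a new kind of spam that Bayes has not seen yet.
# The rule names and scores below are invented for illustration.
OTHER_HITS = {"HYPOTHETICAL_DNSBL": 3.0, "HYPOTHETICAL_PYZOR": 2.0}

# Setup with a large negative BAYES_00 (-3.5) and required score 5.5.
large_negative = dict(OTHER_HITS, BAYES_00=-3.5)

# Setup with a small negative BAYES_00 (-0.2) and required score 4.0.
small_negative = dict(OTHER_HITS, BAYES_00=-0.2)

if __name__ == "__main__":
    print(total_score(large_negative), is_spam(large_negative, 5.5))
    print(total_score(small_negative), is_spam(small_negative, 4.0))
```

With these invented figures the same two hits total 1.5 against a required 5.5 in the first setup and 4.8 against a required 4.0 in the second, so only the second setup flags the message. Different hypothetical scores would of course shift the outcome. The point is only that a large negative BAYES_00 can cancel most of what the other rules contribute.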

Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

Posted by Sebastian Arcus <s....@open-t.co.uk>.

On 07/04/18 16:52, Reindl Harald wrote:
> Content analysis details:   (5.1 points, 4.0 required)
> 
> who did set the *non default* required score to 4.0?
> why did the person not adjust -0.2 for BAYES_00 too?
> 
> the scoring of this system is idiotic!
> 
> required score here is 5.5 and BAYES_00 is scored to -3.5 while milter
> reject starts with 8.0 so nothing would happen just because *one single*
> rule hit wrongly

Thank you for answering, but really, in effect you haven't answered at 
all my question. I was merely trying to understand the MSGID_SPAM_CAPS 
rule - and what rationale it is based on. I know I can alter the score 
just for it - I was trying to understand what other implications this 
might have. I didn't even suggest that SA default config or scoring 
needs to change!

And the way I customise the scores is based on the type of emails 
received at this particular site. It might seem "idiotic" to you, but 
there are reasons for those scores. Not everyone receives the same mix 
of email - so it isn't constructive to start calling other people's 
scoring "idiotic" just because they are not the same as your own or the 
defaults.

> 
> Am 07.04.2018 um 17:42 schrieb Sebastian Arcus:
>> I'm not entirely sure what is the cause of this - notification emails
>> from The Pension Regulator in UK (a government body overseeing pensions)
>> have the destination email in upper case as part of the Message-ID. I
>> don't know if the user has input their email address in caps when
>> creating the account with TPR, and the system at TPR just preserves caps
>> - or maybe their email software does that on purpose somehow. In all
>> events, all email notifications from them go straight to the Junk
>> folder. Do the standards really require a message id to be in all lower
>> case?
>>
>> I've enclosed one of the messages received here:
>>
>> https://pastebin.com/9Bmu3pj