Posted to users@spamassassin.apache.org by Sebastian Arcus <s....@open-t.co.uk> on 2018/04/07 15:42:13 UTC
MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in
UK
I'm not entirely sure what is the cause of this - notification emails
from The Pension Regulator in UK (a government body overseeing pensions)
have the destination email in upper case as part of the Message-ID. I
don't know if the user has input their email address in caps when
creating the account with TPR, and the system at TPR just preserves caps
- or maybe their email software does that on purpose somehow. In all
events, all email notifications from them go straight to the Junk
folder. Do the standards really require a message id to be in all lower
case?
I've enclosed one of the messages received here:
https://pastebin.com/9Bmu3pj1
Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator
in UK
Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 07/04/18 21:20, Bill Cole wrote:
> On 7 Apr 2018, at 11:42 (-0400), Sebastian Arcus wrote:
>
>> Do the standards really require a message id to be in all lower case?
>
> Of course not, and that's also not an accurate description of
> MSGID_SPAM_CAPS.
>
> Only a small minority of rules in SA are based on any external standard. They
> are empirical and pragmatic, not legalistic. There is a complex analysis
> of multiple mail streams used to generate scores for the rules and to
> decide which rules are good enough to publish in updates, run on a daily
> basis because it takes most of a day to run. The fact that
> MSGID_SPAM_CAPS exists with that name (and not with a 'T_' or
> developer's tag prefix) implies that at some point in the past it was
> reliable enough as an indicator of spam to be part of the default set.
Thank you Bill. That is useful to know.
Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator
in UK
Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 7 Apr 2018, at 11:42 (-0400), Sebastian Arcus wrote:
> Do the standards really require a message id to be in all lower case?
Of course not, and that's also not an accurate description of
MSGID_SPAM_CAPS.
Only a small minority of rules in SA are based on any external standard. They
are empirical and pragmatic, not legalistic. There is a complex analysis
of multiple mail streams used to generate scores for the rules and to
decide which rules are good enough to publish in updates, run on a daily
basis because it takes most of a day to run. The fact that
MSGID_SPAM_CAPS exists with that name (and not with a 'T_' or
developer's tag prefix) implies that at some point in the past it was
reliable enough as an indicator of spam to be part of the default set.
--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension
Regulator in UK
Posted by RW <rw...@googlemail.com>.
On Sun, 8 Apr 2018 07:41:50 -0500
David Jones wrote:
> On 04/07/2018 10:42 AM, Sebastian Arcus wrote:
> > I've enclosed one of the messages received here:
> >
> > https://pastebin.com/9Bmu3pj1
>
> I added this to the 60_whitelist_auth.cf to trust this sender:
>
> def_whitelist_auth *@*.tpr.gov.uk
>
> This will get pushed out in a couple of days by sa-update.
>
> I know it's not directly addressing your question about the rule's
> high score
FWIW with the defaults it would have scored only 1.04. Even with
BAYES_50 instead of BAYES_00 or without RCVD_IN_DNSWL_MED, it's still
comfortably under threshold.
That said, perhaps someone could see how this compares with the existing
version:
/^\s*<?[A-Z]+\@(?!(?:mailcity|whowhere)\.com|.*[\da-fA-F]{14})/
It excludes cases where the RHS has a long decimal number or hex
string. The 14 could be increased if the spam hits drop significantly.
I don't have any hits on MSGID_SPAM_CAPS, but my guess is that
doing "clever" things with message-ids is indicative of ham, and most
spam hits will have something simpler.
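A rough way to try the proposed pattern outside SpamAssassin is the Python
sketch below. The Message-ID values are made up for illustration (the real
ones are only in the pastebin), and Python's re module stands in for Perl's
regex engine here, so treat it as a sanity check rather than a real rule test.

```python
import re

# The proposed pattern from this message, transcribed from Perl to Python.
PROPOSED = re.compile(
    r"^\s*<?[A-Z]+@(?!(?:mailcity|whowhere)\.com|.*[\da-fA-F]{14})"
)

# Hypothetical Message-ID header values, not taken from real mail.
# The boolean is whether the pattern is expected to hit.
SAMPLES = {
    "<JOHN@example.com>": True,                    # all-caps local part
    "<JOHN@mailcity.com>": False,                  # excluded domain
    "<JOHN@0123456789abcdef.example.com>": False,  # 14+ hex digits on the RHS
    "<john@example.com>": False,                   # lower-case local part
}

def hits(message_id):
    """Return True if the proposed pattern matches the header value."""
    return PROPOSED.search(message_id) is not None

for message_id, expected in SAMPLES.items():
    print(f"{message_id!r}: hit={hits(message_id)} expected={expected}")
```

Note that the local part has to consist of upper-case letters only, so an ID
with a dot or digit before the @ would not hit either way.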
Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator
in UK
Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 08/04/18 13:41, David Jones wrote:
> On 04/07/2018 10:42 AM, Sebastian Arcus wrote:
>> I'm not entirely sure what is the cause of this - notification emails
>> from The Pension Regulator in UK (a government body overseeing
>> pensions) have the destination email in upper case as part of the
>> Message-ID. I don't know if the user has input their email address in
>> caps when creating the account with TPR, and the system at TPR just
>> preserves caps - or maybe their email software does that on purpose
>> somehow. In all events, all email notifications from them go straight
>> to the Junk folder. Do the standards really require a message id to be
>> in all lower case?
>>
>> I've enclosed one of the messages received here:
>>
>> https://pastebin.com/9Bmu3pj1
>
> I added this to the 60_whitelist_auth.cf to trust this sender:
>
> def_whitelist_auth *@*.tpr.gov.uk
>
> This will get pushed out in a couple of days by sa-update.
>
> I know it's not directly addressing your question about the rule's high
> score but this is how I address these types of issues. If you create a
> "fast lane" for trusted senders then this allows for more aggressive
> tactics/scores for new and untrusted senders.
Thank you David. It sounds like a reasonable solution to me.
Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator
in UK
Posted by David Jones <dj...@ena.com>.
On 04/07/2018 10:42 AM, Sebastian Arcus wrote:
> I'm not entirely sure what is the cause of this - notification emails
> from The Pension Regulator in UK (a government body overseeing pensions)
> have the destination email in upper case as part of the Message-ID. I
> don't know if the user has input their email address in caps when
> creating the account with TPR, and the system at TPR just preserves caps
> - or maybe their email software does that on purpose somehow. In all
> events, all email notifications from them go straight to the Junk
> folder. Do the standards really require a message id to be in all lower
> case?
>
> I've enclosed one of the messages received here:
>
> https://pastebin.com/9Bmu3pj1
I added this to the 60_whitelist_auth.cf to trust this sender:
def_whitelist_auth *@*.tpr.gov.uk
This will get pushed out in a couple of days by sa-update.
I know it's not directly addressing your question about the rule's high
score but this is how I address these types of issues. If you create a
"fast lane" for trusted senders then this allows for more aggressive
tactics/scores for new and untrusted senders.
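To get a feel for which sender addresses a glob like the one above covers,
here is a small Python sketch. It is only an approximation: Python's fnmatch
stands in for SpamAssassin's own glob handling, the addresses are
hypothetical, and the authenticated flag is a placeholder for the SPF/DKIM
check that a whitelist_auth entry depends on.

```python
from fnmatch import fnmatchcase

PATTERN = "*@*.tpr.gov.uk"  # the glob from the def_whitelist_auth line above

def would_be_trusted(sender, authenticated):
    """Rough model of the entry: the address must match the glob AND the
    message must have passed sender authentication. `authenticated` is a
    stand-in; this sketch does not evaluate SPF or DKIM itself."""
    return authenticated and fnmatchcase(sender.lower(), PATTERN)

# Hypothetical sender addresses, for illustration only.
print(would_be_trusted("noreply@mail.tpr.gov.uk", True))   # subdomain, authenticated
print(would_be_trusted("noreply@mail.tpr.gov.uk", False))  # subdomain, not authenticated
print(would_be_trusted("noreply@tpr.gov.uk", True))        # bare domain
```

Under this approximation the bare domain does not match, because the glob
requires something followed by a dot in front of tpr.gov.uk.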
--
David Jones
Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator
in UK
Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 07/04/18 17:22, Antony Stone wrote:
> On Saturday 07 April 2018 at 18:10:18, Sebastian Arcus wrote:
>
>> On 07/04/18 16:52, Reindl Harald wrote something.
>
>> Thank you for answering, but really, in effect you haven't answered my
>> question at all.
>
>> And the way I customise the scores is based on the type of emails
>> received at this particular site. It might seem "idiotic" to you, but
>> there are reasons for those scores. Not everyone receives the same mix
>> of email - so it isn't constructive to start calling other people's
>> scoring "idiotic" just because they are not the same as your own or the
>> defaults.
>
> Please note that there are good reasons why you received only a private
> response from this person, and that he is no longer permitted to post to the
> list.
>
> My personal recommendation is to consider carefully anything he says, judge
> whether you find it useful, and not to reply.
Hi Antony. Thank you kindly for the information. I didn't notice that
the message was private and not from the list - as the message CC'ed the
list - so it looked like a regular reply. I will take your advice -
thank you.
Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK
Posted by Antony Stone <An...@spamassassin.open.source.it>.
On Saturday 07 April 2018 at 18:10:18, Sebastian Arcus wrote:
> On 07/04/18 16:52, Reindl Harald wrote something.
> Thank you for answering, but really, in effect you haven't answered my
> question at all.
> And the way I customise the scores is based on the type of emails
> received at this particular site. It might seem "idiotic" to you, but
> there are reasons for those scores. Not everyone receives the same mix
> of email - so it isn't constructive to start calling other people's
> scoring "idiotic" just because they are not the same as your own or the
> defaults.
Please note that there are good reasons why you received only a private
response from this person, and that he is no longer permitted to post to the
list.
My personal recommendation is to consider carefully anything he says, judge
whether you find it useful, and not to reply.
Regards,
Antony.
--
This sentence contains exacly three erors.
Please reply to the list;
please *don't* CC me.
Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator
in UK
Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 07/04/18 17:14, Reindl Harald wrote:
>
>
> Am 07.04.2018 um 18:10 schrieb Sebastian Arcus:
>> And the way I customise the scores is based on the type of emails
>> received at this particular site. It might seem "idiotic" to you, but
>> there are reasons for those scores. Not everyone receives the same mix
>> of email - so it isn't constructive to start calling other people's
>> scoring "idiotic" just because they are not the same as your own or the
>> defaults
> if a single misfired rule make a BAYES_00 message to a spam message it's
> idiotic - it's that easy - with or without MSGID_SPAM_CAPS that can
> happen at every moment in time and when you trust your bayes -0.2 is not
> justified and if you don't trust your bayes train it
A default score of 3.1 for MSGID_SPAM_CAPS is pretty high - even
compared with some of the DNS blacklists rules - and some of those are
pretty powerful IMHO. Hence why I was trying to understand why this
rule is assigned such a high score and what is the significance of it.
Secondly, I found in the past that a high negative score for BAYES_00 is
counter-productive, because:
1. As soon as you receive a spam message with a new type of content, it
essentially has a free ride until it gets put through the bayes training
- as the high negative on BAYES_00 counteracts any other rule it hits -
even pretty effective rules, such as Pyzor and blacklists.
2. Spammers have learned from the above, and I get a lot of spam which
changes the wording all the time, so that bayes becomes essentially
ineffective against it - but at the same time it stops other rules from
working - because of the high negative scores on low BAYES.
3. Spammers have also learned from no. 1, and I see a lot of extremely
short spam messages - just one short line of a few words. Bayes seems to
be extremely ineffective on these very short messages, no matter how
much you train it - because of the small amount of data to work on, and
with a little bit of cunning and varying the words used - they all score
as BAYES_00. Again, the high negative score gives these spammers a
guaranteed free ride, as it overrides any other rules.
So at least from the type of spam that I see, BAYES_00 with a large
negative score is really counter-productive and it makes SA far less
efficient at picking spam.
BAYES_00 doesn't necessarily mean "I am sure this is not spam" - as a
good quality whitelist rule would, for example. It merely means "I
haven't really seen this type of spam before", or simply "this message
is too short and I really can't say anything useful about it". For these
reasons, I don't think low BAYES scores should be given large negative
scores - and hence why I changed them on my systems - with really good
results.
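To make the trade-off concrete, here is a small Python sketch using only
figures quoted in this thread: the reported total of 5.1 against a required
4.0, the 3.1 for MSGID_SPAM_CAPS, and the two BAYES_00 settings under
discussion (-0.2 with 4.0 required, versus -3.5 with 5.5 required). The
"other rules" figure is derived by subtraction, and holding it fixed across
both setups is an assumption, so this is an illustration and not a
reproduction of either system.

```python
MSGID_SPAM_CAPS = 3.1   # score mentioned above (assumed to apply at this site)
REPORTED_TOTAL = 5.1    # "Content analysis details: (5.1 points, 4.0 required)"

def total_score(other_rules, bayes_00):
    """Sum the scores for one message, rounded to one decimal place."""
    return round(other_rules + MSGID_SPAM_CAPS + bayes_00, 1)

def is_spam(score, required):
    """A message is tagged as spam once it reaches the required score."""
    return score >= required

# Everything that hit apart from MSGID_SPAM_CAPS and BAYES_00 (derived).
other_rules = round(REPORTED_TOTAL - MSGID_SPAM_CAPS - (-0.2), 1)

small_negative = total_score(other_rules, bayes_00=-0.2)  # required 4.0
large_negative = total_score(other_rules, bayes_00=-3.5)  # required 5.5

print(small_negative, is_spam(small_negative, required=4.0))
print(large_negative, is_spam(large_negative, required=5.5))
```

The same subtraction that rescues this message would also rescue a spam
message that happens to land in BAYES_00, which is the concern raised above.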
Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator
in UK
Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 07/04/18 16:52, Reindl Harald wrote:
> Content analysis details: (5.1 points, 4.0 required)
>
> who did set the *non default* required score to 4.0?
> why did the person not adjust -0.2 for BAYES_00 too?
>
> the scoring of this system is idiotic!
>
> required score here is 5.5 and BAYES_00 is scored to -3.5 while milter
> reject starts with 8.0 so nothing would happen just because *one single*
> rule hit wrongly
Thank you for answering, but really, in effect you haven't answered my
question at all. I was merely trying to understand the MSGID_SPAM_CAPS
rule - and what rationale it is based on. I know I can alter the score
just for it - I was trying to understand what other implications this
might have. I didn't even suggest that SA default config or scoring
needs to change!
And the way I customise the scores is based on the type of emails
received at this particular site. It might seem "idiotic" to you, but
there are reasons for those scores. Not everyone receives the same mix
of email - so it isn't constructive to start calling other people's
scoring "idiotic" just because they are not the same as your own or the
defaults.
>
> Am 07.04.2018 um 17:42 schrieb Sebastian Arcus:
>> I'm not entirely sure what is the cause of this - notification emails
>> from The Pension Regulator in UK (a government body overseeing pensions)
>> have the destination email in upper case as part of the Message-ID. I
>> don't know if the user has input their email address in caps when
>> creating the account with TPR, and the system at TPR just preserves caps
>> - or maybe their email software does that on purpose somehow. In all
>> events, all email notifications from them go straight to the Junk
>> folder. Do the standards really require a message id to be in all lower
>> case?
>>
>> I've enclosed one of the messages received here:
>>
>> https://pastebin.com/9Bmu3pj