Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2018/03/05 21:23:31 UTC
APOSTROPHE_TOCC score
Hi,
I just received a false-positive because of the following address:
To: "'info@example.se'" <in...@example.se>
Apparently the apostrophe is enough to warrant 2.5 points alone? Is
this intended to catch addresses like tom.o'reilly@example.com or more
like my example above?
That seems like an awfully high score, but was just wondering if
people thought this was correct or if we should look at it again or if
I should just write an exception locally...
Re: APOSTROPHE_TOCC score
Posted by RW <rw...@googlemail.com>.
On Mon, 5 Mar 2018 16:28:33 -0600
David Jones wrote:
> On 03/05/2018 04:20 PM, John Hardin wrote:
> > On Mon, 5 Mar 2018, Alex wrote:
> >
> >> 2.6 points for this is just unreasonable. This was a completely
> >> legitimate email.
> >
> > What is the S/O in masscheck?
> >
>
> http://ruleqa.spamassassin.org/20180304-r1825801-n/APOSTROPHE_TOCC/detail
>
> It's a high S/O in the masscheck but I don't think that alone is an
> indicator of spam. I need to check my ena corpora to see what is
> going on there.
>
> This rule should probably be limited to a max of 1.0.
Or perhaps change the rule from:
header APOSTROPHE_TOCC ToCc:addr =~ /'/
to:
header APOSTROPHE_TOCC ToCc:addr =~ /[^do]'/
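To see what that change would and would not catch, here is a minimal sketch that transcribes both patterns into Python's re module. The sample addresses are hypothetical and chosen only for illustration. The match is case-sensitive, as in the patterns as written, and nothing is assumed about whether SpamAssassin normalizes the case of the address before matching.

```python
import re

# The two patterns from the message above, transcribed to Python's re syntax.
CURRENT_RULE = re.compile(r"'")
PROPOSED_RULE = re.compile(r"[^do]'")

# Hypothetical sample addresses, chosen only to illustrate the difference.
SAMPLES = [
    "tom.o'reilly@example.com",
    "tom.O'Reilly@example.com",
    "d'arcy@example.com",
    "x'y@example.com",
    "info@example.com",
]


def hits(pattern, address):
    """Return True if the pattern matches anywhere in the address."""
    return pattern.search(address) is not None


for addr in SAMPLES:
    print(f"{addr:28} current={hits(CURRENT_RULE, addr)} "
          f"proposed={hits(PROPOSED_RULE, addr)}")
```

In this sketch the proposed pattern skips an apostrophe preceded by a lowercase "o" or "d", but still fires when the preceding letter is an uppercase "O".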
Re: APOSTROPHE_TOCC score
Posted by David Jones <dj...@ena.com>.
On 03/05/2018 04:20 PM, John Hardin wrote:
> On Mon, 5 Mar 2018, Alex wrote:
>
>> 2.6 points for this is just unreasonable. This was a completely
>> legitimate email.
>
> What is the S/O in masscheck?
>
http://ruleqa.spamassassin.org/20180304-r1825801-n/APOSTROPHE_TOCC/detail
It's a high S/O in the masscheck but I don't think that alone is an
indicator of spam. I need to check my ena corpora to see what is going
on there.
This rule should probably be limited to a max of 1.0.
--
David Jones
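For readers unfamiliar with the term, a rough sketch of the S/O figure follows. It assumes S/O is the rule's spam hit rate divided by the sum of its spam and ham hit rates; that definition and the counts below are assumptions for illustration and are not taken from the ruleqa page linked above.

```python
def s_over_o(spam_hits, spam_total, ham_hits, ham_total):
    """Spam share of a rule's hits, computed from per-corpus hit rates.

    Assumed definition: S/O = spam% / (spam% + ham%).
    """
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return None  # the rule never fired, so the ratio is undefined
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical counts, for illustration only.
ratio = s_over_o(spam_hits=500, spam_total=100_000,
                 ham_hits=5, ham_total=100_000)
print(f"S/O = {ratio:.3f}")
```

A value near 1.0 only says that most of the rule's hits land on spam. It does not say whether the rule contributed anything those messages were not already caught for, which is the concern raised above.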
Re: APOSTROPHE_TOCC score
Posted by John Hardin <jh...@impsec.org>.
On Mon, 5 Mar 2018, Alex wrote:
> 2.6 points for this is just unreasonable. This was a completely
> legitimate email.
What is the S/O in masscheck?
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Failure to plan ahead on someone else's part does not constitute
an emergency on my part. -- David W. Barts in a.s.r
-----------------------------------------------------------------------
6 days until Daylight Saving Time begins in U.S. - Spring Forward
Re: APOSTROPHE_TOCC score
Posted by John Hardin <jh...@impsec.org>.
On Tue, 6 Mar 2018, David Jones wrote:
> On 03/06/2018 12:54 PM, John Hardin wrote:
>> On Tue, 6 Mar 2018, RW wrote:
>>
>>> On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
>>> John Hardin wrote:
>>>
>>>> On Tue, 6 Mar 2018, David Jones wrote:
>>>
>>>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>>>> just riding on the back of other rules, BLs, and high Bayes
>>>>> scores.
>>>>
>>>> What I generally look at is the detailed rule performance in
>>>> masscheck. If it primarily hits on spams that score in total 1-3
>>>> points.
>>>
>>> Why not under 5?
>>
>> If it's close to 5 and there's a limit that suggests the limit could be
>> increased a bit.
>>
>> It also needs to take into account the ham hits, which is why having a
>> ham-starved corpus is such a problem.
>
> Are you saying we have a ham-starved corpus?
We have at times in the past. When you're performing analyses like this
you need to bear in mind the size of the ham corpus.
Re: APOSTROPHE_TOCC score
Posted by David Jones <dj...@ena.com>.
On 03/06/2018 12:54 PM, John Hardin wrote:
> On Tue, 6 Mar 2018, RW wrote:
>
>> On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
>> John Hardin wrote:
>>
>>> On Tue, 6 Mar 2018, David Jones wrote:
>>
>>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>>> just riding on the back of other rules, BLs, and high Bayes
>>>> scores.
>>>
>>> What I generally look at is the detailed rule performance in
>>> masscheck. If it primarily hits on spams that score in total 1-3
>>> points.
>>
>> Why not under 5?
>
> If it's close to 5 and there's a limit that suggests the limit could be
> increased a bit.
>
> It also needs to take into account the ham hits, which is why having a
> ham-starved corpus is such a problem.
>
Are you saying we have a ham-starved corpus?
             OVERALL     SPAM      HAM
ena-week0     77,945   36,459   41,486
ena-week1     93,847   52,781   41,066
ena-week2     69,297   30,328   38,969
ena-week3     75,853   31,995   43,858
ena-week4     92,680   37,511   55,169
total        409,622  189,074  220,548
http://ruleqa.spamassassin.org
--
David Jones
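As a quick check on the table above, this sketch computes the ham share per week and overall. The counts are copied from the table; the helper name ham_share is just for illustration.

```python
# Weekly counts copied from the table above: (spam, ham).
CORPUS = {
    "ena-week0": (36_459, 41_486),
    "ena-week1": (52_781, 41_066),
    "ena-week2": (30_328, 38_969),
    "ena-week3": (31_995, 43_858),
    "ena-week4": (37_511, 55_169),
}


def ham_share(spam, ham):
    """Fraction of messages in a corpus slice that are ham."""
    return ham / (spam + ham)


for name, (spam, ham) in CORPUS.items():
    print(f"{name}: ham share {ham_share(spam, ham):.1%}")

total_spam = sum(spam for spam, _ in CORPUS.values())
total_ham = sum(ham for _, ham in CORPUS.values())
print(f"total:     ham share {ham_share(total_spam, total_ham):.1%}")
```

Over these five weeks ham makes up a little more than half of the messages.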
Re: APOSTROPHE_TOCC score
Posted by John Hardin <jh...@impsec.org>.
On Tue, 6 Mar 2018, RW wrote:
> On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
> John Hardin wrote:
>
>> On Tue, 6 Mar 2018, David Jones wrote:
>
>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>> just riding on the back of other rules, BLs, and high Bayes
>>> scores.
>>
>> What I generally look at is the detailed rule performance in
>> masscheck. If it primarily hits on spams that score in total 1-3
>> points.
>
> Why not under 5?
If it's close to 5 and there's a limit that suggests the limit could be
increased a bit.
It also needs to take into account the ham hits, which is why having a
ham-starved corpus is such a problem.
Generally speaking there's a spike; if the spike is at less than 5 it
needs attention, and the lower the spike is the more generous the score
limit may be, bearing in mind that poison pills should be rare.
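One possible reading of the "spike" check, as a rough sketch: bucket the total scores of the spam messages the rule hit and see where most of them fall. The bucket width, the tie-breaking, and the sample scores are all assumptions made for illustration; this is not how masscheck itself reports the data.

```python
import math
from collections import Counter

SPAM_THRESHOLD = 5.0  # the "5" discussed above


def score_spike(total_scores):
    """Return the lower edge of the most populated 1-point-wide bucket.

    Ties are resolved toward the lowest bucket.
    """
    buckets = Counter(math.floor(score) for score in total_scores)
    top_count = max(buckets.values())
    return min(b for b, count in buckets.items() if count == top_count)


# Hypothetical total scores of spam messages that the rule hit.
hypothetical_scores = [1.2, 2.4, 2.9, 2.1, 3.5, 7.8, 12.0, 2.7, 18.3]

spike = score_spike(hypothetical_scores)
needs_attention = spike < SPAM_THRESHOLD
print(f"spike in bucket {spike}-{spike + 1}, needs attention: {needs_attention}")
```

With these made-up scores the spike sits in the 2-3 range, so the tail of high-scoring hits does not change the conclusion.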
Re: APOSTROPHE_TOCC score
Posted by RW <rw...@googlemail.com>.
On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
John Hardin wrote:
> On Tue, 6 Mar 2018, David Jones wrote:
> > In this case these were really bad spam so the APOSTROPHE_TOCC is
> > just riding on the back of other rules, BLs, and high Bayes
> > scores.
>
> What I generally look at is the detailed rule performance in
> masscheck. If it primarily hits on spams that score in total 1-3
> points.
Why not under 5?
Re: APOSTROPHE_TOCC score
Posted by John Hardin <jh...@impsec.org>.
On Tue, 6 Mar 2018, David Jones wrote:
> On 03/05/2018 06:57 PM, John Hardin wrote:
>> On Mon, 5 Mar 2018, Alex wrote:
>>
>>> Hi,
>>>
>>> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin <jh...@impsec.org> wrote:
>>>> On Mon, 5 Mar 2018, Alex wrote:
>>>>
>>>>> To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
>>>>> * 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>>>
>>>>> 2.6 points for this is just unreasonable. This was a completely
>>>>> legitimate email.
>>>>
>>>> Is such an address even deliverable?
>>>
>>> Yes, it's beyond me why anyone would want to use an apostrophe, but
>>> it's valid.
>>
>> OK.
>>
>> That rule is 8 years stale. I've added a masscheck score limit of 1.000
>>
>> I'm open to discussion of converting it to a subrule and/or adding some
>> extra conditions to it.
>>
>
> Here are some samples of what I found in my corpora which supplies the
> majority of the nightly masscheck corpora.
>
> https://pastebin.com/QchEu2BA
> https://pastebin.com/pbYnvzU4
> https://pastebin.com/EjnQSE7H
>
> In this case these were really bad spam so the APOSTROPHE_TOCC is just riding
> on the back of other rules, BLs, and high Bayes scores.
What I generally look at is the detailed rule performance in masscheck. If
it primarily hits on spams that score in total 1-3 points I generally
tend to set the score limit somewhat higher. Having a tail of
higher-scoring hits doesn't affect that analysis.
This looks like one of those rules.
In this case I'd probably set the score limit on this rule low and add
more generously-scored metas for the high-spam-low-ham rule overlaps from
the masscheck results.
Re: APOSTROPHE_TOCC score
Posted by David Jones <dj...@ena.com>.
On 03/05/2018 06:57 PM, John Hardin wrote:
> On Mon, 5 Mar 2018, Alex wrote:
>
>> Hi,
>>
>> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin <jh...@impsec.org> wrote:
>>> On Mon, 5 Mar 2018, Alex wrote:
>>>
>>>> To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
>>>> * 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>>
>>>> 2.6 points for this is just unreasonable. This was a completely
>>>> legitimate email.
>>>
>>> Is such an address even deliverable?
>>
>> Yes, it's beyond me why anyone would want to use an apostrophe, but
>> it's valid.
>
> OK.
>
> That rule is 8 years stale. I've added a masscheck score limit of 1.000
>
> I'm open to discussion of converting it to a subrule and/or adding some
> extra conditions to it.
>
Here are some samples of what I found in my corpora which supplies the
majority of the nightly masscheck corpora.
https://pastebin.com/QchEu2BA
https://pastebin.com/pbYnvzU4
https://pastebin.com/EjnQSE7H
In this case these were really bad spam so the APOSTROPHE_TOCC is just
riding on the back of other rules, BLs, and high Bayes scores.
--
David Jones
Re: APOSTROPHE_TOCC score
Posted by John Hardin <jh...@impsec.org>.
On Mon, 5 Mar 2018, Alex wrote:
> Hi,
>
> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin <jh...@impsec.org> wrote:
>> On Mon, 5 Mar 2018, Alex wrote:
>>
>>> To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
>>> * 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>
>>> 2.6 points for this is just unreasonable. This was a completely
>>> legitimate email.
>>
>> Is such an address even deliverable?
>
> Yes, it's beyond me why anyone would want to use an apostrophe, but
> it's valid.
OK.
That rule is 8 years stale. I've added a masscheck score limit of 1.000
I'm open to discussion of converting it to a subrule and/or adding some
extra conditions to it.
Re: APOSTROPHE_TOCC score
Posted by Alex <my...@gmail.com>.
Hi,
On Mon, Mar 5, 2018 at 5:59 PM, John Hardin <jh...@impsec.org> wrote:
> On Mon, 5 Mar 2018, Alex wrote:
>
>> To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
>> * 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>
>> 2.6 points for this is just unreasonable. This was a completely
>> legitimate email.
>
> Is such an address even deliverable?
Yes, it's beyond me why anyone would want to use an apostrophe, but
it's valid. We discourage its use because it just makes sharing your
address more difficult, and there's also probably some weird system
that doesn't know how to handle it out there.
https://en.wikipedia.org/wiki/Email_address#Local-part
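For anyone who wants to check this, a small sketch: the apostrophe is one of the atext characters that RFC 5322 allows in an unquoted (dot-atom) local part. The function below only tests that syntax. It says nothing about whether a given mail system will actually accept or deliver to the address, and the function name is made up for this example.

```python
import re

# atext characters as listed in RFC 5322 section 3.2.3; note the apostrophe.
ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"

# dot-atom-text: one or more atext runs separated by single dots.
DOT_ATOM = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")


def is_dot_atom_local_part(local_part):
    """Return True if local_part is a syntactically valid dot-atom."""
    return DOT_ATOM.fullmatch(local_part) is not None


print(is_dot_atom_local_part("tom.o'reilly"))   # apostrophe is allowed
print(is_dot_atom_local_part("tom..oreilly"))   # consecutive dots are not
```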
Re: APOSTROPHE_TOCC score
Posted by John Hardin <jh...@impsec.org>.
On Mon, 5 Mar 2018, Alex wrote:
> To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
> * 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>
> 2.6 points for this is just unreasonable. This was a completely
> legitimate email.
Is such an address even deliverable?
Re: APOSTROPHE_TOCC score
Posted by Alex <my...@gmail.com>.
Hi,
On Mon, Mar 5, 2018 at 4:48 PM, RW <rw...@googlemail.com> wrote:
> On Mon, 5 Mar 2018 16:23:31 -0500
> Alex wrote:
>
>> Hi,
>>
>> I just received a false-positive because of the following address:
>>
>> To: "'info@example.se'" <in...@example.se>
>>
>> Apparently the apostrophe is enough to warrant 2.5 points alone? Is
>> this intended to catch addresses like tom.o'reilly@example.com or more
>> like my example above?
>
> Only the former, but I can't reproduce the bug from the above example.
I'm sorry, too many terminals open. The email producing this hit was
indeed with o'reilly in it:
To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
* 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
2.6 points for this is just unreasonable. This was a completely
legitimate email.
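In case the header above is hard to read: the "=27" in the encoded display name is the quoted-printable form of an apostrophe. A short sketch using Python's standard email package decodes it. Only the display name is decoded here; the address itself is truncated in the archive, so nothing is assumed about it.

```python
from email.header import decode_header, make_header

# The encoded display name from the To: header quoted above.
raw = "=?utf-8?Q?DermotO=27reilly?="

# decode_header splits the value into (bytes, charset) parts;
# make_header joins them back into a readable string.
decoded = str(make_header(decode_header(raw)))
print(decoded)
```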
Re: APOSTROPHE_TOCC score
Posted by RW <rw...@googlemail.com>.
On Mon, 5 Mar 2018 16:23:31 -0500
Alex wrote:
> Hi,
>
> I just received a false-positive because of the following address:
>
> To: "'info@example.se'" <in...@example.se>
>
> Apparently the apostrophe is enough to warrant 2.5 points alone? Is
> this intended to catch addresses like tom.o'reilly@example.com or more
> like my example above?
Only the former, but I can't reproduce the bug from the above example.