Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2018/03/05 21:23:31 UTC

APOSTROPHE_TOCC score

Hi,

I just received a false-positive because of the following address:

To: "'info@example.se'" <in...@example.se>

Apparently the apostrophe is enough to warrant 2.5 points alone? Is
this intended to catch addresses like tom.o'reilly@example.com or more
like my example above?

That seems like an awfully high score, but I was just wondering if
people thought this was correct, or if we should look at it again, or if
I should just write an exception locally...

Re: APOSTROPHE_TOCC score

Posted by RW <rw...@googlemail.com>.
On Mon, 5 Mar 2018 16:28:33 -0600
David Jones wrote:

> On 03/05/2018 04:20 PM, John Hardin wrote:
> > On Mon, 5 Mar 2018, Alex wrote:
> >   
> >> 2.6 points for this is just unreasonable. This was a completely
> >> legitimate email.  
> > 
> > What is the S/O in masscheck?
> >   
> 
> http://ruleqa.spamassassin.org/20180304-r1825801-n/APOSTROPHE_TOCC/detail
> 
> It's a high S/O in the masscheck but I don't think that alone is an 
> indicator of spam.  I need to check my ena corpora to see what is
> going on there.
> 
> This rule should probably be limited to a max of 1.0.

Or perhaps change the rule from:

  header	APOSTROPHE_TOCC	ToCc:addr =~ /'/

to: 

  header	APOSTROPHE_TOCC	ToCc:addr =~ /[^do]'/
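
For anyone weighing the narrower pattern, here is a rough Python sketch of how the two expressions differ. Python's `re.search` stands in for the Perl match that SpamAssassin would perform, and the sample addresses are made up for illustration; this is not how SpamAssassin itself evaluates `ToCc:addr`.

```python
import re

ORIGINAL = re.compile(r"'")        # current rule: any apostrophe
NARROWED = re.compile(r"[^do]'")   # proposed: apostrophe not preceded by lowercase d or o

# Hypothetical addresses, for illustration only.
samples = [
    "tom.o'reilly@example.com",
    "d'angelo@example.com",
    "DermotO'reilly@example.com",
    "x'y@example.com",
    "'info@example.se",
]

def hits(pattern, addr):
    """Return True if the pattern matches anywhere in the address."""
    return pattern.search(addr) is not None

for addr in samples:
    print(f"{addr:30} original={hits(ORIGINAL, addr)} narrowed={hits(NARROWED, addr)}")
```

Two things stand out in this sketch: the character class is case-sensitive, so an uppercase O before the apostrophe still matches, and an apostrophe at the very start of the string never matches the narrowed pattern because the class has to consume a preceding character.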

Re: APOSTROPHE_TOCC score

Posted by David Jones <dj...@ena.com>.
On 03/05/2018 04:20 PM, John Hardin wrote:
> On Mon, 5 Mar 2018, Alex wrote:
> 
>> 2.6 points for this is just unreasonable. This was a completely
>> legitimate email.
> 
> What is the S/O in masscheck?
> 

http://ruleqa.spamassassin.org/20180304-r1825801-n/APOSTROPHE_TOCC/detail

It's a high S/O in the masscheck but I don't think that alone is an 
indicator of spam.  I need to check my ena corpora to see what is going 
on there.

This rule should probably be limited to a max of 1.0.
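
As a reminder of what the S/O figure expresses, here is a minimal sketch. It assumes S/O is the rule's spam hit rate divided by the sum of its spam and ham hit rates (rates rather than raw counts, so unequal corpus sizes do not skew it). The function name and all counts below are invented for illustration.

```python
def s_o(spam_hits, spam_total, ham_hits, ham_total):
    """Spam/overall ratio from hit rates; assumed definition, see the note above."""
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return None  # rule never hit, so the ratio is undefined
    return spam_rate / (spam_rate + ham_rate)

# Invented counts: a rule hitting 0.5% of spam and 0.01% of ham.
ratio = s_o(spam_hits=500, spam_total=100_000, ham_hits=10, ham_total=100_000)
print(f"S/O = {ratio:.3f}")
```

A value near 1.0 only says that most hits are spam. It does not say whether those spams would have been caught without the rule, which is the concern raised above.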

-- 
David Jones

Re: APOSTROPHE_TOCC score

Posted by John Hardin <jh...@impsec.org>.
On Mon, 5 Mar 2018, Alex wrote:

> 2.6 points for this is just unreasonable. This was a completely
> legitimate email.

What is the S/O in masscheck?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Failure to plan ahead on someone else's part does not constitute
   an emergency on my part.                 -- David W. Barts in a.s.r
-----------------------------------------------------------------------
  6 days until Daylight Saving Time begins in U.S. - Spring Forward

Re: APOSTROPHE_TOCC score

Posted by John Hardin <jh...@impsec.org>.
On Tue, 6 Mar 2018, David Jones wrote:

> On 03/06/2018 12:54 PM, John Hardin wrote:
>> On Tue, 6 Mar 2018, RW wrote:
>> 
>>> On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
>>> John Hardin wrote:
>>> 
>>>> On Tue, 6 Mar 2018, David Jones wrote:
>>> 
>>>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>>>> just riding on the back of other rules, BLs, and high Bayes
>>>>> scores.
>>>> 
>>>> What I generally look at is the detailed rule performance in
>>>> masscheck. If it primarily hits on spams that score in total 1-3
>>>> points.
>>> 
>>> Why not under 5?
>> 
>> If it's close to 5 and there's a limit that suggests the limit could be 
>> increased a bit.
>> 
>> It also needs to take into account the ham hits, which is why having a 
>> ham-starved corpus is such a problem.
>
> Are you saying we have a ham-starved corpus?

We have at times in the past. When you're performing analyses like this 
you need to bear in mind the size of the ham corpus.


Re: APOSTROPHE_TOCC score

Posted by David Jones <dj...@ena.com>.
On 03/06/2018 12:54 PM, John Hardin wrote:
> On Tue, 6 Mar 2018, RW wrote:
> 
>> On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
>> John Hardin wrote:
>>
>>> On Tue, 6 Mar 2018, David Jones wrote:
>>
>>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>>> just riding on the back of other rules, BLs, and high Bayes
>>>> scores.
>>>
>>> What I generally look at is the detailed rule performance in
>>> masscheck. If it primarily hits on spams that score in total 1-3
>>> points.
>>
>> Why not under 5?
> 
> If it's close to 5 and there's a limit that suggests the limit could be 
> increased a bit.
> 
> It also needs to take into account the ham hits, which is why having a 
> ham-starved corpus is such a problem.
> 

Are you saying we have a ham-starved corpus?

             OVERALL     SPAM      HAM
ena-week0     77,945   36,459   41,486
ena-week1     93,847   52,781   41,066
ena-week2     69,297   30,328   38,969
ena-week3     75,853   31,995   43,858
ena-week4     92,680   37,511   55,169
Total        409,622  189,074  220,548

http://ruleqa.spamassassin.org
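
The totals and the ham share can be re-derived from the weekly rows with a few lines of Python. The counts are copied from the table above; the variable names are just for this sketch.

```python
# Weekly (spam, ham) counts copied from the table above.
weeks = {
    "ena-week0": (36_459, 41_486),
    "ena-week1": (52_781, 41_066),
    "ena-week2": (30_328, 38_969),
    "ena-week3": (31_995, 43_858),
    "ena-week4": (37_511, 55_169),
}

total_spam = sum(spam for spam, _ in weeks.values())
total_ham = sum(ham for _, ham in weeks.values())
ham_share = total_ham / (total_spam + total_ham)

print(f"spam={total_spam:,} ham={total_ham:,} ham share={ham_share:.1%}")
```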

-- 
David Jones

Re: APOSTROPHE_TOCC score

Posted by John Hardin <jh...@impsec.org>.
On Tue, 6 Mar 2018, RW wrote:

> On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
> John Hardin wrote:
>
>> On Tue, 6 Mar 2018, David Jones wrote:
>
>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>> just riding on the back of other rules, BLs, and high Bayes
>>> scores.
>>
>> What I generally look at is the detailed rule performance in
>> masscheck. If it primarily hits on spams that score in total 1-3
>> points.
>
> Why not under 5?

If it's close to 5 and there's a limit that suggests the limit could be 
increased a bit.

It also needs to take into account the ham hits, which is why having a 
ham-starved corpus is such a problem.

Generally speaking there's a spike. If the spike is at less than 5 it 
needs attention, and the lower the spike is the more generous the score 
limit may be, bearing in mind that poison pills should be rare.
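
One way to picture the "spike" is as the most populated bucket in a histogram of total message scores for the spam a rule hits. The sketch below is only an illustration with invented scores and a hypothetical helper name; the ruleqa pages present this differently, and the 5-point level is simply the figure discussed in this thread.

```python
from collections import Counter
import math

def score_spike(total_scores):
    """Return (bucket, count) for the most common whole-point score bucket."""
    buckets = Counter(math.floor(score) for score in total_scores)
    return buckets.most_common(1)[0]

# Invented total scores of spam messages hit by some rule.
scores = [2.1, 2.7, 2.9, 3.4, 2.2, 14.8, 22.5, 2.6, 9.1, 3.0]

bucket, count = score_spike(scores)
needs_attention = bucket < 5  # spike sits below the 5-point level discussed here
print(bucket, count, needs_attention)
```

The long tail of high-scoring hits (14.8, 22.5) does not move the spike, which matches the point that such a tail does not change the analysis.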


Re: APOSTROPHE_TOCC score

Posted by RW <rw...@googlemail.com>.
On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
John Hardin wrote:

> On Tue, 6 Mar 2018, David Jones wrote:

> > In this case these were really bad spam so the APOSTROPHE_TOCC is
> > just riding on the back of other rules, BLs, and high Bayes
> > scores.  
> 
> What I generally look at is the detailed rule performance in
> masscheck. If it primarily hits on spams that score in total 1-3
> points.

Why not under 5?

Re: APOSTROPHE_TOCC score

Posted by John Hardin <jh...@impsec.org>.
On Tue, 6 Mar 2018, David Jones wrote:

> On 03/05/2018 06:57 PM, John Hardin wrote:
>> On Mon, 5 Mar 2018, Alex wrote:
>> 
>>> Hi,
>>> 
>>> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin <jh...@impsec.org> wrote:
>>>> On Mon, 5 Mar 2018, Alex wrote:
>>>> 
>>>>> To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
>>>>> *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>>> 
>>>>> 2.6 points for this is just unreasonable. This was a completely
>>>>> legitimate email.
>>>> 
>>>> Is such an address even deliverable?
>>> 
>>> Yes, it's beyond me why anyone would want to use an apostrophe, but
>>> it's valid.
>> 
>> OK.
>> 
>> That rule is 8 years stale. I've added a masscheck score limit of 1.000
>> 
>> I'm open to discussion of converting it to a subrule and/or adding some 
>> extra conditions to it.
>> 
>
> Here are some samples of what I found in my corpora, which supply the 
> majority of the nightly masscheck corpora.
>
> https://pastebin.com/QchEu2BA
> https://pastebin.com/pbYnvzU4
> https://pastebin.com/EjnQSE7H
>
> In this case these were really bad spam so the APOSTROPHE_TOCC is just riding 
> on the back of other rules, BLs, and high Bayes scores.

What I generally look at is the detailed rule performance in masscheck. If 
it primarily hits on spams that score in total 1-3 points I generally 
tend to set the score limit somewhat higher. Having a tail of 
higher-scoring hits doesn't affect that analysis.

This looks like one of those rules.

In this case I'd probably set the score limit on this rule low and add 
more generously-scored metas for the high-spam-low-ham rule overlaps from 
the masscheck results.


Re: APOSTROPHE_TOCC score

Posted by David Jones <dj...@ena.com>.
On 03/05/2018 06:57 PM, John Hardin wrote:
> On Mon, 5 Mar 2018, Alex wrote:
> 
>> Hi,
>>
>> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin <jh...@impsec.org> wrote:
>>> On Mon, 5 Mar 2018, Alex wrote:
>>>
>>>> To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
>>>> *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>>
>>>> 2.6 points for this is just unreasonable. This was a completely
>>>> legitimate email.
>>>
>>> Is such an address even deliverable?
>>
>> Yes, it's beyond me why anyone would want to use an apostrophe, but
>> it's valid.
> 
> OK.
> 
> That rule is 8 years stale. I've added a masscheck score limit of 1.000
> 
> I'm open to discussion of converting it to a subrule and/or adding some 
> extra conditions to it.
> 

Here are some samples of what I found in my corpora, which supply the 
majority of the nightly masscheck corpora.

https://pastebin.com/QchEu2BA
https://pastebin.com/pbYnvzU4
https://pastebin.com/EjnQSE7H

In this case these were really bad spam so the APOSTROPHE_TOCC is just 
riding on the back of other rules, BLs, and high Bayes scores.

-- 
David Jones

Re: APOSTROPHE_TOCC score

Posted by John Hardin <jh...@impsec.org>.
On Mon, 5 Mar 2018, Alex wrote:

> Hi,
>
> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin <jh...@impsec.org> wrote:
>> On Mon, 5 Mar 2018, Alex wrote:
>>
>>> To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
>>> *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>
>>> 2.6 points for this is just unreasonable. This was a completely
>>> legitimate email.
>>
>> Is such an address even deliverable?
>
> Yes, it's beyond me why anyone would want to use an apostrophe, but
> it's valid.

OK.

That rule is 8 years stale. I've added a masscheck score limit of 1.000

I'm open to discussion of converting it to a subrule and/or adding some 
extra conditions to it.


Re: APOSTROPHE_TOCC score

Posted by Alex <my...@gmail.com>.
Hi,

On Mon, Mar 5, 2018 at 5:59 PM, John Hardin <jh...@impsec.org> wrote:
> On Mon, 5 Mar 2018, Alex wrote:
>
>> To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
>> *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>
>> 2.6 points for this is just unreasonable. This was a completely
>> legitimate email.
>
> Is such an address even deliverable?

Yes, it's beyond me why anyone would want to use an apostrophe, but
it's valid. We discourage its use because it just makes sharing your
address more difficult, and there's also probably some weird system
that doesn't know how to handle it out there.

https://en.wikipedia.org/wiki/Email_address#Local-part

Re: APOSTROPHE_TOCC score

Posted by John Hardin <jh...@impsec.org>.
On Mon, 5 Mar 2018, Alex wrote:

> To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
> *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>
> 2.6 points for this is just unreasonable. This was a completely
> legitimate email.

Is such an address even deliverable?



Re: APOSTROPHE_TOCC score

Posted by Alex <my...@gmail.com>.
Hi,


On Mon, Mar 5, 2018 at 4:48 PM, RW <rw...@googlemail.com> wrote:
> On Mon, 5 Mar 2018 16:23:31 -0500
> Alex wrote:
>
>> Hi,
>>
>> I just received a false-positive because of the following address:
>>
>> To: "'info@example.se'" <in...@example.se>
>>
>> Apparently the apostrophe is enough to warrant 2.5 points alone? Is
>> this intended to catch addresses like tom.o'reilly@example.com or more
>> like my example above?
>
> Only the former, but I can't reproduce the bug from the above example.

I'm sorry, too many terminals open. The email that produced this hit
did indeed have o'reilly in it:

To: =?utf-8?Q?DermotO=27reilly?= <Se...@example.com>
 *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe

2.6 points for this is just unreasonable. This was a completely
legitimate email.
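
The `=27` in that header is the Q-encoded form of an apostrophe (hex 27), which can be checked with Python's standard `email.header` module. This only decodes the display-name part quoted above; the address itself is truncated in the archive, so it says nothing definite about what `ToCc:addr` saw.

```python
from email.header import decode_header, make_header

# Encoded word copied from the To: header quoted above.
raw = "=?utf-8?Q?DermotO=27reilly?="

# decode_header splits the value into (bytes, charset) parts;
# make_header joins them back into a readable string.
decoded = str(make_header(decode_header(raw)))

print(decoded)
print("'" in decoded)
```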

Re: APOSTROPHE_TOCC score

Posted by RW <rw...@googlemail.com>.
On Mon, 5 Mar 2018 16:23:31 -0500
Alex wrote:

> Hi,
> 
> I just received a false-positive because of the following address:
> 
> To: "'info@example.se'" <in...@example.se>
> 
> Apparently the apostrophe is enough to warrant 2.5 points alone? Is
> this intended to catch addresses like tom.o'reilly@example.com or more
> like my example above?

Only the former, but I can't reproduce the bug from the above example.