Posted to users@spamassassin.apache.org by Amir Caspi <ce...@3phase.com> on 2018/12/20 22:11:50 UTC

Proposed rule for too many dots in From

John, would you mind sandboxing a rule?

	Two or more dots in the From username seems to be rather spammy (and we've talked about it before on the list).  Would you mind sandboxing this test rule to see if it would be helpful as a main rule?  I get a lot of spam locally that hits this...

header	AC_FROM_MANY_DOTS	From =~ /<(?:\w+\.){2,}\w+@/
describe	AC_FROM_MANY_DOTS	Two or more periods in the From username

We could, of course, increase to three or more dots... maybe the three-dot version would score higher on its own, but the two-dot could be better in combo... not sure.

Hopefully it's helpful...

Cheers.

--- Amir
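
A minimal sketch of how the proposed pattern behaves, written in Python rather than as a SpamAssassin rule. The regex body is copied from the rule above; the three-dot variant, the `re.ASCII` flag (used to keep `\w` close to plain ASCII word characters), and the sample From headers are all assumptions made for illustration.

```python
import re

# Regex body copied from the proposed AC_FROM_MANY_DOTS rule.
TWO_DOTS = re.compile(r"<(?:\w+\.){2,}\w+@", re.ASCII)
# Assumed rendering of the "three or more dots" variant mentioned in the post.
THREE_DOTS = re.compile(r"<(?:\w+\.){3,}\w+@", re.ASCII)

def hits(pattern: re.Pattern, header: str) -> bool:
    """Return True if the pattern matches anywhere in the header line."""
    return pattern.search(header) is not None

# Hypothetical From headers, invented for this example.
samples = [
    "From: Deals <john.at.amazon@example.com>",
    "From: Jane Doe <jane.q.doe@example.com>",
    "From: Jane Doe <jane.doe@example.com>",
    "From: Someone <a.b.c.d@example.com>",
]

for header in samples:
    print(hits(TWO_DOTS, header), hits(THREE_DOTS, header), header)
```

The pattern only looks at an address inside angle brackets, so a bare `From: jane.q.doe@example.com` header would not hit either variant.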


Re: Proposed rule for too many dots in From

Posted by Paul Stead <pa...@gmail.com>.
Looks like it was hitting a fair amount of ham the last week or so.

https://ruleqa.spamassassin.org/20190607-r1860743-n/T_AC_FROM_MANY_DOTS/detail

The last few days have looked a bit better:

https://ruleqa.spamassassin.org/20190609-r1860879-n/T_AC_FROM_MANY_DOTS/detail
https://ruleqa.spamassassin.org/20190610-r1860930-n/T_AC_FROM_MANY_DOTS/detail

Three days of good performance on ruleqa leads to promotion, followed by
a scoring day.

On Mon, 10 Jun 2019 at 19:13, Amir Caspi <ce...@3phase.com> wrote:

> On Jan 26, 2019, at 10:27 AM, John Hardin <jh...@impsec.org> wrote:
> >
> > On Thu, 24 Jan 2019, Amir Caspi wrote:
> >
> >> On Jan 15, 2019, at 8:46 AM, John Hardin <jh...@impsec.org> wrote:
> >>>
> >>>> On Dec 20, 2018, at 6:16 PM, Amir Caspi <Ce...@3phase.com> wrote:
> >>>>>
> >>>>> header    AC_FROM_MANY_DOTS       From =~ /<(?:\w{2,}\.){2,}\w+@/
> >>>
> >>> Argh. I lost track of that over the holidays. Thanks for the reminder,
> >>> adding it now.
> >>
> >> Anything interesting with the results on sandboxing this rule?
> >
> > Not really, at least not by itself.
>
> It looks like this rule was still being tested last month (I saw it
> hitting a bunch of my spams), but now appears to be gone (it's not hitting
> on spams that it normally would).  Did you decide it wasn't sufficiently
> useful, either alone or in meta?
>
> Cheers.
>
> --- Amir
>
>

Re: Proposed rule for too many dots in From

Posted by Amir Caspi <ce...@3phase.com>.
On Jan 26, 2019, at 10:27 AM, John Hardin <jh...@impsec.org> wrote:
> 
> On Thu, 24 Jan 2019, Amir Caspi wrote:
> 
>> On Jan 15, 2019, at 8:46 AM, John Hardin <jh...@impsec.org> wrote:
>>> 
>>>> On Dec 20, 2018, at 6:16 PM, Amir Caspi <Ce...@3phase.com> wrote:
>>>>> 
>>>>> header	AC_FROM_MANY_DOTS	From =~ /<(?:\w{2,}\.){2,}\w+@/
>>> 
>>> Argh. I lost track of that over the holidays. Thanks for the reminder, adding it now.
>> 
>> Anything interesting with the results on sandboxing this rule?
> 
> Not really, at least not by itself.

It looks like this rule was still being tested last month (I saw it hitting a bunch of my spams), but now appears to be gone (it's not hitting on spams that it normally would).  Did you decide it wasn't sufficiently useful, either alone or in meta?

Cheers.

--- Amir


Re: Proposed rule for too many dots in From

Posted by John Hardin <jh...@impsec.org>.
On Thu, 24 Jan 2019, Amir Caspi wrote:

> On Jan 15, 2019, at 8:46 AM, John Hardin <jh...@impsec.org> wrote:
>>
>>> On Dec 20, 2018, at 6:16 PM, Amir Caspi <Ce...@3phase.com> wrote:
>>>>
>>>> header	AC_FROM_MANY_DOTS	From =~ /<(?:\w{2,}\.){2,}\w+@/
>>
>> Argh. I lost track of that over the holidays. Thanks for the reminder, adding it now.
>
> Anything interesting with the results on sandboxing this rule?

Not really, at least not by itself.

https://ruleqa.spamassassin.org/20190125-r1852100-n/__AC_FROM_MANY_DOTS/detail

It hits low-scoring spam, so it *might* be worthwhile in a meta. I'll see 
what I can do with it.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Tomorrow: Wolfgang Amadeus Mozart's 263rd Birthday

Re: Proposed rule for too many dots in From

Posted by Amir Caspi <ce...@3phase.com>.
On Jan 15, 2019, at 8:46 AM, John Hardin <jh...@impsec.org> wrote:
> 
>> On Dec 20, 2018, at 6:16 PM, Amir Caspi <Ce...@3phase.com> wrote:
>>> 
>>> header	AC_FROM_MANY_DOTS	From =~ /<(?:\w{2,}\.){2,}\w+@/
> 
> Argh. I lost track of that over the holidays. Thanks for the reminder, adding it now.

Anything interesting with the results on sandboxing this rule?

Thanks!

--- Amir


Re: Proposed rule for too many dots in From

Posted by John Hardin <jh...@impsec.org>.
On Mon, 14 Jan 2019, Amir Caspi wrote:

> On Dec 20, 2018, at 6:16 PM, Amir Caspi <Ce...@3phase.com> wrote:
>>
>> header	AC_FROM_MANY_DOTS	From =~ /<(?:\w{2,}\.){2,}\w+@/
>>
>> John, could you update the sandbox rule to the above?  That should whittle down FPs. I'd recommend leaving it as 2 letters, though, since a number of spammy addresses are things like john.at.amazon or some such like that.
>
> John, just curious on whether there are any results from sandboxing this rule, and whether it -- or some variant -- might be a good addition?  (Likely within a meta, I'm sure.)
>
> Thanks and happy new year!

Argh. I lost track of that over the holidays. Thanks for the reminder, 
adding it now.



-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   All I could think about was this bear is so close to me I can
   see its teeth. I could have kissed it. I wished I had a gun.
                                              -- Alyson Jones-Robinson
-----------------------------------------------------------------------
  2 days until Benjamin Franklin's 313th Birthday

Re: Proposed rule for too many dots in From

Posted by Amir Caspi <ce...@3phase.com>.
On Dec 20, 2018, at 6:16 PM, Amir Caspi <Ce...@3phase.com> wrote:
> 
> header	AC_FROM_MANY_DOTS	From =~ /<(?:\w{2,}\.){2,}\w+@/
> 
> John, could you update the sandbox rule to the above?  That should whittle down FPs. I'd recommend leaving it as 2 letters, though, since a number of spammy addresses are things like john.at.amazon or some such like that.

John, just curious on whether there are any results from sandboxing this rule, and whether it -- or some variant -- might be a good addition?  (Likely within a meta, I'm sure.)

Thanks and happy new year!

--- Amir


Re: Proposed rule for too many dots in From

Posted by RW <rw...@googlemail.com>.
On Thu, 20 Dec 2018 21:12:33 -0700
Grant Taylor wrote:

> On 12/20/18 8:34 PM, Grant Taylor wrote:
> > I'm going back through and analyzing how I'm extracting data and
> > trying to satisfactorily explain some oddities.  
> 
> Here's how the dots in the user parts of 16,528 unique addresses out
> of 244,921 messages break down:
> 
>    13,277               (no dots 80.3%)
>     2,936 .             ( 1 dot  17.7%)
>       281 ..            ( 2 dots  1.7%)
>        29 ...           ( 3 dots  0.2%)
>         3 ....          ( 4 dots  0.0%)
>         1 .....         ( 5 dots  0.0%)
>         1 ...........   (11 dots  0.0%)
> 
> So, in light of this information, I would be willing to concede 3 or 
> more dots is possibly an indicator of spam.

I think you are a bit premature there. Without separate figures for spam
and ham, you can't even say whether any of these is a good spam
indicator, even in isolation.

> My previous log methodology 

Isn't a sound method for scoring. For one thing it assumes that more
dots are more spammy. It could be that the S/O peaks at 4. 

For another, scoring should be about the balance of extra TPs and FPs
that the rule creates. Sometimes the more spammy looking rule hits
higher scoring spam and warrants a lower score.
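
To make the S/O point concrete, here is a small sketch. It assumes a simplified definition of S/O as the spam hit rate divided by the sum of the spam and ham hit rates; the function name and every number in `toy_rates` are invented purely for illustration and are not measurements from this thread or from ruleqa.

```python
from typing import Optional

def s_o_ratio(spam_hit_rate: float, ham_hit_rate: float) -> Optional[float]:
    """Simplified S/O: share of a rule's hits that fall on spam.

    Both arguments are hit rates (for example, percent of each corpus).
    Returns None when the rule hits nothing at all.
    """
    total = spam_hit_rate + ham_hit_rate
    if total == 0:
        return None
    return spam_hit_rate / total

# Invented (spam %, ham %) hit rates per dot count. They only show that
# S/O does not have to rise steadily with the number of dots.
toy_rates = {2: (4.0, 1.0), 3: (1.0, 0.1), 4: (0.5, 0.01), 5: (0.1, 0.1)}

for dots, (spam, ham) in sorted(toy_rates.items()):
    print(dots, round(s_o_ratio(spam, ham), 3))
```

With separate spam and ham counts per dot count, the same calculation could be run on real data before deciding where a threshold belongs.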

Re: Proposed rule for too many dots in From

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 12/20/18 8:34 PM, Grant Taylor wrote:
> I'm going back through and analyzing how I'm extracting data and trying 
> to satisfactorily explain some oddities.

Here's how the dots in the user parts of 16,528 unique addresses out of 
244,921 messages break down:

   13,277               (no dots 80.3%)
    2,936 .             ( 1 dot  17.7%)
      281 ..            ( 2 dots  1.7%)
       29 ...           ( 3 dots  0.2%)
        3 ....          ( 4 dots  0.0%)
        1 .....         ( 5 dots  0.0%)
        1 ...........   (11 dots  0.0%)

So, in light of this information, I would be willing to concede 3 or 
more dots is possibly an indicator of spam.
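
A breakdown like the one above could be produced along these lines. This is only a sketch: the post does not say how the counting was done, so the function name, the de-duplication by exact string, and the sample addresses are assumptions.

```python
from collections import Counter
from typing import Iterable

def dot_histogram(addresses: Iterable[str]) -> Counter:
    """Count unique addresses by the number of dots in the user part."""
    histogram = Counter()
    for addr in set(addresses):               # unique addresses only
        user_part = addr.rpartition("@")[0]   # everything left of the last @
        histogram[user_part.count(".")] += 1
    return histogram

# Hypothetical addresses, invented for this example.
sample = [
    "janedoe@example.com",
    "jane.doe@example.com",
    "jane.q.doe@example.com",
    "jane.q.doe@example.com",   # duplicate, counted once
]

hist = dot_histogram(sample)
total = sum(hist.values())
for dots in sorted(hist):
    print(f"{hist[dots]:>6} {'.' * dots:<12} ({dots} dots {100 * hist[dots] / total:.1f}%)")
```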

My previous log methodology would add the following spam score to 
messages with 3 or more dots.  (Assuming 3 dots is the number we start 
adding to the spam score.)

  3 dots = 1
  4 dots = 1.26
  5 dots = 1.46
11 dots = 2.18

Assuming 2 dots are allowed and 2 is the base:

  3 dots = 1.58
  4 dots = 2.00
  5 dots = 2.32
11 dots = 3.46

I think I would be comfortable blindly adding log$Base($numberOfDots) 
(when numberOfDots > $Base) to the spam score.  I don't even see a need 
to mess with a meta rule.
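
The two score tables above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; `log_score` is a hypothetical helper, and the cut-off for when the score starts to apply is left out because the first table starts at the base itself while the text says "greater than" the base.

```python
import math

def log_score(num_dots: int, base: int) -> float:
    """Logarithm of the dot count in the chosen base."""
    return math.log(num_dots, base)

for base in (3, 2):
    print(f"base {base}:")
    for dots in (3, 4, 5, 11):
        print(f"  {dots:>2} dots = {log_score(dots, base):.2f}")
```

Rounded to two places, base 3 gives 1.00, 1.26, 1.46 and 2.18, and base 2 gives 1.58, 2.00, 2.32 and 3.46, matching the figures in the post.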



-- 
Grant. . . .
unix || die


Re: Proposed rule for too many dots in From

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 12/20/18 7:54 PM, Amir Caspi wrote:
> Some of the ones with equal-signs look like bounce addresses from 
> envelopes, that would not be in the From header.

I'm going back through and analyzing how I'm extracting data and trying 
to satisfactorily explain some oddities.  I don't think there will be 
any significant change in the numbers.  But it wouldn't be the first 
time I was wrong about something, even today.

I have found that one of the IETF mailing lists that I subscribe to and 
participate in seemingly encodes the sending address as original from 
user part, =40 (hex for @), original from domain part, (actual) @, 
mailing list, .ietf.org.  This seems to especially be the case for 
senders from domains with DMARC enabled.

So:

    john.doe@example.com

Becomes:

    john.doe=40example.com@list.ietf.org

This is the contents of the From: header.

I consider that to be a legitimate email address.  Granted, it's 
probably atypical.  But none-the-less legitimate.
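
As a sketch of why such rewritten addresses matter for dot counting, the snippet below reverses the `=40` encoding described above and compares dot counts. The function name is hypothetical, and it assumes the original address is exactly what sits before the final `@`, with `=40` standing in for the original `@`.

```python
def decode_rewritten_from(addr: str) -> str:
    """Undo the '=40' rewrite: user=40domain@list -> user@domain."""
    local, _, _list_domain = addr.rpartition("@")
    if "=40" not in local:
        return addr                      # not a rewritten address
    orig_user, _, orig_domain = local.rpartition("=40")
    return f"{orig_user}@{orig_domain}"

rewritten = "john.doe=40example.com@list.ietf.org"
original = decode_rewritten_from(rewritten)

# The rewritten user part carries the original domain's dots as well.
print(original)
print(rewritten.rpartition("@")[0].count("."), "dots in rewritten user part")
print(original.rpartition("@")[0].count("."), "dots in original user part")
```

Here the rewritten user part has two dots while the original has one, so a plain dot count over From: user parts will overstate dots for list traffic of this kind.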

I'm also seeing email addresses use the (…) comments in From: headers.

    From: "So and So" <jo...@example.com> (please no spam)

Again, legitimate email addresses.



-- 
Grant. . . .
unix || die


Re: Proposed rule for too many dots in From

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 12/20/18 7:54 PM, Amir Caspi wrote:
> Are these in the From: header or the envelope-from (Return-Path)?

These are all the From: header.

> Some of the ones with equal-signs look like bounce addresses from 
> envelopes, that would not be in the From header.  Or did you just look for 
> any email address, so these could include Reply-To and emails in the body?

Nope.  I explicitly looked for the From: header.

They may look like VERP.  They may be VERP.  But remember that there's 
nothing that prohibits using VERP-like addresses in the From: header.

This is particularly important when an email address tries to encode 
another email address.  The original "@" usually becomes another 
character, frequently "=".

> In general it looks like you do have a bunch that would hit on even the 
> multi-letter rule... but if they're envelopes or destinations (not From:) 
> then they wouldn't be searched in this rule.

They are indeed addresses in the From: header.

I'm using formail to explicitly extract only the From: header.

    formail -c -X From:
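
For readers without formail, roughly the same extraction can be sketched with Python's standard `email` package. This is not what was run for the figures in this thread; the function name and the sample message are assumptions.

```python
from email import message_from_string
from email.utils import parseaddr

def from_user_part(raw_message: str) -> str:
    """Return the user part (left of the @) of the From: header address."""
    msg = message_from_string(raw_message)
    _display_name, addr = parseaddr(msg.get("From", ""))
    return addr.rpartition("@")[0]

# Hypothetical message, invented for this example.
raw = (
    "From: Jane Doe <jane.q.doe@example.com>\n"
    "Return-Path: <bounce=123@mailer.example.net>\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)

print(from_user_part(raw))
```

Only the From: header is consulted, so envelope or Reply-To addresses in the same message do not affect the result.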

> Cheers and thanks.

You're welcome.



-- 
Grant. . . .
unix || die


Re: Proposed rule for too many dots in From

Posted by Amir Caspi <ce...@3phase.com>.
On Dec 20, 2018, at 7:49 PM, Grant Taylor <gt...@tnetconsulting.net> wrote:
> 
> So here's the user parts (left hand side of the @) of emails.

Are these in the From: header or the envelope-from (Return-Path)?  Some of the ones with equal-signs look like bounce addresses from envelopes, that would not be in the From header.  Or did you just look for any email address, so these could include Reply-To and emails in the body?

In general it looks like you do have a bunch that would hit on even the multi-letter rule... but if they're envelopes or destinations (not From:) then they wouldn't be searched in this rule.

Cheers and thanks.

--- Amir

Re: Proposed rule for too many dots in From

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 12/20/18 8:36 PM, Benny Pedersen wrote:
> and xxx is a real tld,

Yes.

> so you ddos maillist members now

How so?



-- 
Grant. . . .
unix || die


Re: Proposed rule for too many dots in From

Posted by Benny Pedersen <me...@junc.eu>.
Grant Taylor skrev den 2018-12-21 03:49:

> Note:  These are what I considered legitimate enough to keep in my
> mail structure.  I don't keep spam for very long.  This corpus goes
> back to 2001.

and xxx is a real tld, so you ddos maillist members now

Re: Proposed rule for too many dots in From

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 12/20/18 7:36 PM, Grant Taylor wrote:
> I don't know.  I'm re-running the command to scan my mailbox extracting 
> From: addresses.  (I'm logging to a file this time.)  I'll do some 
> analysis and let you know.

I don't know what sort of characterization you may want.  So here's the 
user parts (left hand side of the @) of emails.

Note:  These are what I considered legitimate enough to keep in my mail 
structure.  I don't keep spam for very long.  This corpus goes back to 2001.

       1 x.xxxx.x
       1 xxxx.x.x.
       1 x.x.x.x.xx.x
       1 xxx.xx.xx
       1 xxx.x.xxxx
       1 xxxxx.x.xx
       1 x.xxxxxxx.x
       1 xxx.xxx.xxx
       1 xxxxx.x.xxx
       1 xxxxx.xx.xx
       1 x.xx.xxxx.xxx
       1 x.xxx.xxxxxx
       1 xxx.x.xxxxxx
       1 xxxx.xxxx.xx
       1 x.xxxxx.xxxxx
       1 xxxx.xx.xxxxx
       1 xxx.x.xxxxxxxx
       1 xxx.xxx.xxx.xxx
       1 xxxx.xxxxxx.xx
       1 xxxxxxxx.x.xxx
       1 xxxxx.xx.xxxxxx
       1 xxxxxxx.x.xxxxx
       1 xxxxxxx.xxxxx.x
       1 xxx.xxxx.xxxxxxx
       1 xxx.xxxxxxxxx.xx
       1 xxxx.xxxxx.xxxxx
       1 xx.xxxxxx.xxxxxxx
       1 xxxxx.xxxxxx.xxxx
       1 xxxxxxx.xxxxxx.xx
       1 xx.xxxxxx.xxxxxxxx
       1 xxx.xx.xxx.xxxxxxxx
       1 xxxxxx.xxx.xxxxxxx
       1 xxxxxxxx.xxxx.xxxx
       1 xxxxxx.xxxxxxx.xxxx
       1 xxxxxxxxx.x.xxxxxxx
       1 xxxxxxxxx.xxx.xxxxx
       1 xxx.xxxx=xxxxxx.xxx.xx
       1 xxxx.xxxxx=xxxxxx.xxx
       1 xxxx.x.xxxxxxxxxxx.xxx
       1 xxxx.xxxxxxxxx.xxxxxx
       1 xxxxxxx.xxxxxx.xxxxx.x
       1 xxx.xxxx.xxx.xxxxxxxxxx
       1 xxxxxx.xxxxxxxx.xxxxxx
       1 xxxxxxx.xxxxxx.xxxxxxx
       1 xxxxxxxx.xx.xxxxxxxxxx
       1 xxxxx-xxxx.xxxx.xxxxxxxx
       1 xxxxxx.xxxxxx.xxxxxx+xxx
       1 xxxxxxxxxxx.xxxx.xxxxxx
       1 xxx-xxxxxxxx.xx.xxxxxxxxxx
       1 xxxxxxxx+xxxxxx.xxxxxx.xxx
       1 xxxxxxxxxxx.xxxxxxx.xxxxx
       1 xxxxxx.xxxxxxxxxxxx.xxxxxx
       1 xxxxxx.xxxxxxxx=xxxxxxxxxx.xx
       1 xxxxxxxxx.xxxxxxx.xxxxx.xx.xxxxxxx
       1 xxxxxxx-xxxxx-xxxxxxx=xxxxxxx.xxxxxxxxxxxxx.xxx
       1 xxxxxxx-xxxxxxxxxxxx-xxxxxxx=xxxxxxx.xxxxxxxxxxxxx.xxx
       1 xxxxxxx-xx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxx-xxx=xxxxxxx.xxxxxxxxxxxxxx.xxx
       2 xx.xxxxx.xx
       2 xxx.xxxx.xx
       2 xxxxxxx.x.xxxx
       2 xxxxxxx.xxxx.x
       2 xxxx.xxxxx.xxxx
       2 xxxx.xxxxxxx.xx
       2 xx.xxxxxx.xxxxxx
       2 xxxxxx.x.xxxxxxx
       2 xxxxxx.x.xxxx.xxx
       2 xxxxxx.xx.xxxxxx
       2 xxxxxxx.xxxxxx.x
       2 xxxxxxxx.x.xxxxx
       2 xxx.xxxxx+xxxxx.xx
       2 xxxx.x.xxxxxxxxxx
       2 xxxx.xxxxxxxx.xxx
       2 xxxxxxx.xx.xxxxxx
       2 xxxxxx.x.xxxxxxxxx
       2 xxxxxxx.xx.xxxxxxx
       2 xxxxxxxx.x.xxxxxxx
       2 x.xxxxxxxxxx.xxxxxx
       2 xxx.xxxxx.xxxxxxxxx
       2 xxxx.xxx.xxxxxxxxxx
       2 xxxxx.xxxxxxx.xxxxx
       2 xxx.xxxxx.xxxx.xxxxxx
       2 xxxx.xxxxxxx.xxxxxxx
       2 xxxxx.x.xxxxxxxx.xxxx
       2 xxxxx.xxxx=xxxxxx.xxx
       2 xxxxx.xxxx.xxxxxxxxx
       2 xxxxxx.xx.xxxxxxxxxx
       2 xxxx.xxxxxxxxxx.xxxxxxxx
       2 xxxxxxx.xxxx=xxxxxxxxxx.xxx
       2 xxx-xxxxxx.xxxxxxxx.xxxxxxxxxx
       2 xxxxx.xxxx.xx.xx.xx.xxx.xxx.xx.xxx.xxx.xx.xx
       3 xx.x.x
       3 x.xx.x.x.x
       3 xx.xxxxxx.xx
       3 xxx.xxxx.xxx
       3 xxx.x.xxxxxxx
       3 xxxxx.xx.xxxxx
       3 xxxxxx.xxxxx.x
       3 x.xxxxxxxx.xxxx
       3 xx-xxxx.xxxx.xxx
       3 xxxxx.xxxxx.xxx
       3 xxxxxx.xxxx.xxx
       3 x.xxxx.xxxxxxxxx
       3 xxx.xxxxxxx.xxxx
       3 xxxxx.xxx.xxxxxxx
       3 xxxxx.xxxxx.xxxxx
       3 xxxxxxx.xxxxxxx.x
       3 xxx.xxxxxxxx.xxxxx
       3 xxxxxxx.x.xxxxx.xxx
       3 xxxxxx.x.xxxxxxxxxx
       3 xxxxxxxx.xxxxxxxx.xxxx
       3 xxxx.xxxxxx.xxxxxxxxxxx
       4 x.x.x.xxxxx
       4 xxx.x.xxxxx
       4 xxxx.xx.xxx
       4 x.xxxxx.xxxx
       4 x.xxxx.xxxxxxxx
       4 xxx.xxxxxxxx.xxx
       4 xxxx.xxxxxxx.xxxx
       4 xxxxxx.xxxxxxx.xxx
       4 xxxxx.xxxxxx.xxxxxxx
       4 xxxxxx.xxxxxx.xxxxxxxx
       4 xxxxxx.xxxxxxx.xxxxxxxxx
       5 xx.xxx.xxxx
       5 xxxxx.xxxxx.xx
       5 xxxxxx.x.xxxxx
       5 xxx.xxxx.xxxxxx
       5 xxxx.x.xxxxxxxx
       5 xxxxxxx.xx.xxxx
       5 xxxx.xxxxxx.xxxx
       5 xxxxxx.xxxxxxxx.x
       5 xxxxxxxx.x.xxxxxx
       5 xxxxxx.xx.xxxxx.xxx>
       5 xxxxxxx.xxxxx.xxxxxxx
       6 xx.x.xxxx
       6 x.x.x.xxxxxxx
       6 xxxx.xxxxxxxx.xx
       6 xxxxx.xxx.xxxxxx
       6 xxxxxx.x.xxxxxxxx
       6 x.xxxxxxx.xxxxxxxx
       6 xxxxxx.xxxxxxxx.xxx
       6 xxxxx.xxxxxxxxx.xxxx
       7 xx.x.x.x
       7 x.x.xxxxxxxx
       7 xxxxx.xx.xxxx
       7 xxxxxxx.x.xxxx-x
       7 xxxxxxxxx.x.xxxx
       7 xxxx.xxxxxxxxx.xxxxx
       7 xxxxxxxxx.xxxxx.xxxxxxxx
       7 xxxxx.xxxxx=xxxxxxxxxxxxx.xx
       8 xxxxxx.xxxxx.xxxxx
       8 xxxx.xxxxxxxxx.xxxx
       8 xxxxxx.xxxx.xx.xxxxxxxxx
       9 xxxx.x.xxxxx
       9 xxxxxx.x.xxxx
       9 xxxxxxx.xxxxx.xx
       9 xxxxx.x.xxxxxxx.xxx
       9 x-xxxxxxx=xxxxxxx.xxxxxxxxxxxxxx.xxx-xxxxx
      10 xxxx.xx.xxxx
      10 xxxxxx.xxxxxx.xxx
      11 xxxxx.x.xxxxx
      11 xxxxxx..xxx.xxxx
      11 xxxx.xxxxxx.xxxxx
      11 xxxxxxx.xxxxxx.xxxxx
      12 xxx.xxxx.xxxxx
      12 xxx.xx.xxxxx.xxx
      13 x.xxxxxxx.xxxxxx
      13 xxxx.xxxx.xxxxxxxx
      13 xxxx.xxxxxx.xxxxxx
      14 xxxxx.xxxx.xxx
      15 xxxx.xxxx.xxxxxx
      15 x.x.xxxxxxxxxxxxxxx
      15 xxxxxxxx.xxxxxxxx.xx
      16 xxxxxx.xxx.xx
      17 x.x.xxxxxxxxx
      18 x.x.xxx
      18 xx.xx.xxxxxx
      19 xxxxxx.x.xxxxxx
      20 xxxxxx.xxxx.xxxxxx
      23 x.x.xxxxxxxxxx
      23 xxxxxxx.x.xxxxxxxx
      26 xxxxxx.xxxxxx.xxxxxxx
      27 x.x.xxxxxx
      28 xxxxx.xx.xxxxxxxxx
      29 xxxxxxx.xxxx.xxx
      29 xxxxx.xxxxxxx.xxxx
      29 xxxxx.xxxxxx.xxxxxxxx
      30 x.x.x
      31 xxx.x.xxxxxxxxxxxx
      33 x.xxxx.xxx
      33 xxxx.x.xxxx
      38 xxx.xxxxxxxxx.xx.xxxxx
      39 xxxxx.xxxxxxxxxx.xx
      43 xxxxx.x.xxxx
      49 xxxx.x.xxxxxx
      56 xxxxx.x.xxxxxx
      56 xxxxx.xxxxx.xxxxxxx
      57 x-x.xxxxxxxxxx.xxxxx.xxxx
      58 xxxxxxx.x.xxxxxx
      59 x.x.x.xxxxxx
      61 xxxxxx+xxxx.xxxx.xxxxxxxx
      73 xxx.xxx.xxxx.xxx.xxxxxxxxx
      78 xxxxx.x.xxxxxxxxx
      79 xxxxx.xxx.xxxx
      82 xxxx.xxxx.xxxxxxxx.xxxx
     142 xxxxxxxx.xxxxxxx.xx
     147 xxxxx.x.xxxxxxx
     149 xx.xxxx.xxxx
     234 xxxx.x.xxxxxxx



-- 
Grant. . . .
unix || die


Re: Proposed rule for too many dots in From

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 12/20/18 6:16 PM, Amir Caspi wrote:
> I never intended for the rule to be applied on its own, but far more 
> likely that it would become part of a meta rule with other spammy 
> indicators.

Ah.  That makes more sense.

That being said, it is your server and you're free to run it however you 
want.

> That said, you're absolutely right -- I interact with a bunch of gov folks 
> and forgot about the middle initial being commonplace in the address.

;-)

> Typically that middle part is just one letter for the initial, so one 
> could change the rule to require at least two word characters between 
> the dots.  That is:
> 
> header	AC_FROM_MANY_DOTS	From =~ /<(?:\w{2,}\.){2,}\w+@/

You could do something like that.  It makes the rule more complex, which 
is okay, but I'm not convinced that's necessarily a good thing.

I think I'd be likely to have people pick a number of dots that they 
think is reasonable (possibly with a default) and then take the log, to 
that base, of the number of dots in the From user part.  Then I'd add 
that result to the spam score.  If I could do such.

> Perhaps this is still too generic, and three dots should be the 
> minimum... but that's what the sandboxing will hopefully tell us.  And 
> part of the sandboxing will also hopefully tell us if this works well as 
> a meta -- I absolutely and wholeheartedly agree that the rule 
> _by_itself_ is not a good spam indicator at all... but combined with 
> other indicators, it might well be.

;-)

> Grant, how many of your legit emails would hit the above rule, requiring 
> more than one letter (i.e., more than just a middle initial) between the 
> dots?

I don't know.  I'm re-running the command to scan my mailbox extracting 
From: addresses.  (I'm logging to a file this time.)  I'll do some 
analysis and let you know.



-- 
Grant. . . .
unix || die


Re: Proposed rule for too many dots in From

Posted by Amir Caspi <ce...@3phase.com>.
On Dec 20, 2018, at 5:13 PM, Noel Butler <no...@ausics.net> wrote:
> I have to agree with Grant: two dots is crazy low, you might as well score at one dot.  A lot of email addresses are firstname.initial.surname; even many government departments in this part of the world use the two-dot format.
> 
I never intended for the rule to be applied on its own, but far more likely that it would become part of a meta rule with other spammy indicators.  That said, you're absolutely right -- I interact with a bunch of gov folks and forgot about the middle initial being commonplace in the address.  Typically that middle part is just one letter for the initial, so one could change the rule to require at least two word characters between the dots.  That is:

header	AC_FROM_MANY_DOTS	From =~ /<(?:\w{2,}\.){2,}\w+@/

John, could you update the sandbox rule to the above?  That should whittle down FPs. I'd recommend leaving it as 2 letters, though, since a number of spammy addresses are things like john.at.amazon or some such like that.

Perhaps this is still too generic, and three dots should be the minimum... but that's what the sandboxing will hopefully tell us.  And part of the sandboxing will also hopefully tell us if this works well as a meta -- I absolutely and wholeheartedly agree that the rule _by_itself_ is not a good spam indicator at all... but combined with other indicators, it might well be.

Grant, how many of your legit emails would hit the above rule, requiring more than one letter (i.e., more than just a middle initial) between the dots?

Thanks.

--- Amir
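
A quick side-by-side of the original and revised patterns, as a Python sketch. Both regex bodies are copied from the rules in this thread; the `re.ASCII` flag and the sample addresses are assumptions made for illustration.

```python
import re

# Original proposal: any word characters between the dots.
ORIGINAL = re.compile(r"<(?:\w+\.){2,}\w+@", re.ASCII)
# Revised proposal: at least two word characters before each dot.
REVISED = re.compile(r"<(?:\w{2,}\.){2,}\w+@", re.ASCII)

# Hypothetical addresses, invented for this example.
samples = [
    "<jane.q.doe@example.com>",       # middle initial
    "<john.at.amazon@example.com>",   # two-letter middle part
    "<jane.doe@example.com>",         # single dot
]

for addr in samples:
    print(
        f"{addr:<32} original={bool(ORIGINAL.search(addr))} "
        f"revised={bool(REVISED.search(addr))}"
    )
```

The middle-initial address hits only the original pattern, while the `john.at.amazon` style address hits both. Because the match must start right after `<`, the revised pattern also skips any address whose first part is a single character.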


Re: Proposed rule for too many dots in From

Posted by Noel Butler <no...@ausics.net>.
On 21/12/2018 09:52, Grant Taylor wrote:

> On 12/20/2018 03:11 PM, Amir Caspi wrote: 
> 
>> Two or more dots in the From username seems to be rather spammy (and we've talked about it before on the list).
> 
> I feel obligated to comment that my wife's email address (Gmail) has two dots in it.  (Gmail is its own can of worms for dots as they strip them, and other issues with Gmail.)  As do a number of other people that I exchange email with.

I have to agree with Grant: two dots is crazy low, you might as well
score at one dot.  A lot of email addresses are firstname.initial.surname;
even many government departments in this part of the world use the
two-dot format.

-- 
Kind Regards, 

Noel Butler 


Re: Proposed rule for too many dots in From

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 12/20/2018 03:11 PM, Amir Caspi wrote:
> Two or more dots in the From username seems to be rather spammy (and 
> we've talked about it before on the list).

I feel obligated to comment that my wife's email address (Gmail) has two 
dots in it.  (Gmail is its own can of worms for dots as they strip 
them, and other issues with Gmail.)  As do a number of other people that 
I exchange email with.

> Would you mind sandboxing this test rule to see if it would be helpful 
> as a main rule?  I get a lot of spam locally that hits this...
> 
> header	AC_FROM_MANY_DOTS	From =~ /<(?:\w+\.){2,}\w+@/
> describe	AC_FROM_MANY_DOTS	Two or more periods in the From username
> 
> We could, of course, increase to three or more dots... maybe the three-dot 
> version would score higher on its own, but the two-dot could be better 
> in combo... not sure.

Can't SpamAssassin add something to the score for each dot?

I just checked and my 249,000+ message corpus has 2,600+ messages with 
two or more dots in the user part of the email address (From:addr).



-- 
Grant. . . .
unix || die


Re: Proposed rule for too many dots in From

Posted by John Hardin <jh...@impsec.org>.
On Thu, 20 Dec 2018, Amir Caspi wrote:

> John, would you mind sandboxing a rule?
>
> 	Two or more dots in the From username seems to be rather spammy (and we've talked about it before on the list).  Would you mind sandboxing this test rule to see if it would be helpful as a main rule?  I get a lot of spam locally that hits this...
>
> header	AC_FROM_MANY_DOTS	From =~ /<(?:\w+\.){2,}\w+@/
> describe	AC_FROM_MANY_DOTS	Two or more periods in the From username
>
> We could, of course, increase to three or more dots... maybe the three-dot version would score higher on its own, but the two-dot could be better in combo... not sure.
>
> Hopefully it's helpful...
>
> Cheers.
>
> --- Amir

Can you also provide a spample? Thanks!


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
   does quite what I want. I wish Christopher Robin was here."
                                            -- Peter da Silva in a.s.r
-----------------------------------------------------------------------
  5 days until Christmas