Posted to users@spamassassin.apache.org by Mark Chaney <ma...@lists.macscr.com> on 2011/03/22 21:16:40 UTC

username in from address

Ever notice that a lot of spam seems to have your username in their from 
address? Such as an email sent TO blah@domain.com is FROM 
blah123@anotherdomain.com (notice 'blah' included in the from address). 
This appears to be the case with a large majority of the spam that 
gets through my filters. Any ideas how to handle this? Would be nice to 
be able to add a score for matches like that.

Thanks,
Mark

Re: username in from address

Posted by John Hardin <jh...@impsec.org>.

On Tue, 29 Mar 2011, Mark Chaney wrote:

>> Watch __TO_EQ_FROM_USR and __TO_EQ_FROM_USR_NN on ruleqa starting
>> tomorrow.  The latter rule ignores trailing numbers, e.g. From:
>> john54@example.com matches To: john@other.example.org as well as To:
>> john6143598435623452@unrealistic.example.net.
>
> Is this just theory code or does something like what I am talking about
> already exist? I couldn't find anything on Google relating to
> '__TO_EQ_FROM_USR_NN'.

"ruleqa" is the automated nightly mass rule test and scoring system. Rules 
in developers' sandboxes can, if they perform well enough, be 
automatically published into the next update that sa-update will apply.

The tools to review it are at http://ruleqa.spamassassin.com/ and you 
can type in specific rule names or rule name patterns (e.g. /__TO_EQ_FROM) 
to focus on specific rules.

Unfortunately ruleqa seems to be broken at the moment; no new scoring runs 
have occurred since the 21st.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Liberals love sex ed because it teaches kids to be safe around their
   sex organs. Conservatives love gun education because it teaches kids
   to be safe around guns. However, both believe that the other's
   education goals lead to dangers too terrible to contemplate.
-----------------------------------------------------------------------
  Today: the M1911 is 100 years old - and still going strong!

Re: username in from address

Posted by Per Jessen <pe...@computer.org>.

Mark Chaney wrote:

> Watch __TO_EQ_FROM_USR and __TO_EQ_FROM_USR_NN on ruleqa starting
> tomorrow.  The latter rule ignores trailing numbers, e.g. From:
> john54@example.com matches To: john@other.example.org as well as To:
> john6143598435623452@unrealistic.example.net.
> 
> Is this just theory code or does something like what I am talking
> about already exist? I couldn't find anything on Google relating to
> '__TO_EQ_FROM_USR_NN'.

http://www.gossamer-threads.com/lists/spamassassin/commits/161702?page=last

FWIW, I added those rules to my systems yesterday around 10:00 UTC, but
left out "!(__FROM_DNS || __FROM_INFO || __SENDER_BOT)".  So far I have
not seen a single hit.


/Per Jessen, Zürich

Re: username in from address

Posted by Mark Chaney <ma...@lists.macscr.com>.

My personal email address, which I have had for about 9 years now, is the
one I use for a lot of my testing. About 90% of the spam that gets
through my regular SpamAssassin filters to that address uses the
format I described in my first email. I haven't had a chance to check
my other SpamAssassin servers to see how common this is, but it is
definitely common for me, so I can only guess it is for others too, unless
a particular spammer is using this strategy (though there isn't anything
in common in the content or the servers it's coming from).

All the filters I use now are pretty standard ones and I have never done 
anything custom, so I am really not sure how to even use the example 
below that was given to me in an earlier email:

> Watch __TO_EQ_FROM_USR and __TO_EQ_FROM_USR_NN on ruleqa starting
> tomorrow.  The latter rule ignores trailing numbers, e.g. From:
> john54@example.com matches To: john@other.example.org as well as To:
> john6143598435623452@unrealistic.example.net.

Is this just theory code or does something like what I am talking about
already exist? I couldn't find anything on Google relating to
'__TO_EQ_FROM_USR_NN'.

Thanks,
Mark

Re: username in from address

Posted by Adam Katz <an...@khopis.com>.

On 3/22/2011 1:16 PM, Mark Chaney wrote:
>>>> Ever notice that a lot of spam seems to have your username in
>>>> their from address? Such as an email sent TO blah@domain.com is
>>>> FROM blah123@anotherdomain.com (notice 'blah' included in the
>>>> from address).

On 3/22/2011 2:31 PM, Adam Katz wrote:
>> somebody could throw something up in their sandbox, but we'd need
>> the result from timing.log (not published) to properly gauge the
>> results (assuming it even has a favorable hit rate and S/O).

Watch __TO_EQ_FROM_USR and __TO_EQ_FROM_USR_NN on ruleqa starting
tomorrow.  The latter rule ignores trailing numbers, e.g. From:
john54@example.com matches To: john@other.example.org as well as To:
john6143598435623452@unrealistic.example.net.
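
For readers who want to see the matching idea outside of SpamAssassin, here is a minimal Python sketch. It is not the sandbox rule itself (those are SpamAssassin header rules, and their exact definitions are not shown in this thread). The function names and the assumption that the first rule compares local parts exactly are illustrative only.

```python
import re
from email.utils import parseaddr


def local_part(header_value: str) -> str:
    """Return the lowercased local part (text before '@') of an address header."""
    _, addr = parseaddr(header_value)
    return addr.rpartition("@")[0].lower()


def strip_trailing_digits(name: str) -> str:
    """Drop any run of digits at the end of a local part."""
    return re.sub(r"\d+$", "", name)


def to_eq_from_usr(from_hdr: str, to_hdr: str) -> bool:
    """Assumed behaviour: sender and recipient local parts are identical."""
    sender, rcpt = local_part(from_hdr), local_part(to_hdr)
    return bool(sender) and sender == rcpt


def to_eq_from_usr_nn(from_hdr: str, to_hdr: str) -> bool:
    """Assumed behaviour: local parts match once trailing digits are ignored."""
    sender = strip_trailing_digits(local_part(from_hdr))
    rcpt = strip_trailing_digits(local_part(to_hdr))
    return bool(sender) and sender == rcpt


# The examples from the description above:
print(to_eq_from_usr_nn("john54@example.com", "john@other.example.org"))
print(to_eq_from_usr_nn("john54@example.com",
                        "john6143598435623452@unrealistic.example.net"))
```

The sketch ignores the domain entirely and does not model any exclusions the real rules may apply.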

Assuming it performs decently, we'll have to examine its CPU performance
(which I can't do).

>> It also doesn't address the abstraction that Mark was trying to
>> share with us.  The real question is:  is this common in uncaught
>> spam?

On 03/22/2011 07:09 PM, Ted Mittelstaedt wrote:
> Unfortunately it is very common on this mailing list to make the
> claim that "oh, that [insert special case here] isn't a problem
> because our other filters are good enough to catch it" when the
> insert special case is difficult to figure out how to program for.

Eh?  I don't see where you got that from; I see no mention of any
special cases (unless you're talking about overlap, which is a factor we
need to take more seriously).  I just wanted some nods from people with
good intuition on these things before bothering to try it since my
intuition is that it won't help.  No offense (nor elitism) intended.

> But the fact is that we are approaching the area of diminishing
> returns with the Spamassassin canned rulesets.

I used to think that.  Now I work at a company with a massive private
stock of SA rules that handily disproves it.  The biggest problem is
that it takes tons of automated systems plus a nontrivial number of
full-time rule writers to make it hum.

> You shouldn't be asking the question "how much uncaught spam would
> this thing I think is an ugly hack be good for"
> 
> You should be saying "well, it will probably only catch 2% of the 
> uncaught spam - but if I add this ugly hack to that other ugly hack
> to that other ugly hack all of which only catch 2% of the uncaught
> spam - why then guess what now I'm making a real dent in the
> stuff!!!"

So if gluing three ugly hacks together triggers on 2% of < 5 point spam,
it's worthwhile?  I was avoiding specifics because I'm not sure of how
this will play out.  You were probably talking about three separate
hacks that each independently catch 2% of uncaught spam, working on the
assumption that there is minimal overlap.  Overlap is one of the GA's
biggest shortcomings.
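
To make the overlap point concrete, here is a small Python sketch with made-up numbers (1000 uncaught messages, three hypothetical rules that each hit 20 of them). It only illustrates the arithmetic; it says nothing about how the GA itself treats overlapping rules.

```python
def combined_coverage(hits_per_rule, total_uncaught):
    """Fraction of uncaught spam hit by at least one of the rules.

    hits_per_rule: iterable of sets of message IDs, one set per rule.
    total_uncaught: number of uncaught spam messages overall.
    """
    caught = set().union(*hits_per_rule)
    return len(caught) / total_uncaught


TOTAL = 1000  # hypothetical corpus of uncaught spam, message IDs 0..999

# Three rules with no overlap: 2% + 2% + 2% really is 6%.
disjoint = [set(range(0, 20)), set(range(20, 40)), set(range(40, 60))]

# Three rules that mostly hit the same messages: only 3% combined.
overlapping = [set(range(0, 20)), set(range(5, 25)), set(range(10, 30))]

print(combined_coverage(disjoint, TOTAL))
print(combined_coverage(overlapping, TOTAL))
```

With no overlap the three rules cover 6% of the corpus; with the heavy overlap chosen here they cover 3%, even though each rule still hits 2% on its own.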

My intuition is that this rule will be somewhere between
__TO_EQ_FROM_DOM (S/O 0.466, 10.5% spam, 2.8173% uncaught-spam*) and
__TO_EQ_FROM (S/O 0.879, 10.3% spam, 2.1365% uncaught-spam), probably
closer to _DOM.  Nice guess on the 2% figure!

* Uncaught-spam% was calculated from summing totals given by ruleqa's
score-map data for scores under 5.  Since __HAS_RCVD hits all but one
spam, I used its summed hits < 5 as the divisor.
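
The calculation in the footnote can be sketched in Python as follows. The dictionary layout (integer score mapped to spam hit count) is an assumption made for illustration, not ruleqa's actual score-map format, and the numbers are invented.

```python
def hits_below(score_map, threshold=5):
    """Sum the spam hits recorded at scores strictly below the threshold."""
    return sum(hits for score, hits in score_map.items() if score < threshold)


def uncaught_spam_pct(rule_map, baseline_map, threshold=5):
    """Percentage of below-threshold spam that the rule hits.

    baseline_map plays the role of a rule that hits (nearly) all spam,
    such as __HAS_RCVD in the footnote above.
    """
    baseline = hits_below(baseline_map, threshold)
    if baseline == 0:
        raise ValueError("baseline rule has no hits below the threshold")
    return 100.0 * hits_below(rule_map, threshold) / baseline


# Invented example data: score -> number of spam messages hit at that score.
candidate_rule = {0: 1, 1: 3, 2: 5, 3: 6, 4: 5, 5: 40, 6: 80}
has_rcvd = {0: 50, 1: 150, 2: 250, 3: 300, 4: 250, 5: 4000}

print(uncaught_spam_pct(candidate_rule, has_rcvd))
```

With these invented figures the candidate rule hits 20 of the 1000 spam messages scoring under 5, i.e. 2%.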

> The former attitude is "your problem is an annoyance to me and I'll 
> try to avoid it by studying it to death" the latter is "how can I
> help you with your problem" attitude.

Again, no offense intended.  Like points (and therefore FPs!),
inefficiencies add up when you have large volumes of rules, so we're
very sensitive about the efficiency of rules that aren't top shelf.  I
see no reason not to apply that same logic here.

As to studying things to death, I think it's a good mantra to tread
lightly (especially when armed with a big stick).

I've been discussing looking at the proposed pattern.  Everybody else
has been offering workarounds.  Both avenues have merit.

SA devs do more than spend our lives on this list.  There is a balance
of how much time we dawdle in uncertain pursuits.  Demanding our
attention and research is not terribly polite, especially when coupled
with insults.

Re: username in from address

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 3/22/2011 2:31 PM, Adam Katz wrote:
>> On 3/22/2011 1:16 PM, Mark Chaney wrote:
>>> Ever notice that a lot of spam seems to have your username in their
>>> from address? Such as an email sent TO blah@domain.com is FROM
>>> blah123@anotherdomain.com (notice 'blah' included in the from
>>> address). This appears to be the case with a large majority of
>>> the spam that gets through my filters. Any ideas how to handle
>>> this? Would be nice to be able to add a score for matches like
>>> that.
>
> This hasn't been common enough (in my experience) to justify either of
> the two ways to match it (a plugin or else an ugly pair of multi-line
> ALL header rules).  I suppose somebody could throw something up in their
> sandbox, but we'd need the result from timing.log (not published) to
> properly gauge the results (assuming it even has a favorable hit rate
> and S/O).
>
> On 03/22/2011 01:59 PM, Ted Mittelstaedt wrote:
>> If this sort of thing bothers you then simply use a unique or close
>> to unique username and then put a filter in your e-mail client.
>>
>> send mail from:
>>
>> markymarkythefunkydude@northpole.com
>>
>> and you're guaranteed that anyone mailing you with
>> "markymarkythefunkydude" in any part of their sending e-mail address
>> is a spammer, and it should be child's play to create a filter in
>> even Outlook that will delete those messages.
>
> That's an ugly workaround that will serve to annoy anybody he
> corresponds with (especially if he's dictating his address at a party;
> that doesn't fit on a napkin).  It also requires trashing an old email
> address, which means alienating/losing old contacts.
>

Many corporations have gone to using full name separated by period
precisely to reduce spamload.  And, how much loss is it to lose
an e-mail address like cooldude8675309@aol.com?  Not to say that the
OP had that kind of an e-mail address but I can't count the number
of gmail.com, aol.com and hotmail.com addresses out there where the
user has a set of numerals tacked on to the name portion of their
e-mail address.

Believe it or not we actually have a few users who have listed ALL
of their contacts and delete ANY mail that is not in their contact
list.  I can only presume they never attend the sort of parties where
you write your e-mail address on a napkin.

I can say that at our site there's a definite increase in spam to
short e-mail addresses (such as my own).

> It also doesn't address the abstraction that Mark was trying to share
> with us.  The real question is:  is this common in uncaught spam?
>

Unfortunately it is very common on this mailing list to make the claim
that "oh, that [insert special case here] isn't a problem because our
other filters are good enough to catch it" when the insert special case
is difficult to figure out how to program for.

But the fact is that we are approaching the area of diminishing returns
with the SpamAssassin canned rulesets.  Bayes isn't that usable for many
sites, and frankly I've only seen it successfully implemented when we
can get users to use IMAP, since we can train them to drag the spam into
a junk folder that a script can then fish out and stuff into the learner.
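
As a rough illustration of the "script that stuffs the junk folder into the learner" idea, here is a minimal Python sketch. It assumes the junk folder is available on disk as a directory of messages and that `sa-learn` is on the PATH; the function names and the example path are hypothetical, and a real deployment would also need to handle per-user Bayes databases and cleanup of already-learned mail.

```python
import subprocess
from pathlib import Path


def sa_learn_command(junk_dir):
    """Build the sa-learn invocation that learns every message in junk_dir as spam."""
    return ["sa-learn", "--spam", str(Path(junk_dir))]


def learn_junk(junk_dir, runner=subprocess.run):
    """Run sa-learn on the junk folder; raises if sa-learn exits non-zero.

    runner is injectable so the function can be exercised without
    SpamAssassin installed.
    """
    return runner(sa_learn_command(junk_dir), check=True)


# Hypothetical usage from cron, one call per user's junk folder:
# learn_junk("/home/someuser/Maildir/.Junk/cur")
```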

You shouldn't be asking the question "how much uncaught spam would this
thing I think is an ugly hack be good for"

You should be saying "well, it will probably only catch 2% of the
uncaught spam - but if I add this ugly hack to that other ugly hack to 
that other ugly hack all of which only catch 2% of the uncaught spam -
why then guess what now I'm making a real dent in the stuff!!!"

The former attitude is "your problem is an annoyance to me and I'll
try to avoid it by studying it to death" the latter is "how can I help 
you with your problem" attitude.

Just sayin!

And no, my comment on the attitude displayed is no indication that I 
think a ruleset based on the OP's proposal would be viable or not.

Ted

Re: username in from address

Posted by Adam Katz <an...@khopis.com>.

> On 3/22/2011 1:16 PM, Mark Chaney wrote:
>> Ever notice that a lot of spam seems to have your username in their
>> from address? Such as an email sent TO blah@domain.com is FROM 
>> blah123@anotherdomain.com (notice 'blah' included in the from
>> address). This appears to be the case with a large majority of
>> the spam that gets through my filters. Any ideas how to handle
>> this? Would be nice to be able to add a score for matches like
>> that.

This hasn't been common enough (in my experience) to justify either of
the two ways to match it (a plugin or else an ugly pair of multi-line
ALL header rules).  I suppose somebody could throw something up in their
sandbox, but we'd need the result from timing.log (not published) to
properly gauge the results (assuming it even has a favorable hit rate
and S/O).

On 03/22/2011 01:59 PM, Ted Mittelstaedt wrote:
> If this sort of thing bothers you then simply use a unique or close
> to unique username and then put a filter in your e-mail client.
> 
> send mail from:
> 
> markymarkythefunkydude@northpole.com
> 
> and you're guaranteed that anyone mailing you with
> "markymarkythefunkydude" in any part of their sending e-mail address
> is a spammer, and it should be child's play to create a filter in
> even Outlook that will delete those messages.

That's an ugly workaround that will serve to annoy anybody he
corresponds with (especially if he's dictating his address at a party;
that doesn't fit on a napkin).  It also requires trashing an old email
address, which means alienating/losing old contacts.

It also doesn't address the abstraction that Mark was trying to share
with us.  The real question is:  is this common in uncaught spam?

Re: username in from address

Posted by Ted Mittelstaedt <te...@ipinc.net>.

If this sort of thing bothers you then simply use a unique or close to
unique username and then put a filter in your e-mail client.

send mail from:

markymarkythefunkydude@northpole.com

and you're guaranteed that anyone mailing you with "markymarkythefunkydude"
in any part of their sending e-mail address is a spammer, and it should
be child's play to create a filter in even Outlook that will delete
those messages.

Ted

On 3/22/2011 1:16 PM, Mark Chaney wrote:
> Ever notice that a lot of spam seems to have your username in their from
> address? Such as an email sent TO blah@domain.com is FROM
> blah123@anotherdomain.com (notice 'blah' included in the from address).
> This appears to be the case with a large majority of the spam that
> gets through my filters. Any ideas how to handle this? Would be nice to
> be able to add a score for matches like that.
>
> Thanks,
> Mark