Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2010/04/27 02:37:56 UTC

Filtering zip spam

Hi,

I'm seeing an increase in zip attachment spam, and hoped someone could
help me figure out why it isn't being properly tagged. Are others
seeing this? Is BAYES_99 being triggered or is it lower?

Here's an example:

http://pastebin.com/h9JwTQ9T

The score is very low. Does someone have an idea of other
characteristics that I can flag on?

Thanks!
Alex

Re: Filtering zip spam

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Mon, 26 Apr 2010, Alex wrote:

> Hi,
>
> I'm seeing an increase in zip attachment spam, and hoped someone could
> help me figure out why it isn't being properly tagged. Are others
> seeing this? Is BAYES_99 being triggered or is it lower?
>
> Here's an example:
>
> http://pastebin.com/h9JwTQ9T
>
> The score is very low. Does someone have an idea of other
> characteristics that I can flag on?

FWIW, here's what I'm getting for that message:

Content analysis details:   (15.5 points, 6.0 required, autolearn=no)

 pts rule name              description
---- ---------------------- ------------------------------------------
 1.7 RATWARE_GECKO_BUILD    Bulk email fingerprint (Gecko faked) found
 0.1 RATWR10_MESSID         Message-ID has ratware pattern (HEXHEX.HEXHEX@)
 1.1 SPF_FAIL               SPF: sender does not match SPF record (fail)
                            [SPF failed: Please see http://www.openspf.org/why.html?sender=debenture%40us.randstad.com&ip=80.12.242.26&receiver=server37.icaen.uiowa.edu]
 4.0 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.0000]
 5.0 L_CLAMAV               Clam AntiVirus detected a virus
 1.6 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                [Blocked - see <http://www.spamcop.net/bl.shtml?80.14.188.63>]
 2.0 MY_CLAMAV              MY_CLAMAV
 0.0 T__MY_CLAMAV_SANE      T__MY_CLAMAV_SANE


Major hits are BAYES_99 & Sane-Security sigs in ClamAV, minor hits from
spamcop & spf-fail plus some custom rules. Without the Sane hits it
still would have made it over my threshold.
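As a quick check of that last sentence, here is the arithmetic as a small Python sketch. It assumes that L_CLAMAV, MY_CLAMAV and T__MY_CLAMAV_SANE are the ClamAV/Sanesecurity-driven hits; the report itself doesn't label them rule by rule.

```python
# Rule scores copied from the report above (points).
scores = {
    "RATWARE_GECKO_BUILD": 1.7,
    "RATWR10_MESSID": 0.1,
    "SPF_FAIL": 1.1,
    "BAYES_99": 4.0,
    "L_CLAMAV": 5.0,
    "RCVD_IN_BL_SPAMCOP_NET": 1.6,
    "MY_CLAMAV": 2.0,
    "T__MY_CLAMAV_SANE": 0.0,
}
REQUIRED = 6.0

# Assumption: these three rules are the ones driven by ClamAV/Sanesecurity.
CLAMAV_RULES = {"L_CLAMAV", "MY_CLAMAV", "T__MY_CLAMAV_SANE"}

# Round to one decimal place to hide floating-point noise in the sums.
total = round(sum(scores.values()), 1)
without_clamav = round(
    sum(v for k, v in scores.items() if k not in CLAMAV_RULES), 1
)

print(f"total={total} without_clamav={without_clamav} required={REQUIRED}")
print("still over threshold without ClamAV hits:", without_clamav >= REQUIRED)
```

Under that assumption the remaining hits add up to 8.5 points, still above the 6.0 threshold.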

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Filtering zip spam

Posted by "corpus.defero" <co...@idnet.com>.
On Tue, 2010-04-27 at 11:08 -0400, Alex wrote:
> Hi,
> 
> >> Might as well just block all of \.fr at smtp time for that matter :-)
> >> Poor France :(
> >
> > I mostly do....... au revoir Le France....
> 
> Somewhat off-topic, but in the interest of increasing awareness, India
> reportedly ranks first:
> 
> http://www.dnaindia.com/mumbai/report_india-ranks-first-in-sending-spam-mails_1374118
> 
> Regards,
> Alex
Not in my logs it doesn't ;-) but each user and server has different
experiences. 


Re: Filtering zip spam

Posted by ram <ra...@netcore.co.in>.
On Tue, 2010-04-27 at 11:08 -0400, Alex wrote:
> Hi,
> 
> >> Might as well just block all of \.fr at smtp time for that matter :-)
> >> Poor France :(
> >
> > I mostly do....... au revoir Le France....
> 
> Somewhat off-topic, but in the interest of increasing awareness, India
> reportedly ranks first:
> 
> http://www.dnaindia.com/mumbai/report_india-ranks-first-in-sending-spam-mails_1374118
> 

If you read it, India ranks first in the Asia-Pacific region. No
surprise: Afghanistan has almost no Internet, Pakistan has almost no
power, and Bangladesh has almost no users. The others are too small.


Worldwide, most spam comes from the US and China, followed by
Russia:
http://www.spamhaus.org/statistics/countries.lasso


India doesn't even figure in the top 20.





Re: Filtering zip spam

Posted by Alex <my...@gmail.com>.
Hi,

>> Might as well just block all of \.fr at smtp time for that matter :-)
>> Poor France :(
>
> I mostly do....... au revoir Le France....

Somewhat off-topic, but in the interest of increasing awareness, India
reportedly ranks first:

http://www.dnaindia.com/mumbai/report_india-ranks-first-in-sending-spam-mails_1374118

Regards,
Alex

Re: Filtering zip spam

Posted by "corpus.defero" <co...@idnet.com>.
On Tue, 2010-04-27 at 02:16 -0400, Alex wrote:
> Hi,
> 
> >> Here's an example:
> >>
> >> http://pastebin.com/h9JwTQ9T
> >>
> >> The score is very low. Does someone have an idea of other
> >> characteristics that I can flag on?
> >>
> > Hits for me on this:
> > Sanesecurity.Junk.22048.UNOFFICIAL FOUND
> 
> Ah, very good. I think that might be what I'm missing. How are you
> implementing this? From here?
> 
> http://www.sanesecurity.co.uk/download_scripts_linux.htm
> 
> Or are you using the ClamAV SA plug-in?
I'm using clamav-milter ahead of SA, with Postfix and the Sanesecurity
sigs, but any implementation that uses ClamAV with Sanesecurity will do the same.
> 
> I'm using amavisd with clam-0.96 and sa-3.2.5.
> 
> >  9.0 RELAYCOUNTRY_FR        Relayed through France
> >  5.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
> 
> I wish I could use scores like that :-)
> 
> Might as well just block all of \.fr at smtp time for that matter :-)
> Poor France :(
I mostly do....... au revoir Le France....
> 
> Thanks,
> Alex



Re: Filtering zip spam

Posted by Alex <my...@gmail.com>.
Hi,

>> Here's an example:
>>
>> http://pastebin.com/h9JwTQ9T
>>
>> The score is very low. Does someone have an idea of other
>> characteristics that I can flag on?
>>
> Hits for me on this:
> Sanesecurity.Junk.22048.UNOFFICIAL FOUND

Ah, very good. I think that might be what I'm missing. How are you
implementing this? From here?

http://www.sanesecurity.co.uk/download_scripts_linux.htm

Or are you using the ClamAV SA plug-in?

I'm using amavisd with clam-0.96 and sa-3.2.5.

>  9.0 RELAYCOUNTRY_FR        Relayed through France
>  5.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net

I wish I could use scores like that :-)

Might as well just block all of \.fr at smtp time for that matter :-)
Poor France :(

Thanks,
Alex

Re: Filtering zip spam

Posted by "corpus.defero" <co...@idnet.com>.
On Mon, 2010-04-26 at 20:37 -0400, Alex wrote:
> Hi,
> 
> I'm seeing an increase in zip attachment spam, and hoped someone could
> help me figure out why it isn't being properly tagged. Are others
> seeing this? Is BAYES_99 being triggered or is it lower?
> 
> Here's an example:
> 
> http://pastebin.com/h9JwTQ9T
> 
> The score is very low. Does someone have an idea of other
> characteristics that I can flag on?
> 
> Thanks!
> Alex
Hits for me on this:
Sanesecurity.Junk.22048.UNOFFICIAL FOUND

But I can't say how long that signature has been hitting it. Other than
that, it's not doing well:

 pts rule name              description
---- ---------------------- --------------------------------------------------
 9.0 RELAYCOUNTRY_FR        Relayed through France
 5.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                [Blocked - see <http://www.spamcop.net/bl.shtml?80.14.188.63>]



Re: Filtering zip spam

Posted by Alex <my...@gmail.com>.
Hi,

> Alex, does Bayes understand/check INSIDE zips, at least for file
> properties?  If not, then it is inherently limited (just in this

I'm not sure whether you're asking me rhetorically here; I really don't
know. Is it enough for Bayes to see the encoded attachment string and
match it against other strings, or must the attachment first be
expanded into its real content?

> context), which is a big part of why this is such an effective
> technique.  Adding that to Bayes should be relatively
> straightforward, and should make zips less attractive to spammers.

It's almost too obvious an addition, which makes me wonder why it
hasn't been done before.

> One simple approach is to score all "small" zips, then meta that
> with other characteristics, like ANY blocklist hit, "unusual"
> nation of origin, etc.

That's a good one. I'm not sure I'm at the point of writing rules to
match on attachment size, however.

> That's how I first handled zips, a few years ago, and it's fairly
> effective.  Small zips in ham are VERY unusual, and typically are

Again, it's so obvious once you mention it that I'm surprised it's not
in the default rules, given that you've been doing it for a while. Is
there some side effect or drawback that would keep it from being rolled
into an official SA release?

> To avoid FPs, I'm using the RealName-based rules I described almost
> three years ago (I have several "skip" rules daisy-chained off

I'll have to locate those. Not much luck finding it after a quick
search. It's not the Google "I'm feeling lucky" discussion, right?

# Is this even still relevant?
http://old.nabble.com/Googlepages---Livefilestore-spams-td14715808.html

> Alex, as with all rules, it really depends on your ham ecology.

I agree to an extent, but there is a common reference point that we
all have, and I'd like to at least find that.

> Feel free to share more info about yours (we need the equivalent
> of the Geek Code for ham ecology!).  When you first started
> posting, I briefly assumed you were a college student, then
> gradually realized you have decent volume and diversity. :)

I appreciate that. I've been working with Linux since the beginning,
but I'm not a real Perl programmer.

> As I mentioned in a post in January, I had noticed a consistent
> value in an Image properties field which I was calculating, but
> not (at the time) exporting.

Is this it?

# Re: pill image spam learns to walk
http://marc.info/?l=spamassassin-users&m=126327771510366&w=2

Is there any progress on your work from that, which might benefit us here?

> Entire zip:
>    - number of files
>    - compression ratio (i.e. across ALL files)

Isn't this what the ClamAV and Sanesecurity sigs are for?

Thanks,
Alex

Re: Filtering zip spam

Posted by "Chip M." <sa...@IowaHoneypot.com>.
>I'm seeing an increase in zip attachment spam, and hoped someone
>could help me figure out why it isn't being properly tagged. Are
>others seeing this? Is BAYES_99 being triggered or is it lower?

Alex, does Bayes understand/check INSIDE zips, at least for file
properties?  If not, then it is inherently limited (just in this
context), which is a big part of why this is such an effective
technique.  Adding that to Bayes should be relatively
straightforward, and should make zips less attractive to spammers.


>The score is very low. Does someone have an idea of other
>characteristics that I can flag on?

One simple approach is to score all "small" zips, then meta that
with other characteristics, like ANY blocklist hit, "unusual"
nation of origin, etc.

That's safer than outright blocking merely "unusual" nations, like
France. :)
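A minimal Python sketch of the "small zip plus something else" meta idea, as a post-SA style check. The 20 KB cutoff, the MessageFacts structure and the country-code sets are hypothetical illustrations, not SpamAssassin rules or APIs; in SA itself this would be a meta rule combining separate sub-rules.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: what counts as a "small" zip is a local tuning choice.
SMALL_ZIP_MAX_BYTES = 20_000


@dataclass
class MessageFacts:
    """Hypothetical summary of facts already extracted from a message."""
    attachments: list[tuple[str, int]]  # (filename, size in bytes)
    blocklist_hits: set[str] = field(default_factory=set)   # e.g. DNSBL rule names
    relay_countries: set[str] = field(default_factory=set)  # e.g. {"FR"}


def has_small_zip(msg: MessageFacts) -> bool:
    """Sub-rule: any .zip attachment at or under the size cutoff."""
    return any(
        name.lower().endswith(".zip") and size <= SMALL_ZIP_MAX_BYTES
        for name, size in msg.attachments
    )


def small_zip_meta(msg: MessageFacts, usual_countries: set[str]) -> bool:
    """Meta: small zip AND (any blocklist hit OR an unusual relay country)."""
    unusual_origin = bool(msg.relay_countries - usual_countries)
    return has_small_zip(msg) and (bool(msg.blocklist_hits) or unusual_origin)
```

The point of the meta is that neither a small zip nor an odd country is scored heavily on its own; only the combination is.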

That's how I first handled zips, a few years ago, and it's fairly
effective.  Small zips in ham are VERY unusual, and typically are
sent by more sophisticated users, so it may be viable to have a
Subject-based "skip" rule (again, via metas) that would cancel out
other tests.
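The Subject-based "skip" idea can be sketched the same way. The Subject patterns and the 3.0-point value below are hypothetical placeholders; real patterns would have to come from your own ham, and in SA the cancellation would be expressed as a meta of the form "small zip AND NOT skip".

```python
import re

# Hypothetical Subject patterns that legitimate zip senders at one site might use.
SKIP_SUBJECT_PATTERNS = [
    re.compile(r"\bproject\s+files?\b", re.IGNORECASE),
    re.compile(r"\b(?:quarterly|monthly)\s+report\b", re.IGNORECASE),
]


def subject_skip(subject: str) -> bool:
    """True if the Subject matches any locally known ham pattern."""
    return any(p.search(subject) for p in SKIP_SUBJECT_PATTERNS)


def zip_spam_points(small_zip_hit: bool, subject: str, points: float = 3.0) -> float:
    """Points to add: the small-zip hit counts only if the skip rule did not fire."""
    if small_zip_hit and not subject_skip(subject):
        return points
    return 0.0
```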

To avoid FPs, I'm using the RealName-based rules I described almost
three years ago (I have several "skip" rules daisy-chained off
those - a good example of an anti-spam mechanism which turned into
a very effective anti-FP mechanism).
Note that all the current zips have incorrect RealNames.


Alex, as with all rules, it really depends on your ham ecology.
Feel free to share more info about yours (we need the equivalent
of the Geek Code for ham ecology!).  When you first started
posting, I briefly assumed you were a college student, then
gradually realized you have decent volume and diversity. :)


All of the recent zipped file campaigns look like the work of last
year's inline-PNG/RTF coder, so we could well be in for more
variants.

Using zips is an interesting delivery mechanism.  Most Windows
versions have easy means to open them, and there's an element of
novelty (even I was almost excited when the first zipped JPEG
arrived - followed by disappointment that it was merely a
"standard" wavy pharm).


Another approach I had been using was a (post-SA) test that
extracts all filenames, and just looks for any specified file
extension(s).
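A rough Python equivalent of that filename/extension test, using the standard zipfile module on an in-memory attachment. Like the original test, it only looks at member names and says nothing about how many files there are or how large they are.

```python
import io
import zipfile


def zip_member_names(zip_bytes: bytes) -> list[str]:
    """Return the names of all members in an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.namelist()


def has_extension(zip_bytes: bytes, extensions) -> bool:
    """True if any member name ends with one of the extensions (case-insensitive)."""
    exts = tuple(e.lower() for e in extensions)
    return any(name.lower().endswith(exts) for name in zip_member_names(zip_bytes))


# Usage: build a small test zip in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("order.rtf", r"{\rtf1 hello}")
    zf.writestr("readme.txt", "hi")
print(has_extension(buf.getvalue(), [".rtf", ".exe"]))
```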

It worked, but that test was designed for malware detection, and
has VERY limited options.  There was no means of restricting it to
a zip containing just one small RTF and no other files, so my
initial rule would have mis-fired on anything with a mix of files.
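A sketch of the narrower check that test couldn't express: exactly one file in the zip, an .rtf, under a size cutoff. MAX_RTF_BYTES is a hypothetical value measured on the uncompressed size, and make_zip is only a test helper.

```python
import io
import zipfile

MAX_RTF_BYTES = 10_000  # hypothetical "small" cutoff, uncompressed bytes


def is_lone_small_rtf(zip_bytes: bytes, max_bytes: int = MAX_RTF_BYTES) -> bool:
    """True only for a zip holding exactly one file, and that file is a small .rtf."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        members = [i for i in zf.infolist() if not i.is_dir()]
    return (
        len(members) == 1
        and members[0].filename.lower().endswith(".rtf")
        and members[0].file_size <= max_bytes
    )


def make_zip(files: dict[str, bytes]) -> bytes:
    """Test helper: build an in-memory deflated zip from name -> data."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

A mixed zip (an RTF plus a JPEG, say) no longer matches, which is the mis-fire described above.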

I finally had my Kaylee Frye moment about two weeks ago and wrote a
brand-new "Zip Info" module, similar to "Image Info", for my post-SA
filter (sorry, it's written in Object Pascal).

I designed it to expose far more info, and wrote the rules module
so I'd have far more control than was currently "necessary".

As I mentioned in a post in January, I had noticed a consistent
value in an Image properties field which I was calculating, but
not (at the time) exporting.
I'm trying to avoid that mental kick moment. :)


SANITY CHECK please!
Here's what I'm currently exporting:

Entire zip:
    - number of files
    - compression ratio (i.e. across ALL files)

Per file:
    - filename
    - compression ratio
    - file date
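For comparison, the properties in that list can be pulled out with Python's zipfile module. This is a sketch, not Chip's Object Pascal module; in particular the direction of the compression ratio (compressed size divided by uncompressed size, so lower means better compression) is an assumption.

```python
import io
import zipfile
from datetime import datetime


def ratio(compressed: int, uncompressed: int) -> float:
    """Compressed/uncompressed size; 1.0 for empty files to avoid dividing by zero."""
    return compressed / uncompressed if uncompressed else 1.0


def zip_info(zip_bytes: bytes) -> dict:
    """Whole-zip and per-file properties, skipping directory entries."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        files = [i for i in zf.infolist() if not i.is_dir()]
    total_c = sum(i.compress_size for i in files)
    total_u = sum(i.file_size for i in files)
    return {
        "num_files": len(files),
        "ratio": ratio(total_c, total_u),  # across ALL files
        "files": [
            {
                "filename": i.filename,
                "ratio": ratio(i.compress_size, i.file_size),
                # ZipInfo.date_time is (year, month, day, hour, minute, second).
                "date": datetime(*i.date_time),
            }
            for i in files
        ],
    }
```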

The only property I'm not currently doing anything with is the
individual file date.  I'm having my endusers log their ham data
for a few weeks, then I'll see if there's anything useful, ham vs
spam wise.  I predict ham will have a rich date range, and spam
will be mostly/entirely recent.  I may add a simple "younger/older
than n days" test, regardless, since when dealing with spammers,
Logic is often NOT the beginning of Wisdom. ;)
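A minimal version of the "younger/older than n days" test, assuming the file dates have already been converted to datetime objects (for example from zipfile's ZipInfo.date_time). Zip timestamps carry no time zone, so this compares naive local times; the function names and the 7-day window used when trying it out are hypothetical.

```python
from datetime import datetime, timedelta


def all_younger_than(file_dates, days, now=None):
    """True if there is at least one date and every date is within `days` of `now`."""
    now = now if now is not None else datetime.now()
    cutoff = now - timedelta(days=days)
    return bool(file_dates) and all(d >= cutoff for d in file_dates)


def any_older_than(file_dates, days, now=None):
    """True if at least one date is more than `days` before `now`."""
    now = now if now is not None else datetime.now()
    cutoff = now - timedelta(days=days)
    return any(d < cutoff for d in file_dates)
```

If the prediction holds, spam zips would tend to satisfy all_younger_than, while ham with older documents would trip any_older_than.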


Implementing the basic properties extraction was trivial.
Thinking through how I wanted to handle the rules was more of a
challenge. :)

Figured I'd share where I'm at, and pick the big brains. :)
    - "Chip"

P.S.  I am also seriously considering adding the ability to extract
any specified file as a text or binary stream, with the text stream
defaulting to being fed to a domain extraction module.

It's not unreasonable for somebody to send a legit zipped RTF, so
content scanning would be good.  These spam RTFs in particular are
tiny (low overhead to extract) yet intensely spammy.
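A sketch of the P.S. idea: pull one named member out of the zip as bytes or as text, and run the text through a domain extractor. The regex is deliberately rough and hypothetical, not a complete domain parser, and decoding RTF as Latin-1 is an assumption that ignores RTF escape sequences.

```python
import io
import re
import zipfile

# Rough, hypothetical domain pattern: dot-separated labels ending in a 2+ letter TLD.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)


def extract_member(zip_bytes: bytes, name: str, as_text: bool = False,
                   encoding: str = "latin-1"):
    """Return one member's content as bytes, or as decoded text if as_text is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        data = zf.read(name)
    return data.decode(encoding, errors="replace") if as_text else data


def domains_in_member(zip_bytes: bytes, name: str) -> list[str]:
    """Feed a member's text to the domain extractor; return sorted unique domains."""
    text = extract_member(zip_bytes, name, as_text=True)
    return sorted({m.group(0).lower() for m in DOMAIN_RE.finditer(text)})
```

The extracted domains could then be checked against URI blocklists the same way body URIs are.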