Posted to users@spamassassin.apache.org by Jack Gostl <go...@argoscomp.com> on 2006/03/07 12:55:13 UTC

All image spam

I've seen some references to this in threads, but I didn't see an answer.

Starting in late November, we started getting hit with spam that was almost entirely a jpeg. They seem to be mostly "stock recommendations". There is minimal message text, usually HTML, and the real spam content is in the image. Despite all the training that I do, these seem to slip through the Bayes algorithms with a score of no more than 50%, and the rest of the tests don't drive the score up high enough to help.

I am currently running SpamAssassin 3.0.3. I tried running these messages through SpamAssassin 3.1 and it doesn't seem to help.

Any suggestions?

Thanks - Jack

Re: All image spam

Posted by Craig Baird <cr...@xpressweb.com>.

I'm having similar results here.  As others have mentioned, the SARE stock 
rules do help somewhat, but they're by no means the proverbial "silver bullet".  
As someone else also mentioned, it helps to increase the scores of the 
HTML_IMAGE_ONLY_XX rules.  I increased 12, 16, 20, and 24 by one point each.  However, that still 
doesn't nail all of them.  I have seen some come through without even hitting 
any HTML_IMAGE_ONLY_XX rules.

It seems to me that with these image-only spams, spammers may have finally 
stumbled onto a pretty good weapon to counter SA, and to defeat Bayes.  With 
broadband connections being dirt cheap these days, and with all the zombie 
nets at their disposal, spammers can now blast out large spams in a short 
amount of time, without causing much drain on their own network resources.  
I'm getting image-only spam with attachments ranging in size from about 12K to 
70K.

I'll bet it's only a matter of time before we start seeing spam larger than 
256K, which I believe is the threshold that most people use to determine 
whether to send a message to SA for scanning or not.  We'll probably all be 
bumping up that threshold at some point.  :(

Craig


Quoting Jack Gostl <go...@argoscomp.com>:

> I've seen some references to this in threads, but I didn't see an answer.
> 
> Starting in late November, we started getting hit with spam that was almost
> entirely a jpeg. They seem to be mostly "stock recommendations". There is
> minimal message, usually HTML, and the real spam content is in the image.
> Despite all the training that I do, this seems to slip through the Bayes
> algorithms with no more than a 50%, and the rest of the tests don't drive the
> score up high enough to help.
> 
> I am currently running SpamAssassin 3.0.3. I tried running these messages
> through SpamAssassin 3.1 and it doesn't seem to help.
> 
> Any suggestions?
> 
> Thanks - Jack
>

Re: Image MD5sums available, was Re: All image spam

Posted by Dirk Bonengel <di...@bonengel.de>.

Hi, all,

I wonder if the iXhash Plugin I did last summer would catch these.
FYI, the plugin computes some form(s) of fuzzy MD5 checksums of the complete 
mail body (not separate MIME parts) and compares the results with 
those I provide via DNS.
It's available at http://wiki.apache.org/spamassassin/iXhash.
If not, enhancing it to also compute checksums of attachments would be 
nice to have. If only I had the time...

Dirk
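
To make the idea of a "fuzzy" checksum concrete, here is a minimal Python sketch. It is not iXhash's actual algorithm, which is not described in this thread; the function name and the normalization steps (lowercasing, dropping digits, collapsing whitespace) are assumptions chosen only to show how two slightly varied bodies can map to the same MD5 digest.

```python
import hashlib
import re


def fuzzy_body_digest(body: str) -> str:
    """Hash a normalized form of the body (hypothetical normalization)."""
    text = body.lower()                        # ignore case differences
    text = re.sub(r"\d+", "", text)            # drop digits, e.g. random IDs
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = "Buy  STOCK now!\n Ref 12345"
    second = "buy stock now! ref 98"
    # Both bodies normalize to the same text, so the digests match.
    print(fuzzy_body_digest(first) == fuzzy_body_digest(second))
```

A digest like this only helps against variation in the text body; it says nothing about an attached image, which is the gap mentioned above.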


William Stearns wrote:
> Good evening, Jack, all,
>
> On Tue, 7 Mar 2006, Jack Gostl wrote:
>
>> I've seen some references to this in threads, but I didn't see an 
>> answer.
>>
>> Starting in late November, we started getting hit with spam that was 
>> almost entirely a jpeg. They seem to be mostly "stock 
>> recommendations". There is minimal message, usually HTML, and the 
>> real spam content is in the image. Despite all the training that I 
>> do, this seems to slip through the Bayes algorithms with no more than 
>> a 50%, and the rest of the tests don't drive the score up high enough 
>> to help.
>>
>> I am currently running SpamAssassin 3.0.3. I tried running these 
>> messages through SpamAssassin 3.1 and it doesn't seem to help.
>>
>> Any suggestions?
>
>     We talked about identifying images last summer.  There are a few 
> answers, some of which have been discussed in this thread already.
>     Razor, pyzor, and DCC are designed to score up messages with 
> already-seen mime parts (read: if 3 other people think that image is 
> spam, your spam filter can score it up).  As with identifying text 
> parts where the spammer inserts random words to throw those services 
> off, images can be subtly modified so the visible area is essentially 
> identical but the actual image file is different with every spam run.
>     I offered to put together a catalog of checksums of images used in 
> spam, and have done so.  The md5 and sha1 sums of 44,522 spam images 
> can be found at http://www.stearns.org/spamattach/ , broken out by 
> category and in combined files.  If anyone wants to take on an 
> interesting project of computing the md5 checksums of attachments, I'd 
> be willing to set those lists up as a DNS-queryable RBL (along the 
> lines of
> 01f5ff6ab05499c94a967409204e6a29.md5.some_rbl.net which would return 
> 127.0.0.2 if known, nothing if not).
>     I already understand the downsides to this approach (duplicates 
> work of razor, pyzor, and dcc, images can be altered), but figure the 
> checksum work has already been done and will continue to be done anyways.
>     Anyone up for it?
>     Cheers,
>     - Bill
>
> --------------------------------------------------------------------------- 
>
>         "That man is a success who lived well, laughed often and loved
> much: who has gained the respect of intelligent men and the love of
> children: who has filled his niche and accomplished his task: who leaves
> the world a better place than he found it, whether by an improved poppy,
> a perfect poem or a rescued soul; who never lacked appreciation of
> earth's beauty or failed to express it; who looked for the best in
> others and gave the best he had."
>         -- Robert Louis Stevenson. 
> -------------------------------------------------------------------------- 
>
> William Stearns (wstearns@pobox.com).  Mason, Buildkernel, freedups, p0f,
> rsync-backup, ssh-keyinstall, dns-check, more at:   
> http://www.stearns.org
> -------------------------------------------------------------------------- 
>

Image MD5sums available, was Re: All image spam

Posted by William Stearns <ws...@pobox.com>.

Good evening, Jack, all,

On Tue, 7 Mar 2006, Jack Gostl wrote:

> I've seen some references to this in threads, but I didn't see an 
> answer.
>
> Starting in late November, we started getting hit with spam that was 
> almost entirely a jpeg. They seem to be mostly "stock recommendations". 
> There is minimal message, usually HTML, and the real spam content is in 
> the image. Despite all the training that I do, this seems to slip 
> through the Bayes algorithms with no more than a 50%, and the rest of 
> the tests don't drive the score up high enough to help.
>
> I am currently running SpamAssassin 3.0.3. I tried running these 
> messages through SpamAssassin 3.1 and it doesn't seem to help.
>
> Any suggestions?

 	We talked about identifying images last summer.  There are a few 
answers, some of which have been discussed in this thread already.
 	Razor, pyzor, and DCC are designed to score up messages with 
already-seen mime parts (read: if 3 other people think that image is spam, 
your spam filter can score it up).  As with identifying text parts where 
the spammer inserts random words to throw those services off, images can 
be subtly modified so the visible area is essentially identical but the 
actual image file is different with every spam run.
 	I offered to put together a catalog of checksums of images used in 
spam, and have done so.  The md5 and sha1 sums of 44,522 spam images can 
be found at http://www.stearns.org/spamattach/ , broken out by category 
and in combined files.  If anyone wants to take on an interesting project 
of computing the md5 checksums of attachments, I'd be willing to set those 
lists up as a DNS-queryable RBL (along the lines of
01f5ff6ab05499c94a967409204e6a29.md5.some_rbl.net which would return 
127.0.0.2 if known, nothing if not).
 	I already understand the downsides to this approach (duplicates 
work of razor, pyzor, and dcc, images can be altered), but figure the 
checksum work has already been done and will continue to be done anyways.
 	Anyone up for it?
 	Cheers,
 	- Bill

---------------------------------------------------------------------------
         "That man is a success who lived well, laughed often and loved
much: who has gained the respect of intelligent men and the love of
children: who has filled his niche and accomplished his task: who leaves
the world a better place than he found it, whether by an improved poppy,
a perfect poem or a rescued soul; who never lacked appreciation of
earth's beauty or failed to express it; who looked for the best in
others and gave the best he had."
         -- Robert Louis Stevenson. 
--------------------------------------------------------------------------
William Stearns (wstearns@pobox.com).  Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at:   http://www.stearns.org
--------------------------------------------------------------------------
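
The attachment-checksum project proposed above could start from something like the following Python sketch. It is only an illustration under stated assumptions: the zone md5.some_rbl.net is the placeholder name from the post and not a real service, the helper names (image_md5s, rbl_query_name, is_listed, build_sample) are hypothetical, and the 127.0.0.2 answer is the convention suggested in the post. It uses only the standard library to decode each image part, compute its MD5, and build the DNS name to query.

```python
import hashlib
import socket
from email import message_from_bytes
from email.message import EmailMessage
from typing import List

RBL_ZONE = "md5.some_rbl.net"  # placeholder zone from the post, not a real service


def image_md5s(raw_message: bytes) -> List[str]:
    """Return the MD5 hex digest of every decoded image part in a message."""
    msg = message_from_bytes(raw_message)
    digests = []
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        payload = part.get_payload(decode=True)  # undo base64 transfer encoding
        if payload:
            digests.append(hashlib.md5(payload).hexdigest())
    return digests


def rbl_query_name(digest: str, zone: str = RBL_ZONE) -> str:
    """Build the DNS name to look up, e.g. <md5>.md5.some_rbl.net."""
    return f"{digest}.{zone}"


def is_listed(digest: str, zone: str = RBL_ZONE) -> bool:
    """Hypothetical lookup: listed if the name resolves to 127.0.0.2."""
    try:
        return socket.gethostbyname(rbl_query_name(digest, zone)) == "127.0.0.2"
    except socket.gaierror:
        return False  # no answer means the checksum is not known


def build_sample(image_bytes: bytes) -> bytes:
    """Build a small test message with one image attachment."""
    msg = EmailMessage()
    msg["Subject"] = "test"
    msg.set_content("gibberish body")
    msg.add_attachment(image_bytes, maintype="image", subtype="jpeg",
                       filename="a.jpg")
    return msg.as_bytes()


if __name__ == "__main__":
    original = b"\xff\xd8\xff\xe0 fake jpeg data"
    for digest in image_md5s(build_sample(original)):
        print(rbl_query_name(digest))
    # Appending a single byte yields a different digest, so an exact-match
    # list misses an image that was altered between spam runs.
    altered = original + b"\x00"
    print(image_md5s(build_sample(original)) != image_md5s(build_sample(altered)))
```

The last check shows the downside already acknowledged above: any change to the image file, however small, produces a checksum that is not on the list.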

Re: All image spam

Posted by Loren Wilton <lw...@earthlink.net>.

> Any suggestions?

The SARE stock rules.  They won't catch all of 'em, but they will catch a lot.

        Loren

Re: All image spam

Posted by le...@srs.gov.

We jacked up the score on HTML_IMAGE_ONLY_12 to 5, and are catching 
about 90% of these now with almost no false positives.





"Jack Gostl" <go...@argoscomp.com> 
03/07/2006 07:26 AM

To
<us...@spamassassin.apache.org>
cc

Subject
All image spam






I've seen some references to this in threads, but I didn't see an answer.
 
Starting in late November, we started getting hit with spam that was 
almost entirely a jpeg. They seem to be mostly "stock recommendations". 
There is minimal message, usually HTML, and the real spam content is in 
the image. Despite all the training that I do, this seems to slip through 
the Bayes algorithms with no more than a 50%, and the rest of the tests 
don't drive the score up high enough to help.
 
I am currently running SpamAssassin 3.0.3. I tried running these messages 
through SpamAssassin 3.1 and it doesn't seem to help.
 
Any suggestions?
 
Thanks - Jack

RE: All image spam

Posted by Craig Baird <cr...@xpressweb.com>.

Quoting Martin Hepworth <ma...@solid-state-logic.com>:

> Jack
> 
> If you turn on the URI-RBLs in 3.1 (see v310.pre) you should see a
> reduction
> in this type of spam.

I don't think I've ever seen a URI in one of these...  They purposely leave 
out anything in the actual message body that could be used to block their 
mail.  All that is present in the message body is gibberish that typically 
doesn't even trigger a significant Bayes score.  All the spam content, 
including any URIs, is contained in the image.

Craig

RE: All image spam

Posted by Martin Hepworth <ma...@solid-state-logic.com>.

Jack

If you turn on the URI-RBLs in 3.1 (see v310.pre) you should see a reduction
in this type of spam.

--
Martin Hepworth 
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300

> -----Original Message-----
> From: Jack Gostl [mailto:gostl@argoscomp.com]
> Sent: 07 March 2006 11:55
> To: users@spamassassin.apache.org
> Subject: All image spam
> 
> I've seen some references to this in threads, but I didn't see an answer.
> 
> Starting in late November, we started getting hit with spam that was
> almost entirely a jpeg. They seem to be mostly "stock recommendations".
> There is minimal message, usually HTML, and the real spam content is in
> the image. Despite all the training that I do, this seems to slip through
> the Bayes algorithms with no more than a 50%, and the rest of the tests
> don't drive the score up high enough to help.
> 
> I am currently running SpamAssassin 3.0.3. I tried running these messages
> through SpamAssassin 3.1 and it doesn't seem to help.
> 
> Any suggestions?
> 
> Thanks - Jack
> 

