Posted to users@spamassassin.apache.org by William Stearns <ws...@pobox.com> on 2006/03/07 23:47:31 UTC

Image MD5sums available, was Re: All image spam

Good evening, Jack, all,

On Tue, 7 Mar 2006, Jack Gostl wrote:

> I've seen some references to this in threads, but I didn't see an 
> answer.
>
> Starting in late November, we started getting hit with spam that was 
> almost entirely a jpeg. They seem to be mostly "stock recommendations". 
> There is minimal message, usually HTML, and the real spam content is in 
> the image. Despite all the training that I do, this seems to slip 
> through the Bayes algorithms with no more than 50%, and the rest of 
> the tests don't drive the score up high enough to help.
>
> I am currently running SpamAssassin 3.0.3. I tried running these 
> messages through SpamAssassin 3.1 and it doesn't seem to help.
>
> Any suggestions?

 	We talked about identifying images last summer.  There are a few 
answers, some of which have been discussed in this thread already.
 	Razor, Pyzor, and DCC are designed to score up messages with 
already-seen MIME parts (read: if 3 other people think that image is spam, 
your spam filter can score it up).  As with identifying text parts where 
the spammer inserts random words to throw those services off, images can 
be subtly modified so the visible area is essentially identical but the 
actual image file is different with every spam run.
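
As a small illustration of that last point, here is a sketch using made-up stand-in bytes rather than a real spam image (the byte string and the altered offset are assumptions chosen only for the demo). Flipping a single bit gives a completely different MD5, so an exact-checksum lookup would miss the altered copy:

```python
import hashlib

# Stand-in bytes for an image file; a real JPEG would be read from disk.
original = b"\xff\xd8\xff\xe0" + b"stock tip image data" + b"\xff\xd9"

# Flip one bit in one byte, as a spammer might do to a pixel nobody notices.
altered = bytearray(original)
altered[10] ^= 0x01
altered = bytes(altered)

md5_original = hashlib.md5(original).hexdigest()
md5_altered = hashlib.md5(altered).hexdigest()

print(md5_original)
print(md5_altered)
print("same checksum:", md5_original == md5_altered)
```
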
 	I offered to put together a catalog of checksums of images used in 
spam, and have done so.  The MD5 and SHA-1 sums of 44,522 spam images can 
be found at http://www.stearns.org/spamattach/ , broken out by category 
and in combined files.  If anyone wants to take on the interesting project 
of computing the MD5 checksums of attachments, I'd be willing to set those 
lists up as a DNS-queryable RBL (along the lines of
01f5ff6ab05499c94a967409204e6a29.md5.some_rbl.net, which would return 
127.0.0.2 if known, nothing if not).
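
A minimal sketch of what the client side of such a lookup might look like, assuming the query format above. The zone md5.some_rbl.net is only the placeholder from this message, not a live service, and the function names (attachment_md5s, rbl_query_name, is_listed) are hypothetical. It uses only the Python standard library:

```python
import hashlib
import socket
from email import message_from_bytes

# Placeholder zone taken from the example above; no such list exists yet.
DEFAULT_ZONE = "md5.some_rbl.net"


def attachment_md5s(raw_message):
    """Return the MD5 hex digest of every decoded image part in a message."""
    msg = message_from_bytes(raw_message)
    digests = []
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            # decode=True undoes the base64/quoted-printable transfer encoding
            payload = part.get_payload(decode=True)
            if payload:
                digests.append(hashlib.md5(payload).hexdigest())
    return digests


def rbl_query_name(digest, zone=DEFAULT_ZONE):
    """Build the DNS name to look up for one checksum."""
    return f"{digest}.{zone}"


def is_listed(digest, zone=DEFAULT_ZONE, resolve=socket.gethostbyname):
    """True if the lookup answers 127.0.0.2; a failed lookup means not listed."""
    try:
        return resolve(rbl_query_name(digest, zone)) == "127.0.0.2"
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError
        return False
```

The resolve parameter is there so the lookup can be swapped for a stub when testing without a network; a real plugin would more likely use SpamAssassin's own asynchronous DNS machinery.
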
 	I already understand the downsides to this approach (it duplicates 
the work of Razor, Pyzor, and DCC, and images can be altered), but figure 
the checksum work has already been done and will continue to be done anyway.
 	Anyone up for it?
 	Cheers,
 	- Bill

---------------------------------------------------------------------------
         "That man is a success who lived well, laughed often and loved
much: who has gained the respect of intelligent men and the love of
children: who has filled his niche and accomplished his task: who leaves
the world a better place than he found it, whether by an improved poppy,
a perfect poem or a rescued soul; who never lacked appreciation of
earth's beauty or failed to express it; who looked for the best in
others and gave the best he had."
         -- Robert Louis Stevenson. 
--------------------------------------------------------------------------
William Stearns (wstearns@pobox.com).  Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at:   http://www.stearns.org
--------------------------------------------------------------------------

Re: Image MD5sums available, was Re: All image spam

Posted by Dirk Bonengel <di...@bonengel.de>.
Hi, all,

I wonder if the iXhash plugin I wrote last summer would catch these.
FYI, the plugin computes some form(s) of fuzzy MD5 checksums of the complete 
mail body (not separate MIME parts) and compares the results with 
those I provide via DNS.
It's available at http://wiki.apache.org/spamassassin/iXhash.
If not, enhancing it to also compute checksums of attachments would be 
nice to have. If only I had the time...
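
To illustrate what a "fuzzy" checksum means in general, here is a sketch. It is NOT the iXhash algorithm; the normalization steps (lowercasing, dropping digits, collapsing whitespace) and the name fuzzy_md5 are assumptions picked only for the example. The idea is to normalize away the parts spammers vary before hashing, so near-identical bodies share one checksum:

```python
import hashlib
import re


def fuzzy_md5(body):
    """Hash a normalized copy of the text (illustrative, not iXhash's rules)."""
    text = body.lower()
    text = re.sub(r"\d+", "", text)            # drop digits (random IDs, counters)
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace and line breaks
    return hashlib.md5(text.encode("utf-8")).hexdigest()


a = "Buy  STOCK now!\nTracking id 48213"
b = "buy stock   now! tracking ID 99"

print(fuzzy_md5(a) == fuzzy_md5(b))
```
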

Dirk

