Posted to users@spamassassin.apache.org by Yves Goergen <no...@unclassified.de> on 2006/07/26 16:12:03 UTC

How to identify image spam finally?

Hi there,
I'm running SpamAssassin on my mailbox and rejecting anything above a
score of 10. But lately the spam volume has been increasing again. The big
majority of mails I receive has a big image on the top, sometimes
combined from multiple image files, containing a lot of text I don't
want to read (stocks "info" and the like), followed by some lines of
random words that Bayes hardly identifies. IMO, the only way to reliably
identify that type of spam is to run OCR on the (combined) image and
include the recognized text in the text rules. Is it possible to do that?
I know that it would cause a higher server load, but that seems to be the
price of a clean mailbox. I don't know enough bad words in English to
express what I feel about that spam (maybe that's better), but I'm really
fed up with it!

-- 
Yves Goergen "LonelyPixel" <no...@unclassified.de>
http://beta.unclassified.de – My web laboratory.

Re: How to identify image spam finally?

Posted by Loren Wilton <lw...@earthlink.net>.
> majority of mails I receive has a big image on the top, sometimes
> combined from multiple image files, containing a lot of text I don't
> want to read (stocks "info" and the like), followed by some lines of

Try the rulesemporium stock rules.

        Loren


Re: How to identify image spam finally?

Posted by jdow <jd...@earthlink.net>.
Visit http://www.rulesemporium.com/ and read up on the various sets of
rules these fine people maintain. Many of them do very well with
image-only spam or image-over-nonsense-text spam, as well as stock spam.

For these types of spam it is also imperative that you have the standard
set of block lists enabled. Between the SARE rules (above) and the
DNS-based block list tests, none of that <censored> is getting through here.

Running OCR on the images is RUINOUSLY expensive in terms of time spent
on each message.
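
To illustrate why structural checks are so much cheaper than OCR, here is a
minimal sketch in Python. It is not how the SARE rules or SpamAssassin itself
work (those are regex-based rule files); it only shows the general idea of
flagging "one or more images plus very little plain text" without ever
decoding the picture. The function names and the 40-word threshold are
assumptions made up for this example.

```python
from email.message import Message
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def image_spam_features(msg: Message) -> dict:
    """Count image attachments and plain-text words in a parsed message."""
    image_parts = 0
    text_words = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_maintype() == "image":
            image_parts += 1
        elif part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            # latin-1 never fails to decode; good enough for word counting
            text_words += len(payload.decode("latin-1").split())
    return {"image_parts": image_parts, "text_words": text_words}


def looks_like_image_spam(msg: Message, max_words: int = 40) -> bool:
    """Hypothetical heuristic: at least one image and only a few words."""
    features = image_spam_features(msg)
    return features["image_parts"] >= 1 and features["text_words"] <= max_words


if __name__ == "__main__":
    # Build a toy message: a short line of text followed by an image part.
    sample = MIMEMultipart()
    sample.attach(MIMEText("random filler words here", "plain"))
    sample.attach(MIMEImage(b"GIF89a-not-a-real-image", _subtype="gif"))
    print(image_spam_features(sample), looks_like_image_spam(sample))
```

A check like this only walks the MIME structure, so its cost is tiny compared
with OCR. On its own it would also hit legitimate mail with attached photos,
which is why such a signal is better used as one small score among many than
as a verdict.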

{^_^}