Posted to users@spamassassin.apache.org by "Johnson, S" <sj...@edina.k12.mn.us> on 2005/04/28 20:35:00 UTC

OCR and SA

Has anyone attempted to write an OCR filter (optical character
recognition) for jpg or gif files that contain spam words?


Re: OCR and SA

Posted by Matt Kettler <mk...@evi-inc.com>.
Johnson, S wrote:

> Has anyone attempted to write an OCR filter (optical character
> recognition) for jpg or gif files that contain spam words?
>

Not that I'm aware of, but it's been mentioned MANY times.

Really I think it largely boils down to being more CPU load than it's
worth.

Even the best OCR engines are unreliable, and easily evaded if the sender
is trying to confuse them. All adding OCR would do is cause spammers with
image-based spams to start using strange fonts which are hard to OCR but
easy to read.

Besides, image-based spams aren't really much of a problem, at least not
here.

Most web-linked-image based spams are picked up quickly by SURBL and
Razor's e8 hash.

Most embedded-image based spams are quickly picked up by Razor's e4
hash, DCC, and/or Pyzor. Many also contain web links to the site they
advertise and get hit by SURBL, etc., too.

*shrug*
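
For readers unfamiliar with the SURBL check mentioned above, here is a
rough Python sketch of the idea: pull the hostnames out of the links in a
message body and look each one up under a URI blocklist zone, where a
listed domain resolves and an unlisted one does not. This is not how
SpamAssassin itself does it. The function names are made up for
illustration, and reducing a host to its last two labels is a
simplifying assumption (a real implementation needs a list of two-level
TLDs such as co.uk).

```python
import re
import socket
from urllib.parse import urlparse

# Matches http(s) URLs up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_hosts(text):
    """Return the unique lowercase hostnames of http(s) URLs in text."""
    hosts = []
    for url in URL_RE.findall(text):
        host = urlparse(url).hostname  # already lowercased by urlparse
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def surbl_query_name(host, zone="multi.surbl.org"):
    """Build the DNS name to query for a host under a URI blocklist zone.

    Assumption: the registered domain is simply the last two labels.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:] + [zone])


def is_listed(host, zone="multi.surbl.org"):
    """Return True if the query name resolves (needs network access)."""
    try:
        socket.gethostbyname(surbl_query_name(host, zone))
        return True
    except OSError:
        # Lookup failed, which covers NXDOMAIN (domain not listed).
        return False


if __name__ == "__main__":
    body = 'Buy now <img src="http://www.example.com/a.gif">'
    for h in extract_hosts(body):
        print(h, "->", surbl_query_name(h))
```

Only is_listed() touches the network; the other two functions are pure
string handling, so the lookup can be swapped for a local resolver or a
different zone without changing the rest.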