Posted to users@spamassassin.apache.org by Greg Skouby <gs...@sitesnow.com> on 2006/12/28 17:51:41 UTC

FuzzyOCR and 'valid' embedded Images

Hi All,


We have some users of our mail system that are using Lotus Notes for
their MUA. In Lotus Notes they have the option of using, for lack of a
better word, some 'stationery' that effectively embeds three images into
the outgoing email.  If the original recipient replies to that original
message with an HTML message of their own then the three images get
embedded into the second email that is sent back to the original sender.
Things get confusing here so follow me. If the original sender then
replies to the reply from the original recipient then three MORE images
get embedded into the message for a total of six embedded images. You
see where I am going here? In a long enough conversation the embedded
images start to stack up. 
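
A minimal Python sketch of the stacking effect described above, assuming
each round of the conversation leaves three more inline images in the
message. The helper names (build_thread_message, count_cid_images), the
example.invalid Content-IDs and the placeholder image bytes are all
hypothetical; this only illustrates how the count grows and is not how
Lotus Notes or SpamAssassin work internally.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

# Placeholder payload standing in for a stationery image; not a valid GIF.
FAKE_GIF = b"GIF89a-placeholder"


def build_thread_message(rounds: int, images_per_round: int = 3) -> bytes:
    """Build an HTML mail carrying the inline images of `rounds` replies."""
    msg = EmailMessage()
    msg["Subject"] = "Re: stationery test"
    msg.set_content("plain text body")
    msg.add_alternative("<html><body>html body</body></html>", subtype="html")
    html_part = msg.get_payload()[1]
    for i in range(rounds * images_per_round):
        # Each call attaches one inline image to the HTML part.
        html_part.add_related(
            FAKE_GIF,
            maintype="image",
            subtype="gif",
            cid=f"<img{i}@example.invalid>",
        )
    return msg.as_bytes()


def count_cid_images(raw: bytes) -> int:
    """Count image parts that carry a Content-ID header."""
    msg = message_from_bytes(raw, policy=default)
    return sum(
        1
        for part in msg.walk()
        if part.get_content_maintype() == "image"
        and part["Content-ID"] is not None
    )


if __name__ == "__main__":
    for rounds in (1, 2, 5):
        print(rounds, count_cid_images(build_thread_message(rounds)))
```

Running count_cid_images over real messages from an affected thread
would show how many inline images a given reply has accumulated.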


If a conversation ensues of any length then we start to hit the
following tests which push the score WAY over the minimum:


SARE_GIF_ATTACH
TVD_FW_GRAPHIC_NAME_MID
MY_CID_AND_STYLE
MY_CID_AND_ARIAL2
PART_CID_STOCK
TVD_FW_GRAPHIC_ID1
PART_CID_STOCK_LESS

In some extreme cases the emails even start to hit RAZOR tests but I am
less concerned about that.

I know you could argue that Lotus Notes is not playing 'nicely' but I
can't really control that. I just want to solve the problem but if you
have any suggestions of how to make Lotus Notes behave better, apart
from just not sending HTML email, I would be happy to hear them. 

In order to start to solve the problem I installed FuzzyOCR; I figured
this was a good step to discern between 'hammy' and 'spammy' images. The
FuzzyOCR installation seems to have worked correctly. My question is
where do I go from here? My inclination is to decrease the scores for
the above referenced rules, besides the RAZOR tests. Does this sound
like the correct way to go?


I am running 3.1.7 with sa-update and some of the various SARE rulesets.
I have AWL and Bayes turned on also.

Thanks for your thoughts!




--Greg




Re: FuzzyOCR and 'valid' embedded Images

Posted by René Berber <r....@computer.org>.
Greg Skouby wrote:

> We have some users of our mail system that are using Lotus Notes for
> their MUA. In Lotus Notes they have the option of using, for lack of a
> better word, some 'stationery' that effectively embeds three images into
> the outgoing email.  If the original recipient replies to that original
> message with an HTML message of their own then the three images get
> embedded into the second email that is sent back to the original sender.
> Things get confusing here so follow me. If the original sender then
> replies to the reply from the original recipient then three MORE images
> get embedded into the message for a total of six embedded images. You
> see where I am going here? In a long enough conversation the embedded
> images start to stack up. 
[snip]
> In order to start to solve the problem I installed FuzzyOCR; I figured
> this was a good step to discern between 'hammy' and 'spammy' images. The
> FuzzyOCR installation seems to have worked correctly. My question is
> where do I go from here? My inclination is to decrease the scores for
> the above referenced rules, besides the RAZOR tests. Does this sound
> like the correct way to go?

No, FuzzyOcr does not score non-spam images, nor does it subtract points in any
case; it does detect non-spam images, but only to save the checksum in its
database (so that it does not have to scan the same image again).  You would
have to change the code to make it do what you want.

The best solution would be not to run SA on those messages at all, and that of
course has to be done somewhere else.  One example is some of SnertSoft's
milters for sendmail (and postfix?); the interesting functionality is that they
(supposedly) can white-list the remote recipient, so that the recipient's
replies don't have to go through the usual tests (I've only read about this in
the context of gray-listing, but a milter for spam checks could do the same).

MailScanner has white-listing functionality, but it is manual, not automatic.

Another possibility would be to extend AWL and/or other auto white-listing in a
similar fashion.  SA's AWL is probably decreasing the score in your case
already, but you don't have much control over it: you can only add or delete
entries manually, and otherwise you get the automatic score averaging.
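
To make the score averaging concrete, here is a small Python sketch of the
formula as I understand it from the AWL plugin documentation
(auto_whitelist_factor, default 0.5): finalscore = score + (mean - score) *
factor, where mean is the sender's long-term average.  The function name and
the sample numbers are hypothetical; this is not SA's actual code.

```python
def awl_adjusted_score(score: float, history_total: float,
                       history_count: int, factor: float = 0.5) -> float:
    """Pull `score` towards the sender's historical mean by `factor`."""
    if history_count == 0:
        # No history for this sender: assume nothing to average against.
        return score
    mean = history_total / history_count
    return score + (mean - score) * factor


if __name__ == "__main__":
    # Hypothetical sender: four earlier messages totalling 4.0 points,
    # then an image-heavy reply that scores 9.0 before the adjustment.
    print(awl_adjusted_score(9.0, history_total=4.0, history_count=4))
```

With these made-up numbers the mean is 1.0, so the 9.0 message ends up at
5.0: lower, but possibly still above the threshold, which is why AWL on its
own may not be enough here.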

> I am running 3.1.7 with sa-update and some of the various SARE rulesets.
> I have AWL and Bayes turned on also.
> 
> Thanks for your thoughts!

HTH
-- 
René Berber