Posted to users@spamassassin.apache.org by Bowie Bailey <Bo...@BUC.com> on 2008/09/25 17:56:34 UTC

FuzzyOCR

I've had FuzzyOCR running for quite a while.  Today I found a false
positive for it that is a bit strange.

The message has seven images.  FuzzyOCR claims to have found the word
"service" in five of them (and counted it 10 times for a score of 6.5).
However, I can only see the word in one of the images and only three of
the seven images have any text at all.  Is there a problem here?

Is FuzzyOCR still useful?  It doesn't seem to hit a lot for me.

	%OFMAIL: 1.18
	%OFSPAM: 3.41
	%OFHAM:  0.26

--
Bowie

Re: FuzzyOCR

Posted by DaveAtJLA <da...@jla.com>.
Sorry this reply is a bit late, but the problem is a bug in FuzzyOCR. When a
message has multiple images, the text file is opened without being truncated,
so text from earlier images is left in it instead of being replaced. The bug
is in the routine open_on_specific_fd in Misc.pm:

$fname =~ s/> *// and $flags |= O_CREAT|O_WRONLY;

should be

$fname =~ s/> *// and $flags |= O_CREAT|O_WRONLY|O_TRUNC;

(and you have to add O_TRUNC to the import list at the top of the module
too).
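
To illustrate why the missing O_TRUNC matters, here is a minimal sketch in
Python rather than Perl (the os module exposes the same POSIX open flags).
It is not FuzzyOCR code: the file name, the helper function and the sample
strings are all hypothetical. It writes three "OCR results" to the same
file in turn, once with each flag set, and records what the file contains
after each write.

```python
import os
import tempfile


def file_after_each_write(flags, outputs):
    """Write each string to the same path using the given open flags and
    return the file's full contents after each write."""
    seen = []
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical file name, standing in for the OCR text file.
        path = os.path.join(tmpdir, "ocr_output.txt")
        for text in outputs:
            fd = os.open(path, flags, 0o600)
            try:
                os.write(fd, text.encode("ascii"))
            finally:
                os.close(fd)
            # newline="" keeps the bytes exactly as written.
            with open(path, "r", encoding="ascii", newline="") as fh:
                seen.append(fh.read())
    return seen


# Hypothetical per-image results: text, no text at all, shorter text.
outputs = ["service agreement\n", "", "logo\n"]

# Flags as in the original line: the file is not emptied on open.
buggy = file_after_each_write(os.O_CREAT | os.O_WRONLY, outputs)

# Flags as in the corrected line: the file is emptied on every open.
fixed = file_after_each_write(os.O_CREAT | os.O_WRONLY | os.O_TRUNC, outputs)

print(buggy)  # ['service agreement\n', 'service agreement\n', 'logo\nce agreement\n']
print(fixed)  # ['service agreement\n', '', 'logo\n']
```

In this sketch, an image with no text leaves the previous image's text
sitting in the file when O_TRUNC is absent. That would be consistent with
a word being reported for images that do not contain it.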

I logged this as ticket 555 on the FuzzyOCR website.

Having fixed that, I'm not sure that FuzzyOCR is helping much. Also I've
lowered the FUZZY_OCR_WRONG_EXTENSION score as it was occasionally firing
multiple times on non-spam.

Dave



