Posted to users@spamassassin.apache.org by Bowie Bailey <Bo...@BUC.com> on 2008/09/25 17:56:34 UTC
FuzzyOCR
I've had FuzzyOCR running for quite a while. Today I found a false
positive for it that is a bit strange.
The message has seven images. FuzzyOCR claims to have found the word
"service" in five of them (and counted it 10 times for a score of 6.5).
However, I can only see the word in one of the images and only three of
the seven images have any text at all. Is there a problem here?
Is FuzzyOCR still useful? It doesn't seem to hit a lot for me.
%OFMAIL: 1.18
%OFSPAM: 3.41
%OFHAM: 0.26
--
Bowie
Re: FuzzyOCR
Posted by DaveAtJLA <da...@jla.com>.
Sorry this reply is a bit late, but the problem is a bug in FuzzyOCR. When a
message has multiple images, it ends up appending to the text file instead
of replacing it. The bug is in the open_on_specific_fd routine in Misc.pm:

    $fname =~ s/> *// and $flags |= O_CREAT|O_WRONLY;

should be

    $fname =~ s/> *// and $flags |= O_CREAT|O_WRONLY|O_TRUNC;

(and you have to add O_TRUNC to the import list at the top of the module
too).
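
For anyone wanting to see the effect of the missing flag in isolation, here is
a minimal standalone sketch. It is not FuzzyOCR code: the write_with_flags
helper, the ocr.txt file name, and the sample strings are made up for
illustration, and it assumes the OCR output for each image is written to the
same file. Opening an existing file with O_CREAT|O_WRONLY does not empty it,
so when a later write is shorter than an earlier one, the tail of the earlier
text is still there when the file is read back.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_WRONLY O_TRUNC);
use File::Temp qw(tempdir);

# Hypothetical helper: write $text to $path with the given sysopen flags,
# then return the file's full contents as a later reader would see them.
sub write_with_flags {
    my ($path, $flags, $text) = @_;
    sysopen(my $out, $path, $flags, 0600) or die "sysopen $path: $!";
    defined(syswrite($out, $text)) or die "syswrite $path: $!";
    close($out) or die "close $path: $!";

    open(my $in, '<', $path) or die "open $path: $!";
    local $/;    # slurp the whole file in one read
    my $content = <$in>;
    close($in);
    return $content;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/ocr.txt";    # stand-in for a reused OCR output file

# Original flags: the first "image" leaves text behind, and the second,
# shorter write only overwrites the start of it.
write_with_flags($path, O_CREAT|O_WRONLY, "cheap service here\n");
my $stale = write_with_flags($path, O_CREAT|O_WRONLY, "hi\n");

# Patched flags: O_TRUNC empties the file before the new text is written.
my $clean = write_with_flags($path, O_CREAT|O_WRONLY|O_TRUNC, "hi\n");

print "without O_TRUNC still matches 'service': ",
    ($stale =~ /service/ ? "yes" : "no"), "\n";
print "with O_TRUNC still matches 'service': ",
    ($clean =~ /service/ ? "yes" : "no"), "\n";
```

If the same thing happens inside FuzzyOCR, that would account for a word being
reported in images that contain no text at all.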
I logged this as ticket 555 on the FuzzyOCR website.
Having fixed that, I'm not sure that FuzzyOCR is helping much. Also I've
lowered the FUZZY_OCR_WRONG_EXTENSION score as it was occasionally firing
multiple times on non-spam.
Dave
Bowie Bailey wrote:
>
> I've had FuzzyOCR running for quite a while. Today I found a false
> positive for it that is a bit strange.
>
> The message has seven images. FuzzyOCR claims to have found the word
> "service" in five of them (and counted it 10 times for a score of 6.5).
> However, I can only see the word in one of the images and only three of
> the seven images have any text at all. Is there a problem here?
>
> Is FuzzyOCR still useful? It doesn't seem to hit a lot for me.
>
> %OFMAIL: 1.18
> %OFSPAM: 3.41
> %OFHAM: 0.26
>
> --
> Bowie
>
>