Posted to users@spamassassin.apache.org by Marc Perkel <ma...@perkel.com> on 2006/11/18 04:17:31 UTC

Fuzzy OCR - first time user

OK - trying out the FuzzyOCR plugin. So far it's all the default stuff 
with a minimal installation. I'm running Fedora Core 6. I used the gocr RPM 
and didn't patch the source. Everything is default and it doesn't seem 
to be complaining so .....

If I like this, what do I need to change to really do it right? Should I 
grab the devel code? Do I really need the gocr patch? Should I tweak the 
scores? What do the hardcore users change?


Re: Fuzzy OCR - first time user

Posted by decoder <de...@own-hero.net>.
Marc Perkel wrote:
> OK - trying out the FuzzyOCR plugin. So far it's all the default stuff 
> with a minimal installation. I'm running Fedora Core 6. I used the gocr 
> RPM and didn't patch the source. Everything is default and it doesn't 
> seem to be complaining so .....
>
> If I like this, what do I need to change to really do it right? Should 
> I grab the devel code? Do I really need the gocr patch? Should I tweak 
> the scores? What do the hardcore users change?
>
My suggestion is to use FuzzyOcr version 3.4.x, since it is a lot better. I 
also recommend enabling image hashing, which is disabled by default.

About the patch for gocr: I strongly suggest building gocr from source, 
because I don't know whether the Fedora Core 6 package is compiled with 
the proper netpbm bindings. The Red Hat build is not, and that leads to a 
dramatic decrease in effectiveness. Also, the patch prevents segmentation 
faults with some pictures, and afaik this bug still hasn't been fixed.

The scores normally do not need to be changed, unless you get serious 
problems with false positives (FPs).

And what do the hardcore users change? lol... well, experienced users have 
different scansets. For example, they invoke "ocrad" instead of gocr in 
their scansets because it runs faster and recognizes text better in most 
situations. The shipped config file contains an example of a scanset that 
uses ocrad (if you want to try it out, make sure to read the "Notes about 
the config file" page on the FuzzyOcr download page first, as the ocrad 
scanset contains a small typo which should be fixed :))
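
Before switching scansets, it may help to confirm which OCR helpers are 
actually installed. Here is a minimal Python sketch, assuming the binaries 
are named "gocr" and "ocrad" and live somewhere on the PATH. The function 
name find_ocr_tools is made up for illustration and is not part of FuzzyOcr:

```python
import shutil


def find_ocr_tools(names=("gocr", "ocrad")):
    """Map each helper name to its full path, or None if it is not on PATH.

    The default names are an assumption; adjust them to match your install.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_ocr_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

This only tells you whether a binary is present. It does not tell you 
whether gocr was built with netpbm support or with the patch applied.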

Finally, if you run into problems, try our mailing list at 
http://lists.own-hero.net/mailman/listinfo/devel-spam


Best regards,


Chris