Posted to users@spamassassin.apache.org by Marc Perkel <ma...@perkel.com> on 2006/11/18 04:17:31 UTC
Fuzzy OCR - first time user
OK - trying out the FuzzyOCR plugin. So far it's all the default stuff
with minimal installation. I'm running Fedora Core 6. Used the gocr RPM
and didn't patch the source. Everything is default and it doesn't seem
to be complaining so .....
If I like this, what do I need to change to really do it right? Should I
grab the devel code? Do I really need the gocr patch? Should I tweak the
scores? What do the hard core users change?
Re: Fuzzy OCR - first time user
Posted by decoder <de...@own-hero.net>.
Marc Perkel wrote:
> OK - trying out the FuzzyOCR plugin. So far it's all the default stuff
> with minimal installation. I'm running Fedora Core 6. Used the gocr
> RPM and didn't patch the source. Everything is default and it doesn't
> seem to be complaining so .....
>
> If I like this, what do I need to change to really do it right? Should
> I grab the devel code? Do I really need the gocr patch? Should I tweak
> the scores? What do the hard core users change?
>
My suggestion is to use FuzzyOcr version 3.4.x, since it is a lot better. I
also recommend enabling image hashing, which is disabled by default.
About the patch for gocr: I strongly suggest building gocr from source,
because I don't know if the Fedora Core 6 build of gocr is compiled with
the proper bindings to netpbm. The Redhat build is not, and that leads to
a dramatic decrease in effectiveness. Also, the patch prevents
segmentation faults with some pictures, and afaik, this bug still hasn't
been fixed.
The scores normally do not need changing, unless you get serious problems
with FPs.
And what do the hardcore users change? lol... well, experienced users have
different scansets. For example, they invoke "ocrad" instead of gocr in
their scansets because it runs faster and recognizes better in most
situations. The shipped config file has an example of a scanset that
includes ocrad. (If you want to try it out, make sure to read the "Notes
about the config file" page on the FuzzyOcr download page first, as the
ocrad scanset contains a small typo which should be fixed. :))
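If you want to see how gocr and ocrad behave on your own spam images before
touching the scansets, a rough comparison can be scripted. The following is
only a sketch and not part of FuzzyOcr: it assumes both tools are on your
PATH, that the image has already been converted to PNM (for example with the
netpbm tools), and that "gocr -i FILE" and "ocrad FILE" print the recognized
text to stdout. The function names are made up for illustration.

```python
import shutil
import subprocess
import time


def build_commands(image_path):
    """Return the assumed command line for each OCR tool (hypothetical helper)."""
    return {
        "gocr": ["gocr", "-i", image_path],
        "ocrad": ["ocrad", image_path],
    }


def run_ocr(command, timeout=30):
    """Run one OCR command.

    Returns (elapsed_seconds, recognized_text), or None if the tool
    is not installed.
    """
    if shutil.which(command[0]) is None:
        return None  # tool not found on PATH
    start = time.perf_counter()
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        errors="replace",  # OCR output may contain odd bytes
        timeout=timeout,
    )
    return time.perf_counter() - start, result.stdout


def compare(image_path):
    """Run every tool from build_commands() on the same image."""
    return {
        name: run_ocr(command)
        for name, command in build_commands(image_path).items()
    }
```

Calling compare("sample.pnm") returns a dictionary keyed by tool name, with
the elapsed time and recognized text for each installed tool, so you can judge
speed and recognition quality on your own samples before changing anything.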
Finally, if you run into problems, try our mailing list at
http://lists.own-hero.net/mailman/listinfo/devel-spam
Best regards,
Chris