Posted to users@spamassassin.apache.org by David Baron <d_...@012.net.il> on 2007/01/08 15:08:07 UTC

FuzzyOcr -- how do I know it is working?

Installed the Debian package. How do I know it is working? Are all those
"SPAMMY" rules from it?

Re: FuzzyOcr -- how do I know it is working?

Posted by David Baron <d_...@012.net.il>.
On Monday 08 January 2007 18:34, Gary V wrote:
> >Installed the Debian package. How do I know it is working? Are all those
> >"SPAMMY" rules from it?
>
> It looks like you are using amavisd-new. SPAMMY essentially means a message
> scored between tag2_level and kill_level and is not directly related to
> FuzzyOcr. If you get FuzzyOcr hits you will see FUZZY_OCR rules hit.
> Remember that messages that score over focr_autodisable_score (I think 10
> is the default) will not get scanned. Also remember that you will need to
> reload amavisd-new after each change to FuzzyOcr.cf if you want to see the
> changes.

I am not using amavisd.
I did see some STOCK_IMAGE type hits.
Many of the messages scored over 10, so they would not have been scanned.
>
> In FuzzyOcr.cf I suggest TEMPORARILY increasing focr_verbose:
> focr_verbose 2
>
> Personally I set it back to 0 when not debugging.
>
> and enabling the log (it must be writable by the user running SA):
> focr_logfile /var/lib/amavis/FuzzyOcr.log
OK, I will set a log file in a normal place.

I have both gocr and ocrad installed.

RE: FuzzyOcr -- how do I know it is working?

Posted by Gary V <mr...@hotmail.com>.
>Installed the Debian package. How do I know it is working? Are all those
>"SPAMMY" rules from it?

It looks like you are using amavisd-new. SPAMMY essentially means a message 
scored between tag2_level and kill_level and is not directly related to 
FuzzyOcr. If you get FuzzyOcr hits you will see FUZZY_OCR rules hit. 
Remember that messages that score over focr_autodisable_score (I think 10 is 
the default) will not get scanned. Also remember that you will need to 
reload amavisd-new after each change to FuzzyOcr.cf if you want to see the 
changes.
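One way to check this without going through amavisd-new is to feed a sample message straight to the spamassassin command-line tool and look for FuzzyOcr output. This is a sketch under assumptions: spamassassin is on the PATH, the sample contains an image with legible text, and it scores below focr_autodisable_score (otherwise FuzzyOcr skips it). The function name fuzzyocr_check is only an illustration.

```shell
# Hypothetical helper: run one sample message through SpamAssassin
# directly and show only the FuzzyOcr-related lines.
fuzzyocr_check() {
    msg="$1"
    # Check the configuration first; --lint reports errors in
    # FuzzyOcr.cf and the other .cf files and exits non-zero.
    spamassassin --lint || return 1
    # -t adds the test report to the output, -D prints debug info to
    # stderr; merge both and keep lines mentioning FuzzyOcr/FUZZY_OCR.
    spamassassin -t -D < "$msg" 2>&1 | grep -i 'fuzzy'
}
```

For example: fuzzyocr_check /path/to/stock-spam.eml (the path is a placeholder). If nothing at all is printed, the plugin is probably not being loaded.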

In FuzzyOcr.cf I suggest TEMPORARILY increasing focr_verbose:
focr_verbose 2

Personally I set it back to 0 when not debugging.

and enabling the log (it must be writable by the user running SA):
focr_logfile /var/lib/amavis/FuzzyOcr.log

Placing it in the amavis user's home directory like this should be enough, but
just in case:
touch /var/lib/amavis/FuzzyOcr.log
chown amavis:amavis /var/lib/amavis/FuzzyOcr.log

(or)
focr_logfile /tmp/FuzzyOcr.log
and:
touch /tmp/FuzzyOcr.log
chown amavis:amavis /tmp/FuzzyOcr.log
(but be careful not to fill up the /tmp directory)
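To confirm the log really is writable by the account that runs SpamAssassin, something like the following could be used. It assumes sudo is available and that amavis is that account (with spamd or procmail setups it will be some other user); log_writable_by is a hypothetical helper name.

```shell
# Hypothetical check: can USER write to the FuzzyOcr log at LOG?
# Assumes sudo is installed and may run commands as USER.
log_writable_by() {
    user="$1"
    log="$2"
    # test -w is false if the file is missing or not writable.
    if sudo -u "$user" test -w "$log"; then
        echo "ok: $log is writable by $user"
    else
        echo "problem: $user cannot write $log" >&2
        return 1
    fi
}
```

For example: log_writable_by amavis /var/lib/amavis/FuzzyOcr.log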

Then (for example) tail -f /var/lib/amavis/FuzzyOcr.log and send a message 
through with an image (preferably a stock scam or pharmacy spam image with 
legible text).

Note that on a Debian system you will get errors because Debian is using an 
older version of netpbm which does not contain all the utilities FuzzyOcr 
expects to find. I personally comment out the preprocessors and scansets 
that use the missing components. Whether I'm doing it correctly is another 
matter.
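Before commenting anything out, it may help to see which helpers are actually installed. A rough sketch: the default program list is assumed from the names in the config and log excerpts in this thread (gocr, ocrad, tesseract and the netpbm tools), and check_helpers is an illustrative name, not part of FuzzyOcr.

```shell
# Sketch: report which OCR engines / netpbm helpers are on the PATH.
check_helpers() {
    # Fall back to the programs named in this thread if none are given.
    [ "$#" -gt 0 ] || set -- gocr ocrad tesseract pnmnorm pnminvert ppmtopgm pamthreshold pamtopnm
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found: $prog"
        else
            echo "missing: $prog"
        fi
    done
}
```

Anything reported as missing is a candidate for commenting out in FuzzyOcr.cf and the scansets/preps files.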

Regarding the Debian package, on the devel-spam mailing list, I posted:

#################

I think it would be good if the package installed both ocrad and gocr
since ocrad does a good job.

Also, you should possibly comment out the preprocessors (and scansets
that use them) that are not included with the currently available
Debian netpbm package. This would prevent errors like:

Cannot find executable for ocrad
Cannot find executable for pamthreshold
Cannot find executable for pamtopnm
Cannot find executable for tesseract
Skipping ocrad, invalid command '$ocrad'
Skipping ocrad-invert, invalid command '$ocrad'
Skipping ocrad-decolorize-invert, invalid command '$ocrad'
Skipping ocrad-decolorize, invalid command '$ocrad'

and with ocrad installed:

Cannot find executable for pamthreshold
Cannot find executable for pamtopnm
Cannot find executable for tesseract
Error running preprocessor(pamthreshold): pamthreshold -simple -threshold 0.5
Errors in Scanset "ocrad-decolorize-invert"
Return code: 2048, Error: save_execute: failed to exec pamthreshold -simple -threshold 0.5: No such file or directory at /usr/share/perl5/FuzzyOcr/Misc.pm line 173.
Skipping scanset because of errors, trying next...
Error running preprocessor(pamthreshold): pamthreshold -simple -threshold 0.5
Errors in Scanset "ocrad-decolorize"
Return code: 2048, Error: save_execute: failed to exec pamthreshold -simple -threshold 0.5: No such file or directory at /usr/share/perl5/FuzzyOcr/Misc.pm line 173.
Skipping scanset because of errors, trying next...
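Given a log like this, a short pipeline can reduce it to the distinct missing programs. This sketch assumes each relevant line ends with the program name, as in the "Cannot find executable for" lines quoted here; missing_from_log is a made-up helper name.

```shell
# Sketch: print each executable FuzzyOcr reported as missing, once.
# $1 is the log file set with focr_logfile.
missing_from_log() {
    # Keep the "Cannot find executable for X" lines, take the last
    # field (the program name), and de-duplicate.
    grep 'Cannot find executable for' "$1" | awk '{ print $NF }' | sort -u
}
```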

Possibly (but I'm not certain):

--- FuzzyOcr.cf-original        2007-01-07 16:27:17.093798195 -0700
+++ FuzzyOcr.cf 2007-01-07 16:29:12.319402455 -0700
@@ -99,8 +99,9 @@

# Include additional scanner/preprocessor commands here:
#
-focr_bin_helper pnmnorm, pnminvert, pamthreshold, ppmtopgm, pamtopnm
-focr_bin_helper tesseract
+#focr_bin_helper pnmnorm, pnminvert, pamthreshold, ppmtopgm, pamtopnm
+#focr_bin_helper tesseract
+focr_bin_helper pnmnorm, pnminvert, ppmtopgm


--- FuzzyOcr.scansets-original  2007-01-07 16:27:53.607240168 -0700
+++ FuzzyOcr.scansets   2007-01-07 16:29:58.825582474 -0700
@@ -18,19 +18,19 @@
     args = -s5 -i $input
}

-# Inverted Ocrad scanset with decolorization
-scanset ocrad-decolorize-invert {
-    preprocessors = ppmtopgm, pamthreshold, pamtopnm
-    command = $ocrad
-    args = -s5 -i $input
-}
+## Inverted Ocrad scanset with decolorization
+#scanset ocrad-decolorize-invert {
+#    preprocessors = ppmtopgm, pamthreshold, pamtopnm
+#    command = $ocrad
+#    args = -s5 -i $input
+#}

-# Ocrad scanset with decolorization
-scanset ocrad-decolorize {
-    preprocessors = ppmtopgm, pamthreshold, pamtopnm
-    command = $ocrad
-    args = -s5 $input
-}
+## Ocrad scanset with decolorization
+#scanset ocrad-decolorize {
+#    preprocessors = ppmtopgm, pamthreshold, pamtopnm
+#    command = $ocrad
+#    args = -s5 $input
+#}


--- FuzzyOcr.preps-original     2007-01-07 16:27:39.158044309 -0700
+++ FuzzyOcr.preps      2007-01-07 16:30:51.907932931 -0700
@@ -16,16 +16,16 @@
     command = ppmtopgm
}

-# Converts PAM to PNM
-preprocessor pamtopnm {
-    command = pamtopnm
-}
+## Converts PAM to PNM
+#preprocessor pamtopnm {
+#    command = pamtopnm
+#}

-# Uses thresholding on the PAM file
-preprocessor pamthreshold {
-    command = pamthreshold
-    args = -simple -threshold 0.5
-}
+## Uses thresholding on the PAM file
+#preprocessor pamthreshold {
+#    command = pamthreshold
+#    args = -simple -threshold 0.5
+#}


###################
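The hand edits in the diff could also be scripted. The awk sketch below is an assumption about how one might do it, not something FuzzyOcr provides: it prefixes every line of one named scanset or preprocessor block with "#" up to its closing brace, and writes to stdout so the result can be reviewed first. Unlike the diff, it leaves the descriptive comment line above each block alone. comment_out_block is a hypothetical name.

```shell
# Hypothetical helper: comment out one "scanset NAME {" or
# "preprocessor NAME {" block in FILE, writing the result to stdout.
comment_out_block() {
    kind="$1"   # scanset or preprocessor
    name="$2"   # e.g. ocrad-decolorize or pamthreshold
    file="$3"
    awk -v kind="$kind" -v name="$name" '
        # Exact match on the first two fields, so "ocrad-decolorize"
        # does not also catch "ocrad-decolorize-invert".
        $1 == kind && $2 == name { inblock = 1 }
        inblock {
            print "#" $0
            # Stop after the closing brace of the block.
            if ($0 ~ /^[ \t]*[}]/) inblock = 0
            next
        }
        { print }
    ' "$file"
}
```

For example:
comment_out_block scanset ocrad-decolorize FuzzyOcr.scansets > FuzzyOcr.scansets.new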


Gary V
