Posted to users@spamassassin.apache.org by Halid Faith <ma...@ihlas.net.tr> on 2006/12/16 11:59:22 UTC

Why doesn't my FuzzyOcr see some mails which have spam text in a JPEG file?

I use SpamAssassin 3.1.7 and FuzzyOcr 3.4.2.

FuzzyOcr usually works well, yet it fails to detect some mails that contain a JPEG image. As a result, it does not assign them the FUZZY_OCR score.

Here's my FuzzyOcr.cf:

body FUZZY_OCR eval:fuzzyocr_check()
describe FUZZY_OCR Mail contains an image with common spam text inside
body FUZZY_OCR_WRONG_CTYPE eval:dummy_check()
describe FUZZY_OCR_WRONG_CTYPE Mail contains an image with wrong content-type set
body FUZZY_OCR_CORRUPT_IMG eval:dummy_check()
describe FUZZY_OCR_CORRUPT_IMG Mail contains a corrupted image
body FUZZY_OCR_KNOWN_HASH eval:dummy_check()
describe FUZZY_OCR_KNOWN_HASH Mail contains an image with known hash

priority FUZZY_OCR             900

########### Plugin Configuration #############

#### Logging options #####
# Verbosity level (see manual) Attention: Don't set to 0, but to 0.0 for quiet operation, or comment out the focr_logfile line. (Def
focr_verbose 2.0
#
# Logfile (make sure it is writable by the plugin) (Default value: NONE)
focr_logfile /usr/local/etc/mail/spamassassin/FuzzyOcr.log
##########################

##### Wordlists #####
# Here we define the words to scan for (Default value: /etc/mail/spamassassin/FuzzyOcr.words)
focr_global_wordlist /usr/local/etc/mail/spamassassin/FuzzyOcr.words
#
# This is the path RELATIVE to the respective home directory for the personalized list
# This list is merged with the global word list on execution (Default value: .spamassassin/fuzzyocr.words)
# If focr_personal_wordlist begins with '/', treats option as fixed path and does not search HOME
#focr_personal_wordlist .spamassassin/fuzzyocr.words
#####################

# These parameters can be used to change other detection settings
# If you leave these commented out, the defaults will be used.
# Do not use " " around any parameters!
#
##### Location of helper applications (path + binary) (Default values: /usr/bin/<app>) #####
focr_bin_gifsicle /usr/local/bin/gifsicle
focr_bin_giffix /usr/local/bin/giffix
focr_bin_giftext /usr/local/bin/giftext
focr_bin_gifinter /usr/local/bin/gifinter
focr_bin_giftopnm /usr/local/bin/giftopnm
focr_bin_jpegtopnm /usr/local/bin/jpegtopnm
focr_bin_pngtopnm /usr/local/bin/pngtopnm
focr_bin_bmptopnm /usr/local/bin/bmptopnm
focr_bin_tifftopnm /usr/local/bin/tifftopnm
focr_bin_ppmhist /usr/local/bin/ppmhist
focr_bin_gocr /usr/local/bin/gocr
focr_bin_ocrad /usr/local/bin/ocrad
#
focr_path_bin /usr/local/netpbm/bin:/usr/local/bin:/usr/bin
#
############################################################################################

##### Scansets, comma separated (Default value: $gocr -i -, $gocr -l 180 -d 2 -i -) #####
# Each scanset consists of one or more commands which make text out of pnm input.
# Each scanset is run separately on the PNM data, results are combined in scoring.
#focr_scansets $gocr -i $pfile, $gocr -l 180 -d 2 -i $pfile
#
# An example that involves ocrad as well
focr_scansets $gocr -i $pfile, $gocr -l 180 -d 2 -i $pfile, $ocrad -s 0.5 -T 0.5 $pfile
#
# Another one for ocrad only
#focr_scansets $ocrad -s 0.5 -T 0.5 $pfile
#
# To use only one scan with default values, uncomment the next line instead
#focr_scansets $gocr -i $pfile
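
The focr_bin_* entries in the configuration above point at helper programs under /usr/local/bin. A helper that is missing or not executable is one thing worth ruling out when a particular image type goes undetected, although nothing in this thread establishes that as the cause here. A minimal Python sketch of such a check, assuming the configuration text has been read from wherever FuzzyOcr.cf is installed; the function name find_missing_helpers is hypothetical and not part of FuzzyOcr:

```python
import os


def find_missing_helpers(config_text):
    """Return (option, path) pairs for active focr_bin_* options whose
    path is not an existing, executable file.

    Hypothetical helper for illustration; not part of FuzzyOcr.
    """
    missing = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and commented-out options.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].startswith("focr_bin_"):
            option, path = parts[0], parts[1].strip()
            if not (os.path.isfile(path) and os.access(path, os.X_OK)):
                missing.append((option, path))
    return missing
```

Passing the contents of the local FuzzyOcr.cf to find_missing_helpers returns an empty list when every configured helper is in place; any pair it does return names an option whose path should be checked by hand.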


  

Re: Why doesn't my FuzzyOcr see some mails which have spam text in a JPEG file?

Posted by Halid Faith <ma...@ihlas.net.tr>.
Yes, my FuzzyOcr recognizes the spam text in the JPEG sample file.
Debugging is already enabled.
Still, it can't see the spam text in the JPEG/GIF images of some mails, such as the one below:
http://212.64.213.55:5000
Can your FuzzyOcr see the spam text in the GIF file above?

Also, thank you. I changed my scanset line as you suggested:
focr_scansets $gocr -i $pfile, $gocr -l 180 -d 2 -i $pfile, $ocrad -s 5 -T 0.4 $pfile

I then restarted sa-spamd.





> Halid Faith wrote:
> > I use SpamAssassin 3.1.7 and FuzzyOcr 3.4.2.
> >
> > FuzzyOcr usually works well, yet it fails to detect some mails that
> > contain a JPEG image. As a result, it does not assign them the
> > FUZZY_OCR score.
> Does the JPEG sample file provided within the tarball work? If that's
> the case, isolate a mail that didn't work with FuzzyOcr, and run it
> from the command line with debugging enabled. This isn't necessarily a
> bug; there are some small spam JPEGs that aren't recognized well with
> the standard word list (namely two that I know of that need custom words).
>
>
> Best regards,
>
>
> Chris
> >
> > Here's my FuzzyOcr.cf:
> Your scanset line contains a small error:
>
> $ocrad -s 0.5 -T 0.5 $pfile should be $ocrad -s 5 -T 0.4 $pfile
>
> That will provide better results, just as a tweak.
>
> Best regards,
>
> Chris


Re: Why doesn't my FuzzyOcr see some mails which have spam text in a JPEG file?

Posted by decoder <de...@own-hero.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Halid Faith wrote:
> I use SpamAssassin 3.1.7 and FuzzyOcr 3.4.2.
>
> FuzzyOcr usually works well, yet it fails to detect some mails that
> contain a JPEG image. As a result, it does not assign them the
> FUZZY_OCR score.
Does the JPEG sample file provided within the tarball work? If that's
the case, isolate a mail that didn't work with FuzzyOcr, and run it
from the command line with debugging enabled. This isn't necessarily a
bug; there are some small spam JPEGs that aren't recognized well with
the standard word list (namely two that I know of that need custom words).
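
The suggestion above, running a single failing mail from the command line with debugging enabled, can be sketched in Python as follows. The sketch assumes the spamassassin command-line client is on PATH and uses its -D (debug) and -t (test mode) options, feeding the saved mail on standard input. The function names and the "ocr" filter string are illustrative only. Whether FuzzyOcr's own messages appear in this debug stream or only in the file named by focr_logfile is an assumption to verify on the local installation.

```python
import subprocess


def filter_debug_lines(debug_output, needle="ocr"):
    """Keep only the lines that mention `needle`, ignoring case."""
    needle = needle.lower()
    return [line for line in debug_output.splitlines()
            if needle in line.lower()]


def debug_scan(message_path, needle="ocr"):
    """Run one saved mail through `spamassassin -D -t` and return the
    debug lines (written to stderr) that mention `needle`.

    Assumes the `spamassassin` client is installed and on PATH.
    """
    with open(message_path, "rb") as message:
        proc = subprocess.run(
            ["spamassassin", "-D", "-t"],
            stdin=message,
            stdout=subprocess.DEVNULL,   # the rewritten mail is not needed here
            stderr=subprocess.PIPE,      # debug output goes to stderr
        )
    text = proc.stderr.decode("utf-8", errors="replace")
    return filter_debug_lines(text, needle)
```

Calling debug_scan on a mail that FuzzyOcr missed and on one that it caught, then comparing the two lists, is one way to see at which step the two runs diverge.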


Best regards,


Chris
>
> Here's my FuzzyOcr.cf:
Your scanset line contains a small error:

$ocrad -s 0.5 -T 0.5 $pfile should be $ocrad -s 5 -T 0.4 $pfile

That will provide better results, just as a tweak.

Best regards,

Chris


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFg+L/JQIKXnJyDxURAqsMAJkBuj2GAZiYOwuktV/rI9yqUN30YACfV5n9
V7Gr+wPYEGkIb0u8EPCg6MA=
=Y/t1
-----END PGP SIGNATURE-----