Posted to users@spamassassin.apache.org by Quinn Comendant <qu...@strangecode.com> on 2007/01/16 03:17:14 UTC

FuzzyOCR 3.5.1 not using FUZZY_OCR rule when using hash

I just upgraded FuzzyOCR to 3.5.1 and everything works great, except that I'm not getting the FUZZY_OCR rule when hashing is enabled. I get different results depending on the value of focr_enable_image_hashing. In the examples below, turning hashing off gives a much higher score for the ocr-gif.eml sample. Why?

When running the command:

	# spamc -R < ./FuzzyOcr-3.5.1/samples/ocr-gif.eml 

When I set focr_enable_image_hashing 0:

[...]
Content analysis details:   (12.8 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.1 FORGED_RCVD_HELO       Received: contains a forged HELO
 0.4 HTML_30_40             BODY: Message is 30% to 40% HTML
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5334]
 0.9 MY_CID_AND_CLOSING     SARE cid and closing
 1.0 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
 1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Mail contains an image with wrong
                            content-type set
                            Image has format "GIF" but content-type is
                            "image/jpeg"
 2.5 FUZZY_OCR_CORRUPT_IMG  BODY: Mail contains a corrupted image
                            Corrupt image: GIF-LIB error: Image is
                            defective, decoding aborted.
  10 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "target" in 1 lines
                            "stock" in 2 lines
                            "company" in 1 lines
                            "recommendation" in 1 lines
                            (7.5 word occurrences found)
-4.1 AWL                    AWL: From: address is in the auto white-list


When I set focr_enable_image_hashing 2:

[...]
Content analysis details:   (7.7 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.1 FORGED_RCVD_HELO       Received: contains a forged HELO
 0.4 HTML_30_40             BODY: Message is 30% to 40% HTML
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5334]
 0.9 MY_CID_AND_CLOSING     SARE cid and closing
 1.0 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
 1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Mail contains an image with wrong
                            content-type set
                            Image has format "GIF" but content-type is
                            "image/jpeg"
 2.5 FUZZY_OCR_CORRUPT_IMG  BODY: Mail contains a corrupted image
                            Corrupt image: GIF-LIB error: Image is
                            defective, decoding aborted.
 1.3 AWL                    AWL: From: address is in the auto white-list
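To see exactly which rules differ between two such reports, one option is to save each `spamc -R` report to a file and compare the rule-name column. The helpers below are a hypothetical sketch (the names `report_rules` and `diff_rules` are invented for illustration). They assume the report layout shown above, where each rule line starts with a numeric score followed by an upper-case rule name, and that continuation lines do not.

```shell
#!/bin/sh
# Sketch: compare the rules that fired in two saved SpamAssassin reports.
# Assumption: rule lines look like " 0.1 FORGED_RCVD_HELO  description...",
# i.e. field 1 is a (possibly negative) score, field 2 an upper-case name.

# Print the rule names found in a report on stdin, sorted and de-duplicated.
report_rules() {
    awk '$1 ~ /^-?[0-9]+(\.[0-9]+)?$/ && $2 ~ /^[A-Z0-9_]+$/ { print $2 }' | sort -u
}

# Print rules present in only one of the two report files.
# Rules unique to the first file are printed flush left; rules unique
# to the second are indented by a tab (standard comm -3 output).
diff_rules() {
    a=$(mktemp) b=$(mktemp)
    report_rules < "$1" > "$a"
    report_rules < "$2" > "$b"
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}
```

For example, saving the two reports above as hash0.txt and hash2.txt and running `diff_rules hash0.txt hash2.txt` would print only FUZZY_OCR, since that is the single rule that fires with hashing off but not with hashing on. The AWL line appears in both reports (with different scores), so it is not listed.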


The hashing system itself appears to be working; this is the output from fuzzy-stats:

<<db_hash>>
File Size    :     12288 Bytes
File Name    : /etc/mail/spamassassin/FuzzyOcr.db
Oldest Hash  : Mon Jan 15 19:40:21 2007
Average Score:        12.75
Images in DB :         6

Mon, Jan 15, 2007
Matched   :         6 [100.00%]
Avg. Score:        12.75
    2 Time(s) ->   10.500    538584 327x549x7 sbillet.jpeg image/jpeg
    1 Time(s) ->   33.000    188515 377x500x2 4WQUDM.PNG image/png
    1 Time(s) ->   12.000    484116 361x447x7 CIMG0980.gif image/gif
    1 Time(s) ->    9.000    541713 274x659x18 rubblein9.gif image/gif
    1 Time(s) ->    6.000    326949 246x443x25044 image001.jpg image/jpeg


Thanks for any help!
Quinn

---------------------------------------------------------------------
Strangecode :: Internet Consultancy
http://www.strangecode.com/
+1 530 624 4410

Re: FuzzyOCR 3.5.1 not using FUZZY_OCR rule when using hash (SOLVED)

Posted by Quinn Comendant <qu...@strangecode.com>.
On Wed, 17 Jan 2007 19:46:54 -0800, Quinn Comendant wrote:
> Also, I've added this issue to ticket #62: 
> http://fuzzyocr.own-hero.net/ticket/62

Case closed. I noticed that the score differed between spamc and spamassassin, and realized this was a permissions issue.

All of these files need to be readable and writable by the user spamd runs as (vpopmail here):

-rw-rw-r--  1 vpopmail vchkpw 12288 Jan 26 21:32 /etc/mail/spamassassin/FuzzyOcr.db
-rw-rw-r--  1 vpopmail vchkpw     0 Jan 26 21:32 /etc/mail/spamassassin/FuzzyOcr.db.lock
-rw-rw-r--  1 vpopmail vchkpw 12288 Jan 26 21:32 /etc/mail/spamassassin/FuzzyOcr.safe.db
-rw-rw-r--  1 vpopmail vchkpw     0 Jan 26 21:32 /etc/mail/spamassassin/FuzzyOcr.safe.db.lock
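One way to apply the ownership and mode shown in the listing is a small shell function run as root. This is a sketch, not part of FuzzyOcr: the function name `fix_focr_perms` is hypothetical, and it assumes all four files already exist in a single directory (it reports, rather than creates, any that are missing).

```shell
#!/bin/sh
# Sketch: give the spamd user read/write access to FuzzyOcr's hash
# databases and lock files, using mode 664 as in the listing above.
# Assumptions: the files live in one directory; owner and group are
# your spamd user and group (vpopmail:vchkpw in this thread).

fix_focr_perms() {
    dir=$1 owner=$2 group=$3
    status=0
    for f in FuzzyOcr.db FuzzyOcr.db.lock FuzzyOcr.safe.db FuzzyOcr.safe.db.lock; do
        path="$dir/$f"
        if [ ! -e "$path" ]; then
            # Do not create databases here; just report the gap.
            echo "missing: $path" >&2
            status=1
            continue
        fi
        chown "$owner:$group" "$path" || status=1
        chmod 664 "$path" || status=1   # rw-rw-r--
    done
    return $status
}
```

For the setup in this thread that would be `fix_focr_perms /etc/mail/spamassassin vpopmail vchkpw`, run as root after defining the function.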


Note to others: when debugging, always run spamassassin as the user that spamd runs as (using sudo or su), for example:

sudo -H -u vpopmail spamassassin -t -D < /etc/mail/spamassassin/FuzzyOcr-3.5.1/samples/ocr-animated.eml 2>&1  | less
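A quick way to confirm the fix is to test file access as the spamd user directly. The helper below (`check_focr_access`, a hypothetical name) uses only the POSIX `test -r` and `test -w` checks, so it reports permissions as seen by whoever runs it. Running it as root proves nothing, because root passes these checks regardless of the file mode.

```shell
#!/bin/sh
# Sketch: report whether the CURRENT user can read and write each
# FuzzyOcr database and lock file. Run it as the spamd user.
# Assumption: the four files listed above live in one directory.

check_focr_access() {
    dir=$1
    status=0
    for f in FuzzyOcr.db FuzzyOcr.db.lock FuzzyOcr.safe.db FuzzyOcr.safe.db.lock; do
        path="$dir/$f"
        if [ -r "$path" ] && [ -w "$path" ]; then
            echo "ok:    $path"
        else
            # Missing files also fail here, since -r/-w need the file to exist.
            echo "NO RW: $path" >&2
            status=1
        fi
    done
    return $status
}
```

For example, if the function is saved as focr-check.sh somewhere vpopmail can read: `sudo -H -u vpopmail sh -c '. ./focr-check.sh && check_focr_access /etc/mail/spamassassin'`.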

Q


Re: FuzzyOCR 3.5.1 not using FUZZY_OCR rule when using hash

Posted by Quinn Comendant <qu...@strangecode.com>.
On Mon, 15 Jan 2007 18:17:14 -0800, Quinn Comendant wrote:
> When I set focr_enable_image_hashing 2:
[...]

HINT: According to http://fuzzyocr.own-hero.net/wiki/WhatisFuzzyOcr, this email should be tagged with FUZZY_OCR_KNOWN_HASH, but as my previous email shows, that rule was not in my spamc -R report.

Also, I've added this issue to ticket #62: 
http://fuzzyocr.own-hero.net/ticket/62

Q

