Posted to users@spamassassin.apache.org by Quinn Comendant <qu...@strangecode.com> on 2007/01/16 03:17:14 UTC
FuzzyOCR 3.5.1 not using FUZZY_OCR rule when using hash
I just upgraded FuzzyOCR to 3.5.1 and everything works great, except that I'm not getting the FUZZY_OCR rule when hashing is enabled. I get different results depending on the value of focr_enable_image_hashing. In the examples below, turning hashing off gives a much higher score for the ocr-gif.eml sample. Why?
When running the command:
# spamc -R < ./FuzzyOcr-3.5.1/samples/ocr-gif.eml
When I set focr_enable_image_hashing 0:
[...]
Content analysis details: (12.8 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
0.1 FORGED_RCVD_HELO Received: contains a forged HELO
0.4 HTML_30_40 BODY: Message is 30% to 40% HTML
0.0 HTML_MESSAGE BODY: HTML included in message
0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
[score: 0.5334]
0.9 MY_CID_AND_CLOSING SARE cid and closing
1.0 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
1.5 FUZZY_OCR_WRONG_CTYPE BODY: Mail contains an image with wrong
content-type set
Image has format "GIF" but content-type is
"image/jpeg"
2.5 FUZZY_OCR_CORRUPT_IMG BODY: Mail contains a corrupted image
Corrupt image: GIF-LIB error: Image is
defective, decoding aborted.
10 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
"target" in 1 lines
"stock" in 2 lines
"company" in 1 lines
"recommendation" in 1 lines
(7.5 word occurrences found)
-4.1 AWL AWL: From: address is in the auto white-list
When I set focr_enable_image_hashing 2:
[...]
Content analysis details: (7.7 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
0.1 FORGED_RCVD_HELO Received: contains a forged HELO
0.4 HTML_30_40 BODY: Message is 30% to 40% HTML
0.0 HTML_MESSAGE BODY: HTML included in message
0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
[score: 0.5334]
0.9 MY_CID_AND_CLOSING SARE cid and closing
1.0 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
1.5 FUZZY_OCR_WRONG_CTYPE BODY: Mail contains an image with wrong
content-type set
Image has format "GIF" but content-type is
"image/jpeg"
2.5 FUZZY_OCR_CORRUPT_IMG BODY: Mail contains a corrupted image
Corrupt image: GIF-LIB error: Image is
defective, decoding aborted.
1.3 AWL AWL: From: address is in the auto white-list
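A quick way to see which rules differ between two reports like the ones above is to pull out the rule-name column from each and diff the results. This is only a sketch: rule_names is a hypothetical helper, not part of SpamAssassin or FuzzyOcr, and it assumes every rule line starts with a numeric score followed by an upper-case rule name, as in the spamc -R output above. The file names in the usage comment are placeholders.

```shell
# Hypothetical helper: read a SpamAssassin report on stdin and print the
# names of the rules that fired, one per line, sorted and de-duplicated.
# Assumes rule lines look like "<score> <RULE_NAME> <description...>".
rule_names() {
    awk '$1 ~ /^-?[0-9]+(\.[0-9]+)?$/ && $2 ~ /^[A-Z][A-Z0-9_]+$/ { print $2 }' |
        LC_ALL=C sort -u
}

# Example usage (placeholder file names):
#   rule_names < report-hashing-0.txt > rules-0.txt
#   rule_names < report-hashing-2.txt > rules-2.txt
#   diff rules-0.txt rules-2.txt
```

Continuation lines of a description (such as the "Words found:" details) are skipped because they do not begin with a score.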
The hashing system appears to be working, as the output from fuzzy-stats is:
<<db_hash>>
File Size : 12288 Bytes
File Name : /etc/mail/spamassassin/FuzzyOcr.db
Oldest Hash : Mon Jan 15 19:40:21 2007
Average Score: 12.75
Images in DB : 6
Mon, Jan 15, 2007
Matched : 6 [100.00%]
Avg. Score: 12.75
2 Time(s) -> 10.500 538584 327x549x7 sbillet.jpeg image/jpeg
1 Time(s) -> 33.000 188515 377x500x2 4WQUDM.PNG image/png
1 Time(s) -> 12.000 484116 361x447x7 CIMG0980.gif image/gif
1 Time(s) -> 9.000 541713 274x659x18 rubblein9.gif image/gif
1 Time(s) -> 6.000 326949 246x443x25044 image001.jpg image/jpeg
Thanks for any help!
Quinn
---------------------------------------------------------------------
Strangecode :: Internet Consultancy
http://www.strangecode.com/
+1 530 624 4410
Re: FuzzyOCR 3.5.1 not using FUZZY_OCR rule when using hash
(SOLVED)
Posted by Quinn Comendant <qu...@strangecode.com>.
On Wed, 17 Jan 2007 19:46:54 -0800, Quinn Comendant wrote:
> Also, I've added this issue to ticket #62:
> http://fuzzyocr.own-hero.net/ticket/62
Case closed. I noticed the output score was different between running spamc and spamassassin, and realized this was a permissions issue.
These files all need to be readable/writable by the user that spamd is run as:
-rw-rw-r-- 1 vpopmail vchkpw 12288 Jan 26 21:32 /etc/mail/spamassassin/FuzzyOcr.db
-rw-rw-r-- 1 vpopmail vchkpw 0 Jan 26 21:32 /etc/mail/spamassassin/FuzzyOcr.db.lock
-rw-rw-r-- 1 vpopmail vchkpw 12288 Jan 26 21:32 /etc/mail/spamassassin/FuzzyOcr.safe.db
-rw-rw-r-- 1 vpopmail vchkpw 0 Jan 26 21:32 /etc/mail/spamassassin/FuzzyOcr.safe.db.lock
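One way to verify this is a small check that reports any file the current user cannot both read and write. This is a minimal sketch, not something shipped with FuzzyOcr: check_rw is a hypothetical function name, and it only tests access for whichever user runs it, so it would need to be run as the spamd user (vpopmail in my case).

```shell
# Hypothetical helper: print every given file that the current user
# cannot both read and write. Returns 0 if all files are accessible,
# 1 otherwise. Run it as the user that spamd runs as.
check_rw() {
    check_rw_status=0
    for f in "$@"; do
        if [ ! -r "$f" ] || [ ! -w "$f" ]; then
            echo "not readable/writable: $f"
            check_rw_status=1
        fi
    done
    return "$check_rw_status"
}

# Example usage, with the paths from the listing above:
#   check_rw /etc/mail/spamassassin/FuzzyOcr.db \
#            /etc/mail/spamassassin/FuzzyOcr.db.lock \
#            /etc/mail/spamassassin/FuzzyOcr.safe.db \
#            /etc/mail/spamassassin/FuzzyOcr.safe.db.lock
```

A missing file is also reported, since it fails the read test.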
Note to others: when debugging, always run spamassassin as the user that spamd runs as (using sudo or su), for example:
sudo -H -u vpopmail spamassassin -t -D < /etc/mail/spamassassin/FuzzyOcr-3.5.1/samples/ocr-animated.eml 2>&1 | less
Q
Re: FuzzyOCR 3.5.1 not using FUZZY_OCR rule when using hash
Posted by Quinn Comendant <qu...@strangecode.com>.
On Mon, 15 Jan 2007 18:17:14 -0800, Quinn Comendant wrote:
> When I set focr_enable_image_hashing 2:
[...]
HINT: According to http://fuzzyocr.own-hero.net/wiki/WhatisFuzzyOcr, this email should be tagged with FUZZY_OCR_KNOWN_HASH, but as my previous email shows, that rule did not appear in my spamc -R report.
Also, I've added this issue to ticket #62:
http://fuzzyocr.own-hero.net/ticket/62
Q