Posted to users@spamassassin.apache.org by Ed Kasky <ed...@esson.net> on 2007/01/23 23:22:49 UTC

Re: FuzzyOcr Hash Error - Fixed

At 12:54 PM Tuesday, 1/23/2007, René Berber wrote -=>
>Ed Kasky wrote:
>
> > At 10:23 AM Tuesday, 1/23/2007, René Berber wrote -=>
> >> Ed Kasky wrote:
> >>
> >> > With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
> >> > debugging my setup:
> >> >
> >> > 2007-01-23 01:39:23 [16842] Processing Message with ID
> >> > "<00...@europe>" ("Lacy Silva"
> >> > <kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
> >> > 2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
> >> > 2007-01-23 01:39:23 [16842] Found: 1 images
> >> > 2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
> >> > 2007-01-23 01:39:23 [16842] Image is single non-interlaced...
> >> > 2007-01-23 01:39:24 [16842] Calculating image hash for:
> >> > /tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
> >> > 2007-01-23 01:39:24 [16842] Timed out
> >>
> >> Look at the timestamp, there was no 10 sec timeout, it was immediate.
> >
> > I know - that caught my attention right away.
>
>What version of module Time::HiRes do you have?

Time::HiRes is up to date (1.9704)

However, I suppose running a debug would have helped ;-)

[456] info: FuzzyOcr: Calculating image hash for: /tmp/.spamassassin456xeuqXRtmp/CIMG0980.gif.pnm
[456] dbg: FuzzyOcr: Saved pid: 490
[490] dbg: FuzzyOcr: Exec : /usr/local/netpbm/bin/ppmhist -noheader /tmp/.spamassassin456xeuqXRtmp/CIMG0980.gif.pnm
[490] dbg: FuzzyOcr: Stdout: >/tmp/.spamassassin456xeuqXRtmp/ppmhist.info
[490] dbg: FuzzyOcr: Stderr: >/dev/null
[456] dbg: FuzzyOcr: Elapsed [490]: 0.162664 sec. (/usr/local/netpbm/bin/ppmhist: exit 127)
[456] error: FuzzyOcr: Timed out
[456] info: FuzzyOcr: Error calculating the image hash, skipping hash check...
[456] info: FuzzyOcr: Empty Hash, skipping...
[456] dbg: FuzzyOcr: Remove DIR: /tmp/.spamassassin456xeuqXRtmp
[456] dbg: FuzzyOcr: FuzzyOcr ending successfully...
[456] dbg: FuzzyOcr: Processed in 1.138189 sec.

ppmhist couldn't find libnetpbm.so.10 (hence the exit 127 above, which
FuzzyOcr reported as "Timed out"), so I added the path and it's working
now.  Results from parsing one of the sample emails:

1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Mail contains an image with wrong
                             content-type set
                             Image has format "GIF" but content-type is
                             "image/jpeg"
1.5 FUZZY_OCR_WRONG_EXTENSION BODY: Mail contains an image with wrong
                             file extension
                             Image has format "GIF" but file extension is
                             "jpeg"
2.5 FUZZY_OCR_CORRUPT_IMG  BODY: Mail contains a corrupted image
                             Corrupt image: GIF-LIB error: Image is
                             defective, decoding aborted.
15 FUZZY_OCR_KNOWN_HASH   BODY: Mail contains an image with known hash
                             Words found:
                             "company" in 1 lines
                             "recommendation" in 1 lines
                             "target" in 1 lines
                             "price" in 2 lines
                             "service" in 1 lines
                             "stock" in 2 lines
                             (12 word occurrences found)

And I got a hit on an email a few minutes ago as well.
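
For anyone chasing the same exit 127, here is a minimal Python sketch of
checking a binary for unresolved shared libraries. It is not part of
FuzzyOcr, and the function names are made up for illustration. It assumes a
Linux system where ldd prints unresolved dependencies as
"name => not found".

```python
import subprocess
from typing import List


def missing_libs(ldd_output: str) -> List[str]:
    """Return the library names that ldd output marks as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Assumed ldd format for an unresolved dependency:
        #   libfoo.so.1 => not found
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary and list its unresolved shared libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libs(result.stdout)
```

Running check_binary("/usr/local/netpbm/bin/ppmhist") before the fix would
presumably have listed libnetpbm.so.10, which is quicker than working back
from the "Timed out" message.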

Ed Kasky
~~~~~~~~~
Randomly Generated Quote (56 of 526):
"Every people has a right to choose the sovereignty under which they
shall live."   --Woodrow Wilson