Posted to users@spamassassin.apache.org by Ed Kasky <ed...@esson.net> on 2007/01/23 15:26:39 UTC

FuzzyOcr Hash Error

With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while 
debugging my setup:

2007-01-23 01:39:23 [16842] Processing Message with ID 
"<00...@europe>" ("Lacy Silva" 
<kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
2007-01-23 01:39:23 [16842] Found: 1 images
2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
2007-01-23 01:39:23 [16842] Image is single non-interlaced...
2007-01-23 01:39:24 [16842] Calculating image hash for: 
/tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
2007-01-23 01:39:24 [16842] Timed out
2007-01-23 01:39:24 [16842] Error calculating the image hash, 
skipping hash check...
2007-01-23 01:39:24 [16842] Empty Hash, skipping...

Timeout is set to default of 10 seconds and the hash.db is writeable by spamd.

-rw-rw-r--    1 spamd    spamd       90112 Jan 23 06:19 
/etc/mail/spamassassin/FuzzyOcr.db

From the cf:
focr_enable_image_hashing 2
focr_db_hash /etc/mail/spamassassin/FuzzyOcr.db
focr_db_safe /etc/mail/spamassassin/FuzzyOcr.safe.db

The rest of the hash settings are left as default.

As a result, I have had no hits since installing the new version.

Any suggestions as to where to look next are gratefully accepted and 
appreciated...

Ed

. . . . . . . . . . . . . . . . . .
Randomly Generated Quote (290 of 1164):
A journey of a thousand miles must begin with a single step.
                 -- Lao Tsu


Re: FuzzyOcr Hash Error - Fixed

Posted by Ed Kasky <ed...@esson.net>.
At 12:54 PM Tuesday, 1/23/2007, René Berber wrote -=>
>Ed Kasky wrote:
>
> > At 10:23 AM Tuesday, 1/23/2007, René Berber wrote -=>
> >> Ed Kasky wrote:
> >>
> >> > With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
> >> > debugging my setup:
> >> >
> >> > 2007-01-23 01:39:23 [16842] Processing Message with ID
> >> > "<00...@europe>" ("Lacy Silva"
> >> > <kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
> >> > 2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
> >> > 2007-01-23 01:39:23 [16842] Found: 1 images
> >> > 2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
> >> > 2007-01-23 01:39:23 [16842] Image is single non-interlaced...
> >> > 2007-01-23 01:39:24 [16842] Calculating image hash for:
> >> > /tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
> >> > 2007-01-23 01:39:24 [16842] Timed out
> >>
> >> Look at the timestamp, there was no 10 sec timeout, it was immediate.
> >
> > I know - that caught my attention right away.
>
>What version of module Time::HiRes do you have?

Time::HiRes is up to date (1.9704)

However, I suppose running a debug would have helped ;-)

[456] info: FuzzyOcr: Calculating image hash for: 
/tmp/.spamassassin456xeuqXRtmp/CIMG0980.gif.pnm
[456] dbg: FuzzyOcr: Saved pid: 490
[490] dbg: FuzzyOcr: Exec : 
/usr/local/netpbm/bin/ppmhist -noheader 
/tmp/.spamassassin456xeuqXRtmp/CIMG0980.gif.pnm
[490] dbg: FuzzyOcr: Stdout: >/tmp/.spamassassin456xeuqXRtmp/ppmhist.info
[490] dbg: FuzzyOcr: Stderr: >/dev/null
[456] dbg: FuzzyOcr: Elapsed [490]: 0.162664 sec. 
(/usr/local/netpbm/bin/ppmhist: exit 127)
[456] error: FuzzyOcr: Timed out
[456] info: FuzzyOcr: Error calculating the image hash, skipping hash check...
[456] info: FuzzyOcr: Empty Hash, skipping...
[456] dbg: FuzzyOcr: Remove DIR: /tmp/.spamassassin456xeuqXRtmp
[456] dbg: FuzzyOcr: FuzzyOcr ending successfully...
[456] dbg: FuzzyOcr: Processed in 1.138189 sec.

ppmhist couldn't find libnetpbm.so.10, so I added the path and it's
working now.  Results from parsing one of the sample emails:

1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Mail contains an image with wrong
                             content-type set
                             Image has format "GIF" but content-type is
                             "image/jpeg"
1.5 FUZZY_OCR_WRONG_EXTENSION BODY: Mail contains an image with wrong
                             file extension
                             Image has format "GIF" but file extension is
                             "jpeg"
2.5 FUZZY_OCR_CORRUPT_IMG  BODY: Mail contains a corrupted image
                             Corrupt image: GIF-LIB error: Image is
                             defective, decoding aborted.
15 FUZZY_OCR_KNOWN_HASH   BODY: Mail contains an image with known hash
                             Words found:
                             "company" in 1 lines
                             "recommendation" in 1 lines
                             "target" in 1 lines
                             "price" in 2 lines
                             "service" in 1 lines
                             "stock" in 2 lines
                             (12 word occurrences found)

And I got a hit on an email a few minutes ago as well.
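
For anyone hitting the same symptom: the "exit 127" in the debug output above is the status a shell or the dynamic loader conventionally returns when a program, or a shared library it needs, cannot be found. That points at a helper that never started rather than one that ran too long. Below is a minimal Python sketch, not FuzzyOcr's own code, of running a helper while keeping those cases apart. The function name classify_helper_run and the result labels are invented for illustration.

```python
import subprocess


def classify_helper_run(argv, timeout_sec=10.0):
    """Run a helper program and say how it ended.

    Labels (hypothetical, for illustration only):
      "ok"           exit status 0
      "timeout"      still running after timeout_sec
      "not-found"    the executable itself does not exist
      "not-runnable" exit status 127 (command or shared library not found)
      "failed"       any other non-zero exit status
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_sec,
        )
    except FileNotFoundError:
        return "not-found"
    except subprocess.TimeoutExpired:
        return "timeout"

    if proc.returncode == 127:
        return "not-runnable"
    if proc.returncode != 0:
        return "failed"
    return "ok"


if __name__ == "__main__":
    # Stand-in commands; a real check would pass the helper's own command line.
    print(classify_helper_run(["sh", "-c", "exit 127"]))
    print(classify_helper_run(["sleep", "5"], timeout_sec=0.2))
```

Reporting "not-runnable" separately from "timeout" would likely have made the missing library obvious from the first log, without needing a debug run.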

Ed Kasky
~~~~~~~~~
Randomly Generated Quote (56 of 526):
"Every people has a right to choose the sovereignty under which they
shall live."   --Woodrow Wilson


Re: FuzzyOcr Hash Error

Posted by René Berber <r....@computer.org>.
Ed Kasky wrote:

> At 10:23 AM Tuesday, 1/23/2007, René Berber wrote -=>
>> Ed Kasky wrote:
>>
>> > With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
>> > debugging my setup:
>> >
>> > 2007-01-23 01:39:23 [16842] Processing Message with ID
>> > "<00...@europe>" ("Lacy Silva"
>> > <kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
>> > 2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
>> > 2007-01-23 01:39:23 [16842] Found: 1 images
>> > 2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
>> > 2007-01-23 01:39:23 [16842] Image is single non-interlaced...
>> > 2007-01-23 01:39:24 [16842] Calculating image hash for:
>> > /tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
>> > 2007-01-23 01:39:24 [16842] Timed out
>>
>> Look at the timestamp, there was no 10 sec timeout, it was immediate.
> 
> I know - that caught my attention right away.

What version of module Time::HiRes do you have?
-- 
René Berber


Re: FuzzyOcr Hash Error

Posted by Ed Kasky <ed...@esson.net>.
At 10:23 AM Tuesday, 1/23/2007, René Berber wrote -=>
>Ed Kasky wrote:
>
> > With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
> > debugging my setup:
> >
> > 2007-01-23 01:39:23 [16842] Processing Message with ID
> > "<00...@europe>" ("Lacy Silva"
> > <kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
> > 2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
> > 2007-01-23 01:39:23 [16842] Found: 1 images
> > 2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
> > 2007-01-23 01:39:23 [16842] Image is single non-interlaced...
> > 2007-01-23 01:39:24 [16842] Calculating image hash for:
> > /tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
> > 2007-01-23 01:39:24 [16842] Timed out
>
>Look at the timestamp, there was no 10 sec timeout, it was immediate.

I know - that caught my attention right away.

> > 2007-01-23 01:39:24 [16842] Error calculating the image hash, skipping
> > hash check...
> > 2007-01-23 01:39:24 [16842] Empty Hash, skipping...
> >
> > Timeout is set to default of 10 seconds and the hash.db is writeable by
> > spamd.
> >
> > -rw-rw-r--    1 spamd    spamd       90112 Jan 23 06:19
> > /etc/mail/spamassassin/FuzzyOcr.db
>
>The date and size indicate that it has been used very recently.

The date and size changed, I think, because I restarted spamd at that time
this morning after checking the cf.  4 1/2 hours later it's still the same.

> > From the cf:
> > focr_enable_image_hashing 2
> > focr_db_hash /etc/mail/spamassassin/FuzzyOcr.db
> > focr_db_safe /etc/mail/spamassassin/FuzzyOcr.safe.db
> >
> > The rest of the hash settings are left as default.
> >
> > As a result, I have had no hits since installing the new version.
>
>When did you install the new version?

About 2 weeks ago.

>For what period of time have there been no hits?  Do you know how many times
>the plugin was called?

I haven't had any hits since installing.  Since Sunday, when the log was
rotated, there are 1241 instances in the FuzzyOcr log: 404 scans and 837
cancels due to score being above/below thresholds.


> > Any suggestions as to where to look next are gratefully accepted and
> > appreciated...
>
>There is a global timeout, usually disabled but looks like you uncommented
>the 1 sec sample value.

# Timeout for the plugin, in seconds. (Maximum runtime of the plugin)
# Default value: 10
focr_timeout 20

# Use a global timeout value instead of per helper application.
# Default value: 0
#focr_global_timeout 1
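
A quick way to confirm which options are really in effect is to list only the uncommented lines. The sketch below is a small Python illustration under simple assumptions: one option per line, `#` at the start of a line marks a comment, and trailing comments after a value are not handled. The function name active_settings is hypothetical, and the sample text is the excerpt above.

```python
def active_settings(cf_text, prefix="focr_"):
    """Return {option: value} for uncommented lines whose option starts with prefix."""
    active = {}
    for raw in cf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        name = parts[0]
        if name.startswith(prefix):
            active[name] = parts[1].strip() if len(parts) > 1 else ""
    return active


SAMPLE = """\
# Timeout for the plugin, in seconds. (Maximum runtime of the plugin)
# Default value: 10
focr_timeout 20

# Use a global timeout value instead of per helper application.
# Default value: 0
#focr_global_timeout 1
"""

settings = active_settings(SAMPLE)
print(settings)  # prints {'focr_timeout': '20'}
```

With this excerpt only focr_timeout is active, so the commented-out global timeout is not what produces the immediate "Timed out".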

Still scratching my head on the timeouts and hash db errors...

Ed Kasky
~~~~~~~~~
Randomly Generated Quote (431 of 526):
Scriptures, n. The sacred books of our holy religion, as distinguished
from the false and profane writings on which all other faiths are based.
-Ambrose Bierce, writer (1842-1914) [The Devil's Dictionary]


Re: FuzzyOcr Hash Error

Posted by René Berber <r....@computer.org>.
Ed Kasky wrote:

> With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
> debugging my setup:
> 
> 2007-01-23 01:39:23 [16842] Processing Message with ID
> "<00...@europe>" ("Lacy Silva"
> <kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
> 2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
> 2007-01-23 01:39:23 [16842] Found: 1 images
> 2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
> 2007-01-23 01:39:23 [16842] Image is single non-interlaced...
> 2007-01-23 01:39:24 [16842] Calculating image hash for:
> /tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
> 2007-01-23 01:39:24 [16842] Timed out

Look at the timestamp, there was no 10 sec timeout, it was immediate.
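
The same check can be done mechanically. This is a minimal Python sketch, assuming log lines that begin with a `YYYY-MM-DD HH:MM:SS` stamp as in the excerpt above; the function names are made up for illustration, and the log only has one-second resolution, so the result is approximate.

```python
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def stamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], STAMP_FORMAT)


def seconds_before_timeout(log_lines,
                           start_marker="Calculating image hash",
                           end_marker="Timed out"):
    """Seconds between the most recent start marker and the next end marker, or None."""
    start = None
    for line in log_lines:
        if start_marker in line:
            start = stamp(line)
        elif end_marker in line and start is not None:
            return (stamp(line) - start).total_seconds()
    return None


LOG = [
    "2007-01-23 01:39:23 [16842] Image is single non-interlaced...",
    "2007-01-23 01:39:24 [16842] Calculating image hash for:",
    "/tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm",
    "2007-01-23 01:39:24 [16842] Timed out",
]

elapsed = seconds_before_timeout(LOG)
configured = 10  # the default timeout mentioned in the original post
print(elapsed, elapsed < configured)  # prints 0.0 True
```

An elapsed time far below the configured limit suggests the "Timed out" message is covering for some other failure.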

> 2007-01-23 01:39:24 [16842] Error calculating the image hash, skipping
> hash check...
> 2007-01-23 01:39:24 [16842] Empty Hash, skipping...
> 
> Timeout is set to default of 10 seconds and the hash.db is writeable by
> spamd.
> 
> -rw-rw-r--    1 spamd    spamd       90112 Jan 23 06:19
> /etc/mail/spamassassin/FuzzyOcr.db

The date and size indicate that it has been used very recently.

> 
> From the cf:
> focr_enable_image_hashing 2
> focr_db_hash /etc/mail/spamassassin/FuzzyOcr.db
> focr_db_safe /etc/mail/spamassassin/FuzzyOcr.safe.db
> 
> The rest of the hash settings are left as default.
> 
> As a result, I have had no hits since installing the new version.

When did you install the new version?

For what period of time have there been no hits?  Do you know how many times
the plugin was called?

> Any suggestions as to where to look next are gratefully accepted and
> appreciated...

There is a global timeout, usually disabled, but it looks like you uncommented
the 1 sec sample value.
-- 
René Berber