Posted to users@spamassassin.apache.org by Ed Kasky <ed...@esson.net> on 2007/01/23 15:26:39 UTC
FuzzyOcr Hash Error
With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
debugging my setup:
2007-01-23 01:39:23 [16842] Processing Message with ID
"<00...@europe>" ("Lacy Silva"
<kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
2007-01-23 01:39:23 [16842] Found: 1 images
2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
2007-01-23 01:39:23 [16842] Image is single non-interlaced...
2007-01-23 01:39:24 [16842] Calculating image hash for:
/tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
2007-01-23 01:39:24 [16842] Timed out
2007-01-23 01:39:24 [16842] Error calculating the image hash,
skipping hash check...
2007-01-23 01:39:24 [16842] Empty Hash, skipping...
The timeout is set to the default of 10 seconds, and the hash.db is writable by spamd.
-rw-rw-r-- 1 spamd spamd 90112 Jan 23 06:19
/etc/mail/spamassassin/FuzzyOcr.db
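The `ls -l` line above is the evidence that spamd can write the hash database. As a hedged sketch, the check can be spelled out by applying the Unix owner/group/other permission classes to that line. The helper `can_write` is hypothetical and is not part of FuzzyOcr or SpamAssassin:

```python
def can_write(ls_line: str, user: str, groups: set[str]) -> bool:
    """Decide from an `ls -l` line whether `user` may write the file.

    Hypothetical helper. Mirrors the usual Unix rule: the owner bits apply
    if `user` owns the file, otherwise the group bits apply if one of
    `groups` matches, otherwise the "other" bits. Root, ACLs and SELinux
    are ignored.
    """
    fields = ls_line.split()
    mode, owner, group = fields[0], fields[2], fields[3]
    if owner == user:
        return mode[2] == "w"  # owner write bit
    if group in groups:
        return mode[5] == "w"  # group write bit
    return mode[8] == "w"      # other write bit


line = "-rw-rw-r-- 1 spamd spamd 90112 Jan 23 06:19 /etc/mail/spamassassin/FuzzyOcr.db"
print(can_write(line, "spamd", {"spamd"}))  # True
```

On the live system, `os.access(path, os.W_OK)` run as the spamd user is a more direct test. A writable database does not by itself rule out failures elsewhere in the hash calculation.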
From the cf:
focr_enable_image_hashing 2
focr_db_hash /etc/mail/spamassassin/FuzzyOcr.db
focr_db_safe /etc/mail/spamassassin/FuzzyOcr.safe.db
The rest of the hash settings are left as default.
As a result, I have had no hits since installing the new version.
Any suggestions as to where to look next are gratefully accepted and
appreciated...
Ed
. . . . . . . . . . . . . . . . . .
Randomly Generated Quote (290 of 1164):
A journey of a thousand miles must begin with a single step.
-- Lao Tsu
Re: FuzzyOcr Hash Error - Fixed
Posted by Ed Kasky <ed...@esson.net>.
At 12:54 PM Tuesday, 1/23/2007, René Berber wrote -=>
>Ed Kasky wrote:
>
> > At 10:23 AM Tuesday, 1/23/2007, René Berber wrote -=>
> >> Ed Kasky wrote:
> >>
> >> > With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
> >> > debugging my setup:
> >> >
> >> > 2007-01-23 01:39:23 [16842] Processing Message with ID
> >> > "<00...@europe>" ("Lacy Silva"
> >> > <kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
> >> > 2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
> >> > 2007-01-23 01:39:23 [16842] Found: 1 images
> >> > 2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
> >> > 2007-01-23 01:39:23 [16842] Image is single non-interlaced...
> >> > 2007-01-23 01:39:24 [16842] Calculating image hash for:
> >> > /tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
> >> > 2007-01-23 01:39:24 [16842] Timed out
> >>
> >> Look at the timestamp: there was no 10-second timeout; it was immediate.
> >
> > I know - that caught my attention right away.
>
>What version of module Time::HiRes do you have?
Time::HiRes is up to date (1.9704)
However, I suppose running in debug mode would have helped ;-)
[456] info: FuzzyOcr: Calculating image hash for:
/tmp/.spamassassin456xeuqXRtmp/CIMG0980.gif.pnm
[456] dbg: FuzzyOcr: Saved pid: 490
[490] dbg: FuzzyOcr: Exec :
/usr/local/netpbm/bin/ppmhist -noheader
/tmp/.spamassassin456xeuqXRtmp/CIMG0980.gif.pnm
[490] dbg: FuzzyOcr: Stdout: >/tmp/.spamassassin456xeuqXRtmp/ppmhist.info
[490] dbg: FuzzyOcr: Stderr: >/dev/null
[456] dbg: FuzzyOcr: Elapsed [490]: 0.162664 sec.
(/usr/local/netpbm/bin/ppmhist: exit 127)
[456] error: FuzzyOcr: Timed out
[456] info: FuzzyOcr: Error calculating the image hash, skipping hash check...
[456] info: FuzzyOcr: Empty Hash, skipping...
[456] dbg: FuzzyOcr: Remove DIR: /tmp/.spamassassin456xeuqXRtmp
[456] dbg: FuzzyOcr: FuzzyOcr ending successfully...
[456] dbg: FuzzyOcr: Processed in 1.138189 sec.
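The debug output shows ppmhist finishing after 0.16 s with exit status 127, yet FuzzyOcr reports `Timed out`. Exit status 127 is what a shell returns for a command it cannot find, and what the glibc dynamic loader returns when a required shared library is missing, so reporting it separately from a real timeout would have pointed straight at the problem. A minimal Python sketch, assuming each helper is run with a per-call timeout; `run_helper` and its status names are hypothetical, and this is not FuzzyOcr's code (FuzzyOcr is a Perl plugin):

```python
import subprocess
import sys


def run_helper(argv: list[str], timeout_s: float = 10.0) -> tuple[str, str]:
    """Run a helper command and report *why* it failed.

    Returns (status, detail) where status is one of
    "ok", "timeout", "not_runnable" or "failed".
    """
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising
        return "timeout", f"no result after {timeout_s} s"
    except FileNotFoundError:
        # Raised when argv[0] itself does not exist
        return "not_runnable", f"{argv[0]} not found"
    if proc.returncode == 0:
        return "ok", ""
    if proc.returncode == 127:
        # Shell "command not found", or the dynamic loader failing to
        # resolve a shared library such as libnetpbm.so.10
        return "not_runnable", proc.stderr.decode(errors="replace").strip()
    return "failed", f"exit {proc.returncode}"


# Demo: a child that exits 127 is reported as not runnable, not as a timeout.
print(run_helper([sys.executable, "-c", "raise SystemExit(127)"]))  # ('not_runnable', '')
```

The design choice is simply to keep the timeout path (`TimeoutExpired`) separate from the exit-status path, so a helper that dies in a fraction of a second can never be logged as having timed out.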
ppmhist couldn't find libnetpbm.so.10, so I added the library's path and it's working now.
Results from parsing one of the sample emails:
1.5 FUZZY_OCR_WRONG_CTYPE BODY: Mail contains an image with wrong
content-type set
Image has format "GIF" but content-type is
"image/jpeg"
1.5 FUZZY_OCR_WRONG_EXTENSION BODY: Mail contains an image with wrong
file extension
Image has format "GIF" but file extension is
"jpeg"
2.5 FUZZY_OCR_CORRUPT_IMG BODY: Mail contains a corrupted image
Corrupt image: GIF-LIB error: Image is
defective, decoding aborted.
15 FUZZY_OCR_KNOWN_HASH BODY: Mail contains an image with known hash
Words found:
"company" in 1 lines
"recommendation" in 1 lines
"target" in 1 lines
"price" in 2 lines
"service" in 1 lines
"stock" in 2 lines
(12 word occurrences found)
And I got a hit on an email a few minutes ago as well.
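The root cause was ppmhist being unable to find libnetpbm.so.10. On Linux, `ldd` lists the shared libraries a binary needs and marks unresolved ones as `not found`, so the check can be scripted. A sketch in Python; `missing_libraries` is a hypothetical helper, and the sample text is illustrative rather than output captured from Ed's system:

```python
import re

# Matches ldd lines such as "\tlibnetpbm.so.10 => not found"
_MISSING = re.compile(r"^\s*(\S+)\s+=>\s+not found\s*$")


def missing_libraries(ldd_output: str) -> list[str]:
    """Return the shared-library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        m = _MISSING.match(line)
        if m:
            missing.append(m.group(1))
    return missing


# Illustrative ldd output, not taken from the thread
sample = (
    "\tlibnetpbm.so.10 => not found\n"
    "\tlibc.so.6 => /lib/libc.so.6\n"
)
print(missing_libraries(sample))  # ['libnetpbm.so.10']
```

In practice the input would come from something like `subprocess.run(["ldd", "/usr/local/netpbm/bin/ppmhist"], capture_output=True, text=True).stdout`. The thread does not say how the path was added; the usual options are setting `LD_LIBRARY_PATH` in spamd's environment, or adding the library directory to `/etc/ld.so.conf` and running `ldconfig`.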
Ed Kasky
~~~~~~~~~
Randomly Generated Quote (56 of 526):
"Every people has a right to choose the sovereignty under which they
shall live." --Woodrow Wilson
Re: FuzzyOcr Hash Error
Posted by René Berber <r....@computer.org>.
Ed Kasky wrote:
> At 10:23 AM Tuesday, 1/23/2007, René Berber wrote -=>
>> Ed Kasky wrote:
>>
>> > With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
>> > debugging my setup:
>> >
>> > 2007-01-23 01:39:23 [16842] Processing Message with ID
>> > "<00...@europe>" ("Lacy Silva"
>> > <kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
>> > 2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
>> > 2007-01-23 01:39:23 [16842] Found: 1 images
>> > 2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
>> > 2007-01-23 01:39:23 [16842] Image is single non-interlaced...
>> > 2007-01-23 01:39:24 [16842] Calculating image hash for:
>> > /tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
>> > 2007-01-23 01:39:24 [16842] Timed out
>>
>> Look at the timestamp: there was no 10-second timeout; it was immediate.
>
> I know - that caught my attention right away.
What version of module Time::HiRes do you have?
--
René Berber
Re: FuzzyOcr Hash Error
Posted by Ed Kasky <ed...@esson.net>.
At 10:23 AM Tuesday, 1/23/2007, René Berber wrote -=>
>Ed Kasky wrote:
>
> > With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
> > debugging my setup:
> >
> > 2007-01-23 01:39:23 [16842] Processing Message with ID
> > "<00...@europe>" ("Lacy Silva"
> > <kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
> > 2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
> > 2007-01-23 01:39:23 [16842] Found: 1 images
> > 2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
> > 2007-01-23 01:39:23 [16842] Image is single non-interlaced...
> > 2007-01-23 01:39:24 [16842] Calculating image hash for:
> > /tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
> > 2007-01-23 01:39:24 [16842] Timed out
>
>Look at the timestamp: there was no 10-second timeout; it was immediate.
I know - that caught my attention right away.
> > 2007-01-23 01:39:24 [16842] Error calculating the image hash, skipping
> > hash check...
> > 2007-01-23 01:39:24 [16842] Empty Hash, skipping...
> >
> > Timeout is set to default of 10 seconds and the hash.db is writeable by
> > spamd.
> >
> > -rw-rw-r-- 1 spamd spamd 90112 Jan 23 06:19
> > /etc/mail/spamassassin/FuzzyOcr.db
>
>The date and size indicate that it has been used very recently.
The date and size changed, I think, because I
restarted spamd at that time this morning after
checking the cf. Four and a half hours later, they're still the same.
> > From the cf:
> > focr_enable_image_hashing 2
> > focr_db_hash /etc/mail/spamassassin/FuzzyOcr.db
> > focr_db_safe /etc/mail/spamassassin/FuzzyOcr.safe.db
> >
> > The rest of the hash settings are left as default.
> >
> > As a result, I have had no hits since installing the new version.
>
>When did you install the new version?
About 2 weeks ago.
>For what period of time have there been no hits? Do you know how many
>times the plugin was called?
I haven't had any hits since installing. Since
Sunday, when the log was rotated, there have been 1241
instances in the FuzzyOcr log: 404 scans and 837
cancels due to the score being above/below the thresholds.
> > Any suggestions as to where to look next are gratefully accepted and
> > appreciated...
>
>There is a global timeout, usually disabled, but it looks
>like you uncommented the 1-second sample value.
# Timeout for the plugin, in seconds. (Maximum runtime of the plugin)
# Default value: 10
focr_timeout 20
# Use a global timeout value instead of per helper application.
# Default value: 0
#focr_global_timeout 1
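The two cf comments describe two timeout policies: a limit per helper application, or one limit shared by the whole plugin run. A hedged Python sketch of the difference; the class `TimeoutBudget` is hypothetical and is not FuzzyOcr's code:

```python
import time
from typing import Callable


class TimeoutBudget:
    """Hand out the timeout for the next helper call (hypothetical sketch).

    global_mode=False: every helper gets the full `timeout_s`.
    global_mode=True:  all helpers share one deadline, `timeout_s` seconds
                       after the budget was created.
    """

    def __init__(self, timeout_s: float, global_mode: bool,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.global_mode = global_mode
        self.clock = clock
        self.deadline = clock() + timeout_s

    def next_timeout(self) -> float:
        if not self.global_mode:
            return self.timeout_s
        # Time left on the shared deadline, never negative
        return max(0.0, self.deadline - self.clock())


# Demo with a fake clock so the numbers are deterministic
now = [0.0]


def fake_clock() -> float:
    return now[0]


shared = TimeoutBudget(20, global_mode=True, clock=fake_clock)
per_helper = TimeoutBudget(20, global_mode=False, clock=fake_clock)
now[0] = 15.0  # pretend earlier helpers used 15 s
print(shared.next_timeout(), per_helper.next_timeout())  # 5.0 20
```

With `focr_timeout 20`, a per-helper policy gives every helper the full 20 seconds, while a shared deadline shrinks as earlier helpers use time. Under either policy, a helper that fails after 0.16 s, as in Ed's debug output, has not actually hit a timeout.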
Still scratching my head on the timeouts and hash db errors...
Ed Kasky
~~~~~~~~~
Randomly Generated Quote (431 of 526):
Scriptures, n. The sacred books of our holy religion, as distinguished
from the false and profane writings on which all other faiths are based.
-Ambrose Bierce, writer (1842-1914) [The Devil's Dictionary]
Re: FuzzyOcr Hash Error
Posted by René Berber <r....@computer.org>.
Ed Kasky wrote:
> With FuzzyOcr 3.5.1 and SA 3.1.7, I noticed this in the log while
> debugging my setup:
>
> 2007-01-23 01:39:23 [16842] Processing Message with ID
> "<00...@europe>" ("Lacy Silva"
> <kn...@astrolabio.net> -> "ed" <ed...@esson.net>)
> 2007-01-23 01:39:23 [16842] GIF: [248x442] submersible.gif (5458)
> 2007-01-23 01:39:23 [16842] Found: 1 images
> 2007-01-23 01:39:23 [16842] Found GIF header name="submersible.gif"
> 2007-01-23 01:39:23 [16842] Image is single non-interlaced...
> 2007-01-23 01:39:24 [16842] Calculating image hash for:
> /tmp/.spamassassin168423O9h2Ttmp/submersible.gif.pnm
> 2007-01-23 01:39:24 [16842] Timed out
Look at the timestamp: there was no 10-second timeout; it was immediate.
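The point can be checked directly from the log: the `Calculating image hash` and `Timed out` lines carry the same timestamp. A small Python sketch that parses the log's `YYYY-MM-DD HH:MM:SS` stamps; the helper `elapsed_seconds` is hypothetical:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two FuzzyOcr log timestamps (1-second resolution)."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


start = "2007-01-23 01:39:24"  # "Calculating image hash for: ..."
end = "2007-01-23 01:39:24"    # "Timed out"
gap = elapsed_seconds(start, end)
print(gap)       # 0.0
print(gap < 10)  # True
```

Because the log has only one-second resolution, identical stamps mean the two events were less than a second apart, far short of a 10-second timeout.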
> 2007-01-23 01:39:24 [16842] Error calculating the image hash, skipping
> hash check...
> 2007-01-23 01:39:24 [16842] Empty Hash, skipping...
>
> Timeout is set to default of 10 seconds and the hash.db is writeable by
> spamd.
>
> -rw-rw-r-- 1 spamd spamd 90112 Jan 23 06:19
> /etc/mail/spamassassin/FuzzyOcr.db
The date and size indicate that it has been used very recently.
>
> From the cf:
> focr_enable_image_hashing 2
> focr_db_hash /etc/mail/spamassassin/FuzzyOcr.db
> focr_db_safe /etc/mail/spamassassin/FuzzyOcr.safe.db
>
> The rest of the hash settings are left as default.
>
> As a result, I have had no hits since installing the new version.
When did you install the new version?
For what period of time have there been no hits? Do you know how many times the
plugin was called?
> Any suggestions as to where to look next are gratefully accepted and
> appreciated...
There is a global timeout, usually disabled, but it looks like you uncommented
the 1-second sample value.
--
René Berber