Posted to users@spamassassin.apache.org by Evan Platt <ev...@espphotography.com> on 2007/02/25 19:32:11 UTC

Crooked JPG's not being recognized by FuzzyOCR?

Lately, I'm seeing JPG attachments that are 'crooked' (see
http://www.espphotography.com/crookedjpg.jpg). These aren't scoring
any points with FuzzyOCR.

Am I missing something? Do these hit for anyone else?


Re: Crooked JPG's not being recognized by FuzzyOCR?

Posted by Brian Wilson <wi...@bubba.org>.
On Feb 25, 2007, at 2:29 PM, David Goldsmith wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Ok, I had a permissions issue on some of the FuzzyOCR files so it
> couldn't properly parse it.  Now that the permissions are fixed, my
> system is catching that image.
>
> SA results are:
>
> X-Spam-Report:
> 	* -1.4 ALL_TRUSTED Passed through trusted hosts only via SMTP
> 	*  5.0 FUZZY_OCR BODY: Mail contains an image with common spam
>                text inside
> 	*      Words found:
> 	"target" in 1 lines
> 	"target" in 1 lines
> 	(2 word
> 	*      occurrences found)
> 	* -1.9 AWL AWL: From: address is in the auto white-list
>
> FuzzyOCR flagged it, it just didn't get blocked since it was from me to
> me and only went through internal servers, there were some negative offsets.
>

I have received the same spam message (my copy:
http://bubba.org/spam/imagespam13.txt) and FuzzyOcr seems to pick it up
fine, but it would have still been flagged as spam with other checks.

[12518] info: FuzzyOcr: Scanset "ocrad" found word "target" with fuzz of 0.0000
[12518] info: FuzzyOcr: line: " target kt ss"
[12518] info: FuzzyOcr: Scanset "ocrad" found word "company" with fuzz of 0.1429
[12518] info: FuzzyOcr: line: "companrirwin resources inc c oder otc i w r s p k "
[12518] dbg: FuzzyOcr: Not enough OCR Hits without space stripping, doing second matching pass...
[12518] info: FuzzyOcr: Scanset "ocrad" found word "target" with fuzz of 0.0000
[12518] info: FuzzyOcr: line: "targetktss"
[12518] info: FuzzyOcr: Scanset "ocrad" found word "company" with fuzz of 0.1429
[12518] info: FuzzyOcr: line: "companrirwinresourcesinccoderotciwrspk"
[12518] info: FuzzyOcr: Scanset "ocrad" generates enough hits (4), skipping further scansets...

         *  6.0 FUZZY_OCR BODY: Mail contains an image with common spam text inside
         *      Words found:
         "target" in 1 lines
         "company" in 1 lines
         "target" in 1 lines
         "company" in 1 lines

-B






Re: Crooked JPG's not being recognized by FuzzyOCR?

Posted by David Goldsmith <dg...@sans.org>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ok, I had a permissions issue on some of the FuzzyOCR files so it
couldn't properly parse it.  Now that the permissions are fixed, my
system is catching that image.

SA results are:

X-Spam-Report:
	* -1.4 ALL_TRUSTED Passed through trusted hosts only via SMTP
	*  5.0 FUZZY_OCR BODY: Mail contains an image with common spam
               text inside
	*      Words found:
	"target" in 1 lines
	"target" in 1 lines
	(2 word
	*      occurrences found)
	* -1.9 AWL AWL: From: address is in the auto white-list

FuzzyOCR flagged it; it just didn't get blocked because it was sent from me
to me and only went through internal servers, so there were some negative
offsets.
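
To make the "negative offsets" concrete, here is a minimal Python sketch that adds up the per-rule scores from the X-Spam-Report above. The regular expression and the helper names are illustrative only, and the threshold of 5.0 is SpamAssassin's default required_score; whether this server uses the default is an assumption.

```python
import re

# Report text copied from the message above (continuation lines included).
REPORT = """\
\t* -1.4 ALL_TRUSTED Passed through trusted hosts only via SMTP
\t*  5.0 FUZZY_OCR BODY: Mail contains an image with common spam
               text inside
\t*      Words found:
\t* -1.9 AWL AWL: From: address is in the auto white-list
"""

# A rule line is "*", a signed score, then the rule name.
# Lines such as "*      Words found:" carry no score and are skipped.
RULE_LINE = re.compile(r"^[ \t]*\*[ \t]+(-?\d+(?:\.\d+)?)[ \t]+(\S+)", re.MULTILINE)

# Assumption: the server uses SpamAssassin's default required_score.
REQUIRED_SCORE = 5.0


def rule_scores(report):
    """Return (rule_name, score) pairs found in an X-Spam-Report body."""
    return [(name, float(score)) for score, name in RULE_LINE.findall(report)]


def total_score(report):
    """Sum the rule scores, rounded to one decimal like the report itself."""
    return round(sum(score for _, score in rule_scores(report)), 1)


if __name__ == "__main__":
    for name, score in rule_scores(REPORT):
        print(f"{score:5.1f}  {name}")
    total = total_score(REPORT)
    print(f"total {total} (spam only if >= {REQUIRED_SCORE})")
```

With these three rules the total is -1.4 + 5.0 - 1.9 = 1.7, which is why a 5.0 FUZZY_OCR hit on its own did not push the message over the threshold.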

The FuzzyOCR.log entries for this message were:

2007-02-25 19:24:25 [15469] Starting FuzzyOcr...
2007-02-25 19:24:25 [15469] Processing Message with ID
"<45...@sans.org>" (David Goldsmith <dg...@sans.org> ->
David Goldsmith <dg...@sans.org>)
2007-02-25 19:24:25 [15469] fname: "crookedjpg.jpg" => "crookedjpg.jpg"
2007-02-25 19:24:25 [15469] JPEG: [360x491] crookedjpg.jpg (55507)
2007-02-25 19:24:25 [15469] Saved:
/tmp/.spamassassin154693gktwEtmp/crookedjpg.jpg
2007-02-25 19:24:25 [15469] Saved: /tmp/.spamassassin154693gktwEtmp/raw.eml
2007-02-25 19:24:25 [15469] Found: 1 images
2007-02-25 19:24:25 [15469] pfile =>
/tmp/.spamassassin154693gktwEtmp/crookedjpg.jpg.pnm
2007-02-25 19:24:25 [15469] efile =>
/tmp/.spamassassin154693gktwEtmp/crookedjpg.jpg.err
2007-02-25 19:24:25 [15469] File has Content-Type "image/jpeg" and File
Extension "jpg"
2007-02-25 19:24:25 [15469] Found JPEG header name="crookedjpg.jpg"
2007-02-25 19:24:25 [16133] Exec  : /usr/bin/jpegtopnm
/tmp/.spamassassin154693gktwEtmp/crookedjpg.jpg
2007-02-25 19:24:25 [16133] Stdout:
>/tmp/.spamassassin154693gktwEtmp/crookedjpg.jpg.pnm
2007-02-25 19:24:25 [16133] Stderr:
>>/tmp/.spamassassin154693gktwEtmp/crookedjpg.jpg.err
2007-02-25 19:24:25 [15469] Saved pid: 16133
2007-02-25 19:24:25 [15469] Elapsed [16133]: 0.031423 sec.
(/usr/bin/jpegtopnm: exit 0)
2007-02-25 19:24:25 [15469] Calculating image hash for:
/tmp/.spamassassin154693gktwEtmp/crookedjpg.jpg.pnm
2007-02-25 19:24:25 [15469] Saved pid: 16134
2007-02-25 19:24:25 [16134] Exec  : /usr/bin/ppmhist -noheader
/tmp/.spamassassin154693gktwEtmp/crookedjpg.jpg.pnm
2007-02-25 19:24:25 [16134] Stdout:
>/tmp/.spamassassin154693gktwEtmp/ppmhist.info
2007-02-25 19:24:25 [16134] Stderr: >/dev/null
2007-02-25 19:24:25 [15469] Elapsed [16134]: 0.052791 sec.
(/usr/bin/ppmhist: exit 0)
2007-02-25 19:24:25 [15469] Got:
<530295:360:491:14321::255:251:255:253:7774::255:255:228:252:7550::255:255:226:252:6649::255:252:249:253:6392::255:252:251:253:6053::255:255:230:252:5502>
2007-02-25 19:24:25 [15469] Scanset Order: ocrad(0) ocrad-invert(0)
ocrad-decolorize-invert(0) ocrad-decolorize(0) gocr(0) gocr-180(0)
2007-02-25 19:24:25 [16135] Exec  : /usr/local/bin/ocrad -s5
/tmp/.spamassassin154693gktwEtmp/crookedjpg.jpg.pnm
2007-02-25 19:24:25 [16135] Stdout:
>/tmp/.spamassassin154693gktwEtmp/scanset.ocrad.out
2007-02-25 19:24:25 [16135] Stderr:
>/tmp/.spamassassin154693gktwEtmp/scanset.ocrad.err
2007-02-25 19:24:25 [15469] Saved pid: 16135
2007-02-25 19:24:25 [15469] Elapsed [16135]: 0.135362 sec.
(/usr/local/bin/ocrad: exit 0)
2007-02-25 19:24:25 [15469] ocrdata=>>cANAolaw _qI_KRAIJ ARC AN UNTApPED
MARKET!
                      IWRS IS HERE TO OIG _J_ _WAT GOLD!


                      comp_nr!_ewIlo RE_RCES INC (Od7er o_c: _ w e s . p K )
                      _IWRS
                      _rad_g at: $l
                      _-Oa_ Est: _3,_0
                      _r Targe_ Ln_ S_5
                      Market Indícator: amím


                      GCr ON M_S gA_WAGOW wow_
                      _p LA,T FGATURE GA_MO zao_ tw A @_!

                      <<=end
2007-02-25 19:24:25 [15469] Scanset "ocrad" found word "target" with
fuzz of 0.1667
                      line: "r targe ln ss"
2007-02-25 19:24:25 [15469] Not enough OCR Hits without space stripping,
doing second matching pass...
2007-02-25 19:24:25 [15469] Scanset "ocrad" found word "target" with
fuzz of 0.1667
                      line: "rtargelnss"
2007-02-25 19:24:25 [15469] Scanset "ocrad" generates enough hits (2),
skipping further scansets...
2007-02-25 19:24:25 [15469] Message is spam, score = 5.000
2007-02-25 19:24:25 [15469] Adding Hash to
"/etc/mail/spamassassin/FuzzyOcr.db" with score "5.000"
2007-02-25 19:24:25 [15469] Digest:
530295:360:491:14321::255:251:255:253:7774::255:255:228:252:7550::255:255:226:252:6649::255:252:249:253:6392::255:252:251:253:6053::255:255:230:252:5502
2007-02-25 19:24:25 [15469] Remove DIR: /tmp/.spamassassin15469Cr5QSWtmp
2007-02-25 19:24:25 [15469] Remove DIR: /tmp/.spamassassin154693gktwEtmp
2007-02-25 19:24:25 [15469] FuzzyOcr ending successfully...
2007-02-25 19:24:25 [15469] Processed in 0.415369 sec.
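
The "Got:" / "Digest:" string in this log is identical to the one in the earlier, failed run, which is what one would expect if it is derived only from the image. Below is a rough Python sketch that splits the digest into fields. The field interpretation is an assumption read off the log, not taken from the FuzzyOcr source: the second and third header values match the logged 360x491 dimensions, and each later group looks like a ppmhist row (red, green, blue, luminance, pixel count).

```python
# Digest string copied verbatim from the log above.
DIGEST = (
    "530295:360:491:14321"
    "::255:251:255:253:7774::255:255:228:252:7550"
    "::255:255:226:252:6649::255:252:249:253:6392"
    "::255:252:251:253:6053::255:255:230:252:5502"
)


def parse_digest(digest):
    """Split a digest into (header_fields, color_rows).

    Assumed layout: groups separated by '::', fields by ':'.
    The first group is a header; the rest are treated as
    (r, g, b, lum, count) rows, as ppmhist would print them.
    """
    header, *groups = digest.split("::")
    header_fields = [int(x) for x in header.split(":")]
    color_rows = [tuple(int(x) for x in g.split(":")) for g in groups]
    return header_fields, color_rows


def luma(r, g, b):
    """Conventional luminance weighting, used here only as a sanity check."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


if __name__ == "__main__":
    header, rows = parse_digest(DIGEST)
    print("header:", header)
    for r, g, b, lum, count in rows:
        print(f"rgb=({r},{g},{b}) lum={lum} count={count} luma_check={luma(r, g, b)}")
```

For all six rows the fourth value agrees with the usual luminance weighting of the first three, which supports reading the groups as color-histogram rows. The first and fourth header values are left uninterpreted here.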


David Goldsmith
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3rc2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF4eN8417vU8/9QfkRAqciAKCXjnXsFasv4s9haENoD2cwfi1NHQCgvCXl
7ersgxgTorVwI9dREnAqBkM=
=aBYh
-----END PGP SIGNATURE-----

Re: Crooked JPG's not being recognized by FuzzyOCR?

Posted by David Goldsmith <dg...@sans.org>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Evan Platt wrote:
> Lately, I'm seeing JPG attachments that are 'crooked' (see
> http://www.espphotography.com/crookedjpg.jpg ) . These aren't hitting
> any points with FuzzyOCR.
> 
> Am I missing something? Do these hit for anyone else?

Hmm, I just set up FuzzyOCR on a new SA server yesterday and, while it
seems to be working, mine didn't catch this either.

I attached your image to an email I sent myself and got the following
results:

X-Spam-Report:
	* -1.4 ALL_TRUSTED Passed through trusted hosts only via SMTP
	*  0.6 AWL AWL: From: address is in the auto white-list

From my FuzzyOCR.log file:

2007-02-25 19:04:47 [15469] Starting FuzzyOcr...
2007-02-25 19:04:47 [15469] Processing Message with ID
"<45...@sans.org>" (David Goldsmith <dg...@sans.org> ->
David Goldsmith <dg...@sans.org>)
2007-02-25 19:04:47 [15469] fname: "crookedjpg.jpg" => "crookedjpg.jpg"
2007-02-25 19:04:47 [15469] JPEG: [360x491] crookedjpg.jpg (55507)
2007-02-25 19:04:47 [15469] Saved:
/tmp/.spamassassin15469Cr5QSWtmp/crookedjpg.jpg
2007-02-25 19:04:47 [15469] Saved: /tmp/.spamassassin15469Cr5QSWtmp/raw.eml
2007-02-25 19:04:47 [15469] Found: 1 images
2007-02-25 19:04:47 [15469] pfile =>
/tmp/.spamassassin15469Cr5QSWtmp/crookedjpg.jpg.pnm
2007-02-25 19:04:47 [15469] efile =>
/tmp/.spamassassin15469Cr5QSWtmp/crookedjpg.jpg.err
2007-02-25 19:04:47 [15469] File has Content-Type "image/jpeg" and File
Extension "jpg"
2007-02-25 19:04:47 [15469] Found JPEG header name="crookedjpg.jpg"
2007-02-25 19:04:47 [16061] Exec  : /usr/bin/jpegtopnm
/tmp/.spamassassin15469Cr5QSWtmp/crookedjpg.jpg
2007-02-25 19:04:47 [16061] Stdout:
>/tmp/.spamassassin15469Cr5QSWtmp/crookedjpg.jpg.pnm
2007-02-25 19:04:47 [16061] Stderr:
>>/tmp/.spamassassin15469Cr5QSWtmp/crookedjpg.jpg.err
2007-02-25 19:04:47 [15469] Saved pid: 16061
2007-02-25 19:04:47 [15469] Elapsed [16061]: 0.031555 sec.
(/usr/bin/jpegtopnm: exit 0)
2007-02-25 19:04:47 [15469] Calculating image hash for:
/tmp/.spamassassin15469Cr5QSWtmp/crookedjpg.jpg.pnm
2007-02-25 19:04:48 [15469] Saved pid: 16062
2007-02-25 19:04:48 [16062] Exec  : /usr/bin/ppmhist -noheader
/tmp/.spamassassin15469Cr5QSWtmp/crookedjpg.jpg.pnm
2007-02-25 19:04:48 [16062] Stdout:
>/tmp/.spamassassin15469Cr5QSWtmp/ppmhist.info
2007-02-25 19:04:48 [16062] Stderr: >/dev/null
2007-02-25 19:04:48 [15469] Elapsed [16062]: 0.052911 sec.
(/usr/bin/ppmhist: exit 0)
2007-02-25 19:04:48 [15469] Got:
<530295:360:491:14321::255:251:255:253:7774::255:255:228:252:7550::255:255:226:252:6649::255:252:249:253:6392::255:252:251:253:6053::255:255:230:252:5502>
2007-02-25 19:08:44 [15470] Scan canceled, message has already more than
10 points (15.148).
2007-02-25 19:08:44 [15470] Processed in 0.000229 sec.
2007-02-25 19:09:00 [15469] Scan canceled, message has already more than
10 points (31.104).
2007-02-25 19:09:00 [15469] Processed in 0.000227 sec.
2007-02-25 19:11:06 [15469] Scan canceled, message has already more than
10 points (19.551).

Not sure what the reference to 10 points is; it must be FuzzyOCR-related,
since SA gave the message a negative score.

David Goldsmith


- --
"My company will receive a ten fold return from the investment in SANS
Security training."
Cary Polk, Humanna Inc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3rc2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF4eJH417vU8/9QfkRAuQ4AJ9XJ4+RO6XE+TOJ65bhJREQJiu4bACeM1CC
uHOFkq6iV9z1dVH9VsBuvxA=
=9VDf
-----END PGP SIGNATURE-----

Re: Crooked JPG's not being recognized by FuzzyOCR?

Posted by snowcrash+spamassassin <sc...@gmail.com>.
> Do these hit for anyone else?

FWIW, it scores "6.000" for me:

2007-02-25 17:10:33 [21699] JPEG: [360x491] crookedjpg.jpg (55507)
2007-02-25 17:10:33 [21699] Found: 1 images
2007-02-25 17:10:33 [21699] Found JPEG header name="crookedjpg.jpg"
2007-02-25 17:10:33 [21699] Calculating image hash for:
/tmp/.spamassassin21699JCCwwAtmp/crookedjpg.jpg.pnm
2007-02-25 17:10:35 [21699] Scanset Order: ocrad(0) ocrad-invert(0)
ocrad-decolorize-invert(0) ocrad-decolorize(0) gocr(0) gocr-180(0)
2007-02-25 17:10:37 [21699] Scanset "ocrad" found word "target" with
fuzz of 0.1667
                      line: "r targe ln ss"
2007-02-25 17:10:37 [21699] Scanset "ocrad" found word "company" with
fuzz of 0.2857
                      line: "compnriewilo rerces inc coder oc  w e s  p k "
2007-02-25 17:10:37 [21699] Scanset "ocrad" found word "target" with
fuzz of 0.1667
                      line: "rtargelnss"
2007-02-25 17:10:38 [21699] Scanset "ocrad" found word "company" with
fuzz of 0.2857
                      line: "compnriewilorercesinccoderocwespk"
2007-02-25 17:10:45 [21699] Scanset "ocrad-decolorize" found word
"target" with fuzz of 0.1667
                      line: "r targe ln ss"
2007-02-25 17:10:46 [21699] Scanset "ocrad-decolorize" found word
"company" with fuzz of 0.2857
                      line: "compnriewilo rerces inc coder oc  w e s  p k "
2007-02-25 17:10:46 [21699] Scanset "ocrad-decolorize" found word
"target" with fuzz of 0.1667
                      line: "rtargelnss"
2007-02-25 17:10:46 [21699] Scanset "ocrad-decolorize" found word
"company" with fuzz of 0.2857
                      line: "compnriewilorercesinccoderocwespk"
2007-02-25 17:10:48 [21699] Scanset "gocr-180" found word "trade" with
fuzz of 0.2000
                      line: "clorgomadtpbak
tareniegtosglattntwnrdgmwnnilcinstruserbsyj abmricwargahjc
cnohwerwoitc   w rta k pi c "
2007-02-25 17:10:48 [21699] Scanset "gocr-180" found word "trade" with
fuzz of 0.2000
                      line:
"clorgomadtpbaktareniegtosglattntwnrdgmwnnilcinstruserbsyjabmricwargahjccnohwerwoitcwrtakpic"
2007-02-25 17:10:48 [21699] Message is spam, score = 6.000
2007-02-25 17:10:48 [21699] Adding Hash to
"/var/mail/spamassassin/local/FuzzyOcr.db" with score "6.000"
2007-02-25 17:10:48 [21699] Words found:
                      "target" in 1 lines
                      "company" in 1 lines
                      "target" in 1 lines
                      "company" in 1 lines
                      (4 word occurrences found)
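
A note on the "fuzz" figures in the ocrad hits in this thread: they are consistent with an edit distance divided by the length of the word being searched for (0/6 = 0.0000, 1/7 = 0.1429, 1/6 = 0.1667, 2/7 = 0.2857). That is an inference from the numbers, not a description of FuzzyOcr's internals. The Python sketch below uses a plain Levenshtein substring match as a stand-in; the function names are made up for illustration, and it is not claimed to reproduce every logged hit (the gocr-180 "trade" match, for instance, was not checked).

```python
def best_substring_distance(word, line):
    """Smallest Levenshtein distance between `word` and any substring of `line`.

    Standard approximate-substring dynamic programming: the first row is all
    zeros so a match may start at any position in the line, and the answer is
    the minimum of the last row so it may end at any position.
    """
    prev = [0] * (len(line) + 1)
    for i, wc in enumerate(word, start=1):
        cur = [i] + [0] * len(line)
        for j, lc in enumerate(line, start=1):
            cur[j] = min(
                prev[j] + 1,               # character of word missing from line
                cur[j - 1] + 1,            # extra character in line
                prev[j - 1] + (wc != lc),  # match or substitution
            )
        prev = cur
    return min(prev)


def fuzz(word, line):
    """Edit distance normalized by word length (assumed meaning of 'fuzz')."""
    return best_substring_distance(word, line) / len(word)


if __name__ == "__main__":
    samples = [
        ("target", " target kt ss"),
        ("company", "companrirwin resources inc c oder otc i w r s p k "),
        ("target", "r targe ln ss"),
        ("company", "compnriewilo rerces inc coder oc  w e s  p k "),
    ]
    for word, line in samples:
        print(f"{word!r}: fuzz {fuzz(word, line):.4f}")
```

Under this reading, a better OCR pass (Brian's "companrirwin") is one edit away from "company", while the noisier pass above ("compnriewilo") is two edits away, which matches the 0.1429 versus 0.2857 difference between the two logs.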

Re: Crooked JPG's not being recognized by FuzzyOCR?

Posted by qqqq <qq...@usermail.com>.
| Lately, I'm seeing JPG attachments that are 'crooked' (see 
| http://www.espphotography.com/crookedjpg.jpg ) . These aren't hitting 
| any points with FuzzyOCR.
| 
| Am I missing something? Do these hit for anyone else?

You're not alone.  I'm getting a Fuzzy score of 0.

QQQQ