Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/02/23 01:22:19 UTC

[Bug 5349] New: scanning w/ v320-trunk shows diff/missing header displays in FuzzyOCR test output

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5349

           Summary: scanning w/ v320-trunk shows diff/missing header
                    displays in FuzzyOCR test output
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: spamassassin
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: schneecrash+apache@gmail.com


i can't get any discussion abt _whether_ this *is* a bug, so i'll simply file it
as one, and decide-by-discussion here ...

testing,

 spamassassin --version
   SpamAssassin version 3.2.0-pre1-r499012
     running on Perl version 5.8.8

& using with,

 FuzzyOCR 3.5.1

plugin.

two test cases,

  (1) spamassassin @ cmd_line
  (2) sent/recd email

show similar behavior of missing/truncated fuzzyocr headers, only in the case of
v320/trunk; v318/trunk is ok, and does not display this behavior.


case (1):

test with,

 spamassassin -D -t -x < /usr/ports/FuzzyOcr/samples/ocr-animated.eml

in 'verbose' fuzzyocr.log,

 ...
 2007-02-22 14:07:35 [6252] Found: 1 images
 2007-02-22 14:07:35 [6252] Found GIF header name="CIMG0980.gif"
 2007-02-22 14:07:36 [6252] Image is interlaced or animated...
 2007-02-22 14:07:36 [6252] File contains <7> images, deanimating...
 2007-02-22 14:07:37 [6252] Calculating image hash for: /tmp/.spamassassin6252Qdn9h3tmp/CIMG0980.gif.pnm
 2007-02-22 14:07:37 [6252] Updating Exact info File:'CIMG0980.gif' Type:'image/gif'
 2007-02-22 14:07:37 [6252] Found Score <15.500> for Exact Image Hash
 2007-02-22 14:07:37 [6252] Matched [1] time(s). Prev match:  15 min. 40 sec. ago
 2007-02-22 14:07:37 [6252] Message is SPAM. Words found:
             "investor" in 1 lines
             "price" in 2 lines
             "company" in 1 lines
             "alert" in 1 lines
             "valium" in 1 lines
             "trade" in 1 lines
             "banking" in 1 lines
             "news" in 1 lines
             (13.5 word occurrences found)

 %

but, at console, i _only_ see,

 ...
 Content analysis details:   (43.7 points, 5.0 required)

  pts rule name              description
 ---- ---------------------- --------------------------------------------------
  0.1 RDNS_NONE              Delivered to trusted network by a host with no rDNS
  4.5 HELO_LOCALHOST         HELO_LOCALHOST
  0.5 FH_MSGID_01C67         Special MSGID
  2.3 CTYPE_001C_A           CTYPE_001C_A
  1.7 OUTLOOK_3416           Claims to be sent by an unusual build of Outlook (3416)
  0.0 DK_POLICY_SIGNSOME     Domain Keys: policy says domain signs some mails
  3.3 DATE_IN_FUTURE_12_24   Date: is 12 to 24 hours after Received: date
  5.0 BOTNET                 Relay might be a spambot or virusbot
               [botnet0.7,ip=58.186.156.15,nordns]
  0.0 DKIM_POLICY_SIGNSOME   Domain Keys Identified Mail: policy says domain
               signs some mails
  0.0 BOTNET_NORDNS          Relay's IP address has no PTR record
               [botnet_nordns,ip=58.186.156.15]
  0.0 HTML_MESSAGE           BODY: HTML included in message
  1.9 TVD_VIS_HIDDEN         RAW: TVD_VIS_HIDDEN
  1.8 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
  1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
               above 50%
               [cf: 100]
  0.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
  1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
               above 50%
               [cf: 100]
  0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
               [cf: 100]
  1.4 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
  0.0 DIGEST_MULTIPLE        Message hits more than one network digest check
  3.6 XMAILER_MIMEOLE_OL_465CD XMAILER_MIMEOLE_OL_465CD
  1.9 HDR_ORDER_FTSDMCXX_001C Header order similar to spam (FTSDMCXX/MID
               variant)
  0.7 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
   11 FUZZY_OCR              BODY:
 %


NOTE, there's NO detail to the FUZZY_OCR header output.
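
to spot this symptom mechanically, here is a minimal Python sketch that scans a saved report and lists rules whose description is empty once an area prefix such as "BODY:" is removed. The function name, the regexes, and the list of area prefixes are assumptions for illustration, not part of SpamAssassin or FuzzyOCR, and only the rule's own line is inspected (continuation lines are ignored).

```python
import re

# A rule line: optional "*" prefix (X-Spam-Report style), score, rule name,
# then whatever description text follows on that same line.
RULE_LINE = re.compile(r"^[\s*]*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)(?:\s+(.*))?$")
# Area prefixes; BODY:, RAW: and FULL: appear in this report, the rest are assumed.
AREA_PREFIX = re.compile(r"^(?:BODY|RAW|FULL|URI|HEADER):\s*")


def rules_without_description(report_text):
    """Return (score, rule) pairs whose description is empty after the area prefix."""
    missing = []
    for line in report_text.splitlines():
        match = RULE_LINE.match(line)
        if not match:
            continue  # continuation lines, separators, blank lines
        score, rule, description = match.groups()
        if AREA_PREFIX.sub("", description or "").strip() == "":
            missing.append((float(score), rule))
    return missing


sample = """\
  0.0 HTML_MESSAGE           BODY: HTML included in message
  1.9 TVD_VIS_HIDDEN         RAW: TVD_VIS_HIDDEN
  0.7 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
   11 FUZZY_OCR              BODY:
"""
print(rules_without_description(sample))  # -> [(11.0, 'FUZZY_OCR')]
```

rules that merely repeat their own name as the description (e.g. TVD_VIS_HIDDEN) are not flagged; only a completely empty description is.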


case (2):

test, with a 'sent/recd' email, rather than just a file test @ cmd_line

similarly, with this image,

       http://img181.imageshack.us/img181/2156/spamsc2.gif

attached to an otherwise blank email, on receipt, i see in "FuzzyOCR.log",

 2007-02-22 14:22:57 [27803] Processing Message with ID "<11...@spamassassin_spamd_init>" (ignore@compiling.spamassassin.taint.org -> <no receipients>)
 2007-02-22 14:25:10 [6298] Processing Message with ID "<45...@gmail.com>" (SnowCrash <sc...@gmail.com> -> "SnowCrash" <sn...@mydomain.com>)
 2007-02-22 14:25:10 [6298] GIF: [320x512] spam.gif (10195)
 2007-02-22 14:25:10 [6298] Found: 1 images
 2007-02-22 14:25:10 [6298] Found GIF header name="spam.gif"
 2007-02-22 14:25:11 [6298] Image is single non-interlaced...
 2007-02-22 14:25:12 [6298] Calculating image hash for: /tmp/.spamassassin6298Zhf5nItmp/spam.gif.pnm
 2007-02-22 14:25:12 [6298] Scanset Order: ocrad(0) ocrad-invert(0) ocrad-decolorize-invert(0) ocrad-decolorize(0) gocr(0) gocr-180(0)
 2007-02-22 14:25:14 [6298] Scanset "ocrad" found word "target" with fuzz of 0.0000
     line: "target s"
 2007-02-22 14:25:14 [6298] Scanset "ocrad" found word "investor" with fuzz of 0.2500
     line: " fhe lncreasing inrest receilled br th liile gotwtg"
 2007-02-22 14:25:14 [6298] Scanset "ocrad" found word "breaking" with fuzz of 0.2500
     line: " fhe lncreasing inrest receilled br th liile gotwtg"
 2007-02-22 14:25:22 [6298] Scanset "ocrad-decolorize" found word "target" with fuzz of 0.0000
     line: "target s"
 2007-02-22 14:25:22 [6298] Scanset "ocrad-decolorize" found word "investor" with fuzz of 0.2500
     line: " fhe lncreasing inrest receilled br th liile gotwtg"
 2007-02-22 14:25:23 [6298] Scanset "ocrad-decolorize" found word "breaking" with fuzz of 0.2500
     line: " fhe lncreasing inrest receilled br th liile gotwtg"
 2007-02-22 14:25:23 [6298] Scanset "gocr" found word "erectile" with fuzz of 0.2500
     line: " e increasln ingrest receiled hr j lirg ne  t u t  "
 2007-02-22 14:25:23 [6298] Scanset "gocr" found word "target" with fuzz of 0.0000
     line: "target "
 2007-02-22 14:25:24 [6298] Scanset "gocr" found word "erectile" with fuzz of 0.2500
     line: "eincreaslningrestreceiledhrjlirgnetut"
 2007-02-22 14:25:24 [6298] Scanset "gocr" found word "buy" with fuzz of 0.0000
     line: "momemnsborqbuy"
 2007-02-22 14:25:24 [6298] Scanset "gocr" found word "target" with fuzz of 0.0000
     line: "target"
 2007-02-22 14:25:25 [6298] Scanset "gocr-180" found word "target" with fuzz of 0.0000
     line: "target "
 2007-02-22 14:25:26 [6298] Scanset "gocr-180" found word "buy" with fuzz of 0.0000
     line: "momemnsborqbuy"
 2007-02-22 14:25:26 [6298] Scanset "gocr-180" found word "target" with fuzz of 0.0000
     line: "target"
 2007-02-22 14:25:26 [6298] Message is spam, score = 9.500
 2007-02-22 14:25:26 [6298] Adding Hash to "/var/mail/spamassassin/local/FuzzyOcr.db" with score "9.500"
 2007-02-22 14:25:26 [6298] Words found:
             "erectile" in 1 lines
             "target" in 1 lines
             "erectile" in 1 lines
             "buy" in 1 lines
             "target" in 1 lines
             (7.5 word occurrences found)


in the rec'd message's header, i see only,

 ...
 X-Spam-Report:
   *  0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
   *  0.0 DK_POLICY_SIGNSOME Domain Keys: policy says domain signs some mails
   *  0.0 DKIM_POLICY_SIGNSOME Domain Keys Identified Mail: policy says domain
   *       signs some mails
   *  0.0 DK_SIGNED Domain Keys: message has a signature
   *  0.0 DKIM_SIGNED Domain Keys Identified Mail: message has a signature
   *  1.0 DC_IMG_TEXT_RATIO BODY: Low body to pixel area ratio
   *  0.0 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
   *      [score: 0.0002]
   *  2.2 TVD_SPACE_RATIO BODY: TVD_SPACE_RATIO
   *  1.2 SARE_GIF_ATTACH FULL: Email has a inline gif
   *  9.5 FUZZY_OCR BODY:
 ...


*again*, with no header 'detail' for the FUZZY_OCR BODY header :-/


since i'm seeing the same 'missing header' biz on both,

  (1) rec'd email proc'd via spamd running on my mailserver
  (2) test file submitted to spamassassin via cmd line,

and, differing behavior for sa v318 & v320, with the same version of
FuzzyOCR, i suspect this is a SA-related issue.
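
one way to narrow this down, assuming the same message is scanned under both versions and each report is saved as text: compare the per-rule descriptions and list only the rules whose text changed. This is a rough Python sketch; the helper names are made up for illustration, and the "old" description in the usage lines is a labeled placeholder, since no v318 output is included here.

```python
import re

# Rule line: optional "*" prefix, score, rule name, optional description text.
RULE_LINE = re.compile(r"^[\s*]*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)(?:\s+(.*))?$")


def descriptions(report_text):
    """Map rule name -> description text found on the rule's own line."""
    found = {}
    for line in report_text.splitlines():
        match = RULE_LINE.match(line)
        if match:
            found[match.group(2)] = (match.group(3) or "").strip()
    return found


def changed_descriptions(old_report, new_report):
    """Return {rule: (old, new)} for rules in both reports whose text differs."""
    old, new = descriptions(old_report), descriptions(new_report)
    return {
        rule: (old[rule], new[rule])
        for rule in sorted(old.keys() & new.keys())
        if old[rule] != new[rule]
    }


# Usage: the first string stands in for a v318 report (placeholder text only).
old_report = " 9.5 FUZZY_OCR BODY: <placeholder description>\n"
new_report = "   *  9.5 FUZZY_OCR BODY:\n"
print(changed_descriptions(old_report, new_report))
```

if FUZZY_OCR is the only rule that shows up, the difference is limited to how that plugin's description is reported, which is a narrower claim than "SA is broken" and easier to hand to either project.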



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5349] scanning w/ v320-trunk shows diff/missing header displays in FuzzyOCR test output

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5349


schneecrash+apache@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED




------- Additional Comments From schneecrash+apache@gmail.com  2007-02-22 17:07 -------
the issue occurs in the presence of sa v320, but not sa v318.  looks like an SA
issue to me.

as far as "complaining?"

you might want to reconsider *asking* people to provide feedback, file bugs, ask
on the list, etc if you consider this "complaining".  you had an opportunity to
participate/comment in irc channel and on the list ... you chose not to.

*now* you complain that i'm complaining?

fix it yourself, if you care.




[Bug 5349] scanning w/ v320-trunk shows diff/missing header displays in FuzzyOCR test output

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5349


schneecrash+apache@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CLOSED                      |REOPENED
         Resolution|INVALID                     |




------- Additional Comments From schneecrash+apache@gmail.com  2007-02-22 17:17 -------
.




[Bug 5349] scanning w/ v320-trunk shows diff/missing header displays in FuzzyOCR test output

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5349


felicity@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID




------- Additional Comments From felicity@apache.org  2007-02-22 16:28 -------
Since you're complaining about a third party plugin, we have no way to debug the
issue.  I'd talk to the author and have them do some testing.  If there's a SA
bug that can be demonstrated, the ticket can be reopened. :)




[Bug 5349] scanning w/ v320-trunk shows diff/missing header displays in FuzzyOCR test output

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5349


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |




------- Additional Comments From jm@jmason.org  2007-05-06 05:34 -------
so this is still an issue with released 3.2.0, I hear.  If someone from FuzzyOCR
could post the code they're using to generate multi-line reports, we may be able
to help...




[Bug 5349] scanning w/ v320-trunk shows diff/missing header displays in FuzzyOCR test output

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5349


schneecrash+apache@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |WONTFIX




------- Additional Comments From schneecrash+apache@gmail.com  2007-02-22 17:18 -------
.


