Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2017/12/10 19:27:46 UTC

[Bug 7519] New: BODY_SINGLE_WORD triggers on base64 encoded text with more than one word.

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7519

            Bug ID: 7519
           Summary: BODY_SINGLE_WORD triggers on base64 encoded text with
                    more than one word.
           Product: Spamassassin
           Version: 3.4.0
          Hardware: Other
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Rules
          Assignee: dev@spamassassin.apache.org
          Reporter: mrl@psfc.mit.edu
  Target Milestone: Undefined

Created attachment 5493
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5493&action=edit
Email shows problem.

See attachment. There is a paragraph of text in the following MIME
attachment, but it's triggering the "one word text message" rule.

Content-Type: text/plain;Name="text_0.txt";Charset="utf-8"
Content-Disposition: Attachment;Filename="text_0.txt";Charset="utf-8"
Content-Location: text_0.txt
Content-Transfer-Encoding: base64

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7519] BODY_SINGLE_WORD triggers on base64 encoded text with more than one word.


Bill Cole <sa...@billmail.scconsult.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sa-bugz-20080315@billmail.s
                   |                            |cconsult.com
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #1 from Bill Cole <sa...@billmail.scconsult.com> ---
That message is badly malformed. The Content-Type header is invalid (missing
spaces), there is no MIME-Version header, the Message-ID header is invalid
(missing angle brackets), and some of the putative MIME parts are improperly
encoded into lines an order of magnitude longer than MIME allows.
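
The line-length complaint can be checked mechanically. The following is a
minimal sketch, not part of SpamAssassin: it assumes the encoded part is
available as a string and applies the 76-character limit that RFC 2045 sets
for base64 lines. The function name is hypothetical.

```python
MAX_ENCODED_LINE = 76  # RFC 2045 limit for a base64-encoded line


def overlong_base64_lines(encoded_body, limit=MAX_ENCODED_LINE):
    """Return the 1-based numbers of encoded lines longer than `limit`."""
    return [
        number
        for number, line in enumerate(encoded_body.splitlines(), start=1)
        if len(line) > limit
    ]


# One conforming line followed by one that is ten times too long.
sample = "A" * 76 + "\n" + "B" * 760
print(overlong_base64_lines(sample))  # prints [2]
```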

As a result, there is no formally correct way to parse this message. That any
software can make any sense of it is a tribute to how lenient mail software is.
It is unclear to me why it is hitting BODY_SINGLE_WORD but it is also hitting
HTML_IMAGE_ONLY_20 and BODY_URI_ONLY incorrectly and I expect that all of these
are due to SA being confused by the compound pathology of the message. Note
that the rules it correctly hits (BASE64_LENGTH_79_INF, BAYES_50,
MIME_HEADER_CTYPE_ONLY, MISSING_SUBJECT, and INVALID_MSGID) add up to 5.3, so
even if we figured out precisely how the 3 bogus hits happened and fixed that,
SA would (by default) still call it spam.

The "garbage in, garbage out" principle applies here. It is not a bug for
SpamAssassin to misparse a message that technically has no correct parsing.


[Bug 7519] BODY_SINGLE_WORD triggers on base64 encoded text with more than one word.


Bill Cole <sa...@billmail.scconsult.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |DUPLICATE

--- Comment #3 from Bill Cole <sa...@billmail.scconsult.com> ---
Yes, it's Bug #7219

*** This bug has been marked as a duplicate of bug 7219 ***


[Bug 7519] BODY_SINGLE_WORD triggers on base64 encoded text with more than one word.


RW <rw...@googlemail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rwmaillists@googlemail.com

--- Comment #2 from RW <rw...@googlemail.com> ---
Actually, it is a bug that I pointed out some time ago; I don't recall the bug
number.

The problem is in 

body   __BODY_TEXT_LINE     /^\s*\S/
tflags __BODY_TEXT_LINE     multiple maxhits=3

The count usually includes the Subject line, but only if the header is present
and contains a non-space character.

In the attached email the multi-word paragraph is counted as if it were the
subject. 
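
The ambiguity can be illustrated with a small model. This is a sketch under
stated assumptions, not SpamAssassin's code: it assumes a Subject containing a
non-space character is prepended to the text the body rule sees, and that hits
are capped at maxhits=3. The function name is hypothetical.

```python
import re

# Same pattern as the __BODY_TEXT_LINE rule: a line with a non-space character.
NON_SPACE_LINE = re.compile(r"^\s*\S")


def body_text_line_hits(subject, paragraphs, maxhits=3):
    """Count matching lines, capped at maxhits.

    Assumption: the Subject is counted as the first line, but only when the
    header is present and contains a non-space character.
    """
    lines = list(paragraphs)
    if subject is not None and re.search(r"\S", subject):
        lines.insert(0, subject)
    hits = sum(1 for line in lines if NON_SPACE_LINE.match(line))
    return min(hits, maxhits)


# A message with only a Subject and a message with only a body paragraph
# produce the same count, so the rule cannot tell them apart.
print(body_text_line_hits("Hello", []))                            # prints 1
print(body_text_line_hits(None, ["A whole paragraph of text."]))   # prints 1
```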

IMO it should be 

body   __BODY_TEXT_LINE_FULL    /^\s*\S/
tflags __BODY_TEXT_LINE_FULL    multiple maxhits=3

header __SUBJECT_HAS_NON_SPACE  Subject =~ /\S/

meta   __BODY_TEXT_LINE         __BODY_TEXT_LINE_FULL - __SUBJECT_HAS_NON_SPACE


The arithmetic for __BODY_SINGLE_WORD, __BODY_URI_ONLY and __EMPTY_BODY then
needs to be adjusted for __BODY_TEXT_LINE being one smaller.
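
The effect of the subtraction can be modelled as follows. This is a sketch
only: the Python function names are hypothetical stand-ins for the proposed
rules, and it assumes the Subject, when it has a non-space character, is
counted as one line by the body rule, with hits capped at maxhits=3.

```python
import re

MAXHITS = 3  # mirrors maxhits=3 on the body rule


def body_text_line_full(subject, paragraphs, maxhits=MAXHITS):
    """Model of __BODY_TEXT_LINE_FULL: Subject plus body lines, capped."""
    lines = ([subject] if subject is not None else []) + list(paragraphs)
    hits = sum(1 for line in lines if re.match(r"^\s*\S", line))
    return min(hits, maxhits)


def subject_has_non_space(subject):
    """Model of __SUBJECT_HAS_NON_SPACE: 1 if the header has a non-space."""
    return 1 if subject is not None and re.search(r"\S", subject) else 0


def body_text_line(subject, paragraphs):
    """Model of the proposed meta rule: the full count minus the Subject."""
    return body_text_line_full(subject, paragraphs) - subject_has_non_space(subject)


print(body_text_line("Hello", []))                 # prints 0
print(body_text_line(None, ["A paragraph."]))      # prints 1
print(body_text_line("Hello", ["A paragraph."]))   # prints 1
```

In this model, a condition that previously meant "Subject plus one body line"
(a count of 2) would compare against 1 instead. That threshold is
illustrative and not taken from the actual rule definitions. Note also that
when a Subject is present the adjusted count saturates at 2 rather than 3,
because the cap is applied before the subtraction.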
