Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2017/12/10 19:27:46 UTC
[Bug 7519] New: BODY_SINGLE_WORD triggers on base64 encoded text
with more than one word.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7519
Bug ID: 7519
Summary: BODY_SINGLE_WORD triggers on base64 encoded text with
more than one word.
Product: Spamassassin
Version: 3.4.0
Hardware: Other
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Rules
Assignee: dev@spamassassin.apache.org
Reporter: mrl@psfc.mit.edu
Target Milestone: Undefined
Created attachment 5493
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5493&action=edit
Email shows problem.
See attachment. There is a paragraph of text in the following MIME
attachment, but it's triggering the "one word text message" rule.
Content-Type: text/plain;Name="text_0.txt";Charset="utf-8"
Content-Disposition: Attachment;Filename="text_0.txt";Charset="utf-8"
Content-Location: text_0.txt
Content-Transfer-Encoding: base64
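As a rough illustration of the reporter's point, the Python sketch below builds a part with the same header shape (no spaces after the semicolons, capitalised parameter names, base64 body) and decodes it with the standard-library `email` package. The sample paragraph is a made-up stand-in, since attachment 5493 is not reproduced here, and Python's parser is of course not the one SpamAssassin uses.

```python
import base64
from email import message_from_string

# Made-up stand-in for the attachment's text; attachment 5493 is not reproduced.
PARAGRAPH = "This is a short paragraph of ordinary text with several words."

# base64.b64encode does not wrap, so this is one long encoded line.
encoded = base64.b64encode(PARAGRAPH.encode("utf-8")).decode("ascii")

# Header shape copied from the report: no spaces after ';', capitalised params.
raw_part = (
    'Content-Type: text/plain;Name="text_0.txt";Charset="utf-8"\n'
    'Content-Disposition: Attachment;Filename="text_0.txt";Charset="utf-8"\n'
    "Content-Location: text_0.txt\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    f"{encoded}\n"
)

part = message_from_string(raw_part)
content_type = part.get_content_type()
charset = part.get_content_charset()  # parameter names are matched case-insensitively
text = part.get_payload(decode=True).decode(charset)

print(content_type, charset, len(text.split()))
```

A lenient parser recovers the whole multi-word paragraph even though the single encoded line is longer than MIME permits, which is the kind of tolerance Bill Cole describes in Comment #1.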
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7519] BODY_SINGLE_WORD triggers on base64 encoded text with
more than one word.
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7519
Bill Cole <sa...@billmail.scconsult.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |sa-bugz-20080315@billmail.scconsult.com
Status|NEW |RESOLVED
Resolution|--- |INVALID
--- Comment #1 from Bill Cole <sa...@billmail.scconsult.com> ---
That message is badly malformed. The Content-Type header is invalid (missing
spaces), there is no MIME-Version header, the Message-ID header is invalid
(missing angle brackets), and some of the putative MIME parts are improperly
encoded into lines an order of magnitude longer than MIME allows.
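Three of the defects listed here (no MIME-Version header, a Message-ID without angle brackets, over-long base64 lines) can be checked mechanically. The Python sketch below does so with the standard-library `email` parser; the helper name `mime_problems` and the rough Message-ID pattern are illustrative assumptions, not SpamAssassin's own checks. The 76-character figure is the limit RFC 2045 sets for base64-encoded lines.

```python
import base64
import re
from email import message_from_string
from typing import List

# RFC 2045, section 6.8: encoded lines must be no more than 76 characters.
MAX_BASE64_LINE = 76
# Rough shape check only, not a full RFC 5322 msg-id parser.
MSGID_RE = re.compile(r"^\s*<[^<>@\s]+@[^<>\s]+>\s*$")


def mime_problems(raw: str) -> List[str]:
    """Hypothetical helper reporting three of the defects named in Comment #1."""
    msg = message_from_string(raw)
    problems = []
    if msg.get("MIME-Version") is None:
        problems.append("missing MIME-Version header")
    msgid = msg.get("Message-ID")
    if msgid is not None and not MSGID_RE.match(msgid):
        problems.append("Message-ID not in <id@domain> form")
    for part in msg.walk():
        cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if part.is_multipart() or cte != "base64":
            continue
        # Inspect the raw (still encoded) payload line by line.
        if any(len(line) > MAX_BASE64_LINE for line in part.get_payload().splitlines()):
            problems.append("base64 line longer than 76 characters")
    return problems


# Resembles the report: no MIME-Version, bare Message-ID, one 160-character line.
bad = (
    "Message-ID: abc123@example.com\n"
    'Content-Type: text/plain;Charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n" + base64.b64encode(b"x" * 120).decode("ascii") + "\n"
)
# Same content, well formed; encodebytes wraps its output at 76 characters.
good = (
    "MIME-Version: 1.0\n"
    "Message-ID: <abc123@example.com>\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n" + base64.encodebytes(b"x" * 120).decode("ascii")
)
print(mime_problems(bad))
print(mime_problems(good))
```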
As a result, there is no formally correct way to parse this message. That any
software can make any sense of it is a tribute to how lenient mail software is.
It is unclear to me why it is hitting BODY_SINGLE_WORD, but it is also hitting
HTML_IMAGE_ONLY_20 and BODY_URI_ONLY incorrectly, and I expect that all of these
are due to SA being confused by the compound pathology of the message. Note
that the rules it correctly hits (BASE64_LENGTH_79_INF, BAYES_50,
MIME_HEADER_CTYPE_ONLY, MISSING_SUBJECT, and INVALID_MSGID) add up to 5.3, so
even if we figured out precisely how the 3 bogus hits happened and fixed that,
SA would (by default) still call it spam.
The "garbage in, garbage out" principle applies here. It is not a bug for
SpamAssassin to misparse a message that technically has no correct parsing.
[Bug 7519] BODY_SINGLE_WORD triggers on base64 encoded text with
more than one word.
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7519
Bill Cole <sa...@billmail.scconsult.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|INVALID |DUPLICATE
--- Comment #3 from Bill Cole <sa...@billmail.scconsult.com> ---
Yes, it's Bug #7219
*** This bug has been marked as a duplicate of bug 7219 ***
[Bug 7519] BODY_SINGLE_WORD triggers on base64 encoded text with
more than one word.
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7519
RW <rw...@googlemail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rwmaillists@googlemail.com
--- Comment #2 from RW <rw...@googlemail.com> ---
Actually, it is a bug that I pointed out some time ago; I don't recall the bug
number.
The problem is in
body __BODY_TEXT_LINE /^\s*\S/
tflags __BODY_TEXT_LINE multiple maxhits=3
The count usually includes the Subject line, but only if the header is present
and contains a non-space character.
In the attached email the multi-word paragraph is counted as if it were the
subject.
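A small Python model of the current counting makes the ambiguity concrete. It assumes, following RW's description, that the Subject is one of the candidate lines whenever the header exists, and that each rendered paragraph counts as a single line; the function name is hypothetical and this is a model, not SpamAssassin's implementation.

```python
import re
from typing import List, Optional

TEXT_LINE = re.compile(r"^\s*\S")  # same pattern as body __BODY_TEXT_LINE
MAXHITS = 3                        # from "multiple maxhits=3"


def body_text_line_current(subject: Optional[str], paragraphs: List[str]) -> int:
    """Hypothetical model of today's __BODY_TEXT_LINE count.

    A missing or blank Subject contributes nothing, because it never matches
    TEXT_LINE; each paragraph is treated as one candidate line.
    """
    candidates = ([subject] if subject is not None else []) + paragraphs
    hits = sum(1 for line in candidates if TEXT_LINE.match(line))
    return min(hits, MAXHITS)


paragraph = "A paragraph with many words in it."
with_subject = body_text_line_current("Meeting notes", [paragraph])
subject_only = body_text_line_current("Meeting notes", [])
no_subject = body_text_line_current(None, [paragraph])
print(with_subject, subject_only, no_subject)
```

The last two cases produce the same count, so any rule that assumes one counted line is the Subject treats the paragraph of a subject-less message as if it were the Subject.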
IMO it should be:
body __BODY_TEXT_LINE_FULL /^\s*\S/
tflags __BODY_TEXT_LINE_FULL multiple maxhits=3
header __SUBJECT_HAS_NON_SPACE Subject =~ /\S/
meta __BODY_TEXT_LINE __BODY_TEXT_LINE_FULL - __SUBJECT_HAS_NON_SPACE
The arithmetic for __BODY_SINGLE_WORD, __BODY_URI_ONLY and __EMPTY_BODY then
needs to be adjusted to account for __BODY_TEXT_LINE being one smaller whenever
the Subject contains a non-space character.
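The proposed rules can be modelled the same way. The Python sketch below computes __BODY_TEXT_LINE_FULL minus __SUBJECT_HAS_NON_SPACE, then applies a hypothetical "no body text" test twice: once with a threshold written for the old count, and once with that threshold lowered by one for the new count. The thresholds and function names are illustrative assumptions, not the real definitions of __BODY_SINGLE_WORD, __BODY_URI_ONLY or __EMPTY_BODY.

```python
import re
from typing import List, Optional

TEXT_LINE = re.compile(r"^\s*\S")  # body __BODY_TEXT_LINE_FULL /^\s*\S/
NON_SPACE = re.compile(r"\S")      # header __SUBJECT_HAS_NON_SPACE Subject =~ /\S/
MAXHITS = 3                        # multiple maxhits=3


def text_line_full(subject: Optional[str], paragraphs: List[str]) -> int:
    # Same count as today's rule: Subject (if present) plus paragraphs, capped.
    candidates = ([subject] if subject is not None else []) + paragraphs
    return min(sum(1 for c in candidates if TEXT_LINE.match(c)), MAXHITS)


def subject_has_non_space(subject: Optional[str]) -> int:
    # Modelled as 1 on a hit and 0 otherwise.
    return 1 if subject is not None and NON_SPACE.search(subject) else 0


def text_line_proposed(subject: Optional[str], paragraphs: List[str]) -> int:
    # meta __BODY_TEXT_LINE __BODY_TEXT_LINE_FULL - __SUBJECT_HAS_NON_SPACE
    return text_line_full(subject, paragraphs) - subject_has_non_space(subject)


def looks_empty_old(subject: Optional[str], paragraphs: List[str]) -> bool:
    # Hypothetical threshold written for the old count, which assumed one
    # counted line was always the Subject: "nothing beyond the Subject".
    return text_line_full(subject, paragraphs) < 2


def looks_empty_new(subject: Optional[str], paragraphs: List[str]) -> bool:
    # Same intent against the proposed count, with the threshold lowered by one.
    return text_line_proposed(subject, paragraphs) < 1


paragraph = "A paragraph with many words in it."
print(looks_empty_old(None, [paragraph]), looks_empty_new(None, [paragraph]))
```

Under this model the old-style test misfires only for the subject-less message, which is the case in this report. One side effect worth checking: because maxhits=3 caps the full count before the Subject is subtracted, a message with a Subject tops out at 2 counted body lines, while one without a Subject can reach 3.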