Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2020/10/26 01:45:41 UTC
[Bug 7866] New: TextCat: Improper language classification on URIs in
plain/text
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7866
Bug ID: 7866
Summary: TextCat: Improper language classification on URIs in
plain/text
Product: Spamassassin
Version: 3.4.4
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Plugins
Assignee: dev@spamassassin.apache.org
Reporter: jad@aesir.com
Target Milestone: Undefined
TextCat can misclassify text that includes a URI in the plain-text part of
a message. Here is a sample that was tagged as UNWANTED_BODY_TEXT (classified
as sk and cs in this example):
-------
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
A Weekly Review from AWS
Featured Announcements
Amazon Aurora enables dynamic resizing for database storage space =
=
=20
<https://email.awscloud.com/dc/KwqiTCOQ16Q1JCi3MdelD5Wf1a0xWVUzLJ8TEakWNHdl=
A7N3nIa2aQWviXRrQW0g0Nzk3qf9Jbd_7Br-VcC96_vVOrK4bJqlew1KGbdQmMLIlhsNLtVFFTo=
o0oG_f9iDbFtXhfHuZSrhIpoERCR4a4jOBbGd629KotGGay-7-sKFDTCWVGisnhbxOeaG-rBvct=
WHIpIaAIuHyhj21BdtQbvqu9vEkOLb4i9f5WJzjdvSttMrYY5mQQiiAxDzWx90K16R7A5hk3kuc=
4mmg5ogqliI9wKd7lBG1qX0Uis2H9tTvIbdKJhEU2XcxTXIVK0l2bb1qlYvipE7NL9dS516_m6n=
76Y0b_DoQp07kQfyQE3Cm-s5tpwt4oOzzjMzZvKLprHcLw3Lb7I6Pp_5WIyD-ze1ZmT5cFKEF7D=
C_c-BH24c5m2mByYrBLRlvBsCPRNQhAPdJZi-geOf6Jf9J_oeDFtxQwo1yU94FVAb3AB9MBU=3D=
/hOoW4pZ0lT000kthk000MCE>
--------
A fix would be to strip URIs from the body text before classification.
my $body = $msg->get_rendered_body_text_array();
$body = join("\n", @{$body});
$body =~ s/^Subject://i;
# Proposed fix: strip http/https URIs so that their opaque tokens are not
# fed to the language classifier.
$body =~ s{https?://\S+}{}g;
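
The effect of the proposed substitution can be checked outside SpamAssassin
with a minimal standalone sketch. The helper name strip_uris and the shortened
sample body (including the example.com host) are hypothetical and used only
for illustration; the pattern is the one proposed above, not necessarily the
one that was committed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of SpamAssassin: remove http/https URIs
# from a block of text using the pattern proposed in this report.
sub strip_uris {
    my ($text) = @_;
    $text =~ s{https?://\S+}{}g;   # a URI ends at the first whitespace
    return $text;
}

# Shortened stand-in for an already decoded message body.
# The host name and tracking token are made up.
my $body = join("\n",
    "A Weekly Review from AWS",
    "Featured Announcements",
    "<https://email.example.com/dc/KwqiTCOQ16Q1JCi3MdelD5Wf1a0xWVUz>",
);

print strip_uris($body), "\n";
```

Because \S+ runs to the next whitespace, the closing ">" is removed together
with the URI, while the opening "<" is left behind on its own line.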
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7866] TextCat: Improper language classification on URIs in
plain/text
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7866
Giovanni Bechis <gi...@paclan.it> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |giovanni@paclan.it
             Status|NEW                         |RESOLVED
--- Comment #1 from Giovanni Bechis <gi...@paclan.it> ---
A similar fix was already present in trunk; it has now been backported to the
3.4 tree in r1883069.