Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2010/02/01 15:50:11 UTC

[Bug 6317] New: Enhancement: include sender text in the message body so body and uri tests can scan it

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6317

           Summary: Enhancement: include sender text in the message body
                    so body and uri tests can scan it
           Product: Spamassassin
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: martin@gregorie.org


Some spam carries its payload as the sender's personal name. The rest of the
user-writable message, i.e. the subject line and the message body, is filled
with random gibberish. The sender text often contains a URL that can't be
recognised as a URL or processed as one without using an expensive raw scan.

If body and uri tests can be applied to this text the same way as they are to
the Subject header text we can easily write rules that fire on phrases and URLs
in the From: header without adding much overhead to the scanning process.
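To illustrate the request, here is a minimal Python sketch (not SpamAssassin
code) of phrase and URL tests applied to the personal-name part of the From:
header. The rule names and both patterns are hypothetical and exist only for
this example; real rules would be written in SpamAssassin's own rule syntax.

```python
import re
from email.utils import parseaddr

# Hypothetical rules for illustration only; not shipped SpamAssassin rules.
RULES = {
    # Text that looks like a URL: explicit scheme or a leading "www."
    "FROM_NAME_HAS_URI": re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE),
    # A few example payload words
    "FROM_NAME_DRUG_WORDS": re.compile(r"\b(?:pills|pharmacy|meds)\b", re.IGNORECASE),
}


def rules_hit(from_header):
    """Return the sorted names of the rules that match the sender's personal name."""
    name, _addr = parseaddr(from_header)  # split 'Name <addr>' into its parts
    return sorted(rule for rule, pattern in RULES.items() if pattern.search(name))
```

Only the personal name is tested, so a domain that appears in the address
part of the header does not trigger the URL rule.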

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6317] Enable URI testing in From: headers

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6317

Adam Katz <an...@khopis.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Enhancement: include sender |Enable URI testing in From:
                   |text in the message body so |headers
                   |body and uri tests can scan |
                   |it                          |

--- Comment #4 from Adam Katz <an...@khopis.com> 2010-02-01 15:08:59 UTC ---
Okay, let's separate the two bugs.

Bug 6315 primarily focuses on spammy text in the From field.
Old name: "New spam type with drugs promo in envelope From: string"
New name: "Detect spammy words like drug promos in From: headers"

Bug 6317 primarily focuses on uri patterns in the From field.
Old name: "Enhancement: include sender text in the message body so body and uri
tests can scan it"
New name: "Enable URI testing in From: headers"

That puts half of this bug's scope into bug 6315 instead.


[Bug 6317] Enhancement: include sender text in the message body so body and uri tests can scan it

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6317

--- Comment #3 from Karsten Bräckelmann <gu...@rudersport.de> 2010-02-01 11:07:47 UTC ---
While not the same request, bug 6315 is about the very same recent pattern as
discussed here. Candidate for DUPE.

Personally, while URIs in the From:name header are quite suspect on their own
(though I have seen them used in legit mail), I like the idea of
harvesting the URIs for URI DNSBL checks.


[Bug 6317] Enhancement: include sender text in the message body so body and uri tests can scan it

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6317

--- Comment #2 from Adam Katz <an...@khopis.com> 2010-02-01 10:51:16 UTC ---
(In reply to comment #1)
> Furthermore, URI detection for the From header may be a frivolous exercise, as
> my tests at http://ruleqa.spamassassin.org/?rule=/FROM_W&srcpath=khop seem to
> indicate that *any* URI in this location is itself a strong indicator of
> spam.  Further parsing is therefore unnecessary.

Sorry, that should begin with "Furthermore, URI *decoding* for the From header
may be a frivolous exercise," as my rules detect URIs and call it a done deal
without further investigation, and the numbers back them up.


[Bug 6317] Enhancement: include sender text in the message body so body and uri tests can scan it

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6317

Adam Katz <an...@khopis.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |antispam@khopis.com

--- Comment #1 from Adam Katz <an...@khopis.com> 2010-02-01 10:46:00 UTC ---
This stems from a list conversation archived at
http://old.nabble.com/forum/ViewPost.jtp?post=27384882&framed=y and my tests
were also mentioned in another thread from last week at
http://old.nabble.com/forum/ViewPost.jtp?post=27328212&framed=y

I'm not sure I agree with the full concept though, and I think my participatory
remarks may have been misread.

Bayesian rules already examine From and Subject fields in addition to the body,
and they rightly mark the collected words with the field name (e.g. "from:adam"
is a word plucked by Bayes when it sees "Adam Katz" in the From header, with
the colon being a forbidden character in standard word parsing.  This is not
necessarily the exact mechanism SA uses to delimit, but it is close.)
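A minimal Python sketch of the idea described above, assuming a simple
"field:word" prefix scheme. As noted, this is not necessarily how SA delimits
its tokens; the function name and the word pattern are hypothetical.

```python
import re
from typing import List, Optional

# Body words may contain letters, digits and apostrophes, but never a colon,
# so a prefixed header token cannot collide with a body token.
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str, field: Optional[str] = None) -> List[str]:
    """Split text into lowercase words, prefixed with the header name if given."""
    words = WORD_RE.findall(text.lower())
    if field is None:
        return words
    return [f"{field.lower()}:{word}" for word in words]
```

With this scheme, "Adam Katz" in the From header yields the tokens
"from:adam" and "from:katz", while the same words in the body yield "adam"
and "katz", so the two sources are scored separately.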

The topic that spurred this request was related to spamvertised websites that
appear in the From header rather than the body and thus are immune to SA's uri
detection.  Martin has abstracted this idea to all body tests, which may not be
as wise.

Furthermore, URI detection for the From header may be a frivolous exercise, as
my tests at http://ruleqa.spamassassin.org/?rule=/FROM_W&srcpath=khop seem to
indicate that *any* URI in this location is itself a strong indicator of
spam.  Further parsing is therefore unnecessary.

Publishing this rule with SA before legit mail starts adopting this practice
might deter that adoption.


[Bug 6317] Enable URI testing in From: headers

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6317

--- Comment #5 from Karsten Bräckelmann <gu...@rudersport.de> 2010-03-07 16:12:01 UTC ---
Created an attachment (id=4698)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4698)
Parse URIs in From:name, the real-name part of the From header.

Quick hack that enables parsing of URIs out of From:name. Ripped directly from
a live running 3.2 system.

Does not directly apply to 3.3 or trunk. The for loop in the function
_get_parsed_uri_list changed slightly in 3.3. Porting to trunk should be
straightforward, though.
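For readers without the attachment, the general idea can be sketched in
Python as follows. This is not the attached patch (which modifies
SpamAssassin's Perl code); the pattern and function names are hypothetical.
It pulls URI-like strings out of the real-name part of the From header and
reduces them to host names that could be handed to URI DNSBL checks.

```python
import re
from email.utils import parseaddr
from urllib.parse import urlsplit

# Hypothetical pattern: only explicit schemes and "www." prefixes are caught.
URI_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def uris_in_from_name(from_header):
    """Return URI-like strings found in the real-name part of a From header."""
    name, _addr = parseaddr(from_header)
    uris = []
    for match in URI_RE.finditer(name):
        uri = match.group(0).rstrip(".,;:!?)")  # drop trailing punctuation
        if not uri.lower().startswith(("http://", "https://")):
            uri = "http://" + uri  # give schemeless "www." hits a scheme
        uris.append(uri)
    return uris


def hosts_for_dnsbl(from_header):
    """Return the sorted, de-duplicated host names of those URIs."""
    hosts = set()
    for uri in uris_in_from_name(from_header):
        host = urlsplit(uri).hostname
        if host:
            hosts.add(host.lower())
    return sorted(hosts)
```

The DNSBL lookup itself is left out; the sketch only shows where the host
names would come from.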
