Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/08/12 20:26:52 UTC
[Bug 3680] New: Empty A HREF tag obfuscations
http://bugzilla.spamassassin.org/show_bug.cgi?id=3680
Summary: Empty A HREF tag obfuscations
Product: Spamassassin
Version: 2.64
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: Rules
AssignedTo: spamassassin-dev@incubator.apache.org
ReportedBy: hsteoh@debian.org
Hi,
I've discovered another HTML obfuscation technique used in recent spam.
The following is a snippet from an HTML spam I received:
----BEGIN QUOTE----
<font face="verdana" size="+3">A permanent fix to Pe<a href></a>nis Enla<a
href></a>rgement</font>
----END QUOTE----
As you can see, the <a href></a> tag is being used to obfuscate key spamwords
in the message. Apparently, the spammers have caught on to the backhair.cf
SA ruleset, which catches the use of embedded non-existent tags to hide spam
words, so they are now using legal tags that have no effect instead.
As a pre-emptive measure, perhaps SA should strip out all empty tags (i.e.,
anything that looks like <tagname ...></tagname>, which contains nothing in
between), before passing it to the text analysis rules.
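To illustrate the stripping idea, here is a rough sketch in Python rather
than in SA's own Perl code. The name strip_empty_tags and the pattern are
hypothetical and are not part of SpamAssassin. The sketch also assumes that
every empty element can safely be dropped, which is not always true (an
empty <td></td> or <a name="x"></a> can be legitimate), so treat it as a
starting point only.

```python
import re

# Hypothetical pattern: an opening tag, optional whitespace, then the
# matching closing tag, e.g. <a href></a> or <b>  </b>.
EMPTY_TAG_RE = re.compile(
    r"<([a-z][a-z0-9]*)\b[^>]*>\s*</\1\s*>",
    re.IGNORECASE,
)


def strip_empty_tags(html: str) -> str:
    """Remove empty element pairs, repeating until nothing changes.

    Repeating handles nesting such as <i><u></u></i>, where removing the
    inner pair leaves a new empty outer pair.
    """
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_TAG_RE.sub("", html)
    return html


if __name__ == "__main__":
    # An invented fragment using the same trick as the quoted spam.
    print(strip_empty_tags("cli<a href></a>ck he<a\nhref></a>re"))
```

The text rules would then see the de-obfuscated words rather than the
fragments on either side of each empty tag.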
And specifically, there should be a rule to penalize the use of such tags.
Here are 2 rules I made to catch suspicious <A HREF> tags:
rawbody EMPTY_LINK_2 /\<a\b[^\>]*\bhref\s*([^=]|=\s*\"\s*\")/i
describe EMPTY_LINK_2 HTML anchor has empty or missing HREF value
rawbody EMPTY_LINK_3 /\<a\b[^\>]*\>\s*\<\/a\>/i
describe EMPTY_LINK_3 HTML anchor has empty text body
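As a quick check outside SA, the two patterns can be tried in Python. This
is only a sketch: the regexes are transliterated from the rules above with
the redundant backslashes dropped, the helper matching_rules is
hypothetical, and the sample is matched as one string, which may differ
from how SA feeds rawbody text to its rules.

```python
import re

# Transliterations of the two rawbody rules above (same logic, fewer escapes).
RULES = {
    "EMPTY_LINK_2": re.compile(r'<a\b[^>]*\bhref\s*([^=]|=\s*"\s*")', re.IGNORECASE),
    "EMPTY_LINK_3": re.compile(r"<a\b[^>]*>\s*</a>", re.IGNORECASE),
}


def matching_rules(text: str) -> list:
    """Return the names of the rules whose pattern occurs in text."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]


if __name__ == "__main__":
    samples = [
        "cli<a href></a>ck",                      # obfuscation as in the spam
        '<a href="http://example.com">site</a>',  # ordinary link
        '<a href="">x</a>',                       # empty HREF value, non-empty body
    ]
    for sample in samples:
        print(repr(sample), matching_rules(sample))
```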
I know we're trying to move away from rawbody tests, but I don't know SA's
HTML-parsing code well enough to write an eval test.