Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/01/15 20:42:34 UTC
[Bug 2929] New: Suggested rule - filtering out invalid HTML tags
http://bugzilla.spamassassin.org/show_bug.cgi?id=2929
Summary: Suggested rule - filtering out invalid HTML tags
Product: Spamassassin
Version: 2.61
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: Rules
AssignedTo: spamassassin-dev@incubator.apache.org
ReportedBy: seann@herdejurgen.com
A number of spam e-mails I receive contain invalid HTML tags. For example:
We</defensible> be</squashy>lie</thou>ve</eigenspace> orde</tercel>ring me=
</xavier>dication should be</glen> as simple</bedimming> as orde</bellini>=
ring anything e</ersatz>lse</cypriot> on the</postfix> Inte</priscilla>rne=
Since there are only about 100 valid HTML tags, you could check every tag of
the form </TAG> and see whether it is valid. If the percentage of invalid
tags exceeds some threshold, say 50%, the rule would fire.
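A rough sketch of that check, written in Python rather than as an actual SpamAssassin rule. The whitelist VALID_TAGS is a hypothetical, heavily abbreviated stand-in for the full list of valid tags, and the function names are made up for illustration:

```python
import re

# Hypothetical, abbreviated whitelist; a real rule would list every valid HTML tag.
VALID_TAGS = {"a", "b", "body", "br", "div", "font", "html", "i",
              "p", "span", "table", "td", "tr"}

# Matches closing tags of the form </TAG> and captures the tag name.
CLOSING_TAG = re.compile(r"</\s*([A-Za-z][A-Za-z0-9]*)\s*>")

def invalid_closing_tag_ratio(html):
    """Return the fraction of closing tags whose name is not in VALID_TAGS."""
    names = [m.group(1).lower() for m in CLOSING_TAG.finditer(html)]
    if not names:
        return 0.0  # no closing tags at all, so nothing to judge
    invalid = sum(1 for name in names if name not in VALID_TAGS)
    return invalid / len(names)

def rule_fires(html, threshold=0.5):
    """True when more than `threshold` of the closing tags are invalid."""
    return invalid_closing_tag_ratio(html) > threshold

sample = "We</defensible> be</squashy>lie</thou>ve</eigenspace> orde</tercel>ring"
print(rule_fires(sample))                       # every closing tag is bogus
print(rule_fires("<p>Hello <b>world</b></p>"))  # only valid closing tags
```

The 50% threshold is just the figure suggested above; it would need tuning against real mail.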
Another possible rule would be to 'de-html' a message before checking for
words. I have a short dehtml script written in Perl here:
undef $/;  # slurp whole input so tags that span lines are matched too
while (<>) { s/<.*?>//gs; print; }
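To show what de-HTMLing does to the example quoted earlier, here is a small Python sketch. It assumes the body is still quoted-printable encoded, so it joins the soft line breaks (a trailing "=") before stripping tags; it does not decode other quoted-printable escapes such as =3D. The function name dehtml is only illustrative:

```python
import re

def dehtml(text):
    """Strip anything that looks like an HTML tag from text."""
    # Join quoted-printable soft line breaks ("=" at the very end of a line).
    joined = re.sub(r"=\r?\n", "", text)
    # Remove tags non-greedily; DOTALL lets a tag span a line break.
    return re.sub(r"<.*?>", "", joined, flags=re.DOTALL)

sample = (
    "We</defensible> be</squashy>lie</thou>ve</eigenspace> orde</tercel>ring me=\n"
    "</xavier>dication should be</glen> as simple</bedimming> as orde</bellini>=\n"
    "ring anything e</ersatz>lse</cypriot> on the</postfix> Inte</priscilla>rne=\n"
)
print(dehtml(sample))
```

On the sample this recovers the readable text "We believe ordering medication should be as simple as ordering anything else on the Interne", which ordinary word-based rules could then match.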