Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/01/15 20:59:01 UTC
[Bug 2931] New: HTML font matching
http://bugzilla.spamassassin.org/show_bug.cgi?id=2931
Summary: HTML font matching
Product: Spamassassin
Version: unspecified
Platform: Other
OS/Version: other
Status: NEW
Severity: normal
Priority: P5
Component: Rules
AssignedTo: spamassassin-dev@incubator.apache.org
ReportedBy: robert@accettura.com
A new tactic used by spammers is to use HTML to embed spam words in a normal
article. Something like:
<font>he short drive <B>BUY</B>begins a trek that could take the craft to a
variety of sites of scientifi<B>VIAGRA</B>c interest during the next three
months, including shallow depressions and nearby hills that it observed in
earlier photos.The successful rolloff by Spirit, which came almost two weeks
after its risky landing in Gusev Crater <B>TODAY</B>near the Martian equator,
left mission controllers at NASA's Jet Propulsion Laboratory ecstatic.</font>
The aim is to trip up the Bayesian filter and prevent detection.
My proposal is this:
Extract words according to their font description and group them together.
In the above testcase, all the bold words (<B>) would be put together:
BUY VIAGRA TODAY
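A rough sketch of the grouping step, to make the idea concrete. It is written
in Python rather than SpamAssassin's Perl and is not taken from any existing
SpamAssassin code. The names FontGrouper, group_by_font and STYLE_TAGS are
made up for illustration, and "font description" is assumed here to mean
nothing more than the stack of open style tags (attributes and CSS are
ignored):

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical set of tags treated as part of a "font description".
STYLE_TAGS = {"b", "i", "u", "font", "strong", "em"}


class FontGrouper(HTMLParser):
    """Collect text runs keyed by the stack of style tags open around them."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open style tags
        self.groups = defaultdict(list)  # style key -> list of text runs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <B> arrives as "b".
        if tag in STYLE_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching tag (and anything left open
        # inside it); stray end tags are ignored.
        if tag in self.stack:
            idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[idx:]

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.groups[tuple(self.stack)].append(text)


def group_by_font(html):
    """Return {style-tag tuple: text runs joined by spaces}."""
    parser = FontGrouper()
    parser.feed(html)
    parser.close()
    return {key: " ".join(runs) for key, runs in parser.groups.items()}


sample = (
    "<font>he short drive <B>BUY</B>begins a trek to sites of "
    "scientifi<B>VIAGRA</B>c interest <B>TODAY</B>near the equator.</font>"
)
groups = group_by_font(sample)
print(groups[("font", "b")])
```

On the shortened testcase this collects the three bold runs under the key
("font", "b"), separate from the surrounding article text. Fragments of a
word that was split by a tag (scientifi / c) are not rejoined by this sketch.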
It would need to be somewhat advanced to truly perform this task:
be aware of CSS, and know that, for example:
#fff = #ffffff = rgb(255,255,255) = rgb(100%,100%,100%)
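The color equivalence above could be handled by reducing each notation to one
canonical form before comparing. A minimal sketch in Python, assuming only
the four notations listed (3-digit hex, 6-digit hex, rgb() with integers,
rgb() with percentages); named colors and anything else are not handled, and
normalize_color is a hypothetical helper name:

```python
import re

# One rgb() channel: an integer/decimal, optionally followed by '%'.
_CHANNEL = r"(\d+(?:\.\d+)?%?)"
_RGB_RE = re.compile(
    r"rgb\(\s*" + _CHANNEL + r"\s*,\s*" + _CHANNEL + r"\s*,\s*" + _CHANNEL + r"\s*\)"
)


def _channel(value):
    """Convert one rgb() channel to an int clamped to 0..255."""
    if value.endswith("%"):
        n = round(float(value[:-1]) * 255 / 100)
    else:
        n = int(float(value))
    return max(0, min(255, n))


def normalize_color(value):
    """Return an (r, g, b) tuple, or None if the notation is not recognized."""
    s = value.strip().lower()
    if s.startswith("#"):
        digits = s[1:]
        if len(digits) == 3:
            # #fff is shorthand for #ffffff: double each digit.
            digits = "".join(ch * 2 for ch in digits)
        if len(digits) == 6 and all(ch in "0123456789abcdef" for ch in digits):
            return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
        return None
    match = _RGB_RE.fullmatch(s)
    if match:
        return tuple(_channel(g) for g in match.groups())
    return None


forms = ["#fff", "#ffffff", "rgb(255,255,255)", "rgb(100%,100%,100%)"]
print({normalize_color(f) for f in forms})
```

All four forms reduce to the same tuple, so text styled with any of them
would land in the same group.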
Still, this method could prove successful in helping to eliminate this spamming
tactic. With the text ordered by font description, the Bayesian filter would no
longer be vulnerable to learning bogus data.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.