Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/01/15 20:59:01 UTC

[Bug 2931] New: HTML font matching

http://bugzilla.spamassassin.org/show_bug.cgi?id=2931

           Summary: HTML font matching
           Product: Spamassassin
           Version: unspecified
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Rules
        AssignedTo: spamassassin-dev@incubator.apache.org
        ReportedBy: robert@accettura.com


A new tactic used by spammers is to send HTML mail with the spam embedded in an
otherwise normal article.  Something like:

<font>he short drive <B>BUY</B>begins a trek that could take the craft to a
variety of sites of scientifi<B>VIAGRA</B>c interest during the next three
months, including shallow depressions and nearby hills that it observed in
earlier photos.The successful rolloff by Spirit, which came almost two weeks
after its risky landing in Gusev Crater <B>TODAY</B>near the Martian equator,
left mission controllers at NASA's Jet Propulsion Laboratory ecstatic.</font>

The aim is to trip up the Bayesian filter and avoid detection.

My proposal is this:

Extract words and group them according to their font description.

In the above testcase, all the bold words (<B>) would be put together:
BUY VIAGRA TODAY
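
A rough sketch of that grouping step, in Python rather than SpamAssassin's own
Perl, and only as an illustration of the idea. The class name StyleGrouper, the
function group_by_style, and the short list of styling tags are all made up for
this example; a real rule would also have to look at <font> attributes and CSS.

```python
from collections import defaultdict
from html.parser import HTMLParser


class StyleGrouper(HTMLParser):
    """Collect text runs keyed by the set of styling tags enclosing them."""

    # Illustrative subset only; not a complete list of styling tags.
    STYLE_TAGS = {"b", "strong", "i", "em", "u"}

    def __init__(self):
        super().__init__()
        self.open_tags = []              # currently open styling tags
        self.groups = defaultdict(list)  # style key -> list of text runs

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag names already lowercased.
        if tag in self.STYLE_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag; ignore stray closers,
        # since spam HTML is often malformed.
        if tag in self.open_tags:
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        text = data.strip()
        if text:
            key = "+".join(sorted(set(self.open_tags))) or "plain"
            self.groups[key].append(text)


def group_by_style(html):
    """Return {style key: text runs joined by single spaces}."""
    parser = StyleGrouper()
    parser.feed(html)
    parser.close()
    return {key: " ".join(runs) for key, runs in parser.groups.items()}


if __name__ == "__main__":
    sample = ("<font>he short drive <B>BUY</B>begins a trek "
              "scientifi<B>VIAGRA</B>c interest <B>TODAY</B>near</font>")
    for style, text in group_by_style(sample).items():
        print(style, "->", text)
```

On the testcase above, the "b" group comes out as BUY VIAGRA TODAY. Note that
this sketch does not rejoin a carrier word that was split around an inserted
word (scientifi / c); that would need extra handling.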

It would need to be somewhat advanced to truly perform this task: it would have
to be aware of CSS, and know that, for example:
#fff = #ffffff = rgb(255,255,255) = rgb(100%,100%,100%)
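
To make the equivalence concrete, here is a minimal sketch (again in Python,
with the hypothetical function name normalize_color) that reduces the four
notations above to one (r, g, b) tuple so they compare equal. It covers only
those notations; named colors and other CSS forms are deliberately left out.

```python
import re

_RGB_RE = re.compile(
    r"rgb\(\s*(\d+)(%?)\s*,\s*(\d+)(%?)\s*,\s*(\d+)(%?)\s*\)"
)


def normalize_color(value):
    """Return an (r, g, b) tuple of 0-255 ints, or None if not recognized."""
    v = value.strip().lower()

    # Short hex form: each digit is doubled, so #fff means #ffffff.
    m = re.fullmatch(r"#([0-9a-f]{3})", v)
    if m:
        return tuple(int(ch * 2, 16) for ch in m.group(1))

    # Long hex form: two digits per channel.
    m = re.fullmatch(r"#([0-9a-f]{6})", v)
    if m:
        h = m.group(1)
        return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

    # rgb() with either three integers or three percentages.
    m = _RGB_RE.fullmatch(v)
    if m:
        parts = [(m.group(i), m.group(i + 1)) for i in (1, 3, 5)]
        if len({pct for _, pct in parts}) != 1:
            return None  # mixed integers and percentages: treat as unknown
        channels = []
        for num, pct in parts:
            n = int(num)
            if pct:
                n = round(n * 255 / 100)
            channels.append(max(0, min(255, n)))  # clamp out-of-range values
        return tuple(channels)

    return None


if __name__ == "__main__":
    for spec in ("#fff", "#ffffff", "rgb(255,255,255)", "rgb(100%,100%,100%)"):
        print(spec, "->", normalize_color(spec))
```

All four inputs in the loop map to (255, 255, 255), so text in any of them
would land in the same font-description group.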

But this method could prove successful in helping to eliminate this spamming
tactic.  By grouping the text based on font description, the Bayesian filter
would no longer be vulnerable to learning bogus data.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.