Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/01/04 20:18:36 UTC
[Bug 2878] Identify when plain text and HTML are different in multipart/alternative
http://bugzilla.spamassassin.org/show_bug.cgi?id=2878
niels@fabel.dk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |niels@fabel.dk
------- Additional Comments From niels@fabel.dk 2004-01-04 10:23 -------
Is SA able to make a text version of the HTML part? It only needs to perform well
on non-spam.
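For illustration only, here is a minimal sketch of pulling the visible text out of an HTML part with Python's standard `html.parser` module. It is not how SpamAssassin does it. The class `TextExtractor`, the helper `html_to_text`, and the choice of which tags to skip or treat as block breaks are all assumptions made for this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text a reader would see, ignoring
    the contents of <script>, <style> and <title>."""

    SKIP = {"script", "style", "title"}
    # Tags that should separate words, so "<p>a</p><p>b</p>" becomes "a b".
    BLOCK = {"p", "br", "div", "li", "tr", "td", "h1", "h2", "h3", "hr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; -> & etc.
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of `html` with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join("".join(parser.parts).split())
```

Collapsing whitespace makes the result easier to compare against the text/plain part, where line wrapping usually differs from the HTML layout.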
Then you have the problem of determining if the two texts are different.
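That problem can be solved with the Levenshtein distance, which is described at
http://www.merriampark.com/ld.htm
It tells you how much you have to change to make the texts equal: the minimum
number of single-character insertions, deletions and substitutions needed to
turn one text into the other.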
The algorithm is O(n^2) in time (O(n*m) for texts of lengths n and m), so it
should probably only be run on the first few KB of the mail, or something along
those lines.
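A minimal sketch of the idea, assuming both parts are already available as plain strings: the classic dynamic-programming Levenshtein distance, plus a hypothetical `parts_differ` check that only looks at a prefix of each text. The 2048-character limit and the 0.3 threshold are illustrative guesses, not tuned values, and normalizing by the longer length is just one possible choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b. Keeps only two DP rows, so memory is
    O(min(len(a), len(b))) while time stays O(len(a) * len(b))."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row short
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,         # delete ca
                           cur[j - 1] + 1,      # insert cb
                           prev[j - 1] + cost))  # substitute (or match)
        prev = cur
    return prev[-1]


def parts_differ(plain: str, html_text: str,
                 limit: int = 2048, threshold: float = 0.3) -> bool:
    """Hypothetical check: compare only the first `limit` characters of
    each part (whitespace collapsed) and report whether the distance,
    divided by the longer length, exceeds `threshold`."""
    a = " ".join(plain.split())[:limit]
    b = " ".join(html_text.split())[:limit]
    if not a and not b:
        return False
    return levenshtein(a, b) / max(len(a), len(b)) > threshold
```

Capping both inputs at `limit` bounds the work at roughly limit * limit cell updates per message, whatever the size of the mail.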