Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/01/04 20:18:36 UTC

[Bug 2878] Identify when plain text and HTML are different in multipart/alternative

http://bugzilla.spamassassin.org/show_bug.cgi?id=2878

niels@fabel.dk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |niels@fabel.dk



------- Additional Comments From niels@fabel.dk  2004-01-04 10:23 -------
Is SA able to produce a text version of the HTML part? It only needs to perform
well on non-spam.

Then you have the problem of determining if the two texts are different.

That problem can be solved with the Levenshtein distance, which is described at
http://www.merriampark.com/ld.htm

It tells you the minimum number of single-character insertions, deletions and
substitutions needed to make the two texts equal.

The algorithm takes O(n^2) time for two texts of length n (O(n*m) for lengths n
and m), so it should probably only be run on the first few KB of the mail.
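
To make the idea concrete, here is a rough sketch in Python (not SA code, and
not a proposed patch). It computes the distance with the usual two-row dynamic
programming table and compares only the first part of each text. The names
levenshtein and parts_differ, the 2048-character limit and the 0.3 threshold
are all made up for illustration, and the HTML part is assumed to have been
converted to text already.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter text in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def parts_differ(plain: str, html_text: str,
                 limit: int = 2048, threshold: float = 0.3) -> bool:
    """Hypothetical check: do the two parts differ by more than `threshold`?

    Whitespace is collapsed and only the first `limit` characters are
    compared, because the distance costs O(n*m) time.
    """
    a = " ".join(plain.split())[:limit]
    b = " ".join(html_text.split())[:limit]
    longest = max(len(a), len(b))
    if longest == 0:
        return False  # two empty parts are trivially equal
    return levenshtein(a, b) / longest > threshold
```

Dividing by the longer length gives a ratio between 0 and 1, so one threshold
can be used for short and long mails. A sensible threshold would have to be
found by testing against real mail.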



