Posted to commits@spamassassin.apache.org by sp...@incubator.apache.org on 2004/08/01 00:13:01 UTC
[SpamAssassin Wiki] Updated: CeasNotesJustin
Date: 2004-07-31T15:13:00
Editor: JustinMason <jm...@jmason.org>
Wiki: SpamAssassin Wiki
Page: CeasNotesJustin
URL: http://wiki.apache.org/spamassassin/CeasNotesJustin
no comment
Change Log:
------------------------------------------------------------------------------
@@ -296,4 +296,37 @@
* q from John Levine: TurnTide does exactly this technique by narrowing the TCP window on the spammer's connections.
 * q: why not just use delayed ACKs? a: because it's not quite as effective as the other techniques
+AOL hashing:
+
+ * I-Match: large corpus; lexicon generation
+ * intersection of document and lexicon gives signature
+ * traditional I-Match lexicon generation: reject very frequent terms and hapaxes (terms that occur only once)
+ * use "Mutual Information" as a measurement of fitness to avoid overlapping rules
+ * use multiple lexicons to keep randomization from having an effect
+ * generate multiple lexicons, by removing random entries from an original lexicon
+ * also: distributional word clustering (Information Bottleneck) for lexicon selection (clusters terms with similar class distributions, P(spam|term))
+ * q: "'cluster' selection" -- is that reports from live users? yep
+ * q: "FP rate?" a: very very low
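
A minimal sketch of the I-Match idea noted above, not AOL's actual implementation: the signature is a hash of the document's terms that also appear in the lexicon, and extra lexicons are made by dropping random entries from the original. All function names, the whitespace tokenizer, the SHA-1 hash, the frequency cutoff, and the choice to treat a term seen in only one document as a hapax are assumptions made for illustration.

```python
import hashlib
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for a real tokenizer.
    return text.lower().split()


def build_lexicon(corpus, max_doc_fraction=0.5):
    """Keep terms that are neither very frequent nor hapaxes.

    Assumption: frequency is measured per document, so a term seen in
    only one document is treated as a hapax.
    """
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    limit = max_doc_fraction * len(corpus)
    return {term for term, df in doc_freq.items() if 1 < df <= limit}


def signature(text, lexicon):
    """Hash the sorted intersection of the document's terms and the lexicon."""
    terms = sorted(set(tokenize(text)) & lexicon)
    if not terms:
        return None  # nothing in common with the lexicon: no signature
    return hashlib.sha1(" ".join(terms).encode("utf-8")).hexdigest()


def perturbed_lexicons(lexicon, count=5, drop_fraction=0.3, seed=0):
    """Return the original lexicon plus `count` copies with random entries removed."""
    rng = random.Random(seed)
    ordered = sorted(lexicon)  # fixed order keeps the sampling reproducible
    n_drop = int(len(ordered) * drop_fraction)
    result = [set(lexicon)]
    for _ in range(count):
        dropped = set(rng.sample(ordered, n_drop))
        result.append(set(lexicon) - dropped)
    return result


def share_signature(text_a, text_b, lexicons):
    """Two documents match if their signatures agree under any one lexicon."""
    for lexicon in lexicons:
        sig_a = signature(text_a, lexicon)
        if sig_a is not None and sig_a == signature(text_b, lexicon):
            return True
    return False
```

With several lexicons, a message padded with random words can still match a known spam as long as at least one lexicon ignores the padding.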
+
+Distributed, collaborative spam filtering:
+
+ * TCD, yay
+ * definition: "spam is email that the recipient is not interested in receiving". we disagree, of course ;)
+ * P2P approach
+
+Reputation network analysis for mail filtering:
+
+ * 75% of semweb data is FOAF files
+ * using web of trust
+ * a bit like http://web-o-trust.org/ , but not yet workable with email addrs since there's no spoofing protection
+
+On attacking statistical spam filters:
+
+ * spammers wanted to evade bayes
+ * tokenization/obfuscation attacks: these turn out to be good spam signs
+ * should not have used SpamArchive spam, due to its lack of headers, in my opinion; headers improve spam recognition greatly
+ * pretty similar to http://www.cs.dal.ca/research/techreports/2004/CS-2004-06.pdf ;)
+
+
+