Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/04/29 20:51:17 UTC

[Bug 3333] New: RFE: Salt bayes token hash for privacy

http://bugzilla.spamassassin.org/show_bug.cgi?id=3333

           Summary: RFE: Salt bayes token hash for privacy
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Learner
        AssignedTo: spamassassin-dev@incubator.apache.org
        ReportedBy: sidney@sidney.com


I'm not sure how important this is, but I want to make sure it is called out
explicitly for discussion.

Once we hash the bayes tokens, the db becomes less privacy sensitive. Without a
translation database, it is difficult for someone to, for example, get a list of
all people or topics that are regularly in your ham.

But what is still possible is for someone to probe the db for matches to
specific words by creating the hash for a word and looking that up. That is still
a privacy concern.
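
As a rough illustration of such a probe, here is a minimal Python sketch. It
assumes the db is nothing more than a set of unsalted SHA-1 hex digests; the
names hash_token, db and probe are hypothetical and do not reflect
SpamAssassin's actual storage format.

```python
import hashlib

def hash_token(token: str) -> str:
    """Unsalted SHA-1 of a token, returned as a hex digest."""
    return hashlib.sha1(token.encode("utf-8")).hexdigest()

# Hypothetical hashed db built from tokens seen in the owner's ham.
db = {hash_token(t) for t in ["invoice", "alice", "project-x"]}

def probe(hashed_db: set, word: str) -> bool:
    """Test a guessed word against the db without knowing the original tokens."""
    return hash_token(word) in hashed_db

print(probe(db, "alice"))  # a guessed word that is in the db
print(probe(db, "bob"))    # a guessed word that is not
```

Anyone holding a copy of the db can run the same check for any word they care
to guess, which is the concern described above.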

The way to fix that is by incorporating some kind of salt with the hash, so that
we use sha1(token . salt) instead of sha1(token), where salt is a unique 20-byte
number for each db.

The number would be stored somewhere that would not be copied over with the
database if the database were to be given away.
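
A minimal Python sketch of the salted variant, under stated assumptions: the
salt is 20 random bytes from os.urandom, it is appended to the token with plain
concatenation as in sha1(token . salt), and it is held in a variable standing
in for storage outside the db. The names new_salt, hash_token and db are
hypothetical, not an existing API.

```python
import hashlib
import os

SALT_LEN = 20  # bytes, matching the size proposed above

def new_salt() -> bytes:
    """Generate a random per-db salt (assumption: os.urandom is the source)."""
    return os.urandom(SALT_LEN)

def hash_token(token: str, salt: bytes) -> str:
    """Compute sha1(token . salt) as a hex digest."""
    return hashlib.sha1(token.encode("utf-8") + salt).hexdigest()

# In this sketch the salt lives apart from the db and is not copied with it.
salt = new_salt()
db = {hash_token(t, salt) for t in ["invoice", "alice"]}

# The owner, who has the salt, can still look tokens up.
owner_hit = hash_token("alice", salt) in db

# Someone holding only the db cannot reproduce the salted hash of a guess.
outsider_hit = hashlib.sha1(b"alice").hexdigest() in db

print(owner_hit, outsider_hit)
```

With the salt missing, the probe from the unsalted case no longer finds a
match, because the guessed word hashes to a different value.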


