Posted to dev@community.apache.org by "Kevin A. McGrail (JIRA)" <ji...@apache.org> on 2018/02/04 10:40:00 UTC

[jira] [Created] (COMDEV-260) SpamAssassin Bayes Token ID

Kevin A. McGrail created COMDEV-260:
---------------------------------------

             Summary: SpamAssassin Bayes Token ID
                 Key: COMDEV-260
                 URL: https://issues.apache.org/jira/browse/COMDEV-260
             Project: Community Development
          Issue Type: Project
            Reporter: Kevin A. McGrail


Based on an idea from DFS, used with permission:

We tokenize inbound messages and store the tokens on the server. To each message, we add training links. When you click a training link, the system trains on the tokens stored on the server for that message. That way, you are training on exactly the tokens that the Bayes code saw.

For SA, the key point is a framework that stores the Bayesian tokens from an email before the email is delivered, so that a later "this is spam" / "this is ham" mechanism can use that information without needing the entire email.
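
As a rough illustration of the idea above, here is a minimal Python sketch of capturing tokens at scan time and training from them later. It is not SpamAssassin code: the tokenizer, the TokenStore class, and its method names are all hypothetical, and SA's real Bayes tokenizer is considerably more involved.

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: unique lowercase alphanumeric runs of 3+ chars."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({w for w in words if len(w) >= 3})


class TokenStore:
    """Keeps the tokens seen at scan time so training can happen later
    without the original message (in-memory only, for illustration)."""

    def __init__(self):
        self.tokens_by_id = {}        # storage ID -> tokens seen at scan time
        self.spam_counts = Counter()  # token -> number of spam messages seen in
        self.ham_counts = Counter()   # token -> number of ham messages seen in
        self.nspam = 0
        self.nham = 0

    def record(self, storage_id, body):
        """Called before delivery: store only the tokens, not the message."""
        self.tokens_by_id[storage_id] = tokenize(body)

    def train(self, storage_id, is_spam):
        """Called later: learn from exactly the tokens stored at scan time."""
        tokens = self.tokens_by_id[storage_id]
        if is_spam:
            self.spam_counts.update(tokens)
            self.nspam += 1
        else:
            self.ham_counts.update(tokens)
            self.nham += 1


store = TokenStore()
store.record("example-id-1", "Buy cheap pills now")
store.train("example-id-1", is_spam=True)
```

The point of the sketch is only that train() never sees the message body; it works from whatever record() stored under the ID.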

Adding a header that carries the message ID under which the tokens are stored makes it much easier to build a "train as spam" / "train as ham" framework.
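
A small sketch of what adding such a header could look like, using Python's standard email package. The header name X-Bayes-Token-ID and the use of a random UUID as the storage ID are assumptions made for this example; the issue does not specify either.

```python
import uuid
from email import message_from_string, policy

# Hypothetical header name; not an existing SpamAssassin header.
TOKEN_ID_HEADER = "X-Bayes-Token-ID"


def tag_message(raw):
    """Add a storage-ID header before delivery; return (id, tagged message)."""
    msg = message_from_string(raw, policy=policy.default)
    token_id = uuid.uuid4().hex  # assumption: any unique ID would do
    msg[TOKEN_ID_HEADER] = token_id
    return token_id, msg.as_string()


def read_token_id(raw):
    """Recover the storage ID from a delivered message, or None if absent."""
    msg = message_from_string(raw, policy=policy.default)
    value = msg[TOKEN_ID_HEADER]
    return None if value is None else str(value)


token_id, tagged = tag_message("From: a@example.com\nSubject: hi\n\nbody\n")
```

A training front end would then only need the value of that header, not the message itself.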

The issues you are pointing to have more to do with the implementation of the "this is spam" / "this is ham" mechanism.

Storing just the tokens takes less space and mitigates privacy and legal concerns.

sa-learn would then be extended to accept the message ID and learn the stored tokens as spam or ham, instead of being fed the entire message.
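
To make the learn-by-ID step concrete, here is a hedged Python sketch of a stand-alone front end backed by SQLite. This is not sa-learn and does not describe any existing sa-learn option: the table layout, the function names, and the positional token_id argument are all hypothetical. Only the --spam / --ham naming mirrors sa-learn's existing switches.

```python
import argparse
import sqlite3


def open_db(path=":memory:"):
    """Open a hypothetical token database (schema invented for this sketch)."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS seen (token_id TEXT, token TEXT);
        CREATE TABLE IF NOT EXISTS learned (
            token TEXT PRIMARY KEY,
            spam INTEGER DEFAULT 0,
            ham INTEGER DEFAULT 0
        );
    """)
    return conn


def learn_by_id(conn, token_id, is_spam):
    """Learn the tokens stored under token_id; return how many were learned."""
    rows = conn.execute(
        "SELECT token FROM seen WHERE token_id = ?", (token_id,)
    ).fetchall()
    if not rows:
        raise KeyError(f"no stored tokens for ID {token_id!r}")
    column = "spam" if is_spam else "ham"  # fixed set, safe to interpolate
    for (token,) in rows:
        conn.execute("INSERT OR IGNORE INTO learned (token) VALUES (?)", (token,))
        conn.execute(
            f"UPDATE learned SET {column} = {column} + 1 WHERE token = ?", (token,)
        )
    conn.commit()
    return len(rows)


def parse_args(argv):
    """Parse a hypothetical command line: (--spam | --ham) TOKEN_ID."""
    parser = argparse.ArgumentParser(description="learn stored tokens by ID")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--spam", action="store_true")
    group.add_argument("--ham", action="store_true")
    parser.add_argument("token_id")
    return parser.parse_args(argv)


conn = open_db()
conn.executemany(
    "INSERT INTO seen VALUES (?, ?)", [("id1", "buy"), ("id1", "pills")]
)
args = parse_args(["--spam", "id1"])
learned = learn_by_id(conn, args.token_id, args.spam)
```

The message body never appears here; the only inputs are the ID and the spam/ham choice.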



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@community.apache.org
For additional commands, e-mail: dev-help@community.apache.org