Posted to server-user@james.apache.org by Steen Jansdal <sj...@buanco.dk> on 2003/11/03 09:22:28 UTC

Using the PRAXIS spam filter

Hi,

I'm trying to use the spam filter from "Vincenzo Gianferrari Pini
(Praxis Calcolo S.p.A.)", but I have a couple of questions. I'm asking
here because I know Vincenzo is among us, and because this could be
interesting for other James users.

* Whitelist: Is it possible to use wildcards in usernames? I would
like to whitelist all mail from a certain domain, no matter which
user from that domain is sending (perhaps something like
*.<remote-domain>).
I would also like to be able to whitelist a mail address for all my
users in one step (for example by using a wildcard: *.<mydomain>).

* Spam manager: Is it possible to automatically add a whitelisted mail
directly to the corpus? That way my users don't have to send non-spam
mails to the spam manager.

* Spam manager: Are mails identified as spam automatically added to
the corpus? Again, this would save my users from sending spam mails to
the spam manager. (You know, my users are lazy.)

Steen


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: Using the PRAXIS spam filter

Posted by "Noel J. Bergman" <no...@devtech.com>.
> Yes, but in our CRM database we have approx. 1000 companies that
> must be on the whitelist, and mails from them must in *NO* way be
> considered spam. IMO entering and maintaining 1000 companies
> directly in the config file is not the correct way to do it.

Spammers can and do spoof e-mail addresses.  The only thing you can
reliably verify is the IP address, which is why DNS blocklists work, and why
many of us refuse service to DHCP pools.  If you are required to allow e-mail
from those customer domains, you either accept e-mail only from their
outgoing mail servers, or open yourself to spam with spoofed addresses.

If you need to do the latter, you could look at the experimental regex
matchers (which will probably change before Release) in the v2.2 test
builds.  You could, for example, subclass GenericRegexMatcher (which will
probably be renamed to something like AbstractRegexHeaderMatcher unless we
change it to handle the body stream as well) and load patterns directly from
your CRM database.  See FileRegexMatcher for an example.
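
As a rough illustration of the pattern-loading idea, here is a minimal,
self-contained Java sketch. It deliberately does not use the James matcher
API (which, as noted, is still likely to change); the class name
SenderDomainWhitelist and its methods are hypothetical, and the list of
domains stands in for whatever query your CRM database would answer. It
checks only the claimed sender address, so the spoofing caveat above still
applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper: matches a sender address against whitelisted domains. */
final class SenderDomainWhitelist {
    private final List<Pattern> patterns = new ArrayList<>();

    /** Each entry is a bare domain such as "example.com" (e.g. one per CRM company). */
    SenderDomainWhitelist(List<String> domains) {
        for (String domain : domains) {
            // Quote the domain so that '.' is matched literally.
            String quoted = Pattern.quote(domain.trim().toLowerCase(Locale.ROOT));
            // Accept any user at the domain itself or at any of its subdomains.
            patterns.add(Pattern.compile("^[^@\\s]+@(?:[a-z0-9-]+\\.)*" + quoted + "$"));
        }
    }

    /** Returns true if the address belongs to one of the whitelisted domains. */
    boolean isWhitelisted(String senderAddress) {
        if (senderAddress == null) {
            return false;
        }
        String normalized = senderAddress.trim().toLowerCase(Locale.ROOT);
        for (Pattern p : patterns) {
            if (p.matcher(normalized).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

In a real matcher you would build the list once at init time (and refresh it
occasionally) rather than querying the database for every message.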

	--- Noel




Re: Using the PRAXIS spam filter

Posted by Steen Jansdal <sj...@buanco.dk>.
Vincenzo Gianferrari Pini wrote:
> Steen,
> 
> sorry for the delay in answering.
> 

What delay? Since this is not a paid support list,
you are allowed to do other things than just wait
around to answer my stupid questions. :-)

> 
>>* Whitelist: Is it possible to use wildcards in usernames? I would
>>like to whitelist all mail from a certain domain, no matter which
>>user from that domain is sending (perhaps something like
>>*.<remote-domain>).
> 
> 
> There is no need to insert such "wildcarded" addresses in the
> whitelist; you can achieve the desired behaviour simply by
> coding your config file appropriately, as I did
> (search for "SenderHostIs=xxx.com yyy.org" in the snippet below):
> 

Yes, but in our CRM database we have approx. 1000 companies that
must be on the whitelist, and mails from them must in *NO* way be
considered spam. IMO entering and maintaining 1000 companies
directly in the config file is not the correct way to do it.


> 
>>I would also like to be able to whitelist a mail address for all my
>>users in one step (for example by using a wildcard: *.<mydomain>).
>>
> 
> 
> This is a good idea, and I will add it to WhiteListManager and IsInWhiteList.
> But what if a user naively replies to a spam message asking to be removed?
> The spammer's address would immediately go into the whitelist for
> many other users. Such an option should be used cautiously.
> 

Yes, I see what you mean. In my case these wildcard whitelist entries
will not be maintained by the users themselves, but will come directly
from the CRM database.

> 
>>* Spam manager: Is it possible to automatically add a whitelisted mail
>>directly to the corpus? That way my users don't have to send non-spam
>>mails to the spam manager.
>>
> 
> 
> Again, you can do that just by playing around with the config file,
> putting appropriate "not spam" Bayesian analysis feeder
> mailet entries in appropriate places with appropriate matchers.
> But beware: while JDBCBayesianAnalysis is quite fast, the
> JDBCBayesianAnalysisFeeder mailet does a lot of work
> (database activity) and takes several seconds or even
> minutes to update the statistics in the database for
> a single message fed. This is OK as long as the
> number of spam/not-spam message feedings is low compared
> to the number of messages analysed, but feeding every
> whitelisted message would kill everything. Regarding this
> performance problem, I'm thinking of using a serializable
> object (with some kind of "asynchronous intermittent
> lazy writer" and appropriate behaviour against write
> failures) instead of a database for storing the corpus.
> 

I see the problem and I'll drop my idea.


> 
>>* Spam manager: Are mails identified as spam automatically added to
>>the corpus? Again, this would save my users from sending spam mails to
>>the spam manager. (You know, my users are lazy.)
> 
> 
> Same as above, but IMO it would also be dangerous: the
> effect would be to amplify "false positives" through a
> feedback mechanism, quickly ruining the corpus. Only
> "true positives", determined as such in some other way,
> should be fed as spam, and you can do that simply by playing
> around with the config.xml file.
> 

I see what you mean and this idea is dropped too.

> 
> Vincenzo
> 
> 

Steen




RE: Using the PRAXIS spam filter

Posted by Vincenzo Gianferrari Pini <vi...@praxis.it>.
Steen,

sorry for the delay in answering.

> 
> * Whitelist: Is it possible to use wildcards in usernames? I would
> like to whitelist all mail from a certain domain, no matter which
> user from that domain is sending (perhaps something like
> *.<remote-domain>).

There is no need to insert such "wildcarded" addresses in the whitelist; you can achieve the desired behaviour simply by coding your config file appropriately, as I did (search for "SenderHostIs=xxx.com yyy.org" in the snippet below):

         <!-- White list management -->
         <mailet match="All" class="WhiteListManager" onMailetException="ignore">
            <table>db://maildb/whitelist</table>
            <whitelistManagerAddress>whitelist.manager@mycompany.com</whitelistManagerAddress>
            <displayFlag>display</displayFlag>
            <insertFlag>insert</insertFlag>
            <removeFlag>remove</removeFlag>
         </mailet>

         <!-- Anti spam bayesian analysis -->
         <mailet match="All" class="JDBCBayesianAnalysis" onMailetException="ignore">
            <repositoryPath>db://maildb</repositoryPath>
            <hamTable>bayesiananalysis_ham</hamTable>
            <spamTable>bayesiananalysis_spam</spamTable>
            <messageCountsTable>bayesiananalysis_messagecounts</messageCountsTable>
            <spamManagerAddress>spam.manager@mycompany.com</spamManagerAddress>
            <rebuildSubjectFlag>rebuild spam corpus</rebuildSubjectFlag>
            <headerName>X-MessageIsSpamProbability</headerName>
            <ignoreLocalSender>true</ignoreLocalSender>
         </mailet>

         <!-- Mail from local senders skips the spam checks -->
         <mailet match="SenderHostIsLocal" class="ToProcessor">
            <processor> transport </processor>
         </mailet>

         <!-- Mail from whitelisted senders skips the spam checks -->
         <mailet match="IsInWhiteList=db://maildb/whitelist" class="ToProcessor" onMatchException="matchAll">
            <processor> transport </processor>
         </mailet>

         <!-- Mail from these sender domains skips the spam checks -->
         <mailet match="SenderHostIs=xxx.com yyy.org" class="ToProcessor">
            <processor> transport </processor>
         </mailet>

         <!-- Above 0.90: tag the message as spam but keep processing it -->
         <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.90" class="AddHeader" onMatchException="noMatch">
            <name>X-MessageIsSpam</name>
            <value>true</value>
         </mailet>

         <!-- Above 0.99: hand the message to the spam processor -->
         <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.99" class="ToProcessor" onMatchException="noMatch">
            <processor> spam </processor>
            <notice>Spam not accepted</notice>
         </mailet>

         <!-- Send remaining mails to the transport processor for either local or remote delivery -->
         <mailet match="All" class="ToProcessor">
            <processor> transport </processor>
         </mailet>
      </processor>
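
To make the order of the matchers in the snippet easier to follow, here is a
minimal Java sketch of the routing decision as I read it. It is only a model
for illustration, not James code: the class SpamRouting, the Route enum and
the boolean parameters are hypothetical stand-ins for the SenderHostIsLocal,
IsInWhiteList and SenderHostIs matchers, and the two thresholds are the ones
used in the config above.

```java
/** Hypothetical model of the matcher order in the config snippet. */
final class SpamRouting {
    enum Route { TRANSPORT, TRANSPORT_TAGGED, SPAM }

    // Thresholds taken from the CompareNumericHeaderValue matchers.
    static final double TAG_THRESHOLD = 0.90;
    static final double SPAM_THRESHOLD = 0.99;

    static Route route(boolean localSender, boolean inWhiteList,
                       boolean trustedSenderHost, double spamProbability) {
        // The first three matchers send mail straight to transport,
        // so the probability header is never consulted for them.
        if (localSender || inWhiteList || trustedSenderHost) {
            return Route.TRANSPORT;
        }
        // Above 0.99 the message goes to the spam processor
        // (it has already been tagged by the 0.90 rule).
        if (spamProbability > SPAM_THRESHOLD) {
            return Route.SPAM;
        }
        // Between the thresholds the message is tagged with
        // X-MessageIsSpam but still delivered.
        if (spamProbability > TAG_THRESHOLD) {
            return Route.TRANSPORT_TAGGED;
        }
        return Route.TRANSPORT;
    }
}
```

The point of the ordering is that a trusted sender is never rejected, however
high the computed probability is.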

> I would also like to be able to whitelist a mail address for all my
> users in one step (for example by using a wildcard: *.<mydomain>).
> 

This is a good idea, and I will add it to WhiteListManager and IsInWhiteList.
But what if a user naively replies to a spam message asking to be removed? The spammer's address would immediately go into the whitelist for many other users. Such an option should be used cautiously.

> * Spam manager: Is it possible to automatically add a whitelisted mail
> directly to the corpus? That way my users don't have to send non-spam
> mails to the spam manager.
> 

Again, you can do that just by playing around with the config file, putting appropriate "not spam" Bayesian analysis feeder mailet entries in appropriate places with appropriate matchers.
But beware: while JDBCBayesianAnalysis is quite fast, the JDBCBayesianAnalysisFeeder mailet does a lot of work (database activity) and takes several seconds or even minutes to update the statistics in the database for a single message fed. This is OK as long as the number of spam/not-spam message feedings is low compared to the number of messages analysed, but feeding every whitelisted message would kill everything. Regarding this performance problem, I'm thinking of using a serializable object (with some kind of "asynchronous intermittent lazy writer" and appropriate behaviour against write failures) instead of a database for storing the corpus.
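
A minimal Java sketch of what such a serializable corpus with a lazy writer
might look like follows. This is only an illustration of the idea, not the
planned implementation: the class LazyCorpus, its methods and the token
counting are all hypothetical. Feeding only updates in-memory counters; a
background task is assumed to call flushIfDirty periodically, and a failed
write leaves the corpus marked dirty so the next attempt retries.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory corpus that is written to disk lazily. */
final class LazyCorpus implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Map<String, Integer> spamCounts = new HashMap<>();
    // Not persisted: a freshly loaded corpus starts out clean.
    private transient boolean dirty;

    /** Cheap in-memory update; no disk or database access here. */
    synchronized void feed(Iterable<String> tokens, boolean spam) {
        Map<String, Integer> target = spam ? spamCounts : hamCounts;
        for (String token : tokens) {
            target.merge(token, 1, Integer::sum);
        }
        dirty = true;
    }

    synchronized int count(String token, boolean spam) {
        return (spam ? spamCounts : hamCounts).getOrDefault(token, 0);
    }

    /**
     * Writes the corpus only if it changed since the last successful write.
     * If the write throws, dirty stays true and a later call retries.
     */
    synchronized boolean flushIfDirty(Path file) throws IOException {
        if (!dirty) {
            return false;
        }
        // Write to a temporary sibling first so a failed write
        // does not corrupt the previous copy.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(this);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        dirty = false;
        return true;
    }

    static LazyCorpus load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (LazyCorpus) in.readObject();
        }
    }
}
```

The "intermittent" part could be a ScheduledExecutorService calling
flushIfDirty with a fixed delay; the trade-off is that feedings made since
the last successful write are lost if the server stops unexpectedly.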

> * Spam manager: Are mails identified as spam automatically added to
> the corpus? Again, this would save my users from sending spam mails to
> the spam manager. (You know, my users are lazy.)

Same as above, but IMO it would also be dangerous: the effect would be to amplify "false positives" through a feedback mechanism, quickly ruining the corpus. Only "true positives", determined as such in some other way, should be fed as spam, and you can do that simply by playing around with the config.xml file.

> 
> Steen
> 

Vincenzo

