Posted to server-user@james.apache.org by David Legg <da...@searchevent.co.uk> on 2007/01/05 17:05:54 UTC

Best way to feed ham or spam to BayesianAnalysis mailet?

I've just upgraded to James 2.3.0 from 2.2.0 to take advantage of the 
Bayesian filter amongst other things.  Apart from a hiccup with the 
MySQL driver not being included in the tar file (despite what it says on 
the web page [1]) the upgrade was very smooth.

Now I want to wage war on spam!  I've used the example config.xml to 
create a 'spam' and 'not.spam' email address which seems to work.  
However I'm using a Thunderbird client to forward the emails to these 
two addresses and it occurs to me that Thunderbird is forwarding emails 
by creating an attachment out of the original received message.  Isn't 
the Bayesian analysis filter going to eventually think that any emails 
from me are spam rather than analysing the actual message in the attachment?

Regards,
David Legg.

-----------------------
[1] The web page 
[http://james.apache.org/server/2.3.0/using_database.html] states: 
Please note that a MySQL driver is included as part of the James 
distribution and so there is no need to add such a driver to the lib 
directory.  But the download file [james-with-phoenix-2.3.0-src.tar.gz] 
doesn't actually contain it.


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


Re: Best way to feed ham or spam to BayesianAnalysis mailet?

Posted by Danny Angus <da...@gmail.com>.
On 1/5/07, David Legg <da...@searchevent.co.uk> wrote:
> I've just upgraded to James 2.3.0 from 2.2.0 to take advantage of the
> Bayesian filter amongst other things.  Apart from a hiccup with the
> MySQL driver not being included in the tar file (despite what it says on
> the web page [1]) the upgrade was very smooth.

I just noticed that today, I *promise* to update the docs.

>
> Now I want to wage war on spam!  I've used the example config.xml to
> create a 'spam' and 'not.spam' email address which seems to work.
> However I'm using a Thunderbird client to forward the emails to these
> two addresses and it occurs to me that Thunderbird is forwarding emails
> by creating an attachment out of the original received message.  Isn't
> the Bayesian analysis filter going to eventually think that any emails
> from me are spam rather than analysing the actual message in the attachment?

No, but first: can't you make it forward them like a reply?
Alternatively, can you "reply" and change the address to the feeder address?

The reason it won't block your address and the like is that there
should be roughly equal amounts of mail with your details in the ham
and in the spam, so your details tend to come out "neutral" in the
analysis.
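
To make the "neutral" point concrete, here is a minimal sketch of
per-token scoring, simplified from the scheme in Paul Graham's "A Plan
for Spam". Whether James 2.3.0 uses exactly this formula is not stated
in this thread, and the function name and all counts below are
hypothetical, chosen only for illustration.

```python
def token_spam_probability(spam_hits, ham_hits, spam_total, ham_total):
    """Estimate P(spam | token) from how often a token occurs in each corpus.

    Simplified from "A Plan for Spam": no doubling of ham counts and no
    clamping. This is an illustration, not James's code.
    """
    spam_freq = spam_hits / spam_total if spam_total else 0.0
    ham_freq = ham_hits / ham_total if ham_total else 0.0
    if spam_freq + ham_freq == 0.0:
        return 0.5  # token never seen: no evidence either way
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical counts: the forwarder's address appears equally often in both corpora
forwarder = token_spam_probability(40, 40, 100, 100)
# Hypothetical counts: a word seen in 30 of 100 spam messages but only 1 of 100 ham messages
spammy_word = token_spam_probability(30, 1, 100, 100)
print(f"forwarder address: {forwarder:.2f}, spammy word: {spammy_word:.2f}")
```

A token that is equally frequent in both corpora scores 0.5 and so
carries no weight either way, while a token that is much more frequent
in spam scores close to 1.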

d.



Re: Best way to feed ham or spam to BayesianAnalysis mailet?

Posted by Stefano Bagnara <ap...@bago.org>.
David Legg wrote:
> Now I want to wage war on spam!  I've used the example config.xml to 
> create a 'spam' and 'not.spam' email address which seems to work.  
> However I'm using a Thunderbird client to forward the emails to these 
> two addresses and it occurs to me that Thunderbird is forwarding emails 
> by creating an attachment out of the original received message.  Isn't 
> the Bayesian analysis filter going to eventually think that any emails 
> from me are spam rather than analysing the actual message in the 
> attachment?

I suggest using the "Fast Mail Redirect" Thunderbird extension.
This will allow you to add a couple of links to the message pane to send 
copies of the message to the spam and not.spam addresses.

Stefano




Re: Best way to feed ham or spam to BayesianAnalysis mailet?

Posted by Danny Angus <da...@gmail.com>.
> I've only added a relatively small corpus so far and the Bayesian filter
> is doing a fantastic job.

It is pretty awesome isn't it?
Respect to Vincenzo for his efforts.

d.



Re: Best way to feed ham or spam to BayesianAnalysis mailet?

Posted by David Legg <da...@searchevent.co.uk>.
> With James 2.2.0 if my users left their account for longer than a day 
> the network connection would time out waiting for the SMTP 'LIST' 
> command to respond.

It's late... I meant POP3 'LIST' command. :-8

- David.




Re: Best way to feed ham or spam to BayesianAnalysis mailet?

Posted by Norman Maurer <nm...@byteaction.de>.
Danny Angus wrote:
>> I've only added a relatively small corpus so far and the Bayesian filter
>> is doing a fantastic job.
>
> It is pretty awesome isn't it?
> Respect to Vincenzo for his efforts.
>
> d.
>
That was why I asked for a subproject ;-)

bye
Norman





Re: Best way to feed ham or spam to BayesianAnalysis mailet?

Posted by David Legg <da...@searchevent.co.uk>.
Oh! this is heaven ;-)

I've only added a relatively small corpus so far and the Bayesian filter 
is doing a fantastic job.  Strangely, it is doing a *much* better job at 
detecting spam than the built-in junk detector that comes with 
Thunderbird.  That junk detector keeps passing many of the penny stock 
spams no matter how many I mark as junk.

With James 2.2.0 if my users left their account for longer than a day 
the network connection would time out waiting for the SMTP 'LIST' 
command to respond.  Now with 2.3.0 they should be able to go for *weeks*.

Thanks so much for all your hard work.

- David.





Re: Best way to feed ham or spam to BayesianAnalysis mailet?

Posted by David Legg <da...@searchevent.co.uk>.
I found the answer to my question on the wiki... 
http://wiki.apache.org/james/Bayesian_Analysis

It seems I've been doing the right thing... feeding spam or ham messages 
as attachments is the correct procedure.
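
For anyone curious why forward-as-attachment keeps the original intact,
here is a small sketch using Python's standard email package. It is not
James's feeder code; the helper names and the addresses used with them
are hypothetical. It only shows that an attached message travels as a
message/rfc822 part and can be recovered with its own headers and body,
separate from the forwarder's wrapper.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_forward(original, sender, feeder):
    """Wrap `original` as a message/rfc822 attachment, like forward-as-attachment."""
    fwd = EmailMessage()
    fwd["From"] = sender
    fwd["To"] = feeder
    fwd["Subject"] = "Fwd: " + str(original.get("Subject", ""))
    fwd.set_content("Forwarded message attached.")
    fwd.add_attachment(original)  # a Message object is attached as message/rfc822
    return fwd


def parse_raw(raw):
    """Parse raw message bytes, as a server would see them on receipt."""
    return message_from_bytes(raw, policy=policy.default)


def attached_messages(msg):
    """Yield every message that was attached as a message/rfc822 part."""
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the nested message.
            yield part.get_payload(0)
```

Calling attached_messages(parse_raw(fwd.as_bytes())) on a forward built
this way yields the original message, whose From and Subject headers are
those of the original sender rather than of the person forwarding it.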

- David.




Re: Best way to feed ham or spam to BayesianAnalysis mailet?

Posted by Norman Maurer <nm...@byteaction.de>.
David Legg wrote:
> I found the answer to my question on the wiki...
> http://wiki.apache.org/james/Bayesian_Analysis
>
> It seems I've been doing the right thing... feeding spam or ham
> messages as attachments is the correct procedure.
>
> - David.
>

Right,

in the next major release we will include support for training via
corpus with RemoteManager and JMX. It's already in trunk.

bye
Norman



