Posted to users@spamassassin.apache.org by Andreas Rust <ru...@webnova.de> on 2005/03/14 17:52:56 UTC

Feeding Bayes aswell

Hello,

We mostly use Eudora (Windows version), and it saves "Junk" emails
(as junked by Eudora itself) into an .mbx file.
Even though the .mbx file looks like a properly formatted mbox file, it
carries:

 From ???@??? Tue Jan 18 15:28:02 2005

in front of the normal headers.

Such as:

 From ???@??? Tue Jan 18 15:28:02 2005   --- the added line
Return-path: <gr...@canaldata.es>   --- the header as we normally expect it ...
Envelope-to: etcetcetcetc....

If I feed that to SpamAssassin, does it influence scoring in some way, or
would it be ignored completely?
(Or even cause a problem? :) )
After all, that is in no way a valid header line.

Thanks for any pointers.


     Andreas Rust     -   webnova GmbH
     rust@webnova.de  -   www.webnova.de
     Tel:  +49 (0)700 - 20 30 7000
     Fax:  +49 (0)700 - 20 30 8000
+:----------------------------------------------------------:+
          www.Synergien-Nutzen.de
          Gemeinsam sind wir stark...


Re: Feeding Bayes aswell

Posted by Kelson <ke...@speed.net>.
Matt Kettler wrote:
> You can't train SA on most messages from Eudora's .mbx files. Speaking 
> as a user of eudora, eudora completely destroys many important parts of 
> a message when it stores it in the mbox, and it cannot be reconstructed.

And this was the main reason that after using Eudora for 8 years, I 
finally switched to Thunderbird.

The import process from Eudora to Thunderbird works pretty well, though 
it obviously can't restore information that isn't there. 
multipart/alternative is, of course, toast, though it does a reasonable 
job of re-attaching attachments (though the original MIME 
characteristics are long gone).  I had 4 years of mail to test it with, 
and found a lot of bugs for them to fix in the pre-1.0 days!

I think my favorite Eudora craziness was the fact that outgoing mail 
with signatures is stored as HTML, even if you wrote it as plain text, 
but isn't labeled as HTML.

-- 
Kelson Vibber
SpeedGate Communications <www.speed.net>

Re: Feeding Bayes aswell

Posted by Matt Kettler <mk...@evi-inc.com>.
At 11:52 AM 3/14/2005, Andreas Rust wrote:
>we are mostly using Eudora (Windows version) and it's saving "Junk" emails 
>(as junked by Eudora itself) into
>an .mbx file.

<snip>



You can't train SA on most messages from Eudora's .mbx files. Speaking as a
user of Eudora: it completely destroys many important parts of a
message when it stores it in the mbox, and they cannot be reconstructed.

The biggest problem is that Eudora mangles all multipart messages in a
non-reversible manner. It only ever saves one MIME section in the mbox, and
strips or discards everything else. For example, in multipart/alternative
messages the plain segment is discarded and the HTML segment is saved. That
plain segment is gone and is not saved anywhere. Anything with embedded
images has the images stripped out and saved as separate files. Ditto for
attachments.

The only messages you can reconstruct are single-part text/plain or
text/html messages. For those, you can use a script that converts Eudora
mbx format into standard Unix mbox format, such as eudora2unix.pl.
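
To illustrate the idea (this is not eudora2unix.pl, just a rough Python
sketch under some assumptions): split the .mbx text on the "From ???@???"
separator quoted above, keep only messages that parse as single-part
text/plain or text/html, and write them back out with an ordinary mbox
"From " line. The function names and the "MAILER-DAEMON" placeholder sender
are made up for this example. It also assumes the Content-Type header that
Eudora left in the file can be trusted, which may not hold for every
message, and it does not escape body lines that start with "From ".

```python
import email
import re

# Separator as quoted in this thread: "From ???@??? Tue Jan 18 15:28:02 2005"
EUDORA_SEP = re.compile(r"^From \?\?\?@\?\?\? (.*)$")


def split_eudora_mbx(text):
    """Split .mbx text into a list of (date_string, raw_message) pairs."""
    messages = []
    date, body = None, []
    for line in text.splitlines(keepends=True):
        match = EUDORA_SEP.match(line.rstrip("\r\n"))
        if match:
            # A new separator closes the message collected so far.
            if date is not None:
                messages.append((date, "".join(body)))
            date, body = match.group(1), []
        elif date is not None:
            body.append(line)
    if date is not None:
        messages.append((date, "".join(body)))
    return messages


def is_single_part_text(raw):
    """True if the message parses as single-part text/plain or text/html.

    A message with no Content-Type header counts as text/plain, which is
    the default the email module applies.
    """
    msg = email.message_from_string(raw)
    return (not msg.is_multipart()
            and msg.get_content_type() in ("text/plain", "text/html"))


def to_unix_mbox(messages, sender="MAILER-DAEMON"):
    """Build mbox text from the single-part messages only.

    `sender` is a placeholder envelope sender (an assumption); the real
    one is not recoverable from the "???@???" separator.
    """
    out = []
    for date, raw in messages:
        if is_single_part_text(raw):
            out.append(f"From {sender} {date}\n{raw.rstrip()}\n\n")
    return "".join(out)
```

Usage would be something like
to_unix_mbox(split_eudora_mbx(open("Junk.mbx").read())), where "Junk.mbx"
is a hypothetical file name; check a few of the resulting messages by hand
before feeding them to sa-learn.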