Posted to users@spamassassin.apache.org by Andreas Rust <ru...@webnova.de> on 2005/03/14 17:52:56 UTC
Feeding Bayes aswell
Hello,
we mostly use Eudora (Windows version), and it saves "Junk" emails
(as junked by Eudora itself) into an .mbx file.
Even though the .mbx file looks like a properly formatted mbox file, it carries:
From ???@??? Tue Jan 18 15:28:02 2005
in front of the normal headers.
Such as:
From ???@??? Tue Jan 18 15:28:02 2005 --- the added line
Return-path: <gr...@canaldata.es> --- the header as we normally expect it ...
Envelope-to: etcetcetcetc....
If I feed that to spamassassin, does that influence scoring in some way or
would it be ignored completely?
(Or even raise a problem ? :) )
After all, that is in no way a valid header line.
thx for any pointers
Andreas Rust - webnova GmbH
rust@webnova.de - www.webnova.de
Tel: +49 (0)700 - 20 30 7000
Fax: +49 (0)700 - 20 30 8000
+:----------------------------------------------------------:+
www.Synergien-Nutzen.de
Gemeinsam sind wir stark...
Re: Feeding Bayes aswell
Posted by Kelson <ke...@speed.net>.
Matt Kettler wrote:
> You can't train SA on most messages from Eudora's .mbx files. Speaking
> as a user of Eudora, Eudora completely destroys many important parts of
> a message when it stores it in the mbox, and it cannot be reconstructed.
And this was the main reason that after using Eudora for 8 years, I
finally switched to Thunderbird.
The import process from Eudora to Thunderbird works pretty well, though
it obviously can't restore information that isn't there.
multipart/alternative is, of course, toast, though it does a reasonable
job of re-attaching attachments (though the original mime
characteristics are long gone). I had 4 years of mail to test it with,
and found a lot of bugs for them to fix in the pre-1.0 days!
I think my favorite Eudora craziness was the fact that outgoing mail
with signatures is stored as HTML, even if you wrote it as plain text,
but isn't labeled as HTML.
--
Kelson Vibber
SpeedGate Communications <www.speed.net>
Re: Feeding Bayes aswell
Posted by Matt Kettler <mk...@evi-inc.com>.
At 11:52 AM 3/14/2005, Andreas Rust wrote:
>we are mostly using Eudora (Windows version) and it's saving "Junk" emails
>(as junked by Eudora itself) into
>an .mbx file.
<snip>
You can't train SA on most messages from Eudora's .mbx files. Speaking as a
user of Eudora, Eudora completely destroys many important parts of a
message when it stores it in the mbox, and it cannot be reconstructed.
The biggest problem is that Eudora mangles all multipart messages in a
non-reversible manner. It only ever saves one MIME section in the mbox, and
strips or discards everything else. For example, in multipart/alternative
messages the plain segment is discarded and the HTML segment is saved. That
plain segment is gone, and is not saved anywhere. Anything with embedded images
has the embedded images stripped out and saved as separate files. Ditto for
attachments.
The only messages you can reconstruct are single-part text/plain or
text/html messages. For those, you can use a script that converts Eudora
mbx format into standard unix mbox format, such as eudora2unix.pl.
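
To illustrate the separator part of such a conversion, here is a minimal Python sketch. It is not eudora2unix.pl and does not reproduce what that script does; it only rewrites the "From ???@???" lines quoted earlier in the thread. The function name normalize_eudora_separators, the MAILER-DAEMON fallback, and the choice of taking the sender from Return-path are all assumptions made for this example. It expects lines without trailing newlines (for instance from str.splitlines()).

```python
import re

# Eudora writes "From ???@??? <date>" where mbox expects "From <sender> <date>".
EUDORA_SEP = re.compile(r"^From \?\?\?@\?\?\? (.*)$")
RETURN_PATH = re.compile(r"^Return-path:\s*<([^>]*)>", re.IGNORECASE)


def normalize_eudora_separators(lines, default_sender="MAILER-DAEMON"):
    """Return a copy of lines with Eudora separator lines rewritten.

    Hypothetical helper: the sender is taken from the Return-path header
    of the message that follows, or default_sender if none is found.
    """
    out = []
    for i, line in enumerate(lines):
        m = EUDORA_SEP.match(line)
        if not m:
            out.append(line)
            continue
        sender = default_sender
        # Scan the header block that follows the separator.
        for header in lines[i + 1:]:
            if header.strip() == "":
                break  # a blank line ends the headers
            rp = RETURN_PATH.match(header)
            if rp and rp.group(1):
                sender = rp.group(1)
                break
        out.append(f"From {sender} {m.group(1)}")
    return out
```

This only touches the separator line. It cannot bring back MIME parts that Eudora has already discarded, so the limitation described above still applies.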