Posted to users@spamassassin.apache.org by Dan Barker <db...@visioncomm.net> on 2004/10/27 19:59:36 UTC
SA-Learn input format?
I've been running SA for about a week now, and need to sa-(un)learn the FPs.
My system is Windoze/IMail (5sp4/8.13) and the harry and susan (shouldn't
call them Ham and Spam, should I) folders contain all mis-identified email
in one giant flat file each.
Does this work?
Must I bust them up into separate emails before calling sa-learn?
The doc mentions the folders but says diddly-squat/infinity about the
contents of those folders.
Dan Barker
Format of a big flat file:
>>From <db...@visioncomm.net> Thu Oct 21 17:17:58 2004
Received: from dan [172.27.0.30] by visioncomm.net with ESMTP
(SMTPD32-8.13) id A7823A3001E; Thu, 21 Oct 2004 17:17:54 -0400
From: "Dan Barker" <db...@visioncomm.net>
To: <su...@visioncomm.net>
... rest of headers
<HTML>
<TITLE></TITLE>
<BODY >
... rest of message
>>From <db...@visioncomm.net> Thu Oct 21 17:44:42 2004
Received: from dan [172.27.0.30] by visioncomm.net with ESMTP
(SMTPD32-8.13) id ADCA1BD007C; Thu, 21 Oct 2004 17:44:42 -0400
From: "Dan Barker" <db...@visioncomm.net>
To: <su...@visioncomm.net>
Subject: Dbarker, Served in the MlLlTARY?
... rest of headers
This is a multi-part message in MIME format.
------=_NextPart_000_03B4_01C4B795.A68C01E0
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit
... rest of message
...
... for every email in the "box".
The Headers stop and Body begins on the first blank line.
I haven't figured out how the body ends yet. It appears to be the "From <" in
column 1. Yeah, that's it. I just ran a test with "From <" in column 1, and
the email is stored with ">From <" instead. So, a splitter will be trivial
to write, but must I?
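
For reference, the "trivial splitter" described above could be sketched roughly as follows. This is only an illustration, assuming every message starts with a line beginning "From " in column 1 and that body lines which would otherwise match are stored as ">From ". The function name split_flat_mailbox is hypothetical, not part of SpamAssassin or IMail.

```python
from typing import Iterable, Iterator, List


def split_flat_mailbox(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each message as a list of lines.

    Assumption: a line starting with "From " in column 1 begins a new
    message; escaped body lines (">From ...") are left untouched.
    """
    current: List[str] = []
    for line in lines:
        # A separator line closes the message collected so far.
        if line.startswith("From ") and current:
            yield current
            current = []
        current.append(line)
    # Emit the final message, if any lines were collected.
    if current:
        yield current
```

Each yielded list could then be written to its own file, one message per file, if a directory of individual emails turns out to be needed.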
Re: SA-Learn input format?
Posted by Theo Van Dinter <fe...@kluge.net>.
On Wed, Oct 27, 2004 at 01:59:36PM -0400, Dan Barker wrote:
> My system is Windoze/IMail (5sp4/8.13) and the harry and susan (shouldn't
> call them Ham and Spam, should I) folders contain all mis-identified email
> in one giant flat file each.
>
> Does this work?
If the file format is correct, sure.
> The doc mentions the folders but says diddly-squat/infinity about the
> contents of those folders.
Well, it does actually. sa-learn supports mbox and mbx files.
> Format of a big flat file:
>
> >From <db...@visioncomm.net> Thu Oct 21 17:17:58 2004
[...]
> >From <db...@visioncomm.net> Thu Oct 21 17:44:42 2004
This is almost mbox, except that the mbox separator is escaped, which won't work.
> the email is stored with ">From <" instead. So, a splitter will be trivial
> to write, but must I?
You could rewrite the ">From <...>" to be "From <...>", and then it's
apparently just an mbox file. :)
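
A minimal sketch of that rewrite, assuming every line that starts with ">From <" in column 1 really is a message separator. Note the caveat: a body line that was genuinely escaped to ">From <" would also be turned back into a separator, so this is only safe if that case does not occur in the file. The name unescape_separators is made up for illustration.

```python
from typing import Iterable, Iterator


def unescape_separators(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite lines beginning '>From <' to begin 'From <'.

    Assumption: every such line is a message separator, not an
    escaped line from a message body.
    """
    for line in lines:
        if line.startswith(">From <"):
            # Drop only the leading '>' and keep the rest intact.
            yield line[1:]
        else:
            yield line
```

The rewritten lines could be saved to a new file and handed to sa-learn as an mbox, for example with its --mbox option together with --ham or --spam.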
--
Randomly Generated Tagline:
"When all else fails, kick with lunar boot." - James Burke