Posted to users@spamassassin.apache.org by Ky...@transplace.com on 2005/02/17 00:32:34 UTC

how to read sa-learn --dump output

Could someone help me determine what these fields represent?

0.001          0         48 1108391722  H*M:hpb
0.001          0         48 1108391722  H*M:hpbrm
0.001          0         55 1108391737  H*r:sk:PUBLIC.
0.001          0         56 1108391737  H*F:U*mthomason
0.001          0         57 1108391737  H*F:D*halfpricebooks.com
0.001          0         57 1108391737  H*r:66.250.20

0.013          0          4 1108391715  awfully
0.013          0          4 1108391715  dudes
0.013          0          4 1108391715  hostile
0.013          0          4 1108391722  Someplace
0.013          0          4 1108391737  drinker

0.993          7          0 1107371048  attn
0.993          7          0 1107442179  H*p:D*dartmail.net
0.993          7          0 1107442194  Levitra
0.993          7          0 1107527893  leveraging

1.000        166          0 1108588775  ball
1.000        166          0 1108588775  casualties
1.000        166          0 1108588775  decisive
1.000        166          0 1108588775  impossible
1.000        166          0 1108588775  injuries
1.000        166          0 1108588775  lies


I assume the far-left field is the score Bayes will apply? The second is
maybe the number of times this token has been encountered in the learning
process (but that doesn't explain the 0's). I don't know the third field,
and the last two are maybe the date and then the token?

Could someone clarify this a bit for me?

Thanks,


Kyle Reynolds
972-731-4731
KyleReynolds@Transplace.com




Re: how to read sa-learn --dump output

Posted by Henk van Lingen <he...@cs.uu.nl>.
On Thu, Feb 17, 2005 at 01:41:05PM +0100, Henk van Lingen wrote:
  >   > 
  >   > spam_probability, #_in_spam, #_in_ham, timestamp, token
  > 
  >   How do you produce the tokens in readable form? When I do this, I get:
  > 
  >   1.000        332          1 1108642657  463fa0e5c1

  OK, reading another thread I understand the tokens have been hashed since 3.x.
  So I suppose there's no way of going back to the text :-( ?

  Cheers,

-- 
Henk van Lingen, Systems & Network Administrator              (o-      -+
Dept. of Computer Science, Utrecht University.                /\        |
phone: +31-30-2535278                                        v_/_
http://henk.vanlingen.net/             http://www.tuxtown.net/netiquette/
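
On the question of getting back from hashed tokens to text: a one-way hash
cannot be reversed, but you can hash words you already suspect and look them
up in the dump. The sketch below assumes that 3.x stores the last 5 bytes of
each token's SHA-1 digest, which would match the 10 hex characters shown in
the dump. That scheme is an assumption to verify against the Bayes source of
your installed version, and the names hashed_token and label_hashes are made
up for this illustration.

```python
import hashlib


def hashed_token(token: str) -> str:
    """Hash a token the way 3.x is ASSUMED to: last 5 bytes of SHA-1, as hex."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return digest[-5:].hex()


def label_hashes(dump_hashes, candidates):
    """Map each hash from the dump to a candidate word, or None if unknown."""
    lookup = {hashed_token(word): word for word in candidates}
    return {h: lookup.get(h) for h in dump_hashes}


if __name__ == "__main__":
    # Hypothetical word list; in practice use words from your own mail.
    words = ["attn", "Levitra", "leveraging", "ball"]
    print(label_hashes([hashed_token("Levitra"), "463fa0e5c1"], words))
```

This only labels tokens whose text you can guess. Header tokens such as
H*M:hpb would have to be hashed with their prefix included, exactly as the
tokenizer produced them.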

Re: how to read sa-learn --dump output

Posted by Henk van Lingen <he...@cs.uu.nl>.
On Wed, Feb 16, 2005 at 07:33:21PM -0500, Theo Van Dinter wrote:
  > On Wed, Feb 16, 2005 at 05:32:34PM -0600, KyleReynolds@transplace.com wrote:
  > > Could someone help me determine what these fields represent?
  > > 
  > > 0.001          0         48 1108391722  H*M:hpb
  > >
  > > Could someone clarify this a bit for me?
  > 
  > spam_probability, #_in_spam, #_in_ham, timestamp, token

  Hi,

  How do you produce the tokens in readable form? When I do this, I get:

  adonis:/users/henkvl 538% sa-learn --dump data | sort -nr | head
  1.000        332          1 1108642657  463fa0e5c1
  1.000        234          2 1108584021  3897973044
  1.000        219          2 1108584021  b8fd60299c
  1.000        103          0 1108435322  6b5b7358b4
  0.999         87          1 1108629105  0537c8cf38
  0.999         77          0 1108567242  68bb99bda7
  0.999         75          0 1108629105  5c1d162935
  0.999         71          2 1108621927  760d79b69d
  0.999         68          1 1107940121  4d7c962875
  0.999         66          2 1108637074  c77440b295

  Cheers,

-- 
Henk van Lingen, Systems & Network Administrator              (o-      -+
Dept. of Computer Science, Utrecht University.                /\        |
phone: +31-30-2535278                                        v_/_
http://henk.vanlingen.net/             http://www.tuxtown.net/netiquette/

Re: how to read sa-learn --dump output

Posted by Theo Van Dinter <fe...@kluge.net>.
On Wed, Feb 16, 2005 at 05:32:34PM -0600, KyleReynolds@transplace.com wrote:
> Could someone help me determine what these fields represent?
> 
> 0.001          0         48 1108391722  H*M:hpb
>
> Could someone clarify this a bit for me?

spam_probability, #_in_spam, #_in_ham, timestamp, token
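
To make the five fields concrete, here is a minimal Python sketch that splits
one token line of the dump into those columns. It assumes the columns are
whitespace-separated and that the timestamp is Unix epoch seconds; TokenRow
and parse_dump_line are names invented for this example, not part of
SpamAssassin.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TokenRow:
    spam_probability: float  # probability that a message with this token is spam
    in_spam: int             # number of learned spam messages containing the token
    in_ham: int              # number of learned ham messages containing the token
    timestamp: datetime      # assumed to be Unix epoch seconds in the dump
    token: str               # token text (or its hash in 3.x)


def parse_dump_line(line: str) -> Optional[TokenRow]:
    """Parse one token line of the dump; return None for anything else."""
    parts = line.strip().split(None, 4)
    if len(parts) != 5:
        return None
    prob, n_spam, n_ham, ts, token = parts
    try:
        return TokenRow(
            float(prob),
            int(n_spam),
            int(n_ham),
            datetime.fromtimestamp(int(ts), tz=timezone.utc),
            token,
        )
    except ValueError:
        # Non-numeric columns: not a token line in the assumed layout.
        return None


if __name__ == "__main__":
    print(parse_dump_line("0.001          0         48 1108391722  H*M:hpb"))
```

Read this way, the 0's in the original question are simply tokens that were
never seen in spam (second column) or never seen in ham (third column).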

-- 
Randomly Generated Tagline:
Oh my God, someone's trying to kill me!  Oh wait, it's for Bart.
 
 		-- Homer Simpson
 		   Cape Feare