Posted to users@spamassassin.apache.org by Dennis German <DG...@Real-World-Systems.com> on 2009/03/16 18:23:22 UTC

spamassassin: sa-learn --dump magic interpretation

Is there a document regarding the interpretation of


 > sa-learn --dump magic
config: could not find site rules directory

0.000          0            3          0  non-token data: bayes db version
0.000          0       261451          0  non-token data: nspam
0.000          0        18530          0  non-token data: nham
0.000          0       143599          0  non-token data: ntokens

0.000          0  1231533845          0  non-token data: oldest atime
0.000          0  1237223892          0  non-token data: newest atime
0.000          0  1237214668          0  non-token data: last journal sync atime
0.000          0  1237059740          0  non-token data: last expiry atime

0.000          0    5529600          0  non-token data: last expire atime delta

0.000          0       9311          0  non-token data: last expire reduction count


Re: spamassassin: sa-learn --dump magic interpretation

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 16.03.09 13:23, Dennis German wrote:
> Is there a document regarding the interpretation of
> 
> 
> > sa-learn --dump magic
> config: could not find site rules directory

> 0.000          0            3          0  non-token data: bayes db version
> 0.000          0       261451          0  non-token data: nspam
> 0.000          0        18530          0  non-token data: nham


Ohh, that's way too much spam compared to ham, I'd say. Aren't you getting a lot of FPs?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.

Re: spamassassin: sa-learn --dump magic interpretation

Posted by RW <rw...@googlemail.com>.
On Mon, 16 Mar 2009 13:23:22 -0400
Dennis German <DG...@Real-World-Systems.com> wrote:

> Is there a document regarding the interpretation of
> 
> 
>  > sa-learn --dump magic


They are pretty self-explanatory, if you know roughly how Bayes works.

After the database version, the next three are the number of spams and
hams learned and the total number of tokens in the database. "last
journal sync atime" is the time the journal was last synced with the
Bayes database. The rest are concerned with the automatic expiry of
tokens from the database, to prevent it growing indefinitely.

Each token has a timestamp which is set from the headers when 
learned and updated when it contributes to the Bayesian probability in
a test. This timestamp is used to age-out the less useful tokens.
There's a detailed description in the sa-learn manpage in the
EXPIRATION section.
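
For anyone who wants to read these values from a script, here is a minimal Python sketch. It assumes every magic row has the shape of the dump quoted in this thread: four numeric columns, with the value in the third, followed by "non-token data: <label>". The name parse_magic and the regular expression are illustrative only and are not part of SpamAssassin. The zeros appear to be placeholders for columns that ordinary token rows fill in (probability, spam count, ham count, atime); that reading is an assumption, not something confirmed in this thread.

```python
import re

# Sample copied from the dump posted at the top of this thread.
SAMPLE = """\
config: could not find site rules directory
0.000          0            3          0  non-token data: bayes db version
0.000          0       261451          0  non-token data: nspam
0.000          0        18530          0  non-token data: nham
0.000          0       143599          0  non-token data: ntokens
0.000          0  1231533845          0  non-token data: oldest atime
0.000          0  1237223892          0  non-token data: newest atime
0.000          0  1237214668          0  non-token data: last journal sync atime
0.000          0  1237059740          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       9311          0  non-token data: last expire reduction count
"""

# Assumed row shape: <float> <int> <value> <int>  non-token data: <label>
ROW = re.compile(r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data:\s+(.+?)\s*$")


def parse_magic(text):
    """Return {label: value} for every 'non-token data' row in text."""
    values = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # warnings and blank lines simply do not match
            values[match.group(2)] = int(match.group(1))
    return values


magic = parse_magic(SAMPLE)
for label, value in magic.items():
    print(f"{label:30} {value}")
```

In real use the text would come from running sa-learn --dump magic and capturing its standard output instead of the hard-coded SAMPLE.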

Re: spamassassin: sa-learn --dump magic interpretation

Posted by smfabac <sm...@att.net>.

Michael Scheidell wrote:
> 
>>> Is there a document regarding the interpretation of
>>> 
>>> 
>>>> > sa-learn --dump magic
>>> config: could not find site rules directory
>>> 
>>> 0.000          0            3          0  non-token data: bayes db version
>>> 0.000          0       261451          0  non-token data: nspam
>>> 0.000          0        18530          0  non-token data: nham
>>> 0.000          0       143599          0  non-token data: ntokens
>>> 
>>> 0.000          0  1231533845          0  non-token data: oldest atime
>>> 0.000          0  1237223892          0  non-token data: newest atime
>>> 0.000          0  1237214668          0  non-token data: last journal sync atime
>>> 0.000          0  1237059740          0  non-token data: last expiry atime
>>>
>>> 0.000          0    5529600          0  non-token data: last expire atime delta
>>>
>>> 0.000          0       9311          0  non-token data: last expire reduction count
>>> 
>>> 
>> Let me take a stab at it.
>> The db version is 3
>>
>> You have 261,451 tokens that appeared in ‘spam’.
>> You have 18,530 tokens that appeared in ‘ham’.
>>
>> You have 143,599 tokens (remember, some tokens could appear in both spam
>> and ham)
>>
>> The oldest token is date -j -f %s 1231533845
>> Fri Jan  9 15:44:05 EST 2009
>>
>> The newest token is date -j -f %s 1237223892
>> Mon Mar 16 13:18:12 EDT 2009
>>
>> The rest should be easy to figure out.
> 
> Two questions: what is the "date" program above that accepts
> "-j -f %s 1231533845" (which OS)? Neither Windows nor SCO UNIX accepts
> these options.
> 
> What about the other fields in the output of dump magic (field 1: 0.000,
> field 2: 0, and field 4: 0)? Are they a secret known only to SpamAssassin
> developers and kept that way for some reason?
> 
> 
> 
> -- 
> Michael Scheidell, CTO
>>|SECNAP Network Security
> Finalist 2009 Network Products Guide Hot Companies
> FreeBSD SpamAssassin Ports maintainer
> 
> 
> 
> _________________________________________________________________________
> This email has been scanned and certified safe by SpammerTrap(r). 
> For Information please see http://www.secnap.com/products/spammertrap/
> _________________________________________________________________________
> 
> 
> 

-- 
View this message in context: http://old.nabble.com/spamassasin%3A-sa-learn---dump-magic-intrepretation-tp22543157p27565677.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
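
On the "date" question: the -j -f %s form is BSD date syntax (FreeBSD, macOS), and GNU date takes date -d @1231533845 instead. Where neither is available, a short Python sketch does the same conversion. The helper name epoch_to_utc is made up for illustration, and it prints UTC rather than the poster's local EST/EDT.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert a Unix epoch timestamp (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The two atime values from the dump in this thread.
for label, stamp in [("oldest atime", 1231533845), ("newest atime", 1237223892)]:
    print(label, epoch_to_utc(stamp).isoformat())

# oldest atime 2009-01-09T20:44:05+00:00
# newest atime 2009-03-16T17:18:12+00:00
```

These agree with the quoted local times: 15:44:05 EST is 20:44:05 UTC, and 13:18:12 EDT is 17:18:12 UTC.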


Re: spamassassin: sa-learn --dump magic interpretation

Posted by RW <rw...@googlemail.com>.
On Mon, 16 Mar 2009 14:03:47 -0400
Michael Scheidell <sc...@secnap.net> wrote:

> You have 261,451 tokens that appeared in ‘spam’.
> You have 18,530 tokens that appeared in ‘ham’.
> 
> You have 143,599 tokens (remember, some tokens could appear in both
> spam and ham)

The first two are actually the total number of spam and ham emails
learned. Most of the tokens have since expired.

Re: spamassassin: sa-learn --dump magic interpretation

Posted by Matt Kettler <mk...@verizon.net>.
Michael Scheidell wrote:
>
>     Is there a document regarding the interpretation of
>
>
>     > sa-learn --dump magic
>     config: could not find site rules directory
>
>     0.000          0            3          0  non-token data: bayes db version
>     0.000          0       261451          0  non-token data: nspam
>     0.000          0        18530          0  non-token data: nham
>     0.000          0       143599          0  non-token data: ntokens
>
>     0.000          0  1231533845          0  non-token data: oldest atime
>     0.000          0  1237223892          0  non-token data: newest atime
>     0.000          0  1237214668          0  non-token data: last journal sync atime
>     0.000          0  1237059740          0  non-token data: last expiry atime
>
>     0.000          0    5529600          0  non-token data: last expire atime delta
>
>     0.000          0       9311          0  non-token data: last expire reduction count
>
>
> Let me take a stab at it.
> The db version is 3
>
> You have 261,451 tokens that appeared in ‘spam’.
> You have 18,530 tokens that appeared in ‘ham’
Actually, nspam and nham count messages, not tokens. They're also a
count of the total training, and don't "go down" as tokens expire out.
>
> You have 143,599 tokens (remember, some tokens could appear in both
> spam and ham)
Yes, and you also need to account for SA expiring tokens, and for tokens
that occur in multiple messages (i.e., it's not strange that your message
count is higher than your token count).
>
> The oldest token is date -j -f %s 1231533845
> Fri Jan  9 15:44:05 EST 2009
>
> The newest token is date -j -f %s 1237223892
> Mon Mar 16 13:18:12 EDT 2009
>
> The rest should be easy to figure out.
>


Re: spamassassin: sa-learn --dump magic interpretation good/bad/other?

Posted by Dennis German <DG...@Real-World-Systems.com>.
0) Michael, thanks.

1) What are the various zero columns?
For example, in  0.000  0  3  0  non-token data: bayes db version

2) Is this good? Not too good? Bad? Trouble?

++++++++++++++++++++
On Mar 16, 2009, at 14:03, Michael Scheidell wrote:

>> Is there a document regarding the interpretation of
>>
>>
>> > sa-learn --dump magic
>> config: could not find site rules directory
>>
>> 0.000          0            3          0  non-token data: bayes db version
>> 0.000          0       261451          0  non-token data: nspam
>> 0.000          0        18530          0  non-token data: nham
>> 0.000          0       143599          0  non-token data: ntokens
>>
>> 0.000          0  1231533845          0  non-token data: oldest atime
>> 0.000          0  1237223892          0  non-token data: newest atime
>> 0.000          0  1237214668          0  non-token data: last journal sync atime
>> 0.000          0  1237059740          0  non-token data: last expiry atime
>>
>> 0.000          0    5529600          0  non-token data: last expire atime delta
>>
>> 0.000          0       9311          0  non-token data: last expire reduction count
>>
>
> The db version is 3
>
> You have 261,451 tokens that appeared in ‘spam’.
> You have 18,530 tokens that appeared in ‘ham’
>
> You have 143,599 tokens (remember, some tokens could appear in both  
> spam and ham)
>
> The oldest token is date -j -f %s 1231533845
> Fri Jan  9 15:44:05 EST 2009
>
> The newest token is date -j -f %s 1237223892
> Mon Mar 16 13:18:12 EDT 2009


Re: spamassassin: sa-learn --dump magic interpretation

Posted by Michael Scheidell <sc...@secnap.net>.
> Is there a document regarding the interpretation of
> 
> 
>> > sa-learn --dump magic
> config: could not find site rules directory
> 
> 0.000          0            3          0  non-token data: bayes db version
> 0.000          0       261451          0  non-token data: nspam
> 0.000          0        18530          0  non-token data: nham
> 0.000          0       143599          0  non-token data: ntokens
> 
> 0.000          0  1231533845          0  non-token data: oldest atime
> 0.000          0  1237223892          0  non-token data: newest atime
> 0.000          0  1237214668          0  non-token data: last journal sync atime
> 0.000          0  1237059740          0  non-token data: last expiry atime
> 
> 0.000          0    5529600          0  non-token data: last expire atime delta
> 
> 0.000          0       9311          0  non-token data: last expire reduction count
> 
> 
Let me take a stab at it.
The db version is 3

You have 261,451 tokens that appeared in ‘spam’.
You have 18,530 tokens that appeared in ‘ham’.

You have 143,599 tokens (remember, some tokens could appear in both spam and
ham)

The oldest token is date -j -f %s 1231533845
Fri Jan  9 15:44:05 EST 2009

The newest token is date -j -f %s 1237223892
Mon Mar 16 13:18:12 EDT 2009

The rest should be easy to figure out.
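
For the expiry-related numbers, a rough Python sketch of the arithmetic follows. It assumes that "last expire atime delta" is an age in seconds, as the name suggests; the EXPIRATION section of the sa-learn manpage is the place to confirm the exact meaning. The variable names are made up for this example.

```python
from datetime import timedelta

# Values copied from the dump in this thread.
oldest_atime = 1231533845
newest_atime = 1237223892
last_expire_atime_delta = 5529600  # assumed to be seconds

# Age range currently covered by the token timestamps.
span = timedelta(seconds=newest_atime - oldest_atime)

# The delta from the last expiry run, expressed as a duration.
delta = timedelta(seconds=last_expire_atime_delta)

print("token atime span:", span)          # 65 days, 20:34:07
print("last expire atime delta:", delta)  # 64 days, 0:00:00
```

Read this way, the last expiry run used a cutoff of exactly 64 days, and the "last expire reduction count" of 9311 would be the number of tokens dropped in that run.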

-- 
Michael Scheidell, CTO
>|SECNAP Network Security
Finalist 2009 Network Products Guide Hot Companies
FreeBSD SpamAssassin Ports maintainer


