Posted to users@spamassassin.apache.org by Mário Gamito <mg...@telbit.pt> on 2007/04/13 15:36:42 UTC

sa-learn question about number of messages processed

Hi,

How can I know how many messages sa-learn has already processed?

Thanks in advance.

Warm Regards
-- 
:wq! Mário Gamito

Re: sa-learn: have i seen this before?

Posted by Phil Dibowitz <ph...@ipom.com>.
Phil Barnett wrote:
> On Tuesday 17 April 2007 00:17, Faisal N Jawdat wrote:
>> On Apr 16, 2007, at 9:34 PM, Matt Kettler wrote:
>>> Try to learn it, if it comes back with something to the effect of:
>>> "learned from 0 messages, processed 1.." then it's already been
>>> learned.
>> this seems to be the common suggestion.
>>
>> it has a couple drawbacks, as i see it:
>>
>> 1.  it's relatively cpu-intensive if i want to do it all the time
>> (e.g. scan my spam folder to learn only the messages which haven't
>> already been learned)
> 
> Move the messages to a different folder after you learn them.

This is what I do. Well, sort of. I copy the mbox file to a temp file, and
then echo '' > MissedSpam and then run sa-learn on the copy. Once sa-learn
is done, I remove the temp file.

If you're using MailDir it's even easier...
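A minimal shell sketch of that rotation (the temp files here are scratch stand-ins for the real MissedSpam mbox, and the sa-learn line is commented out so the sketch runs even without SpamAssassin installed):

```shell
# Demo of the copy-then-truncate rotation described above. mktemp files
# stand in for the poster's real mbox; the sa-learn call is commented out
# so this runs anywhere.
MBOX=$(mktemp)                       # stand-in for ~/mail/MissedSpam
printf 'From sender@example.com Fri Apr 13 15:36:42 2007\nSubject: test\n\nbody\n' > "$MBOX"

SNAP="$MBOX.snap.$$"
cp "$MBOX" "$SNAP"                   # snapshot the live mbox
: > "$MBOX"                          # truncate it so new mail lands fresh
# sa-learn --spam --mbox "$SNAP"     # train on the stable snapshot
rm -f "$SNAP"
```

Note that `: > file` truncates to zero bytes, while the `echo '' > file` in the post leaves a single newline; either is fine for this purpose.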

-- 
Phil Dibowitz                             phil@ipom.com
Open Source software and tech docs        Insanity Palace of Metallica
http://www.phildev.net/                   http://www.ipom.com/

"Never write it in C if you can do it in 'awk';
 Never do it in 'awk' if 'sed' can handle it;
 Never use 'sed' when 'tr' can do the job;
 Never invoke 'tr' when 'cat' is sufficient;
 Avoid using 'cat' whenever possible" -- Taylor's Laws of Programming



Re: sa-learn: have i seen this before?

Posted by Phil Barnett <ph...@philb.us>.
On Tuesday 17 April 2007 00:17, Faisal N Jawdat wrote:
> On Apr 16, 2007, at 9:34 PM, Matt Kettler wrote:
> > Try to learn it, if it comes back with something to the effect of:
> > "learned from 0 messages, processed 1.." then it's already been
> > learned.
>
> this seems to be the common suggestion.
>
> it has a couple drawbacks, as i see it:
>
> 1.  it's relatively cpu-intensive if i want to do it all the time
> (e.g. scan my spam folder to learn only the messages which haven't
> already been learned)

Move the messages to a different folder after you learn them.

-- 
Phil Barnett
AI4OF
SKCC #600

Re: sa-learn: have i seen this before?

Posted by Faisal N Jawdat <fa...@faisal.com>.
On Apr 22, 2007, at 9:05 AM, Matt Kettler wrote:
> You don't have sa-blacklist, do you?

no, but i had a whitelist with almost 5,000 entries

-faisal



Re: sa-learn: have i seen this before?

Posted by Matt Kettler <mk...@verizon.net>.
Faisal N Jawdat wrote:
> On Apr 21, 2007, at 11:49 PM, Matt Kettler wrote:
>> Try adding a -D to sa-learn.. if it's lock contention, you should see
>> a bunch of messages about it waiting for the lock.
>
> i did this earlier (after some mucking about with file tracing tools)
> and found that most of the wait seems to be in two places:
>
> - loading user_prefs (which in my case has some auto-generated
> portions that could be substantially trimmed)
>
> - one of the rules files (i'm trying to isolate what rule) 

You don't have sa-blacklist, do you?


Re: sa-learn: have i seen this before?

Posted by Faisal N Jawdat <fa...@faisal.com>.
On Apr 21, 2007, at 11:49 PM, Matt Kettler wrote:
> Try adding a -D to sa-learn.. if it's lock contention, you should  
> see a bunch of messages about it waiting for the lock.

i did this earlier (after some mucking about with file tracing tools)  
and found that most of the wait seems to be in two places:

- loading user_prefs (which in my case has some auto-generated  
portions that could be substantially trimmed)

- one of the rules files (i'm trying to isolate what rule)

to the point of filtering, i do wonder if i can solve my "what to  
scan" problem by setting my own custom imap flags.  since maildir  
records that information in the filename this would allow me to  
easily call sa-learn once on all files matching that pattern.
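A sketch of that idea (scratch demo directory; the keyword letter "a" is a hypothetical example — the actual letter Dovecot assigns to a custom keyword depends on the mailbox's keyword list):

```shell
# Maildir appends flags after ":2," in the filename, so messages carrying
# a custom IMAP keyword can be collected with a single glob. "a" is a
# made-up keyword letter for this demo, not a fixed mapping.
CUR=$(mktemp -d)
touch "$CUR/1176500000.M1.host:2,S"     # seen, no keyword
touch "$CUR/1176500001.M2.host:2,Sa"    # seen + custom keyword "a"

for f in "$CUR"/*:2,*a*; do
    printf '%s\n' "$f"                  # in live use: feed these to sa-learn
done
```

In live use the whole glob could go to a single sa-learn invocation (`sa-learn --spam "$CUR"/*:2,*a*`) rather than a per-file loop, which avoids the startup cost discussed later in this thread.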

-faisal


Re: sa-learn: have i seen this before?

Posted by Matt Kettler <mk...@verizon.net>.
Faisal N Jawdat wrote:
> On Apr 21, 2007, at 2:11 PM, Matt Kettler wrote:
>> Ok, I just did some testing. Something is *VERY* wrong with your
>> system.. Are you running out of ram and swapping?
>
> Hrm.  top currently reports 123MB free (out of 2GB physical), with some
> swapping.  sa-learn has a 62M RSS.  This is a shared system with a
> bunch of activity, so I can't easily isolate all issues, but I'll keep
> digging.
>
> Repeated the test almost (but not completely) as you described and it
> still took 20 seconds.    File lock-contention sounds plausible.

Try adding a -D to sa-learn.. if it's lock contention, you should see a
bunch of messages about it waiting for the lock.


Re: sa-learn: have i seen this before?

Posted by Faisal N Jawdat <fa...@faisal.com>.
On Apr 21, 2007, at 2:11 PM, Matt Kettler wrote:
> Ok, I just did some testing. Something is *VERY* wrong with your  
> system.. Are you running out of ram and swapping?

Hrm.  top currently reports 123MB free (out of 2GB physical), with some
swapping.  sa-learn has a 62M RSS.  This is a shared system with a
bunch of activity, so I can't easily isolate all issues, but I'll  
keep digging.

Repeated the test almost (but not completely) as you described and it  
still took 20 seconds.    File lock-contention sounds plausible.

-faisal




Re: sa-learn: have i seen this before?

Posted by Matt Kettler <mk...@verizon.net>.
Faisal N Jawdat wrote:
>
> if sa-learn already does this internally then it's doing it rather
> inefficiently.  20 seconds to pull a message id and compare it against
> the db (berkeleydb, fwiw)?
>
Ok, I just did some testing. Something is *VERY* wrong with your
system.. Are you running out of ram and swapping?

Either that or you've got a lot of mail coming in and you're waiting a
long time for the bayes DB to be unlocked. 20 seconds does sound about
right for a bayes-lock timeout...

Learning one message should take less than a second.

Here's my test results, using SA 3.1.8, freshly downloaded.
The machine used is an Athlon 64 3200+ (2ghz) with 2GB of DDR ram.
The hard drive used is an IDE drive, Maxtor 94610U6.
The Operating System was Fedora Core 4.
The system was not doing anything else at the time, no background SA
processes, etc.

For repeatability, I used part of the public corpus:

http://spamassassin.apache.org/publiccorpus/20030228_easy_ham.tar.bz2

Single message tests:
----------------------
Prestep:
    rm ~/.spamassassin/bayes*
Message used: 01251

First run of sa-learn --nonspam
    Real 0.841sec
    User 0.716 sec
    Sys 0.047 sec
    Learned 1 of 1 messages.
   real speed: 0.841 seconds per message

Second run
    Real 0.770sec
    User 0.682 sec
    Sys 0.047 sec
    Learned 0 of 1 messages.
  real speed: 0.77 seconds per message

Whole batch tests:
----------------------
Prestep:
    rm ~/.spamassassin/bayes*
    message used: whole directory of 2501 messages

First run of sa-learn --nonspam easy_ham/
    Real 3m 11.445 sec
    User 2m 11.850 sec
    System 0m 5.177 sec
    Learned 2501 of 2501 messages.
    real speed: 0.07654 seconds per message

Second run sa-learn --nonspam easy_ham/
    Real 0m 53.926 sec
    User 2m 11.850 sec
    System 0m 0.277 sec
    Learned 0 of 2501 messages.
    real speed: 0.02156 seconds per message


Re: sa-learn: have i seen this before?

Posted by Faisal N Jawdat <fa...@faisal.com>.
On Apr 21, 2007, at 1:34 PM, Matt Kettler wrote:
> time sa-learn on it, and feed it the WHOLE DIRECTORY at once. Do not
> iterate messages, do not specify filenames, just give sa-learn the  
> name
> of the directory.

Doing this on a directory with 6 messages takes about a second more  
than doing it for a single message, which is promising.  That said,  
it isn't noticeably faster (tenths of a second) the second time  
(timed using /usr/bin/time).

> If it's not, and the first pass did learn messages, you've got a  
> problem.

That's promising (I have a problem, but problems can be found).

> The other possibility is you've got write-lock contention. You can  
> avoid
> a lot of this by using the bayes_learn_to_journal option, at the  
> expense
> of causing your training to not take effect until the next sync.

For batch scripts I'm pretty comfortable doing everything with --no-sync,
with a --sync at the end.
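That pattern might look like the following sketch (the Maildir paths are hypothetical placeholders, and `train_batch` is a name made up here, not an sa-learn feature):

```shell
# Journal-only learning for a batch, with one sync at the end so the
# bayes DB write-lock is taken only once.
train_batch() {
    sa-learn --no-sync --spam    "$1"
    sa-learn --no-sync --nonspam "$2"
    sa-learn --sync              # fold the journal into the bayes DB
}
# example call (placeholder paths):
# train_batch ~/Maildir/.TrainSpam/cur/ ~/Maildir/.TrainHam/cur/
```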

-faisal



Re: sa-learn: have i seen this before?

Posted by Matt Kettler <mk...@verizon.net>.
Faisal N Jawdat wrote:
> On Apr 21, 2007, at 11:23 AM, Matt Kettler wrote:
>> Ok, but how does knowing what SA learned it as help? It doesn't.
>>
>> Figure out what to train as, and train.
>
> it helps in that i can automatically iterate over some or all of my
> mail folders on a regular basis, selectively retraining *if*:
>
> a) the message has already been trained
> b) it's been trained the same way that i want it trained in the end
> and
> c) the cost of determining it's already been trained is substantially
> lower than the cost of just training it
But what's the point in that? Why not:

1) tell sa-learn the way you want it trained,
2) Let sa-learn check its "bayes_seen" database to see which messages were
already learned properly; it will automatically skip them, saving
processing time by not relearning them.


sa-learn will already skip messages that are correctly learned.
Completely automatically. In fact, there's no way to cause it to not
skip a message that's already been properly learned.

What does your proposal do that sa-learn already doesn't?
>> I never suggested that you should parse the headers. sa-learn does this
>> to extract the message-id and compare that to the bayes_seen database.
>> sa-learn *MUST* do this much to determine if the message has already
>> been learned. There's NO other way.
>
> even so, it should be possible to parse the message, extract the
> message-id, and compare the results in << 20 seconds. 
Yep. If you're feeding single messages to sa-learn at a time, I can see
how you'd have the perception that it's really slow to make this
decision. Most of the time would be spent loading sa-learn.

Try this experiment:

Set up a directory full of messages you want to train.

time sa-learn on it, and feed it the WHOLE DIRECTORY at once. Do not
iterate messages, do not specify filenames, just give sa-learn the name
of the directory.

time sa-learn --spam /some/folder/of/spam/

Then, afterwards, re-run it. The second time it should skip all the
messages, and should run substantially faster. If it's not, and the
first pass did learn messages, you've got a problem.

>
>> That's a "separate sorter". sa-learn already does this internally, so
>> *any* code on your part is a waste.
>
> if sa-learn already does this internally then it's doing it rather
> inefficiently.  20 seconds to pull a message id and compare it against
> the db (berkeleydb, fwiw)? 
I'd venture to guess sa-learn spends most of that time loading the perl
interpreter and shutting it back down. That's why you really should
avoid feeding single messages to sa-learn: it's really slow. It's also
why "spamassassin" is much slower than the spamc/spamd pair.

The other possibility is you've got write-lock contention. You can avoid
a lot of this by using the bayes_learn_to_journal option, at the expense
of causing your training to not take effect until the next sync.


As a matter of fact, using BerkeleyDB, you should be able to LEARN 2,000
messages in 124 seconds on a P4.

Phase 1a and 1b are both learning 2000 fresh messages into the DB, run
as a single batch using an mbox file. However, these tests are purely
testing sa-learn. No live mail scanning is accessing the bayes DB at the
time.

http://wiki.apache.org/spamassassin/BayesBenchmarkResults

Re: sa-learn: have i seen this before?

Posted by Faisal N Jawdat <fa...@faisal.com>.
On Apr 21, 2007, at 11:23 AM, Matt Kettler wrote:
> Ok, but how does knowing what SA learned it as help? It doesn't.
>
> Figure out what to train as, and train.

it helps in that i can automatically iterate over some or all of my  
mail folders on a regular basis, selectively retraining *if*:

a) the message has already been trained
b) it's been trained the same way that i want it trained in the end
and
c) the cost of determining it's already been trained is substantially  
lower than the cost of just training it

right now i do this manually:  i have a "retrain as spam" folder and  
a "retrain as ham" folder and i hit them each every 5 minutes.  i'd  
rather get rid of the folders, which lets me then use the client-side  
junk mail systems to flag messages as spam or ham, which sa would  
then pick up to retrain.

> I never suggested that you should parse the headers. sa-learn does  
> this
> to extract the message-id and compare that to the bayes_seen database.
> sa-learn *MUST* do this much to determine if the message has already
> been learned. There's NO other way.

even so, it should be possible to parse the message, extract the  
message-id, and compare the results in << 20 seconds.

> That's a "separate sorter". sa-learn already does this internally,  
> so *any* code on your part is a waste.

if sa-learn already does this internally then it's doing it rather  
inefficiently.  20 seconds to pull a message id and compare it  
against the db (berkeleydb, fwiw)?

-faisal


Re: sa-learn: have i seen this before?

Posted by Matt Kettler <mk...@verizon.net>.
Faisal N Jawdat wrote:
> On Apr 21, 2007, at 1:30 AM, Matt Kettler wrote:
>>> 2.  which way do i learn it.
>>
>> Erm, if it's spam, learn it as spam.. if it's nonspam, learn it as
>> nonspam. What's the problem here?
>
> i have a program looking through for untrained messages and deciding
> what to train them as.  alternatively, i have a program looking
> through and training all messages in a folder, deciding how to train
> on the fly.
Ok, but how does knowing what SA learned it as help? It doesn't.

Figure out what to train as, and train.

>
>> What you want to do would reduce efficiency by making SA take two
>> passes. In the first pass, it parses all the headers of every
>> message, and tells you which ones it's learned or not.
>
> a couple issues here:
>
> 1.  the headers do not necessarily tell the truth -- if you train on a
> message after it arrives then the headers will still say the same as
> written at delivery time.  and, as you point out, parsing the headers
> is an ugly way to do it.
I never suggested that you should parse the headers. sa-learn does this
to extract the message-id and compare that to the bayes_seen database.
sa-learn *MUST* do this much to determine if the message has already
been learned. There's NO other way.
>
> 2.  depending on how fast the "have i trained this message before"
> lookup is, this could still beat training every message.  as it is i'm
> looking at 19-20 seconds to [not] retrain a previously trained
> message on a fairly unloaded box.
>
> i guess i could write a wrapper script around the sa-learn functions
> to keep a separate db of what has and hasn't been trained.
But *WHY*.. Spamassassin already has such a database! And it uses it for
the exact same purpose you propose!

What you're ultimately trying to do is redundant. sa-learn already
handles all of this.
>
>> Then you use some external sorter
>> Then you call SA to learn the messages that weren't learned. It now has
>> to re-parse the headers from scratch, then parse/tokenize and learn the
>> body.
>
> why call a separate sorter?  do something more like:
>
> for my $message (@messages) {
>   learn($message) unless (already_learned($message))
> }
That's a "separate sorter". sa-learn already does this internally, so
*any* code on your part is a waste.

Why are you wanting to redundantly implement a feature sa-learn already
does?
How are you going to do it any "better"?


Re: sa-learn: have i seen this before?

Posted by Faisal N Jawdat <fa...@faisal.com>.
On Apr 21, 2007, at 1:30 AM, Matt Kettler wrote:
>> 2.  which way do i learn it.
>
> Erm, if it's spam, learn it as spam.. if it's nonspam, learn it as  
> nonspam. What's the problem here?

i have a program looking through for untrained messages and deciding  
what to train them as.  alternatively, i have a program looking  
through and training all messages in a folder, deciding how to train  
on the fly.

> What you want to do would reduce efficiency by making SA take two  
> passes. In the first pass, it parses all the headers of every  
> message, and tells you which ones it's learned or not.

a couple issues here:

1.  the headers do not necessarily tell the truth -- if you train on  
a message after it arrives then the headers will still say the same  
as written at delivery time.  and, as you point out, parsing the  
headers is an ugly way to do it.

2.  depending on how fast the "have i trained this message before"  
lookup is, this could still beat training every message.  as it is  
i'm looking at 19-20 seconds to [not] retrain a previously trained  
message on a fairly unloaded box.

i guess i could write a wrapper script around the sa-learn  
functions to keep a separate db of what has and hasn't been trained.

> Then you use some external sorter
> Then you call SA to learn the messages that weren't learned. It now  
> has
> to re-parse the headers from scratch, then parse/tokenize and learn  
> the
> body.

why call a separate sorter?  do something more like:

for my $message (@messages) {
   learn($message) unless (already_learned($message))
}

-faisal




Re: sa-learn: have i seen this before?

Posted by Matt Kettler <mk...@verizon.net>.
Faisal N Jawdat wrote:
> On Apr 16, 2007, at 9:34 PM, Matt Kettler wrote:
>> Try to learn it, if it comes back with something to the effect of:
>> "learned from 0 messages, processed 1.." then it's already been learned.
>
First, sorry for taking so long to get back to you.. I've been absurdly
busy at work lately.
> this seems to be the common suggestion.
>
> it has a couple drawbacks, as i see it:
>
> 1.  it's relatively cpu-intensive if i want to do it all the time
> (e.g. scan my spam folder to learn only the messages which haven't
> already been learned)
If this is your SOLE desire, yes, because sa-learn will at the same time
also learn every message that's not been learned before.
>
> 2.  which way do i learn it.
Erm, if it's spam, learn it as spam.. if it's nonspam, learn it as
nonspam. What's the problem here?
>
> to step back a bit, my final goal is to be able to figure out which
> messages in a folder haven't been learned, and learn only those.  in
> the ideal situation i can also figure out (ahead of time), whether a
> learned message was learned as ham or spam.

You do realize that the above idea would be *SLOWER* than feeding all
the messages to sa-learn.. Right?

sa-learn already internally recognizes which messages are already
learned and skips them.

What you want to do would reduce efficiency by making SA take two
passes. In the first pass, it parses all the headers of every message,
and tells you which ones it's learned or not.
Then you use some external sorter
Then you call SA to learn the messages that weren't learned. It now has
to re-parse the headers from scratch, then parse/tokenize and learn the
body.

Your sorter and the re-parsing of the headers is excess overhead that
would have been eliminated by letting sa-learn just handle it all the
first time through.





Re: sa-learn: have i seen this before?

Posted by Faisal N Jawdat <fa...@faisal.com>.
On Apr 16, 2007, at 9:34 PM, Matt Kettler wrote:
> Try to learn it, if it comes back with something to the effect of:
> "learned from 0 messages, processed 1.." then it's already been  
> learned.

this seems to be the common suggestion.

it has a couple drawbacks, as i see it:

1.  it's relatively cpu-intensive if i want to do it all the time  
(e.g. scan my spam folder to learn only the messages which haven't  
already been learned)

2.  which way do i learn it.

to step back a bit, my final goal is to be able to figure out which  
messages in a folder haven't been learned, and learn only those.  in  
the ideal situation i can also figure out (ahead of time), whether a  
learned message was learned as ham or spam.

this may be semi-impossible.

on the other hand, what can i learn from the headers?

e.g. it looks like autolearn=[something] will tell me about the  
autolearner, but is there anything for manual learns?

where i'm going with all this:

i can run a cron job to learn the contents of different mailboxes on  
a regular basis.  what i do now is have a TrainSpam and TrainHam  
mailbox, and when something gets misfiled (in Spam or any ham folder)  
i just move it in there.  every 5 minutes a cron job goes through and  
scans things appropriately.
<http://www.faisal.com/software/sa-harvest/quicktrain.html>
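The 5-minute cron approach is commonly wired up with a crontab fragment along these lines (schedule and Maildir paths are illustrative, not the actual quicktrain configuration):

```shell
# Illustrative crontab entries, not the poster's real setup:
#   */5 * * * * sa-learn --spam    $HOME/Maildir/.TrainSpam/cur/
#   */5 * * * * sa-learn --nonspam $HOME/Maildir/.TrainHam/cur/
```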

first, i'd like to be able to do that within the mailboxes rather  
than using special mailboxes.

second, i'd like to be able to key off junk mail flags set by the  
client (thunderbird, apple mail).  i'm using dovecot, so it's a  
fairly simple matter of parsing Maildir filenames, but to do it right  
i need to combine the knowledge with what spamassassin thinks.

i might just go write a dovecot plugin to do this in real-time, but  
i'm not feeling the motivation to break the mail server with a  
misplaced pointer.

-faisal


Re: sa-learn: have i seen this before?

Posted by Matt Kettler <mk...@verizon.net>.
Faisal N Jawdat wrote:
> Is there an easy way to tell if sa-learn has learned a given message
> before?
Try to learn it, if it comes back with something to the effect of:
"learned from 0 messages, processed 1.." then it's already been learned.


sa-learn: have i seen this before?

Posted by Faisal N Jawdat <fa...@faisal.com>.
Is there an easy way to tell if sa-learn has learned a given message  
before?

-faisal


Re: sa-learn question about number of messages processed

Posted by "John D. Hardin" <jh...@impsec.org>.
On Mon, 16 Apr 2007, Matt Kettler wrote:

> > 0.000          0        569          0  non-token data: nspam
> You have trained 569 nonspam messages

that should be: 569 spams (Number SPAM)

> > 0.000          0          7          0  non-token data: nham
> You have trained 7 spam messages

and: 7 hams (Number HAM)

Pakogah: you need to train 193 more ham emails before Bayes will start 
scoring.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Ten-millimeter explosive-tip caseless, standard light armor
  piercing rounds. Why?
-----------------------------------------------------------------------
 3 days until The 232nd anniversary of The Shot Heard 'Round The World


Re: sa-learn question about number of messages processed

Posted by Matt Kettler <mk...@verizon.net>.
PakOgah wrote:
>
> Matt Kettler wrote:
>> Mário Gamito wrote:
>>  
>>> Hi,
>>>
>>> How can I know how many messages sa-learn has already processed?
>>>     
>> You mean the total number of messages learned in the bayes database
>> (includes sa-learn and autolearn)?
>>
>> sa-learn --dump magic
> and how do I read this information?
> # sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
Bayes DB is in the version 3 format. (it's changed a couple times in
history, but hasn't changed recently)
>
> 0.000          0        569          0  non-token data: nspam
You have trained 569 nonspam messages
> 0.000          0          7          0  non-token data: nham
You have trained 7 spam messages, which is very few, not enough for SA
to be willing to start using the bayes database to rate mail yet.. by
default you need 200 (and I do not recommend changing it to anything
lower except in lab tests to study bayes errors in under-trained
databases.).
> 0.000          0      53898          0  non-token data: ntokens
There are 53,898 total tokens in the bayes database. (Small, but not
absurdly so. By default SA aims to keep it between 100k and 150k.
Looking above, you've not trained enough emails for SA to start
considering throwing out old tokens to keep it under 150k.)
> 0.000          0  987802486          0  non-token data: oldest atime
> 0.000          0 1176482771          0  non-token data: newest atime
The least-recently used token in the database was last accessed
987802486 seconds after January 1st, 1970, and the most-recent was
accessed at 1176482771. (not very interesting except to compare against
each other)
> 0.000          0          0          0  non-token data: last journal
> sync atime
> 0.000          0          0          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire
> atime delta
> 0.000          0          0          0  non-token data: last expire
> reduction count
>
There's never been a journal sync or expiration of old tokens.

In a young database this is reasonably normal, although I'd eventually
expect a journal sync after you've got enough nonspam for your bayes to
become actively used by SA. Also, you'll never get expiry until your
database is a bit larger. Expiry doesn't kick in until you've got
150,000 tokens, and you've got about a third of that.


Re: sa-learn question about number of messages processed

Posted by PakOgah <pa...@pala.bo-tak.info>.
Matt Kettler wrote:
> Mário Gamito wrote:
>   
>> Hi,
>>
>> How can I know how many messages sa-learn has already processed?
>>     
> You mean the total number of messages learned in the bayes database
> (includes sa-learn and autolearn)?
>
> sa-learn --dump magic
and how do I read this information?
# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        569          0  non-token data: nspam
0.000          0          7          0  non-token data: nham
0.000          0      53898          0  non-token data: ntokens
0.000          0  987802486          0  non-token data: oldest atime
0.000          0 1176482771          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal 
sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire 
atime delta
0.000          0          0          0  non-token data: last expire 
reduction count

Re: sa-learn question about number of messages processed

Posted by Matt Kettler <mk...@verizon.net>.
Mário Gamito wrote:
> Hi,
>
> How can I know how many messages sa-learn has already processed?
You mean the total number of messages learned in the bayes database
(includes sa-learn and autolearn)?

sa-learn --dump magic
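For scripting, the interesting counters can be pulled out with standard awk. The sample lines below are copied from the dump output quoted earlier in this thread; in live use you would pipe `sa-learn --dump magic` straight into the awk instead:

```shell
# Extract the learned-message counts (third column) from dump-magic
# output. Sample lines copied from PakOgah's output above.
dump='0.000          0        569          0  non-token data: nspam
0.000          0          7          0  non-token data: nham'

printf '%s\n' "$dump" | awk '/nspam/ {print "spam learned: " $3}
                             /nham/  {print "ham learned:  " $3}'
# prints:
#   spam learned: 569
#   ham learned:  7
```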