Posted to users@spamassassin.apache.org by Cecil Westerhof <Ce...@decebal.nl> on 2010/01/05 17:51:05 UTC

About upgrading

After the scare about the 2010 problem, it turned out that we were not
affected, but only because an old version of SpamAssassin was in use
(3.0.4). The web-site says that upgrading to the latest version is not a
big problem. But to what extent is this really the case? Are there
surprises to be aware of when going from 3.0.4 to 3.2.5? And how
important is the upgrade? So far there has been no real problem with the
filtering.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

Re: About upgrading

Posted by Cecil Westerhof <Ce...@decebal.nl>.
Jeff Mincy <je...@delphioutpost.com> writes:

>    But it does not seem to be interesting in my situation.
>    First my code has to grow from:
>        sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
>    to:
>        for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
>            spamc -L ${typeStr} <${i}
>        done
>    
>    Which is not even enough, because I need to take care of the situation
>    that the directory is empty and I need to implement code to show the
>    messages delivered by sa-learn.
>
> Oh.  You're learning all of the messages in a directory.  spamc -L is
> faster than sa-learn for learning single messages because sa-learn is
> a perl script that has to load Mail::SpamAssassin each time.  For a
> large directory the slower startup of sa-learn is less of an issue.
> sa-learn is fine for doing directories.

I have continued working on it and have something that seems to work. The
only problem is that spamc gives a different exit code than the
documentation suggests. I'll finish the code, and once it has been tested
tonight I'll post it here.

Again: it is not really necessary, but I like efficient code.


>    Which a low level of spam it work, but if it becomes bigger, it does not
>    work:
>        date
>        echo ${echoStr}
>        sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
>        date
>        for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
>            spamc -L ${typeStr} <${i}
>        done
>        echo learned in the new way
>        date
>    gives:
>        za jan  9 16:09:25 CET 2010
>        Increase
>        Learned tokens from 0 message(s) (45 message(s) examined)
>        za jan  9 16:09:40 CET 2010
>        learned in the new way
>        za jan  9 16:10:00 CET 2010
>    
>    So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more
>    code. Beside taking care of an empty directory, I also need to implement
>    the feedback given by sa-learn.)
>    
> You learned tokens from 0 messages and looked at 45 messages.
> You've already previously learned from those 45 messages, which is
> just timing how fast it can do nothing.

I changed the code and now it moves already processed messages out of
the way. As mentioned above: I'll post it after it has been tested.


>    > Also, What is the size of your database?   Maybe you are spending lots
>    > of time doing expires or something.
>    
>    sa-learn --dump magic gives:
>        0.000          0          3          0  non-token data: bayes db version
>        0.000          0      57538          0  non-token data: nspam
>        0.000          0      74876          0  non-token data: nham
>        0.000          0     166338          0  non-token data: ntokens
>        0.000          0 1257478501          0  non-token data: oldest atime
>        0.000          0 1263049426          0  non-token data: newest atime
>        0.000          0 1263049538          0  non-token data: last journal sync atime
>        0.000          0 1263044805          0  non-token data: last expiry atime
>        0.000          0    5529600          0  non-token data: last expire atime delta
>        0.000          0       1868          0  non-token data: last expire reduction count
>    
> Your database has 166338 tokens which is larger than the default
> bayes_expiry_max_db_size 150000.  The last expiration ran this morning
> at 8:46.  You could try letting the bayes database get larger and turn
> off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
> to add something to cron to periodically expire tokens.
> bayes_auto_expire is fine for lower volumes of email, but can get in
> the way with higher volumes.

With the changed code it only takes a few seconds, so I probably do not
have to worry about this.
Also, the learning is already done in a cronjob at a time when I am
normally not working on the computer.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

Re: About upgrading

Posted by John Hardin <jh...@impsec.org>.
On Mon, 11 Jan 2010, RW wrote:

> I do wonder whether there's any real-basis to the idea that autoexpiry 
> isn't "industrial-strength". I don't use expiry any more, but when I 
> did, it didn't seem like a big deal at 200,000 tokens, and it's O(N) so 
> millions of tokens shouldn't be too bad either.

In practice it _is_ problematic.

We fairly often get queries to the list about why someone is seeing scan 
timeouts and scads of expiry work files in the bayes directory; the cause 
is that auto-expiry is taking longer than some timer is willing to wait 
for spamd to complete, so the auto-expiry never completes. The standard 
answer is to turn off auto-expiry and expire weekly or daily from cron.
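
A minimal sketch of what such a cron job could run, assuming
bayes_auto_expire has been set to 0 in local.cf so that expiry only
happens here. The function names expire_cmd and run_expire are
hypothetical; only sa-learn --force-expire comes from SpamAssassin itself.

```shell
# Hypothetical wrapper for a daily or weekly cron entry (assumes that
# bayes_auto_expire is set to 0, so expiry only happens here).

# The real work; kept in its own function so it is easy to swap out.
expire_cmd() {
    sa-learn --force-expire
}

# Run the expiry, report how long it took, and pass on success or failure.
run_expire() {
    start=$(date +%s)
    if expire_cmd; then
        status=ok
    else
        status=failed
    fi
    end=$(date +%s)
    echo "bayes expiry ${status} after $((end - start)) second(s)"
    [ "$status" = ok ]
}
```

A script that ends with a call to run_expire, started from the crontab of
the user that owns the Bayes database, would then log one line per run.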

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   One difference between a liberal and a pickpocket is that if you
   demand your money back from a pickpocket he will not question your
   motives.                                          -- William Rusher
-----------------------------------------------------------------------
  6 days until Benjamin Franklin's 304th Birthday

Re: About upgrading

Posted by RW <rw...@googlemail.com>.
On Mon, 11 Jan 2010 08:54:20 -0500
Jeff Mincy <je...@delphioutpost.com> wrote:



> You have an exclusive lock when doing expiration.  Expiration
> presumably takes longer on larger volumes, but it is still pretty
> fast. Running expiration daily or weekly should be more than
> sufficient.

AFAIK the exclusive lock is only against learning and database
maintenance operations (sync, backup, etc.); at least that's my
experience based on what happens when stale lockfiles get left behind.
Mail continues to get processed by spamd, and gets classified by
Bayes. The only difference is that you see autolearn=unavailable in
the X-Spam-Status header.

There's no reason to lock Bayes completely, because expiry writes out a
new database file and does a swap-over. And because the writes go to a
different db file, they don't even interfere with spamd read-locking
(in some cases it might even speed up spamd, as it turns off
autolearning).

I do wonder whether there's any real-basis to the idea that autoexpiry
isn't  "industrial-strength". I don't use expiry any more, but when I
did, it didn't seem like a big deal at 200,000 tokens, and it's O(N) so
millions of tokens shouldn't be too bad either.



Re: About upgrading

Posted by Alex <my...@gmail.com>.
Hi,

Thanks for the information on bayes and sa-learn. Very helpful.

Best,
Alex

>   I suppose you could take the ntokens value before, and subtract it
>   from the after value to see how many tokens were expired, right? It
>   would be interesting to see how many tokens are expired on a regular
>   basis, but not sure that's very useful, just interesting.
>
> sa-learn tells you how many tokens were deleted when you do --force-expire, for example:
>  expired old bayes database entries in 152 seconds
>  1516428 entries kept, 115692 deleted
>  token frequency: 1-occurrence tokens: 73.76%
>  token frequency: less than 8 occurrences: 16.19%
>
> -jeff
>

Re: About upgrading

Posted by Jeff Mincy <je...@delphioutpost.com>.
   From: Alex <my...@gmail.com>
   Date: Sat, 9 Jan 2010 21:13:24 -0500
   
   >   sa-learn --dump magic gives:
   >       0.000          0          3          0  non-token data: bayes db version
   >       0.000          0      57538          0  non-token data: nspam
   >       0.000          0      74876          0  non-token data: nham
   >       0.000          0     166338          0  non-token data: ntokens
   >       0.000          0 1257478501          0  non-token data: oldest atime
   >       0.000          0 1263049426          0  non-token data: newest atime
   >       0.000          0 1263049538          0  non-token data: last journal sync atime
   >       0.000          0 1263044805          0  non-token data: last expiry atime
   >       0.000          0    5529600          0  non-token data: last expire atime delta
   >       0.000          0       1868          0  non-token data: last expire reduction count
   >
   > Your database has 166338 tokens which is larger than the default
   > bayes_expiry_max_db_size 150000.  The last expiration ran this morning
   > at 8:46.  You could try letting the bayes database get larger and turn
   > off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
   > to add something to cron to periodically expire tokens.
   > bayes_auto_expire is fine for lower volumes of email, but can get in
   > the way with higher volumes.
   
   Also, what is the drawback with using auto_expire on larger volumes?
   Is it the locking delay and preventing learning new messages during
   that time? If you were to put it in cron to manually do an expiry, how
   often should it be run?
   
You have an exclusive lock when doing expiration.  Expiration presumably
takes longer on larger volumes, but it is still pretty fast.  
Running expiration daily or weekly should be more than sufficient.

   Is there anything that should be tested prior to making this change,
   or is it pretty benign?

Yes - turning off bayes_auto_expire is pretty benign.
You may not need to make this type of change.   The default options
for bayes work fine for lower email volumes.

   I suppose you could take the ntokens value before, and subtract it
   from the after value to see how many tokens were expired, right? It
   would be interesting to see how many tokens are expired on a regular
   basis, but not sure that's very useful, just interesting.

sa-learn tells you how many tokens were deleted when you do --force-expire, for example:
 expired old bayes database entries in 152 seconds
 1516428 entries kept, 115692 deleted
 token frequency: 1-occurrence tokens: 73.76%
 token frequency: less than 8 occurrences: 16.19%
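
The before/after subtraction from the question can be sketched as below.
This is only an illustration: ntokens_from_dump and tokens_expired are
hypothetical helper names, and the parsing assumes the "sa-learn --dump
magic" layout shown in this thread, where the count is the third field
of the line ending in "ntokens".

```shell
# Print the ntokens value from "sa-learn --dump magic" output on stdin.
# The count is the third whitespace-separated field of the ntokens line.
ntokens_from_dump() {
    awk '$NF == "ntokens" { print $3 }'
}

# Print how many tokens disappeared between two ntokens readings
# (first argument: count before expiry, second: count after).
tokens_expired() {
    echo $(( $1 - $2 ))
}
```

Possible usage:

    before=$(sa-learn --dump magic | ntokens_from_dump)
    sa-learn --force-expire
    after=$(sa-learn --dump magic | ntokens_from_dump)
    tokens_expired "$before" "$after"

Tokens learned between the two readings would skew the difference, so
the figure printed by --force-expire itself is the more reliable one.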

-jeff

Re: About upgrading

Posted by Bill Landry <bi...@inetmsg.com>.
Rosenbaum, Larry M. wrote:
> 
>> -----Original Message-----
>> From: Bill Landry [mailto:bill@inetmsg.com]
>> Sent: Sunday, January 10, 2010 12:42 PM
>> To: users@spamassassin.apache.org
>> Subject: Re: About upgrading
>>
>> LuKreme wrote:
>>> On 9-Jan-2010, at 21:23, Rosenbaum, Larry M. wrote:
>>>
>>>> It's the number of seconds since the epoch (Jan 1, 1970).  One easy
>>>> way to convert it to a readable time is
>>>> # perl -e 'print scalar localtime 1263044805, "\n"'
>>>> Sat Jan  9 08:46:45 2010
>> Or even simpler:
>>
>> perl -le 'print scalar localtime 1263044805'
>> Sat Jan  9 05:46:45 2010
>>
>>>  % date -r 1263044805
>>> Sat Jan  9 06:46:45 MST 2010
>> On Linux based systems:
>>
>> date -d @1263044805
>> Sat Jan  9 05:46:45 PST 2010
>>
>> I like this output better than the perl output because it also includes
>> the timezone.
> 
> Excellent.  Is there one that works on Solaris (other than the Perl version)?

Don't know about the default date utility for Solaris, but you could
always use the GNU date utility on Solaris.
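
A rough sketch of a helper that picks whichever of the variants in this
thread is available, falling back to perl (which should also cover a
stock Solaris). The name epoch_to_date is hypothetical, and using
"date --version" to detect GNU date is an assumption about the local
date implementations.

```shell
# Convert seconds since the epoch to a readable local time.
# Uses GNU date if present, then tries BSD-style "date -r", then perl.
epoch_to_date() {
    secs=$1
    if date --version >/dev/null 2>&1; then
        # GNU date: "@" marks the argument as seconds since the epoch.
        date -d "@${secs}"
    elif date -r "${secs}" 2>/dev/null; then
        # BSD date printed the time itself; nothing more to do.
        :
    else
        # Fallback without a time zone in the output.
        perl -le 'print scalar localtime($ARGV[0])' "${secs}"
    fi
}
```

For example, "epoch_to_date 1263044805" converts the last expiry atime
from the dump above.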

Bill

RE: About upgrading

Posted by "Rosenbaum, Larry M." <ro...@ornl.gov>.

> -----Original Message-----
> From: Bill Landry [mailto:bill@inetmsg.com]
> Sent: Sunday, January 10, 2010 12:42 PM
> To: users@spamassassin.apache.org
> Subject: Re: About upgrading
> 
> LuKreme wrote:
> > On 9-Jan-2010, at 21:23, Rosenbaum, Larry M. wrote:
> >
> >> It's the number of seconds since the epoch (Jan 1, 1970).  One easy
> >> way to convert it to a readable time is
> >>
> >> # perl -e 'print scalar localtime 1263044805, "\n"'
> >> Sat Jan  9 08:46:45 2010
> 
> Or even simpler:
> 
> perl -le 'print scalar localtime 1263044805'
> Sat Jan  9 05:46:45 2010
> 
> >  % date -r 1263044805
> > Sat Jan  9 06:46:45 MST 2010
> 
> On Linux based systems:
> 
> date -d @1263044805
> Sat Jan  9 05:46:45 PST 2010
> 
> I like this output better than the perl output because it also includes
> the timezone.

Excellent.  Is there one that works on Solaris (other than the Perl version)?

Re: About upgrading

Posted by Bill Landry <bi...@inetmsg.com>.
LuKreme wrote:
> On 9-Jan-2010, at 21:23, Rosenbaum, Larry M. wrote:
> 
>> It's the number of seconds since the epoch (Jan 1, 1970).  One easy way to convert it to a readable time is
>>
>> # perl -e 'print scalar localtime 1263044805, "\n"'
>> Sat Jan  9 08:46:45 2010

Or even simpler:

perl -le 'print scalar localtime 1263044805'
Sat Jan  9 05:46:45 2010

>  % date -r 1263044805
> Sat Jan  9 06:46:45 MST 2010

On Linux based systems:

date -d @1263044805
Sat Jan  9 05:46:45 PST 2010

I like this output better than the perl output because it also includes
the timezone.

Bill

Re: About upgrading

Posted by LuKreme <kr...@kreme.com>.
On 9-Jan-2010, at 21:23, Rosenbaum, Larry M. wrote:

> It's the number of seconds since the epoch (Jan 1, 1970).  One easy way to convert it to a readable time is
> 
> # perl -e 'print scalar localtime 1263044805, "\n"'
> Sat Jan  9 08:46:45 2010

 % date -r 1263044805
Sat Jan  9 06:46:45 MST 2010


-- 
MEGAHAL: within my penguin lies a torrid story of hate and love.


RE: About upgrading

Posted by "Rosenbaum, Larry M." <ro...@ornl.gov>.
> -----Original Message-----
> From: Alex [mailto:mysqlstudent@gmail.com]
> Sent: Saturday, January 09, 2010 9:13 PM
> To: SA Mailing list
> Subject: Re: About upgrading
> 
> Hi,
> 
> >   sa-learn --dump magic gives:
> >       0.000          0          3          0  non-token data: bayes db version
> >       0.000          0      57538          0  non-token data: nspam
> >       0.000          0      74876          0  non-token data: nham
> >       0.000          0     166338          0  non-token data: ntokens
> >       0.000          0 1257478501          0  non-token data: oldest atime
> >       0.000          0 1263049426          0  non-token data: newest atime
> >       0.000          0 1263049538          0  non-token data: last journal sync atime
> >       0.000          0 1263044805          0  non-token data: last expiry atime
> >       0.000          0    5529600          0  non-token data: last expire atime delta
> >       0.000          0       1868          0  non-token data: last expire reduction count
> >
> > Your database has 166338 tokens which is larger than the default
> > bayes_expiry_max_db_size 150000.  The last expiration ran this morning
> > at 8:46.  You could try letting the bayes database get larger and turn
> > off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
> > to add something to cron to periodically expire tokens.
> > bayes_auto_expire is fine for lower volumes of email, but can get in
> > the way with higher volumes.
> 
> Can I ask how you calculate the actual time from that number? I
> suspect it's the epoch minus some division of 24hrs, but a quick
> search wasn't fruitful.

It's the number of seconds since the epoch (Jan 1, 1970).  One easy way to convert it to a readable time is

# perl -e 'print scalar localtime 1263044805, "\n"'
Sat Jan  9 08:46:45 2010

Re: About upgrading

Posted by Alex <my...@gmail.com>.
Hi,

>   sa-learn --dump magic gives:
>       0.000          0          3          0  non-token data: bayes db version
>       0.000          0      57538          0  non-token data: nspam
>       0.000          0      74876          0  non-token data: nham
>       0.000          0     166338          0  non-token data: ntokens
>       0.000          0 1257478501          0  non-token data: oldest atime
>       0.000          0 1263049426          0  non-token data: newest atime
>       0.000          0 1263049538          0  non-token data: last journal sync atime
>       0.000          0 1263044805          0  non-token data: last expiry atime
>       0.000          0    5529600          0  non-token data: last expire atime delta
>       0.000          0       1868          0  non-token data: last expire reduction count
>
> Your database has 166338 tokens which is larger than the default
> bayes_expiry_max_db_size 150000.  The last expiration ran this morning
> at 8:46.  You could try letting the bayes database get larger and turn
> off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
> to add something to cron to periodically expire tokens.
> bayes_auto_expire is fine for lower volumes of email, but can get in
> the way with higher volumes.

Can I ask how you calculate the actual time from that number? I
suspect it's the epoch minus some division of 24hrs, but a quick
search wasn't fruitful.

Also, what is the drawback with using auto_expire on larger volumes?
Is it the locking delay and preventing learning new messages during
that time? If you were to put it in cron to manually do an expiry, how
often should it be run?

Is there anything that should be tested prior to making this change,
or is it pretty benign?

I suppose you could take the ntokens value before, and subtract it
from the after value to see how many tokens were expired, right? It
would be interesting to see how many tokens are expired on a regular
basis, but not sure that's very useful, just interesting.

Finally, I opened gmail this evening, and google reported this to me,
which I thought was amusing:

# Webpage display issues: "Aw, Snap!"
http://www.google.com/support/chrome/bin/answer.py?answer=95669&hl=en

Aw, Snap? Isn't that from like 2002?

Totally off-topic comment, but perhaps acceptable for a late Saturday evening.

Thanks very much.
Best regards,
Alex

Re: About upgrading

Posted by Jeff Mincy <je...@delphioutpost.com>.
   From: Cecil Westerhof <Ce...@decebal.nl>
   Date: Sat, 09 Jan 2010 16:24:56 +0100
   
   Jeff Mincy <je...@delphioutpost.com> writes:
   
   >    I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
   >    more time with 3.2.5 as it took with 3.0.4. Can this be true?
   >    
   >    It is not a problem, because it is done by cron-tab, but I am just
   >    curious.
   >
   > You can use spamc -L spam/ham to learn messages.  Spamc -L is faster
   > than sa-learn.  The spamd daemon needs to be started with
   > --allow-tell.
   
   That is not really an answer on my question. ;-)

I doubt that bayes learning has slowed down significantly.
I would expect that choice of bayes_store_module, learning to
journal, whether auto expiration runs, and lock contention
matters more than the version.

   But it does not seem to be interesting in my situation.
   First my code has to grow from:
       sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
   to:
       for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
           spamc -L ${typeStr} <${i}
       done
   
   Which is not even enough, because I need to take care of the situation
   that the directory is empty and I need to implement code to show the
   messages delivered by sa-learn.

Oh.  You're learning all of the messages in a directory.  spamc -L is
faster than sa-learn for learning single messages because sa-learn is
a perl script that has to load Mail::SpamAssassin each time.  For a
large directory the slower startup of sa-learn is less of an issue.
sa-learn is fine for doing directories.

   Which a low level of spam it work, but if it becomes bigger, it does not
   work:
       date
       echo ${echoStr}
       sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
       date
       for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
           spamc -L ${typeStr} <${i}
       done
       echo learned in the new way
       date
   gives:
       za jan  9 16:09:25 CET 2010
       Increase
       Learned tokens from 0 message(s) (45 message(s) examined)
       za jan  9 16:09:40 CET 2010
       learned in the new way
       za jan  9 16:10:00 CET 2010
   
   So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more
   code. Beside taking care of an empty directory, I also need to implement
   the feedback given by sa-learn.)
   
You learned tokens from 0 messages and looked at 45 messages.
You've already previously learned from those 45 messages, which is
just timing how fast it can do nothing.

   > You can try using bayes_learn_to_journal - and do a separate sa-learn
   > --sync job in cron.   Learning to the journal is faster.
   
   I'll look into that.
   
   
   > Also, What is the size of your database?   Maybe you are spending lots
   > of time doing expires or something.
   
   sa-learn --dump magic gives:
       0.000          0          3          0  non-token data: bayes db version
       0.000          0      57538          0  non-token data: nspam
       0.000          0      74876          0  non-token data: nham
       0.000          0     166338          0  non-token data: ntokens
       0.000          0 1257478501          0  non-token data: oldest atime
       0.000          0 1263049426          0  non-token data: newest atime
       0.000          0 1263049538          0  non-token data: last journal sync atime
       0.000          0 1263044805          0  non-token data: last expiry atime
       0.000          0    5529600          0  non-token data: last expire atime delta
       0.000          0       1868          0  non-token data: last expire reduction count
   
Your database has 166338 tokens which is larger than the default
bayes_expiry_max_db_size 150000.  The last expiration ran this morning
at 8:46.  You could try letting the bayes database get larger and turn
off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
to add something to cron to periodically expire tokens.
bayes_auto_expire is fine for lower volumes of email, but can get in
the way with higher volumes.
-jeff

Re: About upgrading

Posted by RW <rw...@googlemail.com>.
On Sat, 09 Jan 2010 16:24:56 +0100
Cecil Westerhof <Ce...@decebal.nl> wrote:

> Jeff Mincy <je...@delphioutpost.com> writes:
> 
> >    I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn
> > takes more time with 3.2.5 as it took with 3.0.4. Can this be true?
> >    
> >    It is not a problem, because it is done by cron-tab, but I am
> > just curious.
> >
> > You can use spamc -L spam/ham to learn messages.  Spamc -L is faster
> > than sa-learn.  The spamd daemon needs to be started with
> > --allow-tell.
> 
> That is not really an answer on my question. ;-)
> 
> ...
> So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more
> code. Beside taking care of an empty directory, I also need to
> implement the feedback given by sa-learn.)

It's not really surprising: sa-learn doesn't have the problem of having
to initialize for each individual mail, so spamc is just extra
overhead.

> > You can try using bayes_learn_to_journal - and do a separate
> > sa-learn --sync job in cron.   Learning to the journal is faster.
> 
> I'll look into that.

I wouldn't bother setting that just to speed up learning. AFAIK the
point of bayes_learn_to_journal is to prevent autolearning from
slowing down classification. The gdbm backend uses a simple
reader-writer lock, so updating token counts locks all the other
spamd processes out of the database. If you have enough active spamd
processes to justify it, updating to the journal avoids lock
contention. The downside is that the updates don't take effect until
the sync.

My guess is that it doesn't really speed up learning; it just defers
some of the work until the sync, and there's not much point in that,
since you could just defer the sa-learn.

Re: About upgrading

Posted by Cecil Westerhof <Ce...@decebal.nl>.
Jeff Mincy <je...@delphioutpost.com> writes:

>    I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
>    more time with 3.2.5 as it took with 3.0.4. Can this be true?
>    
>    It is not a problem, because it is done by cron-tab, but I am just
>    curious.
>
> You can use spamc -L spam/ham to learn messages.  Spamc -L is faster
> than sa-learn.  The spamd daemon needs to be started with
> --allow-tell.

That is not really an answer to my question. ;-)

But it does not seem to be worthwhile in my situation.
First my code has to grow from:
    sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
to:
    for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
        spamc -L ${typeStr} <${i}
    done

And that is not even enough, because I need to take care of the situation
that the directory is empty, and I need to implement code to show the
kind of messages that sa-learn prints.
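
The two missing pieces (an empty directory and some feedback) could be
handled roughly as follows. This is a sketch under assumptions, not the
code I ended up with: learn_one and learn_dir are hypothetical names,
and since spamc's exit code did not match the documentation, the loop
only counts non-zero exits instead of interpreting them.

```shell
# Feed one message (on stdin) to spamd; spamd must run with --allow-tell.
learn_one() {
    spamc -L "$1"
}

# Learn every message in a directory as spam or ham and print a summary.
learn_dir() {
    type_str=$1
    dir=$2
    examined=0
    nonzero=0
    for msg in "$dir"/*; do
        # In an empty directory the pattern stays unexpanded; skip it.
        [ -f "$msg" ] || continue
        examined=$((examined + 1))
        # The meaning of the exit code is not assumed here; it is only counted.
        learn_one "$type_str" < "$msg" > /dev/null || nonzero=$((nonzero + 1))
    done
    echo "${examined} message(s) examined, ${nonzero} non-zero exit code(s)"
}
```

It would be called as, for example:

    learn_dir spam "${HOME}/Maildir/.SpamDir.SpamIncrease/cur"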

With a low level of spam it works, but if it becomes bigger, it does not
work:
    date
    echo ${echoStr}
    sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
    date
    for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
        spamc -L ${typeStr} <${i}
    done
    echo learned in the new way
    date
gives:
    za jan  9 16:09:25 CET 2010
    Increase
    Learned tokens from 0 message(s) (45 message(s) examined)
    za jan  9 16:09:40 CET 2010
    learned in the new way
    za jan  9 16:10:00 CET 2010

So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more
code. Besides taking care of an empty directory, I also need to implement
the feedback given by sa-learn.)


> You can try using bayes_learn_to_journal - and do a separate sa-learn
> --sync job in cron.   Learning to the journal is faster.

I'll look into that.


> Also, What is the size of your database?   Maybe you are spending lots
> of time doing expires or something.

sa-learn --dump magic gives:
    0.000          0          3          0  non-token data: bayes db version
    0.000          0      57538          0  non-token data: nspam
    0.000          0      74876          0  non-token data: nham
    0.000          0     166338          0  non-token data: ntokens
    0.000          0 1257478501          0  non-token data: oldest atime
    0.000          0 1263049426          0  non-token data: newest atime
    0.000          0 1263049538          0  non-token data: last journal sync atime
    0.000          0 1263044805          0  non-token data: last expiry atime
    0.000          0    5529600          0  non-token data: last expire atime delta
    0.000          0       1868          0  non-token data: last expire reduction count

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

Re: About upgrading

Posted by Jeff Mincy <je...@delphioutpost.com>.
   From: Cecil Westerhof <Ce...@decebal.nl>
   Date: Sat, 09 Jan 2010 14:39:59 +0100
   
   Cecil Westerhof <Ce...@decebal.nl> writes:
   
   > I did the upgrade. It took some time and there was a slight problem with
   > permissions, but it looks like a successful upgrade. I only changed
   > /dev/null to a real mailbox, because of the 2010 problem. When something
   > like this happens again I now can recover those e-mails.
   
   I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
   more time with 3.2.5 as it took with 3.0.4. Can this be true?
   
   It is not a problem, because it is done by cron-tab, but I am just
   curious.

You can use spamc -L spam/ham to learn messages.  spamc -L is faster
than sa-learn.  The spamd daemon needs to be started with --allow-tell.

You can try using bayes_learn_to_journal - and do a separate sa-learn
--sync job in cron.   Learning to the journal is faster.

Also, what is the size of your database?   Maybe you are spending lots
of time doing expires or something.

-jeff

Re: About upgrading

Posted by Cecil Westerhof <Ce...@decebal.nl>.
Cecil Westerhof <Ce...@decebal.nl> writes:

> I did the upgrade. It took some time and there was a slight problem with
> permissions, but it looks like a successful upgrade. I only changed
> /dev/null to a real mailbox, because of the 2010 problem. When something
> like this happens again I now can recover those e-mails.

I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
more time with 3.2.5 than it took with 3.0.4. Can this be true?

It is not a problem, because it is done from crontab, but I am just
curious.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

Re: About upgrading

Posted by Cecil Westerhof <Ce...@decebal.nl>.
Kai Schaetzl <ma...@conactive.com> writes:

>> you have changed WHAT???
>
> He means he uses procmail and used to send all spam to /dev/null.

That is right. I also made the following script:
    #!/usr/bin/env bash

    # When --no-filename is not an accepted parameter for grep use -h
    # When --max-count=1 is not an accepted parameter for grep use -m 1

    declare -r DELETE_PARS="-daystart -type f -mtime +30 -print0"
    declare -r BLACK_HOLE=${HOME}/Maildir/.SpamDir.SpamAssassin.black-hole/
    declare -r SUBJECT_PARS="-daystart -type f -mtime 1"

    spamfolderArray=(
      ${BLACK_HOLE}
      ${HOME}/Maildir/.SpamDir.SpamNotFound
      ${HOME}/Maildir/.SpamDir.SpamIncrease
      ${HOME}/Maildir/.SpamDir.SpamNLLGG
      ${HOME}/Maildir/.SpamDir.SpamGoogle
      ${HOME}/Maildir/.SpamDir.SpamBounceReal
      ${HOME}/Maildir/.SpamDir.SpamFalse
    )

    # subjects of yesterday
    spamFiles="$(find ${BLACK_HOLE} ${SUBJECT_PARS})"
    if [[ ${spamFiles} == "" ]] ; then
        echo "Yesterday there were no 'deleted' spam e-mails."
    else
        echo "${spamFiles}" | \
          xargs grep --max-count=1 --no-filename ^Subject: |
          cut -b1-70
    fi

    # remove spam older than the defined age (30 days)
    for spamdir in ${spamfolderArray[@]}; do
        find ${spamdir} ${DELETE_PARS} | xargs -0 /bin/rm -f
    done

Maybe my learning script is also interesting:
    #!/usr/bin/env bash

    IFS="#"
    PARAMETERS=(
      "NotFound#spam#SpamNotFound"
      "NLLGG#spam#SpamNLLGG"
      "Google#spam#SpamGoogle"
      "Increase#spam#SpamIncrease"
      "Bounced#spam#SpamBounceReal"
      "FalsePositive#ham#SpamFalse"
    )

    for temp in "${PARAMETERS[@]}"; do
        read echoStr typeStr dirStr < <(echo "${temp}")
        date
        echo ${echoStr}
        sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
        echo
    done
    date
    sa-learn --dump magic
    date

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

Re: About upgrading

Posted by Kai Schaetzl <ma...@conactive.com>.
Matus UHLAR - fantomas wrote on Sat, 9 Jan 2010 12:31:26 +0100:

> you have changed WHAT???

He means he uses procmail and used to send all spam to /dev/null.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com




Re: About upgrading

Posted by LuKreme <kr...@kreme.com>.
On 9-Jan-2010, at 07:07, Cecil Westerhof wrote:

> LuKreme <kr...@kreme.com> writes:
> 
>> I think he (she?)
> 
> He. Cecilia and Cecile are female, but Cecil is male. Think about Cecil
> B. DeMill.

I thought I was referring to Kai, which can go either way. I know Cecil is a male name.

-- 
Wally: That's my nickname, "Waly" with one el. 
Dilbert: Who calls you that?
Wally: Most people, they just don't realize it.


Re: About upgrading

Posted by Cecil Westerhof <Ce...@decebal.nl>.
LuKreme <kr...@kreme.com> writes:

> I think he (she?)

He. Cecilia and Cecile are female, but Cecil is male. Think about Cecil
B. DeMille.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

Re: About upgrading

Posted by LuKreme <kr...@kreme.com>.
On 9-Jan-2010, at 04:31, Matus UHLAR - fantomas wrote:
>> Kai Schaetzl <ma...@conactive.com> writes:
>> 
>> I only changed /dev/null to a real mailbox, 

> you have changed WHAT???

I think he (she?) meant that the local delivery for certain spam thresholds was set to /dev/null and that's been changed to a real mailbox.

For the record, unless the mail is addressed to my personal email addresses, I NEVER dev null any message, no matter how high the score. I have seen 'legitimate' messages score very high in the past, and it is much simpler to tell users that once a mail is accepted it will ALWAYS be delivered. It might be delivered to their spam mailbox, but it will be delivered.

-- 
I'm completely in favor of the separation of Church and State. My 
	idea is that these two institutions screw us up enough on their 
	own, so both of them together is certain death. 


Re: About upgrading

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> Kai Schaetzl <ma...@conactive.com> writes:
> > There's always a document about updating from the various old versions, 
> > read it and you will be prepared for most problems. But your SA is 
> > *really* old, expect some minor config problems.

On 06.01.10 02:57, Cecil Westerhof wrote:
> I did the upgrade. It took some time and there was a slight problem with
> permissions, but it looks like a successful upgrade. I only changed
> /dev/null to a real mailbox, because of the 2010 problem. If something
> like this happens again, I can now recover those e-mails.

you have changed WHAT???
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"The box said 'Requires Windows 95 or better', so I bought a Macintosh".

Re: About upgrading

Posted by Cecil Westerhof <Ce...@decebal.nl>.
Kai Schaetzl <ma...@conactive.com> writes:

> There's always a document about updating from the various old versions, 
> read it and you will be prepared for most problems. But your SA is 
> *really* old, expect some minor config problems.

I did the upgrade. It took some time and there was a slight problem with
permissions, but it looks like a successful upgrade. I only changed
/dev/null to a real mailbox, because of the 2010 problem. If something
like this happens again, I can now recover those e-mails.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

Re: About upgrading

Posted by David Bayle <da...@zerospam.ca>.
Kai Schaetzl wrote:
> There's always a document about updating from the various old versions, 
> read it and you will be prepared for most problems. But your SA is 
> *really* old, expect some minor config problems.
>
> Kai
>
>   
Hi,

Thanks for your advice, but I have already read the UPGRADE file.
We are trying to install SA from the SVN repository, which means
it is not an outdated version.

I'm pretty sure that it is a dependency problem.
But, again, we are trying to set up a recent snapshot of SA on Ubuntu.

Thanks for your help.

-- 
ZEROSPAM Sécurité Inc. - http://www.zerospam.ca
Tél :	(514) 527 3232	#210
Fax :	(514) 527 1201




Re: About upgrading

Posted by Kai Schaetzl <ma...@conactive.com>.
There's always a document about updating from the various old versions, 
read it and you will be prepared for most problems. But your SA is 
*really* old, expect some minor config problems.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com




Re: About upgrading

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 05.01.10 17:51, Cecil Westerhof wrote:
> After the scare about the 2010 problem, it was found that there was no
> problem, but that was because an old version of SpamAssassin was used
> (3.0.4). The web-site says it is not a big problem to upgrade to the
> latest version. But in how far is this really the case? Are there
> surprises to be aware of when going from 3.0.4 to 3.2.5? And how
> important is this? Until now there is not a real problem with the
> filtering.

MANY rules changed
MANY plugins changed.
MANY new plugins were added
sa-update has only worked since 3.1, so you can't even run sa-update with 3.0.*

Yes, it is worth upgrading.
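
The sa-update point above comes down to a version comparison against 3.1.
A minimal sketch of such a check, assuming purely numeric dotted version
strings; the function name version_ge is hypothetical, and obtaining the
installed version string is left out:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if dotted version $1 is at least $2.
# Only the first three numeric components are compared.
version_ge() {
    local -a have want
    local i h w
    IFS=. read -r -a have <<< "$1"
    IFS=. read -r -a want <<< "$2"
    for i in 0 1 2; do
        # Missing components count as 0, so "3.1" equals "3.1.0".
        h="${have[i]:-0}"
        w="${want[i]:-0}"
        # 10# forces base 10, so a component like "08" is not read as octal.
        if (( 10#${h} > 10#${w} )); then
            return 0
        fi
        if (( 10#${h} < 10#${w} )); then
            return 1
        fi
    done
    return 0
}
```

For example, version_ge "3.0.4" "3.1.0" fails, which is the case where
sa-update cannot be used, while version_ge "3.2.5" "3.1.0" succeeds.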
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
REALITY.SYS corrupted. Press any key to reboot Universe.