Posted to users@spamassassin.apache.org by GRP Productions <gr...@hotmail.com> on 2005/03/13 10:21:12 UTC

Bayes DB does not grow anymore

Hello,
For some days now my Bayesian DB has not seemed to grow. Its size remains
stable. It is read with no problems by SA 3.0.2, but nothing new is written.
I send an email to myself and it is classified as BAYES_50. I sa-learn it as
spam, send it again, and it is still BAYES_50 (I expected to see BAYES_99).

I use SpamAssassin 3.0.2. No configuration changes have been made recently. It
used to work fine.
I've tried --sync and --force-expire, but no luck.
Any help would be appreciated.
Thanks
Greg

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


Re: Bayes DB does not grow anymore

Posted by jdow <jd...@earthlink.net>.
From: "Kai Schaetzl" <ma...@conactive.com>

> > in a degree I have set my SA score to be more or less equal with the
> > BAYES_99 score (around 8).
>
> Your BAYES_99 score is 8? I would never do this. General rule is that no
> single rule should be able to mark a message as ham or spam. That cries for
> false positives.

I'd not do that with Bayes scores. However, there are a few rules that
are iron clad spam detectors here and they get VERY high scores. They
are unique to me and uniquely usable by me so I don't bother to pass
them along. (I have a string of wrong names associated with products
people spam me about that I use to send a score well over 5 to SA. And
I have some additional PayPal antispam of my own which involves some
fancy dancing with meta rules that get an automatic 105 to make sure
they never get through to anything but my spam folder.) I do scan the
spam folder, though. If I didn't scan it I'd not be so vicious about
some of my spam scores.

{^_-}



Re: Bayes DB does not grow anymore

Posted by Kai Schaetzl <ma...@conactive.com>.
GRP Productions wrote on Fri, 18 Mar 2005 10:38:29 +0200:

> It seems SURBL is now enabled by default. It has also changed its name to 
> URIDNSBL :-)

SURBL refers generally to those xx_SURBL rules and to URIDNSBL, since the only
other distributed rule is SBL and SURBL started it all.

> I do not use SARE rules (although I am trying to find time to
> look at them, as I am aware of their credibility). I use Gray's rules 
> (http://files.grayonline.id.au), they seem quite efficient.

I wasn't aware of that site, but now that I visited it, I remember I visited it 
at least once. Use whatever works for you. After all, all this stuff isn't done 
to make you try out again and again but to help you focus your time on the 
important things.

> I understand what you say. The point is, what should be the criteria to 
> understand if the time for an expiration has come? I mean, supposing we take 
> only the size in consideration, could be a problem. What if some old tokens 
> are still common nowadays in spam mail?

This is not a problem. Expiry isn't done by "addition time", but by access time 
(short: atime). So, items which didn't occur recently drop to the "end" of the 
db and get removed by expiry. There's always the chance that old tokens which 
haven't been seen for a long time "come back". But the chance is slimmer the 
older the atime of that token is. There's probably some statistical curve 
algorithm which could be used to determine the best "break point". Because of 
the way dbx databases work expiry can't be done this way, though.
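
To make the access-time idea concrete, here is a minimal Python sketch of
expiry by atime. It is only an illustration and not SpamAssassin's actual
expiry code: the function name, the dict-based token store and the 75% keep
fraction are all assumptions made for this example.

```python
def expire_by_atime(tokens, max_size, keep_fraction=0.75):
    """Drop the least recently seen tokens once the store exceeds max_size.

    tokens maps token -> atime (epoch seconds of the last time it was seen).
    keep_fraction is a hypothetical knob for this sketch, not an SA setting.
    """
    if len(tokens) <= max_size:
        return dict(tokens)  # under the limit: nothing to expire
    target = int(max_size * keep_fraction)
    # Sort newest atime first and keep only the first `target` entries.
    newest_first = sorted(tokens.items(), key=lambda item: item[1], reverse=True)
    return dict(newest_first[:target])


# Illustrative data only: five tokens with made-up atimes.
tokens = {"viagra": 100, "mortgage": 200, "meeting": 300, "invoice": 400, "rolex": 500}
kept = expire_by_atime(tokens, max_size=4)
print(sorted(kept))
```

A token that "comes back" simply gets a fresh atime the next time it is seen,
which moves it away from the cut-off again.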

> As I told you, since my last post I have reset everything.  It seems to me 
> it works fine, and it learns rapidly. It gives me no reason not to trust it, 
> in a degree I have set my SA score to be more or less equal with the 
> BAYES_99 score (around 8).

Your BAYES_99 score is 8? I would never do this. General rule is that no single 
rule should be able to mark a message as ham or spam. That cries for false 
positives.
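
The rule of thumb above can be checked mechanically. The following Python
sketch lists the rules whose score alone reaches the spam threshold. The
helper name and the two score tables are hypothetical; the individual numbers
(BAYES_99 at 8 or 3.0, URIBL_JP_SURBL at 4.0, a threshold of 5) are figures
mentioned in these threads, combined here purely for illustration.

```python
def rules_that_decide_alone(scores, required_score=5.0):
    """Return the names of rules whose score alone reaches required_score."""
    return sorted(name for name, score in scores.items() if score >= required_score)


# Hypothetical score tables built from numbers quoted in the discussion.
high_bayes = {"BAYES_99": 8.0, "URIBL_JP_SURBL": 4.0}
low_bayes = {"BAYES_99": 3.0, "URIBL_JP_SURBL": 4.0}

print(rules_that_decide_alone(high_bayes))  # BAYES_99 can mark spam on its own
print(rules_that_decide_alone(low_bayes))   # no single rule is enough
```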

> Of course I keep doing mistake-based learning, 
> but most of the times I feed it with 'subjective' spam mail (ie. mail that 
> my users don't want to receive, but is definitely not spam).

What kind of mail is that? Newsletters they once subscribed to and don't like 
anymore? They should unsubscribe instead of declaring it as spam.


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org




Re: Bayes DB does not grow anymore

Posted by GRP Productions <gr...@hotmail.com>.
>Thanks for the offer. You can send it to the email address I use for this
>list, or you could just send me an FTP URL for retrieval.

Sorry I did not find the time to do this, but I will try to send it during 
the weekend.

>Oh, yes. You need to have SURBL switched on via the init.pre (I think it's
>off by default) and you should use custom rules. I use a set of carefully
>chosen rulesets mostly from SARE and updated via rulesdujour and some more
>rules of my own accumulated over time.

It seems SURBL is now enabled by default. It has also changed its name to 
URIDNSBL :-) I do not use SARE rules (although I am trying to find time to 
look at them, as I am aware of their credibility). I use Gray's rules 
(http://files.grayonline.id.au), they seem quite efficient.

>I think on a heavy traffic machine it's preferable to have it off,
>especially when using MailScanner. Otherwise the expiry can kick in at
>random times every few hours (you can set a minimum time, though, for
>instance one day). Some people run a scheduled expiry three times a day.
>That's advice which often comes up on the MailScanner list (which is a very
>helpful list, btw).
>Depends on how often you need it (whether it reaches the limit you want to
>hold more often or not). Starting with one expiry per night should be fine,
>but you should occasionally expire manually and look at the output, in case
>there are problems.

>No. One should get rid of really old tokens, they are only "ballast" in the
>db. I don't know how a big db behaves on a busy site. Ours contain 1 million
>tokens and have a size of 40 MB. They work very well with no resource
>hogging. But I have only a few thousand messages running through each of our
>servers, there's probably none which gets more than 10,000 a day. If you get
>100,000 it may be different.

I understand what you say. The point is, what should the criteria be for
deciding that the time for an expiration has come? I mean, taking only the
size into consideration could be a problem. What if some old tokens are still
common nowadays in spam mail? You could say it doesn't matter, since it will
start again and recognize all the bad stuff. In that sense, we could just stop
maintaining Bayes completely.

>That's what we do. I only learn messages which were categorized wrong. Not
>by Bayes, but by SA. Most messages which get a score lower than 5 still get
>a BAYES_99 which means that Bayes identifies them all. Nevertheless, I learn
>these messages because they are spam and it reassures Bayes that they are
>spam.
>BTW: I have set BAYES_99 to 3.0, because it's so accurate for us.

As I told you, since my last post I have reset everything. It seems to me
it works fine, and it learns rapidly. It gives me no reason not to trust it,
to the degree that I have set my SA score to be more or less equal to the
BAYES_99 score (around 8). Of course I keep doing mistake-based learning,
but most of the time I feed it with 'subjective' spam mail (i.e. mail that
my users don't want to receive, but which is definitely not spam). I monitor
it constantly and I am happy with it.

>No problem :-) I tend to be a bit snappy on first messages which look to me
>like the author could have done a bit more research, but once we are over
>that stage I hope I can give some good advice based on my experience.

I have to admit that our communication was valuable to me, I learned so much 
about how the whole thing works. Once again, I appreciate it.

Greg



Re: Bayes DB does not grow anymore

Posted by Kai Schaetzl <ma...@conactive.com>.
GRP Productions wrote on Tue, 15 Mar 2005 01:12:53 +0200:

> >I have been trying to get something from CVS for several days now, no luck. 
>  
> Send me your email in private (grpprod@hotmail.com) to send it to you.

Thanks for the offer. You can send it to the email address I use for this list, 
or you could just send me an FTP URL for retrieval.

> I will probably start again from scratch. One point: Do you think I should 
> put custom rules inside /etc/mail/spamassassin or the default installation 
> is enough? 

Oh, yes. You need to have SURBL switched on via the init.pre (I think it's off 
by default) and you should use custom rules. I use a set of carefully chosen 
rulesets mostly from SARE and updated via rulesdujour and some more rules of my 
own accumulated over time.

> Yes I just added this. Should auto_expire remain always at 0?

I think on a heavy traffic machine it's preferable to have it off, especially
when using MailScanner. Otherwise the expiry can kick in at random times every
few hours (you can set a minimum time, though, for instance one day). Some
people run a scheduled expiry three times a day. That's advice which often
comes up on the MailScanner list (which is a very helpful list, btw).
How often depends on how often you need it (whether the db reaches the limit
you want to hold more often or not). Starting with one expiry per night should
be fine, but you should occasionally expire manually and look at the output, in
case there are problems.
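
A scheduled expiry of this kind could be driven by a small wrapper that cron
calls once per night. The Python sketch below is only one way to do it, under
stated assumptions: the helper names are invented, the `--force-expire` switch
is the one mentioned at the start of this thread, and the `-p` option for
pointing sa-learn at a preferences file is how I understand sa-learn to work,
so check the man page of your version before relying on it.

```python
import subprocess


def build_expire_command(prefs_file=None):
    """Build the sa-learn command line for a forced expiry run."""
    command = ["sa-learn"]
    if prefs_file:
        # Assumption: -p selects the preferences file that names the Bayes db.
        command += ["-p", prefs_file]
    command.append("--force-expire")
    return command


def run_expiry(prefs_file=None):
    """Run the expiry and return (exit code, combined output) for inspection."""
    result = subprocess.run(
        build_expire_command(prefs_file), capture_output=True, text=True
    )
    return result.returncode, result.stdout + result.stderr


# Show the command a nightly cron job would run (path taken from this thread).
print(" ".join(build_expire_command("/opt/MailScanner/etc/spam.assassin.prefs.conf")))
```

Keeping the output returned by `run_expiry` (for example by mailing it to the
admin) covers the "look at the output" part of the advice.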


> Also, do you
> think it would be better if the db NEVER expired?

No. One should get rid of really old tokens, they are only "ballast" in the db.
I don't know how a big db behaves on a busy site. Ours contain 1 million tokens
and have a size of 40 MB. They work very well with no resource hogging. But I
have only a few thousand messages running through each of our servers, there's
probably none which gets more than 10,000 a day. If you get 100,000 it may be
different.


> Would this value of 500000
> achieve that? I don't want to come at work some day and see my tokens were
> lost again :-(

Just look at what the dump says about your oldest token. If your bayes
"performance" is good then the hold time is probably of no interest, but if the
spam detection from bayes is bad and you have a short hold time, the short hold
time is one of the things I would look at.


>  
> In general, should I do as you said, ie. trust the autolearn system and 
> never use sa-learn again, provided that I do not have the time to do full 
> training. 

That's what we do. I only learn messages which were categorized wrong. Not by 
Bayes, but by SA. Most messages which get a score lower than 5 still get a 
BAYES_99 which means that Bayes identifies them all. Nevertheless, I learn 
these messages because they are spam and it reassures Bayes that they are spam.
BTW: I have set BAYES_99 to 3.0, because it's so accurate for us.

>  
> Thanks for giving me so much of your time, and being so patient with my 
> silly questions.

No problem :-) I tend to be a bit snappy on first messages which look to me 
like the author could have done a bit more research, but once we are over that 
stage I hope I can give some good advice based on my experience.


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org




RE: Sudden spam to this email address

Posted by Rob McEwen <ro...@powerviewsystems.com>.
David B Funk said:
>geocities is pretty good about taking crap down once they're notified,

Yes... but it often takes them a couple of days to get this done... even
when kiddy pron is involved.

I wish geocities would respond faster to such complaints.

Also, much higher volumes of spam mail with geocities.com URLs hit my server
than legit mail with geocities.com URLs.

Rob McEwen


Re: Sudden spam to this email address

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Mon, 14 Mar 2005, Jeff Chan wrote:

> Well when they can sell spams that don't advertise a web site
> for the same price as those that do, let us know.  Until
> then SURBLs have them.
>
> Jeff C.

OK, how about 419'ers or stock scammers?

The child porn sites that use: http://beam.to/adultworld
or http://angels.hk.to  or a page at geocities?

geocities is pretty good about taking crap down once they're
notified, but that angels.hk.to site has been around for months.

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Sudden spam to this email address

Posted by Matt Kettler <mk...@evi-inc.com>.
Stuart Johnston wrote:

> Hey, SURBLs are GREAT, no doubt about it but let's not kid ourselves.
> It is a long way from a 100% spam solution.


I think Jeff's point is that SURBL is one test spammers have a limited 
ability to adapt to without cutting into their bottom line. Not that 
it's perfect.

Re: Sudden spam to this email address

Posted by Stuart Johnston <st...@ebby.com>.
Jeff Chan wrote:
> On Tuesday, March 15, 2005, 9:02:44 AM, Stuart Johnston wrote:
> 
>>SURBLs have them... most of the time... eventually...  Er, yeah.
> 
> 
> Just to check, are you using ob.surbl.org and jp.surbl.org
> in multi.surbl.org, i.e.:

In the last ~24 hours:

All SA > 5:   32540
*_SURBL:      22361 (69%)
JP_SURBL:     20157 (62%)
OB_SURBL:     19900 (61%)

This is after a couple of DNSBLs at SMTP which may skew my stats.
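
For anyone wanting to reproduce the percentages from their own logs, the
arithmetic is just hits divided by total spam. A minimal Python sketch using
the counts quoted above (the function name is made up for this example):

```python
def hit_rates(total, hits):
    """Return each rule's share of the total as a whole-number percentage."""
    return {name: round(100 * count / total) for name, count in hits.items()}


total_spam = 32540  # messages scoring above 5 in the ~24 hour window
surbl_hits = {"*_SURBL": 22361, "JP_SURBL": 20157, "OB_SURBL": 19900}

rates = hit_rates(total_spam, surbl_hits)
for name, percent in rates.items():
    print(f"{name}: {percent}%")
```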

Re: Sudden spam to this email address

Posted by Jeff Chan <je...@surbl.org>.
On Tuesday, March 15, 2005, 9:02:44 AM, Stuart Johnston wrote:
> SURBLs have them... most of the time... eventually...  Er, yeah.

Just to check, are you using ob.surbl.org and jp.surbl.org
in multi.surbl.org, i.e.:

urirhssub URIBL_JP_SURBL  multi.surbl.org.        A   64
body      URIBL_JP_SURBL  eval:check_uridnsbl('URIBL_JP_SURBL')
describe  URIBL_JP_SURBL  Has URI in JP at http://www.surbl.org/lists.html
tflags    URIBL_JP_SURBL  net

score URIBL_JP_SURBL    4.0

They tend to catch new domains pretty quickly.

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Re: Sudden spam to this email address

Posted by Stuart Johnston <st...@ebby.com>.
Jeff Chan wrote:
> 
> Well when they can sell spams that don't advertise a web site
> for the same price as those that do, let us know.  Until
> then SURBLs have them.

SURBLs have them... most of the time... eventually...  Er, yeah.

Hey, SURBLs are GREAT, no doubt about it, but let's not kid ourselves.  It
is a long way from a 100% spam solution.

Re: Sudden spam to this email address

Posted by Jeff Chan <je...@surbl.org>.
On Monday, March 14, 2005, 10:31:29 PM, Matt Kettler wrote:
> I am 100% certain that there are spammers subscribed to this list, or are
> getting the messages in some manner or another.  It's rather obvious why 
> they do it. Spam tools seem to quickly adapt to subjects discussed here. 
> List harvesting is a bonus. 

Well when they can sell spams that don't advertise a web site
for the same price as those that do, let us know.  Until
then SURBLs have them.

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Re: Sudden spam to this email address

Posted by Matt Kettler <mk...@comcast.net>.
At 11:35 PM 3/14/2005, Greg Allen wrote:

>Does posting to this list open me up to dweebs harvesting email addresses?

Without a doubt, yes.

I am 100% certain that there are spammers subscribed to this list, or are 
getting the messages in some manner or another.  It's rather obvious why 
they do it. Spam tools seem to quickly adapt to subjects discussed here. 
List harvesting is a bonus. 


RE: Sudden spam to this email address

Posted by Matt Kettler <mk...@comcast.net>.
At 11:53 PM 3/14/2005, Greg Allen wrote:
>Yep, I just found the culprit.
>
>The below 2 websites volunteer SA users-list email addresses for all the
>world to harvest. I found my email address in Google from posting here on
>this list.

One of many.. As I pointed out before, there's probably multiple spammers 
who are directly subscribed to the list.

>Be warned, if you post to this list use a throw-away email address unless
>you are looking to have a good test account for SA. :-)

That should be said of *any* mailing list that's open to public 
subscription. Period. They're all vulnerable to being mined regardless of 
what web archives they have. All the spammer needs to do is subscribe a 
"legitimate" account to the list and run all the messages through their 
list-mining software. As long as that address is only used for harvesting, 
and not used as a drop box for spam runs, nobody is ever likely to be the 
wiser to it.

Let's face it, telling the difference between a lurker who subscribes but 
never posts, and a spammer who mines but never posts is pretty much 
impossible. It's like trying to tell if a stranger is a spammer. The guy at 
the table behind you at lunch could be a spammer, and you'd not know. Only 
a few of the really big-time spammers get their pictures circulated.

The spammer is not easily recognized. The spammer is among us, and looks 
very much like us. Don't be fooled into thinking the spammer isn't there 
just because you can't see him. It's in his best interest to be here, and 
it's also in his best interest to blend in and not be noticed. Don't 
underestimate the spammers, some may be stupid, but some are also clever 
(albeit morally deficient).

Spying on one's adversaries is a battle tactic which is thousands of years
old. It goes on all the time between governments, militaries, police and
criminals, companies, neighbors. Why not here?

I'm sure at least some spammers know to spy on their adversaries... to spy 
on us... here... on this list.

And I'm SURE they have no moral problems with doing so.

   


Re: Sudden spam to this email address

Posted by Bob Proulx <bo...@proulx.com>.
Mike Burger wrote:
> The second link definitely gets you to, what appear to be, the raw list 
> archive files.

I did not see any "raw list archives" at this moment.  But I did see
the mail address in the mail archives here.  This one for example.

  http://spamassassin.apache.org/mail/users/200503

> In addition, the actual "archives", that are viewable to the world, show 
> the senders' email addresses.

Yes, but so does the mailing list.  Anyone can subscribe to the
mailing list.  Mailing lists that provide anonymity have been
around before, but usually they have their own set of really bad
problems.  Basically, web forums are the anonymous medium today.

There can be no illusion that your mail address is secret after
posting to a public mailing list.  So any spammer could get it from
there directly by subscribing regardless of how it was handled in mail
archives.  I think obfuscating addresses is just closing the barn door
after the animals have already escaped.  It just frustrates you and
annoys the pig.[1]  But even mailing addresses only known by friends
will get leaked out because a friend will sign you up for an email
greeting card or some other such frivolous thing and get you on a
spammer's list.

However I think the true leak is web pages.  I have seen studies
showing that between one and four weeks after an email address shows up
on a web site it will start collecting spam.  And almost all
mailing lists are gatewayed to web pages somewhere on the 'net these
days.

When I web search for my email address it is scary how many hits come
back.  I have old addresses from the late 1980's that are still found
by web searches.  Yet I still get very little spam to my mailbox.
RBLs, greylisting, virus filtering, spamassassin.  Sad that those are
needed.  But that is the way of things.  Fortunately they are very
effective.

Bob

[1] Let's see how long the OT followup thread goes about that analogy. :-)

Re: Sudden spam to this email address

Posted by Mike Burger <mb...@bubbanfriends.org>.
Not his point.

The second link definitely gets you to, what appear to be, the raw list 
archive files.

The first link got me a blank page.

In addition, the actual "archives", that are viewable to the world, show 
the senders' email addresses.

Seems to me that whatever's generating the list archives, the raw files 
should be hidden from the world.  It also occurs to me that apache.org 
should either be using a list manager whose archives feature hides the 
email addresses (MailMan comes to mind) or a tool that properly masks the 
addresses...I believe mail2html, or somesuch.

But that's just my 2 cents worth.

On Mon, 14 Mar 2005, Thomas Cameron wrote:

> I don't post terribly frequently, but I certainly do post to this list (and 
> many others).  Ditto for Usenet.  No throw-away addresses for me.
>
> I use SpamAssassin with Pyzor, Razor, DCC, and network checks, ClamAV, and 
> greylisting.
>
> I can remember one spam message that made it into my Inbox this year.  One.
>
> I can't shout from the roof tops loudly or often enough:  "SpamAssassin 
> works!"  :-)
>
> Thomas
>
> ----- Original Message ----- From: "Greg Allen" <ga...@netrox.net>
> To: <us...@spamassassin.apache.org>
> Sent: Monday, March 14, 2005 10:53 PM
> Subject: RE: Sudden spam to this email address
>
>
>> Yep, I just found the culprit.
>> 
>> The below 2 websites volunteer SA users-list email addresses for all the
>> world to harvest. I found my email address in Google from posting here on
>> this list.
>> 
>> aspn.activestate.com/ASPN/Mail/Message/spamassassin-users
>> 
>> spamassassin.apache.org/mail/users
>> 
>> 
>> Be warned, if you post to this list use a throw-away email address unless
>> you are looking to have a good test account for SA. :-)
>> 
>> -----Original Message-----
>> From: Greg Allen [mailto:gallen@netrox.net]
>> Sent: Monday, March 14, 2005 11:36 PM
>> To: users@spamassassin.apache.org
>> Subject: Sudden spam to this email address
>> 
>> 
>> Does posting to this list open me up to dweebs harvesting email addresses?
>> 
>> I'm suddenly getting BS spams to this email address, and they have to be
>> coming from one of two sources. This list being one of the options.
>> 
>> Thanks.
>> 
>> 
>

--
Mike Burger
http://www.bubbanfriends.org

Visit the Dog Pound II BBS
telnet://dogpound2.citadel.org or http://dogpound2.citadel.org

To be notified of updates to the web site, visit 
http://www.bubbanfriends.org/mailman/listinfo/site-update, or send a 
message to:

site-update-request@bubbanfriends.org

with a message of:

subscribe

Re: Sudden spam to this email address

Posted by Thomas Cameron <th...@camerontech.com>.
I don't post terribly frequently, but I certainly do post to this list (and 
many others).  Ditto for Usenet.  No throw-away addresses for me.

I use SpamAssassin with Pyzor, Razor, DCC, and network checks, ClamAV, and 
greylisting.

I can remember one spam message that made it into my Inbox this year.  One.

I can't shout from the roof tops loudly or often enough:  "SpamAssassin 
works!"  :-)

Thomas

----- Original Message ----- 
From: "Greg Allen" <ga...@netrox.net>
To: <us...@spamassassin.apache.org>
Sent: Monday, March 14, 2005 10:53 PM
Subject: RE: Sudden spam to this email address


> Yep, I just found the culprit.
>
> The below 2 websites volunteer SA users-list email addresses for all the
> world to harvest. I found my email address in Google from posting here on
> this list.
>
> aspn.activestate.com/ASPN/Mail/Message/spamassassin-users
>
> spamassassin.apache.org/mail/users
>
>
> Be warned, if you post to this list use a throw-away email address unless
> you are looking to have a good test account for SA. :-)
>
> -----Original Message-----
> From: Greg Allen [mailto:gallen@netrox.net]
> Sent: Monday, March 14, 2005 11:36 PM
> To: users@spamassassin.apache.org
> Subject: Sudden spam to this email address
>
>
> Does posting to this list open me up to dweebs harvesting email addresses?
>
> I'm suddenly getting BS spams to this email address, and they have to be
> coming from one of two sources. This list being one of the options.
>
> Thanks.
>
> 


RE: Sudden spam to this email address

Posted by Greg Allen <ga...@netrox.net>.
Yep, I just found the culprit.

The below 2 websites volunteer SA users-list email addresses for all the
world to harvest. I found my email address in Google from posting here on
this list.

aspn.activestate.com/ASPN/Mail/Message/spamassassin-users

spamassassin.apache.org/mail/users


Be warned, if you post to this list use a throw-away email address unless
you are looking to have a good test account for SA. :-)






-----Original Message-----
From: Greg Allen [mailto:gallen@netrox.net]
Sent: Monday, March 14, 2005 11:36 PM
To: users@spamassassin.apache.org
Subject: Sudden spam to this email address


Does posting to this list open me up to dweebs harvesting email addresses?

I'm suddenly getting BS spams to this email address, and they have to be
coming from one of two sources. This list being one of the options.

Thanks.


Sudden spam to this email address

Posted by Greg Allen <ga...@netrox.net>.
Does posting to this list open me up to dweebs harvesting email addresses?

I'm suddenly getting BS spams to this email address, and they have to be
coming from one of two sources. This list being one of the options.

Thanks.


Re: Bayes DB does not grow anymore

Posted by GRP Productions <gr...@hotmail.com>.
>I have been trying to get something from CVS for several days now, no luck.

Send me your email in private (grpprod@hotmail.com) to send it to you.

>Bayes needs constant training, but this doesn't mean it needs any manual
>training. Once it's up and running and "well-greased" it should take care
>of itself by auto-learning (bayes_auto_learn 1, don't know if on by
>default). About 70 or 80% of our spam and ham (especially the spam) is
>autolearned.

I will probably start again from scratch. One point: Do you think I should 
put custom rules inside /etc/mail/spamassassin or the default installation 
is enough?

>Actually, with those "few" tokens you won't lose much if you throw it away
>;-)
>As I said upping that should help, no need to throw it away unless you
>think that's easier (if most spam you get scores at BAYES_50 it might be
>better to start over than to convince the db that it's spam).

I'll probably do it.

> > bayes_auto_expire 0
>
> > bayes_expiry_max_db_size 500000
>I assume you just added/changed that?

Yes, I just added this. Should auto_expire always remain at 0? Also, do you
think it would be better if the db NEVER expired? Would this value of 500000
achieve that? I don't want to come to work some day and see my tokens were
lost again :-(

In general, should I do as you said, ie. trust the autolearn system and 
never use sa-learn again, provided that I do not have the time to do full 
training.

Thanks for giving me so much of your time, and being so patient with my 
silly questions.
Best regards,
Greg



Re: Bayes DB does not grow anymore

Posted by Kai Schaetzl <ma...@conactive.com>.
GRP Productions wrote on Mon, 14 Mar 2005 03:41:40 +0200:

> Indeed, this is the CVS version :-) 

I have been trying to get something from CVS for several days now, no luck.

> This is perhaps because I have been using only 'mistake-based' training (ie
> training only when false classification happens). However this used to work
> fine.

Bayes needs constant training, but this doesn't mean it needs any manual 
training. Once it's up and running and "well-greased" it should take care of 
itself by auto-learning (bayes_auto_learn 1, don't know if on by default). 
About 70 or 80% of our spam and ham (especially the spam) is autolearned.

>  
> >your "hold time" is quite low, it's about a month. I think we have tokens
> >from even a year ago. That's maybe a bit too much, but I strongly suggest
> >upping your bayes_expiry_max_db_size to something like 500000 or so. Since
> >you have a much higher flux of messages than we have on that machine you
> >are literally "burning" your db to uselessness.
>  
> So what would you suggest? I certainly don't want to lose everything that has
> been learned till now.

Actually, with those "few" tokens you won't lose much if you throw it away ;-)
As I said, upping that should help; no need to throw it away unless you think
that's easier (if most spam you get scores at BAYES_50 it might be better to
start over than to convince the db that it's spam).

> Nope, there is definitely only the one coming with MS. I never use SA from
> the command line anyway.

Well, let's go back:
you sa-learn a message, it says it learned, you dump magic and see there's no 
change, you look in the directory and there's no journal. There *has* to be at 
least one additional Bayes db. Or something happens which I haven't heard of in 
my about three years of using SA+Bayes. What's the output of "sa-learn --dump 
magic"? Don't specify a config file!
 
> bayes_path              /var/spool/MailScanner/bayes/bayes 

and what's in your /etc/mail/spamassassin/local.conf?

> bayes_auto_expire 0
ok, that means it won't expire. Of course, if it doesn't grow this isn't 
necessary ... ;-)

> bayes_expiry_max_db_size 500000
I assume you just added/changed that?

> If I get it you mean that the tokens are lost very quickly?

Yes. However, now that I know that your bayes_expiry is off we have a different
case. Since when has it been off? Since Feb. 11 as your dump magic suggests?
Your oldest token is Feb. 2. So that either means you started the db that day
or you are burning your tokens in 10 days. That's one problem; upping to a
higher ceiling, as you already did, should take care of that. The other problem
is that it's apparently not growing. One of the reasons is, of course, that you
only learn by mistake. So, how often is that done? How many do you actually add
this way? The second part of this other problem is that even if you learn it
doesn't seem to learn. I don't see any other possibility than that it uses
different dbs.
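
The "hold time" reasoning above is simply the span between the oldest and the
newest token atime reported by "sa-learn --dump magic". Here is a minimal
Python sketch of that calculation. The sample text is invented for
illustration (it is not the poster's real dump), and the column layout assumed
by the parser is the "non-token data" format as I remember it from SA 3.0.x,
so treat both as assumptions.

```python
# Invented sample in the assumed "sa-learn --dump magic" layout.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      12000          0  non-token data: ntokens
0.000          0 1107302400          0  non-token data: oldest atime
0.000          0 1108166400          0  non-token data: newest atime
"""


def parse_dump_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        values[label.strip()] = int(fields.split()[2])
    return values


def hold_time_days(values):
    """Days between the oldest and newest token access times."""
    return (values["newest atime"] - values["oldest atime"]) / 86400


magic = parse_dump_magic(SAMPLE_DUMP)
print(hold_time_days(magic))
```

Running the same parser on dumps taken with and without an explicit config
file, and comparing ntokens and the atimes, is one quick way to see whether
two different Bayes dbs are in play.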

> I think I am confused, if bayes works with tokens, why does it need nspam and
> nham? Or are they just counters?

It's just the number of spam and ham messages you have taught it. Yes, it's
more or less informational only.

>  
> In general, do you think that setting bayes_expiry_max_db_size would be 
> enough? 

To cure the fast expiration, yes, but you didn't expire for the last 30 days, 
anyway.

> One final thing: Why even if i manually expire, the date of last expiration 
> remains old?

Same reason as above: you work on different dbs. What does the expire output 
show?


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org




Re: Bayes DB does not grow anymore

Posted by GRP Productions <gr...@hotmail.com>.
>That's okay, the problem just is one cannot be sure how accurate it is.
>Knowing that you use MS would have been useful, anyway :-)
>(BTW: my version of Mailwatch can't show this, do you use a CVS version?)

Indeed, this is the CVS version :-)

>See the number of tokens, we have ten times yours with less learned mail.
>That means that our db has much more tokens to qualify an email as ham or
>spam. Also

This is perhaps because I have been using only 'mistake-based' training (i.e. 
training only when a false classification happens). However, this used to work 
fine.

>your "hold time" is quite low, it's about a month. I think we haven tokens 
>from
>even a year ago. That's maybe a bit too much, but I strongly suggest upping
>your bayes_expiry_max_db_size to something like 500.000 or so. Since you 
>have a
>much higher flux of messages than we have on that machine you are literally
>"burning" your db to uselessness.

So what would you suggest? I certainly don't want to lose everything that has 
been learned till now.

>And you learned by specifying the config file? I suspect that you are at 
>least
>occasionally using two SA configurations, the one coming with MS and the 
>one
>coming with SA.

Nope, there is definitely only the one coming with MS. I never use SA from 
the command line anyway.

>Oh. Still possible, though. You don't need to have one, but on high volume
>systems it's highly recommended. Check your SA config (whereever it is :-) 
>for
>bayes_learn_to_journal 1. I don't know if it is 1 by default, though. What 
>do
>you have starting with bayes in your config file?

# grep bayes /opt/MailScanner/etc/spam.assassin.prefs.conf
# be created as /var/spool/spamassassin/bayes_msgcount, etc.
#bayes_path                 /var/spool/spamassassin/bayes
#bayes_file_mode            0600
bayes_path              /var/spool/MailScanner/bayes/bayes
bayes_file_mode         0666
# MailScanner: big bayes_toks.new files wasting space.
bayes_auto_expire 0
bayes_expiry_max_db_size 500000
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
# use_bayes 0
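
In the grep output above, lines starting with "#" are comments, so the effective settings are the uncommented ones. In particular, bayes_auto_expire 0 is active, which means SA will not expire tokens on its own. A small sketch for listing only the active Bayes settings follows; the helper name active_bayes_settings is made up for illustration.

```shell
#!/bin/sh
# active_bayes_settings is a hypothetical helper: print only the use_bayes and
# bayes_* lines that are not commented out in the given config file.
active_bayes_settings() {
    grep -E '^[[:space:]]*(use_bayes|bayes_[a-z_]+)[[:space:]]' "$1"
}

# Path taken from this thread; guarded so the sketch is a no-op elsewhere.
conf=/opt/MailScanner/etc/spam.assassin.prefs.conf
if [ -r "$conf" ]; then
    active_bayes_settings "$conf"
fi
```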

>Don't know if this would be of any help. As I said, I suspect you are using 
>at
>least two different bayes dbs. At least when you do it from the command 
>line.
>Run an "updatedb" and then "locate bayes" (this may not locate all files, 
>f.i.
>not in /var !).

I think there is only one.

>MS, of course, can only use one and doesn't have a chance of confusing 
>that, so
>when it uses SA that learns and checks the same db. And so far that part 
>seems
>to be okay (except for the bigger size of bayes_seen, but as I said, this 
>may
>be normal for your setup, I really don't know). But you burn your tokens 
>too
>fast. At least that's what I think.

If I understand correctly, you mean that the tokens are lost very quickly? I 
think I am confused: if Bayes works with tokens, why does it need nspam and 
nham? Or are they just counters?

In general, do you think that setting bayes_expiry_max_db_size would be 
enough?
One final thing: why, even if I manually expire, does the date of the last 
expiration remain old?



Re: Bayes DB does not grow anymore

Posted by Kai Schaetzl <ma...@conactive.com>.
GRP Productions wrote on Mon, 14 Mar 2005 00:32:42 +0200:

> You are right, I am using MailWatch. I just posted this output to be easy 
> for one to see the actual dates without having to convert.

That's okay, the problem is just that one cannot be sure how accurate it is. 
Knowing that you use MS would have been useful, anyway :-)
(BTW: my version of MailWatch can't show this, do you use a CVS version?)

> Here is the actual output: 
>  
> # /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic 
> 0.000          0          3          0  non-token data: bayes db version 
> 0.000          0      49740          0  non-token data: nspam 
> 0.000          0      47167          0  non-token data: nham 
> 0.000          0     123325          0  non-token data: ntokens

I didn't look at this closely before, but I think this ratio indicates a 
problem. For comparison, this is from our own mail server (which handles only 
our own mail, not our clients'):

0.000          0      30089          0  non-token data: nspam
0.000          0      12515          0  non-token data: nham
0.000          0    1001630          0  non-token data: ntokens

See the number of tokens: we have about eight times as many as you, from fewer 
learned messages. That means that our db has many more tokens with which to 
qualify an email as ham or spam. Also, your "hold time" is quite low, about a 
month. I think we have tokens from even a year ago. That's maybe a bit too 
much, but I strongly suggest raising your bayes_expiry_max_db_size to something 
like 500000. Since you have a much higher flux of messages than we have on that 
machine, you are literally "burning" your db to uselessness.
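
To put numbers on the comparison, the two --dump magic excerpts above can be reduced to tokens per learned message. This is a rough sketch only; the helper name tokens_per_message is made up, and the metric is just an illustration of the ratio, not something SA itself reports.

```shell
#!/bin/sh
# tokens_per_message is a hypothetical helper: ntokens / (nspam + nham),
# printed with two decimals, or "n/a" when nothing has been learned yet.
tokens_per_message() {
    awk -v spam="$1" -v ham="$2" -v toks="$3" 'BEGIN {
        total = spam + ham
        if (total == 0) { print "n/a"; exit }
        printf "%.2f\n", toks / total
    }'
}

# Arguments: nspam nham ntokens, copied from the two dumps quoted above.
echo "GRP's server: $(tokens_per_message 49740 47167 123325) tokens per learned message"
echo "Kai's server: $(tokens_per_message 30089 12515 1001630) tokens per learned message"
```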

> No it isn't. This is exactly the point I mentioned.

But you didn't prove it ;-)

> But as I said earlier, sa-learn claims it has learned, even from the web 
> interface: 
> >SA Learn: Learned from 1 message(s) (1 message(s) examined). 

And you learned by specifying the config file? I suspect that you are at least 
occasionally using two SA configurations, the one coming with MS and the one 
coming with SA.

> This is getting more suspicious: there is no bayes_journal file! 

Oh. Still possible, though. You don't need to have one, but on high volume 
systems it's highly recommended. Check your SA config (wherever it is :-) for 
bayes_learn_to_journal 1. I don't know if it is 1 by default, though. What do 
you have starting with bayes in your config file?

> -rw-rw-rw-  1 root nobody     1236 Mar 14 00:22 bayes.mutex 
> -rw-rw-rw-  1 root nobody 10452992 Mar 14 00:22 bayes_seen 
> -rw-rw-rw-  1 root nobody  5509120 Mar 14 00:02 bayes_toks 

bayes_seen is quite large. I have never seen it larger than bayes_toks on our 
systems. But maybe that's normal for high volume systems, I don't know. On the 
MailScanner list many people complain about very big bayes_seen files. Someone 
else on this list should comment on the size.

> I can assure you noone has touched anything inside this directory. If this 
> is the reason for the problems I've been facing, is there a way to recreate 
> the file without having to lose my current data? (perhaps by copying the 
> above files somewhere, execute sa-learn --clear and some time later restore 
> the above files?)

Don't know if this would be of any help. As I said, I suspect you are using at 
least two different bayes dbs, at least when you do it from the command line. 
Run an "updatedb" and then "locate bayes" (this may not locate all files, e.g. 
not those in /var!).
MS, of course, can only use one and doesn't have a chance of confusing that, so 
when it uses SA, learning and checking happen against the same db. And so far 
that part seems to be okay (except for the bigger size of bayes_seen, but as I 
said, this may be normal for your setup, I really don't know). But you burn 
your tokens too fast. At least that's what I think.
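
Since locate may miss some directories, a plain find is one alternative way to look for stray Bayes databases. This is a minimal sketch: the helper name find_bayes_dbs is made up, and apart from the MailScanner path quoted in this thread, the directories in the usage comment are guesses.

```shell
#!/bin/sh
# find_bayes_dbs is a hypothetical helper: list every bayes_toks file under the
# given directories with size and modification time, so that a second, stale
# copy of the database stands out. Permission errors are silenced.
find_bayes_dbs() {
    find "$@" -type f -name 'bayes_toks*' -exec ls -l {} + 2>/dev/null
}

# Possible usage (adjust the directories to the system at hand):
#   find_bayes_dbs /var/spool/MailScanner /root /home /var/spool
```

If more than one bayes_toks shows up, comparing the modification times should show which one is actually being written to.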


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org




RE: 2 pops

Posted by Matt Kettler <mk...@evi-inc.com>.
At 02:53 PM 3/14/2005, S M.C Butler wrote:
> >1) what does this have to do with the thread "Re: Bayes DB does not grow
> >anymore"?
> >
>
>oops I replied to that mail to get the mailing list address and forgot to
>delete the inline text, sorry about that.

Even if you did remove the inline text, it's still going to show up as a 
reply to that thread... The "In-Reply-To:" header will give you away.. From 
your original post:

>In-Reply-To: <BA...@phx.gbl>

Based on this, any threading mail readers and list archives will bury your 
post as a reply, rather than showing it as a new thread.

Take a look at the GMANE archives for an example of threading:
http://news.gmane.org/gmane.mail.spam.spamassassin.general

The big difference in a mail client is that threading mail clients 
generally allow you to collapse threads and you don't see the posts under 
them when collapsed.

When posting a new thread it's really in your best interest to just create 
a new message, and not try to hijack a reply into being something it's not. 


RE: 2 pops

Posted by "S M.C Butler" <si...@icmethods.com>.
>
>1) what does this have to do with the thread "Re: Bayes DB does not grow
>anymore"?
>

oops I replied to that mail to get the mailing list address and forgot to
delete the inline text, sorry about that.

>2) man fetchmail

thx, I'll check it out.


Re: 2 pops

Posted by Matt Kettler <mk...@evi-inc.com>.
At 02:20 PM 3/14/2005, S M.C Butler wrote:
>Hi, I would like to have my mail forwarded to my ISP's account and then
>popped to my server where I can run spam assassin and finally popped a
>second time to my PC. How do I get this 2-level pop mechanism going? How can
>I pop from my ISP account to my server in a way that will allow me to do a
>second pop from /var/mail/username to my pc
>
>  Thx in advance.


1) what does this have to do with the thread "Re: Bayes DB does not grow 
anymore"?

2) man fetchmail 


2 pops

Posted by "S M.C Butler" <si...@icmethods.com>.
Hi, I would like to have my mail forwarded to my ISP's account and then
popped to my server where I can run spam assassin and finally popped a
second time to my PC. How do I get this 2-level pop mechanism going? How can
I pop from my ISP account to my server in a way that will allow me to do a
second pop from /var/mail/username to my pc

 Thx in advance.




Re: Bayes DB does not grow anymore

Posted by GRP Productions <gr...@hotmail.com>.
>That is the output of --dump magic? I haven't ever seen it formatted that
>nicely. I assume you skipped the first line, but there's also missing the
>expire atime delta. So, where do you got this from? Not directly from 
>sa-learn
>--dump magic I'd say. You are running SA thru some interface? You should 
>have
>said something about the whereabouts of your installation.

You are right, I am using MailWatch. I just posted this output to make it easy 
to see the actual dates without having to convert. Here is the 
actual output:

# /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      49740          0  non-token data: nspam
0.000          0      47167          0  non-token data: nham
0.000          0     123325          0  non-token data: ntokens
0.000          0 1107319073          0  non-token data: oldest atime
0.000          0 1110636450          0  non-token data: newest atime
0.000          0 1108137790          0  non-token data: last journal sync atime
0.000          0 1108129534          0  non-token data: last expiry atime
0.000          0     804361          0  non-token data: last expire atime delta
0.000          0       3475          0  non-token data: last expire reduction count
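
The atime values in this dump are Unix timestamps. For anyone without MailWatch, they can be converted on the command line. This is a small sketch that assumes GNU date (the "-d @SECONDS" form); the helper name epoch_to_utc is made up for illustration.

```shell
#!/bin/sh
# epoch_to_utc is a hypothetical helper. It relies on GNU date's "-d @SECONDS"
# syntax; on BSD systems the equivalent would be "date -u -r SECONDS".
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Labels and values copied from the dump above.
while read -r label value; do
    printf '%-18s %s\n' "$label" "$(epoch_to_utc "$value")"
done <<'EOF'
oldest_atime 1107319073
newest_atime 1110636450
last_journal_sync 1108137790
last_expiry 1108129534
EOF
```

The results are in UTC, so they are two hours behind the +0200 times that MailWatch displays.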

>Ok. Get the values. Then learn a message to it. Make sure it says that it
>actually learned, then check the values again. Is either the spam or ham 
>count
>increased by one or not?

No it isn't. This is exactly the point I mentioned. But as I said earlier, 
sa-learn claims it has learned, even from the web interface:
>SA Learn: Learned from 1 message(s) (1 message(s) examined).

>Ok, this finally looks a bit suspicious. No sync and no expire for a month. 
>If
>it doesn't sync you don't get new tokens. Check in your bayes directory how 
>big
>your bayes_journal is. I'd think it's quite big. Do a sync now. (Please 
>don't
>do it via an interface, do it on the command line.) What's the output? Is 
>the
>journal gone and the number of tokens increased now? If so, you need to
>investigate why it doesn't sync anymore. Also do an expire then.

This is getting more suspicious: there is no bayes_journal file!

# ll /var/spool/MailScanner/bayes/
total 11780
drwxrwxrwx  2 root nobody     4096 Mar 14 00:22 .
drwxr-xr-x  4 root nobody     4096 Mar 13 11:55 ..
-rw-rw-rw-  1 root nobody     1236 Mar 14 00:22 bayes.mutex
-rw-rw-rw-  1 root nobody 10452992 Mar 14 00:22 bayes_seen
-rw-rw-rw-  1 root nobody  5509120 Mar 14 00:02 bayes_toks

I can assure you no one has touched anything inside this directory. If this 
is the reason for the problems I've been facing, is there a way to recreate 
the file without having to lose my current data? (Perhaps by copying the 
above files somewhere, executing sa-learn --clear and some time later restoring 
the above files?)

Thanks for your help



Re: Bayes DB does not grow anymore

Posted by Kai Schaetzl <ma...@conactive.com>.
GRP Productions wrote on Sun, 13 Mar 2005 22:54:22 +0200:

> Perhaps I have not been clear enough. It's not only that the files' size is 
> constant. I am pasting the output of dump magic,

That is the output of --dump magic? I have never seen it formatted that 
nicely. I assume you skipped the first line, but the expire atime delta is 
also missing. So where did you get this from? Not directly from sa-learn 
--dump magic, I'd say. You are running SA thru some interface? You should have 
said something about the whereabouts of your installation.

> and I have to explain that the nham and nspam values are the same for many 
> days now.

Ok. Get the values. Then learn a message to it. Make sure it says that it 
actually learned, then check the values again. Is either the spam or ham count 
increased by one or not?
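
The before/after check described here can be scripted. What follows is a sketch with made-up helper and variable names (magic_field, before, after); to keep it self-contained it compares two sample dumps, which here are identical, as GRP reports for his server.

```shell
#!/bin/sh
# magic_field is a hypothetical helper: read "sa-learn --dump magic" output on
# stdin and print the third column of the line ending in the given label.
magic_field() {
    awk -v name="$1" '
        { sub(/[[:space:]]+$/, "") }   # tolerate trailing spaces
        $0 ~ ("non-token data: " name "$") { print $3 }
    '
}

# Sample data copied from this thread. In real use, "before" and "after" would
# be captured around the learning step, e.g.
#   before=$(sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic)
before='0.000          0      49740          0  non-token data: nspam
0.000          0      47167          0  non-token data: nham
0.000          0     123325          0  non-token data: ntokens'
after=$before

for field in nspam nham; do
    b=$(printf '%s\n' "$before" | magic_field "$field")
    a=$(printf '%s\n' "$after" | magic_field "$field")
    if [ "$a" -gt "$b" ]; then
        echo "$field increased from $b to $a"
    else
        echo "$field unchanged at $b"
    fi
done
```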

> work fine. If I send to myself a message from Yahoo, with subject 'Viagra 
> sex teen ........" and other nice words, I certainly do not want it to pass. 
> Bayes classifies it as 50% spam.  I tried to sa-learn --forget, and then 
> re-learn, still is BAYES_50.

Again, this is NOT how Bayes works. You can't teach it one message and then 
expect it to flag that message as spam next time. Bayes does not work like 
this!
That it classifies the message as 50% means it cannot determine whether it's 
ham or spam; it just says that the tokens in the db are not good enough for 
that message. Or maybe the message contains enough hammy tokens, whatever.

> Number of Spam Messages: 49,740 
> Number of Ham Messages: 47,167 
> Number of Tokens: 123,325 
> Oldest Token: Wed, 2 Feb 2005 06:37:53 +0200 
> Newest Token: Sat, 12 Mar 2005 16:07:30 +0200 

That says it added or changed a token yesterday.

> Last Journal Sync: Fri, 11 Feb 2005 18:03:10 +0200 
> Last Expiry: Fri, 11 Feb 2005 15:45:34 +0200 
> Last Expiry Reduction Count: 3,475 tokens

Ok, this finally looks a bit suspicious. No sync and no expire for a month. If 
it doesn't sync you don't get new tokens. Check in your bayes directory how big 
your bayes_journal is. I'd think it's quite big. Do a sync now. (Please don't 
do it via an interface, do it on the command line.) What's the output? Is the 
journal gone and the number of tokens increased now? If so, you need to 
investigate why it doesn't sync anymore. Also do an expire then.
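
The steps above could look roughly like this on the command line. It is only a sketch: the helper names journal_status and sync_and_expire are made up, and the paths are the ones GRP quotes elsewhere in this thread, so they are assumptions for any other installation.

```shell
#!/bin/sh
# journal_status is a hypothetical helper: report whether bayes_journal exists
# in the given Bayes directory and, if so, how big it is.
journal_status() {
    journal="$1/bayes_journal"
    if [ -f "$journal" ]; then
        echo "present, $(wc -c < "$journal" | tr -d ' ') bytes"
    else
        echo "missing"
    fi
}

# sync_and_expire is a hypothetical wrapper around the sa-learn calls suggested
# above; it is only defined here, not run.
sync_and_expire() {
    conf=$1
    sa-learn -p "$conf" --sync
    sa-learn -p "$conf" --force-expire
    sa-learn -p "$conf" --dump magic
}

bayes_dir=/var/spool/MailScanner/bayes
echo "bayes_journal in $bayes_dir: $(journal_status "$bayes_dir")"
```

Running sync_and_expire /opt/MailScanner/etc/spam.assassin.prefs.conf afterwards and comparing the ntokens and "last expiry atime" lines with the earlier dump shows whether the sync and expire actually reached this database.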


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org




Re: Bayes DB does not grow anymore

Posted by GRP Productions <gr...@hotmail.com>.
>This doesn't prove anything. sa-learn --dump magic shows you what's inside.
>Also, Bayes is not a checksum system like Razor, that's its strength. If 
>you
>learn something to it that means that it extracts tokens (short pieces) 
>from
>the message and adjusts its internal probability for them being ham or spam 
>by
>a certain factor. Or if it doesn't know that token yet it adds it.
>That the size doesn't grow can have several reasons, f.i. expiry or the 
>fact
>that the db format seems to have some "air" in it, so that it grows in 
>jumps
>and not continually.

Perhaps I have not been clear enough. It's not only that the files' size is 
constant. I am pasting the output of dump magic, and I have to explain that 
the nham and nspam values have been the same for many days now. This is not 
normal, since we are talking about a very busy server (more than 4,000 
messages per day). This has not always been the case; it used to work fine. 
If I send myself a message from Yahoo with the subject 'Viagra sex teen 
........' and other nice words, I certainly do not want it to pass. Bayes 
classifies it as 50% spam. I tried sa-learn --forget and then re-learned it; 
it is still BAYES_50. The nham and nspam values used to increase very rapidly 
(sometimes by 200-300 per day). No errors are produced. I wouldn't have 
noticed this problem, but over the last few days we started seeing more spam 
than usual getting through. Also, I tried to force an expiration many times, 
but as you can see the expiration did not take place. It's definitely not a 
file permission issue.

Thanks

Number of Spam Messages:	49,740
Number of Ham Messages:	47,167
Number of Tokens:	123,325
Oldest Token:	Wed, 2 Feb 2005 06:37:53 +0200
Newest Token:	Sat, 12 Mar 2005 16:07:30 +0200
Last Journal Sync:	Fri, 11 Feb 2005 18:03:10 +0200
Last Expiry:	Fri, 11 Feb 2005 15:45:34 +0200
Last Expiry Reduction Count:	3,475 tokens



Re: Bayes DB does not grow anymore

Posted by Kai Schaetzl <ma...@conactive.com>.
GRP Productions wrote on Sun, 13 Mar 2005 11:21:12 +0200:

> for some days now my bayesian DB does not seem to grow. Its size remains 
> stable. It is read with no problems by SA 3.0.2, but nothing new is written. 
> I send an email to me, it is classified as BAYES_50. I sa-learn it as spam, 
> send it again, and it is still BAYES_50 (I expected to see it as BAYES_99).
>

This doesn't prove anything. sa-learn --dump magic shows you what's inside. 
Also, Bayes is not a checksum system like Razor; that's its strength. If you 
teach it something, that means it extracts tokens (short pieces) from the 
message and adjusts its internal probability for them being ham or spam by a 
certain factor. Or, if it doesn't know a token yet, it adds it.
That the size doesn't grow can have several reasons, for instance expiry or the 
fact that the db format seems to have some "air" in it, so that it grows in 
jumps and not continuously.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org