Posted to users@spamassassin.apache.org by Gene Heskett <ge...@verizon.net> on 2008/02/08 07:49:52 UTC

sa-learn --ham ground rules

Greetings;

About an hour ago, based on some comments made that the bayes database needed 
trained on ham as well as spam, and because it seemed to be forgetting some 
of the stuff I'd fed it as spam, I re-wrote that filter rule in kmail to 
launch it using one of my sorted directories from a mailing list as the 
argument.  Syntax otherwise the same as the sa-learn-spam filter.

The sa-learn --spam can process a message in 5 to 10 seconds or so, so if I've 
dropped 20 doofus mails in the spam directory and fire it off, I have it done 
and kmail is back among the living in 2-3 minutes.

But, feeding it a 'ham' directory with about 7k messages in it, turned 
sa-learn into a 100% cpu hog, incrementing the message processed number only 
about every 3 to 5 minutes. I couldn't kill it, it kept coming back and I 
must have fed it a kill -9 50 times.  Finally, one of the kills killed x too!  
But no console came back, so I had to hit the reset button.  The reboot was 
like molasses in January, so I did a power down, same story.  Same story 3 
times running, so I went and made a sandwich while it sat powered down.  Then 
the reboot was normal up to e2fscking a 372GB drive I use for amanda, the 
backup proggy.  That hung, with no indication of progress for about 20 
minutes, no marching **** or anything.  But it finally fell through and 
completed the bootup, and is running normally now but it has taken the 
majority of an hour to do this.

So what is the maximum number of files in a directory that one can feed to 
sa-learn --ham and expect it to achieve normal speed?  I vaguely recall 
feeding it my corpus of another folder it was having trouble with a year ago, 
the linux-usb list, 600 to 1k messages in it and it was finished in an hour 
that time.

The command that kmail issues to it is:
sa-learn --ham  /root/Mail/(foldername)/cur

Where foldername is whatever mailing list I want to tell it is ham.

Is this correct?  I've had it set up that way for 2 or 3 years at least and 
till now it hasn't been that much of a problem.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"What a wonder is USENET; such wholesale production of conjecture from
such a trifling investment in fact."
-- Carl S. Gutekunst

Re: sa-learn --ham ground rules

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

> On 2008-02-08 01:49:52, Gene Heskett wrote:
> > So what is the maximum number of files in a directory that one can feed to 
> > sa-learn --ham and expect it to achieve normal speed?  I vaguely recall 
> > feeding it my corpus of another folder it was having trouble with a year ago, 
> > the linux-usb list, 600 to 1k messages in it and it was finished in an hour 
> > that time.

On 10.02.08 21:31, Michelle Konzack wrote:
> Many programs including "rm", "mv", "ls" and "sa-learn" have a limit in
> the commandline options which is around 1200 to 1400.

It's not a limit of those programs; it's a system limit on how much data can
be passed in arguments. And they don't exit; it's the shell that fails to
execute such a process.
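
The limit in question can be looked up rather than guessed. A minimal sketch, assuming a system where getconf is available; the value printed differs from one system to the next, and the 60-byte name length is only an illustrative figure.

```shell
# The system's limit on the combined size of the arguments and the
# environment handed to a newly executed program, in bytes.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"

# The limit is on total bytes, not on a fixed number of file names.
# With names of about 60 bytes each, roughly this many would fit:
echo "approx. names of 60 bytes each: $(( arg_max / 60 ))"
```

On Linux kernels before 2.6.23 this limit was fixed at 128 KiB, which long Maildir paths could plausibly use up at somewhere over a thousand files. That is only a guess at where a figure like 1200 to 1400 might come from; nothing in the thread confirms it.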

> Note:  "sa-learn" exits automatically if you feed too many messages to it.

No. sa-learn, afaik, doesn't count that.
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.

Re: sa-learn --ham ground rules

Posted by Michelle Konzack <li...@freenet.de>.

On 2008-02-08 01:49:52, Gene Heskett wrote:
> So what is the maximum number of files in a directory that one can feed to 
> sa-learn --ham and expect it to achieve normal speed?  I vaguely recall 
> feeding it my corpus of another folder it was having trouble with a year ago, 
> the linux-usb list, 600 to 1k messages in it and it was finished in an hour 
> that time.

Many programs including "rm", "mv", "ls" and "sa-learn" have a limit in
the commandline options which is around 1200 to 1400.

So I would never try to feed more than 1000 messages/files at once to it.

Note:  "sa-learn" exits automatically if you feed too many messages to it.

Thanks, Greetings and nice Day
    Michelle Konzack
    Systemadministrator
    Tamay Dogan Network
    Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack   Apt. 917                  ICQ #328449886
                   50, rue de Soultz         MSN LinuxMichi
0033/6/61925193    67100 Strasbourg/France   IRC #Debian (irc.icq.com)

Re: sa-learn --ham ground rules

Posted by Gene Heskett <ge...@verizon.net>.

On Friday 08 February 2008, jdow wrote:
>From: "Gene Heskett" <ge...@verizon.net>
>Sent: Friday, 2008, February 08 16:43
>
>> On Friday 08 February 2008, Karsten Bräckelmann wrote:
>>>On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:
>>>> The command that kmail issues to it is:
>>>> sa-learn --ham  /root/Mail/(foldername)/cur
>>>
>>>You're not using root as your ordinary user account, are you!?
>>>
>>>  guenther
>>
>> In fact I do, but I have myself somewhat in a sandbox as all the mail
>> handling stuff except kmail runs as an unprivileged user, and kmail pulls
>> incoming from that mailbox in /var.  I've been doing that for about 2-3 years,
>> started it back at FC2.  And running as root since RH5.1.  Yeah, I'm an
>> un-repentant old fart.
>
>Gene, how many times have I told you "don't do that with Linux"? Old fart
>or not, you broke it. Now you get to fix it.

Ya mean I get to keep all the pieces?  Oh goodie.

>You declared you run all mail handling as an unprivileged user. Then you try
>to run sa-learn as root on root's mailbox. There is rather a problem there
>if you think about it. Sit and ruminate a few minutes. Seriously - think.
>
>Have you considered what you are doing and where at least one
>hyper-obvious problem lies?
>
>If you're not banging your forehead and screaming "Doh!" by now here is the
>clue bat. If you've figured it out, "DUCK!"

Quack?

>SpamAssassin will be using a Bayes database collected as that unprivileged
>user. It cannot use one generated as root and placed in root's directory
>structure. The last I knew you were trying to use per user Bayes.

That I wouldn't bet on, but spamassassin's kids are running as gene, called 
into service by procmail also running as gene so I'd have to assume the 
applicable bayes database it's using is the one in /home/gene.
 
>So 
>sa-learn as root will build the file in a place the unprivileged process
>cannot access AND will likely leave the file with privileges that prevent
>access by that unprivileged user.
>
>That's issue one for you to fix.
>
>And if you don't fix your errant ways mama's gonna whup you good.

Good, when you get done I'll buy.

>What size machine are you trying to work on? How deep into your swap
>file are you when you run sa-learn?

xp-2800, a gig of ram, 2 of swap.  Swap is very rarely touched.

>{^_^}   Joanne, ashamed I've known you all these years, Gene. You shame
>        me by not taking advice repeated virtually every time we
>        communicate.

It figures, Joanne would have to see how long she can balance on the soap 
box. ;-)  At least here in Weston, our 'free speech stump', the stump of a 4+ 
foot diameter Grand Old Man that was here & probably 50 feet tall when the 
war between the states was a current event, has a guard rail now, and since 
they left it about 4 feet tall, has a set of steps so even an old fart like 
me can make it up onto it should I feel the urge to make a speech.

However, I see what you are saying, both about perms, and locations.  
Excellent points, I'll see what I can figure out toward making that database 
belong to me instead of root.  Obviously I didn't carry that conversion to 
user near far enough, so I deserve the knuckle rap.

How about I change that kmail filter rule to use:
runcon -l gene sa-learn --spam /path/to/spam
and:
runcon -l gene sa-learn --ham /path/to/ham
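
One caution about the runcon idea: runcon runs a command in a different SELinux security context, and its -l option sets the level/range part of that context, so it would not make sa-learn run as the Unix user gene. A minimal sketch of one alternative, assuming sudo is installed and usable from the account kmail runs under (su would be another route). The helper name run_as_user is made up for illustration, and it only prints the command so it can be checked before use.

```shell
# Hypothetical helper: print the command line that would run a program
# as another Unix user. -u picks the user; -H sets HOME to that user's
# home directory, so sa-learn should use that user's ~/.spamassassin.
# Drop the "echo" to execute the command instead of printing it.
run_as_user() {
    user=$1
    shift
    echo sudo -u "$user" -H "$@"
}

run_as_user gene sa-learn --spam /path/to/spam
run_as_user gene sa-learn --ham /path/to/ham
```

The mail folders would also have to be readable by gene, which a Maildir under /root usually is not.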

Now, I note that the /home/gene/.spamassassin/bayes* stuff is carrying a very 
current time stamp,

# ls -l
total 53332
-rw------- 1 gene gene 20983808 2008-02-08 23:27 auto-whitelist
-rw-rw-rw- 1 gene gene        6 2008-01-03 02:37 auto-whitelist.mutex
-rw------- 1 gene gene    26616 2008-02-08 23:27 bayes_journal
-rw-rw-rw- 1 gene gene   147750 2008-01-03 02:37 bayes.mutex
-rw------- 1 gene gene 41889792 2008-02-08 23:27 bayes_seen
-rw------- 1 gene gene  5292032 2008-02-08 23:27 bayes_toks
-rw-r--r-- 1 gene gene      934 2005-12-14 16:58 init.pre
-rw-r--r-- 1 gene gene     1164 2006-01-16 13:45 user_prefs
-rw-r--r-- 1 gene gene     2397 2005-12-14 16:58 v310.pre

so apparently it is doing some self-learning?

Many thanks girl.  I will get it sorted.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
There is hardly a thing in the world that some man can not make a little
worse and sell a little cheaper.

Re: sa-learn --ham ground rules

Posted by Gene Heskett <ge...@verizon.net>.

On Saturday 09 February 2008, John Hardin wrote:
>Gene Heskett sez:
>> running as root since RH5.1.  Yeah, I'm an un-repentant old fart.
>
>There's no fool like an old fool.

And that's why they pay me the big bucks when something really goes aglay at 
the tv station even if I have been semi-retired since mid 2002.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Lost: gray and white female cat.  Answers to electric can opener.

Re: sa-learn --ham ground rules

Posted by Chris Hoogendyk <ho...@bio.umass.edu>.


Gene Heskett wrote:
> On Saturday 09 February 2008, jdow wrote:
>   
>> From: "John Hardin" <jh...@impsec.org>
>> Sent: Friday, 2008, February 08 21:03
>>
>>     
>>> Gene Heskett sez:
>>>       
>>>> running as root since RH5.1.  Yeah, I'm an un-repentant old fart.
>>>>         
>>> There's no fool like an old fool.
>>>       
>> I'm close enough to Gene's age and have known him long enough I get
>> the right to rap his knuckles. Hm, in about a year that advances a
>> step to rap his knuckles with an iron bar?
>>
>> {^_-}
>>     
>
> Ouch, that would hurt my arthritic joints something terrible. Can it wait till 
> I've had a chance to hit my thumbs with another cortisone shot?  On second 
> thought, the iron bar is less painful in the short term.  The last time I 
> checked, they wanted to do surgery at $5k per thumb and I said how about 
> cortisone?  He said then (15 years ago) that it was $60 a shot, and it would 
> hurt like hell.  He was right on both counts, but that thumb still works 
> today.  Now it's the other one's turn I guess.  :)

hmm. hurt like hell? I think that's very Dr. specific. I got a shot that 
was eased in slowly, front loaded with lidocaine, back loaded with 
cortisone. It was almost painless; the pain I was experiencing before 
the shot disappeared almost immediately due to the lidocaine, and then 
disappeared on a more ongoing basis due to the cortisone. Magic.



---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst 

<ho...@bio.umass.edu>

--------------- 

Erdös 4

Re: sa-learn --ham ground rules

Posted by Gene Heskett <ge...@verizon.net>.

On Saturday 09 February 2008, jdow wrote:
>From: "John Hardin" <jh...@impsec.org>
>Sent: Friday, 2008, February 08 21:03
>
>> Gene Heskett sez:
>>> running as root since RH5.1.  Yeah, I'm an un-repentant old fart.
>>
>> There's no fool like an old fool.
>
>I'm close enough to Gene's age and have known him long enough I get
>the right to rap his knuckles. Hm, in about a year that advances a
>step to rap his knuckles with an iron bar?
>
>{^_-}

Ouch, that would hurt my arthritic joints something terrible. Can it wait till 
I've had a chance to hit my thumbs with another cortisone shot?  On second 
thought, the iron bar is less painful in the short term.  The last time I 
checked, they wanted to do surgery at $5k per thumb and I said how about 
cortisone?  He said then (15 years ago) that it was $60 a shot, and it would 
hurt like hell.  He was right on both counts, but that thumb still works 
today.  Now it's the other one's turn I guess.  :)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
FORCE YOURSELF TO RELAX!

Re: sa-learn --ham ground rules

Posted by jdow <jd...@earthlink.net>.

From: "John Hardin" <jh...@impsec.org>
Sent: Friday, 2008, February 08 21:03
> 
> Gene Heskett sez:
>>
>> running as root since RH5.1.  Yeah, I'm an un-repentant old fart.
> 
> There's no fool like an old fool.

I'm close enough to Gene's age and have known him long enough I get
the right to rap his knuckles. Hm, in about a year that advances a
step to rap his knuckles with an iron bar?

{^_-}

Re: sa-learn --ham ground rules

Posted by John Hardin <jh...@impsec.org>.

Gene Heskett sez:
>
> running as root since RH5.1.  Yeah, I'm an un-repentant old fart.

There's no fool like an old fool.

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  USMC Rules of Gunfighting #20: The faster you finish the fight,
  the less shot you will get.
-----------------------------------------------------------------------
 4 days until Abraham Lincoln's and Charles Darwin's 199th Birthdays

Re: sa-learn --ham ground rules

Posted by jdow <jd...@earthlink.net>.

From: "Gene Heskett" <ge...@verizon.net>
Sent: Friday, 2008, February 08 16:43

> On Friday 08 February 2008, Karsten Bräckelmann wrote:
>>On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:

>>> The command that kmail issues to it is:
>>> sa-learn --ham  /root/Mail/(foldername)/cur
>>
>>You're not using root as your ordinary user account, are you!?
>>
>>  guenther
>
> In fact I do, but I have myself somewhat in a sandbox as all the mail
> handling stuff except kmail runs as an unprivileged user, and kmail pulls
> incoming from that mailbox in /var.  I've been doing that for about 2-3 years,
> started it back at FC2.  And running as root since RH5.1.  Yeah, I'm an
> un-repentant old fart.

Gene, how many times have I told you "don't do that with Linux"? Old fart
or not, you broke it. Now you get to fix it.

You declared you run all mail handling as an unprivileged user. Then you try
to run sa-learn as root on root's mailbox. There is rather a problem there
if you think about it. Sit and ruminate a few minutes. Seriously - think.

Have you considered what you are doing and where at least one
hyper-obvious problem lies?

If you're not banging your forehead and screaming "Doh!" by now here is the
clue bat. If you've figured it out, "DUCK!"

SpamAssassin will be using a Bayes database collected as that unprivileged
user. It cannot use one generated as root and placed in root's directory
structure. The last I knew you were trying to use per user Bayes. So
sa-learn as root will build the file in a place the unprivileged process
cannot access AND will likely leave the file with privileges that prevent
access by that unprivileged user.
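
The ownership side of this can be checked directly. A minimal sketch, assuming the per-user Bayes files sit in ~/.spamassassin under names starting with bayes_ (as in the directory listing Gene posted); the function name not_owned_by is made up for illustration.

```shell
# Hypothetical helper: list Bayes files in a directory that are NOT
# owned by the given user (name or numeric ID). No output means the
# ownership is consistent.
not_owned_by() {
    dir=$1
    user=$2
    find "$dir" -type f -name 'bayes_*' ! -user "$user"
}

# Run as the account SpamAssassin scans mail under; the check is
# skipped when the directory does not exist.
if [ -d "$HOME/.spamassassin" ]; then
    not_owned_by "$HOME/.spamassassin" "$(id -u)"
fi
```

Any file it prints was most likely created by a different account, for example by an sa-learn run as root.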

That's issue one for you to fix.

And if you don't fix your errant ways mama's gonna whup you good.

What size machine are you trying to work on? How deep into your swap
file are you when you run sa-learn?

{^_^}   Joanne, ashamed I've known you all these years, Gene. You shame
        me by not taking advice repeated virtually every time we
        communicate.

Re: sa-learn --ham ground rules

Posted by Gene Heskett <ge...@verizon.net>.

On Friday 08 February 2008, Karsten Bräckelmann wrote:
>On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:
>> The sa-learn --spam can process a message in 5 to 10 seconds or so, so if
>> I've dropped 20 doofus mails in the spam directory and fire it off, I have
>> it done and kmail is back among the living in 2-3 minutes.
>
>This seems *way* too high. If there have been only 20 messages total in
>that folder, sa-learn should have processed these in a few *seconds* or
>less.
>
>> But, feeding it a 'ham' directory with about 7k messages in it, turned
>> sa-learn into a 100% cpu hog, [...]
>
>What did you expect? Based on your numbers above, processing that folder
>would have taken 10-20 *hours*...
>
>> incrementing the message processed number only
>> about every 3 to 5 minutes. I couldn't kill it, it kept coming back and I
>> must have fed it a kill -9 50 times.
>
>Hmm. Kmail doesn't start one process per mail by any chance?
>
>> So what is the maximum number of files in a directory that one can feed to
>> sa-learn --ham and expect it to achieve normal speed?
>
>Dunno if there are limitations -- however, your 7k messages should be
>perfectly fine. Just ran a test on a 6k messages mbox file, and there
>was no noticeable difference to a 30 messages test.
>
>> The command that kmail issues to it is:
>> sa-learn --ham  /root/Mail/(foldername)/cur
>
>You're not using root as your ordinary user account, are you!?
>
>  guenther

In fact I do, but I have myself somewhat in a sandbox as all the mail handling 
stuff except kmail runs as an unprivileged user, and kmail pulls incoming 
from that mailbox in /var.  I've been doing that for about 2-3 years, 
started it back at FC2.  And running as root since RH5.1.  Yeah, I'm an 
un-repentant old fart.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
CPU needs recalibration

Re: sa-learn --ham ground rules

Posted by Michelle Konzack <li...@freenet.de>.

On 2008-02-13 05:14:38, Karsten Bräckelmann wrote:
> On Sun, 2008-02-10 at 21:34 +0100, Michelle Konzack wrote:
> > On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > > So what is the maximum number of files in a directory that one can feed to 
> > > > sa-learn --ham and expect it to achieve normal speed?
> > > 
> > > Dunno if there are limitations -- however, your 7k messages should be
> > > perfectly fine. Just ran a test on a 6k messages mbox file, and there
> > > was no noticeable difference to a 30 messages test.
> > 
> > Yeah, you can even feed 200,000 spams from the Debian lists to it IF
> > YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH which
> > is slightly different and he seems to exceed the ARGS limit.
> 
> Nope.  The command fragment Gene cared to show us did *not* have any
> wildcard, but a dir. No bash filename expansion, no limit exceeded.

Right, but if I run 'sa-learn --spam --dir ...' sa-learn exits with
an error message that I have tried to scan too many messages (or something
similar), which means Perl had exceeded the limits.

Thanks, Greetings and nice Day
    Michelle Konzack
    Systemadministrator
    Tamay Dogan Network
    Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack   Apt. 917                  ICQ #328449886
+49/177/9351947    50, rue de Soultz         MSN LinuxMichi
+33/6/61925193     67100 Strasbourg/France   IRC #Debian (irc.icq.com)

Re: sa-learn --ham ground rules

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Sun, 2008-02-10 at 21:34 +0100, Michelle Konzack wrote:
> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > So what is the maximum number of files in a directory that one can feed to 
> > > sa-learn --ham and expect it to achieve normal speed?
> > 
> > Dunno if there are limitations -- however, your 7k messages should be
> > perfectly fine. Just ran a test on a 6k messages mbox file, and there
> > was no noticeable difference to a 30 messages test.
> 
> Yeah, you can even feed 200,000 spams from the Debian lists to it IF
> YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH which
> is slightly different and he seems to exceed the ARGS limit.

Nope.  The command fragment Gene cared to show us did *not* have any
wildcard, but a dir. No bash filename expansion, no limit exceeded.

  guenther
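
The difference between handing a program a directory and handing it a wildcard can be seen without sa-learn at all. A small sketch using a throwaway directory; count_args is a made-up helper that only reports how many arguments it received.

```shell
# Hypothetical helper: print the number of arguments it was called with.
count_args() {
    echo "$#"
}

# Throwaway directory holding three empty "messages".
dir=$(mktemp -d)
: > "$dir/msg1"
: > "$dir/msg2"
: > "$dir/msg3"

# A directory name is one argument, however many files it holds;
# the program itself then reads the directory.
count_args "$dir"      # prints 1

# A wildcard is expanded by the shell into one argument per file, and
# that expanded list is what can run into the system limit.
count_args "$dir"/*    # prints 3

rm -r "$dir"
```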


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: sa-learn --ham ground rules

Posted by Michelle Konzack <li...@freenet.de>.

On 2008-02-13 10:04:36, Matus UHLAR - fantomas wrote:
> you can just provide the directory name. sa-learn will then scan the
> directory w/o args limit

Sorry, but I have used "--dir" for ages, and if I have over 1200-1400 messages
sa-learn exits with an error message saying that I have exceeded the limits...

Thanks, Greetings and nice Day
    Michelle Konzack
    Systemadministrator
    Tamay Dogan Network
    Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack   Apt. 917                  ICQ #328449886
+49/177/9351947    50, rue de Soultz         MSN LinuxMichi
+33/6/61925193     67100 Strasbourg/France   IRC #Debian (irc.icq.com)

Re: sa-learn --ham ground rules

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > > So what is the maximum number of files in a directory that one can feed to 
> > > sa-learn --ham and expect it to achieve normal speed?
> > 
> > Dunno if there are limitations -- however, your 7k messages should be
> > perfectly fine. Just ran a test on a 6k messages mbox file, and there
> > was no noticeable difference to a 30 messages test.

On 10.02.08 21:34, Michelle Konzack wrote:
> Yeah, you can even feed 200,000 spams from the Debian lists to it IF
> YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH which
> is slightly different and he seems to exceed the ARGS limit.

You can just provide the directory name. sa-learn will then scan the
directory without any args limit.
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
M$ Win's are shit, do not use it !

Re: sa-learn --ham ground rules

Posted by Gene Heskett <ge...@verizon.net>.

On Tuesday 12 February 2008, John Hardin wrote:
>On Tue, 12 Feb 2008, Gene Heskett wrote:
>> On Sunday 10 February 2008, Michelle Konzack wrote:
>>> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
>>>>> So what is the maximum number of files in a directory that one can feed
>>>>> to sa-learn --ham and expect it to achieve normal speed?
>>>>
>>>> Dunno if there are limitations -- however, your 7k messages should be
>>>> perfectly fine. Just ran a test on a 6k messages mbox file, and there
>>>> was no noticeable difference to a 30 messages test.
>>>
>>> Yeah, you can even feed 200,000 spams from the Debian lists to it IF
>>> YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH which
>>> is slightly different and he seems to exceed the ARGS limit.
>>>
>>> Thanks, Greetings and nice Day
>>>    Michelle Konzack
>>>    Systemadministrator
>>>    Tamay Dogan Network
>>>    Debian GNU/Linux Consultant
>>
>> Guilty, it's all in Maildir format.
>
>xargs then?

Looks interesting if I can grok how to use it.  Another day though, my plate 
runneth over till the weekend now.  Thanks John.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
What an author likes to write most is his signature on the back of a cheque.
		-- Brendan Francis

Re: sa-learn --ham ground rules

Posted by John Hardin <jh...@impsec.org>.

On Tue, 12 Feb 2008, Gene Heskett wrote:

> On Sunday 10 February 2008, Michelle Konzack wrote:
>> On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
>>>> So what is the maximum number of files in a directory that one can feed
>>>> to sa-learn --ham and expect it to achieve normal speed?
>>>
>>> Dunno if there are limitations -- however, your 7k messages should be
>>> perfectly fine. Just ran a test on a 6k messages mbox file, and there
>>> was no noticeable difference to a 30 messages test.
>>
>> Yeah, you can even feed 200,000 spams from the Debian lists to it IF
>> YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH which
>> is slightly different and he seems to exceed the ARGS limit.
>>
>> Thanks, Greetings and nice Day
>>    Michelle Konzack
>>    Systemadministrator
>>    Tamay Dogan Network
>>    Debian GNU/Linux Consultant
>
> Guilty, it's all in Maildir format.

xargs then?
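
For anyone wondering how xargs would fit here, a minimal sketch. It assumes a find and xargs that support -print0 and -0 (the GNU and BSD versions do); learn_in_batches is a made-up name and the batch sizes are arbitrary. Since sa-learn accepts a directory name directly, as pointed out elsewhere in the thread, this is only of interest when bounded batches are wanted.

```shell
# Hypothetical helper: run a command over the files in a directory,
# passing at most "size" file names per invocation.
#   $1 = directory, $2 = batch size, rest = command and its options
learn_in_batches() {
    dir=$1
    size=$2
    shift 2
    # -print0 / -0 keep file names with spaces or odd characters intact.
    find "$dir" -type f -print0 | xargs -0 -n "$size" "$@"
}

# Dry run on a throwaway directory: "echo" stands in for running the
# real command, so each output line shows one batch (2 + 2 + 1 files).
demo=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$demo/msg$i"; done
learn_in_batches "$demo" 2 echo sa-learn --ham
rm -r "$demo"

# Real use would look like this (the path is a placeholder):
#   learn_in_batches /path/to/Maildir/cur 500 sa-learn --ham
```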

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   ...in the 2nd amendment the right to arms clause means you have
   the right to choose how many arms you want, and the militia clause
   means that Congress can punish you if the answer is "none."
                                 -- David Hardy, 2nd Amendment scholar
-----------------------------------------------------------------------
  Today: Abraham Lincoln's and Charles Darwin's 199th Birthdays

Re: sa-learn --ham ground rules

Posted by Theo Van Dinter <fe...@apache.org>.

On Tue, Feb 12, 2008 at 04:04:28PM -0500, Gene Heskett wrote:
> Guilty, it's all in Maildir format.

--dir ?

-- 
Randomly Selected Tagline:
"There are all of these warnings and incantations and unnatural rituals
 and everything's veiled in this threat of "you mess with the mayo,
 the mayo mess with you, man."   - Alton Brown, Good Eats, "Mayo Clinic"

Re: sa-learn --ham ground rules

Posted by Gene Heskett <ge...@verizon.net>.

On Sunday 10 February 2008, Michelle Konzack wrote:
>On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
>> > So what is the maximum number of files in a directory that one can feed
>> > to sa-learn --ham and expect it to achieve normal speed?
>>
>> Dunno if there are limitations -- however, your 7k messages should be
>> perfectly fine. Just ran a test on a 6k messages mbox file, and there
>> was no noticeable difference to a 30 messages test.
>
>Yeah, you can even feed 200,000 spams from the Debian lists to it IF
>YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH which
>is slightly different and he seems to exceed the ARGS limit.
>
>Thanks, Greetings and nice Day
>    Michelle Konzack
>    Systemadministrator
>    Tamay Dogan Network
>    Debian GNU/Linux Consultant

Guilty, it's all in Maildir format.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
new, adj.:
	Different color from previous model.

Re: sa-learn --ham ground rules

Posted by Michelle Konzack <li...@freenet.de>.

On 2008-02-08 20:13:10, Karsten Bräckelmann wrote:
> > So what is the maximum number of files in a directory that one can feed to 
> > sa-learn --ham and expect it to achieve normal speed?
> 
> Dunno if there are limitations -- however, your 7k messages should be
> perfectly fine. Just ran a test on a 6k messages mbox file, and there
> was no noticeable difference to a 30 messages test.

Yeah, you can even feed 200,000 spams from the Debian lists to it IF
YOU USE A MAILBOX FILE.  But the OP seems to use Maildir or MH which
is slightly different and he seems to exceed the ARGS limit.

Thanks, Greetings and nice Day
    Michelle Konzack
    Systemadministrator
    Tamay Dogan Network
    Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack   Apt. 917                  ICQ #328449886
                   50, rue de Soultz         MSN LinuxMichi
0033/6/61925193    67100 Strasbourg/France   IRC #Debian (irc.icq.com)

Re: sa-learn --ham ground rules

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Fri, 2008-02-08 at 01:49 -0500, Gene Heskett wrote:
> The sa-learn --spam can process a message in 5 to 10 seconds or so, so if I've 
> dropped 20 doofus mails in the spam directory and fire it off, I have it done 
> and kmail is back among the living in 2-3 minutes.

This seems *way* too high. If there have been only 20 messages total in
that folder, sa-learn should have processed these in a few *seconds* or
less.

> But, feeding it a 'ham' directory with about 7k messages in it, turned 
> sa-learn into a 100% cpu hog, [...]

What did you expect? Based on your numbers above, processing that folder
would have taken 10-20 *hours*...
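
That estimate is plain arithmetic on the figures from the original post: about 7000 messages at 5 to 10 seconds each. A quick check in shell integer arithmetic, which truncates, so the hours come out rounded down:

```shell
# Figures taken from the original post.
messages=7000
low_secs=5      # seconds per message, best case
high_secs=10    # seconds per message, worst case

# Total run time in whole hours (integer division truncates).
low_hours=$(( messages * low_secs / 3600 ))
high_hours=$(( messages * high_secs / 3600 ))

echo "between $low_hours and $high_hours hours"   # between 9 and 19 hours
```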

> incrementing the message processed number only 
> about every 3 to 5 minutes. I couldn't kill it, it kept coming back and I 
> must have fed it a kill -9 50 times.

Hmm. Kmail doesn't start one process per mail by any chance?

> So what is the maximum number of files in a directory that one can feed to 
> sa-learn --ham and expect it to achieve normal speed?

Dunno if there are limitations -- however, your 7k messages should be
perfectly fine. Just ran a test on a 6k messages mbox file, and there
was no noticeable difference to a 30 messages test.

> The command that kmail issues to it is:
> sa-learn --ham  /root/Mail/(foldername)/cur

You're not using root as your ordinary user account, are you!?

  guenther

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}