Posted to users@spamassassin.apache.org by alexus <al...@gmail.com> on 2009/04/21 04:20:19 UTC

sa-learn

I'm trying to teach my SA what's spam.

It's a brand new, out-of-the-box SA. I have a few domains where I don't
get anything but spam, and on top of that it seems to come from the same
spammers: they "picked" addresses that they thought would be good to
spam and keep on spamming them.

So I run sa-learn --spam *
and after a while it says something like

Learned tokens from 52 message(s) (52 message(s) examined)

Yet when more of much the same email comes in, it still can't
determine whether it's spam or not...

Am I doing something wrong? Or isn't sa-learn supposed to work the way
I thought it would?

-- 
http://alexus.org/

Re: sa-learn

Posted by alexus <al...@gmail.com>.
On Mon, Apr 20, 2009 at 11:37 PM, alexus <al...@gmail.com> wrote:
> On Mon, Apr 20, 2009 at 11:01 PM, alexus <al...@gmail.com> wrote:
>> On Mon, Apr 20, 2009 at 10:27 PM, Evan Platt <ev...@espphotography.com> wrote:
>>> At 07:20 PM 4/20/2009, you wrote:
>>>>
>>>> i'm trying to teach my SA whats spam
>>>>
>>>> it's a brand new out of box SA, i have few domains that i dont get
>>>> anything but a spam and on the top seems like from same spamers as
>>>> they "picked" emails that they thought would be good to spam and keep
>>>> on spaming them
>>>>
>>>> so i do sa-learn --spam *
>>>> after a while it saying something like
>>>>
>>>> Learned tokens from 52 message(s) (52 message(s) examined)
>>>>
>>>> yet, when more of some what same email comes in it still can't
>>>> determinate if its spam or not...
>>>>
>>>> am i doing something wrong? or is sa-learn isn't suppose to work as i
>>>> thought it would..
>>>
>>> I could be wrong, but I believe you need to teach sa-learn ham too,
>>> otherwise if you only feed it one or the other, it doesn't 'know' what the
>>> difference is between ham and spam.
>>>
>>
>> i don't remember how but last time i was able to pull some sort of
>> stats and it had plenty of ham emails as well
>>
>> --
>> http://alexus.org/
>>
>
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0       5603          0  non-token data: nspam
> 0.000          0       1066          0  non-token data: nham
> 0.000          0     146370          0  non-token data: ntokens
> 0.000          0 1239464082          0  non-token data: oldest atime
> 0.000          0 1240284572          0  non-token data: newest atime
> 0.000          0 1240280535          0  non-token data: last journal sync atime
> 0.000          0 1240154929          0  non-token data: last expiry atime
> 0.000          0     691200          0  non-token data: last expire atime delta
> 0.000          0      31166          0  non-token data: last expire reduction count
>
> --
> http://alexus.org/
>

i trained it some more...

0.000          0          3          0  non-token data: bayes db version
0.000          0       5604          0  non-token data: nspam
0.000          0       9924          0  non-token data: nham
0.000          0     572242          0  non-token data: ntokens
0.000          0 1008017988          0  non-token data: oldest atime
0.000          0 1240285316          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1240285336          0  non-token data: last expiry atime
0.000          0     691200          0  non-token data: last expire atime delta
0.000          0     104929          0  non-token data: last expire reduction count


-- 
http://alexus.org/

Re: sa-learn

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
> >>> i'm trying to teach my SA whats spam
> >>>
> >>> it's a brand new out of box SA, i have few domains that i dont get
> >>> anything but a spam and on the top seems like from same spamers as
> >>> they "picked" emails that they thought would be good to spam and keep
> >>> on spaming them

Different domains -- are these different users, too? Do you have a
site-wide Bayes setup? The training and scanning user must be the same.
Did you train as the scanning user?


> >>> yet, when more of some what same email comes in it still can't
> >>> determinate if its spam or not...

I assume you do have Bayes enabled, and that the training user is the
same as the scanning user. Are you positive the FNs (false negatives)
are due to Bayes? You didn't show any evidence.

Which rules do these messages trigger? If need be, just upload a raw
sample including all headers and body somewhere, your own webspace or a
pastebin, and provide the link.


> > i don't remember how but last time i was able to pull some sort of
> > stats and it had plenty of ham emails as well

Yup, sa-learn --dump magic. ;)

> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0       5603          0  non-token data: nspam
> 0.000          0       1066          0  non-token data: nham
> 0.000          0     146370          0  non-token data: ntokens

That's sufficient for Bayes to kick in, with the default thresholds of
200 messages each.

Did you gather these stats -- and do the manual training -- as the
*same* user that scans your incoming mail?
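
As a rough illustration of the check above, the following Python sketch
parses the text printed by `sa-learn --dump magic` and compares nspam and
nham against the 200-message default mentioned in this reply. The function
names are made up for illustration, and the sample lines are the ones
posted earlier in the thread. The result only means something if the dump
was gathered as the user that scans incoming mail.

```python
# Sketch: check whether a Bayes database has enough training data.
# Assumption: the input is the text printed by `sa-learn --dump magic`,
# captured as the same user that scans incoming mail.

MIN_MESSAGES = 200  # default threshold for spam and for ham, per the reply above


def parse_magic(dump_text):
    """Return {label: value} for every 'non-token data' line."""
    counts = {}
    for line in dump_text.splitlines():
        line = line.lstrip("> ").strip()  # tolerate quoted mail text
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        columns = fields.split()
        # In the output shown in this thread the value sits in the third column.
        counts[label.strip()] = int(columns[2])
    return counts


def bayes_is_active(counts, minimum=MIN_MESSAGES):
    """True once both the spam and the ham counts reach the minimum."""
    return counts.get("nspam", 0) >= minimum and counts.get("nham", 0) >= minimum


SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       5603          0  non-token data: nspam
0.000          0       1066          0  non-token data: nham
0.000          0     146370          0  non-token data: ntokens
"""

if __name__ == "__main__":
    counts = parse_magic(SAMPLE)
    print(counts["nspam"], counts["nham"], bayes_is_active(counts))
```

If the same script reports much smaller numbers when run as the scanning
user, the training most likely went into a different user's database.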


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: sa-learn

Posted by alexus <al...@gmail.com>.
On Mon, Apr 20, 2009 at 11:01 PM, alexus <al...@gmail.com> wrote:
> On Mon, Apr 20, 2009 at 10:27 PM, Evan Platt <ev...@espphotography.com> wrote:
>> At 07:20 PM 4/20/2009, you wrote:
>>>
>>> i'm trying to teach my SA whats spam
>>>
>>> it's a brand new out of box SA, i have few domains that i dont get
>>> anything but a spam and on the top seems like from same spamers as
>>> they "picked" emails that they thought would be good to spam and keep
>>> on spaming them
>>>
>>> so i do sa-learn --spam *
>>> after a while it saying something like
>>>
>>> Learned tokens from 52 message(s) (52 message(s) examined)
>>>
>>> yet, when more of some what same email comes in it still can't
>>> determinate if its spam or not...
>>>
>>> am i doing something wrong? or is sa-learn isn't suppose to work as i
>>> thought it would..
>>
>> I could be wrong, but I believe you need to teach sa-learn ham too,
>> otherwise if you only feed it one or the other, it doesn't 'know' what the
>> difference is between ham and spam.
>>
>
> i don't remember how but last time i was able to pull some sort of
> stats and it had plenty of ham emails as well
>
> --
> http://alexus.org/
>

0.000          0          3          0  non-token data: bayes db version
0.000          0       5603          0  non-token data: nspam
0.000          0       1066          0  non-token data: nham
0.000          0     146370          0  non-token data: ntokens
0.000          0 1239464082          0  non-token data: oldest atime
0.000          0 1240284572          0  non-token data: newest atime
0.000          0 1240280535          0  non-token data: last journal sync atime
0.000          0 1240154929          0  non-token data: last expiry atime
0.000          0     691200          0  non-token data: last expire atime delta
0.000          0      31166          0  non-token data: last expire reduction count

-- 
http://alexus.org/

Re: sa-learn

Posted by alexus <al...@gmail.com>.
On Mon, Apr 20, 2009 at 10:27 PM, Evan Platt <ev...@espphotography.com> wrote:
> At 07:20 PM 4/20/2009, you wrote:
>>
>> i'm trying to teach my SA whats spam
>>
>> it's a brand new out of box SA, i have few domains that i dont get
>> anything but a spam and on the top seems like from same spamers as
>> they "picked" emails that they thought would be good to spam and keep
>> on spaming them
>>
>> so i do sa-learn --spam *
>> after a while it saying something like
>>
>> Learned tokens from 52 message(s) (52 message(s) examined)
>>
>> yet, when more of some what same email comes in it still can't
>> determinate if its spam or not...
>>
>> am i doing something wrong? or is sa-learn isn't suppose to work as i
>> thought it would..
>
> I could be wrong, but I believe you need to teach sa-learn ham too,
> otherwise if you only feed it one or the other, it doesn't 'know' what the
> difference is between ham and spam.
>

I don't remember how, but last time I was able to pull some sort of
stats, and it had plenty of ham emails as well.

-- 
http://alexus.org/

Re: sa-learn

Posted by Evan Platt <ev...@espphotography.com>.
At 07:20 PM 4/20/2009, you wrote:
>i'm trying to teach my SA whats spam
>
>it's a brand new out of box SA, i have few domains that i dont get
>anything but a spam and on the top seems like from same spamers as
>they "picked" emails that they thought would be good to spam and keep
>on spaming them
>
>so i do sa-learn --spam *
>after a while it saying something like
>
>Learned tokens from 52 message(s) (52 message(s) examined)
>
>yet, when more of some what same email comes in it still can't
>determinate if its spam or not...
>
>am i doing something wrong? or is sa-learn isn't suppose to work as i
>thought it would..

I could be wrong, but I believe you need to teach sa-learn ham too, 
otherwise if you only feed it one or the other, it doesn't 'know' 
what the difference is between ham and spam. 


Re: sa-learn

Posted by John Hardin <jh...@impsec.org>.
On Tue, 21 Apr 2009, Karsten Bräckelmann wrote:

> With most backends, mail storage formats -- be it local or IMAP -- 
> "moving spam out of the Inbox" isn't sufficient to have it *clean*. The 
> source message that's just "moved" out often still physically remains in 
> the folder, unnoticed, until one expunges (compact in TB lingo).
>
> Without that step, training the Inbox as ham might learn all those 
> pesky, sneaky spam as ham too, which the user believes has been moved...

Oh, ouch. I didn't think of that aspect...

> Just a side-note. :)

And a very good one. Thanks.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   A sword is never a killer, it is but a tool in the killer's hands.
                           -- Lucius Annaeus Seneca (Martial) 4BC-65AD
-----------------------------------------------------------------------
  2 days until Max Planck's 151st birthday

Re: sa-learn

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Tue, 2009-04-21 at 01:21 -0400, Gene Heskett wrote:
> You need to have it learn at least 200 messages of both 'ham' and 'spam' 
> before it has enough data to switch to working mode.  So sort them into 
> separate directories, and have it learn both a clean inbox as ham, and an all 
> spam directory.  [...]

Very true. There's one important word there, that /might/ bite one's
butt, though. *clean*.

With most backends, mail storage formats -- be it local or IMAP --
"moving spam out of the Inbox" isn't sufficient to have it *clean*. The
source message that's just "moved" out often still physically remains in
the folder, unnoticed, until one expunges (compact in TB lingo).

Without that step, training the Inbox as ham might learn all those
pesky, sneaky spam as ham too, which the user believes has been moved...
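
One way to guard against this before training, sketched in Python under
the assumption that the Inbox is stored as a Maildir: messages marked
deleted but not yet expunged carry the `T` (trashed) flag in their
filename, so they can be skipped. Other storage formats (mbox, a remote
IMAP server) need a different check, and the helper names and filenames
here are made up for illustration.

```python
# Sketch: skip Maildir messages flagged as trashed before ham training.
# Assumption: standard Maildir filenames of the form "<unique>:2,<flags>".


def maildir_flags(filename):
    """Return the flag letters from a Maildir filename, '' if none."""
    _, sep, info = filename.rpartition(":2,")
    return info if sep else ""


def is_trashed(filename):
    """True if the message carries the T (trashed) flag."""
    return "T" in maildir_flags(filename)


def ham_candidates(filenames):
    """Keep only messages not marked as trashed."""
    return [name for name in filenames if not is_trashed(name)]


if __name__ == "__main__":
    names = ["1240284572.M1.mx1:2,S", "1240284573.M2.mx1:2,ST"]
    print(ham_candidates(names))
```

Expunging the folder first remains the simpler fix; the filter only helps
when that step cannot be relied on.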

Just a side-note. :)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: sa-learn

Posted by Adam Katz <an...@khopis.com>.
>>> X-Spam-Status: No, score=4.9 required=5.0 tests=BAYES_99,HTML_MESSAGE,
>>>        MIME_HTML_ONLY,SPF_HELO_PASS autolearn=no version=3.2.5
>>>
>>> it gave BAYES_99, yet it still think it's autolearn=no, and it still
>>> doesnt think this is SPAM

Autolearn can't trigger from BAYES hits because it's used to populate
BAYES hits.  If the points from BAYES_* rules get added (or subtracted)
to the score, all autolearn would do is learn that things it already
knew about were right ... regardless of whether that's true.

Additionally, autolearn triggers for spam at a higher threshold (12 by
default, see "man Mail::SpamAssassin::Plugin::AutoLearnThreshold").
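
A simplified Python sketch of that decision, using the rule scores from
the X-Spam-Report quoted in this thread. It assumes the default thresholds
of 12.0 for spam and 0.1 for ham, and it deliberately ignores the other
conditions the real plugin applies, so treat it as an illustration rather
than the plugin's actual logic. The function name is made up.

```python
# Sketch of the autolearn decision described above (simplified).
# Assumptions: default thresholds (12.0 spam, 0.1 ham); the real
# AutoLearnThreshold plugin applies further checks not modelled here.

SPAM_THRESHOLD = 12.0
HAM_THRESHOLD = 0.1


def autolearn_decision(rule_scores):
    """rule_scores maps rule name -> points, as listed in X-Spam-Report."""
    # Bayes points are left out so that Bayes cannot train itself.
    score = sum(points for name, points in rule_scores.items()
                if not name.startswith("BAYES_"))
    if score >= SPAM_THRESHOLD:
        return "spam"
    if score < HAM_THRESHOLD:
        return "ham"
    return "no"


REPORT = {
    "BAYES_99": 3.5,
    "SPF_HELO_PASS": -0.0,
    "SPF_SOFTFAIL": 0.6,
    "HTML_MESSAGE": 0.0,
    "MIME_HTML_ONLY": 1.5,
}

if __name__ == "__main__":
    print(autolearn_decision(REPORT))
```

With those scores the non-Bayes total is 2.1, which falls between the two
thresholds, hence autolearn=no.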

>> X-Spam-Flag: YES
>> X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mx1.alexus.biz
>> X-Spam-Level: *****
>> X-Spam-Status: Yes, score=5.6 required=5.0 tests=BAYES_99,HTML_MESSAGE,
>> 	MIME_HTML_ONLY,SPF_HELO_PASS,SPF_SOFTFAIL autolearn=no version=3.2.5
>> X-Spam-Report:
>> 	*  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
>> 	*      [score: 1.0000]
>> 	* -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
>> 	*  0.6 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail)
>> 	*  0.0 HTML_MESSAGE BODY: HTML included in message
>> 	*  1.5 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
>>
>> how can I put X-Spam-Report into every email? because this was
>> generated manually via "spamassassin -t email"

To do that, put this in your ~/.spamassassin/user_prefs or local.cf:

add_header all Report _REPORT_


Re: sa-learn

Posted by Gene Heskett <ge...@verizon.net>.
On Tuesday 21 April 2009, alexus wrote:
>On Tue, Apr 21, 2009 at 4:03 PM, alexus <al...@gmail.com> wrote:
>> On Tue, Apr 21, 2009 at 3:58 PM, Gene Heskett <ge...@verizon.net> wrote:
>>> On Tuesday 21 April 2009, alexus wrote:
>>>>On Tue, Apr 21, 2009 at 1:21 AM, Gene Heskett <ge...@verizon.net> wrote:
>>>>> On Monday 20 April 2009, alexus wrote:
>>>>>>i'm trying to teach my SA whats spam
>>>>>>
>>>>>>it's a brand new out of box SA, i have few domains that i dont get
>>>>>>anything but a spam and on the top seems like from same spamers as
>>>>>>they "picked" emails that they thought would be good to spam and keep
>>>>>>on spaming them
>>>>>>
>>>>>>so i do sa-learn --spam *
>>>>>>after a while it saying something like
>>>>>>
>>>>>>Learned tokens from 52 message(s) (52 message(s) examined)
>>>>>>
>>>>>>yet, when more of some what same email comes in it still can't
>>>>>>determinate if its spam or not...
>>>>>>
>>>>>>am i doing something wrong? or is sa-learn isn't suppose to work as i
>>>>>>thought it would..
>>>>>
>>>>> You need to have it learn at least 200 messages of both 'ham' and
>>>>> 'spam' before it has enough data to switch to working mode.  So sort
>>>>> them into separate directories, and have it learn both a clean inbox as
>>>>> ham, and an all spam directory.  When it has learned those, it keep
>>>>> track and will not learn those particular emails again, so clean the
>>>>> spam box, just delete its contents.  I even use a cleaned up, sorted to
>>>>> separate directories mailing list as ham just so it knows stuff from
>>>>> that list is generally ham.  I had one list that I never figured out
>>>>> what was spammy about it, and since the corpus of that list went back
>>>>> several years, I fed the whole thing to SA as ham. Took it several
>>>>> hours but no more problems with that lists messages now.  Now, the spam
>>>>> that does get through goes into a spam dir, and a cron job learns it,
>>>>> then deletes it daily.  I'm lazy, and repetitive tasks are to be done
>>>>> by a cron fired script around this camp.
>>>>>
>>>>> :)
>>>>>
>>>>> --
>>>>> Cheers, Gene
>>>>> "There are four boxes to be used in defense of liberty:
>>>>>  soap, ballot, jury, and ammo. Please use in that order."
>>>>> -Ed Howdershelt (Author)
>>>>> Any two philosophers can tell each other all they know in two hours.
>>>>>                -- Oliver Wendell Holmes, Jr.
>>>>
>>>>how do I change my SA from learning mode to working mode?
>>>
>>> I believe that is automatic once it has enough data.  See above, 200 msgs
>>> of each type required IIRC.
>>>
>>> Understand that SA only rates the email, and puts its findings in the
>>> header. It is up to you to determine what is done with mail that is too
>>> spammy.  I use procmail as the MTA from fetchmail, and procmail is
>>> configured to send anything that SA labels with 5 stars or over to
>>> /dev/null.
>>>
>>> --
>>> Cheers, Gene
>>> "There are four boxes to be used in defense of liberty:
>>>  soap, ballot, jury, and ammo. Please use in that order."
>>> -Ed Howdershelt (Author)
>>> Delta: The kids will love our inflatable slides.    -- David Letterman
>>
>> an example
>>
>> Received: by simscan 1.4.0 ppid: 97779, pid: 97780, t: 3.8809s
>>        scanners: regex: 1.4.0 clamav: 0.95/m:50/d:9252 spam: 3.2.5
>> X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mx1.alexus.biz
>> X-Spam-Level: ****
>> X-Spam-Status: No, score=4.9 required=5.0 tests=BAYES_99,HTML_MESSAGE,
>>        MIME_HTML_ONLY,SPF_HELO_PASS autolearn=no version=3.2.5
>>
>> it gave BAYES_99, yet it still think it's autolearn=no, and it still
>> doesnt think this is SPAM
>>
>> --
>> http://alexus.org/
>
>this is from another email
>
>X-Spam-Flag: YES
>X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mx1.alexus.biz
>X-Spam-Level: *****
>X-Spam-Status: Yes, score=5.6 required=5.0 tests=BAYES_99,HTML_MESSAGE,
>	MIME_HTML_ONLY,SPF_HELO_PASS,SPF_SOFTFAIL autolearn=no version=3.2.5
>X-Spam-Report:
>	*  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
>	*      [score: 1.0000]
>	* -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
>	*  0.6 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail)
>	*  0.0 HTML_MESSAGE BODY: HTML included in message
>	*  1.5 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
>
>how can I put X-Spam-Report into every email? because this was
>generated manually via "spamassassin -t email"

That I do not know, because I have never used anything but the number of *** 
in the X-Spam-Level line.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Hating the Yankees is as American as pizza pie, unwed mothers and
cheating on your income tax.
		-- Mike Royko


Re: sa-learn

Posted by alexus <al...@gmail.com>.
On Tue, Apr 21, 2009 at 4:03 PM, alexus <al...@gmail.com> wrote:
> On Tue, Apr 21, 2009 at 3:58 PM, Gene Heskett <ge...@verizon.net> wrote:
>> On Tuesday 21 April 2009, alexus wrote:
>>>On Tue, Apr 21, 2009 at 1:21 AM, Gene Heskett <ge...@verizon.net> wrote:
>>>> On Monday 20 April 2009, alexus wrote:
>>>>>i'm trying to teach my SA whats spam
>>>>>
>>>>>it's a brand new out of box SA, i have few domains that i dont get
>>>>>anything but a spam and on the top seems like from same spamers as
>>>>>they "picked" emails that they thought would be good to spam and keep
>>>>>on spaming them
>>>>>
>>>>>so i do sa-learn --spam *
>>>>>after a while it saying something like
>>>>>
>>>>>Learned tokens from 52 message(s) (52 message(s) examined)
>>>>>
>>>>>yet, when more of some what same email comes in it still can't
>>>>>determinate if its spam or not...
>>>>>
>>>>>am i doing something wrong? or is sa-learn isn't suppose to work as i
>>>>>thought it would..
>>>>
>>>> You need to have it learn at least 200 messages of both 'ham' and 'spam'
>>>> before it has enough data to switch to working mode.  So sort them into
>>>> separate directories, and have it learn both a clean inbox as ham, and an
>>>> all spam directory.  When it has learned those, it keep track and will not
>>>> learn those particular emails again, so clean the spam box, just delete
>>>> its contents.  I even use a cleaned up, sorted to separate directories
>>>> mailing list as ham just so it knows stuff from that list is generally
>>>> ham.  I had one list that I never figured out what was spammy about it,
>>>> and since the corpus of that list went back several years, I fed the whole
>>>> thing to SA as ham. Took it several hours but no more problems with that
>>>> lists messages now.  Now, the spam that does get through goes into a spam
>>>> dir, and a cron job learns it, then deletes it daily.  I'm lazy, and
>>>> repetitive tasks are to be done by a cron fired script around this camp.
>>>> :)
>>>>
>>>> --
>>>> Cheers, Gene
>>>> "There are four boxes to be used in defense of liberty:
>>>>  soap, ballot, jury, and ammo. Please use in that order."
>>>> -Ed Howdershelt (Author)
>>>> Any two philosophers can tell each other all they know in two hours.
>>>>                -- Oliver Wendell Holmes, Jr.
>>>
>>>how do I change my SA from learning mode to working mode?
>>
>> I believe that is automatic once it has enough data.  See above, 200 msgs of
>> each type required IIRC.
>>
>> Understand that SA only rates the email, and puts its findings in the header.
>> It is up to you to determine what is done with mail that is too spammy.  I use
>> procmail as the MTA from fetchmail, and procmail is configured to send
>> anything that SA labels with 5 stars or over to /dev/null.
>>
>> --
>> Cheers, Gene
>> "There are four boxes to be used in defense of liberty:
>>  soap, ballot, jury, and ammo. Please use in that order."
>> -Ed Howdershelt (Author)
>> Delta: The kids will love our inflatable slides.    -- David Letterman
>>
>>
>
> an example
>
> Received: by simscan 1.4.0 ppid: 97779, pid: 97780, t: 3.8809s
>        scanners: regex: 1.4.0 clamav: 0.95/m:50/d:9252 spam: 3.2.5
> X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mx1.alexus.biz
> X-Spam-Level: ****
> X-Spam-Status: No, score=4.9 required=5.0 tests=BAYES_99,HTML_MESSAGE,
>        MIME_HTML_ONLY,SPF_HELO_PASS autolearn=no version=3.2.5
>
> it gave BAYES_99, yet it still think it's autolearn=no, and it still
> doesnt think this is SPAM
>
> --
> http://alexus.org/
>

this is from another email

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mx1.alexus.biz
X-Spam-Level: *****
X-Spam-Status: Yes, score=5.6 required=5.0 tests=BAYES_99,HTML_MESSAGE,
	MIME_HTML_ONLY,SPF_HELO_PASS,SPF_SOFTFAIL autolearn=no version=3.2.5
X-Spam-Report:
	*  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
	*      [score: 1.0000]
	* -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
	*  0.6 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail)
	*  0.0 HTML_MESSAGE BODY: HTML included in message
	*  1.5 MIME_HTML_ONLY BODY: Message only has text/html MIME parts

How can I put X-Spam-Report into every email? This one was generated
manually via "spamassassin -t email".

-- 
http://alexus.org/

Re: sa-learn

Posted by alexus <al...@gmail.com>.
On Tue, Apr 21, 2009 at 3:58 PM, Gene Heskett <ge...@verizon.net> wrote:
> On Tuesday 21 April 2009, alexus wrote:
>>On Tue, Apr 21, 2009 at 1:21 AM, Gene Heskett <ge...@verizon.net> wrote:
>>> On Monday 20 April 2009, alexus wrote:
>>>>i'm trying to teach my SA whats spam
>>>>
>>>>it's a brand new out of box SA, i have few domains that i dont get
>>>>anything but a spam and on the top seems like from same spamers as
>>>>they "picked" emails that they thought would be good to spam and keep
>>>>on spaming them
>>>>
>>>>so i do sa-learn --spam *
>>>>after a while it saying something like
>>>>
>>>>Learned tokens from 52 message(s) (52 message(s) examined)
>>>>
>>>>yet, when more of some what same email comes in it still can't
>>>>determinate if its spam or not...
>>>>
>>>>am i doing something wrong? or is sa-learn isn't suppose to work as i
>>>>thought it would..
>>>
>>> You need to have it learn at least 200 messages of both 'ham' and 'spam'
>>> before it has enough data to switch to working mode.  So sort them into
>>> separate directories, and have it learn both a clean inbox as ham, and an
>>> all spam directory.  When it has learned those, it keep track and will not
>>> learn those particular emails again, so clean the spam box, just delete
>>> its contents.  I even use a cleaned up, sorted to separate directories
>>> mailing list as ham just so it knows stuff from that list is generally
>>> ham.  I had one list that I never figured out what was spammy about it,
>>> and since the corpus of that list went back several years, I fed the whole
>>> thing to SA as ham. Took it several hours but no more problems with that
>>> lists messages now.  Now, the spam that does get through goes into a spam
>>> dir, and a cron job learns it, then deletes it daily.  I'm lazy, and
>>> repetitive tasks are to be done by a cron fired script around this camp.
>>> :)
>>>
>>> --
>>> Cheers, Gene
>>> "There are four boxes to be used in defense of liberty:
>>>  soap, ballot, jury, and ammo. Please use in that order."
>>> -Ed Howdershelt (Author)
>>> Any two philosophers can tell each other all they know in two hours.
>>>                -- Oliver Wendell Holmes, Jr.
>>
>>how do I change my SA from learning mode to working mode?
>
> I believe that is automatic once it has enough data.  See above, 200 msgs of
> each type required IIRC.
>
> Understand that SA only rates the email, and puts its findings in the header.
> It is up to you to determine what is done with mail that is too spammy.  I use
> procmail as the MTA from fetchmail, and procmail is configured to send
> anything that SA labels with 5 stars or over to /dev/null.
>
> --
> Cheers, Gene
> "There are four boxes to be used in defense of liberty:
>  soap, ballot, jury, and ammo. Please use in that order."
> -Ed Howdershelt (Author)
> Delta: The kids will love our inflatable slides.    -- David Letterman
>
>

an example

Received: by simscan 1.4.0 ppid: 97779, pid: 97780, t: 3.8809s
        scanners: regex: 1.4.0 clamav: 0.95/m:50/d:9252 spam: 3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mx1.alexus.biz
X-Spam-Level: ****
X-Spam-Status: No, score=4.9 required=5.0 tests=BAYES_99,HTML_MESSAGE,
        MIME_HTML_ONLY,SPF_HELO_PASS autolearn=no version=3.2.5

It hit BAYES_99, yet it still shows autolearn=no, and it still
doesn't think this is spam.

-- 
http://alexus.org/

Re: sa-learn

Posted by Gene Heskett <ge...@verizon.net>.
On Tuesday 21 April 2009, alexus wrote:
>On Tue, Apr 21, 2009 at 1:21 AM, Gene Heskett <ge...@verizon.net> wrote:
>> On Monday 20 April 2009, alexus wrote:
>>>i'm trying to teach my SA whats spam
>>>
>>>it's a brand new out of box SA, i have few domains that i dont get
>>>anything but a spam and on the top seems like from same spamers as
>>>they "picked" emails that they thought would be good to spam and keep
>>>on spaming them
>>>
>>>so i do sa-learn --spam *
>>>after a while it saying something like
>>>
>>>Learned tokens from 52 message(s) (52 message(s) examined)
>>>
>>>yet, when more of some what same email comes in it still can't
>>>determinate if its spam or not...
>>>
>>>am i doing something wrong? or is sa-learn isn't suppose to work as i
>>>thought it would..
>>
>> You need to have it learn at least 200 messages of both 'ham' and 'spam'
>> before it has enough data to switch to working mode.  So sort them into
>> separate directories, and have it learn both a clean inbox as ham, and an
>> all spam directory.  When it has learned those, it keep track and will not
>> learn those particular emails again, so clean the spam box, just delete
>> its contents.  I even use a cleaned up, sorted to separate directories
>> mailing list as ham just so it knows stuff from that list is generally
>> ham.  I had one list that I never figured out what was spammy about it,
>> and since the corpus of that list went back several years, I fed the whole
>> thing to SA as ham. Took it several hours but no more problems with that
>> lists messages now.  Now, the spam that does get through goes into a spam
>> dir, and a cron job learns it, then deletes it daily.  I'm lazy, and
>> repetitive tasks are to be done by a cron fired script around this camp.
>> :)
>>
>> --
>> Cheers, Gene
>> "There are four boxes to be used in defense of liberty:
>>  soap, ballot, jury, and ammo. Please use in that order."
>> -Ed Howdershelt (Author)
>> Any two philosophers can tell each other all they know in two hours.
>>                -- Oliver Wendell Holmes, Jr.
>
>how do I change my SA from learning mode to working mode?

I believe that is automatic once it has enough data.  See above, 200 msgs of 
each type required IIRC.

Understand that SA only rates the email and puts its findings in the header.
It is up to you to determine what is done with mail that is too spammy.  I use
procmail as the MDA for fetchmail, and procmail is configured to send
anything that SA labels with 5 stars or over to /dev/null.
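
The same "5 stars or over" test can be sketched in Python for
illustration. This is not the procmail recipe from this message (which is
not shown); it only demonstrates reading X-Spam-Level with the
standard-library email parser. The function name and the sample message
are hypothetical.

```python
# Sketch: decide what to do with a message based on X-Spam-Level.
# Assumption: SpamAssassin has already added its headers to the message.
from email import message_from_string

STAR_LIMIT = 5  # "5 stars or over", as in the message above


def is_too_spammy(raw_message, limit=STAR_LIMIT):
    """True if the X-Spam-Level header carries at least `limit` stars."""
    msg = message_from_string(raw_message)
    level = msg.get("X-Spam-Level", "")
    return level.count("*") >= limit


SAMPLE = (
    "X-Spam-Level: ****\n"
    "X-Spam-Status: No, score=4.9 required=5.0 tests=BAYES_99,HTML_MESSAGE,\n"
    "\tMIME_HTML_ONLY,SPF_HELO_PASS autolearn=no version=3.2.5\n"
    "Subject: example\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(is_too_spammy(SAMPLE))
```

The sample carries four stars, so it stays below the limit; what happens
to mail at or above the limit is a local policy decision.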

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Delta: The kids will love our inflatable slides.    -- David Letterman


Re: sa-learn

Posted by alexus <al...@gmail.com>.
On Tue, Apr 21, 2009 at 1:21 AM, Gene Heskett <ge...@verizon.net> wrote:
> On Monday 20 April 2009, alexus wrote:
>>i'm trying to teach my SA whats spam
>>
>>it's a brand new out of box SA, i have few domains that i dont get
>>anything but a spam and on the top seems like from same spamers as
>>they "picked" emails that they thought would be good to spam and keep
>>on spaming them
>>
>>so i do sa-learn --spam *
>>after a while it saying something like
>>
>>Learned tokens from 52 message(s) (52 message(s) examined)
>>
>>yet, when more of some what same email comes in it still can't
>>determinate if its spam or not...
>>
>>am i doing something wrong? or is sa-learn isn't suppose to work as i
>>thought it would..
>
> You need to have it learn at least 200 messages of both 'ham' and 'spam'
> before it has enough data to switch to working mode.  So sort them into
> separate directories, and have it learn both a clean inbox as ham, and an all
> spam directory.  When it has learned those, it keep track and will not learn
> those particular emails again, so clean the spam box, just delete its
> contents.  I even use a cleaned up, sorted to separate directories mailing
> list as ham just so it knows stuff from that list is generally ham.  I had one
> list that I never figured out what was spammy about it, and since the corpus
> of that list went back several years, I fed the whole thing to SA as ham.
> Took it several hours but no more problems with that lists messages now.  Now,
> the spam that does get through goes into a spam dir, and a cron job learns it,
> then deletes it daily.  I'm lazy, and repetitive tasks are to be done by a
> cron fired script around this camp. :)
>
> --
> Cheers, Gene
> "There are four boxes to be used in defense of liberty:
>  soap, ballot, jury, and ammo. Please use in that order."
> -Ed Howdershelt (Author)
> Any two philosophers can tell each other all they know in two hours.
>                -- Oliver Wendell Holmes, Jr.
>
>

how do I change my SA from learning mode to working mode?

-- 
http://alexus.org/

Re: sa-learn

Posted by Gene Heskett <ge...@verizon.net>.
On Monday 20 April 2009, alexus wrote:
>i'm trying to teach my SA whats spam
>
>it's a brand new out of box SA, i have few domains that i dont get
>anything but a spam and on the top seems like from same spamers as
>they "picked" emails that they thought would be good to spam and keep
>on spaming them
>
>so i do sa-learn --spam *
>after a while it saying something like
>
>Learned tokens from 52 message(s) (52 message(s) examined)
>
>yet, when more of some what same email comes in it still can't
>determinate if its spam or not...
>
>am i doing something wrong? or is sa-learn isn't suppose to work as i
>thought it would..

You need to have it learn at least 200 messages each of 'ham' and 'spam'
before it has enough data to switch to working mode.  So sort them into
separate directories, and have it learn both a clean inbox as ham and an
all-spam directory.  Once it has learned those, it keeps track and will not
learn those particular emails again, so clean out the spam box; just delete
its contents.  I even use a cleaned-up mailing list, sorted into separate
directories, as ham, just so it knows that stuff from that list is generally
ham.  I had one list where I never figured out what was spammy about it, and
since the corpus of that list went back several years, I fed the whole thing
to SA as ham.  It took several hours, but no more problems with that list's
messages now.  Now, the spam that does get through goes into a spam dir, and
a cron job learns it, then deletes it daily.  I'm lazy, and repetitive tasks
are to be done by a cron-fired script around this camp. :)
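
A minimal Python sketch of such a cron-fired job, not the actual script
used here. It assumes sa-learn is on the PATH, that the job runs as the
user whose mail is scanned, and that the spam directory is a flat folder
of message files; the function names are made up for illustration.

```python
# Sketch: learn a spam directory with sa-learn, then empty it.
# Assumptions: sa-learn is on PATH and the directory holds one message
# per file, with no subdirectories that need cleaning.
import subprocess
from pathlib import Path


def build_learn_command(kind, directory):
    """Return the sa-learn argument list for one directory."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, str(directory)]


def learn_and_clean(spam_dir):
    """Learn everything in spam_dir as spam, then delete it on success."""
    result = subprocess.run(build_learn_command("spam", spam_dir))
    if result.returncode != 0:
        return False  # keep the mail so the next run can retry
    for path in Path(spam_dir).iterdir():
        if path.is_file():
            path.unlink()
    return True
```

Deleting only after sa-learn exits cleanly means a failed run leaves the
mail in place for the next attempt. The ham side can use
build_learn_command("ham", ...) on a clean inbox in the same way, without
the delete step.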

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Any two philosophers can tell each other all they know in two hours.
		-- Oliver Wendell Holmes, Jr.