Posted to dev@spamassassin.apache.org by Harald Binkle <bi...@jam-software.com> on 2008/05/06 15:33:20 UTC

shortcircuit for USER_IN_WHITELIST --> noautolearn?? ==>learn!

Hi,
I was wondering why my Bayes filter no longer learns as much ham as it used to.
Then I realized that the USER_IN_WHITELIST shortcircuit is set to spam, which has the tflag noautolearn.
Does this really make sense?
The only case in which mail from a whitelisted user is not ham is when the sender's machine is infected by a virus or a Trojan.
So why not change it back so that mail from whitelisted users is learned by Bayes?

How can I change it back for myself so that mail from whitelisted users is learned by Bayes?

Greetings

Harry



----------------------------------------------------
JAM Software GmbH
Managing Director: Joachim Marder
Bruchhausenstr. 1 * 54290 Trier * Germany
Tel: 0700-70707050 * Fax: 0700-70707059
(max. 12.4 ct/min; prices from mobile networks may differ)
Commercial Register No. HRB 4920 (AG Wittlich)  http://www.jam-software.de

RE: shortcircuit for USER_IN_WHITELIST --> noautolearn?? ==>learn!

Posted by Harald Binkle <bi...@jam-software.com>.
> -----Original Message-----
> From: Loren Wilton [mailto:lwilton@earthlink.net]
>
> Is there a way to clear the noautolearn for the whitelist rules?  Normal
> rules could probably do it with tflags.  Except I'm not sure that you can
> necessarily negate a previously set tflags value with a later tflags value.
> (If not, maybe it would be worth an enhancement request.)

I tried that already, with no change. It seems I can't override the tflags of the USER_IN_WHITELIST rule/shortcircuit.
(I tried to override them in local.cf.)


> Another solution in this case would be to not use the whitelist.  Just make
> a rule, or several rules and meta them together, and give the overall rule a
> score of -100 and set the shortcircuit and autolearn flags on the rule.  As
> everyone has mentioned, this can still end up poisoning your database if
> any of those senders get joe-jobbed.  But then again, you might be lucky and
> it would work.

Thanks, but I think they convinced me.

Harry





Re: shortcircuit for USER_IN_WHITELIST --> noautolearn?? ==>learn!

Posted by Loren Wilton <lw...@earthlink.net>.
Is there a way to clear the noautolearn for the whitelist rules?  Normal 
rules could probably do it with tflags.  Except I'm not sure that you can 
necessarily negate a previously set tflags value with a later tflags value. 
(If not, maybe it would be worth an enhancement request.)

Another solution in this case would be to not use the whitelist.  Just make 
a rule, or several rules and meta them together, and give the overall rule a 
score of -100 and set the shortcircuit and autolearn flags on the rule.  As 
everyone has mentioned, this can still end up poisoning your database if 
any of those senders get joe-jobbed.  But then again, you might be lucky and 
it would work.

        Loren
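
A minimal sketch of the custom-rule approach described above, in local.cf syntax. The rule name LOCAL_TRUSTED_SENDER and the example.com addresses are made up for illustration, and the fragment assumes the Shortcircuit plugin is loaded. It uses "shortcircuit ... on" rather than "ham" because, according to the Shortcircuit plugin documentation, "ham" adds the noautolearn tflag. Whether such mail is then actually autolearned also depends on the autolearn thresholds, so treat this as untested.

```
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
  # Hypothetical rule: match only specific, trusted sender addresses.
  header   LOCAL_TRUSTED_SENDER  From:addr =~ /^(?:alice|bob)\@example\.com$/i
  describe LOCAL_TRUSTED_SENDER  Sender is on the local trusted list
  score    LOCAL_TRUSTED_SENDER  -100
  # "nice" marks a rule with a negative score; noautolearn is deliberately left off.
  tflags   LOCAL_TRUSTED_SENDER  nice
  # Run early so the shortcircuit can skip the remaining tests.
  priority LOCAL_TRUSTED_SENDER  -1000
  shortcircuit LOCAL_TRUSTED_SENDER on
endif
```

The caveat from the message still applies: a forged From address matches this rule just as well as a genuine one, so the Bayes database can still be poisoned.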


new eval functions comparing the matches of two regular expressions?

Posted by Harald Binkle <bi...@jam-software.com>.
What about new eval functions that compare the matches of two regular expressions?
If there were functions like

 eval:Equals(/regex1/,/regex2/)
and
 eval:NOTEquals(/regex1/,/regex2/)

it would be easy to define rules like this one:

a rule scoring, say, 0.8 points if there is only one recipient address and it equals the sender's address, but the two have different 'name parts'.
Like:
TO: "User Name" <us...@domain.com>
FROM: "viagra offer" <us...@domain.com>

There is a lot of spam with that structure trying to get through, because many people have their own domain on the whitelist.
I tried to set this up as a rule, but with no luck. I fear it is not possible to do this with a regular expression,
as it is not possible to compare the results of a regular expression within a regular expression.

Could someone implement this?

Greetings

Harry
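
The check described above is easy to state in ordinary code, which may help anyone who wants to prototype it in a plugin. Below is a minimal Python sketch (SpamAssassin plugins themselves are written in Perl); the function name is hypothetical and the addresses are placeholders, not anything that exists in SpamAssassin.

```python
from email.utils import getaddresses, parseaddr


def self_addressed_name_mismatch(from_header: str, to_header: str) -> bool:
    """Return True if there is exactly one recipient, its address equals the
    sender's address, and the two display names ('name parts') differ."""
    from_name, from_addr = parseaddr(from_header)
    # Keep only recipients for which an address could be parsed.
    recipients = [(name, addr) for name, addr in getaddresses([to_header]) if addr]
    if not from_addr or len(recipients) != 1:
        return False
    to_name, to_addr = recipients[0]
    same_address = from_addr.lower() == to_addr.lower()
    different_name = from_name.strip().lower() != to_name.strip().lower()
    return same_address and different_name


if __name__ == "__main__":
    print(self_addressed_name_mismatch(
        '"viagra offer" <user@example.com>',
        '"User Name" <user@example.com>',
    ))
```

Note that a legitimate note-to-self sent from a second mail client with a different display name would also match, which fits the modest 0.8 score suggested above.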




RE: shortcircuit for USER_IN_WHITELIST --> noautolearn?? ==>learn!

Posted by Harald Binkle <bi...@jam-software.com>.
Sidney,
thank you very much for your answers and explanations.
I just looked over the code of check_forged_in_whitelist and think it would be hard to use for my purpose.
I will wait a few days to see whether anyone else replies to the request to implement eval:Equals(/regex1/,/regex2/) and eval:NOTEquals(/regex1/,/regex2/).
If no one answers, I'll post the request to the dev list again in one or two weeks, with a more appropriate subject, and see what others say.
The problem I have is that we use the Windows version of SpamAssassin (http://sourceforge.net/projects/sawin32/), so just implementing a plugin that provides those two functions is not easy (it is a lot of work).

I think those evals would make it possible to write more powerful rules without having to implement small things in plugins, since it is not possible to compare the matches of a regular expression within the same regular expression.

Harry






Re: shortcircuit for USER_IN_WHITELIST --> noautolearn?? ==>learn!

Posted by Sidney Markowitz <si...@sidney.com>.
Harald Binkle wrote, On 7/5/08 7:46 PM:
> Sorry, I thought a discussion about switching the default behavior would belong on the
> dev list.

Yes, I'm the one who brought up the related issues of how to handle learning and
whitelisting, and I said what I did to make sure that any further digression to those
topics should go to the users list. Your questions about changing the default behavior and
about new eval rules would go in this list.

> And what about a discussion of a new eval function that compares the matches of two
> regular expressions? If there were functions eval:Equals(/regex1/,/regex2/) and
> eval:NOTEquals(/regex1/,/regex2/), it would be easy to define rules like the one I
> mentioned in my last mail.

I don't have an immediate opinion about this. Perhaps you could try it out in a plugin and
see how it works out compared to simply using whitelist_from_rcvd to make the whitelisting
work.

I did once try to catch that kind of spam with an eval rule that calls
check_forged_in_whitelist, which is supposed to catch anything that matches the address
portion of a whitelist_from_rcvd entry but doesn't match the received part of the test. I don't
remember now why we don't have any rules that use that eval; it may be that it doesn't
really work. You might try defining a rule

   header FORGED_USER_IN_WHITELIST  eval:check_forged_in_whitelist()

and also define some whitelist_from_rcvd entries and see if that rule has any success at
catching those.

  -- sidney
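
To try the suggestion above, the rule also needs a description, a score, and at least one whitelist_from_rcvd entry to compare against. A hedged sketch follows; the address, relay domain, description text, and score are placeholder assumptions, and as the message itself says, it is not certain that this eval catches anything in practice.

```
# Placeholder entry: accept this sender only when relayed by example.com hosts.
whitelist_from_rcvd  alice@example.com  example.com

header   FORGED_USER_IN_WHITELIST  eval:check_forged_in_whitelist()
describe FORGED_USER_IN_WHITELIST  Whitelisted address, unexpected relay
# Start with a token score and watch what the rule hits before raising it.
score    FORGED_USER_IN_WHITELIST  0.01
```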


RE: shortcircuit for USER_IN_WHITELIST --> noautolearn?? ==>learn!

Posted by Harald Binkle <bi...@jam-software.com>.
I see.
Sorry, I thought a discussion about switching the default behavior would belong on the dev list.
And what about a discussion of a new eval function that compares the matches of two regular expressions?
If there were functions eval:Equals(/regex1/,/regex2/) and eval:NOTEquals(/regex1/,/regex2/), it would be easy to define rules like the one I mentioned in my last mail.

(Create a rule scoring, say, 0.8 points if there is only one recipient address and it equals the sender's address, but the two have different 'name parts'.
Like:
TO: "User Name" <us...@domain.com>
FROM: "viagra offer" <us...@domain.com>

There is a lot of spam with that structure trying to get through, because many people have their own domain on the whitelist.
I tried to set this up as a rule, but with no luck. I fear it is not possible to do with a regular expression.)


Harry






Re: shortcircuit for USER_IN_WHITELIST --> noautolearn?? ==>learn!

Posted by Sidney Markowitz <si...@sidney.com>.
Harald Binkle wrote, On 7/5/08 6:30 PM:
> Hi, OK, these are good reasons, I see. But I wrote a script that puts all recipients of
> outgoing mail on the whitelist, so everyone I send a message to ends up on the
> whitelist. By now nearly everyone I am in contact with is on my whitelist, so almost
> none of the mail I receive is automatically learned as ham.

Autolearn is a way of doing the best that you can with no work, but you are seeing some of 
its failings. There is really no substitute for a manual learning procedure where you find 
a way to make it easy to specify whether email is really typical ham or spam and send it 
to the learner, avoiding sending atypical ham that contains words that you would not want 
to learn as ham. I could get into a discussion about ideas on how to do that without 
having to classify all your mail by hand, which of course is what you use SpamAssassin to 
avoid in the first place, but that's the kind of discussion that the SpamAssassin users 
mailing list is for.

> There is a lot of spam with that structure trying to get through, because many
> people have their own domain on the whitelist. I tried to set this up as a rule, but with
> no luck. I fear it is not possible to do with a regular expression.

The proper way to do it is to use whitelist_from_rcvd instead of whitelist_from and put in 
a rule for each sending mail server that the person uses. Again, this is a topic for the 
sa-users mailing list rather than the dev list.

  -- sidney
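
For illustration, switching from whitelist_from to whitelist_from_rcvd might look like the following. The addresses and relay domains are invented. The second argument is compared against the reverse-DNS name of the relay that handed the message to your own servers, so this only behaves as intended if trusted_networks and internal_networks are set correctly.

```
# Before: matches on the From address alone, which is trivially forged.
# whitelist_from  alice@example.com

# After: one entry per mail server the sender really uses.
whitelist_from_rcvd  alice@example.com  example.com
whitelist_from_rcvd  alice@example.com  mail.example.net
```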


RE: shortcircuit for USER_IN_WHITELIST --> noautolearn?? ==>learn!

Posted by Harald Binkle <bi...@jam-software.com>.
Hi,
OK, these are good reasons, I see.
But I wrote a script that puts all recipients of outgoing mail on the whitelist,
so everyone I send a message to ends up on the whitelist.
By now nearly everyone I am in contact with is on my whitelist, so almost none of the mail I receive is automatically learned as ham.

Another thing regarding your answer, Matt:
Why not create a rule scoring, say, 0.8 points if there is only one recipient address and it equals the sender's address, but the two have different 'name parts'?
Like:
TO: "User Name" <us...@domain.com>
FROM: "viagra offer" <us...@domain.com>

There is a lot of spam with that structure trying to get through, because many people have their own domain on the whitelist.
I tried to set this up as a rule, but with no luck. I fear it is not possible to do with a regular expression.


Harry







Re: shortcircuit for USER_IN_WHITELIST --> noautolearn?? ==>learn!

Posted by Matt Kettler <mk...@verizon.net>.
Sidney Markowitz wrote:
> Harald Binkle wrote, On 7/5/08 1:33 AM:
>> Hi, I was wondering why my Bayes filter no longer learns as much ham
>> as it used to. Then I realized that the USER_IN_WHITELIST
>> shortcircuit is set to spam, which has the tflag
>> noautolearn. Does this really make sense?
>
> The rationale is that you put an address on the whitelist when they 
> might send mail that looks like spam but you know it is really ham. If 
> it looks like spam, you don't want the Bayes filter to learn that it 
> is ham, because from anyone else it would be spam.

Another reason not to do so is the frequency with which people 
mis-configure their whitelists.

If you mistakenly whitelist_from *@mydomain.com, as many people have 
done when first setting up SA, your Bayes database will be poisoned rather 
quickly if it allows such messages to autolearn.



Re: shortcircuit for USER_IN_WHITELIST --> noautolearn?? ==>learn!

Posted by Sidney Markowitz <si...@sidney.com>.
Harald Binkle wrote, On 7/5/08 1:33 AM:
> Hi, I was wondering why my Bayes filter no longer learns as much ham as it used to.
> Then I realized that the USER_IN_WHITELIST shortcircuit is set to spam, which has the tflag
> noautolearn. Does this really make sense?

The rationale is that you put an address on the whitelist when they might send mail that 
looks like spam but you know it is really ham. If it looks like spam, you don't want the 
Bayes filter to learn that it is ham, because from anyone else it would be spam.

Of course, someone on your whitelist can also send mail that looks like ham. The Bayes 
filter can't learn anything one way or the other from that mail, so it is set to noautolearn.

  -- sidney
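
For reference, a sketch of the settings involved in the behavior explained above. The three bayes_auto_learn options are the documented autolearn settings, shown with what are believed to be their usual defaults. The commented-out tflags line is how USER_IN_WHITELIST is believed to be defined in the stock rules of this era; that is an assumption, so check it against the rules directory of your own installation.

```
# Autolearn as ham below the first threshold, as spam above the second.
bayes_auto_learn                    1
bayes_auto_learn_threshold_nonspam  0.1
bayes_auto_learn_threshold_spam     12.0

# Assumed stock definition (verify locally). The noautolearn flag keeps this
# rule's score out of the calculation that decides whether to autolearn.
# tflags USER_IN_WHITELIST  userconf nice noautolearn
```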