Posted to users@spamassassin.apache.org by Chris Lear <ch...@laculine.com> on 2005/05/20 11:27:00 UTC
SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251
I've been running quite a lot of SARE rules on a site-wide SA
installation for a month or two now. I've been keeping a fairly close
eye on it, and there have been few false positives generally.
But today I noticed that several e-mails are hitting both
SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251. These are ham, sent from
(one specific address in) Ukraine to a Ukrainian in England, written in
English.
The scoring is such that the e-mail gets a score of 3.333 PLUS 4.0 - so
only bayes saves it from being rejected (we reject at >5.5).
I can re-score these rules (or remove sare_header0, which will lower the
scores anyway), but I have 2 questions:
- Is this a slightly unfair double-scoring?
- Are there any other similar rules I should worry about, given that
some Russian mail to this server is ham?
--
Chris
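
Editor's illustration of the arithmetic in the post above. This is a minimal sketch, not SpamAssassin code: the pairing of 3.333 with SARE_CHARSET_W1251 and 4.0 with SARE_FROM_CHAR_W1251 is an assumption inferred from the grep output elsewhere in the thread, the function names are hypothetical, and the second calculation assumes only the From rule is rescored to the 1.666 value listed in 70_sare_header.cf.

```python
# Reject threshold stated in the post: mail is rejected at > 5.5.
REJECT_THRESHOLD = 5.5

# Scores quoted in the post. Which rule carries which score is an
# assumption based on the 70_sare_header0.cf grep output in the thread.
rule_scores = {
    "SARE_CHARSET_W1251": 3.333,
    "SARE_FROM_CHAR_W1251": 4.0,
}


def total_score(scores):
    """Sum the scores of every rule that hit the message."""
    return sum(scores.values())


def is_rejected(total, threshold=REJECT_THRESHOLD):
    """Apply the site policy: reject strictly above the threshold."""
    return total > threshold


# Both rules react to the same Windows-1251 label, so they stack.
combined = total_score(rule_scores)
print(f"both rules: {combined:.3f} rejected={is_rejected(combined)}")

# Hypothetical rescoring: use the lower 1.666 score for the From rule
# and leave the other rule untouched.
rescored = dict(rule_scores, SARE_FROM_CHAR_W1251=1.666)
lowered = total_score(rescored)
print(f"rescored:   {lowered:.3f} rejected={is_rejected(lowered)}")
```

Under these assumptions the two hits alone exceed the threshold before Bayes or any other rule contributes, while either rule on its own stays below it.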
Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251
Posted by Chris Lear <ch...@laculine.com>.
* John Wilcock wrote (05/20/05 12:15):
> Chris Lear wrote:
>> They're in my header0.cf from sare/rules du jour. And in header.cf with
>> a lower score as well. Have I got the wrong files?
>
> Methinks you have an old header0.cf that is no longer being updated -
> these rules aren't in the current header0 on rulesemporium.com.
OK, thanks. I'll try to find out what's wrong with my Rules du Jour.
>
> And in any case you shouldn't be using header and header0 together...
I didn't know that. I'll fix that as well.
Thanks for your help.
--
Chris
Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251
Posted by John Wilcock <jo...@tradoc.fr>.
Chris Lear wrote:
> They're in my header0.cf from sare/rules du jour. And in header.cf with
> a lower score as well. Have I got the wrong files?
Methinks you have an old header0.cf that is no longer being updated -
these rules aren't in the current header0 on rulesemporium.com.
And in any case you shouldn't be using header and header0 together...
John.
--
-- Over 2500 webcams from ski resorts around the world - www.snoweye.com
-- Translate your technical documents and web pages - www.tradoc.fr
Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251
Posted by Chris Lear <ch...@laculine.com>.
* Robert Menschel wrote (05/20/05 15:13):
> Hello Chris, John,
>
> Friday, May 20, 2005, 3:47:55 AM, you wrote:
>
>>>> I can re-score these rules (or remove sare_header0, which will lower the
>>>> scores anyway), but I have 2 questions:
>>>> - Is this a slightly unfair double-scoring?
>>>> - Are there any other similar rules I should worry about, given that
>>>> some Russian mail to this server is ham?
>>>
>>> These are actually in the header1 file, not header0, but surely they
>>> ought to be moved to the 70_sare_header_eng.cf as they hit non-English
>>> ham. Bob?
>
> CL> They're in my header0.cf from sare/rules du jour. And in header.cf with
> CL> a lower score as well. Have I got the wrong files?
>
> Yes, your header0 is old. Both rules are in header1 in the current
> versions. You need to fix your RDJ for header0, or just delete it,
> since header0 through header3 are included in header.cf
>
> Yes, you can and maybe should provide a lower score, at least
> temporarily.
>
> Yes, they should be moved to header_eng, and will be this weekend.
Thanks for all this. I've been educated.
>
> Meanwhile, is it possible for you to send me some samples of the ham?
> If I add that to my corpus, it'll be taken into account in the next
> rescoring.
Sent under separate cover.
--
Chris
Re[2]: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251
Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Chris, John,
Friday, May 20, 2005, 3:47:55 AM, you wrote:
>>> I can re-score these rules (or remove sare_header0, which will lower the
>>> scores anyway), but I have 2 questions:
>>> - Is this a slightly unfair double-scoring?
>>> - Are there any other similar rules I should worry about, given that
>>> some Russian mail to this server is ham?
>>
>> These are actually in the header1 file, not header0, but surely they
>> ought to be moved to the 70_sare_header_eng.cf as they hit non-English
>> ham. Bob?
CL> They're in my header0.cf from sare/rules du jour. And in header.cf with
CL> a lower score as well. Have I got the wrong files?
Yes, your header0 is old. Both rules are in header1 in the current
versions. You need to fix your RDJ for header0, or just delete it,
since header0 through header3 are included in header.cf
Yes, you can and maybe should provide a lower score, at least
temporarily.
Yes, they should be moved to header_eng, and will be this weekend.
Meanwhile, is it possible for you to send me some samples of the ham?
If I add that to my corpus, it'll be taken into account in the next
rescoring.
Bob Menschel
Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251
Posted by Chris Lear <ch...@laculine.com>.
* John Wilcock wrote (05/20/05 10:51):
> Chris Lear wrote:
>> But today I noticed that several e-mails are hitting both
>> SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251. These are ham, sent from
>> (one specific address in) Ukraine to a Ukrainian in England, written in
>> English.
>> The scoring is such that the e-mail gets a score of 3.333 PLUS 4.0 - so
>> only bayes saves it from being rejected (we reject at >5.5).
>>
>> I can re-score these rules (or remove sare_header0, which will lower the
>> scores anyway), but I have 2 questions:
>> - Is this a slightly unfair double-scoring?
>> - Are there any other similar rules I should worry about, given that
>> some Russian mail to this server is ham?
>
> These are actually in the header1 file, not header0, but surely they
> ought to be moved to the 70_sare_header_eng.cf as they hit non-English
> ham. Bob?
They're in my header0.cf from sare/rules du jour. And in header.cf with
a lower score as well. Have I got the wrong files?
RulesDuJour $ grep SARE_FROM_CHAR_W1251 *
70_sare_header.cf:header SARE_FROM_CHAR_W1251 From:raw =~ /\=\?Windows-1251\?/i
70_sare_header.cf:describe SARE_FROM_CHAR_W1251 Displays in unexpected charset
70_sare_header.cf:score SARE_FROM_CHAR_W1251 1.666
70_sare_header.cf:#ham SARE_FROM_CHAR_W1251 Found in some Russian ham
70_sare_header.cf:#hist SARE_FROM_CHAR_W1251 Created by Bob Menschel May 17 2004
70_sare_header.cf:#counts SARE_FROM_CHAR_W1251 245s/4h of 238550 corpus (112525s/126025h RM) 02/28/05
70_sare_header.cf:#counts SARE_FROM_CHAR_W1251 640s/0h of 54176 corpus (16997s/37179h JH-3.01) 02/01/05
70_sare_header.cf:#counts SARE_FROM_CHAR_W1251 0s/0h of 17050 corpus (14617s/2433h MY) 08/08/04
70_sare_header0.cf:header SARE_FROM_CHAR_W1251 From:raw =~ /\=\?Windows-1251\?/i
70_sare_header0.cf:describe SARE_FROM_CHAR_W1251 Displays in unexpected charset
70_sare_header0.cf:score SARE_FROM_CHAR_W1251 4.000
70_sare_header0.cf:#stype SARE_FROM_CHAR_W1251 spamgg
70_sare_header0.cf:#hist SARE_FROM_CHAR_W1251 Created by Bob Menschel May 17 2004
70_sare_header0.cf:#counts SARE_FROM_CHAR_W1251 180s/0h of 66979 corpus (41757s/25222h RM) 09/04/04
70_sare_header0.cf:#counts SARE_FROM_CHAR_W1251 209s/0h of 38398 corpus (14914s/23484h JH) 08/14/04 TM2 SA3.0-pre2
70_sare_header0.cf:#counts SARE_FROM_CHAR_W1251 0s/0h of 17050 corpus (14617s/2433h MY) 08/08/04
--
Chris
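
Editor's illustration of why the rule in the grep output can fire on mail written in English. The rule tests the raw (undecoded) From header for a Windows-1251 encoded-word label, so a Cyrillic display name is enough to trigger it whatever the body language. The sketch below reproduces only that pattern with Python's re module; it is not how SpamAssassin evaluates rules, and the sample name, the addresses, and the function name are hypothetical.

```python
import base64
import re

# Same pattern as the rule body in the grep output; the trailing /i in
# the Perl regex corresponds to re.IGNORECASE here.
W1251_ENCODED_WORD = re.compile(r"=\?Windows-1251\?", re.IGNORECASE)


def hits_from_char_w1251(raw_from_header):
    """Return True if the raw From header carries a Windows-1251 label."""
    return W1251_ENCODED_WORD.search(raw_from_header) is not None


# Build a hypothetical raw From header with a Cyrillic display name
# encoded as an RFC 2047 encoded-word in Windows-1251 (codec "cp1251").
display_name = "Олена"
encoded = base64.b64encode(display_name.encode("cp1251")).decode("ascii")
cyrillic_from = f"=?Windows-1251?B?{encoded}?= <sender@example.com>"

# A plain ASCII From header for comparison.
ascii_from = "Chris <chris@example.com>"

print(cyrillic_from, hits_from_char_w1251(cyrillic_from))
print(ascii_from, hits_from_char_w1251(ascii_from))
```

The first header matches and the second does not, although nothing about the message body was examined in either case.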
Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251
Posted by John Wilcock <jo...@tradoc.fr>.
Chris Lear wrote:
> But today I noticed that several e-mails are hitting both
> SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251. These are ham, sent from
> (one specific address in) Ukraine to a Ukrainian in England, written in
> English.
> The scoring is such that the e-mail gets a score of 3.333 PLUS 4.0 - so
> only bayes saves it from being rejected (we reject at >5.5).
>
> I can re-score these rules (or remove sare_header0, which will lower the
> scores anyway), but I have 2 questions:
> - Is this a slightly unfair double-scoring?
> - Are there any other similar rules I should worry about, given that
> some Russian mail to this server is ham?
These are actually in the header1 file, not header0, but surely they
ought to be moved to the 70_sare_header_eng.cf as they hit non-English
ham. Bob?
And yes, the double scoring effect does seem rather over the top to me,
even for sites that don't expect any Cyrillic ham.
John.