Posted to users@spamassassin.apache.org by up...@3.am on 2008/02/14 16:17:00 UTC

Rule for Russian character sets

We're suddenly getting a ton of spam with koi8-r encoding...I tried to do
a custom rule for it like this:

header SUBJ_RUSS_CHAR           Subject =~/koi8-r/i
describe SUBJ_RUSS_CHAR         has Russian char encoding
score SUBJ_RUSS_CHAR            3.5

The short headers for these spams look like this:

Subject: [koi8-r] ??? ????

The "raw" Subject header, like this:

Subject: =?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?=

I would think the rule would catch it either way...what am I missing?

TIA,

James Smallacombe		      PlantageNet, Inc. CEO and Janitor
up@3.am							    http://3.am
=========================================================================

Re: Rule for Russian character sets

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Sat, 2008-02-16 at 04:26 +0800, jidanni@jidanni.org wrote:
> KB> If you want to trigger on Russian only, list all but ru.
> What if to catch Ms. Ba'loney  Margar'ine, airport security had to keep a
> current list of all the other people in the world. So this is the
> wrong approach, as we've been thru before. OK, bye.

Thank you for your most valuable contribution.

Yes, we've been through this before. However, it seems you still don't
understand. There IS NO negated counterpart to ok_locales. Also, this is
not about languages, but character sets -- and there are exactly 6. So,
listing all but one in this context doesn't seem to be asking too much.

Instead of ranting, just try to understand ok_locales as an option to
list all character sets you can read. For most people, this boils down
to one or two anyway. Thus, the general use case is to list just these.

Also, the OP specifically asked to catch Russian only. Listing 5 locales
is the only way to do this currently. If you know about a better way,
please let me know.

Otherwise, you just wasted everyone's time. Had a bad day, eh?

  guenther

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: Rule for Russian character sets

Posted by ji...@jidanni.org.

KB> If you want to trigger on Russian only, list all but ru.
What if to catch Ms. Ba'loney  Margar'ine, airport security had to keep a
current list of all the other people in the world. So this is the
wrong approach, as we've been thru before. OK, bye.

Re: Rule for Russian character sets

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Thu, 2008-02-14 at 10:17 -0500, up@3.am wrote:
> We're suddenly getting a ton of spam with koi8-r encoding...I tried to do
> a custom rule for it like this:
> 
> header SUBJ_RUSS_CHAR           Subject =~/koi8-r/i
> describe SUBJ_RUSS_CHAR         has Russian char encoding
> score SUBJ_RUSS_CHAR            3.5

> I would think the rule would catch it either way...what am I missing?

I guess it's being decoded before matching. The charset name is not part
of the actual subject anyway, but a charset declaration.
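
To illustrate the decoding point, here is a minimal sketch in Python. It is
not SpamAssassin code; Python's standard email.header module merely stands
in for whatever decoding SA performs before a plain Subject rule runs. It
shows that the charset name is present in the raw header value but no longer
present once the encoded word has been decoded.

```python
from email.header import decode_header, make_header

# Raw Subject value quoted in the original post.
raw_subject = "=?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?="

# decode_header() splits the value into (bytes, charset) chunks;
# make_header() turns those chunks back into readable text.
chunks = decode_header(raw_subject)
decoded_subject = str(make_header(chunks))

# Charset labels declared by the encoded words in the header.
charsets = [charset for _, charset in chunks]

print("charsets seen:", charsets)
print("'koi8-r' in raw header:    ", "koi8-r" in raw_subject.lower())
print("'koi8-r' in decoded header:", "koi8-r" in decoded_subject.lower())
```

A rule that looks for the literal text koi8-r can therefore only succeed
against the raw form of the header.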

Instead of writing your own rules to catch these, I suggest using
ok_locales. See the Language Options:
  http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html

If you want to trigger on Russian only, list all but ru. However, you
probably want something more like en only (all Western charsets). ;)  Also,
this will trigger on headers as well as on the body. grep for
CHARSET_FARAWAY in the rules if you want to adjust the scores.

  guenther

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

RE: Rule for Russian character sets

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Fri, 2008-02-15 at 11:49 -0500, Rosenbaum, Larry M. wrote:
> > From: Karsten Bräckelmann [mailto:guenther@rudersport.de]
> >
> > I've pointed it out before. Just use ok_locales, which is all about
> > these char sets. No REs, almost no thinking required, no headache. A
> > single line, and you're done.
> 
> What's the best way to test the character set for use in a meta rule? 
> We don't want to reject

SA doesn't reject anyway. It merely classifies and tags mail.

> all messages with the Russian (Cyrillic)
> character set, but we may want to use something like
> 
> if (character set is Russian) && (body contains 'xyzzy')

Well, it depends...

If it is OK for you to treat all char sets that you did not set in
ok_locales the same way, then it is just a regular meta rule -- and,
based on my understanding of your description, re-scoring of the few
CHARSET_FARAWAY rules.

> for instance.  How would we test the character set?

This, I believe, cannot be done with the current HeaderEval plugin, since
it does not report the char set, but treats all unwanted char sets the
same. However, if you need fine-grained rules per char set, it should be
fairly easy to alter the existing plugin or to write custom rules or a
plugin based on this.
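
As a rough illustration of what such a per-charset test would have to do,
here is a sketch in Python rather than as a SpamAssassin plugin. Everything
in it is hypothetical: the CYRILLIC_CHARSETS set is a hand-picked assumption
and not SA's own locale table, and the function names are invented for this
example. It only inspects the raw Subject, not the MIME parts.

```python
import re

# Hypothetical hand-picked set of Cyrillic charset labels; this is an
# assumption for illustration, not SpamAssassin's own locale table.
CYRILLIC_CHARSETS = {"koi8-r", "koi8-u", "windows-1251", "iso-8859-5"}

# Shape of an encoded word in a header: =?charset?encoding?text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bqBQ]\?[^?]*\?=")


def header_charsets(raw_header):
    """Return the lower-cased charset labels declared in a raw header."""
    return {name.lower() for name in ENCODED_WORD.findall(raw_header)}


def russian_and_badtext(raw_subject, body, needle="xyzzy"):
    """Sketch of: (character set is Russian) && (body contains 'xyzzy')."""
    is_cyrillic = bool(header_charsets(raw_subject) & CYRILLIC_CHARSETS)
    return is_cyrillic and needle in body


raw = "=?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?="
print(russian_and_badtext(raw, "buy xyzzy now"))      # Cyrillic subject + needle
print(russian_and_badtext("Hello", "buy xyzzy now"))  # plain subject + needle
```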

  guenther

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: Rule for Russian character sets

Posted by "McDonald, Dan" <Da...@austinenergy.com>.

On Fri, 2008-02-15 at 11:04 -0800, Paul Douglas Franklin wrote:
> I believe that what you are asking for is
> meta RUSSIAN_AND_BADTEXT (CHARSET_FARAWAY && __OTHER_RULE)
> That requires first that you have set up ok_locales.

If you have TextCat enabled, then the X-Language: meta header will be
added and can be used with rules, although it doesn't show up in the
output.

I don't think that there is an equivalent X-Locales: 


> --Paul
> 
> Rosenbaum, Larry M. wrote:
> >> From: Karsten Bräckelmann [mailto:guenther@rudersport.de]
> >>
> >> I've pointed it out before. Just use ok_locales, which is all about
> >> these char sets. No REs, almost no thinking required, no headache. A
> >> single line, and you're done.
> >>     
> >
> > What's the best way to test the character set for use in a meta rule?  We don't want to reject all messages with the Russian (Cyrillic) character set, but we may want to use something like
> >
> > if (character set is Russian) && (body contains 'xyzzy')
> >
> > for instance.  How would we test the character set?
> >   
> 
-- 
Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
Austin Energy
http://www.austinenergy.com

Re: Rule for Russian character sets

Posted by Paul Douglas Franklin <pd...@yugm.org>.

I believe that what you are asking for is
meta RUSSIAN_AND_BADTEXT (CHARSET_FARAWAY && __OTHER_RULE)
That requires first that you have set up ok_locales.
--Paul

Rosenbaum, Larry M. wrote:
>> From: Karsten Bräckelmann [mailto:guenther@rudersport.de]
>>
>> I've pointed it out before. Just use ok_locales, which is all about
>> these char sets. No REs, almost no thinking required, no headache. A
>> single line, and you're done.
>>     
>
> What's the best way to test the character set for use in a meta rule?  We don't want to reject all messages with the Russian (Cyrillic) character set, but we may want to use something like
>
> if (character set is Russian) && (body contains 'xyzzy')
>
> for instance.  How would we test the character set?
>   

-- 
Paul Douglas Franklin
Computer Manager, Union Gospel Mission of Yakima, Washington
Husband of Danette
Father of Laurene, Miriam, Tycko, Timothy, Sarabeth, Marie, Dawnita, Anna Leah, Alexander, and Caleb

RE: Rule for Russian character sets

Posted by "Rosenbaum, Larry M." <ro...@ornl.gov>.

> From: Karsten Bräckelmann [mailto:guenther@rudersport.de]
>
> I've pointed it out before. Just use ok_locales, which is all about
> these char sets. No REs, almost no thinking required, no headache. A
> single line, and you're done.

What's the best way to test the character set for use in a meta rule?  We don't want to reject all messages with the Russian (Cyrillic) character set, but we may want to use something like

if (character set is Russian) && (body contains 'xyzzy')

for instance.  How would we test the character set?

RE: Rule for Russian character sets (=?koi8-r? not quite a charset)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Fri, 2008-02-15 at 17:10 +1300, Michael Hutchinson wrote:
> > From: Karsten Bräckelmann [mailto:guenther@rudersport.de]

> > Why are you guys now trying to re-invent the wheel in the special case
> > of a gray asphalt street? What about a dirt track, grass, and anything
> > else a wheel works on?
> > 
> > I've pointed it out before. Just use ok_locales, which is all about
> > these char sets. No REs, almost no thinking required, no headache. A
> > single line, and you're done.
> 
> We don't want to "only allow" the English locale, because we (here at
> my work) do not want our international clients (non Russian) to be
> denied email service. 

ok_locales  en ja ko th zh

This will allow anything but Cyrillic char sets. Please note that en
does *not* mean "English locale" despite its name. It applies to all
Western charsets, including German umlauts, Swedish, French, Turkish,
etc. Basically everything that uses the characters in this post, plus
language-specific chars.

> That aside, I really don't think getting detailed with Regular
> Expressions is re-inventing the wheel. Rather, it is expanding
> knowledge that will help write better rules in the future. (More
> flexible wheels, in your context).
> 
> Although I appreciated your earlier post of 'ok_locales', and
> understood it, I did not appreciate your Troll.

Sorry, I did not mean to troll nor any kind of offense.

However, you missed my point. Getting detailed with REs is a good thing,
sure. My point was not about that, but that the RE in question does not
properly handle charset encoding. See the Subject of this post for an
example which is not an encoded word, but will be matched by your rule.

My point was, that the rule discussed aims at being something that it
unfortunately is not, because charset encoding is slightly more complex
and definitely requires a closing part. A Regular Expression that does
this can be found in check_for_faraway_charset_in_headers() in
HeaderEval.pm:
  $hdr =~ /=\?(.+?)\?.\?.*?\?=/g

Hence my re-inventing the wheel analogy. And these wheels are quite
flexible, too. ;-)
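
The difference can be tried out with a small sketch in Python. The first
pattern is the one quoted above from HeaderEval.pm, transcribed unchanged
into Python's re syntax; the second is a simplified pattern of the kind
discussed in this thread. The variable names are made up for the example,
and the assumption is that Python's re and Perl agree on these basic
constructs.

```python
import re

# Pattern quoted above from check_for_faraway_charset_in_headers();
# group 1 captures the charset name of each complete encoded word.
ENCODED_WORD = re.compile(r"=\?(.+?)\?.\?.*?\?=")

# Simplified pattern that only looks for the opening part.
NAIVE = re.compile(r"=\?koi8-r\?", re.IGNORECASE)

spam_subject = "=?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?="
this_thread = "RE: Rule for Russian character sets (=?koi8-r? not quite a charset)"

for subject in (spam_subject, this_thread):
    naive_hit = NAIVE.search(subject) is not None
    charsets = ENCODED_WORD.findall(subject)
    print(f"naive={naive_hit!s:5} charsets={charsets} subject={subject}")
```

The simplified pattern fires on both subjects, while the full pattern only
reports a charset for the one that carries a complete encoded word with its
closing part.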

Also, your rule applies to the Subject only, whereas ok_locales does
check all MIME parts and will trigger on Russian spam with a "western"
Subject.

Hope this clarifies my previous posts and is appreciated again...

  guenther

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

RE: Rule for Russian character sets

Posted by Michael Hutchinson <mh...@manux.co.nz>.

> -----Original Message-----
> From: Karsten Bräckelmann [mailto:guenther@rudersport.de]
> Sent: Friday, 15 February 2008 3:43 p.m.
> To: users@spamassassin.apache.org
> Subject: RE: Rule for Russian character sets
> 
> On Fri, 2008-02-15 at 12:19 +1300, Michael Hutchinson wrote:
> [...]
> > Does anyone have suggestions for matching question marks and equals
> > signs in one line? I would like to match everything exactly between the
> > double quotes:
> 
> Apart from neither equal nor minus being any special in an RE (outside a
> char class) unlike the question mark, which has been answered already...
> 
> 
> Why are you guys now trying to re-invent the wheel in the special case
> of a gray asphalt street? What about a dirt track, grass, and anything
> else a wheel works on?
> 
> I've pointed it out before. Just use ok_locales, which is all about
> these char sets. No REs, almost no thinking required, no headache. A
> single line, and you're done.
> 
>   guenther

We don't want to "only allow" the English locale, because we (here at my work) do not want our international clients (non Russian) to be denied email service. 

That aside, I really don't think getting detailed with Regular Expressions is re-inventing the wheel. Rather, it is expanding knowledge that will help write better rules in the future. (More flexible wheels, in your context).

Although I appreciated your earlier post of 'ok_locales', and understood it, I did not appreciate your Troll.

Cheers,
Mike

RE: Rule for Russian character sets

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Fri, 2008-02-15 at 12:19 +1300, Michael Hutchinson wrote:
[...]
> Does anyone have suggestions for matching question marks and equals
> signs in one line? I would like to match everything exactly between the
> double quotes:

Apart from neither the equals sign nor the minus being special in an RE
(outside a char class), unlike the question mark, which has been answered
already...

Why are you guys now trying to re-invent the wheel in the special case
of a gray asphalt street? What about a dirt track, grass, and anything
else a wheel works on?

I've pointed it out before. Just use ok_locales, which is all about
these char sets. No REs, almost no thinking required, no headache. A
single line, and you're done.

  guenther

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: Rule for Russian character sets

Posted by ji...@jidanni.org.

Hmm, let me see. I use the below in user_prefs. Hope that helps.
header J_CHSET3 Subject:raw =~ /\s=\?(windows-(125[0125]|874)|koi8-r|iso-8859-[28])\?/i
score J_CHSET3 5
ifplugin Mail::SpamAssassin::Plugin::TextCat
#ok_languages en zh.big5
#http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5697
ok_languages en zh
add_header all Languages _LANGUAGES_
score UNWANTED_LANGUAGE_BODY 5
endif
ok_locales en zh

RE: Rule for Russian character sets

Posted by Michael Hutchinson <mh...@manux.co.nz>.

> -----Original Message-----
> For the most part you can match any character by the appearance of the
> character.  Any character with special meaning needs to be escaped in some
> way.  The easiest way is usually with a backslash, but in some cases you
> can also do it by making it a member of a character class.
>
> So for your question mark case, you could do \? or [?], as most of the
> special characters lose their meaning in a character class.  The
> exceptions are obviously right bracket, backslash, and dash becomes
> special if it isn't the first character.
> 
> > /\=\?koi8\-r\?/

This is what I'd setup originally, except when I ran it past a RE
interpreter the results were just.. wrong. I do think it would work,
however, and will be testing it on a Virtual Machine today to be sure.

> This should work.  You don't need to escape the dash, and I'm pretty sure
> you don't need to escape the equal sign; just the question mark.
>
> Also, you may want to handle this in both uppercase and lowercase, so you
> could do
>
>     /=\?koi8-r\?/i
>
> And you probably don't need the = sign to get reasonably reliable
> matching.
 
Ah, this is the bit I was unsure about, limiting how many characters are
escaped. I would tend towards the fully escaped one myself, I just
wouldn't trust non-escaped = and ? signs. But that's probably got to do
with some bad history with Spamassassin:)

Thanks for reinforcing some points with RE that needed to be (:

Cheers,
Mike

Re: Rule for Russian character sets

Posted by Loren Wilton <lw...@earthlink.net>.

> Ok fair enough. I've noticed that having the \ doesn't hurt for a dash.
> Now what about matching a question mark and an equals sign?

If you read perlre closely you will find it says that it never hurts to put 
a backslash before a special character that you want to match as a 
character.  So this is a case of "does nothing, but doesn't hurt".


> I've read perlre and perlretut and understand regular expressions, but
> there is no clear cut way of matching these characters, either outlined
> by this document or any Spamassassin document I've come across so far.

For the most part you can match any character by the appearance of the 
character.  Any character with special meaning needs to be escaped in some 
way.  The easiest way is usually with a backslash, but in some cases you can 
also do it by making it a member of a character class.

So for your question mark case, you could do \? or [?], as most of the special 
characters lose their meaning in a character class.  The exceptions are 
obviously right bracket, backslash, and dash becomes special if it isn't the 
first character.

> /\=\?koi8\-r\?/

This should work.  You don't need to escape the dash, and I'm pretty sure 
you don't need to escape the equal sign; just the questionmark.

Also, you may want to handle this in both uppercase and lowercase, so you 
could do

    /=\?koi8-r\?/i

And you probably don't need the = sign to get reasonably reliable matching.
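
For anyone who wants to check the escaping variants without a live server,
here is a minimal sketch using Python's re module instead of Perl. The
assumption is that the two engines agree on these basic constructs
(backslash escapes for punctuation, single-character classes, and ? as a
quantifier). The labels and variable names are made up for the example.

```python
import re

raw_subject = "=?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?="

# Three spellings of the same literal match.
patterns = {
    "fully escaped": r"\=\?koi8\-r\?",
    "only ? escaped": r"=\?koi8-r\?",
    "character class": r"=[?]koi8-r[?]",
}

matches = {
    label: re.search(pattern, raw_subject, re.IGNORECASE).group(0)
    for label, pattern in patterns.items()
}
for label, text in matches.items():
    print(f"{label:16} -> {text}")

# An unescaped ? is a quantifier: "=?koi8-r?" means an optional "=", then
# "koi8-", then an optional "r", so it also matches ordinary text.
loose = re.search(r"=?koi8-r?", "why is koi8-r mail tagged?")
print("unescaped pattern matches plain text:", loose is not None)
```

All three escaped spellings match the same text; only the unescaped form
changes meaning.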

        Loren

RE: Rule for Russian character sets

Posted by Michael Hutchinson <mh...@manux.co.nz>.

> -----Original Message-----
> From: John Hardin [mailto:jhardin@impsec.org]
> Sent: Friday, 15 February 2008 3:07 p.m.
> To: Michael Hutchinson
> Subject: RE: Rule for Russian character sets
> 
> On Fri, 15 Feb 2008, Michael Hutchinson wrote:
> 
> > Now what about matching a question mark and an equals sign?
> 
> An equals sign isn't special but a question mark is.
> 
> > Except for a backslash, but I've heard no testimony would suggest this
> > line will work with Spamassassin, and like before, the SARE Regular
> > Expressions Expander tool doesn't like it (and may have put un-due doubt
> > in my head):
> >
> > /\=\?koi8\-r\?/
> 
> Try    /=\?koi8-r\?/i
> 
> NB: You can also use [?] (a character set consisting of a single question
> mark) but that's a little clumsy.

OK sounds good, might just have to test that one under Vmware as well. 

Results from SARE Regexp expander weren't good, I don't know if I should
trust that thing anymore.

Thanks,
Mike

RE: Rule for Russian character sets

Posted by Michael Hutchinson <mh...@manux.co.nz>.

> -----Original Message-----
> From: John Hardin [mailto:jhardin@impsec.org]
> Sent: Friday, 15 February 2008 2:19 p.m.
> To: Michael Hutchinson
> Cc: users@spamassassin.apache.org
> Subject: RE: Rule for Russian character sets
> 
> On Fri, 15 Feb 2008, Michael Hutchinson wrote:
> 
> > Are we not meant to delimit characters like a minus sign?
> >
> > Ex:
> > header SUBJ_RUSS_CHAR			Subject:raw =~ /koi8\-r/i
> 
> Only where they have special meaning, and a dash is only "special" in a
> character set, e.g. [A-Z]. I have found the simplest way to avoid
> misinterpretation in that context is to put the dash first, e.g.
> [-abcde12345]

Ok fair enough. I've noticed that having the \ doesn't hurt for a dash.
Now what about matching a question mark and an equals sign? 

I'm tempted to setup Spamassassin under a virtual machine, just so I can
test against \= and \?

I've read perlre and perlretut and understand regular expressions, but
there is no clear cut way of matching these characters, either outlined
by this document or any Spamassassin document I've come across so far.

Except for a backslash, but I've heard no testimony that would suggest this
line will work with Spamassassin, and like before, the SARE Regular
Expressions Expander tool doesn't like it (and may have put undue doubt
in my head):

/\=\?koi8\-r\?/

I tried using \x1B notation, and it doesn't work, so presumably not
every feature of Perl regular expressions works under Spamassassin.
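
One possible explanation, offered as a guess and sketched here in Python
rather than tested in SpamAssassin: taken literally, \x1B is the ESC control
character, whereas the equals sign is \x3D and the question mark is \x3F.
The sketch assumes Python's re treats \xHH escapes the same way Perl does
for this purpose.

```python
import re

raw_subject = "=?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?="

# Code points of the characters in question.
print(hex(ord("=")), hex(ord("?")), hex(ord("\x1b")))

# \x3D is "=", \x3F is "?"; written as hex escapes they are literals,
# so the question mark does not act as a quantifier here.
hex_pattern = re.compile(r"\x3D\x3Fkoi8-r\x3F", re.IGNORECASE)
esc_pattern = re.compile(r"\x1B")  # matches only the ESC control character

hex_hit = hex_pattern.search(raw_subject)
esc_hit = esc_pattern.search(raw_subject)
print("hex pattern matched:", hex_hit.group(0))
print("\\x1B matched:", esc_hit is not None)
```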

Cheers,
Mike

RE: Rule for Russian character sets

Posted by John Hardin <jh...@impsec.org>.

On Fri, 15 Feb 2008, Michael Hutchinson wrote:

> Are we not meant to delimit characters like a minus sign?
>
> Ex:
> header SUBJ_RUSS_CHAR			Subject:raw =~ /koi8\-r/i

Only where they have special meaning, and a dash is only "special" in a 
character set, e.g. [A-Z]. I have found the simplest way to avoid 
misinterpretation in that context is to put the dash first, e.g. 
[-abcde12345]

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   ...to announce there must be no criticism of the President or to
   stand by the President right or wrong is not only unpatriotic and
   servile, but is morally treasonous to the American public.
                                           -- Theodore Roosevelt, 1918
-----------------------------------------------------------------------
  8 days until George Washington's 276th Birthday

RE: Rule for Russian character sets

Posted by Michael Hutchinson <mh...@manux.co.nz>.

> -----Original Message-----
> > > We're suddenly getting a ton of spam with koi8-r encoding...I tried to
> > > do a custom rule for it like this:
> > >
> > > header SUBJ_RUSS_CHAR           Subject =~/koi8-r/i
> > > describe SUBJ_RUSS_CHAR         has Russian char encoding
> > > score SUBJ_RUSS_CHAR            3.5
> > >
> > > The short headers for these spams look like this:
> > >
> > > Subject: [koi8-r] ??? ????
> > >
> > > The "raw" Subject header, like this:
> > >
> > > Subject: =?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?=
> > >
> > > I would think the rule would catch it either way...what am I missing?
> >
> > I think this should work:
> >
> > header SUBJ_RUSS_CHAR           Subject:raw =~ /koi8-r/i
> 
> That did it, thanks!
> 

Are we not meant to delimit characters like a minus sign?

Ex:
header SUBJ_RUSS_CHAR			Subject:raw =~ /koi8\-r/i

I would really like to trap the question marks too, just in case someone
sends a legitimate email with koi8-r in the subject (e.g. "why does email
with the koi8-r character set get tagged as spam?").

In other words, the following rule (if it worked) would be nice to use
instead:

Ex:

header SUBJ_RUSS_CHAR			Subject:raw =~ /\=\?koi8\-r\?/

Where we could trap the equals sign and the two question marks. I have not
employed this rule because I think it's dodgy: the Regexp expander over
at SARE says there is a scary number of matches (2000+) with that rule,
so I'm presuming that the matching for the equals character and the
question mark is not working properly, and that they will have to be
delimited some other way. For example, using the \x1B notation, but I've
had no luck with this.

Does anyone have suggestions for matching question marks and equals
signs in one line? I would like to match everything exactly between the
double quotes:

"=?koi8-r?"

If I were to read the perldoc docs I'd be using "\=\?koi8\-r\?"
But I don't want to test it on my live server, because of the output of
the Regex expander utility.

Help anyone?

Cheers,
Mike

Re: Rule for Russian character sets

Posted by up...@3.am.

On Thu, 14 Feb 2008, Per Jessen wrote:

> up@3.am wrote:
>
> >
> > We're suddenly getting a ton of spam with koi8-r encoding...I tried to
> > do a custom rule for it like this:
> >
> > header SUBJ_RUSS_CHAR           Subject =~/koi8-r/i
> > describe SUBJ_RUSS_CHAR         has Russian char encoding
> > score SUBJ_RUSS_CHAR            3.5
> >
> > The short headers for these spams look like this:
> >
> > Subject: [koi8-r] ??? ????
> >
> > The "raw" Subject header, like this:
> >
> > Subject: =?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?=
> >
> > I would think the rule would catch it either way...what am I missing?
>
> I think this should work:
>
> header SUBJ_RUSS_CHAR           Subject:raw =~ /koi8-r/i

That did it, thanks!

James Smallacombe		      PlantageNet, Inc. CEO and Janitor
up@3.am							    http://3.am
=========================================================================

Re: Rule for Russian character sets

Posted by Per Jessen <pe...@computer.org>.

up@3.am wrote:

> 
> We're suddenly getting a ton of spam with koi8-r encoding...I tried to
> do a custom rule for it like this:
> 
> header SUBJ_RUSS_CHAR           Subject =~/koi8-r/i
> describe SUBJ_RUSS_CHAR         has Russian char encoding
> score SUBJ_RUSS_CHAR            3.5
> 
> The short headers for these spams look like this:
> 
> Subject: [koi8-r] ??? ????
> 
> The "raw" Subject header, like this:
> 
> Subject: =?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?=
> 
> I would think the rule would catch it either way...what am I missing?

I think this should work:

header SUBJ_RUSS_CHAR           Subject:raw =~ /koi8-r/i



/Per Jessen, Zürich