Posted to users@spamassassin.apache.org by Matija Nalis <mn...@voyager.hr> on 2023/06/21 15:41:34 UTC

Re: Problems matching the last word in multi-OR Regex

On Thu, Dec 15, 2022 at 09:17:54AM -0500, Bill Cole wrote:
> On 2022-12-15 at 07:03:25 UTC-0500 (Thu, 15 Dec 2022 12:03:25 +0000 (UTC))
> Pedro David Marco via users <pe...@yahoo.com> is rumored to have said:
> 
> > HI,
> > Situation:i have 2 twin servers running exactly the same OS, and SA.
> > (3.4.4)

Are different versions of some external plugins installed on the
two servers, maybe?

> > i have an email with the word 'dog' inside.
> > i have this rule:      body    __ANIMALS    /cat|mouse|bird|dog/i
> > 
> > Problem: Rule __ANIMALS hits on one server, but on the other one it
> > does not!

Interesting. Is there perhaps some syntax error elsewhere in the file?
You can check with "spamassassin --lint".

Also, maybe there is another rule with the same name defined elsewhere
(maybe in an editor backup file that SA includes?)
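
A rough way to look for such a duplicate is sketched below. This is
only an illustration, not part of SA: the function names and the list
of rule keywords are my own assumptions, and it scans every regular
file in the directory, whether or not SA would actually load it.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches rule-definition lines such as "body __ANIMALS /cat|dog/i".
# The keyword list is an illustrative subset, not a complete one.
RULE_LINE = re.compile(r"^\s*(?:body|rawbody|header|uri|full|meta)\s+(\S+)\s")


def find_rule_definitions(config_dir):
    """Map each rule name to the (file name, line number) pairs defining it."""
    found = defaultdict(list)
    for path in sorted(Path(config_dir).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = RULE_LINE.match(line)
            if match:
                found[match.group(1)].append((path.name, lineno))
    return dict(found)


def duplicates(definitions):
    """Keep only the rule names that are defined more than once."""
    return {name: places for name, places in definitions.items() if len(places) > 1}
```

Calling duplicates(find_rule_definitions(...)) on your site config
directory (the path differs between installations) would list any
rule name, such as __ANIMALS, that is defined in more than one place.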

> > i have noticed that if i switch the rule words order, like this:
> > 
> >   body    __ANIMALS    /cat|mouse|dog|bird/i
> > 
> > and 'dog' is not the latest word, then it hits on both servers.
> > 
> > I have tried many permutations and it only fails with the word that
> > appears last in regular expressions with multiple ORs.
> > Has anyone seen this before? Is that a known bug?
> 
> This is absolutely NOT a known bug. I'm not sure how it is possible for
> something so fundamental to still be lurking in SA undiscovered. I don't
> think the basic parsing of REs in rules has changed since v2.
> 
> It would help a great deal if you could open a bug at
> https://bz.apache.org/SpamAssassin/ with sample messages that are hit or not
> by different variants of the rule.

I agree. Do mention the issue in this thread when you open it, so
interested parties may follow.


One other obscure situation that comes to mind that might possibly
happen: "sa-compile" was used in the past on a previous version of
the regex, but something went wrong with the system clock, so SA does
not detect that the changed regex needs recompiling and continues to
use the old, outdated version.

Or are you using spamc/spamd, and spamd did not reload the new rule?

Or maybe the word "dog" was copy/pasted instead of typed, and so it
includes some invisible UTF-8 characters.
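
Such characters are easy to miss by eye. A minimal sketch of a check
follows; the function name is hypothetical and the "pasted" rule line
is a made-up example with a zero-width space after "dog", not the
actual rule from the affected server.

```python
import unicodedata


def suspicious_characters(line):
    """Return (column, code point, Unicode name) for each non-ASCII or control character."""
    hits = []
    for column, char in enumerate(line, start=1):
        # Unicode categories starting with "C" cover control and format characters.
        is_control = unicodedata.category(char).startswith("C") and char not in "\t\n"
        if ord(char) > 127 or is_control:
            name = unicodedata.name(char, "UNNAMED")
            hits.append((column, f"U+{ord(char):04X}", name))
    return hits


typed = "body __ANIMALS /cat|mouse|bird|dog/i"
# Hypothetical pasted variant: an invisible zero-width space follows "dog".
pasted = "body __ANIMALS /cat|mouse|bird|dog\u200b/i"

print(suspicious_characters(typed))
print(suspicious_characters(pasted))
```

Running the same function over each line of the real rule file would
show whether anything other than plain ASCII is hiding in the regex.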

I'd suggest creating a new, unique name for the rule (e.g.
NEWANIMALS_20230621), typing the rule content manually instead of
copy/pasting, and then checking whether that rule matches by running
"spamassassin -t" on the sample message.

That should rule out most of the possible other issues above.

-- 
Opinions above are GNU-copylefted.