Posted to dev@spamassassin.apache.org by Karsten Bräckelmann <gu...@rudersport.de> on 2011/03/21 04:44:54 UTC

Fwd: Re: Reproducing Bug 6559

Lengthy, I mean *verbose* reply by Matt Elson below. Awesome! The
important bits of my original private reply have been communicated to
the users list already.

First a lot of test machine details (which I mainly ignored), then the
beef that might lead to the "use space rather than \s in body rules"
problem with re2c compilation, only exhibited in some highly specific
circumstances.

Approval to publish:
> > Matt, since you sent this off-list to me, is it OK to share the info
> > with the (public) dev list, also?
> 
> Sure; I'm just feeling a little embarrassed/paranoid about the fact my 
> initial email with debugging info triggered the issue on other people's 
> systems - just wasn't thinking too clearly when I sent it out.
> 
> So assuming it doesn't destroy systems, go ahead ;).

Don't worry. One of the original reports a few hours ago already sported
the patterns that trigger the issue. We'll be fine. ;)


-------- Forwarded Message --------
From: Matt Elson
To: Karsten Bräckelmann
Subject: Re: Reproducing Bug 6559
Date: Sun, 20 Mar 2011 22:45:43 -0400

> Since there have been offers for further testing: One data point is to
> collect details about systems, CPU architecture, instruction set used
> for compiling, versions (OS, kernel, compiler, re2c, Perl) and patch-
> level.
>

I've seen the issue on six different hosts now. Three of them are my 
normal production machines, three were ones I'm testing on specifically 
for this purpose. Here are the specs (let me know if I missed something or 
misunderstood the request.. which I suspect I have).

Sorry, this is a bit long and conceivably unclear..

RHEL4 32bit machine, production box #1
---

CPU:
model name	: Intel(R) Xeon(TM) CPU 2.80GHz

uname -a:
Linux spam2 2.6.9-89.0.19.ELsmp #1 SMP Wed Dec 30 12:53:30 EST 2009 i686 
i686 i386 GNU/Linux

gcc -v:

Reading specs from /usr/lib/gcc/i386-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--disable-checking --with-system-zlib --enable-__cxa_atexit 
--disable-libunwind-exceptions --enable-java-awt=gtk 
--host=i386-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-11)

re2c -v:
re2c 0.13.2

spamassassin -V:

SpamAssassin version 3.3.1
   running on Perl version 5.8.5
(from spamassassin.apache.org)

/etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 9)

(should be up to date)


RHEL5 32bit machine, production box #2
---
CPU:
model name	: Intel(R) Xeon(R) CPU            5110  @ 1.60GHz

uname -a
Linux spam3 2.6.18-238.5.1.el5PAE #1 SMP Mon Feb 21 06:01:16 EST 2011 
i686 i686 i386 GNU/Linux

gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--enable-checking=release --with-system-zlib --enable-__cxa_atexit 
--disable-libunwind-exceptions --enable-libgcj-multifile 
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada 
--enable-java-awt=gtk --disable-dssi --disable-plugin 
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre 
--with-cpu=generic --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)

re2c -v
re2c 0.13.5

spamassassin -V:

SpamAssassin version 3.3.1
   running on Perl version 5.8.8

(from spamassassin.apache.org)

cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
(should only be off on patches by a week or so)

RHEL5 32bit machine, production box #3
---

  Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz


uname -a:

Linux spam4 2.6.18-194.32.1.el5PAE #1 SMP Mon Dec 20 11:00:23 EST 2010 
i686 i686 i386 GNU/Linux

gcc -v:
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--enable-checking=release --with-system-zlib --enable-__cxa_atexit 
--disable-libunwind-exceptions --enable-libgcj-multifile 
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada 
--enable-java-awt=gtk --disable-dssi --enable-plugin 
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre 
--with-cpu=generic --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)

re2c -v:
re2c 0.13.2

spamassassin -V:
SpamAssassin version 3.3.1
   running on Perl version 5.8.8

(from spamassassin.apache.org)

cat /etc/redhat-release:
   Red Hat Enterprise Linux Server release 5.5 (Tikanga)
(behind a bit)

New RHEL6 64bit machine, virtual (vmware)
---
CPU: model name	: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz

uname -a:

Linux rhel6-x64 2.6.32-71.18.2.el6.x86_64 #1 SMP Wed Mar 2 14:17:40 EST 
2011 x86_64 x86_64 x86_64 GNU/Linux

gcc -v

Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info 
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap 
--enable-shared --enable-threads=posix --enable-checking=release 
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
--enable-gnu-unique-object 
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada 
--enable-java-awt=gtk --disable-dssi 
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre 
--enable-libgcj-multifile --enable-java-maintainer-mode 
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar 
--disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic 
--with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC)

re2c -v:

re2c 0.13.5

spamassassin -V:

SpamAssassin version 3.3.1
   running on Perl version 5.10.1
(redhat packaged)

Latest patches from RedHat as of today.

New RHEL6 32bit machine, virtual (vmware)
----

CPU reported: model name	: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz

uname -a:

Linux rhel6-32 2.6.32-71.18.2.el6.i686 #1 SMP Wed Mar 2 14:38:52 EST 
2011 i686 i686 i386 GNU/Linux

gcc -v:

Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info 
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap 
--enable-shared --enable-threads=posix --enable-checking=release 
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
--enable-gnu-unique-object 
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada 
--enable-java-awt=gtk --disable-dssi 
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre 
--enable-libgcj-multifile --enable-java-maintainer-mode 
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar 
--disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic 
--with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC)

re2c -v:

re2c 0.13.5

spamassassin -V:

SpamAssassin version 3.3.1
   running on Perl version 5.10.1
(redhat packaged)

Latest patches for all programs from RedHat as of today

Debian Desktop (work desktop)
----

CPU:
model name	: Intel(R) Pentium(R) 4 CPU 2.66GHz

uname -a:

Linux workDesktop 2.6.37-2-686 #1 SMP Sun Feb 27 10:51:32 UTC 2011 i686 
GNU/Linux

gcc -v:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.5.2/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.5.2-5' 
--with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs 
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-4.5 --enable-shared --enable-multiarch 
--enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib 
--without-included-gettext --enable-threads=posix 
--with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib 
--enable-nls --enable-clocale=gnu --enable-libstdcxx-debug 
--enable-libstdcxx-time=yes --enable-plugin --enable-gold 
--enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc 
--enable-targets=all --with-arch-32=i586 --with-tune=generic 
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu 
--target=i486-linux-gnu
Thread model: posix
gcc version 4.5.2 (Debian 4.5.2-5)

spamassassin -V:

SpamAssassin version 3.3.1
   running on Perl version 5.10.1
   (from Debian packaging)

re2c -v:
  re2c 0.13.5
   (from Debian packaging)

Up to date as of.. last week; Debian Unstable.


> Another might be to reproduce the issue, and get a minimal test-case.

I'm not super familiar with any advanced usage of SpamAssassin, so 
apologies for the next bits.

> For that, can you reproduce the problem with trivial REs for the three
> __PILL_PRICE_x sub-rules?
>

Not quite sure what this means (sorry); does this mean trying to get a 
simplified form of the __PILL_PRICE rules that still trigger the problem?

If so, here are a few crude manglings of __PILL_PRICE_3 that still cause 
the loop on one of my test machines:

(using: http://pastebin.com/iGQ2RJ6v)

body __PILL_PRICE_3 /free\s(?:pill|cap(?:sule|let))s/i
body __PILL_PRICE_3 /free\s(?:pill|cap)s/i
body __PILL_PRICE_3 /free\spills/i
body __PILL_PRICE_3 /free\s/i
body __PILL_PRICE_3 /Free\s/
body __PILL_PRICE_3 /ree\s/

The following do *not* cause the problem, however:

body __PILL_PRICE_3 /ee\s/
body __PILL_PRICE_3 /free pills/i
body __PILL_PRICE_3 /free\ pills/i
body __PILL_PRICE_3 /\s/
body __PILL_PRICE_3 /s\s/

Playing around a bit, the following also cause the problem (using my 
sample text from the pastebin):

body __PILL_PRICE_3 /shipping\s/i
body __PILL_PRICE_3 /ping\s/i
body __PILL_PRICE_3 /ing\s/i
body __PILL_PRICE_3 /quality\s/i
body __PILL_PRICE_3 /ity\s/i

Whereas the following do not:

body __PILL_PRICE_3 /ty\s/i
body __PILL_PRICE_3 /ng\s/i

I have no idea why, but it seems:

\s preceded by three or more characters and tflags multiple

regularly hits the problem for me.
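
The "three or more characters" observation can be checked against the
results listed above with a minimal Python sketch. It only restates the
pattern in the test results and is not a diagnosis of re2c; the helper
names (chars_before_ws, predicts_loop) are made up for illustration, and
every rule is assumed to keep tflags multiple.

```python
import re

# (pattern, looped?) pairs copied from the test results in this message.
OBSERVED = [
    (r"free\s(?:pill|cap(?:sule|let))s", True),
    (r"free\s(?:pill|cap)s", True),
    (r"free\spills", True),
    (r"free\s", True),
    (r"Free\s", True),
    (r"ree\s", True),
    (r"shipping\s", True),
    (r"ping\s", True),
    (r"ing\s", True),
    (r"quality\s", True),
    (r"ity\s", True),
    (r"ee\s", False),
    (r"free pills", False),
    (r"free\ pills", False),
    (r"\s", False),
    (r"s\s", False),
    (r"ty\s", False),
    (r"ng\s", False),
]


def chars_before_ws(pattern):
    """Count word characters directly before the first \\s, or None if no \\s."""
    m = re.search(r"(\w*)\\s", pattern)
    return len(m.group(1)) if m else None


def predicts_loop(pattern):
    """Heuristic from the observation: \\s preceded by three or more characters."""
    n = chars_before_ws(pattern)
    return n is not None and n >= 3


# Patterns where the heuristic disagrees with what was observed.
mismatches = [p for p, looped in OBSERVED if predicts_loop(p) != looped]
print("mismatches:", mismatches)
```

If the heuristic holds, the mismatch list stays empty for the patterns
tried so far; any new pattern that breaks it would be a useful data point.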


> Can you reproduce the problem by keeping (a renamed copy of) the
> original sub-rules and tflags, using a simple meta rule?

Not quite sure what to do here either; if I disable the rules with the meta 
trick and then make one that mirrors it (based on __PILL_PRICE_3), I get 
the same behavior:

i.e.

meta __PILL_PRICE_1 (0)
meta __PILL_PRICE_3 (0)
meta __PILL_PRICE_2 (0)

body LOCAL_TEST /free\s(?:pill|tablet|cap(?:sule|let))s/i
tflags LOCAL_TEST multiple

Will still cause the problem, but of course hitting LOCAL_TEST as 
opposed to the __PILL_PRICE rules.

I think I completely misunderstood the question though ;).

> Are two of them
> sufficient? Or even one?

I disabled __PILL_PRICE_1 and __PILL_PRICE_2 with meta (0) and can still get 
the error with just __PILL_PRICE_3 being active.

And not sure if it helps, but I ran into a similar behavior w/ re2c a 
long time ago:

http://mail-archives.apache.org/mod_mbox/spamassassin-users/200907.mbox/%3C4A4A4EC6.2000301@fastmail.net%3E

Like I said, it may just be noise but I figured it can't hurt to have 
another data point.

Anyway, hope this all helps.

Matt


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: [SA-dev] Fwd: Re: Reproducing Bug 6559

Posted by John Hardin <jh...@impsec.org>.
On Mon, 21 Mar 2011, Adam Katz wrote:

> I would want to try this, which should be a faster regex anyway:

Thanks for your suggestions, I'll take a look at them soonest.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Watch... Wallet... Gun... Knee...                    -- Denny Crane
-----------------------------------------------------------------------
  8 days until the M1911 is 100 years old - and still going strong!

Re: [SA-dev] Fwd: Re: Reproducing Bug 6559

Posted by Adam Katz <an...@khopis.com>.
On 03/20/2011 08:44 PM, Karsten Bräckelmann forwarded From: Matt Elson
> I have no idea why, but it seems:
> \s preceded by three or more characters and tflags multiple
> regularly hits the problem for me.

I don't have much experience with non-production re2c; how do I properly
reproduce (and therefore test) this bug on svn trunk?

I would want to try this, which should be a faster regex anyway:

/free\s[ptc](?:ill|ablet|ap(?:sule|let))s/i

I also wanted to try a leading word-break ("\b") in front of the regex,
though I don't know how many spams that will skip.
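
One thing to weigh with the factored form is that it accepts a few more
words than the original alternation. A minimal Python sketch, using the
suggested pattern with its outer group closed before the trailing "s";
Python's re module stands in for Perl here, and the sample phrases are
made up for illustration:

```python
import re

# __PILL_PRICE_3 as quoted in this thread.
ORIGINAL = re.compile(r"free\s(?:pill|tablet|cap(?:sule|let))s", re.I)
# Factored form, with the outer group closed before the trailing "s".
FACTORED = re.compile(r"free\s[ptc](?:ill|ablet|ap(?:sule|let))s", re.I)

# Hypothetical sample phrases, not taken from real mail.
SAMPLES = [
    "free pills", "free tablets", "free capsules", "free caplets",
    "free tills", "free cablets", "free shipping",
]


def compare(samples):
    """Return (text, original_hit, factored_hit) for each sample."""
    return [(s, bool(ORIGINAL.search(s)), bool(FACTORED.search(s)))
            for s in samples]


for text, orig_hit, fact_hit in compare(SAMPLES):
    print(f"{text!r}: original={orig_hit} factored={fact_hit}")
```

Because the character class pairs any of p, t, c with any of the three
endings, the factored form also hits strings such as "free tills". Whether
that matters in practice would need a mass-check.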

While looking at the PILL_PRICE rules,

body  __PILL_PRICE_1  m;\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i

What is the point of leading with an optional piece?  That regex is
identical to this simpler one:

m;[\d\s.]{3}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i
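
A quick way to sanity-check that claim is to run both forms over a few
strings and compare the hit/no-hit verdicts. This is a minimal Python
sketch, with Python's re module standing in for Perl; the sample strings
and the same_verdict helper are made up for illustration:

```python
import re

# Shared tail of __PILL_PRICE_1 as quoted above.
TAIL = r"(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let))"

ORIGINAL = re.compile(r"\$?[\d\s.]{3,8}" + TAIL, re.I)
SIMPLIFIED = re.compile(r"[\d\s.]{3}" + TAIL, re.I)

# Hypothetical sample strings, not taken from real mail.
SAMPLES = [
    "only $1.50 per pill",
    "0.99/tablet",
    "1 2 3 each caplet",
    "2 each capsule",
    "price per pill",
]


def same_verdict(text):
    """True if both forms agree on whether the text hits."""
    return bool(ORIGINAL.search(text)) == bool(SIMPLIFIED.search(text))


for text in SAMPLES:
    print(f"{text!r}: agree={same_verdict(text)}")
```

The reasoning behind the agreement: the pattern is unanchored, so whenever
three to eight class characters precede the separator, the last three of
them are enough for the shorter form. Only the matched span differs, and
this sketch says nothing about hit counts under tflags multiple.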

Another point; what if we merge _1 and _3 from

_1 m;\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i
_2 /(?:pill|tablet|cap(?:sule|let))s\s\$?[\d\s.]{3,8}/i
_3 /free\s(?:pill|tablet|cap(?:sule|let))s/i

into (note removal of _1's optional lead)

m;(?:[\d\s.]{3}(?:/|per|each)|free)\s?(?:pill|tablet|cap(?:sule|let));i

Matt already showed that disabling _1 and _2 didn't prevent the problem
with _3, so this isn't as much of a potential remedy as it initially
seems, but it should be slightly more efficient and might avoid the re2c
bug.
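
Before merging, it may be worth comparing the merged pattern against _1
and _3 taken together, since the merged form makes the whitespace after
"free" optional and drops the plural "s" that _3 requires. A minimal
Python sketch, with Python's re module standing in for Perl; the sample
strings and the verdicts helper are made up for illustration:

```python
import re

# _1 and _3 as quoted above, plus the proposed merged pattern.
P1 = re.compile(
    r"\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let))", re.I)
P3 = re.compile(r"free\s(?:pill|tablet|cap(?:sule|let))s", re.I)
MERGED = re.compile(
    r"(?:[\d\s.]{3}(?:/|per|each)|free)\s?(?:pill|tablet|cap(?:sule|let))",
    re.I)

# Hypothetical sample strings, not taken from real mail.
SAMPLES = [
    "free pills today",
    "0.99/tablet",
    "one free pill",
    "freepills",
    "free shipping",
]


def verdicts(text):
    """Return (hit by _1 or _3, hit by merged pattern) for one string."""
    separate = bool(P1.search(text) or P3.search(text))
    merged = bool(MERGED.search(text))
    return separate, merged


for text in SAMPLES:
    print(f"{text!r}: separate/merged = {verdicts(text)}")
```

On these samples the merged pattern hits everything the separate rules
hit, plus the singular "free pill" and the run-together "freepills". A
meta rule counting hits from the sub-rules would also see one rule where
there used to be two.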


Re: Fwd: Re: Reproducing Bug 6559

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2011-03-21 at 04:44 +0100, Karsten Bräckelmann wrote:
> > > Matt, since you sent this off-list to me, is it OK to share the info
> > > with the (public) dev list, also?
> > 
> > Sure; I'm just feeling a little embarrassed/paranoid about the fact my 
> > initial email with debugging info triggered the issue on other people's 
> > systems - just wasn't thinking too clearly when I sent it out.
> > 
> > So assuming it doesn't destroy systems, go ahead ;).
> 
> Don't worry. One of the original reports a few hours ago already sported
> the patterns that trigger the issue. We'll be fine. ;)

Oh, wait, now I realize -- you were actually speaking about exactly
that, and your report, which was the very first! :-D

Anyway, don't worry -- it's either that, or the next male enhancement
spam coming in to trigger the bug. And I still advise not to scan SA
list mail.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}