Posted to dev@apr.apache.org by Justin Erenkrantz <je...@ebuilt.com> on 2001/06/24 04:42:28 UTC

GCC 2.96 optimization bug

I just ran across a compiler bug when dealing with long long in the GCC
shipped with Mandrake 8.0 (Intel).  When -O2 is specified (it's in the
default), it seems to lose the upper 32 bits of the 64 bit integers.
(I was playing with the XML code in apr-util and the return values from
elem_size() were corrupted.)  Removing -O2 fixes this.

% gcc -v
Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux/2.96/specs
gcc version 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)

This is a fairly old snapshot.  I am building gcc-3.0 now and going to 
see if the problem still exists (unlikely).  Even if it doesn't, we 
probably need to stick something in autoconf to avoid optimizations 
with this buggy compiler.
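Here is a minimal sketch of such an autoconf check, written as plain shell of the kind configure runs. It assumes the compiler prints a `gcc version ...` line for `-v`, like the output above, and it treats every 2.96 snapshot as suspect. The function names `gcc_version_from_v` and `apr_cc_is_gcc296` are hypothetical and are not existing APR or autoconf macros.

```shell
#!/bin/sh
# Hypothetical helper: read `gcc -v` output on stdin and print just the
# version number, e.g. "gcc version 2.96 20000731 (...)" -> "2.96".
gcc_version_from_v() {
    sed -n 's/^gcc version \([^ ]*\).*/\1/p'
}

# Hypothetical check: exit status 0 if the compiler ($1, default gcc)
# identifies itself as any GCC 2.96 snapshot, 1 otherwise.
apr_cc_is_gcc296() {
    apr_cc=${1:-gcc}
    # gcc prints its -v banner on stderr, so fold stderr into the pipe.
    apr_ver=`$apr_cc -v 2>&1 | gcc_version_from_v`
    case $apr_ver in
        2.96*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

configure could then branch on it with something like `if apr_cc_is_gcc296 "$CC"; then ...; fi`, and decide there whether to warn, strip flags, or stop.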

Silly distros shipping beta compilers (2.96 was never a released
version).  This is lame.  

Thoughts?  Should we just avoid this problem and tell anyone using a
default install of Mandrake that they are screwed?  Or, do we just
disable compiler optimizations with gcc 2.96?  -- justin


Re: GCC 2.96 optimization bug

Posted by Joe Orton <jo...@btconnect.com>.
On Sun, Jun 24, 2001 at 01:09:11AM -0700, Justin Erenkrantz wrote:
> On Sat, Jun 23, 2001 at 07:42:28PM -0700, Justin Erenkrantz wrote:
> > I just ran across a compiler bug when dealing with long long in the GCC
> > shipped with Mandrake 8.0 (Intel).  When -O2 is specified (it's in the
> > default), it seems to lose the upper 32 bits of the 64 bit integers.
> > (I was playing with the XML code in apr-util and the return values from
> > elem_size() were corrupted.)  Removing -O2 fixes this.

I can't find where 64-bit integers are used in the XML code;
elem_size uses size_t everywhere AFAICT. I guess by "Intel" you mean x86?

> FWIW (and as expected), GCC 3.0 doesn't exhibit this problem with the
> -O2 flags set.
>
> Any thoughts as to what we should do?  It is a buggy compiler, but a
> fairly popular "stable" Linux distro has this compiler installed by 
> default.  I believe (at one point) RH 7 included GCC 3.0 CVS 
> snapshots.  I wonder if other distributions are affected by this 
> problem.

The Mandrake gcc package follows the Red Hat Linux gcc package (plus
extra bits). From looking at the spec files it looks like your MDK8.0
gcc has the patches from the RHL7.1 gcc. Can you post code and I can try
to reproduce it on 7.1?

> And, it looks like the -O2 flag is coming from the autoconf distribution
> itself (/usr/share/autoconf/acspecific.m4).  So, this doesn't look
> directly configurable in our autoconf scripts (unless we replace
> AC_PROG_CC).  -- justin

Well you could sed it out after the AC_PROG_CC is used if it proves
necessary (CFLAGS="`echo $CFLAGS | sed s/-O2//g`")
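The one-liner above, wrapped in a hypothetical helper as a sketch. It assumes it runs after AC_PROG_CC has set CFLAGS. It uses `printf` instead of `echo` so that a flag can never be taken as an echo option, and it squeezes out the extra space left where `-O2` was deleted.

```shell
#!/bin/sh
# Hypothetical helper: print the CFLAGS given as $1 with every "-O2"
# removed, then collapse runs of spaces and trim leading/trailing ones.
strip_o2() {
    printf '%s\n' "$1" |
        sed -e 's/-O2//g' -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}

# Typical use right after AC_PROG_CC:
#   CFLAGS=`strip_o2 "$CFLAGS"`
```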

Regards,

joe


Re: GCC 2.96 optimization bug

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sat, Jun 23, 2001 at 07:42:28PM -0700, Justin Erenkrantz wrote:
> I just ran across a compiler bug when dealing with long long in the GCC
> shipped with Mandrake 8.0 (Intel).  When -O2 is specified (it's in the
> default), it seems to lose the upper 32 bits of the 64 bit integers.
> (I was playing with the XML code in apr-util and the return values from
> elem_size() were corrupted.)  Removing -O2 fixes this.

FWIW (and as expected), GCC 3.0 doesn't exhibit this problem with the
-O2 flags set.

Any thoughts as to what we should do?  It is a buggy compiler, but a
fairly popular "stable" Linux distro has this compiler installed by 
default.  I believe (at one point) RH 7 included GCC 3.0 CVS 
snapshots.  I wonder if other distributions are affected by this 
problem.

And, it looks like the -O2 flag is coming from the autoconf distribution
itself (/usr/share/autoconf/acspecific.m4).  So, this doesn't look
directly configurable in our autoconf scripts (unless we replace
AC_PROG_CC).  -- justin


Re: GCC 2.96 optimization bug

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Mon, Jun 25, 2001 at 08:16:06AM -0700, Ian Holsman wrote:
> There was a update to GCC from redhat over the weekend, did you
> apply that patch perhaps?

I didn't touch anything.  That's what makes it odd.  It didn't work, and
now it does.  -- justin


Re: GCC 2.96 optimization bug

Posted by Ian Holsman <ia...@cnet.com>.
On 24 Jun 2001 20:07:02 -0700, Justin Erenkrantz wrote:
> FWIW, I can't seem to reproduce the problem now.  I know that I was
> getting odd things last night, but I can't seem to get it to happen
> again today.  Grr.  I feel silly.  I could put a note in the STATUS 
> file that we've seen some oddities with GCC 2.96.  I swear I wasn't
> making it up - I *was* getting segfaults because it was trying to
> allocate too much memory (the size returned from elem_size() was
> corrupt).

There was a update to GCC from redhat over the weekend, did you
apply that patch perhaps?

-- 
Ian Holsman 
Performance Measurement and Analysis
CNET Networks
PH: (415) 364 8608


Re: GCC 2.96 optimization bug

Posted by Justin Erenkrantz <je...@ebuilt.com>.
FWIW, I can't seem to reproduce the problem now.  I know that I was
getting odd things last night, but I can't seem to get it to happen
again today.  Grr.  I feel silly.  I could put a note in the STATUS 
file that we've seen some oddities with GCC 2.96.  I swear I wasn't
making it up - I *was* getting segfaults because it was trying to
allocate too much memory (the size returned from elem_size() was
corrupt).

I do think it is something to keep an eye out for, though.  I also think
Ryan's point of placing a fatal error in the autoconf routine might be
the best thing for us to do.  Force these guys to only distribute 
released compilers.  

But, once I can solidly reproduce the bug, we can then take appropriate
action.  For now, I'll keep a look out for odd things like this.  Oh,
well.  -- justin

On Sun, Jun 24, 2001 at 07:32:40PM -0700, rbb@covalent.net wrote:
> 
> > > > Thoughts?  Should we just avoid this problem and tell anyone using a
> > > > default install of Mandrake that they are screwed?  Or, do we just
> > > > disable compiler optimizations with gcc 2.96?  -- justin
> > >
> > > We shouldn't have to work around bugs in beta versions of the compiler.
> > > If somebody is using a buggy compiler, then they are screwed.  Hopefully,
> > > this will teach distributions that they have a responsibility to use good
> > > packages when they assemble their dist.
> >
> > Much as I'd love to agree with you, it seems to me that RHL7.1 and the
> > corresponding Mandrake distro are widespread enough that we should pay
> > them attention.  If they're broken and we know how to work around it, I
> > think we should.  Or at *least* we should warn the user to install a
> > usable revision of gcc.  Just letting it segfault with no warning at all
> > doesn't seem acceptable to me.
> >
> > PS: Don't take this to mean that I think we're responsible for finding all
> > bugs in the RH distros and accounting for them... I just think that if we
> > *do* happen to know about one that's as nasty as this then we should try
> > to inform the user somehow.  <shrug>
> 
> I wouldn't have any problem putting a message in the install file, or
> putting a warning in even in the autoconf script.  I do not want to get in
> the habit of putting work-arounds in our configure script for buggy
> compilers though.  If we fix the problems with poor distributions, then we
> are condoning people releasing software before it is ready.  Red Hat and
> Mandrake made a poor decision, but we shouldn't have to hack our config
> file to fix their mistake.  If we work-around this by exiting and issuing
> a message that says:
> 
> "Your compiler is known to have bugs which stop Apache from working
> properly.  Please re-configure with CFLAGS=-O2"
> 
> This gets around the problem, but doesn't make us responsible for fixing
> RH and Mandrake's mistakes.  If there was some way to put their names in
> the error, that would be even better.
> 
> Unless groups hold RH and Mandrake accountable, they will continue to make
> irresponsible decisions.
> 
> Ryan
> 
> _______________________________________________________________________________
> Ryan Bloom                        	rbb@apache.org
> 406 29th St.
> San Francisco, CA 94131
> -------------------------------------------------------------------------------


Re: GCC 2.96 optimization bug

Posted by Cliff Woolley <cl...@yahoo.com>.
On Sun, 24 Jun 2001 rbb@covalent.net wrote:

> "Your compiler is known to have bugs which stop Apache from working
> properly.  Please re-configure with CFLAGS=-O2"

CFLAGS=-O0

+1

> This gets around the problem, but doesn't make us responsible for fixing
> RH and Mandrake's mistakes.  If there was some way to put their names in
> the error, that would be even better.
> Unless groups hold RH and Mandrake accountable, they will continue to make
> irresponsible decisions.

Agreed.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: GCC 2.96 optimization bug

Posted by rb...@covalent.net.
> > > Thoughts?  Should we just avoid this problem and tell anyone using a
> > > default install of Mandrake that they are screwed?  Or, do we just
> > > disable compiler optimizations with gcc 2.96?  -- justin
> >
> > We shouldn't have to work around bugs in beta versions of the compiler.
> > If somebody is using a buggy compiler, then they are screwed.  Hopefully,
> > this will teach distributions that they have a responsibility to use good
> > packages when they assemble their dist.
>
> Much as I'd love to agree with you, it seems to me that RHL7.1 and the
> corresponding Mandrake distro are widespread enough that we should pay
> them attention.  If they're broken and we know how to work around it, I
> think we should.  Or at *least* we should warn the user to install a
> usable revision of gcc.  Just letting it segfault with no warning at all
> doesn't seem acceptable to me.
>
> PS: Don't take this to mean that I think we're responsible for finding all
> bugs in the RH distros and accounting for them... I just think that if we
> *do* happen to know about one that's as nasty as this then we should try
> to inform the user somehow.  <shrug>

I wouldn't have any problem putting a message in the install file, or
putting a warning in even in the autoconf script.  I do not want to get in
the habit of putting work-arounds in our configure script for buggy
compilers though.  If we fix the problems with poor distributions, then we
are condoning people releasing software before it is ready.  Red Hat and
Mandrake made a poor decision, but we shouldn't have to hack our config
file to fix their mistake.  If we work-around this by exiting and issuing
a message that says:

"Your compiler is known to have bugs which stop Apache from working
properly.  Please re-configure with CFLAGS=-O2"

This gets around the problem, but doesn't make us responsible for fixing
RH and Mandrake's mistakes.  If there was some way to put their names in
the error, that would be even better.

Unless groups hold RH and Mandrake accountable, they will continue to make
irresponsible decisions.
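A sketch of that exit-with-a-message approach, assuming the compiler version string has already been detected. The function name and the exact wording are hypothetical. It suggests `CFLAGS=-O0` (no optimization), which is what the `-O2` in the message above is meant to say, as pointed out elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical check: $1 = compiler version (e.g. "2.96"),
# $2 = CFLAGS in effect.  Returns 1, after printing an error on stderr,
# when a GCC 2.96 snapshot would build with -O2; otherwise returns 0.
check_gcc296_cflags() {
    case $1 in
        2.96*) ;;
        *) return 0 ;;          # not 2.96: nothing to complain about
    esac
    # Pad with spaces so -O2 is matched only as a whole flag.
    case " $2 " in
        *" -O2 "*)
            echo "configure: error: Your compiler (gcc $1) is known to have" >&2
            echo "bugs which stop Apache from working properly." >&2
            echo "Please re-configure with CFLAGS=-O0" >&2
            return 1 ;;
    esac
    return 0
}

# In configure:  check_gcc296_cflags "$gcc_ver" "$CFLAGS" || exit 1
```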

Ryan



Re: GCC 2.96 optimization bug

Posted by Cliff Woolley <jw...@virginia.edu>.
On Sun, 24 Jun 2001 rbb@covalent.net wrote:

> > Thoughts?  Should we just avoid this problem and tell anyone using a
> > default install of Mandrake that they are screwed?  Or, do we just
> > disable compiler optimizations with gcc 2.96?  -- justin
>
> We shouldn't have to work around bugs in beta versions of the compiler.
> If somebody is using a buggy compiler, then they are screwed.  Hopefully,
> this will teach distributions that they have a responsibility to use good
> packages when they assemble their dist.

Much as I'd love to agree with you, it seems to me that RHL7.1 and the
corresponding Mandrake distro are widespread enough that we should pay
them attention.  If they're broken and we know how to work around it, I
think we should.  Or at *least* we should warn the user to install a
usable revision of gcc.  Just letting it segfault with no warning at all
doesn't seem acceptable to me.

PS: Don't take this to mean that I think we're responsible for finding all
bugs in the RH distros and accounting for them... I just think that if we
*do* happen to know about one that's as nasty as this then we should try
to inform the user somehow.  <shrug>

--Cliff





Re: GCC 2.96 optimization bug

Posted by rb...@covalent.net.
> I just ran across a compiler bug when dealing with long long in the GCC
> shipped with Mandrake 8.0 (Intel).  When -O2 is specified (it's in the
> default), it seems to lose the upper 32 bits of the 64 bit integers.
> (I was playing with the XML code in apr-util and the return values from
> elem_size() were corrupted.)  Removing -O2 fixes this.
>
> % gcc -v
> Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux/2.96/specs
> gcc version 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)
>
> This is a fairly old snapshot.  I am building gcc-3.0 now and going to
> see if the problem still exists (unlikely).  Even if it doesn't, we
> probably need to stick something in autoconf to avoid optimizations
> with this buggy compiler.
>
> Silly distros shipping beta compilers (2.96 was never a released
> version).  This is lame.
>
> Thoughts?  Should we just avoid this problem and tell anyone using a
> default install of Mandrake that they are screwed?  Or, do we just
> disable compiler optimizations with gcc 2.96?  -- justin

We shouldn't have to work around bugs in beta versions of the compiler.
If somebody is using a buggy compiler, then they are screwed.  Hopefully,
this will teach distributions that they have a responsibility to use good
packages when they assemble their dist.

Ryan



Re: GCC 2.96 optimization bug

Posted by Cliff Woolley <cl...@yahoo.com>.
On Sun, 24 Jun 2001, Dale Ghent wrote:

> When Apache was compiled sans -O2, everything worked well and there was no
> segfault.

What about -O?

> You're right... because a 2.96 snapshot was used in RH, the use of
> 2.96 (I'm assuming any snapshot; I see that yours was older than mine)
> should probably tell autoconf not to include any optimization.

If you're gonna do that (which is probably unavoidable), there'd better be
a BIG warning message issued, obviously... but other than that, I have no
problem with this.

http://gcc.gnu.org/gcc-2.96.html

--Cliff





Re: GCC 2.96 optimization bug

Posted by Dale Ghent <da...@elemental.org>.
On Sat, 23 Jun 2001, Justin Erenkrantz wrote:

| I just ran across a compiler bug when dealing with long long in the GCC
| shipped with Mandrake 8.0 (Intel).  When -O2 is specified (it's in the
| default), it seems to lose the upper 32 bits of the 64 bit integers.
| (I was playing with the XML code in apr-util and the return values from
| elem_size() were corrupted.)  Removing -O2 fixes this.
|
| % gcc -v
| Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux/2.96/specs
| gcc version 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)
|
| This is a fairly old snapshot.  I am building gcc-3.0 now and going to
| see if the problem still exists (unlikely).  Even if it doesn't, we
| probably need to stick something in autoconf to avoid optimizations
| with this buggy compiler.

I started seeing this *exact* same problem a few months back, running 2.96
20000619 on Solaris. If Apache was compiled with -O2, httpd would
segfault when servicing a request. It would always segfault in
mod_autoindex when in a va_list statement when building a FancyIndex page.
When Apache was compiled sans -O2, everything worked well and there was no
segfault.

You're right... because a 2.96 snapshot was used in RH, the use of
2.96 (I'm assuming any snapshot; I see that yours was older than mine)
should probably tell autoconf not to include any optimization.
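If every 2.96 snapshot should build with no optimization at all, a check has to catch `-O`, `-O1`, `-O3`, `-Os` and the rest, not only `-O2`. Here is a sketch under that assumption, using a hypothetical helper name and assuming a POSIX sh. It walks CFLAGS word by word instead of doing a substring `sed`, so a flag such as `-DLEVEL=-O2` survives.

```shell
#!/bin/sh
# Hypothetical helper: print the CFLAGS given as $1 with every -O*
# optimization flag removed; other flags keep their original order.
strip_all_opt() {
    out=
    set -f                      # don't glob-expand flags like -I/path/*
    for flag in $1; do          # unquoted on purpose: split into words
        case $flag in
            -O*) ;;             # drop -O, -O0, -O1, -O2, -O3, -Os, ...
            *) out="$out $flag" ;;
        esac
    done
    set +f                      # assumes globbing was on before the call
    printf '%s\n' "${out# }"    # drop the single leading space
}

# Typical use, only when a 2.96 snapshot was detected:
#   CFLAGS=`strip_all_opt "$CFLAGS"`
```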

/dale