Posted to dev@stdcxx.apache.org by Martin Sebor <se...@roguewave.com> on 2007/10/17 19:21:43 UTC

stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

In a 12D build with the default gcc 4.1.0 on SuSE Linux Enterprise
Server 10 (x86_64), the following simple program abends with the
error below after upgrading the 4.1.3 library to 4.2.0:

#include <string>

int main ()
{
     std::string s = "a";
}

The only library symbols referenced from the executable are

   __rw::__rw_throw(int, ...)
   __rw::__rw_deallocate(void*, unsigned long, int)
   std::string::_C_null_ref
   std::string::string(char const*, std::allocator<char> const&)

Of these, the first one isn't being called and the second and
fourth haven't changed (according to diff of string.cc). I hate
to admit I'm stumped. I suppose I should try to do a build on
a different distribution of Linux with an older version of gcc
to see if I can reproduce it there.


*** glibc detected *** ./t: free(): invalid pointer: 0x0000000000500fe8 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2b71c3a4537e]
/lib64/libc.so.6(__libc_free+0x6c)[0x2b71c3a4699c]
./t(__gxx_personality_v0+0x198)[0x400968]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b71c39f7154]
./t(__gxx_personality_v0+0x59)[0x400829]

Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Martin Sebor <se...@roguewave.com>.
Mark Brown wrote:
> On 10/17/07, Martin Sebor <se...@roguewave.com> wrote:
>> Okay, I've got it:
>>
>>     http://issues.apache.org/jira/browse/STDCXX-162
>>
>> Damn that was hard!
>>
>> So, what do we do? Going back to using a mutex for strings would
>> be a *huge* performance hit on one of the most popular platforms
>> (if not the most popular one), but then again, keeping the status
>> quo will break binary compatibility on the (now) most popular
>> platform.
>>
>> Opinions?
> 
> Maybe it isn't as bad as you think. Have you done any measurements of
> the performance difference between the previous version and 4.2.0?

It is quite bad. Here are timings for 4 threads (running on
a 4 CPU dual core Xeon) copying the same global string object
5 million times:

                4.1.x        4.2.0
     real    0m38.464s    0m 8.660s
     user    0m26.685s    0m33.655s
     sys     1m54.129s    0m 0.001s

I was hoping that stdcxx 4.1.2 wouldn't build on x86_64 and
that the platform would be new for 4.1.3 (and could be thought
of as experimental, giving us a possible excuse to break
compatibility) but no such luck. It builds fine and has the
mutex in it, just like 4.1.3.

Martin
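
For readers who want to reproduce the flavor of the timings above, here is
a minimal sketch, not the program that produced them. It assumes a C++11
compiler and uses two hypothetical counter types (MutexCount, AtomicCount)
as stand-ins for the reference count of a shared string: every "copy"
bumps the count and every "destroy" drops it. The names and the
std::thread harness are illustrative assumptions and are not part of
stdcxx.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Reference count guarded by a mutex (roughly the 4.1.x approach).
struct MutexCount {
    std::mutex guard;
    long       value = 1;
    void inc () { std::lock_guard<std::mutex> lock (guard); ++value; }
    void dec () { std::lock_guard<std::mutex> lock (guard); --value; }
    long get () { std::lock_guard<std::mutex> lock (guard); return value; }
};

// Reference count updated with atomic operations (the 4.2.0 approach).
struct AtomicCount {
    std::atomic<long> value {1};
    void inc () { value.fetch_add (1, std::memory_order_relaxed); }
    void dec () { value.fetch_sub (1, std::memory_order_acq_rel); }
    long get () { return value.load (); }
};

// Runs nthreads threads that each "copy" and "destroy" the shared
// object iters times; returns the elapsed wall-clock milliseconds.
template <class Count>
double hammer (Count &count, int nthreads, long iters)
{
    const auto start = std::chrono::steady_clock::now ();
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back ([&count, iters] {
            for (long i = 0; i < iters; ++i) {
                count.inc ();   // copy constructor
                count.dec ();   // destructor
            }
        });
    for (auto &th : pool)
        th.join ();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now () - start;
    return elapsed.count ();
}

// Convenience wrapper: the count must end where it started.
template <class Count>
long final_count (int nthreads, long iters)
{
    Count count;
    hammer (count, nthreads, iters);
    return count.get ();
}
```

Calling hammer() on a MutexCount and on an AtomicCount with 4 threads and
5000000 iterations gives a rough analogue of the comparison; the absolute
numbers will depend on the hardware and the threading library.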

> 
> -- Mark


Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Mark Brown <ma...@gmail.com>.
On 10/17/07, Martin Sebor <se...@roguewave.com> wrote:
> Okay, I've got it:
>
>     http://issues.apache.org/jira/browse/STDCXX-162
>
> Damn that was hard!
>
> So, what do we do? Going back to using a mutex for strings would
> be a *huge* performance hit on one of the most popular platforms
> (if not the most popular one), but then again, keeping the status
> quo will break binary compatibility on the (now) most popular
> platform.
>
> Opinions?

Maybe it isn't as bad as you think. Have you done any measurements of
the performance difference between the previous version and 4.2.0?

-- Mark

Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Mark Brown <ma...@gmail.com>.
On 10/17/07, Travis Vitek <tv...@roguewave.com> wrote:
>
> So the problem is that the size of the __string_ref has changed, right?
> Specifically, this block of code causes the removal of the per-string
> mutex from __string_ref.
>
> +#ifndef _RWSTD_NO_ATOMIC_OPS
> +   // disable string mutex when atomic operations are available
> +#  ifndef _RWSTD_NO_STRING_MUTEX
> +#    define _RWSTD_NO_STRING_MUTEX
> +#  endif   // _RWSTD_NO_STRING_MUTEX
> +#endif   // _RWSTD_NO_ATOMIC_OPS
>
> An obvious option would be to just remove that block and let the user
> decide if they want to define _RWSTD_NO_STRING_MUTEX or not. If they
> define it then they must know that they are breaking binary
> compatibility with a library previously compiled without it. This
> doesn't help users compiling new source using the default configuration,
> but it does maintain compatibility.

This would be the safe way of handling it: maintain compatibility
while giving users a simple way to build a faster but incompatible
library. Given how hard it seems to be to debug problems caused by the
incompatibility, I think this is the most sensible approach.

>
> We could also ask that they define _RWSTD_NO_ATOMIC_OPS, but that may
> cause other binary incompatibilities elsewhere or it will at the very
> least cause performance problems in other places.

Making the fast (but incompatible) implementation the default would
certainly make new users happy but would expose existing users who are
upgrading from previous releases to the dire consequences of the
incompatibility. Which group is more important to us?

-- Mark

>
> It would be nice if we could just insert an appropriately sized pad
> buffer in place of the _C_mutex member. If the methods of __string_ref
> were not inlined I'm pretty sure that this would work. Unfortunately
> _C_inc_ref() and _C_dec_ref() are inlined, so their code may be compiled
> in the user executable, so I'm not convinced that this is a viable
> option.
>
> Travis

Re: [PATCH] Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Martin Sebor <se...@roguewave.com>.
Okay, unless I hear further objections in the next 10 minutes
or so this is going in.

Martin

Travis Vitek wrote:
>  
> 
> Martin Sebor wrote:
>> Travis Vitek wrote:
>>>  
>>> The only thing I don't like about this patch is that it 
>>> assumes no other platform will exhibit the same behavior
>>> that we are seeing on Linux/x86_64. I can't say with any
>>> confidence that things will actually work out that way.
>>> Personally I'd rather see the change compiled out by
>>> default on all platforms, and a macro to enable it if
>>> the user wants it.
>> I'm not sure what you mean by "see the change compiled out on all
>> platforms." Are you saying you want the atomic ops disabled on
>> all platforms?
> 
> No, I'm definitely not saying disable atomic ops on all platforms. That
> would be silly.
> 
>> The reason for this patch is because the atomic
>> ops hadn't been ported to x86_64 in time for 4.1.3 (bad mistake):
>>   http://issues.apache.org/jira/browse/STDCXX-162
>>
> 
> Makes sense now. Sorry for the confusion. It has been a long day.
> 
>>> Also, if you want to pick nits, the version check should probably use
>>> _RWSTD_VER_MAJOR.
>> Yes, that would be better. I'll change it.
>>
>> Martin
>>
>>> Travis


RE: [PATCH] Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Travis Vitek <tv...@roguewave.com>.
 

Martin Sebor wrote:
>
>Travis Vitek wrote:
>>  
>> The only thing I don't like about this patch is that it 
>> assumes no other platform will exhibit the same behavior
>> that we are seeing on Linux/x86_64. I can't say with any
>> confidence that things will actually work out that way.
>> Personally I'd rather see the change compiled out by
>> default on all platforms, and a macro to enable it if
>> the user wants it.
>
>I'm not sure what you mean by "see the change compiled out on all
>platforms." Are you saying you want the atomic ops disabled on
>all platforms?

No, I'm definitely not saying disable atomic ops on all platforms. That
would be silly.

> The reason for this patch is because the atomic
>ops hadn't been ported to x86_64 in time for 4.1.3 (bad mistake):
>   http://issues.apache.org/jira/browse/STDCXX-162
>

Makes sense now. Sorry for the confusion. It has been a long day.

>> 
>> Also, if you want to pick nits, the version check should probably use
>> _RWSTD_VER_MAJOR.
>
>Yes, that would be better. I'll change it.
>
>Martin
>
>> 
>> Travis

Re: [PATCH] Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Martin Sebor <se...@roguewave.com>.
Travis Vitek wrote:
>  
> The only thing I don't like about this patch is that it assumes no other
> platform will exhibit the same behavior that we are seeing on
> Linux/x86_64. I can't say with any confidence that things will actually
> work out that way. Personally I'd rather see the change compiled out by
> default on all platforms, and a macro to enable it if the user wants it.

I'm not sure what you mean by "see the change compiled out on all
platforms." Are you saying you want the atomic ops disabled on
all platforms? The reason for this patch is because the atomic
ops hadn't been ported to x86_64 in time for 4.1.3 (bad mistake):
   http://issues.apache.org/jira/browse/STDCXX-162

> 
> Also, if you want to pick nits, the version check should probably use
> _RWSTD_VER_MAJOR.

Yes, that would be better. I'll change it.

Martin

> 
> Travis
> 
>> -----Original Message-----
>> From: Martin Sebor [mailto:sebor@roguewave.com] 
>> Sent: Wednesday, October 17, 2007 5:39 PM
>> To: stdcxx-dev@incubator.apache.org
>> Subject: [PATCH] Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux
>>
>> How does the attached patch look?
>>
>> The patch adds two macros, _RWSTD_NO_STRING_ATOMIC_OPS and
>> _RWSTD_USE_STRING_ATOMIC_OPS. The first one is #defined for
>> all compilers on Linux/x86_64 (i.e., in wide mode), *unless*
>> the second one is defined, either on the command line or in
>> the generated config header by the user. This whole hackery
>> is guarded by _RWSTD_VER and automatically disabled (i.e.,
>> the library switches over to using atomic operations by
>> default) at version 5.
>>
>> Martin
>>


RE: [PATCH] Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Travis Vitek <tv...@roguewave.com>.
 
The only thing I don't like about this patch is that it assumes no other
platform will exhibit the same behavior that we are seeing on
Linux/x86_64. I can't say with any confidence that things will actually
work out that way. Personally I'd rather see the change compiled out by
default on all platforms, and a macro to enable it if the user wants it.

Also, if you want to pick nits, the version check should probably use
_RWSTD_VER_MAJOR.

Travis

>-----Original Message-----
>From: Martin Sebor [mailto:sebor@roguewave.com] 
>Sent: Wednesday, October 17, 2007 5:39 PM
>To: stdcxx-dev@incubator.apache.org
>Subject: [PATCH] Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux
>
>How does the attached patch look?
>
>The patch adds two macros, _RWSTD_NO_STRING_ATOMIC_OPS and
>_RWSTD_USE_STRING_ATOMIC_OPS. The first one is #defined for
>all compilers on Linux/x86_64 (i.e., in wide mode), *unless*
>the second one is defined, either on the command line or in
>the generated config header by the user. This whole hackery
>is guarded by _RWSTD_VER and automatically disabled (i.e.,
>the library switches over to using atomic operations by
>default) at version 5.
>
>Martin
>

[PATCH] Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Martin Sebor <se...@roguewave.com>.
How does the attached patch look?

The patch adds two macros, _RWSTD_NO_STRING_ATOMIC_OPS and
_RWSTD_USE_STRING_ATOMIC_OPS. The first one is #defined for
all compilers on Linux/x86_64 (i.e., in wide mode), *unless*
the second one is defined, either on the command line or in
the generated config header by the user. This whole hackery
is guarded by _RWSTD_VER and automatically disabled (i.e.,
the library switches over to using atomic operations by
default) at version 5.

Martin
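
The attachment is not reproduced in this archive, so the following is only
a sketch of how a guard of the kind described above might be written. The
names _RWSTD_NO_STRING_ATOMIC_OPS, _RWSTD_USE_STRING_ATOMIC_OPS and
_RWSTD_VER_MAJOR come from this thread; the fallback definition of
_RWSTD_VER_MAJOR, the use of the compiler's __linux__ and __x86_64__
macros for platform detection, and the two helper functions are
assumptions made for illustration.

```cpp
// Assumed for illustration only: the library's own headers would
// normally define the major version number.
#ifndef _RWSTD_VER_MAJOR
#  define _RWSTD_VER_MAJOR 4
#endif

// Before version 5, keep the 4.1.x string layout on Linux/x86_64
// unless the user explicitly opts into the atomic implementation.
#if _RWSTD_VER_MAJOR < 5 && defined (__linux__) && defined (__x86_64__)
#  ifndef _RWSTD_USE_STRING_ATOMIC_OPS
#    define _RWSTD_NO_STRING_ATOMIC_OPS
#  endif   // _RWSTD_USE_STRING_ATOMIC_OPS
#endif   // version < 5 on Linux/x86_64

// Hypothetical helper: true when compiled for Linux/x86_64.
inline bool is_linux_x86_64 ()
{
#if defined (__linux__) && defined (__x86_64__)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: true when the guard above selected the
// mutex-based (binary compatible) string implementation.
inline bool string_atomic_ops_disabled ()
{
#ifdef _RWSTD_NO_STRING_ATOMIC_OPS
    return true;
#else
    return false;
#endif
}
```

With this arrangement a user who wants the faster but incompatible layout
would define _RWSTD_USE_STRING_ATOMIC_OPS (for example with -D on the
command line), and the whole block becomes a no-op once the major version
reaches 5.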

Martin Sebor wrote:
> Travis Vitek wrote:
>> So the problem is that the size of the __string_ref has changed, right?
> 
> Yes. More precisely, that is one part of the problem (the other
> part is the pthread functions that are inlined in user code and
> that expect to get a mutex and not random garbage).
> 
>> Specifically, this block of code causes the removal of the per-string
>> mutex from __string_ref.
>>
>> +#ifndef _RWSTD_NO_ATOMIC_OPS
>> +   // disable string mutex when atomic operations are available
>> +#  ifndef _RWSTD_NO_STRING_MUTEX
>> +#    define _RWSTD_NO_STRING_MUTEX
>> +#  endif   // _RWSTD_NO_STRING_MUTEX
>> +#endif   // _RWSTD_NO_ATOMIC_OPS
>>
>> An obvious option would be to just remove that block and let the user
>> decide if they want to define _RWSTD_NO_STRING_MUTEX or not. If they
>> define it then they must know that they are breaking binary
>> compatibility with a library previously compiled without it. This
>> doesn't help users compiling new source using the default configuration,
>> but it does maintain compatibility.
> 
> I'm also leaning in this general direction. Keep the compatible
> behavior and let users opt into the breaking fix. I'll have
> to think some more about how to control the behavior. I was
> hoping to use an existing config macro without changing any
> library code (except _config-gcc.h).
> 
>>
>> We could also ask that they define _RWSTD_NO_ATOMIC_OPS, but that may
>> cause other binary incompatibilities elsewhere or it will at the very
>> least cause performance problems in other places.
> 
> It is very tempting to go with the improved implementation by
> default, but from a compatibility standpoint it would be the
> wrong thing to do.
> 
>>
>> It would be nice if we could just insert an appropriately sized pad
>> buffer in place of the _C_mutex member. If the methods of __string_ref
>> were not inlined I'm pretty sure that this would work. Unfortunately
>> _C_inc_ref() and _C_dec_ref() are inlined, so their code may be compiled
>> in the user executable, so I'm not convinced that this is a viable
>> option.
> 
> All the __string_ref members are trivial inline wrappers that
> are fully expected to be inlined. It's possible that some of
> them will not be inlined under some conditions but I expect
> those to be exceedingly rare.
> 
> Martin
> 


Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Martin Sebor <se...@roguewave.com>.
Travis Vitek wrote:
> So the problem is that the size of the __string_ref has changed, right?

Yes. More precisely, that is one part of the problem (the other
part is the pthread functions that are inlined in user code and
that expect to get a mutex and not random garbage).

> Specifically, this block of code causes the removal of the per-string
> mutex from __string_ref.
> 
> +#ifndef _RWSTD_NO_ATOMIC_OPS
> +   // disable string mutex when atomic operations are available
> +#  ifndef _RWSTD_NO_STRING_MUTEX
> +#    define _RWSTD_NO_STRING_MUTEX
> +#  endif   // _RWSTD_NO_STRING_MUTEX
> +#endif   // _RWSTD_NO_ATOMIC_OPS
> 
> An obvious option would be to just remove that block and let the user
> decide if they want to define _RWSTD_NO_STRING_MUTEX or not. If they
> define it then they must know that they are breaking binary
> compatibility with a library previously compiled without it. This
> doesn't help users compiling new source using the default configuration,
> but it does maintain compatibility.

I'm also leaning in this general direction. Keep the compatible
behavior and let users opt into the breaking fix. I'll have
to think some more about how to control the behavior. I was
hoping to use an existing config macro without changing any
library code (except _config-gcc.h).

> 
> We could also ask that they define _RWSTD_NO_ATOMIC_OPS, but that may
> cause other binary incompatibilities elsewhere or it will at the very
> least cause performance problems in other places.

It is very tempting to go with the improved implementation by
default, but from a compatibility standpoint it would be the
wrong thing to do.

> 
> It would be nice if we could just insert an appropriately sized pad
> buffer in place of the _C_mutex member. If the methods of __string_ref
> were not inlined I'm pretty sure that this would work. Unfortunately
> _C_inc_ref() and _C_dec_ref() are inlined, so their code may be compiled
> in the user executable, so I'm not convinced that this is a viable
> option.

All the __string_ref members are trivial inline wrappers that
are fully expected to be inlined. It's possible that some of
them will not be inlined under some conditions but I expect
those to be exceedingly rare.

Martin

> 
> Travis


RE: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Travis Vitek <tv...@roguewave.com>.
So the problem is that the size of the __string_ref has changed, right?
Specifically, this block of code causes the removal of the per-string
mutex from __string_ref.

+#ifndef _RWSTD_NO_ATOMIC_OPS
+   // disable string mutex when atomic operations are available
+#  ifndef _RWSTD_NO_STRING_MUTEX
+#    define _RWSTD_NO_STRING_MUTEX
+#  endif   // _RWSTD_NO_STRING_MUTEX
+#endif   // _RWSTD_NO_ATOMIC_OPS

An obvious option would be to just remove that block and let the user
decide if they want to define _RWSTD_NO_STRING_MUTEX or not. If they
define it then they must know that they are breaking binary
compatibility with a library previously compiled without it. This
doesn't help users compiling new source using the default configuration,
but it does maintain compatibility.

We could also ask that they define _RWSTD_NO_ATOMIC_OPS, but that may
cause other binary incompatibilities elsewhere or it will at the very
least cause performance problems in other places.

It would be nice if we could just insert an appropriately sized pad
buffer in place of the _C_mutex member. If the methods of __string_ref
were not inlined I'm pretty sure that this would work. Unfortunately
_C_inc_ref() and _C_dec_ref() are inlined, so their code may be compiled
in the user executable, so I'm not convinced that this is a viable
option.

Travis
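
To make the size argument concrete, here is a small sketch of one
plausible reading of the failure. RefWithMutex and RefWithoutMutex are
hypothetical stand-ins, not the real __string_ref definitions, and the
member names are invented. The idea: the library allocates a header of
the new (smaller) size followed by the characters, while inlined code
built against the old headers steps back from the characters by the old
(larger) size and hands that address to free().

```cpp
#include <cstddef>
#include <cstdint>
#include <pthread.h>

// Hypothetical 4.1.x-style header: per-string mutex present.
struct RefWithMutex {
    pthread_mutex_t mutex;
    long            ref_count;
    std::size_t     capacity;
    std::size_t     size;
};

// Hypothetical 4.2.0-style header: mutex compiled out.
struct RefWithoutMutex {
    long            ref_count;
    std::size_t     capacity;
    std::size_t     size;
};

// Code compiled against a given layout finds the start of the block
// by stepping back from the character data by sizeof (Header).
template <class Header>
std::uintptr_t header_address (std::uintptr_t data)
{
    return data - sizeof (Header);
}

// Distance between the block that was really allocated (new layout)
// and the address old-layout inlined code would try to free.
inline std::ptrdiff_t bogus_free_offset ()
{
    const std::uintptr_t block = 0x502010;   // printed by operator new
    const std::uintptr_t data  = block + sizeof (RefWithoutMutex);
    const std::uintptr_t bogus = header_address<RefWithMutex> (data);
    return static_cast<std::ptrdiff_t> (block)
         - static_cast<std::ptrdiff_t> (bogus);
}
```

The offset works out to sizeof (RefWithMutex) - sizeof (RefWithoutMutex),
i.e., the size of the mutex. On x86_64 Linux with glibc, where
pthread_mutex_t occupies 40 bytes, that is 0x28, which is the gap
between the 0x502010 returned by operator new and the 0x501fe8 passed to
operator delete in the test output.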



Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Martin Sebor <se...@roguewave.com>.
Okay, I've got it:

    http://issues.apache.org/jira/browse/STDCXX-162

Damn that was hard!

So, what do we do? Going back to using a mutex for strings would
be a *huge* performance hit on one of the most popular platforms
(if not the most popular one), but then again, keeping the status
quo will break binary compatibility on the (now) most popular
platform.

Opinions?

Martin



Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Martin Sebor <se...@roguewave.com>.
Martin Sebor wrote:
> Martin Sebor wrote:
>> In a 12D build with the default gcc 4.1.0 on SuSE Linux Enterprise
>> Server 10 (x86_64), the following simple program abends with the
>> error below after upgrading the 4.1.3 library to 4.2.0:
> 
> I've enhanced the program to replace operators new and delete
> and to print the value of the pointer. The enhanced test case
> and the output obtained from a 12D build with gcc 3.4.6 on Red
> Hat Enterprise Linux AS release 4 (Nahant Update 4) is below.
> Interestingly, the 12d (32-bit) output with Sun C++ on Solaris
> is fine.

I think I might finally be getting somewhere with this. The 32
bit library seems to work fine on Linux, too (the test case and
all examples run to completion). On Solaris, both 12d (32-bit)
and 12D (64-bit) are good. So it looks like the problem is
isolated to 64-bit Linux (of course, we haven't checked AIX
or HP-UX).

> $ cat t.cpp && LD_LIBRARY_PATH=../lib ./t
> #include <cstdio>
> #include <cstdlib>
> #include <new>
> #include <string>
> 
> int main ()
> {
>     std::string s = "a";
> }
> 
> void* operator new (std::size_t n) throw (std::bad_alloc)
> {
>     void* const p = std::malloc (n);
>     std::fprintf (stdout, "operator new (%zu) ==> %#p\n", n, p);
>     return p;
> }
> 
> void operator delete (void *p) throw ()
> {
>     std::fprintf (stdout, "operator delete (%#p)\n", p);
>     std::free (p);
> }
> 
> void* operator new[] (std::size_t n) throw (std::bad_alloc)
> {
>     void* const p = std::malloc (n);
>     std::fprintf (stdout, "operator new[] (%zu) ==> %#p\n", n, p);
>     return p;
> }
> 
> void operator delete[] (void *p) throw ()
> {
>     std::fprintf (stdout, "operator delete[] (%#p)\n", p);
>     std::free (p);
> }
> 
> operator new (58) ==> 0x502010
> operator delete (0x501fe8)
> *** glibc detected *** free(): invalid pointer: 0x0000000000501fe8 ***
> Aborted
> 
> 


Re: stdcxx 4.2.0/4.1.3 binary incompatibility on Linux

Posted by Martin Sebor <se...@roguewave.com>.
Martin Sebor wrote:
> In a 12D build with the default gcc 4.1.0 on SuSE Linux Enterprise
> Server 10 (x86_64), the following simple program abends with the
> error below after upgrading the 4.1.3 library to 4.2.0:

I've enhanced the program to replace operators new and delete
and to print the value of the pointer. The enhanced test case
and the output obtained from a 12D build with gcc 3.4.6 on Red
Hat Enterprise Linux AS release 4 (Nahant Update 4) is below.
Interestingly, the 12d (32-bit) output with Sun C++ on Solaris
is fine.

$ cat t.cpp && LD_LIBRARY_PATH=../lib ./t
#include <cstdio>
#include <cstdlib>
#include <new>
#include <string>

int main ()
{
     std::string s = "a";
}

void* operator new (std::size_t n) throw (std::bad_alloc)
{
     void* const p = std::malloc (n);
     std::fprintf (stdout, "operator new (%zu) ==> %#p\n", n, p);
     return p;
}

void operator delete (void *p) throw ()
{
     std::fprintf (stdout, "operator delete (%#p)\n", p);
     std::free (p);
}

void* operator new[] (std::size_t n) throw (std::bad_alloc)
{
     void* const p = std::malloc (n);
     std::fprintf (stdout, "operator new[] (%zu) ==> %#p\n", n, p);
     return p;
}

void operator delete[] (void *p) throw ()
{
     std::fprintf (stdout, "operator delete[] (%#p)\n", p);
     std::free (p);
}

operator new (58) ==> 0x502010
operator delete (0x501fe8)
*** glibc detected *** free(): invalid pointer: 0x0000000000501fe8 ***
Aborted

