
Posted to dev@stdcxx.apache.org by Travis Vitek <vi...@roguewave.com> on 2007/11/02 17:39:32 UTC

19.exceptions.mt.cpp fails on AIX

I'm working on fixing an issue with test 19.exceptions.mt.cpp on AIX. The
test spins in a loop because the loop counter is being trashed when an
exception is copied onto the stack. Here is a simple testcase:

#include <exception>

void test_single_exception ()
{
    for (int i = 0; i < 5; ++i) {

        try {
            throw std::exception ();
        }
        catch (std::exception ex) {   // catch by value: copies the object onto the stack
            &ex;                      // no-op use of ex
        }

    }
}

int main ()
{
    test_single_exception ();
    return 0;
}

So the issue is actually not that complicated. The exception type provided
by STDCXX is 4 bytes in size, and the native one is 8. In and of itself,
that shouldn't really be a problem because the two definitions should never
coexist, right? The problem really shows up when the config tests run and
set the following macros:

#define _RWSTD_NO_EXCEPTION_ASSIGNMENT
// #define _RWSTD_NO_EXCEPTION_COPY_CTOR
#define _RWSTD_NO_EXCEPTION_DEFAULT_CTOR
#define _RWSTD_NO_EXCEPTION_DTOR
#define _RWSTD_NO_EXCEPTION_WHAT

If I'm reading the code correctly, this means that STDCXX will provide
definitions for the default ctor, copy-assignment operator, dtor and what().
The definition of the copy-ctor will come from somewhere else [where?].
Anyway, an exception is created and copied out for the unwind. The
exception is then copied back onto the stack at the location where the
exception is handled. The code that actually copies the exception expects the object
to be 8 bytes in size, but the code that created the exception only
allocates 4 bytes for it.

So here is my problem with all of this. How is this safe? If one of the
'special' functions provided by the system has some side effect, and we use
that implementation, then how can we safely define any of the other
'special' functions?

That said, what is the appropriate solution? Should we just pad the type out
to the correct size, or should we provide our own definition of the copy
ctor, possibly looking at the compiler test to verify it is not wrong. Both
seem to work quite well, but I'm afraid I don't understand why we opt to use
the definitions of the 'special' functions that are provided. I guess I
would understand if I had got a linker error complaining of multiply defined
symbols, but I don't.

Travis

--
View this message in context: http://www.nabble.com/19.exceptions.mt.cpp-fails-on-AIX-tf4738595.html#a13551223
Sent from the stdcxx-dev mailing list archive at Nabble.com.

Re: 19.exceptions.mt.cpp fails on AIX

Posted by Martin Sebor <se...@roguewave.com>.

Travis Vitek wrote:
> 
> 
> Farid Zaripov-2 wrote:
>>> From: Travis Vitek [mailto:vitek@roguewave.com] 
>>>
>>> The definition of the copy-ctor will come from somewhere else 
>>> [where?].
>>   The definition of the copy-ctor should be present in libc.
>>
> 
> The C library has definitions for C++ types that are supposed to be part of
> the C++ Standard Library? That just seems wrong. You must mean libC.a on AIX
> [or more generally the C++ library]. In that case I don't think any symbols
> should _ever_ come from there. We shouldn't be linking the C++ library at
> all if we can avoid it.
> 
> None of this really explains why we don't always define all of the members
> of std::exception ourselves. I guess if we absolutely cannot avoid getting
> libC.a linked then we would see link errors complaining of multiply defined
> symbols, but I don't see them if I modify the config header and rebuild.

libC.a contains other important runtime symbols that C++ programs
can't do without (e.g., exception handling, runtime type id info,
dynamic memory management, etc.) We use the xlCcore command to avoid
linking with the native C++ Standard Library. IBM implemented xlCcore
partly in response to our complaints about problems due to mixing the
native and our implementation of the C++ Standard Library in the same
program. You can view the history of the whole issue here (on the
inside of the Rogue Wave firewall):
   http://bugzilla.cvo.roguewave.com/show_bug.cgi?id=2352

From the issue you can see that we asked IBM to split up xlC.a into
two libraries like most other compiler vendors do: the C++ runtime
library and the rest of the C++ Standard Library. They decided to
do it their way and keep all symbols in the same library. I'm not
comfortable with the solution because it feels like magic to me and
makes me wonder if it even works, but unless we can point to actual
problems that it causes there's little we can do about it.

Martin


RE: 19.exceptions.mt.cpp fails on AIX

Posted by Travis Vitek <vi...@roguewave.com>.

Farid Zaripov-2 wrote:
> 
>> From: Travis Vitek [mailto:vitek@roguewave.com] 
>>
>> The definition of the copy-ctor will come from somewhere else 
>> [where?].
> 
>   The definition of the copy-ctor should be present in libc.
> 

The C library has definitions for C++ types that are supposed to be part of
the C++ Standard Library? That just seems wrong. You must mean libC.a on AIX
[or more generally the C++ library]. In that case I don't think any symbols
should _ever_ come from there. We shouldn't be linking the C++ library at
all if we can avoid it.

None of this really explains why we don't always define all of the members
of std::exception ourselves. I guess if we absolutely cannot avoid getting
libC.a linked then we would see link errors complaining of multiply defined
symbols, but I don't see them if I modify the config header and rebuild.

Farid Zaripov-2 wrote:
> 
>> From: Travis Vitek [mailto:vitek@roguewave.com] 
>>
>> Anyways, an exception is created and copied out for the unwind. The
>> exception is then copied back onto the stack at the location 
>> the exception
>> is handled. The code that actually copies the exception 
>> expects the object
>> to be 8 bytes in size, but the code that created the exception only
>> allocates 4 bytes for it.
> 
>   The definitions of the std::exception class in STDCXX and AIX system
> headers should be the same. If not, the include/exception header file
> should have a corresponding #ifdef _RWSTD_OS_AIX / #endif block.
> 

Technically that will break binary compatibility, right? I realize the
STDCXX 'special' functions are all implemented as no-ops, but there is
probably some contrived user-code that would break because of this change.

If this change is indeed safe [I'm having trouble coming up with a testcase
that will break because of it], then why don't we create a compiler test to
determine if the exception class has any pad so that we can pad out the
declaration of std::exception. That way we would be able to conveniently
avoid this problem in the future.

Farid Zaripov-2 wrote:
> 
>> From: Travis Vitek [mailto:vitek@roguewave.com] 
>>
>> So here is my problem with all of this. How is this safe? If
>> one of the 'special' functions provided by the system has
>> some side effect, and we use that implementation, then how
>> can we safely define any of the other 'special' functions?
>> 
>> That said, what is the appropriate solution? Should we just 
>> pad the type out to the correct size
> 
>   Yes, since stdcxx uses the std::exception class only as a base class
> for the other exception classes (the members of std::exception are
> not used).
> 
> Farid.
> 

I think the members of std::exception provided by stdcxx are actually being
used when they are provided, but their definitions are 'dumb'.

Travis


RE: 19.exceptions.mt.cpp fails on AIX

Posted by Farid Zaripov <Fa...@epam.com>.

> -----Original Message-----
> From: Travis Vitek [mailto:vitek@roguewave.com] 
> Sent: Friday, November 02, 2007 6:40 PM
> To: stdcxx-dev@incubator.apache.org
> Subject: 19.exceptions.mt.cpp fails on AIX
> 
> If I'm reading the code correctly, this means that STDCXX will provide
> definitions for the default ctor, copy-assignment operator, 
> dtor and what().
> The definition of the copy-ctor will come from somewhere else 
> [where?].

  The definition of the copy-ctor should be present in libc.

> Anyways, an exception is created and copied out for the unwind. The
> exception is then copied back onto the stack at the location 
> the exception
> is handled. The code that actually copies the exception 
> expects the object
> to be 8 bytes in size, but the code that created the exception only
> allocates 4 bytes for it.

  The definitions of the std::exception class in STDCXX and AIX system
headers should be the same. If not, the include/exception header file
should have a corresponding #ifdef _RWSTD_OS_AIX / #endif block.

> So here is my problem with all of this. How is this safe? If
> one of the 'special' functions provided by the system has some side
> effect, and we use that implementation, then how can we safely define
> any of the other 'special' functions?
> 
> That said, what is the appropriate solution? Should we just
> pad the type out to the correct size

  Yes, since stdcxx uses the std::exception class only as a base class
for the other exception classes (the members of std::exception are
not used).

Farid.

Re: 19.exceptions.mt.cpp fails on AIX

Posted by Travis Vitek <vi...@roguewave.com>.



Martin Sebor wrote:
> 
> I was going to go on and say that the incompatibility caused
> by the fix won't be detectable because it simply fixes an
> already existing incompatibility (with the runtime) when I
> realized that there are many more (user-defined) exception
> classes that are unrelated to the runtime exceptions. So by
> fixing the incompatibility with the runtime we will be
> breaking compatibility with user-defined classes...
> 

I went ahead and filed http://issues.apache.org/jira/browse/STDCXX-643 and
-644 for the issue and the affected test.




Re: 19.exceptions.mt.cpp fails on AIX

Posted by Martin Sebor <se...@roguewave.com>.

Travis Vitek wrote:
> 
> 
> Martin Sebor wrote:
>>> That said, what is the appropriate solution? Should we just pad the type
>>> out
>>> to the correct size,
>> Yes. As Farid says, the XLC exception and ours must have the same
>> size.
>>
> 
> Is this a binary compatible change? I've always believed changing the size
> of a type breaks binary compatibility, but if the special functions are
> no-ops, I'm thinking it might be safe.

No, it's not binary compatible, but neither is the status quo:
stdcxx 4.2.0 (and most likely all previous versions of stdcxx on
AIX) is binary incompatible with the AIX runtime. I suspect the
only reason no one has noticed it yet is that runtime exceptions
are relatively rare, and perhaps also that the context where the
stack corruption occurs is either benign, recoverable, or gets
chalked up to cosmic rays.

I was going to go on and say that the incompatibility caused
by the fix won't be detectable because it simply fixes an
already existing incompatibility (with the runtime) when I
realized that there are many more (user-defined) exception
classes that are unrelated to the runtime exceptions. So by
fixing the incompatibility with the runtime we will be
breaking compatibility with user-defined classes...

> 
> Martin Sebor wrote:
>>> I guess I would understand if I had got a linker error complaining
>>> of multiply defined symbols, but I don't.
>> This (the interface between the language runtime and the library)
>> is the most difficult area of the library for independent library
>> authors like us to get right. We can't help but make assumptions
>> about the runtime. Some are based on our inspection of the native
>> library headers (which may change in subtle but sometimes
>> important ways from one release to another), others we try to
>> automate (the detection of the special member functions of the
>> exception classes). Both approaches are fraught with peril.
>>
> 
> Okay. So the appropriate solution is to modify the exception header to have
> the correct size. As I mentioned, I think I can easily write a config test
> to determine the necessary pad and apply it. Is that a better approach?

You mean as opposed to conditionally hardcoding it for XLC?
I'm not sure it's worth the trouble, although the information
would be useful in the test suite to verify that our size is
the same as the runtime classes.

> 
> At the very least I'd like to determine the size of the runtime library
> version of std::exception and then add a test that will check the size
> matches up with the size of the type we are providing. That way we have a
> way to be notified of the problem in the future instead of having to find it
> the hard way.

That's a good idea. In fact, I think there is a need for a class
of tests (configuration and/or otherwise) that are *required* to
pass in order to put out a release. Otherwise, unless we implement
a no-test-failures policy, we will risk putting out badly broken
releases.

Martin

Re: 19.exceptions.mt.cpp fails on AIX

Posted by Travis Vitek <vi...@roguewave.com>.

Martin Sebor wrote:
> 
>> That said, what is the appropriate solution? Should we just pad the type
>> out
>> to the correct size,
> 
> Yes. As Farid says, the XLC exception and ours must have the same
> size.
> 

Is this a binary compatible change? I've always believed changing the size
of a type breaks binary compatibility, but if the special functions are
no-ops, I'm thinking it might be safe.

Martin Sebor wrote:
> 
>> I guess I would understand if I had got a linker error complaining
>> of multiply defined symbols, but I don't.
> 
> This (the interface between the language runtime and the library)
> is the most difficult area of the library for independent library
> authors like us to get right. We can't help but make assumptions
> about the runtime. Some are based on our inspection of the native
> library headers (which may change in subtle but sometimes
> important ways from one release to another), others we try to
> automate (the detection of the special member functions of the
> exception classes). Both approaches are fraught with peril.
> 

Okay. So the appropriate solution is to modify the exception header to have
the correct size. As I mentioned, I think I can easily write a config test
to determine the necessary pad and apply it. Is that a better approach?

At the very least I'd like to determine the size of the runtime library
version of std::exception and then add a test that will check the size
matches up with the size of the type we are providing. That way we have a
way to be notified of the problem in the future instead of having to find it
the hard way.

Travis

Re: 19.exceptions.mt.cpp fails on AIX

Posted by Martin Sebor <se...@roguewave.com>.

Travis Vitek wrote:
> 
> I'm working on fixing an issue with test 19.exceptions.mt.cpp on AIX. The
> issue is that the test spins in a loop because the loop counter is being
> trashed when an exception is copied onto the stack. Here is a simple
> testcase.
> 
> #include <exception> 
> 
> void test_single_exception ()
> {
>     for (int i = 0; i < 5; ++i) {
> 
>         try {
>             throw std::exception ();
>         }
>         catch (std::exception ex) {
>             &ex;
>         }
> 
>     }
> }
> 
> int main ()
> {
>     test_single_exception ();
>     return 0;
> }
> 
> So the issue is actually not that complicated. The exception type provided
> by STDCXX is 4 bytes in size, and the native one is 8.

Yikes! That's very bad!

> In and of itself,
> that shouldn't really be a problem because the two definitions should never
> coexist, right?

Unfortunately, they do. All the exceptions thrown by the runtime
(bad_alloc, bad_cast, bad_exception, and bad_typeid) derive from
std::exception. So the runtime has one view of the exception
objects it throws but a program that uses stdcxx sees them
differently.

> So the problem really shows up when the config tests run,
> they set the following macros...
> 
> #define _RWSTD_NO_EXCEPTION_ASSIGNMENT
> // #define _RWSTD_NO_EXCEPTION_COPY_CTOR
> #define _RWSTD_NO_EXCEPTION_DEFAULT_CTOR
> #define _RWSTD_NO_EXCEPTION_DTOR
> #define _RWSTD_NO_EXCEPTION_WHAT
> 
> If I'm reading the code correctly, this means that STDCXX will provide
> definitions for the default ctor, copy-assignment operator, dtor and what().

Right.

> The definition of the copy-ctor will come from somewhere else [where?].

From the xlC runtime library, libC.a.

> Anyways, an exception is created and copied out for the unwind. The
> exception is then copied back onto the stack at the location the exception
> is handled. The code that actually copies the exception expects the object
> to be 8 bytes in size, but the code that created the exception only
> allocates 4 bytes for it.
> 
> So here is my problem with all of this. How is this safe?

It's not. It's a serious bug in our library. Somehow we're missing
a data member in std::exception that the xlC runtime library defines.

> If one of the
> 'special' functions provided by the system has some side effect, and we use
> that implementation, then how can we safely define any of the other
> 'special' functions?

The assumption/hope is that the functions are straightforward and
have no side-effects.

> 
> That said, what is the appropriate solution? Should we just pad the type out
> to the correct size,

Yes. As Farid says, the XLC exception and ours must have the same
size.

> or should we provide our own definition of the copy
> ctor, possibly looking at the compiler test to verify it is not wrong. Both
> seem to work quite well, but I'm afraid I don't understand why we opt to use
> the definitions of the 'special' functions that are provided. I guess I
> would understand if I had got a linker error complaining of multiply defined
> symbols, but I don't.

This (the interface between the language runtime and the library)
is the most difficult area of the library for independent library
authors like us to get right. We can't help but make assumptions
about the runtime. Some are based on our inspection of the native
library headers (which may change in subtle but sometimes
important ways from one release to another), others we try to
automate (the detection of the special member functions of the
exception classes). Both approaches are fraught with peril.

Martin