You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by Alan Conway <ac...@redhat.com> on 2016/06/16 21:05:12 UTC
Review Request 48816: QPID-7306 Fix memory bugs and race conditions
found in a Qpid HA deployment
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48816/
-----------------------------------------------------------
Review request for qpid, Gordon Sim and Ted Ross.
Bugs: QPID-7306
https://issues.apache.org/jira/browse/QPID-7306
Repository: qpid
Description
-------
Also available as separate commits on https://github.com/alanconway/qpid/commits/mc-rebase
QPID-7306: Memory management error in Link/Bridge
qpid::broker Link and Bridge use Connection::requestIOProcessing() to register
callbacks in the connection thread. They were binding a plain "this" pointer to
the callback, but the classes are managed by boost::shared_ptr so if all the
shared_ptr were released, the callback could happen on a dangling pointer.
This fix uses boost::weak_ptr in the callbacks, so if all shared_ptr instances
are released, we don't use the dead pointer.
Link::destroy cannot be skipped, so use a shared_ptr for that.
QPID-7306: Conditional compile mismatch in broker and common libs.
Removed _IN_QPID_BROKER compile definition:
Inconsistently set for libqpidcommon (compiled .cpp files) and libqpidbroker (uses .h) files
- The broker has a binary incompatible notion of what is in the library.
It sort-of works by accident:
- shared_ptr<T> only contains a T*, the mismatch is effectively doing reinterpret_cast<T*>
- plain T* works for op->, but doesn't guarantee no concurrent deletes.
- we create and destroy shared_ptr in libraries with _IN_QPID_BROKER set so
we get cleanup, and no cores if we're lucky but there is limited protection from races.
Was only used by management:
- I think exposing non-shared ptr GetObject was a feature that never materialized,
- if not we need a separate function or class for non-shared-ptr users.
QPID-7306: Fix race conditions during Queue destruction.
Stack traces indicate a Queue was being destroyed concurrently while still in
use by its ManagedObject.
ManagedObject holds a plain pointer to the Manageable object (e.g. Queue) it
belongs to. The Manageable calls ManagedObject::resourceDestroy() when it is
deleted, but without any locking.
Added a locked wrapper class ManageablePtr so destroy is atomic with respect to
other calls via ManageablePtr, calls after pointer is reset to 0 in destroy()
are skipped.
Call resourceDestroy() in Queue::~Queue if it was not called already. This is
probably redundant given given the fixes above but can't hurt.
Queue::destroyed() was also being called without locking and could be called
concurrrently, e.g. if auto-delete happens concurrently with delete via QMF or
by a 0-10 client. Moved the destroyed() call into QueueRegistry::destroy(),
using QueueRegistry lock to guarantee it is called exactly once.
Diffs
-----
trunk/qpid/cpp/managementgen/qmfgen/schema.py 1748523
trunk/qpid/cpp/src/CMakeLists.txt 1748523
trunk/qpid/cpp/src/amqp.cmake 1748523
trunk/qpid/cpp/src/legacystore.cmake 1748523
trunk/qpid/cpp/src/linearstore.cmake 1748523
trunk/qpid/cpp/src/qpid/broker/Bridge.cpp 1748523
trunk/qpid/cpp/src/qpid/broker/Broker.cpp 1748523
trunk/qpid/cpp/src/qpid/broker/Link.h 1748523
trunk/qpid/cpp/src/qpid/broker/Link.cpp 1748523
trunk/qpid/cpp/src/qpid/broker/Queue.h 1748523
trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1748523
trunk/qpid/cpp/src/qpid/broker/QueueRegistry.cpp 1748523
trunk/qpid/cpp/src/qpid/broker/SelfDestructQueue.cpp 1748523
trunk/qpid/cpp/src/qpid/broker/amqp_0_10/Connection.h 1748523
trunk/qpid/cpp/src/qpid/management/Manageable.h 1748523
trunk/qpid/cpp/src/qpid/management/ManagementObject.h 1748523
trunk/qpid/cpp/src/qpid/management/ManagementObject.cpp 1748523
trunk/qpid/cpp/src/qpid/store/CMakeLists.txt 1748523
trunk/qpid/cpp/src/rdma.cmake 1748523
trunk/qpid/cpp/src/tests/CMakeLists.txt 1748523
Diff: https://reviews.apache.org/r/48816/diff/
Testing
-------
Thanks,
Alan Conway
Re: Review Request 48816: QPID-7306 Fix memory bugs and race
conditions found in a Qpid HA deployment
Posted by Gordon Sim <gs...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48816/#review138195
-----------------------------------------------------------
Ship it!
Ship It!
- Gordon Sim
On June 16, 2016, 9:05 p.m., Alan Conway wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48816/
> -----------------------------------------------------------
>
> (Updated June 16, 2016, 9:05 p.m.)
>
>
> Review request for qpid, Gordon Sim and Ted Ross.
>
>
> Bugs: QPID-7306
> https://issues.apache.org/jira/browse/QPID-7306
>
>
> Repository: qpid
>
>
> Description
> -------
>
> Also available as separate commits on https://github.com/alanconway/qpid/commits/mc-rebase
>
> QPID-7306: Memory management error in Link/Bridge
>
> qpid::broker Link and Bridge use Connection::requestIOProcessing() to register
> callbacks in the connection thread. They were binding a plain "this" pointer to
> the callback, but the classes are managed by boost::shared_ptr so if all the
> shared_ptr were released, the callback could happen on a dangling pointer.
>
> This fix uses boost::weak_ptr in the callbacks, so if all shared_ptr instances
> are released, we don't use the dead pointer.
>
> Link::destroy cannot be skipped, so use a shared_ptr for that.
>
> QPID-7306: Conditional compile mismatch in broker and common libs.
>
> Removed _IN_QPID_BROKER compile definition:
>
> Inconsistently set for libqpidcommon (compiled .cpp files) and libqpidbroker (uses .h) files
> - The broker has a binary incompatible notion of what is in the library.
>
> It sort-of works by accident:
> - shared_ptr<T> only contains a T*, the mismatch is effectively doing reinterpret_cast<T*>
> - plain T* works for op->, but doesn't guarantee no concurrent deletes.
> - we create and destroy shared_ptr in libraries with _IN_QPID_BROKER set so
> we get cleanup, and no cores if we're lucky but there is limited protection from races.
>
> Was only used by management:
> - I think exposing non-shared ptr GetObject was a feature that never materialized,
> - if not we need a separate function or class for non-shared-ptr users.
>
> QPID-7306: Fix race conditions during Queue destruction.
>
> Stack traces indicate a Queue was being destroyed concurrently while still in
> use by its ManagedObject.
>
> ManagedObject holds a plain pointer to the Manageable object (e.g. Queue) it
> belongs to. The Manageable calls ManagedObject::resourceDestroy() when it is
> deleted, but without any locking.
>
> Added a locked wrapper class ManageablePtr so destroy is atomic with respect to
> other calls via ManageablePtr, calls after pointer is reset to 0 in destroy()
> are skipped.
>
> Call resourceDestroy() in Queue::~Queue if it was not called already. This is
> probably redundant given given the fixes above but can't hurt.
>
> Queue::destroyed() was also being called without locking and could be called
> concurrrently, e.g. if auto-delete happens concurrently with delete via QMF or
> by a 0-10 client. Moved the destroyed() call into QueueRegistry::destroy(),
> using QueueRegistry lock to guarantee it is called exactly once.
>
>
> Diffs
> -----
>
> trunk/qpid/cpp/managementgen/qmfgen/schema.py 1748523
> trunk/qpid/cpp/src/CMakeLists.txt 1748523
> trunk/qpid/cpp/src/amqp.cmake 1748523
> trunk/qpid/cpp/src/legacystore.cmake 1748523
> trunk/qpid/cpp/src/linearstore.cmake 1748523
> trunk/qpid/cpp/src/qpid/broker/Bridge.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/Broker.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/Link.h 1748523
> trunk/qpid/cpp/src/qpid/broker/Link.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/Queue.h 1748523
> trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/QueueRegistry.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/SelfDestructQueue.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/amqp_0_10/Connection.h 1748523
> trunk/qpid/cpp/src/qpid/management/Manageable.h 1748523
> trunk/qpid/cpp/src/qpid/management/ManagementObject.h 1748523
> trunk/qpid/cpp/src/qpid/management/ManagementObject.cpp 1748523
> trunk/qpid/cpp/src/qpid/store/CMakeLists.txt 1748523
> trunk/qpid/cpp/src/rdma.cmake 1748523
> trunk/qpid/cpp/src/tests/CMakeLists.txt 1748523
>
> Diff: https://reviews.apache.org/r/48816/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Alan Conway
>
>
Re: Review Request 48816: QPID-7306 Fix memory bugs and race
conditions found in a Qpid HA deployment
Posted by Andrew Stitcher <as...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48816/#review139263
-----------------------------------------------------------
trunk/qpid/cpp/src/qpid/management/Manageable.h
<https://reviews.apache.org/r/48816/#comment204435>
I think the original reason for this total madness was keeping ABI compatibility with some other management agent code that used this library.
Good riddance!
- Andrew Stitcher
On June 16, 2016, 9:05 p.m., Alan Conway wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48816/
> -----------------------------------------------------------
>
> (Updated June 16, 2016, 9:05 p.m.)
>
>
> Review request for qpid, Gordon Sim and Ted Ross.
>
>
> Bugs: QPID-7306
> https://issues.apache.org/jira/browse/QPID-7306
>
>
> Repository: qpid
>
>
> Description
> -------
>
> Also available as separate commits on https://github.com/alanconway/qpid/commits/mc-rebase
>
> QPID-7306: Memory management error in Link/Bridge
>
> qpid::broker Link and Bridge use Connection::requestIOProcessing() to register
> callbacks in the connection thread. They were binding a plain "this" pointer to
> the callback, but the classes are managed by boost::shared_ptr so if all the
> shared_ptr were released, the callback could happen on a dangling pointer.
>
> This fix uses boost::weak_ptr in the callbacks, so if all shared_ptr instances
> are released, we don't use the dead pointer.
>
> Link::destroy cannot be skipped, so use a shared_ptr for that.
>
> QPID-7306: Conditional compile mismatch in broker and common libs.
>
> Removed _IN_QPID_BROKER compile definition:
>
> Inconsistently set for libqpidcommon (compiled .cpp files) and libqpidbroker (uses .h) files
> - The broker has a binary incompatible notion of what is in the library.
>
> It sort-of works by accident:
> - shared_ptr<T> only contains a T*, the mismatch is effectively doing reinterpret_cast<T*>
> - plain T* works for op->, but doesn't guarantee no concurrent deletes.
> - we create and destroy shared_ptr in libraries with _IN_QPID_BROKER set so
> we get cleanup, and no cores if we're lucky but there is limited protection from races.
>
> Was only used by management:
> - I think exposing non-shared ptr GetObject was a feature that never materialized,
> - if not we need a separate function or class for non-shared-ptr users.
>
> QPID-7306: Fix race conditions during Queue destruction.
>
> Stack traces indicate a Queue was being destroyed concurrently while still in
> use by its ManagedObject.
>
> ManagedObject holds a plain pointer to the Manageable object (e.g. Queue) it
> belongs to. The Manageable calls ManagedObject::resourceDestroy() when it is
> deleted, but without any locking.
>
> Added a locked wrapper class ManageablePtr so destroy is atomic with respect to
> other calls via ManageablePtr, calls after pointer is reset to 0 in destroy()
> are skipped.
>
> Call resourceDestroy() in Queue::~Queue if it was not called already. This is
> probably redundant given given the fixes above but can't hurt.
>
> Queue::destroyed() was also being called without locking and could be called
> concurrrently, e.g. if auto-delete happens concurrently with delete via QMF or
> by a 0-10 client. Moved the destroyed() call into QueueRegistry::destroy(),
> using QueueRegistry lock to guarantee it is called exactly once.
>
>
> Diffs
> -----
>
> trunk/qpid/cpp/managementgen/qmfgen/schema.py 1748523
> trunk/qpid/cpp/src/CMakeLists.txt 1748523
> trunk/qpid/cpp/src/amqp.cmake 1748523
> trunk/qpid/cpp/src/legacystore.cmake 1748523
> trunk/qpid/cpp/src/linearstore.cmake 1748523
> trunk/qpid/cpp/src/qpid/broker/Bridge.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/Broker.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/Link.h 1748523
> trunk/qpid/cpp/src/qpid/broker/Link.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/Queue.h 1748523
> trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/QueueRegistry.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/SelfDestructQueue.cpp 1748523
> trunk/qpid/cpp/src/qpid/broker/amqp_0_10/Connection.h 1748523
> trunk/qpid/cpp/src/qpid/management/Manageable.h 1748523
> trunk/qpid/cpp/src/qpid/management/ManagementObject.h 1748523
> trunk/qpid/cpp/src/qpid/management/ManagementObject.cpp 1748523
> trunk/qpid/cpp/src/qpid/store/CMakeLists.txt 1748523
> trunk/qpid/cpp/src/rdma.cmake 1748523
> trunk/qpid/cpp/src/tests/CMakeLists.txt 1748523
>
> Diff: https://reviews.apache.org/r/48816/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Alan Conway
>
>