You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by Alan Conway <ac...@redhat.com> on 2016/06/16 21:05:12 UTC

Review Request 48816: QPID-7306 Fix memory bugs and race conditions found in a Qpid HA deployment

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48816/
-----------------------------------------------------------

Review request for qpid, Gordon Sim and Ted Ross.


Bugs: QPID-7306
    https://issues.apache.org/jira/browse/QPID-7306


Repository: qpid


Description
-------

Also available as separate commits on https://github.com/alanconway/qpid/commits/mc-rebase

QPID-7306: Memory management error in Link/Bridge

qpid::broker Link and Bridge use Connection::requestIOProcessing() to register
callbacks in the connection thread. They were binding a plain "this" pointer to
the callback, but the classes are managed by boost::shared_ptr so if all the
shared_ptr were released, the callback could happen on a dangling pointer.

This fix uses boost::weak_ptr in the callbacks, so if all shared_ptr instances
are released, we don't use the dead pointer.

Link::destroy cannot be skipped, so use a shared_ptr for that.

QPID-7306: Conditional compile mismatch in broker and common libs.

Removed _IN_QPID_BROKER compile definition:

Inconsistently set for libqpidcommon (compiled .cpp files) and libqpidbroker (uses .h) files
  - The broker has a binary incompatible notion of what is in the library.

It sort-of works by accident:
   - shared_ptr<T> only contains a T*, the mismatch is effectively doing reinterpret_cast<T*>
   - plain T* works for op->, but doesn't guarantee no concurrent deletes.
   - we create and destroy shared_ptr in libraries with _IN_QPID_BROKER set so
     we get cleanup, and no cores if we're lucky but there is limited protection from races.

Was only used by management:
  - I think exposing non-shared ptr GetObject was a feature that never materialized,
  - if not we need a separate function or class for non-shared-ptr users.

QPID-7306: Fix race conditions during Queue destruction.

Stack traces indicate a Queue was being destroyed concurrently while still in
use by its ManagedObject.

ManagedObject holds a plain pointer to the Manageable object (e.g. Queue) it
belongs to.  The Manageable calls ManagedObject::resourceDestroy() when it is
deleted, but without any locking.

Added a locked wrapper class ManageablePtr so destroy is atomic with respect to
other calls via ManageablePtr, calls after pointer is reset to 0 in destroy()
are skipped.

Call resourceDestroy() in Queue::~Queue if it was not called already. This is
probably redundant given given the fixes above but can't hurt.

Queue::destroyed() was also being called without locking and could be called
concurrrently, e.g. if auto-delete happens concurrently with delete via QMF or
by a 0-10 client. Moved the destroyed() call into QueueRegistry::destroy(),
using QueueRegistry lock to guarantee it is called exactly once.


Diffs
-----

  trunk/qpid/cpp/managementgen/qmfgen/schema.py 1748523 
  trunk/qpid/cpp/src/CMakeLists.txt 1748523 
  trunk/qpid/cpp/src/amqp.cmake 1748523 
  trunk/qpid/cpp/src/legacystore.cmake 1748523 
  trunk/qpid/cpp/src/linearstore.cmake 1748523 
  trunk/qpid/cpp/src/qpid/broker/Bridge.cpp 1748523 
  trunk/qpid/cpp/src/qpid/broker/Broker.cpp 1748523 
  trunk/qpid/cpp/src/qpid/broker/Link.h 1748523 
  trunk/qpid/cpp/src/qpid/broker/Link.cpp 1748523 
  trunk/qpid/cpp/src/qpid/broker/Queue.h 1748523 
  trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1748523 
  trunk/qpid/cpp/src/qpid/broker/QueueRegistry.cpp 1748523 
  trunk/qpid/cpp/src/qpid/broker/SelfDestructQueue.cpp 1748523 
  trunk/qpid/cpp/src/qpid/broker/amqp_0_10/Connection.h 1748523 
  trunk/qpid/cpp/src/qpid/management/Manageable.h 1748523 
  trunk/qpid/cpp/src/qpid/management/ManagementObject.h 1748523 
  trunk/qpid/cpp/src/qpid/management/ManagementObject.cpp 1748523 
  trunk/qpid/cpp/src/qpid/store/CMakeLists.txt 1748523 
  trunk/qpid/cpp/src/rdma.cmake 1748523 
  trunk/qpid/cpp/src/tests/CMakeLists.txt 1748523 

Diff: https://reviews.apache.org/r/48816/diff/


Testing
-------


Thanks,

Alan Conway


Re: Review Request 48816: QPID-7306 Fix memory bugs and race conditions found in a Qpid HA deployment

Posted by Gordon Sim <gs...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48816/#review138195
-----------------------------------------------------------


Ship it!




Ship It!

- Gordon Sim


On June 16, 2016, 9:05 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48816/
> -----------------------------------------------------------
> 
> (Updated June 16, 2016, 9:05 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Ted Ross.
> 
> 
> Bugs: QPID-7306
>     https://issues.apache.org/jira/browse/QPID-7306
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Also available as separate commits on https://github.com/alanconway/qpid/commits/mc-rebase
> 
> QPID-7306: Memory management error in Link/Bridge
> 
> qpid::broker Link and Bridge use Connection::requestIOProcessing() to register
> callbacks in the connection thread. They were binding a plain "this" pointer to
> the callback, but the classes are managed by boost::shared_ptr so if all the
> shared_ptr were released, the callback could happen on a dangling pointer.
> 
> This fix uses boost::weak_ptr in the callbacks, so if all shared_ptr instances
> are released, we don't use the dead pointer.
> 
> Link::destroy cannot be skipped, so use a shared_ptr for that.
> 
> QPID-7306: Conditional compile mismatch in broker and common libs.
> 
> Removed _IN_QPID_BROKER compile definition:
> 
> Inconsistently set for libqpidcommon (compiled .cpp files) and libqpidbroker (uses .h) files
>   - The broker has a binary incompatible notion of what is in the library.
> 
> It sort-of works by accident:
>    - shared_ptr<T> only contains a T*, the mismatch is effectively doing reinterpret_cast<T*>
>    - plain T* works for op->, but doesn't guarantee no concurrent deletes.
>    - we create and destroy shared_ptr in libraries with _IN_QPID_BROKER set so
>      we get cleanup, and no cores if we're lucky but there is limited protection from races.
> 
> Was only used by management:
>   - I think exposing non-shared ptr GetObject was a feature that never materialized,
>   - if not we need a separate function or class for non-shared-ptr users.
> 
> QPID-7306: Fix race conditions during Queue destruction.
> 
> Stack traces indicate a Queue was being destroyed concurrently while still in
> use by its ManagedObject.
> 
> ManagedObject holds a plain pointer to the Manageable object (e.g. Queue) it
> belongs to.  The Manageable calls ManagedObject::resourceDestroy() when it is
> deleted, but without any locking.
> 
> Added a locked wrapper class ManageablePtr so destroy is atomic with respect to
> other calls via ManageablePtr, calls after pointer is reset to 0 in destroy()
> are skipped.
> 
> Call resourceDestroy() in Queue::~Queue if it was not called already. This is
> probably redundant given given the fixes above but can't hurt.
> 
> Queue::destroyed() was also being called without locking and could be called
> concurrrently, e.g. if auto-delete happens concurrently with delete via QMF or
> by a 0-10 client. Moved the destroyed() call into QueueRegistry::destroy(),
> using QueueRegistry lock to guarantee it is called exactly once.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/cpp/managementgen/qmfgen/schema.py 1748523 
>   trunk/qpid/cpp/src/CMakeLists.txt 1748523 
>   trunk/qpid/cpp/src/amqp.cmake 1748523 
>   trunk/qpid/cpp/src/legacystore.cmake 1748523 
>   trunk/qpid/cpp/src/linearstore.cmake 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Bridge.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Broker.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Link.h 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Link.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Queue.h 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/QueueRegistry.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/SelfDestructQueue.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/amqp_0_10/Connection.h 1748523 
>   trunk/qpid/cpp/src/qpid/management/Manageable.h 1748523 
>   trunk/qpid/cpp/src/qpid/management/ManagementObject.h 1748523 
>   trunk/qpid/cpp/src/qpid/management/ManagementObject.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/store/CMakeLists.txt 1748523 
>   trunk/qpid/cpp/src/rdma.cmake 1748523 
>   trunk/qpid/cpp/src/tests/CMakeLists.txt 1748523 
> 
> Diff: https://reviews.apache.org/r/48816/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alan Conway
> 
>


Re: Review Request 48816: QPID-7306 Fix memory bugs and race conditions found in a Qpid HA deployment

Posted by Andrew Stitcher <as...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48816/#review139263
-----------------------------------------------------------




trunk/qpid/cpp/src/qpid/management/Manageable.h 
<https://reviews.apache.org/r/48816/#comment204435>

    I think the original reason for this total madness was keeping ABI compatibility with some other management agent code that used this library.
    
    Good riddance!


- Andrew Stitcher


On June 16, 2016, 9:05 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48816/
> -----------------------------------------------------------
> 
> (Updated June 16, 2016, 9:05 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Ted Ross.
> 
> 
> Bugs: QPID-7306
>     https://issues.apache.org/jira/browse/QPID-7306
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Also available as separate commits on https://github.com/alanconway/qpid/commits/mc-rebase
> 
> QPID-7306: Memory management error in Link/Bridge
> 
> qpid::broker Link and Bridge use Connection::requestIOProcessing() to register
> callbacks in the connection thread. They were binding a plain "this" pointer to
> the callback, but the classes are managed by boost::shared_ptr so if all the
> shared_ptr were released, the callback could happen on a dangling pointer.
> 
> This fix uses boost::weak_ptr in the callbacks, so if all shared_ptr instances
> are released, we don't use the dead pointer.
> 
> Link::destroy cannot be skipped, so use a shared_ptr for that.
> 
> QPID-7306: Conditional compile mismatch in broker and common libs.
> 
> Removed _IN_QPID_BROKER compile definition:
> 
> Inconsistently set for libqpidcommon (compiled .cpp files) and libqpidbroker (uses .h) files
>   - The broker has a binary incompatible notion of what is in the library.
> 
> It sort-of works by accident:
>    - shared_ptr<T> only contains a T*, the mismatch is effectively doing reinterpret_cast<T*>
>    - plain T* works for op->, but doesn't guarantee no concurrent deletes.
>    - we create and destroy shared_ptr in libraries with _IN_QPID_BROKER set so
>      we get cleanup, and no cores if we're lucky but there is limited protection from races.
> 
> Was only used by management:
>   - I think exposing non-shared ptr GetObject was a feature that never materialized,
>   - if not we need a separate function or class for non-shared-ptr users.
> 
> QPID-7306: Fix race conditions during Queue destruction.
> 
> Stack traces indicate a Queue was being destroyed concurrently while still in
> use by its ManagedObject.
> 
> ManagedObject holds a plain pointer to the Manageable object (e.g. Queue) it
> belongs to.  The Manageable calls ManagedObject::resourceDestroy() when it is
> deleted, but without any locking.
> 
> Added a locked wrapper class ManageablePtr so destroy is atomic with respect to
> other calls via ManageablePtr, calls after pointer is reset to 0 in destroy()
> are skipped.
> 
> Call resourceDestroy() in Queue::~Queue if it was not called already. This is
> probably redundant given given the fixes above but can't hurt.
> 
> Queue::destroyed() was also being called without locking and could be called
> concurrrently, e.g. if auto-delete happens concurrently with delete via QMF or
> by a 0-10 client. Moved the destroyed() call into QueueRegistry::destroy(),
> using QueueRegistry lock to guarantee it is called exactly once.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/cpp/managementgen/qmfgen/schema.py 1748523 
>   trunk/qpid/cpp/src/CMakeLists.txt 1748523 
>   trunk/qpid/cpp/src/amqp.cmake 1748523 
>   trunk/qpid/cpp/src/legacystore.cmake 1748523 
>   trunk/qpid/cpp/src/linearstore.cmake 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Bridge.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Broker.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Link.h 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Link.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Queue.h 1748523 
>   trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/QueueRegistry.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/SelfDestructQueue.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/broker/amqp_0_10/Connection.h 1748523 
>   trunk/qpid/cpp/src/qpid/management/Manageable.h 1748523 
>   trunk/qpid/cpp/src/qpid/management/ManagementObject.h 1748523 
>   trunk/qpid/cpp/src/qpid/management/ManagementObject.cpp 1748523 
>   trunk/qpid/cpp/src/qpid/store/CMakeLists.txt 1748523 
>   trunk/qpid/cpp/src/rdma.cmake 1748523 
>   trunk/qpid/cpp/src/tests/CMakeLists.txt 1748523 
> 
> Diff: https://reviews.apache.org/r/48816/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alan Conway
> 
>