Posted to dev@qpid.apache.org by "Vishal Sharda (JIRA)" <ji...@apache.org> on 2016/06/12 19:51:20 UTC

[jira] [Updated] (DISPATCH-382) Intermittent router crash when starting 50 receivers/0 senders and doing qdstat

     [ https://issues.apache.org/jira/browse/DISPATCH-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal Sharda updated DISPATCH-382:
-----------------------------------
    Attachment: val_crash_2.txt
                val_crash_1.txt
                Crash_in_Valgrind_3.txt
                Crash_in_Valgrind_3.png
                Crash_in_Valgrind_2.png
                Crash_in_Valgrind_1.png

Attached 3 screenshots showing the crash and 3 output files from Valgrind for the corresponding runs.

Here is the information about the thread that led to the SIGABRT, as reported by Valgrind:

==18841== Thread 2:
==18841== Invalid read of size 4
==18841==    at 0x52F7274: pthread_mutex_lock (pthread_mutex_lock.c:66)
==18841==    by 0x4E648E7: sys_mutex_lock (threading.c:70)
==18841==    by 0x4E70EDC: qdr_forward_deliver_CT (forwarder.c:132)
==18841==    by 0x4E71D4A: qdr_forward_closest_CT (forwarder.c:405)
==18841==    by 0x4E72A96: qdr_forward_message_CT (forwarder.c:707)
==18841==    by 0x4E7C0B9: qdr_send_to_CT (transfer.c:581)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841==  Address 0xdc42ee0 is 16 bytes inside a block of size 48 free'd
==18841==    at 0x4C28D29: free (vg_replace_malloc.c:530)
==18841==    by 0x4E648CD: sys_mutex_free (threading.c:64)
==18841==    by 0x4E6FBAF: qdr_connection_closed_CT (connections.c:972)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841==  Block was alloc'd at
==18841==    at 0x4C27C0F: malloc (vg_replace_malloc.c:299)
==18841==    by 0x4E64859: sys_mutex (threading.c:51)
==18841==    by 0x4E6D111: qdr_connection_opened (connections.c:85)
==18841==    by 0x4E7D7CA: AMQP_opened_handler (router_node.c:560)
==18841==    by 0x4E7D837: AMQP_inbound_opened_handler (router_node.c:572)
==18841==    by 0x4E5397D: notify_opened (container.c:261)
==18841==    by 0x4E53A0D: policy_notify_opened (container.c:275)
==18841==    by 0x4E61B3A: qd_policy_amqp_open (policy.c:744)
==18841==    by 0x4E81BC1: invoke_deferred_calls (server.c:720)
==18841==    by 0x4E81CE7: process_connector (server.c:766)
==18841==    by 0x4E827C0: thread_run (server.c:1024)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841== 
==18841== Invalid read of size 4
==18841==    at 0x52F2A03: __pthread_mutex_lock_full (pthread_mutex_lock.c:177)
==18841==    by 0x4E648E7: sys_mutex_lock (threading.c:70)
==18841==    by 0x4E70EDC: qdr_forward_deliver_CT (forwarder.c:132)
==18841==    by 0x4E71D4A: qdr_forward_closest_CT (forwarder.c:405)
==18841==    by 0x4E72A96: qdr_forward_message_CT (forwarder.c:707)
==18841==    by 0x4E7C0B9: qdr_send_to_CT (transfer.c:581)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841==  Address 0xdc42ee0 is 16 bytes inside a block of size 48 free'd
==18841==    at 0x4C28D29: free (vg_replace_malloc.c:530)
==18841==    by 0x4E648CD: sys_mutex_free (threading.c:64)
==18841==    by 0x4E6FBAF: qdr_connection_closed_CT (connections.c:972)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841==  Block was alloc'd at
==18841==    at 0x4C27C0F: malloc (vg_replace_malloc.c:299)
==18841==    by 0x4E64859: sys_mutex (threading.c:51)
==18841==    by 0x4E6D111: qdr_connection_opened (connections.c:85)
==18841==    by 0x4E7D7CA: AMQP_opened_handler (router_node.c:560)
==18841==    by 0x4E7D837: AMQP_inbound_opened_handler (router_node.c:572)
==18841==    by 0x4E5397D: notify_opened (container.c:261)
==18841==    by 0x4E53A0D: policy_notify_opened (container.c:275)
==18841==    by 0x4E61B3A: qd_policy_amqp_open (policy.c:744)
==18841==    by 0x4E81BC1: invoke_deferred_calls (server.c:720)
==18841==    by 0x4E81CE7: process_connector (server.c:766)
==18841==    by 0x4E827C0: thread_run (server.c:1024)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841== 
==18841== 
==18841== Process terminating with default action of signal 6 (SIGABRT)
==18841==    at 0x5ED0067: raise (raise.c:56)
==18841==    by 0x5ED1447: abort (abort.c:89)
==18841==    by 0x5EC9265: __assert_fail_base (assert.c:92)
==18841==    by 0x5EC9311: __assert_fail (assert.c:101)
==18841==    by 0x4E6490F: sys_mutex_lock (threading.c:71)
==18841==    by 0x4E70EDC: qdr_forward_deliver_CT (forwarder.c:132)
==18841==    by 0x4E71D4A: qdr_forward_closest_CT (forwarder.c:405)
==18841==    by 0x4E72A96: qdr_forward_message_CT (forwarder.c:707)
==18841==    by 0x4E7C0B9: qdr_send_to_CT (transfer.c:581)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841== 
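One possible reading of the trace above, offered as a sketch and not as a diagnosis: the mutex block is freed in qdr_connection_closed_CT and later locked in qdr_forward_deliver_CT, and both stacks run on router_core_thread. That suggests something reachable from the forwarding path still pointed at the closed connection. The abort itself comes from __assert_fail under sys_mutex_lock (threading.c:71), which is consistent with an assertion on the lock result. The C below illustrates one general way to avoid this class of bug: clear every back-pointer before freeing the mutex, and check the pointer before locking. All names (conn_t, link_t, conn_opened, conn_attach, conn_closed, forward_deliver) are hypothetical stand-ins, not the qpid-dispatch types or functions, and the actual fix in Dispatch may look quite different.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the qpid-dispatch types. */
typedef struct conn_t conn_t;

typedef struct link_t {
    conn_t        *conn;   /* back-pointer, NULL once the connection is gone */
    struct link_t *next;   /* next link attached to the same connection */
} link_t;

struct conn_t {
    pthread_mutex_t *work_lock;  /* heap-allocated, as sys_mutex does in the trace */
    link_t          *links;      /* links that hold a back-pointer to us */
    int              pending;    /* deliveries queued for this connection */
};

/* Allocate a connection and its mutex; returns NULL on failure. */
conn_t *conn_opened(void)
{
    conn_t *c = calloc(1, sizeof *c);
    if (!c) return NULL;
    c->work_lock = malloc(sizeof *c->work_lock);
    if (!c->work_lock || pthread_mutex_init(c->work_lock, NULL) != 0) {
        free(c->work_lock);
        free(c);
        return NULL;
    }
    return c;
}

/* Record that link l refers to connection c. */
void conn_attach(conn_t *c, link_t *l)
{
    l->conn  = c;
    l->next  = c->links;
    c->links = l;
}

/* Close a connection: detach every link BEFORE the mutex is freed,
 * so no stale back-pointer survives the free. */
void conn_closed(conn_t *c)
{
    for (link_t *l = c->links; l; ) {
        link_t *next = l->next;
        l->conn = NULL;
        l->next = NULL;
        l = next;
    }
    pthread_mutex_destroy(c->work_lock);
    free(c->work_lock);
    free(c);
}

/* Queue a delivery for the link's connection.
 * Returns false if the connection has already been closed. */
bool forward_deliver(link_t *l)
{
    conn_t *c = l->conn;
    if (!c) return false;  /* drop instead of locking freed memory */

    int rc = pthread_mutex_lock(c->work_lock);
    assert(rc == 0);       /* same style of check that aborts in the trace */
    (void) rc;             /* silence unused warning when NDEBUG is set */
    c->pending++;
    pthread_mutex_unlock(c->work_lock);
    return true;
}
```

With this ordering, a delivery attempted on a link after its connection has closed is rejected by the NULL check instead of reaching pthread_mutex_lock on freed memory. The sketch assumes the close and the forward run on the same thread, as they do in the trace; if they could run on different threads, the back-pointers themselves would need protection as well.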


> Intermittent router crash when starting 50 receivers/0 senders and doing qdstat
> -------------------------------------------------------------------------------
>
>                 Key: DISPATCH-382
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-382
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Routing Engine
>    Affects Versions: 0.6.0
>         Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate machines
>            Reporter: Vishal Sharda
>            Priority: Blocker
>         Attachments: Crash_in_Valgrind_1.png, Crash_in_Valgrind_2.png, Crash_in_Valgrind_3.png, Crash_in_Valgrind_3.txt, val_crash_1.txt, val_crash_2.txt
>
>
> Network: A network of 3 interior routers built from trunk and connected to each other using 2-way SSL.
> We ran a Proton-J Reactor API based client to start 50 receivers and 0 senders on one of the above 3 routers.  After that we ran "qdstat -c".  This leads to an intermittent crash in the router.  The crash could not be reproduced while running the routers independently or inside gdb.  When we run the routers inside Valgrind, the crash is frequent.  I was able to reproduce it 3 times using Valgrind (screenshots and Valgrind output files are attached).
> This intermittent crash becomes permanent in our instrumented build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
