Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/12/02 04:27:58 UTC

[jira] [Commented] (KUDU-695) contention on glog disk can block reactor threads

    [ https://issues.apache.org/jira/browse/KUDU-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713984#comment-15713984 ] 

Todd Lipcon commented on KUDU-695:
----------------------------------

In an Impala stress test under high concurrency, we're seeing this become more than just an annoyance. With 500+ concurrent clients doing work, once the reactors get overloaded, a few timeouts happen, those timeouts generate log messages, and the problem cascades from there.

Here's an strace of a reactor thread inside an impalad:

{code}
futex(0x7f6950a7714c, FUTEX_WAIT_PRIVATE, 4006587, NULL) = 0 <6.962580>
futex(0x7f6950a7714c, FUTEX_WAIT_PRIVATE, 4008542, NULL) = 0 <7.222337>
gettid()                                = 160476 <0.000006>
write(7, "W1201 20:23:02.450058 160476 meta_cache.cc:774] Timed out: GetTableLocations { table: 'impala::tpch_100_kudu.partsupp', partition-key: (bucket=41), attempt: 174 } failed: timed out after deadline expired: GetTableLocations { table: 'impala::tpch_100_kudu.partsupp', partition-key: (bucket=41), attempt: 174 } passed its deadline: Timed out: client has too many outstanding requests to the master\n", 396) = 396 <0.000018>
fadvise64(7, 0, 87339008, POSIX_FADV_DONTNEED) = 0 <0.001017>
write(3, "W1201 20:23:02.450058 160476 meta_cache.cc:774] Timed out: GetTableLocations { table: 'impala::tpch_100_kudu.partsupp', partition-key: (bucket=41), attempt: 174 } failed: timed out after deadline expired: GetTableLocations { table: 'impala::tpch_100_kudu.partsupp', partition-key: (bucket=41), attempt: 174 } passed its deadline: Timed out: client has too many outstanding requests to the master\n", 396) = 396 <0.000018>
fadvise64(3, 0, 16257024, POSIX_FADV_DONTNEED) = 0 <0.000959>
futex(0x7f6950a7714c, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000015>
gettid()                                = 160476 <0.000013>
futex(0x7f6950a7714c, FUTEX_WAIT_PRIVATE, 4010498, NULL) = 0 <6.527559>
gettid()                                = 160476 <0.000006>
write(7, "W1201 20:23:08.980012 160476 meta_cache.cc:774] Timed out: GetTableLocations { table: 'impala::tpch_100_kudu.partsupp', partition-key: (bucket=41), attempt: 175 } failed: timed out after deadline expired: GetTableLocations { table: 'impala::tpch_100_kudu.partsupp', partition-key: (bucket=41), attempt: 175 } passed its deadline: Timed out: client has too many outstanding requests to the master\n", 396) = 396 <0.000016>
fadvise64(7, 0, 88068096, POSIX_FADV_DONTNEED) = 0 <0.000238>
write(3, "W1201 20:23:08.980012 160476 meta_cache.cc:774] Timed out: GetTableLocations { table: 'impala::tpch_100_kudu.partsupp', partition-key: (bucket=41), attempt: 175 } failed: timed out after deadline expired: GetTableLocations { table: 'impala::tpch_100_kudu.partsupp', partition-key: (bucket=41), attempt: 175 } passed its deadline: Timed out: client has too many outstanding requests to the master\n", 396) = 396 <0.000012>
fadvise64(3, 0, 17240064, POSIX_FADV_DONTNEED) = 0 <0.005418>
futex(0x7f6950a7714c, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000328>
gettid()                                = 160476 <0.000007>
futex(0x7f6950a7714c, FUTEX_WAIT_PRIVATE, 4012452, NULL) = 0 <6.870387>
futex(0x7f6950a7714c, FUTEX_WAIT_PRIVATE, 4014406, NULL) = 0 <7.152100>
futex(0x7f6950a7714c, FUTEX_WAIT_PRIVATE, 4016374, NULL) = 0 <6.935930>
gettid()                                = 160476 <0.000010>
{code}

You can see that it blocks for 10+ seconds (across back-to-back futex waits) before it can acquire the lock, then manages to log a couple of messages, and then blocks on the lock for another 10+ seconds.

I used perf to track overall lock acquisitions:

{code}
[todd@ve1120 kudu]$ sudo perf record -a -e syscalls:sys_enter_futex  --filter 'uaddr == "0x7f6950a7714c"' -g sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.191 MB perf.data (~183118 samples) ]
[todd@ve1120 kudu]$ sudo perf script | grep -A1 'LogMess' | grep client:: | sort | uniq -c
    272 	    7f6950477224 kudu::client::internal::WriteRpc::Finish(kudu::Status const&) (/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.1/lib/impala/lib/libkudu_client.so.0.1.0)
     92 	    7f695049be85 kudu::client::internal::RemoteTablet::MarkReplicaFailed(kudu::client::internal::RemoteTabletServer*, kudu::Status const&) (/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.1/lib/impala/lib/libkudu_client.so.0.1.0)
  14453 	    7f69504a1bc5 kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&) (/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.1/lib/impala/lib/libkudu_client.so.0.1.0)
{code}

So the lock is turning over a few thousand times per second (~3,000/sec over the 5-second sample), but with this many threads that's still a serious bottleneck. The vast majority of the acquisitions come from SendRpcCb, so at the very least we should throttle that log message, and we should also prioritize a fix for the general issue.
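
For illustration, here's roughly what throttling that call site could look like using glog's standard LOG_EVERY_N macro. This is only a sketch, not the actual meta_cache.cc code; the function name and status plumbing are made up for the example:

{code}
// Illustrative sketch only -- the real logic lives in
// kudu::client::internal::LookupRpc::SendRpcCb (meta_cache.cc).
#include <glog/logging.h>

#include "kudu/util/status.h"  // kudu::Status

namespace kudu {
namespace client {
namespace internal {

// Hypothetical helper standing in for the warning emitted when a
// GetTableLocations lookup times out.
void WarnOnLookupTimeout(const Status& status) {
  // Log only every 100th occurrence per call site instead of every one,
  // so a storm of timeouts doesn't serialize all reactor threads on
  // glog's output lock. google::COUNTER is the per-site occurrence count.
  LOG_EVERY_N(WARNING, 100)
      << "Master lookup timed out (" << google::COUNTER
      << " occurrences so far): " << status.ToString();
}

} // namespace internal
} // namespace client
} // namespace kudu
{code}

A time-based throttle (at most one warning per call site every few seconds) would probably be even better here, since the interesting signal is that timeouts are happening at all, not each individual one.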

> contention on glog disk can block reactor threads
> -------------------------------------------------
>
>                 Key: KUDU-695
>                 URL: https://issues.apache.org/jira/browse/KUDU-695
>             Project: Kudu
>          Issue Type: Bug
>          Components: perf
>    Affects Versions: Public beta
>            Reporter: Todd Lipcon
>
> After adding a watchdog which shows the stack trace of blocked reactor callbacks, I see that in many cases it's actually due to blocking within glog itself. We should see whether we can use an asynchronous sink in glog to avoid this, or else we need to be more careful to move even more work off of reactors.
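
For reference, here is a rough sketch of the "asynchronous sink" idea mentioned in the description, built on glog's LogSink extension point (google::LogSink / google::AddLogSink). It's an illustration of the shape of the fix, not a tested implementation; a real version would need to handle shutdown ordering, FATAL messages, and bounding the queue:

{code}
// Sketch of an asynchronous glog sink: send() only enqueues the formatted
// line, and a dedicated writer thread does the actual I/O, so the caller
// (e.g. a reactor thread) never blocks on the disk.
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <string>
#include <thread>

#include <glog/logging.h>

class AsyncLogSink : public google::LogSink {
 public:
  AsyncLogSink() : stop_(false), writer_(&AsyncLogSink::WriterLoop, this) {}

  virtual ~AsyncLogSink() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stop_ = true;
    }
    cond_.notify_one();
    writer_.join();
  }

  // Called by glog for each message: format it and hand it off to the
  // writer thread. LogSink::ToString() produces a glog-style log line.
  virtual void send(google::LogSeverity severity,
                    const char* /*full_filename*/,
                    const char* base_filename, int line,
                    const struct ::tm* tm_time,
                    const char* message, size_t message_len) {
    std::string formatted =
        ToString(severity, base_filename, line, tm_time, message, message_len);
    {
      std::lock_guard<std::mutex> l(lock_);
      queue_.push_back(formatted);
    }
    cond_.notify_one();
  }

 private:
  void WriterLoop() {
    std::unique_lock<std::mutex> l(lock_);
    while (!stop_ || !queue_.empty()) {
      if (queue_.empty()) {
        cond_.wait(l);
        continue;
      }
      std::string line = queue_.front();
      queue_.pop_front();
      l.unlock();
      // The slow part (file append, fadvise, etc.) happens here, off the
      // logging caller's thread. stderr stands in for the log file.
      fprintf(stderr, "%s\n", line.c_str());
      l.lock();
    }
  }

  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<std::string> queue_;
  bool stop_;
  std::thread writer_;
};
{code}

To actually take the synchronous file write out of the hot path you'd also need to register the sink with google::AddLogSink() and turn off glog's own file destinations (SetLogDestination with an empty base filename), and bound the queue so a log storm can't consume unbounded memory.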


