Posted to issues-all@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2019/01/22 19:59:01 UTC

[jira] [Commented] (IMPALA-8027) KRPC datastream timing out on both the receiver and sender side even in a minicluster

    [ https://issues.apache.org/jira/browse/IMPALA-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749067#comment-16749067 ] 

Michael Ho commented on IMPALA-8027:
------------------------------------

The coordinator appears to have been slow between 05:16:52 and 05:18:56. Both failing queries timed out because the coordinator fragment did not become ready within 2 minutes.

Log from coordinators:
{noformat}
I1228 05:16:55.374305 26714 query-state.cc:568] Executing instance. instance_id=8f46b2518734bef1:6ef2d40400000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=9
I1228 05:16:55.374305 26710 coordinator.cc:368] started execution on 2 backends for query_id=8f46b2518734bef1:6ef2d40400000000
{noformat}

{noformat}
I1228 05:16:52.217275 26685 query-state.cc:568] Executing instance. instance_id=194f5b70907ac97c:84a116d600000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=7
I1228 05:16:52.217211 26678 coordinator.cc:368] started execution on 3 backends for query_id=194f5b70907ac97c:84a116d600000000
{noformat}

The top output snapshots from that time period didn't reveal much, other than that impalad wasn't being scheduled much. Looking into the query profiles next.
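
As a sanity check on the 2-minute figure, the gap between each "Executing instance" line above and the matching "timed out waiting for receiver" line in the issue description can be computed from the glog timestamps. This is only a rough sketch: the helper names are made up for illustration, the log strings are shortened prefixes of the lines quoted in this issue, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def parse_glog_time(line):
    """Return the HH:MM:SS.ffffff field of a glog line as a datetime.

    Hypothetical helper: assumes the layout 'I1228 05:16:55.374305 <tid> ...'
    seen in the logs quoted in this issue (date part is ignored).
    """
    return datetime.strptime(line.split()[1], "%H:%M:%S.%f")


def elapsed_seconds(start_line, end_line):
    """Seconds between two glog lines, assuming both are from the same day."""
    delta = parse_glog_time(end_line) - parse_glog_time(start_line)
    return delta.total_seconds()


# Prefixes of the log lines quoted in this issue, grouped by query id.
pairs = {
    "8f46b2518734bef1": (
        "I1228 05:16:55.374305 26714 query-state.cc:568] Executing instance.",
        "I1228 05:18:56.202587 13396 krpc-data-stream-mgr.cc:353] Sender 127.0.0.1 timed out",
    ),
    "194f5b70907ac97c": (
        "I1228 05:16:52.217275 26685 query-state.cc:568] Executing instance.",
        "I1228 05:18:56.203114 13396 krpc-data-stream-mgr.cc:353] Sender 127.0.0.1 timed out",
    ),
}

for query, (start, end) in pairs.items():
    print(f"{query}: {elapsed_seconds(start, end):.3f}s")
# 8f46b2518734bef1: 120.828s
# 194f5b70907ac97c: 123.986s
```

Both gaps are just over 120 seconds, which roughly lines up with the 120782ms and 123811ms TransmitData durations reported by rpcz_store.cc in the issue description.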


> KRPC datastream timing out on both the receiver and sender side even in a minicluster
> -------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8027
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8027
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 3.2.0
>            Reporter: Bikramjeet Vig
>            Assignee: Michael Ho
>            Priority: Critical
>              Labels: broken-build
>
> KRPC datastreams seem to time out at the same time on both the sender and the receiver, causing two running queries to fail. This happened while running core tests on S3.
> Logs from coordinator:
> {noformat}
> I1228 05:18:56.202587 13396 krpc-data-stream-mgr.cc:353] Sender 127.0.0.1 timed out waiting for receiver fragment instance: 8f46b2518734bef1:6ef2d40400000000, dest node: 2
> I1228 05:18:56.203061 13396 rpcz_store.cc:265] Call impala.DataStreamService.TransmitData from 127.0.0.1:53118 (request call id 11274) took 120782ms. Request Metrics: {}
> I1228 05:18:56.203114 13396 krpc-data-stream-mgr.cc:353] Sender 127.0.0.1 timed out waiting for receiver fragment instance: 194f5b70907ac97c:84a116d600000000, dest node: 2
> I1228 05:18:56.203136 13396 rpcz_store.cc:265] Call impala.DataStreamService.TransmitData from 127.0.0.1:53110 (request call id 8637) took 123811ms. Request Metrics: {}
> I1228 05:18:56.203155 13396 krpc-data-stream-mgr.cc:353] Sender 127.0.0.1 timed out waiting for receiver fragment instance: 194f5b70907ac97c:84a116d600000000, dest node: 2
> I1228 05:18:56.203167 13396 rpcz_store.cc:265] Call impala.DataStreamService.TransmitData from 127.0.0.1:53118 (request call id 11273) took 123776ms. Request Metrics: {}
> I1228 05:18:56.203181 13396 krpc-data-stream-mgr.cc:408] Reduced stream ID cache from 413 items, to 410, eviction took: 1ms
> I1228 05:18:56.204746 13377 coordinator.cc:707] Backend completed: host=impala-ec2-centos74-m5-4xlarge-ondemand-07b3.vpc.cloudera.com:22001 remaining=2 query_id=8f46b2518734bef1:6ef2d40400000000
> I1228 05:18:56.204756 13377 coordinator-backend-state.cc:262] query_id=8f46b2518734bef1:6ef2d40400000000: first in-progress backend: impala-ec2-centos74-m5-4xlarge-ondemand-07b3.vpc.cloudera.com:22000
> I1228 05:18:56.204769 13377 coordinator.cc:522] ExecState: query id=8f46b2518734bef1:6ef2d40400000000 finstance=8f46b2518734bef1:6ef2d40400000001 on host=impala-ec2-centos74-m5-4xlarge-ondemand-07b3.vpc.cloudera.com:22001 (EXECUTING -> ERROR) status=Sender 127.0.0.1 timed out waiting for receiver fragment instance: 8f46b2518734bef1:6ef2d40400000000, dest node: 2
> {noformat}
> Logs from executor:
> {noformat}
> E1228 05:18:56.203181 26715 krpc-data-stream-sender.cc:343] channel send to 127.0.0.1:27000 failed: (fragment_instance_id=8f46b2518734bef1:6ef2d40400000000): Sender 127.0.0.1 timed out waiting for receiver fragment instance: 8f46b2518734bef1:6ef2d40400000000, dest node: 2
> E1228 05:18:56.203256 26682 krpc-data-stream-sender.cc:343] channel send to 127.0.0.1:27000 failed: (fragment_instance_id=194f5b70907ac97c:84a116d600000000): Sender 127.0.0.1 timed out waiting for receiver fragment instance: 194f5b70907ac97c:84a116d600000000, dest node: 2
> I1228 05:18:56.203451 26715 query-state.cc:576] Instance completed. instance_id=8f46b2518734bef1:6ef2d40400000001 #in-flight=3 status=DATASTREAM_SENDER_TIMEOUT: Sender 127.0.0.1 timed out waiting for receiver fragment instance: 8f46b2518734bef1:6ef2d40400000000, dest node: 2
> I1228 05:18:56.203485 26713 query-state.cc:249] UpdateBackendExecState(): last report for 8f46b2518734bef1:6ef2d40400000000
> I1228 05:18:56.203514 26682 query-state.cc:576] Instance completed. instance_id=194f5b70907ac97c:84a116d600000003 #in-flight=2 status=DATASTREAM_SENDER_TIMEOUT: Sender 127.0.0.1 timed out waiting for receiver fragment instance: 194f5b70907ac97c:84a116d600000000, dest node: 2
> I1228 05:18:56.203536 26680 query-state.cc:249] UpdateBackendExecState(): last report for 194f5b70907ac97c:84a116d600000000
> {noformat}


