You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/11/18 06:03:00 UTC

[jira] [Commented] (IMPALA-10339) Apparent hang in TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action

    [ https://issues.apache.org/jira/browse/IMPALA-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234282#comment-17234282 ] 

Tim Armstrong commented on IMPALA-10339:
----------------------------------------

It seems like a fragment was not making progress:
{noformat}
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10331:I1111 23:12:10.873950 10635 control-service.cc:142] de444e75ef44ee0b:a1614bfe00000000] ExecQueryFInstances(): query_id=de444e75ef44ee0b:a1614bfe00000000 coord=impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com:27000 #instances=2
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10331:I1111 23:12:10.874297 19104 query-state.cc:897] de444e75ef44ee0b:a1614bfe00000001] Executing instance. instance_id=de444e75ef44ee0b:a1614bfe00000001 fragment_idx=2 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=4
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10331:I1111 23:12:10.874364 19105 query-state.cc:897] de444e75ef44ee0b:a1614bfe00000000] Executing instance. instance_id=de444e75ef44ee0b:a1614bfe00000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=5
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10331:I1111 23:12:13.723183 19104 krpc-data-stream-mgr.cc:337] de444e75ef44ee0b:a1614bfe00000001] cancelling active streams for fragment_instance_id=de444e75ef44ee0b:a1614bfe00000001
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10331:I1111 23:12:13.723249 19104 query-state.cc:906] de444e75ef44ee0b:a1614bfe00000001] Instance completed. instance_id=de444e75ef44ee0b:a1614bfe00000001 #in-flight=4 status=OK
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10331:I1112 01:12:12.851925 17396 coordinator-backend-state.cc:921] query_id=de444e75ef44ee0b:a1614bfe00000000 target backend=127.0.0.1:27000: Sending CancelQueryFInstances rpc
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10331:I1112 01:12:12.852037 19105 krpc-data-stream-mgr.cc:337] de444e75ef44ee0b:a1614bfe00000000] cancelling active streams for fragment_instance_id=de444e75ef44ee0b:a1614bfe00000000
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10331:I1112 01:12:12.852058 19105 krpc-data-stream-mgr.cc:306] de444e75ef44ee0b:a1614bfe00000000] DeregisterRecvr(): fragment_instance_id=de444e75ef44ee0b:a1614bfe00000000, node=5
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10331:I1112 01:12:12.852077 19105 query-state.cc:906] de444e75ef44ee0b:a1614bfe00000000] Instance completed. instance_id=de444e75ef44ee0b:a1614bfe00000000 #in-flight=0 status=CANCELLED: Cancelled
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10331:I1112 01:12:12.852396 19101 query-state.cc:686] de444e75ef44ee0b:a1614bfe00000000] Cancelling fragment instances as directed by the coordinator. Returned status: ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): de444e75ef44ee0b:a1614bfe00000000 remote host=127.0.0.1:57846
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10336:I1111 23:12:10.874104 10940 control-service.cc:142] de444e75ef44ee0b:a1614bfe00000000] ExecQueryFInstances(): query_id=de444e75ef44ee0b:a1614bfe00000000 coord=impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com:27000 #instances=2
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10336:I1111 23:12:10.874544 19107 query-state.cc:897] de444e75ef44ee0b:a1614bfe00000003] Executing instance. instance_id=de444e75ef44ee0b:a1614bfe00000003 fragment_idx=2 per_fragment_instance_idx=2 coord_state_idx=1 #in-flight=2
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10336:I1111 23:12:10.874622 19108 query-state.cc:897] de444e75ef44ee0b:a1614bfe00000004] Executing instance. instance_id=de444e75ef44ee0b:a1614bfe00000004 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=3
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10336:I1111 23:12:14.104694 19107 krpc-data-stream-mgr.cc:337] de444e75ef44ee0b:a1614bfe00000003] cancelling active streams for fragment_instance_id=de444e75ef44ee0b:a1614bfe00000003
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10336:I1111 23:12:14.104787 19107 query-state.cc:906] de444e75ef44ee0b:a1614bfe00000003] Instance completed. instance_id=de444e75ef44ee0b:a1614bfe00000003 #in-flight=2 status=OK
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10336:I1111 23:12:14.163125 19191 krpc-data-stream-mgr.cc:306] de444e75ef44ee0b:a1614bfe00000004] DeregisterRecvr(): fragment_instance_id=de444e75ef44ee0b:a1614bfe00000004, node=4
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10336:I1111 23:12:15.854332 19108 krpc-data-stream-mgr.cc:337] de444e75ef44ee0b:a1614bfe00000004] cancelling active streams for fragment_instance_id=de444e75ef44ee0b:a1614bfe00000004
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10336:I1111 23:12:15.875731 19108 query-state.cc:906] de444e75ef44ee0b:a1614bfe00000004] Instance completed. instance_id=de444e75ef44ee0b:a1614bfe00000004 #in-flight=3 status=PARTITIONED_HASH_JOIN_REPARTITION_FAILS: Cannot perform hash join at node with id 2. Repartitioning did not reduce the size of a spilled partition. Repartitioning level 2. Number of rows 1500000:
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10342:I1111 23:12:10.874073 11247 control-service.cc:142] de444e75ef44ee0b:a1614bfe00000000] ExecQueryFInstances(): query_id=de444e75ef44ee0b:a1614bfe00000000 coord=impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com:27000 #instances=1
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10342:I1111 23:12:10.874435 19106 query-state.cc:897] de444e75ef44ee0b:a1614bfe00000002] Executing instance. instance_id=de444e75ef44ee0b:a1614bfe00000002 fragment_idx=2 per_fragment_instance_idx=1 coord_state_idx=2 #in-flight=2
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10342:I1111 23:12:14.144906 19106 krpc-data-stream-mgr.cc:337] de444e75ef44ee0b:a1614bfe00000002] cancelling active streams for fragment_instance_id=de444e75ef44ee0b:a1614bfe00000002
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17ee.vpc.cloudera.com.jenkins.log.INFO.20201111-182145.10342:I1111 23:12:14.144990 19106 query-state.cc:906] de444e75ef44ee0b:a1614bfe00000002] Instance completed. instance_id=de444e75ef44ee0b:a1614bfe00000002 #in-flight=0 status=OK
{noformat}

Makes me suspicious that it's somehow connected to the other hangs I've seen.

> Apparent hang in TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-10339
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10339
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: broken-build, flaky, hang
>
> Release build with this commit as the tip:
> {noformat}
> commit 9400e9b17b13f613defb6d7b9deb471256b1d95c (CDH/cdpd-master-staging)
> Author: wzhou-code <wz...@cloudera.com>
> Date:   Thu Oct 29 22:32:03 2020 -0700
>     IMPALA-10305: Sync Kudu's FIPS compliant changes
>     
> {noformat}
> {noformat}
> Regression
> query_test.test_spilling.TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action[protocol: beeswax | exec_option: {'mt_dop': 0, 'default_spillable_buffer_size': '64k'} | table_format: parquet/none] (from pytest)
> Failing for the past 1 build (Since Failed#100 )
> Took 1 hr 59 min.
> add description
> Error Message
> query_test/test_spilling.py:134: in test_spilling_no_debug_action     self.run_test_case('QueryTest/spilling-no-debug-action', vector) common/impala_test_suite.py:668: in run_test_case     self.__verify_exceptions(test_section['CATCH'], str(e), use_db) common/impala_test_suite.py:485: in __verify_exceptions     (expected_str, actual_str) E   AssertionError: Unexpected exception string. Expected: row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did not reduce the size of a spilled partition.* E   Not found in actual: Timeout >7200s
> Stacktrace
> query_test/test_spilling.py:134: in test_spilling_no_debug_action
>     self.run_test_case('QueryTest/spilling-no-debug-action', vector)
> common/impala_test_suite.py:668: in run_test_case
>     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
> common/impala_test_suite.py:485: in __verify_exceptions
>     (expected_str, actual_str)
> E   AssertionError: Unexpected exception string. Expected: row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did not reduce the size of a spilled partition.*
> E   Not found in actual: Timeout >7200s
> Standard Error
> SET client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none];
> -- executing against localhost:21000
> use tpch_parquet;
> -- 2020-11-11 23:12:04,319 INFO     MainThread: Started query c740c1c66d9679a9:6a40f16100000000
> SET client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none];
> SET mt_dop=0;
> SET default_spillable_buffer_size=64k;
> -- 2020-11-11 23:12:04,320 INFO     MainThread: Loading query test file: /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
> -- 2020-11-11 23:12:04,323 INFO     MainThread: Starting new HTTP connection (1): localhost
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:04,377 INFO     MainThread: Started query c044afcf5ae44df9:a2e2e7c600000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> lineitem a, lineitem b
> where
> a.l_partkey = 1 and
> a.l_orderkey = b.l_orderkey;
> -- 2020-11-11 23:12:04,385 INFO     MainThread: Started query 314c019cd252f322:2411bc7600000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:05,199 INFO     MainThread: Started query 80424e68922c30f9:b2144dff00000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:05,207 INFO     MainThread: Started query 2a4c1f4b26ea52da:4339f3ff00000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> lineitem a
> where
> a.l_partkey not in (select l_partkey from lineitem where l_partkey > 10)
> and a.l_partkey < 1000;
> -- 2020-11-11 23:12:05,215 INFO     MainThread: Started query f845afd00a569446:79c5054a00000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:07,507 INFO     MainThread: Started query ee4f8a685928e7ef:830d965100000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:07,512 INFO     MainThread: Started query 654a6eced9594931:cc68289a00000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> supplier right outer join lineitem on s_suppkey = l_suppkey
> where s_acctbal > 0 and s_acctbal < 10;
> -- 2020-11-11 23:12:07,519 INFO     MainThread: Started query 7a41a406b446e082:b5e3bf2f00000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:08,549 INFO     MainThread: Started query 1445a2833895ee1a:d136681000000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:08,554 INFO     MainThread: Started query 4149ef276c643426:a285156400000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> supplier right outer join lineitem on s_suppkey = l_suppkey
> where s_acctbal > 0 and s_acctbal < 10;
> -- 2020-11-11 23:12:08,562 INFO     MainThread: Started query 58427e99dadb6ca9:0d184f2700000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:09,586 INFO     MainThread: Started query 1d498d1d50b86ed3:616ba0e600000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:09,592 INFO     MainThread: Started query 4543a32357f9b4cb:e3dbfab800000000
> -- executing against localhost:21000
> with x as (select * from supplier limit 10)
> select straight_join count(*)
> from
> x right anti join lineitem on s_suppkey + 100 = l_suppkey;
> -- 2020-11-11 23:12:09,599 INFO     MainThread: Started query 9e4ba8fd3dbbf2bd:d73094e700000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:10,320 INFO     MainThread: Started query 62471da179150bd1:ffa7dd7000000000
> -- executing against localhost:21000
> set mem_limit=75m;
> -- 2020-11-11 23:12:10,326 INFO     MainThread: Started query 8445afe55c819986:0915d3f400000000
> -- executing against localhost:21000
> select l_orderkey, group_concat(repeat(l_comment, 10)) comments
> from lineitem
> group by l_orderkey
> order by comments desc
> limit 5;
> -- 2020-11-11 23:12:10,332 INFO     MainThread: Started query d84ba6a837b98c8d:c7c73fd500000000
> -- executing against localhost:21000
> SET MEM_LIMIT="0";
> -- 2020-11-11 23:12:10,638 INFO     MainThread: Started query 58476b9f3b524901:bed8346900000000
> -- executing against localhost:21000
> set topn_bytes_limit=-1;
> -- 2020-11-11 23:12:10,642 INFO     MainThread: Started query 2742e503cfca610e:a4029e4800000000
> -- executing against localhost:21000
> set mem_limit=100m;
> -- 2020-11-11 23:12:10,648 INFO     MainThread: Started query 9a4f67b8c8a95465:67f8143100000000
> -- executing against localhost:21000
> select *
> from lineitem
> order by l_orderkey desc
> limit 6000000;
> -- 2020-11-11 23:12:10,654 INFO     MainThread: Started query 7147f6f62984fdb2:614cab6a00000000
> -- executing against localhost:21000
> SET TOPN_BYTES_LIMIT="536870912";
> -- 2020-11-11 23:12:10,859 INFO     MainThread: Started query 4642e316a7f90110:c8344b0d00000000
> -- executing against localhost:21000
> SET MEM_LIMIT="0";
> -- 2020-11-11 23:12:10,863 INFO     MainThread: Started query 414724e3a0a7f290:d1a0e69000000000
> -- executing against localhost:21000
> set mem_limit=250m;
> -- 2020-11-11 23:12:10,867 INFO     MainThread: Started query 854c0c0d1569c7f5:3635e5ae00000000
> -- executing against localhost:21000
> select straight_join *
> from supplier join /* +broadcast */ lineitem on s_suppkey = l_linenumber
> order by l_tax desc
> limit 5;
> -- 2020-11-11 23:12:10,873 INFO     MainThread: Started query de444e75ef44ee0b:a1614bfe00000000
> ~~~~~~~~~~~~~~~~~~~~~ Stack of <unknown> (140237730514688) ~~~~~~~~~~~~~~~~~~~~~
>   File "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", line 277, in _perform_spawn
>     reply.run()
>   File "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", line 213, in run
>     self._result = func(*args, **kwargs)
>   File "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", line 954, in _thread_receiver
>     msg = Message.from_io(io)
>   File "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", line 418, in from_io
>     header = io.read(9)  # type 1, channel 4, payload 4
>   File "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", line 386, in read
>     data = self._read(numbytes-len(buf))
> -- executing against localhost:21000
> SET MEM_LIMIT="0";
> -- 2020-11-12 01:12:03,717 INFO     MainThread: Started query 8d4eac3249e55996:2b2053a700000000
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org