You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Wenzhe Zhou (Jira)" <ji...@apache.org> on 2021/03/29 06:51:00 UTC

[jira] [Commented] (IMPALA-10339) Apparent hang or crash in TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action

    [ https://issues.apache.org/jira/browse/IMPALA-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310452#comment-17310452 ] 

Wenzhe Zhou commented on IMPALA-10339:
--------------------------------------

Saw this test failure in upstream exhaustive release build. But did not see DCHECK failure.

Here are the log messages for the query:

I0327 12:57:36.536957 84973 impala-server.cc:1267] Found local timezone "Universal".
I0327 12:57:36.539453 84973 impala-server.cc:1324] 30408ca68c884cb5:cd01ce9900000000] Registered query query_id=30408ca68c884cb5:
cd01ce9900000000 session_id=454a70d96fb68f97:1eab40aaba24b7ab
I0327 12:57:36.539530 84973 Frontend.java:1600] 30408ca68c884cb5:cd01ce9900000000] Analyzing query: select straight_join *
from supplier join /* +broadcast */ lineitem on s_suppkey = l_linenumber
order by l_tax desc
limit 5 db: tpch_parquet
I0327 12:57:36.540146 84973 BaseAuthorizationChecker.java:110] 30408ca68c884cb5:cd01ce9900000000] Authorization check took 0 ms
I0327 12:57:36.540176 84973 Frontend.java:1643] 30408ca68c884cb5:cd01ce9900000000] Analysis and authorization finished.
I0327 12:57:36.542649 89961 admission-controller.cc:1747] 30408ca68c884cb5:cd01ce9900000000] Trying to admit id=30408ca68c884cb5:cd01ce9900000000 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=3.05 GB dedicated_coord_mem_estimate=103.02 MB max_requests=-1 max_queued=200 max_mem=-1.00 B
I0327 12:57:36.542666 89961 admission-controller.cc:1755] 30408ca68c884cb5:cd01ce9900000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=300.00 MB, local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=100.00 MB, topN_query_stats: queries=[624b8dae6e0a68f5:837bd0c300000000], total_mem_consumed=66.04 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=1, min=66.04 MB, max=66.04 MB, pool_total_mem=66.04 MB, average_per_query=66.04 MB)
I0327 12:57:36.542691 89961 admission-controller.cc:1215] 30408ca68c884cb5:cd01ce9900000000] Admitting query id=30408ca68c884cb5:cd01ce9900000000
I0327 12:57:36.542716 89961 impala-server.cc:2150] 30408ca68c884cb5:cd01ce9900000000] Registering query locations
I0327 12:57:36.542724 89961 coordinator.cc:150] 30408ca68c884cb5:cd01ce9900000000] Exec() query_id=30408ca68c884cb5:cd01ce9900000000 stmt=select straight_join *
from supplier join /* +broadcast */ lineitem on s_suppkey = l_linenumber
order by l_tax desc
limit 5
I0327 12:57:36.542918 89961 coordinator.cc:474] 30408ca68c884cb5:cd01ce9900000000] starting execution on 3 backends for query_id=30408ca68c884cb5:cd01ce9900000000
I0327 12:57:36.543314 110481 control-service.cc:148] 30408ca68c884cb5:cd01ce9900000000] ExecQueryFInstances(): query_id=30408ca68c884cb5:cd01ce9900000000 coord=ip-172-31-45-110:27000 #instances=2
I0327 12:57:36.543737 89966 query-state.cc:948] 30408ca68c884cb5:cd01ce9900000002] Executing instance. instance_id=30408ca68c884cb5:cd01ce9900000002 fragment_idx=2 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1
I0327 12:57:36.543826 89967 query-state.cc:948] 30408ca68c884cb5:cd01ce9900000000] Executing instance. instance_id=30408ca68c884cb5:cd01ce9900000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
I0327 12:57:36.543896 89961 coordinator.cc:533] 30408ca68c884cb5:cd01ce9900000000] started execution on 3 backends for query_id=30408ca68c884cb5:cd01ce9900000000
I0327 12:57:37.818672 110514 krpc-data-stream-mgr.cc:427] Reduced stream ID cache from 1711 items, to 1672, eviction took: 0
I0327 12:57:39.287169 89966 query-state.cc:957] 30408ca68c884cb5:cd01ce9900000002] Instance completed. instance_id=30408ca68c884cb5:cd01ce9900000002 #in-flight=1 status=OK
I0327 12:57:39.654379 110481 coordinator.cc:1027] Backend completed: host=ip-172-31-45-110:27001 remaining=3 query_id=30408ca68c884cb5:cd01ce9900000000
I0327 12:57:39.654438 110481 coordinator-backend-state.cc:371] query_id=30408ca68c884cb5:cd01ce9900000000: first in-progress backend: ip-172-31-45-110:27000

I0327 12:57:41.559049 110480 coordinator.cc:1027] Backend completed: host=ip-172-31-45-110:27002 remaining=2 query_id=30408ca68c884cb5:cd01ce9900000000
I0327 12:57:41.559067 110480 coordinator-backend-state.cc:371] query_id=30408ca68c884cb5:cd01ce9900000000: first in-progress backend: ip-172-31-45-110:27000

 

I0327 14:57:39.607128 70664 admission-controller.cc:1755] c345af709d791f23:c8c25d4900000000] Stats: agg_num_running=1, agg_num_qu
eued=0, agg_mem_reserved=750.00 MB, local_host(local_mem_admitted=250.00 MB, num_admitted_running=1, num_queued=0, backend_mem_r
eserved=500.00 MB, topN_query_stats: queries=[30408ca68c884cb5:cd01ce9900000000, cc46768581aaf7c5:b78131d600000000], total_mem_co
nsumed=2.02 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=2, min=1.01 MB, max=1.01 MB, pool_total_mem=2.02 MB, average_per_query=1.01 MB)

 

I0327 14:57:40.631008 84973 impala-server.cc:1408] UnregisterQuery(): query_id=30408ca68c884cb5:cd01ce9900000000
I0327 14:57:40.631021 84973 coordinator.cc:706] ExecState: query id=30408ca68c884cb5:cd01ce9900000000 execution cancelled
I0327 14:57:40.631072 84973 coordinator-backend-state.cc:974] query_id=30408ca68c884cb5:cd01ce9900000000 target backend=127.0.0.1
:27000: Sending CancelQueryFInstances rpc
I0327 14:57:40.631183 110480 control-service.cc:219] CancelQueryFInstances(): query_id=30408ca68c884cb5:cd01ce9900000000
I0327 14:57:40.631191 110480 query-exec-mgr.cc:126] QueryState: query_id=30408ca68c884cb5:cd01ce9900000000 refcnt=4
I0327 14:57:40.631196 110480 query-state.cc:974] Cancel: query_id=30408ca68c884cb5:cd01ce9900000000
I0327 14:57:40.631202 110480 krpc-data-stream-mgr.cc:339] cancelling active streams for query_id=30408ca68c884cb5:cd01ce9900000000


I0327 14:57:40.631266 89967 krpc-data-stream-mgr.cc:308] 30408ca68c884cb5:cd01ce9900000000] DeregisterRecvr(): fragment_instance_
id=30408ca68c884cb5:cd01ce9900000000, node=5
I0327 14:57:40.631292 84973 coordinator-backend-state.cc:974] query_id=30408ca68c884cb5:cd01ce9900000000 target backend=127.0.0.1
:27002: Not cancelling because the backend is already done: 
I0327 14:57:40.631312 84973 coordinator-backend-state.cc:974] query_id=30408ca68c884cb5:cd01ce9900000000 target backend=127.0.0.1
:27001: Not cancelling because the backend is already done: 
I0327 14:57:40.631319 84973 coordinator.cc:994] CancelBackends() query_id=30408ca68c884cb5:cd01ce9900000000, tried to cancel 1 ba
ckends
I0327 14:57:40.631317 89967 query-state.cc:957] 30408ca68c884cb5:cd01ce9900000000] Instance completed. instance_id=30408ca68c884c
b5:cd01ce9900000000 #in-flight=0 status=CANCELLED: Cancelled
I0327 14:57:40.631340 84973 coordinator.cc:1370] Release admission control resources for query_id=30408ca68c884cb5:cd01ce99000000
00
I0327 14:57:40.631336 89963 query-state.cc:468] 30408ca68c884cb5:cd01ce9900000000] UpdateBackendExecState(): last report for 3040
8ca68c884cb5:cd01ce9900000000

 

I0327 14:57:40.635918 110480 impala-server.cc:1571] Invalid or unknown query handle: 30408ca68c884cb5:cd01ce9900000000.
I0327 14:57:40.635933 110480 control-service.cc:179] ReportExecStatus(): Received report for unknown query ID (probably closed or
 cancelled): 30408ca68c884cb5:cd01ce9900000000 remote host=127.0.0.1:57762
I0327 14:57:40.636023 89963 query-state.cc:738] 30408ca68c884cb5:cd01ce9900000000] Cancelling fragment instances as directed by t
he coordinator. Returned status: ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 30408ca
68c884cb5:cd01ce9900000000 remote host=127.0.0.1:57762
I0327 14:57:40.636039 89963 query-state.cc:974] 30408ca68c884cb5:cd01ce9900000000] Cancel: query_id=30408ca68c884cb5:cd01ce990000
0000
I0327 14:57:40.636209 111580 impala-server.cc:1440] Query successfully unregistered: query_id=30408ca68c884cb5:cd01ce9900000000


I0327 14:57:40.640755 89963 query-exec-mgr.cc:213] 30408ca68c884cb5:cd01ce9900000000] ReleaseQueryState(): deleted query_id=30408
ca68c884cb5:cd01ce9900000000

 

> Apparent hang or crash in TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action
> -------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10339
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10339
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Tim Armstrong
>            Assignee: Wenzhe Zhou
>            Priority: Blocker
>              Labels: broken-build, flaky, hang
>
> Release build with this commit as the tip:
> {noformat}
> commit 9400e9b17b13f613defb6d7b9deb471256b1d95c (CDH/cdpd-master-staging)
> Author: wzhou-code <wz...@cloudera.com>
> Date:   Thu Oct 29 22:32:03 2020 -0700
>     IMPALA-10305: Sync Kudu's FIPS compliant changes
>     
> {noformat}
> {noformat}
> Regression
> query_test.test_spilling.TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action[protocol: beeswax | exec_option: {'mt_dop': 0, 'default_spillable_buffer_size': '64k'} | table_format: parquet/none] (from pytest)
> Failing for the past 1 build (Since Failed#100 )
> Took 1 hr 59 min.
> add description
> Error Message
> query_test/test_spilling.py:134: in test_spilling_no_debug_action     self.run_test_case('QueryTest/spilling-no-debug-action', vector) common/impala_test_suite.py:668: in run_test_case     self.__verify_exceptions(test_section['CATCH'], str(e), use_db) common/impala_test_suite.py:485: in __verify_exceptions     (expected_str, actual_str) E   AssertionError: Unexpected exception string. Expected: row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did not reduce the size of a spilled partition.* E   Not found in actual: Timeout >7200s
> Stacktrace
> query_test/test_spilling.py:134: in test_spilling_no_debug_action
>     self.run_test_case('QueryTest/spilling-no-debug-action', vector)
> common/impala_test_suite.py:668: in run_test_case
>     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
> common/impala_test_suite.py:485: in __verify_exceptions
>     (expected_str, actual_str)
> E   AssertionError: Unexpected exception string. Expected: row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did not reduce the size of a spilled partition.*
> E   Not found in actual: Timeout >7200s
> Standard Error
> SET client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none];
> -- executing against localhost:21000
> use tpch_parquet;
> -- 2020-11-11 23:12:04,319 INFO     MainThread: Started query c740c1c66d9679a9:6a40f16100000000
> SET client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none];
> SET mt_dop=0;
> SET default_spillable_buffer_size=64k;
> -- 2020-11-11 23:12:04,320 INFO     MainThread: Loading query test file: /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
> -- 2020-11-11 23:12:04,323 INFO     MainThread: Starting new HTTP connection (1): localhost
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:04,377 INFO     MainThread: Started query c044afcf5ae44df9:a2e2e7c600000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> lineitem a, lineitem b
> where
> a.l_partkey = 1 and
> a.l_orderkey = b.l_orderkey;
> -- 2020-11-11 23:12:04,385 INFO     MainThread: Started query 314c019cd252f322:2411bc7600000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:05,199 INFO     MainThread: Started query 80424e68922c30f9:b2144dff00000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:05,207 INFO     MainThread: Started query 2a4c1f4b26ea52da:4339f3ff00000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> lineitem a
> where
> a.l_partkey not in (select l_partkey from lineitem where l_partkey > 10)
> and a.l_partkey < 1000;
> -- 2020-11-11 23:12:05,215 INFO     MainThread: Started query f845afd00a569446:79c5054a00000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:07,507 INFO     MainThread: Started query ee4f8a685928e7ef:830d965100000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:07,512 INFO     MainThread: Started query 654a6eced9594931:cc68289a00000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> supplier right outer join lineitem on s_suppkey = l_suppkey
> where s_acctbal > 0 and s_acctbal < 10;
> -- 2020-11-11 23:12:07,519 INFO     MainThread: Started query 7a41a406b446e082:b5e3bf2f00000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:08,549 INFO     MainThread: Started query 1445a2833895ee1a:d136681000000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:08,554 INFO     MainThread: Started query 4149ef276c643426:a285156400000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> supplier right outer join lineitem on s_suppkey = l_suppkey
> where s_acctbal > 0 and s_acctbal < 10;
> -- 2020-11-11 23:12:08,562 INFO     MainThread: Started query 58427e99dadb6ca9:0d184f2700000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:09,586 INFO     MainThread: Started query 1d498d1d50b86ed3:616ba0e600000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:09,592 INFO     MainThread: Started query 4543a32357f9b4cb:e3dbfab800000000
> -- executing against localhost:21000
> with x as (select * from supplier limit 10)
> select straight_join count(*)
> from
> x right anti join lineitem on s_suppkey + 100 = l_suppkey;
> -- 2020-11-11 23:12:09,599 INFO     MainThread: Started query 9e4ba8fd3dbbf2bd:d73094e700000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:10,320 INFO     MainThread: Started query 62471da179150bd1:ffa7dd7000000000
> -- executing against localhost:21000
> set mem_limit=75m;
> -- 2020-11-11 23:12:10,326 INFO     MainThread: Started query 8445afe55c819986:0915d3f400000000
> -- executing against localhost:21000
> select l_orderkey, group_concat(repeat(l_comment, 10)) comments
> from lineitem
> group by l_orderkey
> order by comments desc
> limit 5;
> -- 2020-11-11 23:12:10,332 INFO     MainThread: Started query d84ba6a837b98c8d:c7c73fd500000000
> -- executing against localhost:21000
> SET MEM_LIMIT="0";
> -- 2020-11-11 23:12:10,638 INFO     MainThread: Started query 58476b9f3b524901:bed8346900000000
> -- executing against localhost:21000
> set topn_bytes_limit=-1;
> -- 2020-11-11 23:12:10,642 INFO     MainThread: Started query 2742e503cfca610e:a4029e4800000000
> -- executing against localhost:21000
> set mem_limit=100m;
> -- 2020-11-11 23:12:10,648 INFO     MainThread: Started query 9a4f67b8c8a95465:67f8143100000000
> -- executing against localhost:21000
> select *
> from lineitem
> order by l_orderkey desc
> limit 6000000;
> -- 2020-11-11 23:12:10,654 INFO     MainThread: Started query 7147f6f62984fdb2:614cab6a00000000
> -- executing against localhost:21000
> SET TOPN_BYTES_LIMIT="536870912";
> -- 2020-11-11 23:12:10,859 INFO     MainThread: Started query 4642e316a7f90110:c8344b0d00000000
> -- executing against localhost:21000
> SET MEM_LIMIT="0";
> -- 2020-11-11 23:12:10,863 INFO     MainThread: Started query 414724e3a0a7f290:d1a0e69000000000
> -- executing against localhost:21000
> set mem_limit=250m;
> -- 2020-11-11 23:12:10,867 INFO     MainThread: Started query 854c0c0d1569c7f5:3635e5ae00000000
> -- executing against localhost:21000
> select straight_join *
> from supplier join /* +broadcast */ lineitem on s_suppkey = l_linenumber
> order by l_tax desc
> limit 5;
> -- 2020-11-11 23:12:10,873 INFO     MainThread: Started query de444e75ef44ee0b:a1614bfe00000000
> ~~~~~~~~~~~~~~~~~~~~~ Stack of <unknown> (140237730514688) ~~~~~~~~~~~~~~~~~~~~~
>   File "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", line 277, in _perform_spawn
>     reply.run()
>   File "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", line 213, in run
>     self._result = func(*args, **kwargs)
>   File "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", line 954, in _thread_receiver
>     msg = Message.from_io(io)
>   File "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", line 418, in from_io
>     header = io.read(9)  # type 1, channel 4, payload 4
>   File "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", line 386, in read
>     data = self._read(numbytes-len(buf))
> -- executing against localhost:21000
> SET MEM_LIMIT="0";
> -- 2020-11-12 01:12:03,717 INFO     MainThread: Started query 8d4eac3249e55996:2b2053a700000000
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org