You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/03/30 02:37:00 UTC

[jira] [Created] (IMPALA-10621) TestSpillingDebugActionDimensions.test_spilling_large_rows hanging

Quanlong Huang created IMPALA-10621:
---------------------------------------

             Summary: TestSpillingDebugActionDimensions.test_spilling_large_rows hanging
                 Key: IMPALA-10621
                 URL: https://issues.apache.org/jira/browse/IMPALA-10621
             Project: IMPALA
          Issue Type: Bug
            Reporter: Quanlong Huang


Saw a possible haning in test_spilling_large_rows in a downstream exhaustive release build. The query didn't finish in 12mins:
{code:java}
query_test/test_spilling.py:83: in test_spilling_large_rows
    self.run_test_case('QueryTest/spilling-large-rows', vector, unique_database)
common/impala_test_suite.py:665: in run_test_case
    result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:603: in __exec_in_impala
    result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:923: in __execute_query
    return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:205: in execute
    result = self.fetch_results(query_string, handle)
beeswax/impala_beeswax.py:452: in fetch_results
    exec_result = self.__fetch_results(query_handle, max_rows)
beeswax/impala_beeswax.py:463: in __fetch_results
    results = self.__do_rpc(lambda: self.imp_service.fetch(handle, False, fetch_rows))
beeswax/impala_beeswax.py:520: in __do_rpc
    raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
E    MESSAGE: Query b94d5cd2894cbe68:68acfd6300000000 cancelled due to unresponsive backend: 127.0.0.1:27000 has not sent a report in 753530ms (max allowed lag is 720000ms) {code}
impalad.INFO indicates it's waiting for the coordinator fragment to finish:
{code:java}
I0325 23:54:16.146754 16521 Frontend.java:1600] b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr db: test_spilling_large_rows_6df9f183
I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms
I0325 23:54:16.147109 16521 Frontend.java:1643] b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished.
I0325 23:54:16.148593 20605 admission-controller.cc:1747] b94d5cd2894cbe68:68acfd6300000000] Trying to admit id=b94d5cd2894cbe68:68acfd6300000000 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=309.98 MB dedicated_coord_mem_estimate=134.99 MB max_requests=-1 max_queued=200 max_mem=-1.00 B
I0325 23:54:16.148609 20605 admission-controller.cc:1755] b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=1.00 GB,  local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: queries=[3e42ce0b92a72227:b4c127a000000000, 0f472bfdce173e65:1920e85d00000000, cc48be51cfecde3a:c93d09f600000000], total_mem_consumed=0; pool_level_stats: num_running=3, min=0, max=0, pool_total_mem=0, average_per_query=0)
I0325 23:54:16.148628 20605 admission-controller.cc:1215] b94d5cd2894cbe68:68acfd6300000000] Admitting query id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.148648 20605 impala-server.cc:2150] b94d5cd2894cbe68:68acfd6300000000] Registering query locations
I0325 23:54:16.148655 20605 coordinator.cc:150] b94d5cd2894cbe68:68acfd6300000000] Exec() query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr
I0325 23:54:16.148877 20605 coordinator.cc:474] b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.149272 27701 control-service.cc:148] b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): query_id=b94d5cd2894cbe68:68acfd6300000000 coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 #instances=3
I0325 23:54:16.149561 20605 coordinator.cc:533] b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.149663 20609 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000002] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1
I0325 23:54:16.149741 20611 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000000] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
I0325 23:54:16.149821 20613 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000004] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3
I0325 23:54:16.795727 20609 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000002] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK
I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2
I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000
I0325 23:54:16.968387 20613 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000004] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK
I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4
I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query id=b94d5cd2894cbe68:68acfd6300000000 execution completed
I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for backends to finish, 1 remaining. query_id=b94d5cd2894cbe68:68acfd6300000000{code}
After 12mins, it reaches the timeout limit:
{code:java}
I0326 00:06:54.688612 28694 status.cc:82] Query b94d5cd2894cbe68:68acfd6300000000 cancelled due to unresponsive backend: 127.0.0.1:27000 has not sent a report in 753530ms (max allowed lag is 720000ms)
    @           0xc93371
    @          0x1398edc
    @          0x138fd14
    @          0x1596d91
    @          0x159820a
    @          0x1e20f11
    @     0x7f57d79ede24
    @     0x7f57d443734c
I0326 00:06:54.688832 28683 impala-server.cc:1875] CancelFromThreadPool(): cancelling query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.688863 28683 coordinator-backend-state.cc:974] query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: Sending CancelQueryFInstances rpc
I0326 00:06:54.688975 27708 control-service.cc:219] CancelQueryFInstances(): query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.688988 27708 query-exec-mgr.cc:126] QueryState: query_id=b94d5cd2894cbe68:68acfd6300000000 refcnt=4
I0326 00:06:54.688993 27708 query-state.cc:974] Cancel: query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689061 28683 coordinator-backend-state.cc:974] query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: Not cancelling because the backend is already done:
I0326 00:06:54.689071 28683 coordinator.cc:994] CancelBackends() query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 1 backends
I0326 00:06:54.689091 16521 coordinator.cc:1370] Release admission control resources for query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689462 16521 impala-server.cc:1408] UnregisterQuery(): query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689471 16521 coordinator-backend-state.cc:974] query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: Not cancelling because the backend is already done: Cancelled
I0326 00:06:54.689476 16521 coordinator-backend-state.cc:974] query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: Not cancelling because the backend is already done:
I0326 00:06:54.689479 16521 coordinator.cc:994] CancelBackends() query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 0 backends
{code}
This seems similar to IMPALA-10339.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org