Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/05/19 01:37:00 UTC

[jira] [Commented] (IMPALA-10704) test_retry_query_result_cacheing_failed and test_retry_query_set_query_in_flight_failed are flaky

    [ https://issues.apache.org/jira/browse/IMPALA-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347242#comment-17347242 ] 

ASF subversion and git services commented on IMPALA-10704:
----------------------------------------------------------

Commit d111443e8f9692a3eac734e565a5afc41980a0ba in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d111443 ]

IMPALA-10704: Fix retried query id not being unregistered when retry fails

When a query retry fails in RetryQueryFromThread(), the retried query id
may not be unregistered if the failure happens before we store the
retry_request_state. In this case, QueryDriver::Unregister() has no way
to get the retried query id, so it is not deleted. Note that the retried
query id is registered in RetryQueryFromThread(), so it should be deleted
later. This eventually results in a leak in the query driver map, where
the queries in it are shown as in-flight queries.

test_retry_query_result_cacheing_failed and
test_retry_query_set_query_in_flight_failed (added in IMPALA-10413)
assert one in-flight query at the end. This is satisfied by the leak.
Instead, we should verify that no queries are running at the end.
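
As a rough illustration of the intended check, here is a minimal stand-alone
sketch. FakeImpaladService and this simplified assert_eventually are
hypothetical stand-ins written for this example, not the real test-suite
code; only the method name get_num_in_flight_queries() is taken from the
tests quoted in the issue. The sketch polls until the in-flight count reaches
zero instead of accepting a lingering count of one.

```python
import time


class FakeImpaladService:
    """Hypothetical stand-in for the impalad service used by the tests."""

    def __init__(self, in_flight_samples):
        # Successive values returned by get_num_in_flight_queries().
        self._samples = list(in_flight_samples)

    def get_num_in_flight_queries(self):
        # Return the next sample; repeat the last one once the list is exhausted.
        if len(self._samples) > 1:
            return self._samples.pop(0)
        return self._samples[0]


def assert_eventually(timeout_s, period_s, condition):
    """Simplified polling helper: raise if condition() never becomes true."""
    deadline = time.monotonic() + timeout_s
    tries = 0
    while True:
        tries += 1
        if condition():
            return tries
        if time.monotonic() >= deadline:
            raise TimeoutError("Check failed after %d tries" % tries)
        time.sleep(period_s)


# The query is in flight at first, then the failed retry is fully cleaned up.
service = FakeImpaladService([1, 1, 0])
tries = assert_eventually(
    5, 0.01, lambda: service.get_num_in_flight_queries() == 0)
```

With a leaked entry the fake service would keep reporting one in-flight
query, and this check would time out rather than pass.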

This patch adds a new field in QueryDriver to remember the registered
retry query id as a fallback for retrieving it when the query retry fails
before we store the ClientRequestState of the retried query (so
retried_client_request_state_ is null).
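
The leak and the fallback can be sketched with a toy model. This is Python
for illustration only: the real QueryDriver is C++, and ToyQueryDriver, the
dict used as the registry, and the attribute name registered_retry_query_id
are all assumptions made for this example, not the names used in the patch.

```python
class ToyQueryDriver:
    """Toy model only; the real QueryDriver is C++ and far more involved."""

    def __init__(self, registry, query_id, remember_retry_id):
        self.registry = registry        # stands in for the query driver map
        self.query_id = query_id
        self.retried_state = None       # stands in for retried_client_request_state_
        self.registered_retry_query_id = None  # hypothetical name for the new field
        self.remember_retry_id = remember_retry_id  # False models pre-patch behavior
        registry[query_id] = self

    def retry(self, retry_query_id, fail_early):
        # The retried id is registered before the retried state is stored.
        self.registry[retry_query_id] = self
        if self.remember_retry_id:
            self.registered_retry_query_id = retry_query_id
        if fail_early:
            return False  # failure happens before the retried state is stored
        self.retried_state = {"query_id": retry_query_id}
        return True

    def unregister(self):
        self.registry.pop(self.query_id, None)
        if self.retried_state is not None:
            retry_id = self.retried_state["query_id"]
        else:
            # Fallback path; this is None when the id was never remembered.
            retry_id = self.registered_retry_query_id
        if retry_id is not None:
            self.registry.pop(retry_id, None)


def leaked_ids(remember_retry_id):
    """Return the ids left in the registry after an early retry failure."""
    registry = {}
    driver = ToyQueryDriver(registry, "q1", remember_retry_id)
    driver.retry("q1-retry", fail_early=True)
    driver.unregister()
    return sorted(registry)
```

In this model, leaked_ids(False) leaves the retried id behind in the
registry, while leaked_ids(True) leaves the registry empty.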

Tests:
 - Run test_retry_query_result_cacheing_failed and
   test_retry_query_set_query_in_flight_failed 100 times.

Change-Id: I074526799d68041a425b2379e74f8d8b45ce892a
Reviewed-on: http://gerrit.cloudera.org:8080/17465
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> test_retry_query_result_cacheing_failed and test_retry_query_set_query_in_flight_failed are flaky
> -------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10704
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10704
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>         Attachments: test_retry_query_result_cacheing_failed.png
>
>
> These two tests were added in IMPALA-10413. Saw the failures in
>  * https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13844/
>  * https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13878/
> The failures are
> {code}
> tests/custom_cluster/test_query_retries.py:761: in test_retry_query_result_cacheing_failed
>     lambda: self.cluster.get_first_impalad().service.get_num_in_flight_queries() == 1)
> tests/common/impala_test_suite.py:1128: in assert_eventually
>     count, timeout_s, error_msg_str))
> E   Timeout: Check failed to return True after 0 tries and 60 seconds
> tests/custom_cluster/test_query_retries.py:775: in test_retry_query_set_query_in_flight_failed
>     lambda: self.cluster.get_first_impalad().service.get_num_in_flight_queries() == 1)
> tests/common/impala_test_suite.py:1128: in assert_eventually
>     count, timeout_s, error_msg_str))
> E   Timeout: Check failed to return True after 0 tries and 60 seconds
> {code}
> Another problem: when I ran them manually on my machine, the original query hung in the RETRYING state. See the attached screenshot.
> The test code is problematic: it only expects one query to be running and does not wait for it to finish:
> {code:python}
>   @pytest.mark.execute_serially
>   @CustomClusterTestSuite.with_args(
>       impalad_args="--debug_actions=QUERY_RETRY_SET_RESULT_CACHE:FAIL",
>       statestored_args="--statestore_heartbeat_frequency_ms=60000")
>   def test_retry_query_result_cacheing_failed(self):
>     """Test setting up results cacheing failed."""
>     self.cluster.impalads[1].kill()
>     query = "select count(*) from tpch_parquet.lineitem"
>     self.hs2_client.set_configuration({'retry_failed_queries': 'true'})
>     self.hs2_client.set_configuration_option('impala.resultset.cache.size', '1024')
>     self.hs2_client.execute_async(query)
>     self.assert_eventually(60, 0.1, 
>         lambda: self.cluster.get_first_impalad().service.get_num_in_flight_queries() == 1)
> {code}


