Posted to issues-all@impala.apache.org by "Fang-Yu Rao (Jira)" <ji...@apache.org> on 2023/06/23 05:35:00 UTC

[jira] [Updated] (IMPALA-12235) test_multiple_coordinator() failed because _start_impala_cluster() returned non-zero exit status

     [ https://issues.apache.org/jira/browse/IMPALA-12235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Yu Rao updated IMPALA-12235:
---------------------------------
    Description: 
We found that test_multiple_coordinators() could fail because [_start_impala_cluster()|https://github.com/apache/impala/blame/master/tests/common/custom_cluster_test_suite.py#L283] failed: the start-impala-cluster.py command it runs returned a non-zero exit status. test_multiple_coordinators() calls _start_impala_cluster() at https://github.com/apache/impala/blame/master/tests/custom_cluster/test_coordinators.py#L41C10-L41C31.

*Error Message*
{code:java}
CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50     --statestore_priority_update_frequency_ms=50     --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=2', '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero exit status 1
{code}
*Stacktrace*
{code:java}
custom_cluster/test_coordinators.py:41: in test_multiple_coordinators
    self._start_impala_cluster([], num_coordinators=2, cluster_size=3)
common/custom_cluster_test_suite.py:330: in _start_impala_cluster
    check_call(cmd + options, close_fds=True)
/data/jenkins/workspace/impala-asf-master-core-erasure-coding/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190: in check_call
    raise CalledProcessError(retcode, cmd)
E   CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50     --statestore_priority_update_frequency_ms=50     --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=2', '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero exit status 1
{code}
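
For readers unfamiliar with this failure mode: the stacktrace above shows check_call() raising CalledProcessError because the child process exited with status 1. The following is a minimal sketch of that behavior only. It uses a stand-in child process (the current Python interpreter exiting with status 1) rather than start-impala-cluster.py, and run_and_report() is a hypothetical helper name that does not exist in the Impala test suite.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return its exit status.

    Hypothetical helper for illustration; not part of the Impala tests.
    """
    try:
        # check_call returns 0 on success and raises CalledProcessError
        # when the child exits with a non-zero status.
        subprocess.check_call(cmd, close_fds=True)
        return 0
    except subprocess.CalledProcessError as e:
        # The exception carries the child's exit status and the command.
        return e.returncode


# Stand-in for the cluster start script: a child that exits with status 1.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
status = run_and_report(failing_cmd)
print("exit status:", status)
```

In the test itself the exception is not caught, so a non-zero exit status from the start script surfaces directly as the test failure.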

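The following console output shows that 'num_known_live_backends' reached 3 on the impalad at port 25000 but did not reach 3 within 4 minutes on the impalad at port 25001 (it was still 2 at the last check), so the command that starts the cluster failed with a non-zero exit status.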
{code}
-- 2023-06-21 20:54:40,594 INFO     MainThread: Starting cluster with command: /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py '--state_store_args=--statestore_update_frequency_ms=50     --statestore_priority_update_frequency_ms=50     --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=2 --log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests --log_level=1 --impalad_args=--default_query_options=
20:54:41 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
20:54:41 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO
20:54:42 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad.INFO
20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
20:54:46 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:46 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:46 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
20:54:46 MainThread: Waiting for num_known_live_backends=3. Current value: 1
20:54:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:47 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
20:54:47 MainThread: Waiting for num_known_live_backends=3. Current value: 1
20:54:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
20:54:48 MainThread: num_known_live_backends has reached value: 3
20:54:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25001
20:54:48 MainThread: Waiting for num_known_live_backends=3. Current value: 2
...
20:58:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:58:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25001
20:58:48 MainThread: Waiting for num_known_live_backends=3. Current value: 2
20:58:49 MainThread: Error starting cluster
Traceback (most recent call last):
  File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py", line 931, in <module>
    expected_cluster_size - expected_catalog_delays)
  File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/common/impala_cluster.py", line 205, in wait_until_ready
    early_abort_fn=check_processes_still_running)
  File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/common/impala_service.py", line 374, in wait_for_num_known_live_backends
    assert 0, 'num_known_live_backends did not reach expected value in time'
AssertionError: num_known_live_backends did not reach expected value in time
-- 2023-06-21 20:58:49,141 DEBUG    MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
{code}
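
When triaging similar output, it may help to summarize the last value each polled endpoint reported. The sketch below is one possible way to do that, assuming the message shapes seen in the console output above. last_seen_backends() is a hypothetical helper, and the "host:25000" and "host:25001" endpoints in the sample are shortened placeholders, not the real hostnames.

```python
import re

# Patterns for the message shapes that appear in the console output.
GETTING = re.compile(r"Getting num_known_live_backends from (\S+)")
WAITING = re.compile(
    r"Waiting for num_known_live_backends=(\d+)\. Current value: (\d+)")
REACHED = re.compile(r"num_known_live_backends has reached value: (\d+)")


def last_seen_backends(lines):
    """Map each polled endpoint to the last value logged for it."""
    result = {}
    endpoint = None
    for line in lines:
        m = GETTING.search(line)
        if m:
            # Remember which endpoint the next status line refers to.
            endpoint = m.group(1)
            continue
        if endpoint is None:
            continue
        m = WAITING.search(line)
        if m:
            result[endpoint] = int(m.group(2))
            continue
        m = REACHED.search(line)
        if m:
            result[endpoint] = int(m.group(1))
    return result


# Shortened sample; hostnames are placeholders for illustration.
sample = [
    "20:54:46 MainThread: Getting num_known_live_backends from host:25000",
    "20:54:46 MainThread: Waiting for num_known_live_backends=3. Current value: 1",
    "20:54:48 MainThread: Getting num_known_live_backends from host:25000",
    "20:54:48 MainThread: num_known_live_backends has reached value: 3",
    "20:54:48 MainThread: Getting num_known_live_backends from host:25001",
    "20:58:48 MainThread: Waiting for num_known_live_backends=3. Current value: 2",
]
summary = last_seen_backends(sample)
print(summary)
```

An endpoint whose last value is below the expected cluster size is the one to inspect first in the corresponding impalad log.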


> test_multiple_coordinator() failed because _start_impala_cluster() returned non-zero exit status
> ------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12235
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12235
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Fang-Yu Rao
>            Assignee: Wenzhe Zhou
>            Priority: Major
>              Labels: broken-build


