You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Csaba Ringhofer (Jira)" <ji...@apache.org> on 2020/03/26 18:43:00 UTC

[jira] [Commented] (IMPALA-8989) TestAdmissionController.test_release_backend is flaky

    [ https://issues.apache.org/jira/browse/IMPALA-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067943#comment-17067943 ] 

Csaba Ringhofer commented on IMPALA-8989:
-----------------------------------------

Found test_release_backend  broken again. One of the impalads crashed with SIGABRT. The callstack wasn't too helpful - the only thing I could understand was impalad!_fini, which suggests that this occurred during exit.
{code}
Crash reason:  SIGABRT
Crash address: 0x7d100002437
Process uptime: not available

Thread 0 (crashed)
 0  libc-2.17.so + 0x351f7
    rax = 0x0000000000000000   rdx = 0x0000000000000006
    rcx = 0xffffffffffffffff   rbx = 0x00000000136b85e8
    rsi = 0x0000000000002437   rdi = 0x0000000000002437
    rbp = 0x0000000007289eb0   rsp = 0x00007ffc919f9d98
     r8 = 0x000000000000000a    r9 = 0x00007f341d8889c0
    r10 = 0x0000000000000008   r11 = 0x0000000000000202
    r12 = 0x0000000014592780   r13 = 0x00007ffc919fa000
    r14 = 0x0000000000000000   r15 = 0x0000000000000000
    rip = 0x00007f3419d0a1f7
    Found by: given as instruction pointer in context
 1  libc-2.17.so + 0x368e8
    rsp = 0x00007ffc919f9da0   rip = 0x00007f3419d0b8e8
    Found by: stack scanning
 2  libstdc++.so.6.0.20 + 0xbf198
    rsp = 0x00007ffc919f9e50   rip = 0x00007f341a670198
    Found by: stack scanning
 3  libc-2.17.so + 0x765fd
    rsp = 0x00007ffc919f9e60   rip = 0x00007f3419d4b5fd
    Found by: stack scanning
 4  libc-2.17.so + 0x765fd
    rsp = 0x00007ffc919f9e70   rip = 0x00007f3419d4b5fd
    Found by: stack scanning
 5  libstdc++.so.6.0.20 + 0x5fd2d
    rsp = 0x00007ffc919f9ed0   rip = 0x00007f341a610d2d
    Found by: stack scanning
 6  libstdc++.so.6.0.20 + 0x5dd86
    rsp = 0x00007ffc919f9f00   rip = 0x00007f341a60ed86
    Found by: stack scanning
 7  libstdc++.so.6.0.20 + 0x5ce79
    rsp = 0x00007ffc919f9f10   rip = 0x00007f341a60de79
    Found by: stack scanning
 8  libstdc++.so.6.0.20 + 0x5d5db
    rsp = 0x00007ffc919f9f20   rip = 0x00007f341a60e5db
    Found by: stack scanning
 9  impalad!_fini + 0x16522e4
    rsp = 0x00007ffc919f9f40   rip = 0x0000000006733c54
    Found by: stack scanning
10  impalad!_fini + 0x1d49238
    rsp = 0x00007ffc919f9f60   rip = 0x0000000006e2aba8
    Found by: stack scanning
11  impalad!_fini + 0x1d4924d
    rsp = 0x00007ffc919f9f68   rip = 0x0000000006e2abbd
    Found by: stack scanning
12  impalad!_fini + 0x1cb973b
    rsp = 0x00007ffc919f9f88   rip = 0x0000000006d9b0ab
    Found by: stack scanning
13  libgcc_s.so.1 + 0xffa3
    rsp = 0x00007ffc919fa000   rip = 0x00007f341a0a7fa3
    Found by: stack scanning
{code}

The logs around the error:
{code}
05:53:54.959 ERROR custom_cluster/test_admission_controller.py::TestAdmissionController::()::test_release_backends[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
05:53:54.960 ==================================== ERRORS ====================================
05:53:54.960  ERROR at setup of TestAdmissionController.test_release_backends[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
05:53:54.975 common/custom_cluster_test_suite.py:190: in setup_method
05:53:54.976     self._start_impala_cluster(cluster_args, **kwargs)
05:53:54.976 common/custom_cluster_test_suite.py:307: in _start_impala_cluster
05:53:54.976     check_call(cmd + options, close_fds=True)
05:53:54.976 /usr/lib64/python2.7/subprocess.py:542: in check_call
05:53:54.976     raise CalledProcessError(retcode, cmd)
05:53:54.976 E   CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50     --statestore_priority_update_frequency_ms=50     --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=1', '--log_dir=/data/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--use_exclusive_coordinators', '--state_store_args=None ', '--impalad_args=--default_query_options=']' returned non-zero exit status 1
05:53:54.977 ---------------------------- Captured stderr setup -----------------------------
05:53:54.978 01:47:32 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
05:53:54.978 01:47:32 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/custom_cluster_tests/statestored.INFO
05:53:54.978 01:47:32 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
05:53:54.978 01:47:32 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/custom_cluster_tests/impalad.INFO
05:53:54.979 01:47:32 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
05:53:54.979 01:47:32 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
05:53:54.979 01:47:35 MainThread: Found 4 impalad/1 statestored/1 catalogd process(es)
05:53:54.979 01:47:35 MainThread: Found 4 impalad/1 statestored/1 catalogd process(es)
05:53:54.980 01:47:35 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-081b.vpc.cloudera.com:25000
05:53:54.980 01:47:35 MainThread: 'backends'
05:53:54.980 01:47:35 MainThread: Waiting for num_known_live_backends=3. Current value: None
05:53:54.980 01:47:36 MainThread: Found 2 impalad/1 statestored/1 catalogd process(es)
05:53:54.980 01:47:36 MainThread: Error starting cluster
05:53:54.980 Traceback (most recent call last):
05:53:54.980   File "/data/jenkins/workspace/impala-asf-master-core/repos/Impala/bin/start-impala-cluster.py", line 770, in <module>
05:53:54.981     expected_cluster_size - expected_catalog_delays)
05:53:54.981   File "/data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/common/impala_cluster.py", line 186, in wait_until_ready
05:53:54.981     early_abort_fn=check_processes_still_running)
05:53:54.981   File "/data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/common/impala_service.py", line 267, in wait_for_num_known_live_backends
05:53:54.981     early_abort_fn()
05:53:54.981   File "/data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/common/impala_cluster.py", line 178, in check_processes_still_running
05:53:54.982     assert len(self.impalads) >= expected_num_impalads
05:53:54.982 AssertionError
05:53:54.982 DEBUG:impala_cluster:Found 2 impalad/1 statestored/1 catalogd process(es)
05:53:54.982  226 passed, 90 skipped, 8 xfailed, 2 pytest-warnings, 1 error in 6498.61 seconds 
05:53:55.622 ERROR in /data/jenkins/workspace/impala-asf-master-core/repos/Impala/tests/run-custom-cluster-tests.sh at line 49: impala-py.test "${ARGS[@]}"
{code}

> TestAdmissionController.test_release_backend is flaky
> -----------------------------------------------------
>
>                 Key: IMPALA-8989
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8989
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Michael Ho
>            Assignee: Sahil Takiar
>            Priority: Major
>
> Test seems to have failed due to a mismatch in number of completed backend. Only failed once so far.
> {noformat}
> Error Message
> assert 'NumCompletedBackends: 1 (1)' in 'Query (id=c341f85f14c3d981:5827484900000000):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil... - OptimizationTime: 372.020ms\n           - PeakMemoryUsage: 528.00 KB (540672)\n           - PrepareTime: 39.002ms\n'  +  where 'Query (id=c341f85f14c3d981:5827484900000000):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil... - OptimizationTime: 372.020ms\n           - PeakMemoryUsage: 528.00 KB (540672)\n           - PrepareTime: 39.002ms\n' = <bound method BeeswaxConnection.get_runtime_profile of <tests.common.impala_connection.BeeswaxConnection object at 0x5323510>>(<tests.common.impala_connection.OperationHandle object at 0x5462cd0>)  +    where <bound method BeeswaxConnection.get_runtime_profile of <tests.common.impala_connection.BeeswaxConnection object at 0x5323510>> = <tests.common.impala_connection.BeeswaxConnection object at 0x5323510>.get_runtime_profile  +      where <tests.common.impala_connection.BeeswaxConnection object at 0x5323510> = <test_admission_controller.TestAdmissionController object at 0x51f5fd0>.client
> Stacktrace
> custom_cluster/test_admission_controller.py:1338: in test_release_backends
>     assert "NumCompletedBackends: 1 (1)" in self.client.get_runtime_profile(handle)
> E   assert 'NumCompletedBackends: 1 (1)' in 'Query (id=c341f85f14c3d981:5827484900000000):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil... - OptimizationTime: 372.020ms\n           - PeakMemoryUsage: 528.00 KB (540672)\n           - PrepareTime: 39.002ms\n'
> E    +  where 'Query (id=c341f85f14c3d981:5827484900000000):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil... - OptimizationTime: 372.020ms\n           - PeakMemoryUsage: 528.00 KB (540672)\n           - PrepareTime: 39.002ms\n' = <bound method BeeswaxConnection.get_runtime_profile of <tests.common.impala_connection.BeeswaxConnection object at 0x5323510>>(<tests.common.impala_connection.OperationHandle object at 0x5462cd0>)
> E    +    where <bound method BeeswaxConnection.get_runtime_profile of <tests.common.impala_connection.BeeswaxConnection object at 0x5323510>> = <tests.common.impala_connection.BeeswaxConnection object at 0x5323510>.get_runtime_profile
> E    +      where <tests.common.impala_connection.BeeswaxConnection object at 0x5323510> = <test_admission_controller.TestAdmissionController object at 0x51f5fd0>.client
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org