Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2020/04/01 16:30:00 UTC

[jira] [Commented] (IMPALA-9571) TestCompactCatalogUpdates.test_restart_catalogd test failure

    [ https://issues.apache.org/jira/browse/IMPALA-9571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072939#comment-17072939 ] 

Joe McDonnell commented on IMPALA-9571:
---------------------------------------

It looks like boost::filesystem::remove_all is throwing an exception that we aren't catching. Here is the stack:
{noformat}
#0  0x00007feb06ed71f7 in raise () from /lib64/libc.so.6
#1  0x00007feb06ed88e8 in abort () from /lib64/libc.so.6
#2  0x00007feb077ddd2d in __gnu_cxx::__verbose_terminate_handler () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007feb077dbd86 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007feb077dae79 in __cxa_call_terminate (ue_header=0x1456a7e0) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_call.cc:54
#5  0x00007feb077db5db in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=6, exception_class=<optimized out>, ue_header=0x1456a7e0, context=0x7ffd6d93ede0) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_personality.cc:670
#6  0x00007feb07274fa3 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x1456a7e0, context=context@entry=0x7ffd6d93ede0) at ../../../gcc-4.9.2/libgcc/unwind.inc:62
#7  0x00007feb072754c7 in _Unwind_Resume (exc=0x1456a7e0) at ../../../gcc-4.9.2/libgcc/unwind.inc:230
#8  0x0000000003dec188 in (anonymous namespace)::remove_all_aux(boost::filesystem::path const&, boost::filesystem::file_type, boost::system::error_code*) ()
#9  0x0000000003dec266 in boost::filesystem::detail::remove_all(boost::filesystem::path const&, boost::system::error_code*) ()
#10 0x00000000024f280a in boost::filesystem::remove_all (p=..., ec=...) at /data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/Impala-Toolchain/boost-1.61.0-p2/include/boost/filesystem/operations.hpp:675
#11 0x00000000024ed79f in impala::FileSystemUtil::RemoveAndCreateDirectory (directory=...) at /data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/be/src/util/filesystem-util.cc:93
#12 0x00000000021a3fde in impala::TmpFileMgr::InitCustom (this=0xe36c4c0, tmp_dir_specifiers=..., one_dir_per_device=false, metrics=0x127010e0) at /data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/be/src/runtime/tmp-file-mgr.cc:176
#13 0x00000000021a362a in impala::TmpFileMgr::InitCustom (this=0xe36c4c0, tmp_dirs_spec=..., one_dir_per_device=false, metrics=0x127010e0) at /data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/be/src/runtime/tmp-file-mgr.cc:109
#14 0x00000000021a358f in impala::TmpFileMgr::Init (this=0xe36c4c0, metrics=0x127010e0) at /data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/be/src/runtime/tmp-file-mgr.cc:99
#15 0x000000000233c2eb in impala::ImpalaServer::ImpalaServer (this=0xe8a2000, exec_env=0x7ffd6d93fee0, __in_chrg=<optimized out>, __vtt_parm=<optimized out>) at /data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/be/src/service/impala-server.cc:380
#16 0x0000000002337cdd in ImpaladMain (argc=22, argv=0x7ffd6d940528) at /data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/be/src/service/impalad-main.cc:87
#17 0x0000000001c3c140 in main (argc=22, argv=0x7ffd6d940528) at /data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/be/src/service/daemon-main.cc:37
{noformat}
The ERROR log contains this:
{noformat}
terminate called after throwing an instance of 'boost::filesystem::filesystem_error'
  what():  boost::filesystem::directory_iterator::construct: No such file or directory: "/tmp/impala-scratch"
Wrote minidump to Impala/logs/custom_cluster_tests/minidumps/impalad/8c93dc48-76ff-473d-ea54a9a4-2bf20fa8.dmp
{noformat}
What is odd is that IMPALA-2846 already wrapped this call in a try/catch block ([https://github.com/apache/impala/blob/master/be/src/util/filesystem-util.cc#L85-L96]), so a filesystem_error thrown from remove_all should have been caught:
{noformat}
    // Attempt to remove the directory and its contents so that we can create a fresh
    // empty directory that we will have permissions for. There is an open window between
    // the check for existence above and the removal here. If the directory is removed in
    // this window, we may get "no_such_file_or_directory" error which is fine.
    //
    // There is a bug in boost library (as of version 1.6) which may lead to unexpected
    // exceptions even though we are using the no-exceptions interface. See IMPALA-2846.
    try {
      filesystem::remove_all(directory, errcode);
    } catch (filesystem::filesystem_error& e) {
      errcode = e.code();
    }
{noformat}
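For reference, the intent of that block can be sketched in a standalone form. This is only an illustration under stated assumptions: it uses C++17 std::filesystem as a stand-in for boost::filesystem so that it compiles without Boost, and RemoveAndCreateDirectorySketch is a hypothetical name, not Impala's FileSystemUtil::RemoveAndCreateDirectory. It shows the intended behavior (swallow a thrown filesystem_error, tolerate "no such file or directory"), not the failure seen in the stack above.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (assumption): mirrors the intent of the quoted snippet,
// using std::filesystem instead of boost::filesystem.
// Returns a default-constructed (falsy) error_code on success.
std::error_code RemoveAndCreateDirectorySketch(const fs::path& directory) {
  assert(!directory.empty());
  std::error_code errcode;
  try {
    // Non-throwing overload; the try/catch is a defensive fallback in case
    // the library throws anyway, as described for Boost in IMPALA-2846.
    fs::remove_all(directory, errcode);
  } catch (const fs::filesystem_error& e) {
    errcode = e.code();
  }
  // The directory disappearing between an existence check and the removal
  // is acceptable, so that one error is not treated as a failure.
  if (errcode && errcode != std::errc::no_such_file_or_directory) {
    return errcode;
  }
  errcode.clear();
  fs::create_directories(directory, errcode);
  return errcode;
}
```

With this shape, a missing directory is handled the same way whether the library reports it through the error code or by throwing. The crash in this issue is different: std::terminate is reached during unwinding inside remove_all_aux (frames #4 to #8), before any handler in the caller runs.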

> TestCompactCatalogUpdates.test_restart_catalogd test failure
> ------------------------------------------------------------
>
>                 Key: IMPALA-9571
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9571
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Yongzhi Chen
>            Assignee: Joe McDonnell
>            Priority: Major
>
> In impala-asf-master-exhaustive-data-cache build, 
> custom_cluster.test_local_catalog.TestCompactCatalogUpdates.test_restart_catalogd test failed:
> Error Message
> test setup failure
> Stacktrace
> common/custom_cluster_test_suite.py:190: in setup_method
>     self._start_impala_cluster(cluster_args, **kwargs)
> common/custom_cluster_test_suite.py:307: in _start_impala_cluster
>     check_call(cmd + options, close_fds=True)
> /usr/lib64/python2.7/subprocess.py:542: in check_call
>     raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50     --statestore_priority_update_frequency_ms=50     --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=3', '--log_dir=/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--use_local_catalog=true ', '--state_store_args=None ', '--catalogd_args=--catalog_topic_mode=minimal ', '--impalad_args=--default_query_options=']' returned non-zero exit status 1
> .....
> 11:08:37 MainThread: Error starting cluster
> Traceback (most recent call last):
>   File "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/bin/start-impala-cluster.py", line 770, in <module>
>     expected_cluster_size - expected_catalog_delays)
>   File "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/tests/common/impala_cluster.py", line 186, in wait_until_ready
>     early_abort_fn=check_processes_still_running)
>   File "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/tests/common/impala_service.py", line 267, in wait_for_num_known_live_backends
>     early_abort_fn()
>   File "/data/jenkins/workspace/impala-asf-master-exhaustive-data-cache/repos/Impala/tests/common/impala_cluster.py", line 178, in check_processes_still_running
>     assert len(self.impalads) >= expected_num_impalads
> AssertionError
> DEBUG:impala_cluster:Found 2 impalad/1 statestored/1 catalogd process(es)
> Details:
> https://master-02.jenkins.cloudera.com/job/impala-asf-master-exhaustive-data-cache/113/testReport/custom_cluster.test_local_catalog/TestCompactCatalogUpdates/test_restart_catalogd/


