Posted to issues@mesos.apache.org by "Charles Natali (Jira)" <ji...@apache.org> on 2021/07/28 19:43:00 UTC

[jira] [Created] (MESOS-10226) test suite hangs on ARM64

Charles Natali created MESOS-10226:
--------------------------------------

             Summary: test suite hangs on ARM64
                 Key: MESOS-10226
                 URL: https://issues.apache.org/jira/browse/MESOS-10226
             Project: Mesos
          Issue Type: Bug
            Reporter: Charles Natali
            Assignee: Charles Natali


Reported by [~mgrigorov].

 
{noformat}
[ RUN      ] NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
sh: 1: hadoop: not found
Marked '/' as rslave
I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on martin-arm64
I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
I0726 11:59:17.834415    36 executor.cpp:722] Starting task d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
Preparing rootfs at '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
Changing root to /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
Failed to execute 'sh': Exec format error
I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 (pid: 38)
../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1111: Failure
Mock function called more times than expected - returning directly.
    Function call: statusUpdate(0xffffc28527f0, @0xffffa2cf3a60 136-byte object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 03-00 00-00>)
         Expected: to be called twice
           Actual: called 3 times - over-saturated and active
I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept loop{noformat}
 

I asked him to provide a gdb traceback and we can see the following:

 
{noformat}


Thread 1 (Thread 0xffffa3bc2c60 (LWP 173475)):
#0 0x0000ffffa518db20 in __libc_open64 (file=0xaaab00f342e0 "/tmp/7VXP3w/pipe", oflag=<optimized out>) at ../sysdeps/unix/sysv/linux/open64.c:48
#1 0x0000ffffa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, read_write=8, is32not64=<optimized out>) at fileops.c:189
#2 0x0000ffffa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized out>, mode@entry=0xaaaad762f3c8 "r", is32not64=is32not64@entry=1) at fileops.c:281
#3 0x0000ffffa512e0dc in __fopen_internal (filename=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=0xaaaad762f3c8 "r", is32=1) at iofopen.c:75
#4 0x0000aaaad54f5350 in os::read (path="/tmp/7VXP3w/pipe") at ../../3rdparty/stout/include/stout/os/read.hpp:136
#5 0x0000aaaad74f1c1c in mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody (this=0xaaab00f88f50) at ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1126
{noformat}
 

 

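The test uses a named pipe to synchronize with the task it starts. If the task fails to start (in this case because we're trying to launch an x86 container on an arm64 host, hence the "Exec format error" above), nothing ever opens the write end of the pipe, so the test hangs forever in os::read: as frame #0 of the traceback shows, it is blocked opening the pipe for reading.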

I sent Martin a tentative fix to test, and I'll open an MR if it works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)