Posted to dev@mesos.apache.org by "Bharath Bhushan (JIRA)" <ji...@apache.org> on 2014/03/30 13:12:16 UTC

[jira] [Commented] (MESOS-1163) make check fails on ubuntu 12.04

    [ https://issues.apache.org/jira/browse/MESOS-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954642#comment-13954642 ] 

Bharath Bhushan commented on MESOS-1163:
----------------------------------------

I continued to debug the problem and found that the mmap call in leveldb/util/env_posix.cc (in PosixMmapFile::MapNewRegion) is returning EINVAL.
The fd is valid, and the start offset, length and other args all look reasonable, but EINVAL is still returned.

I tried changing the args one at a time to see which one was causing the EINVAL. It turns out that changing MAP_SHARED to MAP_PRIVATE makes the EINVAL go away. But the test then fails later on corruption, presumably because writes to a MAP_PRIVATE mapping are not carried through to the file, so some other process that expected valid data in that file got junk instead.

I also wrote a small C program to try to reproduce the error outside of Mesos, but could not. MAP_SHARED works fine there.

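#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>
#include <assert.h>

int main() {
  /* O_CREAT requires an explicit mode argument. */
  int fd = open("/vagrant/mesos/build/.state/MANIFEST-000001",
                O_RDWR | O_CREAT, 0644);
  assert(fd >= 0);
  printf("fd=%d\n", fd);
  /* Grow the file so the whole mapping is backed by it. */
  int rc = ftruncate(fd, 65536);
  assert(rc == 0);
  void* src = mmap(NULL, 65536, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, 0);
  /* mmap signals failure with MAP_FAILED, not NULL; errno is only valid then. */
  if (src == MAP_FAILED) {
    perror("mmap");
    return 1;
  }
  printf("src=%p\n", src);
  return 0;
}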

I don't know if the .state directory creation has something to do with this.

> make check fails on ubuntu 12.04
> --------------------------------
>
>                 Key: MESOS-1163
>                 URL: https://issues.apache.org/jira/browse/MESOS-1163
>             Project: Mesos
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.19.0
>         Environment: Ubuntu 12.04 LTS in vagrant
>            Reporter: Bharath Bhushan
>
> make check fails 17 tests.
>     $ make check
>     ....
>     ....
>     [==========] 280 tests from 49 test cases ran. (349554 ms total)
>     [  PASSED  ] 263 tests.
>     [  FAILED  ] 17 tests, listed below:
>     [  FAILED  ] LevelDBStateTest.FetchAndStoreAndFetch
>     [  FAILED  ] LevelDBStateTest.FetchAndStoreAndStoreAndFetch
>     [  FAILED  ] LevelDBStateTest.FetchAndStoreAndStoreFailAndFetch
>     [  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndFetch
>     [  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndExpunge
>     [  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndStoreAndFetch
>     [  FAILED  ] LevelDBStateTest.Names
>     [  FAILED  ] Strict/RegistrarTest.recover/0, where GetParam() = false
>     [  FAILED  ] Strict/RegistrarTest.recover/1, where GetParam() = true
>     [  FAILED  ] Strict/RegistrarTest.admit/0, where GetParam() = false
>     [  FAILED  ] Strict/RegistrarTest.admit/1, where GetParam() = true
>     [  FAILED  ] Strict/RegistrarTest.readmit/0, where GetParam() = false
>     [  FAILED  ] Strict/RegistrarTest.readmit/1, where GetParam() = true
>     [  FAILED  ] Strict/RegistrarTest.remove/0, where GetParam() = false
>     [  FAILED  ] Strict/RegistrarTest.remove/1, where GetParam() = true
>     [  FAILED  ] Strict/RegistrarTest.bootstrap/0, where GetParam() = false
>     [  FAILED  ] Strict/RegistrarTest.bootstrap/1, where GetParam() = true
> The failure details for all of them appear to be the same as below:
>     $ GLOG_v=2 ./bin/mesos-tests.sh --gtest_filter="LevelDBStateTest.FetchAndStoreAndFetch" --verbose
>      
>     ...
>      
>     [ RUN      ] LevelDBStateTest.FetchAndStoreAndFetch
>     I0328 19:12:49.520093  4836 process.cpp:2533] Resuming (1)@10.0.2.15:59978 at 2014-03-28 19:12:49.520086016+00:00
>     I0328 19:12:49.520884  4821 process.cpp:2523] Spawned process (1)@10.0.2.15:59978
>     ../../src/tests/state_tests.cpp:70: Failure
>     (future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-000001: Invalid argument
>     I0328 19:12:49.538709  4841 process.cpp:2533] Resuming (1)@10.0.2.15:59978 at 2014-03-28 19:12:49.538699008+00:00
>     I0328 19:12:49.539717  4841 process.cpp:2640] Cleaning up (1)@10.0.2.15:59978
>     [  FAILED  ] LevelDBStateTest.FetchAndStoreAndFetch (28 ms)
>     [----------] 1 test from LevelDBStateTest (29 ms total)
>      
>     [----------] Global test environment tear-down
>     [==========] 1 test from 1 test case ran. (40 ms total)
>     [  PASSED  ] 0 tests.
>     [  FAILED  ] 1 test, listed below:
>     [  FAILED  ] LevelDBStateTest.FetchAndStoreAndFetch
> I tried creating the .state directory before running the test but that did not help.
> Also, running the C++ test framework causes all 5 tasks to be LOST as soon as they are created. The log shows:
>     I0328 19:28:02.898165  5756 launcher.cpp:116] Forked child with pid '5922' for container '51c4e487-e7f4-49ea-a385-ab985e1e2bf8'
>     I0328 19:28:02.899159  5756 slave.cpp:2116] Monitoring executor 'default' of framework '20140328-192600-251789322-5050-5716-0000' in container '51c4e487-e7f4-49ea-a385-ab985e1e2bf8'
>     I0328 19:28:03.899876  5756 mesos_containerizer.cpp:879] Executor for container '51c4e487-e7f4-49ea-a385-ab985e1e2bf8' has exited
>     I0328 19:28:03.899938  5756 mesos_containerizer.cpp:796] Destroying container '51c4e487-e7f4-49ea-a385-ab985e1e2bf8'
>     I0328 19:28:03.916704  5759 slave.cpp:2174] Executor 'default' of framework 20140328-192600-251789322-5050-5716-0000 has exited with status 127
>     I0328 19:28:03.917382  5759 slave.cpp:1774] Handling status update TASK_LOST (UUID: bffea6f2-2c74-4b1d-b616-9e6edf050f83) for task 4 of framework 20140328-192600-251789322-5050-5716-0000 from @0.0.0.0:0
> I also tried running spark-shell against a Mesos master and slave, and that too resulted in a lost task. I am stumped here and any help is appreciated.
> One warning message that I see:
> *** Warning: Linking the shared library libmesos.la against the
> *** static library ../3rdparty/leveldb/libleveldb.a is not portable!
> I am not sure whether this is what causes the test failures and the test-framework errors.
> I also synced to a later version of the code and still see the same error.


