Posted to dev@mesos.apache.org by Arindam Barua <ab...@247-inc.com> on 2012/06/20 20:29:54 UTC

few 'make check' tests taking longer than expected

Hello,

I have a standalone install of Mesos running on a capable box (CentOS 6.2), but I am running into issues with the sanity test (make check).

The initial failure is:

[ RUN      ] CoordinatorTest.MultipleAppendsNotLearnedFill
../../src/tests/log_tests.cpp:860: Failure
Value of: result.isSome()
  Actual: false
Expected: true
[  FAILED  ] CoordinatorTest.MultipleAppendsNotLearnedFill (2295 ms)
[ RUN      ] CoordinatorTest.Truncate
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000001ed0ab8, pid=8514, tid=140213985802016

In the mailing list archives, I found a similar problem faced by another user:
http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201205.mbox/%3CCAKOYTn9Td-Ru487aLnqrApKtvo2QuEb77Zt459_WPff2eLrVCQ%40mail.gmail.com%3E

As suggested in that thread, the problem seems to be that the tests take longer on my box than they allow for: increasing the timeout in the test to 3s or 4s makes the failures above go away. However, other tests then fail, such as CoordinatorTest.MultipleAppends, CoordinatorTest.TruncateNotLearnedFill, and CoordinatorTest.TruncateLearnedFill. I had to increase the timeouts in all of these tests to get make check to succeed.
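
To see which tests run close to or over their timeout, the per-test durations can be pulled out of the test output. The following is a minimal sketch, not part of Mesos: `slowest_tests` is a hypothetical helper, and it assumes the output uses the `[  FAILED  ] Name (N ms)` / `[       OK ] Name (N ms)` line format shown above.

```shell
# Read gtest output on stdin and print "<ms> <status> <test name>"
# for every finished test, slowest first.
slowest_tests() {
  # A result line splits into six fields: [ STATUS ] NAME (N ms)
  # "[ RUN ]" lines have no duration and are skipped.
  awk '$1 == "[" && $3 == "]" && ($2 == "OK" || $2 == "FAILED") && $6 == "ms)" {
    print substr($5, 2), $2, $4   # drop the leading "(" from the duration
  }' | sort -rn
}
```

For example: `./bin/mesos-tests.sh --gtest_filter='CoordinatorTest*' 2>&1 | slowest_tests`. The filter pattern is quoted here so that the shell passes the `*` through unchanged.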

I am evaluating mesos for production use, and I am concerned about the reason for these tests taking longer than expected. Any help would be appreciated.

My environment/config:
Centos 6.2, 64-bit
Kernel: 2.6.32-220.el6.x86_64
CPU: 24 (Intel(R) Xeon(R) CPU L5640  @ 2.27GHz)
RAM: 16 GB

Thanks,
Arindam

RE: few 'make check' tests taking longer than expected

Posted by Arindam Barua <ab...@247-inc.com>.
Any ideas about why I might be seeing this? Are there any known issues with CentOS 6.2, or successful installs by others?
Note that I did a fresh install of the OS specifically to test Mesos and installed most of the required build packages with yum at their latest versions, except for Java 7, which I downloaded directly from the Oracle website.
Also, I checked out the Mesos code from Apache svn around May 14 (I am not sure whether any changes that might help have gone in since).

Thanks,
Arindam

RE: few 'make check' tests taking longer than expected

Posted by Arindam Barua <ab...@247-inc.com>.
Thanks for your prompt response. No, I am not running in a virtual environment. This box is dedicated to this purpose and is not used for anything else.

$ ./bin/mesos-tests.sh --gtest_filter=CoordinatorTest*
results in the same failures, followed by a core dump, i.e. in MultipleAppendsNotLearnedFill and Truncate.

Below is the output of three of the runs I attempted with random test order:
$ ./bin/mesos-tests.sh --gtest_filter=CoordinatorTest* --gtest_shuffle --gtest_repeat=1000

1) <few successes>
[ RUN      ] CoordinatorTest.TruncateNotLearnedFill
../../src/tests/log_tests.cpp:1006: Failure
Value of: result.isSome()
  Actual: false
Expected: true
[  FAILED  ] CoordinatorTest.TruncateNotLearnedFill (2450 ms)
[ RUN      ] CoordinatorTest.FillNoQuorum
[       OK ] CoordinatorTest.FillNoQuorum (0 ms)
[ RUN      ] CoordinatorTest.MultipleAppends
Segmentation fault (core dumped)

2) <few successes>
[ RUN      ] CoordinatorTest.TruncateNotLearnedFill
../../src/tests/log_tests.cpp:1006: Failure
Value of: result.isSome()
  Actual: false
Expected: true
[  FAILED  ] CoordinatorTest.TruncateNotLearnedFill (2480 ms)
[ RUN      ] CoordinatorTest.FillInconsistent
[       OK ] CoordinatorTest.FillInconsistent (0 ms)
[ RUN      ] CoordinatorTest.FillNoQuorum
[       OK ] CoordinatorTest.FillNoQuorum (0 ms)
[ RUN      ] CoordinatorTest.TruncateLearnedFill
Segmentation fault (core dumped)


3)
[ RUN      ] CoordinatorTest.TruncateNotLearnedFill
../../src/tests/log_tests.cpp:1006: Failure
Value of: result.isSome()
  Actual: false
Expected: true
[  FAILED  ] CoordinatorTest.TruncateNotLearnedFill (2291 ms)
[ RUN      ] CoordinatorTest.LearnedOnOneReplica_NotLearnedOnAnother_AnotherFailsAndRecovers
[       OK ] CoordinatorTest.LearnedOnOneReplica_NotLearnedOnAnother_AnotherFailsAndRecovers (0 ms)
[ RUN      ] CoordinatorTest.RacingElect
[       OK ] CoordinatorTest.RacingElect (0 ms)
[ RUN      ] CoordinatorTest.FillInconsistent
[       OK ] CoordinatorTest.FillInconsistent (0 ms)
[ RUN      ] CoordinatorTest.Elect
Segmentation fault (core dumped)
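
In all three runs above, the segmentation fault appears a few tests after TruncateNotLearnedFill fails, so it may be a side effect of that failure rather than a separate problem. One way to check, offered as a sketch rather than a confirmed diagnosis, is to run a single test repeatedly with a fresh process for each run. `run_until_failure` is a hypothetical helper, not part of Mesos:

```shell
# Run a command up to <max> times, each in a fresh process.
# Stop at the first non-zero exit status and report which run failed.
# The command's own output is discarded to keep the summary readable.
run_until_failure() {
  max=$1
  shift
  i=1
  while [ "$i" -le "$max" ]; do
    if ! "$@" >/dev/null 2>&1; then
      echo "failed on run $i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "all $max runs passed"
}
```

For example: `run_until_failure 20 ./bin/mesos-tests.sh --gtest_filter='CoordinatorTest.Elect'`. If a test that crashed in the shuffled runs passes every time when run alone, that would suggest the crash depends on state left behind by an earlier failed test.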

Re: few 'make check' tests taking longer than expected

Posted by Benjamin Hindman <be...@eecs.berkeley.edu>.
Are you running in a virtualized environment like EC2?

The Java errors point to an issue with the Java-related tests (i.e.,
something not getting shut down or cleaned up correctly). It's possible that
the JVM is causing the process to run slowly and the tests are thus taking
longer than expected.

What happens if you just run the Coordinator tests?

Try:

$ ./bin/mesos-tests.sh --gtest_filter=CoordinatorTest*

Or even:

$ ./bin/mesos-tests.sh --gtest_filter=CoordinatorTest* --gtest_shuffle --gtest_repeat=1000


Ben.


