Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2014/04/17 23:47:15 UTC

[jira] [Resolved] (MESOS-1220) Make check failure on OSX - IO error: Too many open files

     [ https://issues.apache.org/jira/browse/MESOS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler resolved MESOS-1220.
------------------------------------

    Resolution: Fixed

Committed a fix that worked on my OS X laptop. Let me know if you're still seeing issues.

{noformat}
commit 1ff83d71a694583859d9c26d7e678e1556acc38f
Author: Benjamin Mahler <bm...@twitter.com>
Date:   Thu Apr 17 14:45:03 2014 -0700

    Fixed missing call to delete in Cluster.
{noformat}
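
The diff for this commit is not included in this message. As a rough illustration of how a missing `delete` can end in "Too many open files", here is a minimal sketch, assuming an object that opens a file descriptor in its constructor and closes it only in its destructor. `Handle`, `isOpen`, `leaky`, and `fixed` are hypothetical names for this example and are not taken from the Mesos source.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for any object that owns a file descriptor.
// It is NOT the Mesos Cluster class.
class Handle {
public:
  Handle() : fd_(::open("/dev/null", O_RDONLY)) { assert(fd_ >= 0); }

  // The descriptor is released only when the destructor runs.
  ~Handle() { ::close(fd_); }

  int fd() const { return fd_; }

private:
  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;

  int fd_;
};

// Returns true if `fd` currently refers to an open descriptor.
bool isOpen(int fd) { return ::fcntl(fd, F_GETFD) != -1; }

// Missing delete: the destructor never runs, so the descriptor
// stays open for the rest of the process lifetime.
int leaky() {
  Handle* handle = new Handle();
  return handle->fd();
}

// With the delete in place, the descriptor is closed on return.
int fixed() {
  Handle* handle = new Handle();
  int fd = handle->fd();
  delete handle;
  return fd;
}
```

Calling `leaky()` once per test case in a long test run would accumulate open descriptors until the per-process limit is hit, which would match a failure that only shows up when the whole suite runs.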

> Make check failure on OSX - IO error: Too many open files
> ---------------------------------------------------------
>
>                 Key: MESOS-1220
>                 URL: https://issues.apache.org/jira/browse/MESOS-1220
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>         Environment: OSX 10.9.2, Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
>            Reporter: Till Toenshoff
>            Assignee: Benjamin Mahler
>              Labels: build, build-failure, mesos, unit-test
>
> Make check runs into an abort:
> {noformat}
> $ make check
> [...]
> [       OK ] CoordinatorTest.LearnedOnOneReplica_NotLearnedOnAnother_AnotherFailsAndRecovers (0 ms)
> [----------] 21 tests from CoordinatorTest (816 ms total)
> [----------] 2 tests from RecoverTest
> [ RUN      ] RecoverTest.RacingCatchup
> F0417 21:45:21.254204 1980908304 replica.cpp:709] CHECK_SOME(state): IO error: /private/tmp/RecoverTest_RacingCatchup_if5Cz6/.log4/LOCK: Too many open filesFailed to recover the log
> *** Check failure stack trace: ***
>     @        0x10a2f9434  google::LogMessage::SendToLog()
>     @        0x10a2f9963  google::LogMessage::Flush()
>     @        0x10a2fcaff  google::LogMessageFatal::~LogMessageFatal()
>     @        0x10a2fa059  google::LogMessageFatal::~LogMessageFatal()
>     @        0x109dd8479  _CheckFatal::~_CheckFatal()
>     @        0x109dd8349  _CheckFatal::~_CheckFatal()
>     @        0x10a1b379a  mesos::internal::log::ReplicaProcess::restore()
>     @        0x10a1b3241  mesos::internal::log::ReplicaProcess::ReplicaProcess()
>     @        0x10a1b696b  mesos::internal::log::Replica::Replica()
>     @        0x1091ebd9a  RecoverTest_RacingCatchup_Test::TestBody()
>     @        0x10945234c  testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @        0x1094431ea  testing::Test::Run()
>     @        0x109443e72  testing::TestInfo::Run()
>     @        0x1094444b0  testing::TestCase::Run()
>     @        0x109449d05  testing::internal::UnitTestImpl::RunAllTests()
>     @        0x109452b14  testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @        0x109449a39  testing::UnitTest::Run()
>     @        0x10922a270  main
>     @     0x7fff8a98d5fd  start
>     @                0x1  (unknown)
> make[3]: *** [check-local] Abort trap: 6
> {noformat}
> That test does not fail when run individually, which suggests a general file-handle leak somewhere in the test suite.
> The exact test that triggers the abort is machine-dependent: I tried it on two MBPs, and one fails a few tests earlier than the other.
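
One way to check for a leak like the one suspected above is to compare the number of open descriptors before and after a test against the per-process limit. The following is a minimal sketch using POSIX `getrlimit` and `fcntl`. The helper names `softDescriptorLimit` and `countOpenDescriptors` are made up for this example and are not part of Mesos or its test harness.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

// Stores the soft limit on open descriptors (RLIMIT_NOFILE) in `out`.
// Returns false if the limit could not be queried.
bool softDescriptorLimit(rlim_t* out) {
  struct rlimit limit;
  if (::getrlimit(RLIMIT_NOFILE, &limit) != 0) {
    return false;
  }
  *out = limit.rlim_cur;
  return true;
}

// Counts descriptors in the range [0, upTo) that are currently open.
// The caller picks `upTo`, because the soft limit can be very large.
int countOpenDescriptors(int upTo) {
  assert(upTo >= 0);
  int count = 0;
  for (int fd = 0; fd < upTo; ++fd) {
    if (::fcntl(fd, F_GETFD) != -1) {
      ++count;
    }
  }
  return count;
}
```

Recording `countOpenDescriptors(...)` at the start and end of each test case would show a count that keeps growing if descriptors leak. The point at which the run aborts then depends on the soft limit and on how many descriptors were already open, which could explain why the failing test differs between machines.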


