You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Bernd Mathiske <be...@mesosphere.io> on 2015/06/14 15:34:59 UTC

Re: Review Request 35247: Fixed race between EXPECT_CALL(resourceOffers, _) and driver.start() in fetcher_cache_tests.cpp.

Vinod,

I am working on it. I reckon this will work:

s/Return()/DeclineOffers()

I’ll prepare a RR with this, that also fixes the CHECK the way Jie proposed.

Bernd

> On Jun 12, 2015, at 8:59 PM, Vinod Kone <vi...@gmail.com> wrote:
> 
> This is an automatically generated e-mail. To reply, visit:https://reviews.apache.org/r/35247/ <https://reviews.apache.org/r/35247/>
> On June 9th, 2015, 6:11 p.m. UTC, Vinod Kone wrote:
> 
> src/tests/fetcher_cache_tests.cpp <https://reviews.apache.org/r/35247/diff/1/?file=981340#file981340line201> (Diff revision 1)
> void FetcherCacheTest::SetUp()
> 201	
> 202	
>   // This installs a temporary reaction to resourceOffers calls, which
> 203	
>   // must be in place BEFORE starting the scheduler driver. This
> 204	
>   // "cover" is necessary, because we only add relevant mock actions
> 205	
>   // in launchTask() and launchTasks() AFTER starting the driver.
> 206	
>   EXPECT_CALL(scheduler, resourceOffers(driver, _))
> 207	
>     .WillRepeatedly(Return());
> While this looks good as a temporary fix, what is the long term strategy here?
> I really don't like setting expectations in SetUp() or TearDown() because it's really hard to reason about in the individual tests. For example, why did you set expecations on registered and offers but not others? I prefer to move these expectations to tests. 
> Also, this SetUp() is doing too much (starting slave, starting master, constructing scheduler but not starting it, setting some expectations) and there is no documentation for it!
> On June 9th, 2015, 8:07 p.m. UTC, Bernd Mathiske wrote:
> 
> Long term I am working on developing up stress tests for the fetcher. These are still relatively basic functionality tests so far.
> Yes, SetUp() and TearDown() do a lot here. Would you prefer a) inlining them or b) commenting what they do more or c) both? In my experience a) would be most consistent with existing patterns, but it makes it harder to spot what the different tests are doing besides all the code that is always the same. And the general code blowup would be rather substantial in this test series.
> On June 10th, 2015, 5:36 a.m. UTC, Bernd Mathiske wrote:
> 
> Maybe you meant the long term plan wrt. the code structure here. In this case, it is to create more generally applicable test pattern lego blocks that aggregate such things as in the Setup() here. These will have to be carefully selected/crafted then.
> On June 10th, 2015, 9:01 p.m. UTC, Vinod Kone wrote:
> 
> i would prefer to inline them as we do everywhere else in the code base. The current abstractions are a bit hard to follow.
> Actually, looks like this fix is wrong! If this generic expectation catches a resource offer, it is possible that the expectation in launchTask() wouldn't get any offer, failing the CHECK.
> 18:26:48 DEBUG: [ RUN      ] FetcherCacheTest.CachedFallback
> ...
> ...
> ...
> 18:26:49 DEBUG: I0612 18:26:49.245573 13609 hierarchical.hpp:354] Added framework 20150612-182649-1787367596-40122-13591-0000
> 18:26:49 DEBUG: I0612 18:26:49.245582 13610 sched.cpp:448] Framework registered with 20150612-182649-1787367596-40122-13591-0000
> 18:26:49 DEBUG: I0612 18:26:49.245728 13609 master.cpp:4108] Sending 1 offers to framework 20150612-182649-1787367596-40122-13591-0000 (default) at scheduler-fed7aed8-916f-48c8-8b74-3f9b2e96bccc@<redacted>
> 18:26:49 DEBUG: I0612 18:26:49.263645 13622 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 20.784933ms
> 18:26:49 DEBUG: I0612 18:26:49.263702 13622 leveldb.cpp:401] Deleting ~2 keys from leveldb took 25193ns
> 18:26:49 DEBUG: I0612 18:26:49.263715 13622 replica.cpp:679] Persisted action at 4
> 18:26:49 DEBUG: I0612 18:26:49.263722 13622 replica.cpp:664] Replica learned TRUNCATE action at position 4
> 18:27:04 DEBUG: I0612 18:27:04.234977 13607 slave.cpp:4048] Querying resource estimator for oversubscribable resources
> 18:27:04 DEBUG: I0612 18:27:04.235074 13607 slave.cpp:4069] Received oversubscribable resources  from the resource estimator
> 18:27:04 DEBUG: F0612 18:27:04.264626 13591 fetcher_cache_tests.cpp:361] CHECK_READY(offers): is PENDING Failed to wait for resource offers
> 18:27:04 DEBUG: *** Check failure stack trace: ***
> 18:27:04 DEBUG:     @     0x7f72fb2715fd  google::LogMessage::Fail()
> 18:27:04 DEBUG:     @     0x7f72fb27343d  google::LogMessage::SendToLog()
> 18:27:04 DEBUG:     @     0x7f72fb2711ec  google::LogMessage::Flush()
> 18:27:04 DEBUG:     @     0x7f72fb273d39  google::LogMessageFatal::~LogMessageFatal()
> 18:27:04 DEBUG:     @           0x53dab8  _CheckFatal::~_CheckFatal()
> 18:27:04 DEBUG:     @           0x66be2f  mesos::internal::tests::FetcherCacheTest::launchTask()
> 18:27:04 DEBUG:     @           0x66f6c9  mesos::internal::tests::FetcherCacheTest_CachedFallback_Test::TestBody()
> 18:27:04 DEBUG:     @           0xbb24e3  testing::internal::HandleExceptionsInMethodIfSupported<>()
> 18:27:04 DEBUG:     @           0xba9787  testing::Test::Run()
> 18:27:04 DEBUG:     @           0xba982e  testing::TestInfo::Run()
> 18:27:04 DEBUG:     @           0xba9935  testing::TestCase::Run()
> 18:27:04 DEBUG:     @           0xba9bd8  testing::internal::UnitTestImpl::RunAllTests()
> 18:27:04 DEBUG:     @           0xba9e77  testing::UnitTest::Run()
> 18:27:04 DEBUG:     @           0x4a2073  main
> 18:27:04 DEBUG:     @     0x7f72f90a2d5d  __libc_start_main
> 18:27:04 DEBUG:     @           0x4ad3f9  (unknown)
> 
> Also, as Jie mentioned below, we should never do CHECK inside tests because it will crash the whole unix process running tests. We are currently having trouble getting our internal RPM builds to succeed because of this. Can you please fix this ASAP?
> 
> - Vinod
> 
> 
> On June 9th, 2015, 9:32 a.m. UTC, Bernd Mathiske wrote:
> 
> Review request for mesos, Benjamin Hindman, Till Toenshoff, and Vinod Kone.
> By Bernd Mathiske.
> Updated June 9, 2015, 9:32 a.m.
> 
> Bugs: MESOS-2815 <https://issues.apache.org/jira/browse/MESOS-2815>, MESOS-2829 <https://issues.apache.org/jira/browse/MESOS-2829>, MESOS-2831 <https://issues.apache.org/jira/browse/MESOS-2831>
> Repository: mesos
> Description
> 
> Fixed race between EXPECT_CALL(resourceOffers, _) and driver.start() in fetcher_cache_tests.cpp.
> Installed a default response that provides teporary cover for this mocked method until we install more interesting callbacks later. This prevents gmock from complaining about an "uninteresting gmock call", which led to a variety of tests failing due to offers not getting through subsequently.
> All fetcher cache tests have been affected by this race and should be fixed in this regard now.
> (Also fixed some typos.)
> Testing
> 
> make check
> Diffs
> 
> src/tests/fetcher_cache_tests.cpp (cbd44b98d19953d174fac977f509d4900a37481f)
> View Diff <https://reviews.apache.org/r/35247/diff/>