Posted to issues@spark.apache.org by "Adam Roberts (JIRA)" <ji...@apache.org> on 2016/09/16 16:13:20 UTC

[jira] [Commented] (SPARK-17564) Flaky RequestTimeoutIntegrationSuite, furtherRequestsDelay

    [ https://issues.apache.org/jira/browse/SPARK-17564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496699#comment-15496699 ] 

Adam Roberts commented on SPARK-17564:
--------------------------------------

callback1.failure is sometimes null, which is why the test fails intermittently. I'm sure it's related to a timing window.

We are supposed to get an IOException when the test passes:
{code}
callback1.failure: java.io.IOException: Connection from /*some ip*:35581 closed
callback1.failure.getClass: class java.io.IOException
{code}

but sometimes we get this instead, so the assertion fails (we should improve the assertion message too):
{code}
callback1.failure: null
{code}
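
To make the null case easier to triage, the assertion could say what it actually observed. The following is only a minimal sketch: FailureAssertions, describe and assertInstanceOf are hypothetical names and are not part of the Spark test suite.

```java
import java.io.IOException;

// Hypothetical helper for illustration only; not part of the Spark test suite.
final class FailureAssertions {
  private FailureAssertions() {}

  /** Builds a message that reports what was actually observed, including null. */
  static String describe(String name, Throwable actual, Class<?> expected) {
    String observed = (actual == null)
        ? "null (callback possibly not invoked yet)"
        : actual.getClass().getName() + ": " + actual.getMessage();
    return name + " expected an instance of " + expected.getName()
        + " but was " + observed;
  }

  /** Throws AssertionError with a descriptive message unless actual is an instance of expected. */
  static void assertInstanceOf(String name, Throwable actual, Class<?> expected) {
    // Class.isInstance returns false for null, so a null failure is reported too.
    if (!expected.isInstance(actual)) {
      throw new AssertionError(describe(name, actual, expected));
    }
  }

  /** Small demonstration: passes for an IOException, would throw for null. */
  static void demo() {
    assertInstanceOf("callback1.failure", new IOException("Connection closed"), IOException.class);
  }
}
```

In the JUnit test itself, the same message could be passed to assertTrue(String, boolean) so that a null failure shows up in the report instead of a bare assertion error.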

[~zsxwing] your expertise is welcome here. There's also the CountDownLatch constructor argument to experiment with (it could be increased from 1), as well as the 1.2 sec timeouts. I'm looking to improve this test's robustness.
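
One option to experiment with, besides longer timeouts, is to wait on callback1's own latch before asserting on its failure, the same way the test already waits on callback0.latch. The sketch below only illustrates the idea and is not a confirmed fix: SketchCallback is a stand-in for the suite's TestCallback (only the failure and latch field names come from the quoted test), and LatchWaitSketch and awaitFailure are hypothetical names.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Stand-in for the suite's TestCallback. The field names mirror the quoted test;
// everything else is an assumption made for illustration.
final class SketchCallback {
  volatile Throwable failure;
  final CountDownLatch latch = new CountDownLatch(1);

  void onFailure(Throwable e) {
    failure = e;        // record the failure first
    latch.countDown();  // then release anyone waiting on the latch
  }
}

final class LatchWaitSketch {
  private LatchWaitSketch() {}

  /** Waits for the callback to complete and returns its failure, or null on timeout. */
  static Throwable awaitFailure(SketchCallback callback, long timeout, TimeUnit unit) {
    try {
      if (!callback.latch.await(timeout, unit)) {
        return null; // timed out: the callback never completed
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
    return callback.failure;
  }

  /** Simulates a failure that arrives slightly later than the test thread expects. */
  static boolean demo() {
    SketchCallback callback1 = new SketchCallback();
    Thread network = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      callback1.onFailure(new IOException("Connection closed"));
    });
    network.start();
    // Reading callback1.failure right here, without waiting, could still observe null.
    Throwable failure = awaitFailure(callback1, 5, TimeUnit.SECONDS);
    return failure instanceof IOException;
  }
}
```

Waiting on the latch bounds the wait by the timeout but returns as soon as the callback fires, so it does not rely on the two failures landing within a fixed sleep window.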

> Flaky RequestTimeoutIntegrationSuite, furtherRequestsDelay
> ----------------------------------------------------------
>
>                 Key: SPARK-17564
>                 URL: https://issues.apache.org/jira/browse/SPARK-17564
>             Project: Spark
>          Issue Type: Improvement
>          Components: Tests
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Adam Roberts
>            Priority: Minor
>
> Could be related to [SPARK-10680].
> This is the test; one fix would be to increase the timeouts from 1.2 seconds to 5 seconds:
> {code}
>   // The timeout is relative to the LAST request sent, which is kinda weird, but still.
>   // This test also makes sure the timeout works for Fetch requests as well as RPCs.
>   @Test
>   public void furtherRequestsDelay() throws Exception {
>     final byte[] response = new byte[16];
>     final StreamManager manager = new StreamManager() {
>       @Override
>       public ManagedBuffer getChunk(long streamId, int chunkIndex) {
>         Uninterruptibles.sleepUninterruptibly(FOREVER, TimeUnit.MILLISECONDS);
>         return new NioManagedBuffer(ByteBuffer.wrap(response));
>       }
>     };
>     RpcHandler handler = new RpcHandler() {
>       @Override
>       public void receive(
>           TransportClient client,
>           ByteBuffer message,
>           RpcResponseCallback callback) {
>         throw new UnsupportedOperationException();
>       }
>       @Override
>       public StreamManager getStreamManager() {
>         return manager;
>       }
>     };
>     TransportContext context = new TransportContext(conf, handler);
>     server = context.createServer();
>     clientFactory = context.createClientFactory();
>     TransportClient client = clientFactory.createClient(TestUtils.getLocalHost(), server.getPort());
>     // Send one request, which will eventually fail.
>     TestCallback callback0 = new TestCallback();
>     client.fetchChunk(0, 0, callback0);
>     Uninterruptibles.sleepUninterruptibly(1200, TimeUnit.MILLISECONDS);
>     // Send a second request before the first has failed.
>     TestCallback callback1 = new TestCallback();
>     client.fetchChunk(0, 1, callback1);
>     Uninterruptibles.sleepUninterruptibly(1200, TimeUnit.MILLISECONDS);
>     // not complete yet, but should complete soon
>     assertEquals(-1, callback0.successLength);
>     assertNull(callback0.failure);
>     callback0.latch.await(60, TimeUnit.SECONDS);
>     assertTrue(callback0.failure instanceof IOException);
>     // failed at same time as previous
>     assertTrue(callback1.failure instanceof IOException); // This is where we fail because callback1.failure is null
>   }
> {code}
> If there are better suggestions for improving this test, let's take them on board. I think using 5-second timeout periods would be a place to start, so folks don't have to triage this failure needlessly. I will add a few prints and report back.


