Posted to dev@beam.apache.org by Alex Amato <aj...@google.com> on 2019/02/05 17:50:30 UTC

[BEAM-6594] Flakey GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients - failing in precommit

org.apache.beam.runners.fnexecution.data.GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients

I keep seeing this test failing in my PRs

https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/

https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/testReport/junit/org.apache.beam.runners.fnexecution.data/GrpcDataServiceTest/testMessageReceivedBySingleClientWhenThereAreMultipleClients/


I've seen this one come and go for a few weeks or so. I am unsure exactly
when it first occurred.

Re: Is it possible to gracefully close GrpcDataService? [was Re: [BEAM-6594] Flakey GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients - failing in precommit]

Posted by Daniel Oliveira <da...@google.com>.
This is something I've run into while working on the reference runner, and
it's bugged me too. I've tried looking into what the issue was but usually
hit dead ends. Your post is really helpful; I might use it to take another
look when I have the time.

On Fri, Feb 8, 2019 at 5:26 PM Alex Amato <aj...@google.com> wrote:

> I think graceful shutdown has historically been overlooked; it would not
> surprise me if a few things needed to gracefully shut down the runner
> harness/SDK were accidentally left out.
>
> IIRC there was also some discussion about incorrect startup (requiring a
> certain order of SDK process startup and runner harness startup, which
> may have had races as well).

Re: Is it possible to gracefully close GrpcDataService? [was Re: [BEAM-6594] Flakey GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients - failing in precommit]

Posted by Alex Amato <aj...@google.com>.
I think graceful shutdown has historically been overlooked; it would not
surprise me if a few things needed to gracefully shut down the runner
harness/SDK were accidentally left out.

IIRC there was also some discussion about incorrect startup (requiring a
certain order of SDK process startup and runner harness startup, which
may have had races as well).
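One generic way to remove a startup-order dependency like the one described above is to have the sending side wait until the expected peers have announced themselves. The following is a minimal sketch using `java.util.concurrent.CountDownLatch`, assuming a hypothetical `Service` class and made-up client IDs. It is not how Beam's runner harness or SDK harness coordinate startup, and the thread does not establish that such a gate would fix this flake.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class StartupOrderSketch {
  // Hypothetical stand-in for a service that several clients connect to.
  static final class Service {
    private final List<String> clients = new CopyOnWriteArrayList<>();
    private final CountDownLatch allConnected;

    Service(int expectedClients) {
      this.allConnected = new CountDownLatch(expectedClients);
    }

    // Called from each client's thread once its connection is established.
    void onClientConnected(String clientId) {
      clients.add(clientId);
      allConnected.countDown();
    }

    // Blocks until every expected client has connected or the timeout expires,
    // so the sender never assumes clients are already up. Returns false on timeout.
    boolean awaitClients(long timeout, TimeUnit unit) throws InterruptedException {
      return allConnected.await(timeout, unit);
    }

    int connectedCount() {
      return clients.size();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Service service = new Service(3);
    // Clients start in arbitrary order on their own threads.
    for (int i = 0; i < 3; i++) {
      String id = "client-" + i;
      new Thread(() -> service.onClientConnected(id)).start();
    }
    // Only proceed (e.g. send the first message) once all clients are present.
    boolean ready = service.awaitClients(5, TimeUnit.SECONDS);
    System.out.println(ready + " " + service.connectedCount()); // true 3
  }
}
```

The timeout matters: without it, a client that never starts would hang the sender instead of producing a clear failure.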

On Fri, Feb 8, 2019 at 4:49 PM Brian Hulette <bh...@google.com> wrote:

> I think I've finally got a handle on this flake, and a possible solution
> [1]. One thing that's still bothering me, though, is that the "CANCELLED:
> Multiplexer hanging up" errors seem to be unavoidable.
>
> They occur when the GrpcDataService is closed [2] and it closes all of
> its multiplexers, which just send an error to their outbound observers
> [3]. It seems to me that there should be a more graceful way to shut
> everything down, but I'm not seeing it. Am I missing something?
>
> grpc-java suggests using GrpcCleanupRule to gracefully shut down
> in-process servers and clients [4]; should we be utilizing that somehow?
>
> Brian
>
> [1] https://github.com/apache/beam/pull/7794
> [2] https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java#L117
> [3] https://github.com/apache/beam/tree/master/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer.java#L112
> [4] https://github.com/grpc/grpc-java/blob/master/examples/README.md#unit-test-examples

Is it possible to gracefully close GrpcDataService? [was Re: [BEAM-6594] Flakey GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients - failing in precommit]

Posted by Brian Hulette <bh...@google.com>.
I think I've finally got a handle on this flake, and a possible solution
[1]. One thing that's still bothering me, though, is that the "CANCELLED:
Multiplexer hanging up" errors seem to be unavoidable.

They occur when the GrpcDataService is closed [2] and it closes all of its
multiplexers, which just send an error to their outbound observers [3]. It
seems to me that there should be a more graceful way to shut everything
down, but I'm not seeing it. Am I missing something?
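For context on what "graceful" could mean here: a gRPC outbound stream observer can be terminated either with `onError` (the peer sees a failure, such as a CANCELLED status) or with `onCompleted` (a normal half-close). The following is a minimal, JDK-only sketch of that difference, assuming a hypothetical `Observer` interface modeled on `io.grpc.stub.StreamObserver` and a hypothetical `Multiplexer` class. It is not Beam's `BeamFnDataGrpcMultiplexer` and does not reproduce its code.

```java
import java.util.ArrayList;
import java.util.List;

public class HangupSketch {
  // Hypothetical interface with the same three callbacks as
  // io.grpc.stub.StreamObserver; not a real gRPC or Beam type.
  interface Observer<T> {
    void onNext(T value);
    void onError(Throwable t);
    void onCompleted();
  }

  // Records which signals an outbound stream received.
  static final class RecordingObserver implements Observer<String> {
    final List<String> events = new ArrayList<>();
    public void onNext(String value) { events.add("next:" + value); }
    public void onError(Throwable t) { events.add("error:" + t.getMessage()); }
    public void onCompleted() { events.add("completed"); }
  }

  // Hypothetical multiplexer owning one outbound observer.
  static final class Multiplexer {
    private final Observer<String> outbound;

    Multiplexer(Observer<String> outbound) { this.outbound = outbound; }

    // Abrupt close: the peer sees a failure, analogous to the
    // "CANCELLED: Multiplexer hanging up" errors discussed in this thread.
    void closeWithError() {
      outbound.onError(new RuntimeException("CANCELLED: Multiplexer hanging up"));
    }

    // Graceful close: half-close the stream so the peer sees normal completion.
    void closeGracefully() {
      outbound.onCompleted();
    }
  }

  public static void main(String[] args) {
    RecordingObserver abrupt = new RecordingObserver();
    new Multiplexer(abrupt).closeWithError();
    RecordingObserver graceful = new RecordingObserver();
    new Multiplexer(graceful).closeGracefully();
    System.out.println(abrupt.events);   // [error:CANCELLED: Multiplexer hanging up]
    System.out.println(graceful.events); // [completed]
  }
}
```

Whether GrpcDataService could safely complete its streams instead of erroring them depends on whether all in-flight data has been delivered first, which this sketch does not model.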

grpc-java suggests using GrpcCleanupRule to gracefully shut down in-process
servers and clients [4]; should we be utilizing that somehow?
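As we understand it, the graceful shutdown GrpcCleanupRule performs follows the usual gRPC pattern: call `shutdown()`, wait a bounded time with `awaitTermination`, and only then fall back to `shutdownNow()`. The sketch assumes a hypothetical `Terminable` interface capturing those three methods, which `io.grpc.Server` and `io.grpc.ManagedChannel` both provide (with differing return types, ignored here). To stay runnable with only the JDK, it adapts an `ExecutorService`, which happens to expose the same trio. It illustrates the pattern only, not how GrpcDataService is or should be closed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {
  // Hypothetical interface for the shutdown trio shared by io.grpc.Server,
  // io.grpc.ManagedChannel and java.util.concurrent.ExecutorService.
  interface Terminable {
    void shutdown();
    boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException;
    void shutdownNow();
  }

  // Graceful-then-forceful shutdown: stop accepting new work, give in-flight
  // work a bounded time to finish, then force termination.
  // Returns true if the resource terminated without being forced.
  static boolean closeGracefully(Terminable t, long timeout, TimeUnit unit)
      throws InterruptedException {
    t.shutdown();
    if (t.awaitTermination(timeout, unit)) {
      return true;
    }
    t.shutdownNow();
    return false;
  }

  // Adapter so the sketch runs with only the JDK.
  static Terminable of(ExecutorService executor) {
    return new Terminable() {
      public void shutdown() { executor.shutdown(); }
      public boolean awaitTermination(long timeout, TimeUnit unit)
          throws InterruptedException {
        return executor.awaitTermination(timeout, unit);
      }
      public void shutdownNow() { executor.shutdownNow(); }
    };
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService quick = Executors.newSingleThreadExecutor();
    quick.submit(() -> {}); // finishes immediately
    System.out.println(closeGracefully(of(quick), 5, TimeUnit.SECONDS)); // true

    ExecutorService stuck = Executors.newSingleThreadExecutor();
    stuck.submit(() -> {
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException e) {
        // Interrupted by shutdownNow(); just exit.
      }
    });
    System.out.println(closeGracefully(of(stuck), 100, TimeUnit.MILLISECONDS)); // false
  }
}
```

The forceful fallback is what keeps a test from hanging when something refuses to finish; the graceful step is what gives peers a chance to see a normal close first.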

Brian

[1] https://github.com/apache/beam/pull/7794
[2] https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java#L117
[3] https://github.com/apache/beam/tree/master/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer.java#L112
[4] https://github.com/grpc/grpc-java/blob/master/examples/README.md#unit-test-examples

On Thu, Feb 7, 2019 at 11:49 AM Brian Hulette <bh...@google.com> wrote:

> This was already reported in BEAM-6512 [1], which Scott gave me as a
> starter bug. I haven't been able to reproduce locally, so I'm trying to see
> if I can get it to fail on Jenkins again with some additional logging [2].
>
> Definitely interested in others' thoughts on this; I only vaguely
> understand what's going on. So far the only headway I've made is noticing
> that the "CANCELLED: Multiplexer hanging up" error seems to always occur
> exactly three times in failing tests. Successful runs may have one or two
> of these messages but never three.
>
> [1] https://issues.apache.org/jira/browse/BEAM-6512
> [2] https://github.com/apache/beam/pull/7767

Re: [BEAM-6594] Flakey GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients - failing in precommit

Posted by Brian Hulette <bh...@google.com>.
This was already reported in BEAM-6512 [1], which Scott gave me as a
starter bug. I haven't been able to reproduce locally, so I'm trying to see
if I can get it to fail on Jenkins again with some additional logging [2].

Definitely interested in others' thoughts on this; I only vaguely
understand what's going on. So far the only headway I've made is noticing
that the "CANCELLED: Multiplexer hanging up" error seems to always occur
exactly three times in failing tests. Successful runs may have one or two
of these messages but never three.
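To check this observation across more runs, one could count the hang-up message in each saved test log and compare passing against failing runs. A minimal sketch; the class name and the sample lines in `main` are hypothetical placeholders, not real Jenkins output.

```java
import java.util.Arrays;
import java.util.List;

public class HangupCounter {
  // The status text quoted in this thread.
  static final String MARKER = "CANCELLED: Multiplexer hanging up";

  // Counts how many log lines contain the hang-up marker.
  static long countHangups(List<String> logLines) {
    return logLines.stream().filter(line -> line.contains(MARKER)).count();
  }

  public static void main(String[] args) {
    // Hypothetical placeholder lines, not real test output.
    List<String> log = Arrays.asList(
        "placeholder line without the marker",
        "placeholder line ... CANCELLED: Multiplexer hanging up",
        "another placeholder line");
    System.out.println(countHangups(log)); // 1
  }
}
```

In practice the lines would come from each run's saved log, for example via `java.nio.file.Files.readAllLines(path)`.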

[1] https://issues.apache.org/jira/browse/BEAM-6512
[2] https://github.com/apache/beam/pull/7767

On Tue, Feb 5, 2019 at 9:50 AM Alex Amato <aj...@google.com> wrote:

>
> org.apache.beam.runners.fnexecution.data.GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients
>
> I keep seeing this test failing in my PRs
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/
>
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/testReport/junit/org.apache.beam.runners.fnexecution.data/GrpcDataServiceTest/testMessageReceivedBySingleClientWhenThereAreMultipleClients/
>
>
> I've seen this one come and go for a few weeks or so. I am unsure exactly
> when it first occurred.
>