Posted to dev@beam.apache.org by Łukasz Gajowy <lu...@gmail.com> on 2018/03/05 14:30:14 UTC

Should tests fail due to transient errors on Dataflow Runner?

Hi there!

I wonder: why do tests that use TestDataflowRunner fail if there are some
transient difficulties in the Dataflow pipeline?

Let's consider the JDBC Performance test case: the pipelines that are there
sometimes have trouble connecting to a Postgres instance. If this happens,
they retry processing the bundle as described in Dataflow FAQ [1]. The
PSQLExceptions that happen on Dataflow (due to connection problems) are
collected by TestDataflowRunner's messageHandler. After the whole data
processing is done, TestDataflowRunner "rethrows" gathered exceptions if
there are any ([2], [3]). IMO, this results in a "false-negative": Maven
fails because of the rethrown exceptions, even though the job actually
succeeded on Dataflow (State.DONE).

I think we should "rethrow" those exceptions only if the job state is
something other than DONE (DONE, AFAIK, means that the job succeeded on
Dataflow). If Dataflow managed to handle the errors, I don't see any reason
for the test to fail. Am I missing something here? WDYT?
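
A minimal sketch of the proposed rule, for illustration only. It assumes a
simplified JobState enum and a plain list of collected error messages; both
are hypothetical stand-ins and not the actual TestDataflowRunner code or
Beam's PipelineResult.State.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the job states discussed here (not Beam's own enum).
enum JobState { RUNNING, DONE, FAILED, CANCELLED }

final class RethrowPolicySketch {
    private RethrowPolicySketch() {}

    /**
     * Decides whether errors collected while the job ran should fail the test.
     * Assumption: DONE means the job succeeded, so any collected errors were
     * transient and were already handled by bundle retries.
     */
    static boolean shouldRethrow(JobState finalState, List<String> collectedErrors) {
        if (collectedErrors.isEmpty()) {
            return false; // nothing was collected, so there is nothing to rethrow
        }
        return finalState != JobState.DONE;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        // Placeholder text, not a real log line.
        errors.add("PSQLException (transient connection problem)");

        System.out.println(shouldRethrow(JobState.DONE, errors));   // prints false
        System.out.println(shouldRethrow(JobState.FAILED, errors)); // prints true
    }
}
```

With this rule, a job that retried its way past connection problems and
finished as DONE would pass, while the same errors on a FAILED or CANCELLED
job would still fail the test.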

[1]
https://cloud.google.com/dataflow/faq#how-are-java-exceptions-handled-in-dataflow
[2]
https://github.com/apache/beam/blob/a3e262b96be5e6507f3c38413341b4ab607ade41/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TestDataflowRunner.java#L197
[3]
https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_JDBC/291/console

Re: Should tests fail due to transient errors on Dataflow Runner?

Posted by Łukasz Gajowy <lu...@gmail.com>.
Thank you. I did a quick check based on what you are saying and it
confirmed that the streaming scenario is trickier. Nevertheless, this
seems to be the problem that makes the JDBC IOIT flaky, so I created a Jira
issue for it: https://issues.apache.org/jira/browse/BEAM-3798


2018-03-06 1:52 GMT+01:00 Lukasz Cwik <lc...@google.com>:

> That makes sense but you'll want to make sure that no test + runner is
> relying on this behavior by making your change and running all the
> validates runner tests.
>
> Historically what you say was not always the case because Dataflow
> streaming jobs were never "DONE", they only were in the "RUNNING" state
> forever and required to be cancelled if an error message was ever seen.

Re: Should tests fail due to transient errors on Dataflow Runner?

Posted by Lukasz Cwik <lc...@google.com>.
That makes sense, but you'll want to make sure that no test + runner
combination relies on this behavior: make your change and run all the
ValidatesRunner tests.

Historically, what you say was not always the case, because Dataflow
streaming jobs were never "DONE"; they stayed in the "RUNNING" state
forever and had to be cancelled if an error message was ever seen.
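
As a rough illustration of why the streaming case differs, the sketch below
contrasts the two reactions to an error message. The ErrorAction enum and
the onErrorMessage method are hypothetical names made up for this example;
they are not part of Beam or of TestDataflowRunner.

```java
// Hypothetical reactions to an error message seen while the job is running.
enum ErrorAction { CANCEL_AND_FAIL, DEFER_TO_FINAL_STATE }

final class StreamingCaveatSketch {
    private StreamingCaveatSketch() {}

    /**
     * Assumption: a streaming job never reaches DONE on its own, so waiting
     * for a final state is not an option and the historical reaction was to
     * cancel the job. A batch job does terminate, so the decision can wait
     * until its final state is known.
     */
    static ErrorAction onErrorMessage(boolean streaming) {
        return streaming ? ErrorAction.CANCEL_AND_FAIL : ErrorAction.DEFER_TO_FINAL_STATE;
    }
}
```

Any change that defers the decision to the final job state would need a
separate answer for jobs that never reach one.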
