Posted to dev@beam.apache.org by Robin Qiu <ro...@google.com> on 2018/08/27 18:26:44 UTC

Should we allow ValidatesRunner tests to have access to file systems?

Hello everyone,

I am writing a test [1] for the @RequiresStableInput annotation in the
Java SDK [2]. For the test to work, I need a ParDo to produce a side
effect (e.g. writing to a file system). However, ValidatesRunner tests in
Beam currently cannot depend on external state (they cannot write to file
systems). So I am wondering whether it is a good idea to allow
ValidatesRunner tests to access file systems. That would let us write
more flexible ValidatesRunner tests.
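
To illustrate why such a test needs an observable side effect, here is a
minimal sketch in plain Java. It is a toy model, not Beam code: the class
name, the counter-based "source", and the in-memory list standing in for a
file system are all assumptions made for illustration. It only shows that
a retried step can write different values unless its input is
materialized before the first attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Toy model (not Beam code) of a side-effecting step that is retried once. */
public class StableInputSketch {

  /** Hypothetical non-deterministic upstream step: each call yields a new value. */
  public static Supplier<String> nonDeterministicSource() {
    int[] calls = {0};
    return () -> "element-" + (++calls[0]);
  }

  /**
   * Runs the side-effecting step twice (a first attempt that is treated as
   * failed after its write, then one retry) and returns everything that
   * reached the external sink.
   */
  public static List<String> runWithOneRetry(Supplier<String> source, boolean stableInput) {
    List<String> sink = new ArrayList<>(); // stands in for a file system
    // With stable input, the upstream value is materialized once, up front.
    String checkpointed = stableInput ? source.get() : null;
    for (int attempt = 1; attempt <= 2; attempt++) {
      // Without stable input, every attempt recomputes the upstream value.
      String input = stableInput ? checkpointed : source.get();
      sink.add(input); // the side effect
    }
    return sink;
  }

  public static void main(String[] args) {
    // prints [element-1, element-2]
    System.out.println(runWithOneRetry(nonDeterministicSource(), false));
    // prints [element-1, element-1]
    System.out.println(runWithOneRetry(nonDeterministicSource(), true));
  }
}
```

In this model the difference is only visible by inspecting the sink, which
is why the test wants somewhere outside the pipeline to write to.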

I could make this test an integration test to get access to file systems
(e.g. like WordCountIT.java [3]). But functionally I think this test should
be a ValidatesRunner test, because it tests runner support for an SDK
feature.

So what do you think? Any suggestions or concerns are appreciated.

Best,
Robin

[1] https://github.com/apache/beam/pull/6220
[2]
https://docs.google.com/document/d/117yRKbbcEdm3eIKB_26BHOJGmHSZl1YNoF0RqWGtqAM/edit#
[3]
https://github.com/apache/beam/blob/master/examples/java/src/test/java/org/apache/beam/examples/WordCountIT.java

Re: Should we allow ValidatesRunner tests to have access to file systems?

Posted by Robin Qiu <ro...@google.com>.
Hi Alan and Luke,

Thanks for your comments! I agree with your suggestions, and I have now
made the test an integration test. Please take a look at the PR when you
have time.

Luke's suggestion to use metrics sounds promising. I will open a separate
email thread to ask for people's ideas on making that improvement.

Best,
Robin

On Tue, Aug 28, 2018 at 5:42 PM Lukasz Cwik <lc...@google.com> wrote:

> I also agree about not having external dependencies in validates runner
> tests.
>
> One suggestion would have been to use attempted metrics, but there is
> currently no easy, runner-agnostic way to access runner metrics from
> within a DoFn. This is likely a place for improvement, since:
> * cancelling a pipeline from within the pipeline is useful
> * starting a new job against the existing runner from within a pipeline
> is useful
> * accessing attempted metrics to test DoFns with side effects is useful
> for error-handling testing
>
> On Mon, Aug 27, 2018 at 12:40 PM Alan Myrvold <am...@google.com> wrote:
>
>> I think this should be an integration test if it requires more access
>> than the current ValidatesRunner tests.
>>
>> Although the ValidatesRunner and integration tests are similar, the
>> intent is that the validates runner tests are smaller and more like
>> component tests, and there have been discussions on fusing the validates
>> runner tests into a smaller set of pipelines.

Re: Should we allow ValidatesRunner tests to have access to file systems?

Posted by Lukasz Cwik <lc...@google.com>.
I also agree about not having external dependencies in validates runner
tests.

One suggestion would have been to use attempted metrics, but there is
currently no easy, runner-agnostic way to access runner metrics from within
a DoFn. This is likely a place for improvement, since:
* cancelling a pipeline from within the pipeline is useful
* starting a new job against the existing runner from within a pipeline is
useful
* accessing attempted metrics to test DoFns with side effects is useful
for error-handling testing
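
As a rough illustration of the attempted-metrics idea, the sketch below
models a counter that tracks an "attempted" value (all work, including
failed attempts) next to a "committed" value (only work from attempts that
succeeded). It is plain Java and not the Beam metrics API: the class and
method names are hypothetical, and the single-retry behaviour is an
assumption made to keep the example small.

```java
/** Toy model (not the Beam metrics API) of attempted vs. committed counts. */
public class AttemptedMetricSketch {
  private long attempted; // includes work done by failed attempts
  private long committed; // only work from attempts that succeeded

  /** Processes one bundle; if the first attempt fails, it is retried once. */
  public void processBundle(int elements, boolean failFirstAttempt) {
    if (failFirstAttempt) {
      attempted += elements; // the failed attempt still counts as attempted
    }
    attempted += elements; // the successful attempt
    committed += elements;
  }

  public long attempted() {
    return attempted;
  }

  public long committed() {
    return committed;
  }

  /** True if some work ran more than once, i.e. a retry happened. */
  public boolean sawRetry() {
    return attempted > committed;
  }

  public static void main(String[] args) {
    AttemptedMetricSketch counter = new AttemptedMetricSketch();
    counter.processBundle(3, false);
    counter.processBundle(2, true);
    // prints attempted=7 committed=5 retry=true
    System.out.println("attempted=" + counter.attempted()
        + " committed=" + counter.committed()
        + " retry=" + counter.sawRetry());
  }
}
```

A test that could read both values would be able to detect that a DoFn was
retried without writing anything to an external system.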

On Mon, Aug 27, 2018 at 12:40 PM Alan Myrvold <am...@google.com> wrote:

> I think this should be an integration test if it requires more access than
> the current ValidatesRunner tests.
>
> Although the ValidatesRunner and integration tests are similar, the intent
> is that the validates runner tests are smaller and more like component
> tests, and there have been discussions on fusing the validates runner tests
> into a smaller set of pipelines.

Re: Should we allow ValidatesRunner tests to have access to file systems?

Posted by Alan Myrvold <am...@google.com>.
I think this should be an integration test if it requires more access than
the current ValidatesRunner tests.

Although the ValidatesRunner and integration tests are similar, the intent
is that the validates runner tests are smaller and more like component
tests, and there have been discussions on fusing the validates runner tests
into a smaller set of pipelines.

On Mon, Aug 27, 2018 at 11:27 AM Robin Qiu <ro...@google.com> wrote:

> Hello everyone,
>
> I am writing a test [1] for the support of @RequiresStableInput annotation
> in Java SDK [2]. For the test to work, I need to have a ParDo make some
> side effect (e.g. writing to a file system). However, ValidatesRunner tests
> in Beam currently cannot depend on external states (cannot write to file
> systems). So I am wondering if it is a good idea to allow ValidatesRunner
> tests to have access to file systems. This way we can create more flexible
> ValidatesRunner tests.
>
> I could make this test an integration test to get access to file systems
> (e.g. like WordCountIT.java [3]). But functionally I think this test should
> be a ValidatesRunner test, because it is testing the support of some SDK
> features on runners.
>
> So what do you think? Any suggestions or concerns are appreciated.
>
> Best,
> Robin
>
> [1] https://github.com/apache/beam/pull/6220
> [2]
> https://docs.google.com/document/d/117yRKbbcEdm3eIKB_26BHOJGmHSZl1YNoF0RqWGtqAM/edit#
> [3]
> https://github.com/apache/beam/blob/master/examples/java/src/test/java/org/apache/beam/examples/WordCountIT.java
>
>