You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Brian Hulette <bh...@google.com> on 2020/12/02 21:21:51 UTC

Unit tests vs. Integration Tests

I've been working with Niels Basjes on a standardized developer build
environment that can run `./gradlew check` [1]. We've run into issues
because several Java unit tests (class *Test, run with :test) are not
hermetic. They fail unless the environment they're running in has access to
the internet, and is authenticated to GCP with access to certain resources.
Of course the former isn't typically a blocker, but the latter certainly
can be.

Personally I think these tests should be classified as integration tests
(renamed to *IT, and run with the :integrationTest task), but I realized I
don't know if we have a formal definition for what should be a unit test vs
an integration test. Should we (do we) require unit tests to be hermetic?

Brian

[1] https://github.com/apache/beam/pull/13308

Re: Unit tests vs. Integration Tests

Posted by Etienne Chauchot <ec...@apache.org>.
Big +1 on using testcontainers rather than embedded real backends. That 
is what we plan to use for ES refactoring.

I'm a strong believer that mocks are useless to replace complex 
backends. Testing things like IOs against mocks are a almost certain 
failure because mocks cannot be representative of the complex backend.

Etienne

On 08/12/2020 00:36, Kenneth Knowles wrote:
>
>
> On Fri, Dec 4, 2020 at 3:29 PM Brian Hulette <bhulette@google.com 
> <ma...@google.com>> wrote:
>
>
>
>     On Wed, Dec 2, 2020 at 5:50 PM Brian Hulette <bhulette@google.com
>     <ma...@google.com>> wrote:
>
>         I guess another question I should ask - is :test supposed to
>         only run unit tests? I've been assuming so since many modules
>         have separate :integrationTest tasks for *IT tests.
>
>         On Wed, Dec 2, 2020 at 4:15 PM Kyle Weaver
>         <kcweaver@google.com <ma...@google.com>> wrote:
>
>             > DicomIOTest, FhirIOTest,
>             HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>
>             Looking only at the former three tests, I don't see any
>             reason they can't mock their API clients, especially
>             since all they expect the server to do is send back an error.
>
>
>         Fair point, that wouldn't be *too* much trouble. More than
>         just re-classifying them as integration tests though :)
>
>
> Can we write the tests to be agnostic as to whether the service is 
> mocked, faked, or real? Then we can run them in various modes.
>
> I'm a strong believer in "fake don't mock" to the extent that I think 
> mocking these might be counterproductive. On a case-by-case basis, 
> perhaps one can write a tiny in-memory version that has the 
> functionality needed rather than exact scripted responses. This will 
> interact well with the idea of making the test unaware of whether it 
> is running against the real service or not.
>
> This way we can run a fast (or at least hermetic) version as a unit 
> test and once in a while sanity check it against the real service.
>
> (In a perfect world, owners of services would always ship a local 
> testing fake and own keeping it up to date... testcontainers is one 
> approach but also in-memory fakes are great)
>
> Kenn
>
>             > This seems like something that is easy to get wrong
>             without some automation to help. Could we run the :test
>             targets on Jenkins using the sandbox command or docker to
>             block network access?
>
>             That's a great idea. Are we planning on integrating the
>             "standardized developer build environment" mentioned in
>             the original post into our CI somehow?
>
>
>         I was thinking it could be good to use it in CI somehow to
>         make sure it doesn't get out of date, but all I had in mind
>         was running some minimal set of tasks. Using it in this way
>         would obviously be even better.
>
>
>     I filed https://issues.apache.org/jira/browse/BEAM-11404 to track
>     this idea.
>
>
>             On Wed, Dec 2, 2020 at 4:03 PM Andrew Pilloud
>             <apilloud@google.com <ma...@google.com>> wrote:
>
>                 We have a large number of tests that run pipelines on
>                 the Direct Runner or various local runners, but don't
>                 require network access, so I don't think the
>                 distinction is clear. I do agree that requiring a
>                 remote service falls on the integration test side.
>
>                 This seems like something that is easy to get wrong
>                 without some automation to help. Could we run the
>                 :test targets on Jenkins using the sandbox command or
>                 docker to block network access?
>
>                 On Wed, Dec 2, 2020 at 3:38 PM Brian Hulette
>                 <bhulette@google.com <ma...@google.com>> wrote:
>
>                     Sorry I should've included the list of tests here.
>                     So far we've run into:
>                     DicomIOTest, FhirIOTest,
>                     HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>
>                     Note the latter are called IT, but that package's
>                     build.gradle has a line to scoop ITs into the
>                     :test task (addressing in [1]).
>
>                     All of these tests are actually running pipelines
>                     so I think they'd be difficult to mock.
>
>                     [1] https://github.com/apache/beam/pull/13444
>
>                     On Wed, Dec 2, 2020 at 3:28 PM Kyle Weaver
>                     <kcweaver@google.com <ma...@google.com>>
>                     wrote:
>
>                         > Should we (do we) require unit tests to be
>                         hermetic?
>
>                         We should. Unit tests are hermetic by
>                         definition. That begs the definition of
>                         hermetic, but clearly the internet is not.
>
>                         > Personally I think these tests should be
>                         classified as integration tests (renamed to
>                         *IT, and run with the :integrationTest task)
>
>                         I'm not sure which tests you're talking about,
>                         but it may be better to make them hermetic
>                         through mocking, depending on the intent of
>                         the test.
>
>                         On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette
>                         <bhulette@google.com
>                         <ma...@google.com>> wrote:
>
>                             I've been working with Niels Basjes on a
>                             standardized developer build environment
>                             that can run `./gradlew check` [1]. We've
>                             run into issues because several Java unit
>                             tests (class *Test, run with :test) are
>                             not hermetic. They fail unless the
>                             environment they're running in has access
>                             to the internet, and is authenticated to
>                             GCP with access to certain resources. Of
>                             course the former isn't typically a
>                             blocker, but the latter certainly can be.
>
>                             Personally I think these tests should be
>                             classified as integration tests (renamed
>                             to *IT, and run with the :integrationTest
>                             task), but I realized I don't know if we
>                             have a formal definition for what should
>                             be a unit test vs an integration test.
>                             Should we (do we) require unit tests to be
>                             hermetic?
>
>                             Brian
>
>                             [1] https://github.com/apache/beam/pull/13308
>

Re: Unit tests vs. Integration Tests

Posted by Kenneth Knowles <ke...@apache.org>.
On Mon, Dec 7, 2020 at 4:03 PM Kyle Weaver <kc...@google.com> wrote:

> > Can we write the tests to be agnostic as to whether the service is
> mocked, faked, or real? Then we can run them in various modes.
>
> Even better.
>
> > I'm a strong believer in "fake don't mock" to the extent that I think
> mocking these might be counterproductive.
>
> Fakes are great too. As long as there's an alternative to the real service.
>

One example is DataflowPipelineTranslatorTest. The test methods do not set
up mocking, nor directly refer to mocks or fakes. Instead, the test class
builds a mock of GcsUtil that is used throughout, because other classes
pull out the GcsUtil from the PipelineOptions. I cannot condone using a
global options object to inject the mock, but the tests themselves do not
set it up so they are mostly agnostic.

It is actually a mixed result: because the mock setup is global, the tests
implicitly rely on it and violate the principle that tests should read as
straight-line as possible without chasing dependencies. If the tests did
their own setup, then the mock would have to handle to GCS writes and set
variables that would later be read in GCS reads. So the mock would become a
fake, basically. Mockito's helpers would just be awkward ways of defining a
class.

Incidentally I am making a change with nothing relevant to GCS, yet causes
failures in the irrelevant mockito aspects of the test. So that likely
intensifies my usual anti-mock stance :-)

Kenn


>
> On Mon, Dec 7, 2020 at 3:37 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>>
>>
>> On Fri, Dec 4, 2020 at 3:29 PM Brian Hulette <bh...@google.com> wrote:
>>
>>>
>>>
>>> On Wed, Dec 2, 2020 at 5:50 PM Brian Hulette <bh...@google.com>
>>> wrote:
>>>
>>>> I guess another question I should ask - is :test supposed to only run
>>>> unit tests? I've been assuming so since many modules have separate
>>>> :integrationTest tasks for *IT tests.
>>>>
>>>> On Wed, Dec 2, 2020 at 4:15 PM Kyle Weaver <kc...@google.com> wrote:
>>>>
>>>>> > DicomIOTest, FhirIOTest,
>>>>> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>>>>>
>>>>> Looking only at the former three tests, I don't see any reason they
>>>>> can't mock their API clients, especially since all they expect the server
>>>>> to do is send back an error.
>>>>>
>>>>
>>>> Fair point, that wouldn't be *too* much trouble. More than just
>>>> re-classifying them as integration tests though :)
>>>>
>>>
>> Can we write the tests to be agnostic as to whether the service is
>> mocked, faked, or real? Then we can run them in various modes.
>>
>> I'm a strong believer in "fake don't mock" to the extent that I think
>> mocking these might be counterproductive. On a case-by-case basis, perhaps
>> one can write a tiny in-memory version that has the functionality needed
>> rather than exact scripted responses. This will interact well with the idea
>> of making the test unaware of whether it is running against the real
>> service or not.
>>
>> This way we can run a fast (or at least hermetic) version as a unit test
>> and once in a while sanity check it against the real service.
>>
>> (In a perfect world, owners of services would always ship a local testing
>> fake and own keeping it up to date... testcontainers is one approach but
>> also in-memory fakes are great)
>>
>> Kenn
>>
>>
>>> > This seems like something that is easy to get wrong without some
>>>>> automation to help. Could we run the :test targets on Jenkins using the
>>>>> sandbox command or docker to block network access?
>>>>>
>>>>> That's a great idea. Are we planning on integrating the "standardized
>>>>> developer build environment" mentioned in the original post into our CI
>>>>> somehow?
>>>>>
>>>>
>>>> I was thinking it could be good to use it in CI somehow to make sure it
>>>> doesn't get out of date, but all I had in mind was running some minimal set
>>>> of tasks. Using it in this way would obviously be even better.
>>>>
>>>
>>> I filed https://issues.apache.org/jira/browse/BEAM-11404 to track this
>>> idea.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> On Wed, Dec 2, 2020 at 4:03 PM Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> We have a large number of tests that run pipelines on the Direct
>>>>>> Runner or various local runners, but don't require network access, so I
>>>>>> don't think the distinction is clear. I do agree that requiring a remote
>>>>>> service falls on the integration test side.
>>>>>>
>>>>>> This seems like something that is easy to get wrong without some
>>>>>> automation to help. Could we run the :test targets on Jenkins using the
>>>>>> sandbox command or docker to block network access?
>>>>>>
>>>>>> On Wed, Dec 2, 2020 at 3:38 PM Brian Hulette <bh...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry I should've included the list of tests here. So far we've run
>>>>>>> into:
>>>>>>> DicomIOTest, FhirIOTest,
>>>>>>> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>>>>>>>
>>>>>>> Note the latter are called IT, but that package's build.gradle has a
>>>>>>> line to scoop ITs into the :test task (addressing in [1]).
>>>>>>>
>>>>>>> All of these tests are actually running pipelines so I think they'd
>>>>>>> be difficult to mock.
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/pull/13444
>>>>>>>
>>>>>>> On Wed, Dec 2, 2020 at 3:28 PM Kyle Weaver <kc...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> > Should we (do we) require unit tests to be hermetic?
>>>>>>>>
>>>>>>>> We should. Unit tests are hermetic by definition. That begs the
>>>>>>>> definition of hermetic, but clearly the internet is not.
>>>>>>>>
>>>>>>>> > Personally I think these tests should be classified as
>>>>>>>> integration tests (renamed to *IT, and run with the :integrationTest task)
>>>>>>>>
>>>>>>>> I'm not sure which tests you're talking about, but it may be better
>>>>>>>> to make them hermetic through mocking, depending on the intent of the test.
>>>>>>>>
>>>>>>>> On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette <bh...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I've been working with Niels Basjes on a standardized developer
>>>>>>>>> build environment that can run `./gradlew check` [1]. We've run into issues
>>>>>>>>> because several Java unit tests (class *Test, run with :test) are not
>>>>>>>>> hermetic. They fail unless the environment they're running in has access to
>>>>>>>>> the internet, and is authenticated to GCP with access to certain resources.
>>>>>>>>> Of course the former isn't typically a blocker, but the latter certainly
>>>>>>>>> can be.
>>>>>>>>>
>>>>>>>>> Personally I think these tests should be classified as integration
>>>>>>>>> tests (renamed to *IT, and run with the :integrationTest task), but I
>>>>>>>>> realized I don't know if we have a formal definition for what should be a
>>>>>>>>> unit test vs an integration test. Should we (do we) require unit tests to
>>>>>>>>> be hermetic?
>>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>> [1] https://github.com/apache/beam/pull/13308
>>>>>>>>>
>>>>>>>>

Re: Unit tests vs. Integration Tests

Posted by Kyle Weaver <kc...@google.com>.
> Can we write the tests to be agnostic as to whether the service is
mocked, faked, or real? Then we can run them in various modes.

Even better.

> I'm a strong believer in "fake don't mock" to the extent that I think
mocking these might be counterproductive.

Fakes are great too. As long as there's an alternative to the real service.

On Mon, Dec 7, 2020 at 3:37 PM Kenneth Knowles <ke...@apache.org> wrote:

>
>
> On Fri, Dec 4, 2020 at 3:29 PM Brian Hulette <bh...@google.com> wrote:
>
>>
>>
>> On Wed, Dec 2, 2020 at 5:50 PM Brian Hulette <bh...@google.com> wrote:
>>
>>> I guess another question I should ask - is :test supposed to only run
>>> unit tests? I've been assuming so since many modules have separate
>>> :integrationTest tasks for *IT tests.
>>>
>>> On Wed, Dec 2, 2020 at 4:15 PM Kyle Weaver <kc...@google.com> wrote:
>>>
>>>> > DicomIOTest, FhirIOTest,
>>>> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>>>>
>>>> Looking only at the former three tests, I don't see any reason they
>>>> can't mock their API clients, especially since all they expect the server
>>>> to do is send back an error.
>>>>
>>>
>>> Fair point, that wouldn't be *too* much trouble. More than just
>>> re-classifying them as integration tests though :)
>>>
>>
> Can we write the tests to be agnostic as to whether the service is mocked,
> faked, or real? Then we can run them in various modes.
>
> I'm a strong believer in "fake don't mock" to the extent that I think
> mocking these might be counterproductive. On a case-by-case basis, perhaps
> one can write a tiny in-memory version that has the functionality needed
> rather than exact scripted responses. This will interact well with the idea
> of making the test unaware of whether it is running against the real
> service or not.
>
> This way we can run a fast (or at least hermetic) version as a unit test
> and once in a while sanity check it against the real service.
>
> (In a perfect world, owners of services would always ship a local testing
> fake and own keeping it up to date... testcontainers is one approach but
> also in-memory fakes are great)
>
> Kenn
>
>
>> > This seems like something that is easy to get wrong without some
>>>> automation to help. Could we run the :test targets on Jenkins using the
>>>> sandbox command or docker to block network access?
>>>>
>>>> That's a great idea. Are we planning on integrating the "standardized
>>>> developer build environment" mentioned in the original post into our CI
>>>> somehow?
>>>>
>>>
>>> I was thinking it could be good to use it in CI somehow to make sure it
>>> doesn't get out of date, but all I had in mind was running some minimal set
>>> of tasks. Using it in this way would obviously be even better.
>>>
>>
>> I filed https://issues.apache.org/jira/browse/BEAM-11404 to track this
>> idea.
>>
>>
>>>
>>>
>>>>
>>>> On Wed, Dec 2, 2020 at 4:03 PM Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> We have a large number of tests that run pipelines on the Direct
>>>>> Runner or various local runners, but don't require network access, so I
>>>>> don't think the distinction is clear. I do agree that requiring a remote
>>>>> service falls on the integration test side.
>>>>>
>>>>> This seems like something that is easy to get wrong without some
>>>>> automation to help. Could we run the :test targets on Jenkins using the
>>>>> sandbox command or docker to block network access?
>>>>>
>>>>> On Wed, Dec 2, 2020 at 3:38 PM Brian Hulette <bh...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Sorry I should've included the list of tests here. So far we've run
>>>>>> into:
>>>>>> DicomIOTest, FhirIOTest,
>>>>>> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>>>>>>
>>>>>> Note the latter are called IT, but that package's build.gradle has a
>>>>>> line to scoop ITs into the :test task (addressing in [1]).
>>>>>>
>>>>>> All of these tests are actually running pipelines so I think they'd
>>>>>> be difficult to mock.
>>>>>>
>>>>>> [1] https://github.com/apache/beam/pull/13444
>>>>>>
>>>>>> On Wed, Dec 2, 2020 at 3:28 PM Kyle Weaver <kc...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> > Should we (do we) require unit tests to be hermetic?
>>>>>>>
>>>>>>> We should. Unit tests are hermetic by definition. That begs the
>>>>>>> definition of hermetic, but clearly the internet is not.
>>>>>>>
>>>>>>> > Personally I think these tests should be classified as integration
>>>>>>> tests (renamed to *IT, and run with the :integrationTest task)
>>>>>>>
>>>>>>> I'm not sure which tests you're talking about, but it may be better
>>>>>>> to make them hermetic through mocking, depending on the intent of the test.
>>>>>>>
>>>>>>> On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette <bh...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've been working with Niels Basjes on a standardized developer
>>>>>>>> build environment that can run `./gradlew check` [1]. We've run into issues
>>>>>>>> because several Java unit tests (class *Test, run with :test) are not
>>>>>>>> hermetic. They fail unless the environment they're running in has access to
>>>>>>>> the internet, and is authenticated to GCP with access to certain resources.
>>>>>>>> Of course the former isn't typically a blocker, but the latter certainly
>>>>>>>> can be.
>>>>>>>>
>>>>>>>> Personally I think these tests should be classified as integration
>>>>>>>> tests (renamed to *IT, and run with the :integrationTest task), but I
>>>>>>>> realized I don't know if we have a formal definition for what should be a
>>>>>>>> unit test vs an integration test. Should we (do we) require unit tests to
>>>>>>>> be hermetic?
>>>>>>>>
>>>>>>>> Brian
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/beam/pull/13308
>>>>>>>>
>>>>>>>

Re: Unit tests vs. Integration Tests

Posted by Kenneth Knowles <ke...@apache.org>.
On Fri, Dec 4, 2020 at 3:29 PM Brian Hulette <bh...@google.com> wrote:

>
>
> On Wed, Dec 2, 2020 at 5:50 PM Brian Hulette <bh...@google.com> wrote:
>
>> I guess another question I should ask - is :test supposed to only run
>> unit tests? I've been assuming so since many modules have separate
>> :integrationTest tasks for *IT tests.
>>
>> On Wed, Dec 2, 2020 at 4:15 PM Kyle Weaver <kc...@google.com> wrote:
>>
>>> > DicomIOTest, FhirIOTest,
>>> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>>>
>>> Looking only at the former three tests, I don't see any reason they
>>> can't mock their API clients, especially since all they expect the server
>>> to do is send back an error.
>>>
>>
>> Fair point, that wouldn't be *too* much trouble. More than just
>> re-classifying them as integration tests though :)
>>
>
Can we write the tests to be agnostic as to whether the service is mocked,
faked, or real? Then we can run them in various modes.

I'm a strong believer in "fake don't mock" to the extent that I think
mocking these might be counterproductive. On a case-by-case basis, perhaps
one can write a tiny in-memory version that has the functionality needed
rather than exact scripted responses. This will interact well with the idea
of making the test unaware of whether it is running against the real
service or not.

This way we can run a fast (or at least hermetic) version as a unit test
and once in a while sanity check it against the real service.

(In a perfect world, owners of services would always ship a local testing
fake and own keeping it up to date... testcontainers is one approach but
also in-memory fakes are great)

Kenn


> > This seems like something that is easy to get wrong without some
>>> automation to help. Could we run the :test targets on Jenkins using the
>>> sandbox command or docker to block network access?
>>>
>>> That's a great idea. Are we planning on integrating the "standardized
>>> developer build environment" mentioned in the original post into our CI
>>> somehow?
>>>
>>
>> I was thinking it could be good to use it in CI somehow to make sure it
>> doesn't get out of date, but all I had in mind was running some minimal set
>> of tasks. Using it in this way would obviously be even better.
>>
>
> I filed https://issues.apache.org/jira/browse/BEAM-11404 to track this
> idea.
>
>
>>
>>
>>>
>>> On Wed, Dec 2, 2020 at 4:03 PM Andrew Pilloud <ap...@google.com>
>>> wrote:
>>>
>>>> We have a large number of tests that run pipelines on the Direct Runner
>>>> or various local runners, but don't require network access, so I don't
>>>> think the distinction is clear. I do agree that requiring a remote service
>>>> falls on the integration test side.
>>>>
>>>> This seems like something that is easy to get wrong without some
>>>> automation to help. Could we run the :test targets on Jenkins using the
>>>> sandbox command or docker to block network access?
>>>>
>>>> On Wed, Dec 2, 2020 at 3:38 PM Brian Hulette <bh...@google.com>
>>>> wrote:
>>>>
>>>>> Sorry I should've included the list of tests here. So far we've run
>>>>> into:
>>>>> DicomIOTest, FhirIOTest,
>>>>> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>>>>>
>>>>> Note the latter are called IT, but that package's build.gradle has a
>>>>> line to scoop ITs into the :test task (addressing in [1]).
>>>>>
>>>>> All of these tests are actually running pipelines so I think they'd be
>>>>> difficult to mock.
>>>>>
>>>>> [1] https://github.com/apache/beam/pull/13444
>>>>>
>>>>> On Wed, Dec 2, 2020 at 3:28 PM Kyle Weaver <kc...@google.com>
>>>>> wrote:
>>>>>
>>>>>> > Should we (do we) require unit tests to be hermetic?
>>>>>>
>>>>>> We should. Unit tests are hermetic by definition. That begs the
>>>>>> definition of hermetic, but clearly the internet is not.
>>>>>>
>>>>>> > Personally I think these tests should be classified as integration
>>>>>> tests (renamed to *IT, and run with the :integrationTest task)
>>>>>>
>>>>>> I'm not sure which tests you're talking about, but it may be better
>>>>>> to make them hermetic through mocking, depending on the intent of the test.
>>>>>>
>>>>>> On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette <bh...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I've been working with Niels Basjes on a standardized developer
>>>>>>> build environment that can run `./gradlew check` [1]. We've run into issues
>>>>>>> because several Java unit tests (class *Test, run with :test) are not
>>>>>>> hermetic. They fail unless the environment they're running in has access to
>>>>>>> the internet, and is authenticated to GCP with access to certain resources.
>>>>>>> Of course the former isn't typically a blocker, but the latter certainly
>>>>>>> can be.
>>>>>>>
>>>>>>> Personally I think these tests should be classified as integration
>>>>>>> tests (renamed to *IT, and run with the :integrationTest task), but I
>>>>>>> realized I don't know if we have a formal definition for what should be a
>>>>>>> unit test vs an integration test. Should we (do we) require unit tests to
>>>>>>> be hermetic?
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/pull/13308
>>>>>>>
>>>>>>

Re: Unit tests vs. Integration Tests

Posted by Brian Hulette <bh...@google.com>.
On Wed, Dec 2, 2020 at 5:50 PM Brian Hulette <bh...@google.com> wrote:

> I guess another question I should ask - is :test supposed to only run unit
> tests? I've been assuming so since many modules have separate
> :integrationTest tasks for *IT tests.
>
> On Wed, Dec 2, 2020 at 4:15 PM Kyle Weaver <kc...@google.com> wrote:
>
>> > DicomIOTest, FhirIOTest,
>> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>>
>> Looking only at the former three tests, I don't see any reason they can't
>> mock their API clients, especially since all they expect the server to do
>> is send back an error.
>>
>
> Fair point, that wouldn't be *too* much trouble. More than just
> re-classifying them as integration tests though :)
>
>
>>
>> > This seems like something that is easy to get wrong without some
>> automation to help. Could we run the :test targets on Jenkins using the
>> sandbox command or docker to block network access?
>>
>> That's a great idea. Are we planning on integrating the "standardized
>> developer build environment" mentioned in the original post into our CI
>> somehow?
>>
>
> I was thinking it could be good to use it in CI somehow to make sure it
> doesn't get out of date, but all I had in mind was running some minimal set
> of tasks. Using it in this way would obviously be even better.
>

I filed https://issues.apache.org/jira/browse/BEAM-11404 to track this idea.


>
>
>>
>> On Wed, Dec 2, 2020 at 4:03 PM Andrew Pilloud <ap...@google.com>
>> wrote:
>>
>>> We have a large number of tests that run pipelines on the Direct Runner
>>> or various local runners, but don't require network access, so I don't
>>> think the distinction is clear. I do agree that requiring a remote service
>>> falls on the integration test side.
>>>
>>> This seems like something that is easy to get wrong without some
>>> automation to help. Could we run the :test targets on Jenkins using the
>>> sandbox command or docker to block network access?
>>>
>>> On Wed, Dec 2, 2020 at 3:38 PM Brian Hulette <bh...@google.com>
>>> wrote:
>>>
>>>> Sorry I should've included the list of tests here. So far we've run
>>>> into:
>>>> DicomIOTest, FhirIOTest,
>>>> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>>>>
>>>> Note the latter are called IT, but that package's build.gradle has a
>>>> line to scoop ITs into the :test task (addressing in [1]).
>>>>
>>>> All of these tests are actually running pipelines so I think they'd be
>>>> difficult to mock.
>>>>
>>>> [1] https://github.com/apache/beam/pull/13444
>>>>
>>>> On Wed, Dec 2, 2020 at 3:28 PM Kyle Weaver <kc...@google.com> wrote:
>>>>
>>>>> > Should we (do we) require unit tests to be hermetic?
>>>>>
>>>>> We should. Unit tests are hermetic by definition. That begs the
>>>>> definition of hermetic, but clearly the internet is not.
>>>>>
>>>>> > Personally I think these tests should be classified as integration
>>>>> tests (renamed to *IT, and run with the :integrationTest task)
>>>>>
>>>>> I'm not sure which tests you're talking about, but it may be better to
>>>>> make them hermetic through mocking, depending on the intent of the test.
>>>>>
>>>>> On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette <bh...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I've been working with Niels Basjes on a standardized developer build
>>>>>> environment that can run `./gradlew check` [1]. We've run into issues
>>>>>> because several Java unit tests (class *Test, run with :test) are not
>>>>>> hermetic. They fail unless the environment they're running in has access to
>>>>>> the internet, and is authenticated to GCP with access to certain resources.
>>>>>> Of course the former isn't typically a blocker, but the latter certainly
>>>>>> can be.
>>>>>>
>>>>>> Personally I think these tests should be classified as integration
>>>>>> tests (renamed to *IT, and run with the :integrationTest task), but I
>>>>>> realized I don't know if we have a formal definition for what should be a
>>>>>> unit test vs an integration test. Should we (do we) require unit tests to
>>>>>> be hermetic?
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> [1] https://github.com/apache/beam/pull/13308
>>>>>>
>>>>>

Re: Unit tests vs. Integration Tests

Posted by Brian Hulette <bh...@google.com>.
I guess another question I should ask - is :test supposed to only run unit
tests? I've been assuming so since many modules have separate
:integrationTest tasks for *IT tests.

On Wed, Dec 2, 2020 at 4:15 PM Kyle Weaver <kc...@google.com> wrote:

> > DicomIOTest, FhirIOTest,
> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>
> Looking only at the former three tests, I don't see any reason they can't
> mock their API clients, especially since all they expect the server to do
> is send back an error.
>

Fair point, that wouldn't be *too* much trouble. More than just
re-classifying them as integration tests though :)


>
> > This seems like something that is easy to get wrong without some
> automation to help. Could we run the :test targets on Jenkins using the
> sandbox command or docker to block network access?
>
> That's a great idea. Are we planning on integrating the "standardized
> developer build environment" mentioned in the original post into our CI
> somehow?
>

I was thinking it could be good to use it in CI somehow to make sure it
doesn't get out of date, but all I had in mind was running some minimal set
of tasks. Using it in this way would obviously be even better.


>
> On Wed, Dec 2, 2020 at 4:03 PM Andrew Pilloud <ap...@google.com> wrote:
>
>> We have a large number of tests that run pipelines on the Direct Runner
>> or various local runners, but don't require network access, so I don't
>> think the distinction is clear. I do agree that requiring a remote service
>> falls on the integration test side.
>>
>> This seems like something that is easy to get wrong without some
>> automation to help. Could we run the :test targets on Jenkins using the
>> sandbox command or docker to block network access?
>>
>> On Wed, Dec 2, 2020 at 3:38 PM Brian Hulette <bh...@google.com> wrote:
>>
>>> Sorry I should've included the list of tests here. So far we've run into:
>>> DicomIOTest, FhirIOTest,
>>> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>>>
>>> Note the latter are called IT, but that package's build.gradle has a
>>> line to scoop ITs into the :test task (addressing in [1]).
>>>
>>> All of these tests are actually running pipelines so I think they'd be
>>> difficult to mock.
>>>
>>> [1] https://github.com/apache/beam/pull/13444
>>>
>>> On Wed, Dec 2, 2020 at 3:28 PM Kyle Weaver <kc...@google.com> wrote:
>>>
>>>> > Should we (do we) require unit tests to be hermetic?
>>>>
>>>> We should. Unit tests are hermetic by definition. That begs the
>>>> definition of hermetic, but clearly the internet is not.
>>>>
>>>> > Personally I think these tests should be classified as integration
>>>> tests (renamed to *IT, and run with the :integrationTest task)
>>>>
>>>> I'm not sure which tests you're talking about, but it may be better to
>>>> make them hermetic through mocking, depending on the intent of the test.
>>>>
>>>> On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette <bh...@google.com>
>>>> wrote:
>>>>
>>>>> I've been working with Niels Basjes on a standardized developer build
>>>>> environment that can run `./gradlew check` [1]. We've run into issues
>>>>> because several Java unit tests (class *Test, run with :test) are not
>>>>> hermetic. They fail unless the environment they're running in has access to
>>>>> the internet, and is authenticated to GCP with access to certain resources.
>>>>> Of course the former isn't typically a blocker, but the latter certainly
>>>>> can be.
>>>>>
>>>>> Personally I think these tests should be classified as integration
>>>>> tests (renamed to *IT, and run with the :integrationTest task), but I
>>>>> realized I don't know if we have a formal definition for what should be a
>>>>> unit test vs an integration test. Should we (do we) require unit tests to
>>>>> be hermetic?
>>>>>
>>>>> Brian
>>>>>
>>>>> [1] https://github.com/apache/beam/pull/13308
>>>>>
>>>>

Re: Unit tests vs. Integration Tests

Posted by Kyle Weaver <kc...@google.com>.
> DicomIOTest, FhirIOTest,
HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT

Looking only at the former three tests, I don't see any reason they can't
mock their API clients, especially since all they expect the server to do
is send back an error.

> This seems like something that is easy to get wrong without some
automation to help. Could we run the :test targets on Jenkins using the
sandbox command or docker to block network access?

That's a great idea. Are we planning on integrating the "standardized
developer build environment" mentioned in the original post into our CI
somehow?

On Wed, Dec 2, 2020 at 4:03 PM Andrew Pilloud <ap...@google.com> wrote:

> We have a large number of tests that run pipelines on the Direct Runner or
> various local runners, but don't require network access, so I don't think
> the distinction is clear. I do agree that requiring a remote service falls
> on the integration test side.
>
> This seems like something that is easy to get wrong without some
> automation to help. Could we run the :test targets on Jenkins using the
> sandbox command or docker to block network access?
>
> On Wed, Dec 2, 2020 at 3:38 PM Brian Hulette <bh...@google.com> wrote:
>
>> Sorry I should've included the list of tests here. So far we've run into:
>> DicomIOTest, FhirIOTest,
>> HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>>
>> Note the latter are called IT, but that package's build.gradle has a line
>> to scoop ITs into the :test task (addressing in [1]).
>>
>> All of these tests are actually running pipelines so I think they'd be
>> difficult to mock.
>>
>> [1] https://github.com/apache/beam/pull/13444
>>
>> On Wed, Dec 2, 2020 at 3:28 PM Kyle Weaver <kc...@google.com> wrote:
>>
>>> > Should we (do we) require unit tests to be hermetic?
>>>
>>> We should. Unit tests are hermetic by definition. That begs the
>>> definition of hermetic, but clearly the internet is not.
>>>
>>> > Personally I think these tests should be classified as integration
>>> tests (renamed to *IT, and run with the :integrationTest task)
>>>
>>> I'm not sure which tests you're talking about, but it may be better to
>>> make them hermetic through mocking, depending on the intent of the test.
>>>
>>> On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette <bh...@google.com>
>>> wrote:
>>>
>>>> I've been working with Niels Basjes on a standardized developer build
>>>> environment that can run `./gradlew check` [1]. We've run into issues
>>>> because several Java unit tests (class *Test, run with :test) are not
>>>> hermetic. They fail unless the environment they're running in has access to
>>>> the internet, and is authenticated to GCP with access to certain resources.
>>>> Of course the former isn't typically a blocker, but the latter certainly
>>>> can be.
>>>>
>>>> Personally I think these tests should be classified as integration
>>>> tests (renamed to *IT, and run with the :integrationTest task), but I
>>>> realized I don't know if we have a formal definition for what should be a
>>>> unit test vs an integration test. Should we (do we) require unit tests to
>>>> be hermetic?
>>>>
>>>> Brian
>>>>
>>>> [1] https://github.com/apache/beam/pull/13308
>>>>
>>>

Re: Unit tests vs. Integration Tests

Posted by Andrew Pilloud <ap...@google.com>.
We have a large number of tests that run pipelines on the Direct Runner or
various local runners, but don't require network access, so I don't think
the distinction is clear. I do agree that requiring a remote service falls
on the integration test side.

This seems like something that is easy to get wrong without some automation
to help. Could we run the :test targets on Jenkins using the sandbox
command or docker to block network access?

On Wed, Dec 2, 2020 at 3:38 PM Brian Hulette <bh...@google.com> wrote:

> Sorry I should've included the list of tests here. So far we've run into:
> DicomIOTest, FhirIOTest, HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>
> Note the latter are called IT, but that package's build.gradle has a line
> to scoop ITs into the :test task (addressing in [1]).
>
> All of these tests are actually running pipelines so I think they'd be
> difficult to mock.
>
> [1] https://github.com/apache/beam/pull/13444
>
> On Wed, Dec 2, 2020 at 3:28 PM Kyle Weaver <kc...@google.com> wrote:
>
>> > Should we (do we) require unit tests to be hermetic?
>>
>> We should. Unit tests are hermetic by definition. That begs the
>> definition of hermetic, but clearly the internet is not.
>>
>> > Personally I think these tests should be classified as integration
>> tests (renamed to *IT, and run with the :integrationTest task)
>>
>> I'm not sure which tests you're talking about, but it may be better to
>> make them hermetic through mocking, depending on the intent of the test.
>>
>> On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette <bh...@google.com> wrote:
>>
>>> I've been working with Niels Basjes on a standardized developer build
>>> environment that can run `./gradlew check` [1]. We've run into issues
>>> because several Java unit tests (class *Test, run with :test) are not
>>> hermetic. They fail unless the environment they're running in has access to
>>> the internet, and is authenticated to GCP with access to certain resources.
>>> Of course the former isn't typically a blocker, but the latter certainly
>>> can be.
>>>
>>> Personally I think these tests should be classified as integration tests
>>> (renamed to *IT, and run with the :integrationTest task), but I realized I
>>> don't know if we have a formal definition for what should be a unit test vs
>>> an integration test. Should we (do we) require unit tests to be hermetic?
>>>
>>> Brian
>>>
>>> [1] https://github.com/apache/beam/pull/13308
>>>
>>

Re: Unit tests vs. Integration Tests

Posted by Brian Hulette <bh...@google.com>.
Sorry I should've included the list of tests here. So far we've run into:
DicomIOTest, FhirIOTest, HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT

Note the latter are called IT, but that package's build.gradle has a line
to scoop ITs into the :test task (addressing in [1]).

All of these tests are actually running pipelines so I think they'd be
difficult to mock.

[1] https://github.com/apache/beam/pull/13444

On Wed, Dec 2, 2020 at 3:28 PM Kyle Weaver <kc...@google.com> wrote:

> > Should we (do we) require unit tests to be hermetic?
>
> We should. Unit tests are hermetic by definition. That begs the definition
> of hermetic, but clearly the internet is not.
>
> > Personally I think these tests should be classified as integration tests
> (renamed to *IT, and run with the :integrationTest task)
>
> I'm not sure which tests you're talking about, but it may be better to
> make them hermetic through mocking, depending on the intent of the test.
>
> On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette <bh...@google.com> wrote:
>
>> I've been working with Niels Basjes on a standardized developer build
>> environment that can run `./gradlew check` [1]. We've run into issues
>> because several Java unit tests (class *Test, run with :test) are not
>> hermetic. They fail unless the environment they're running in has access to
>> the internet, and is authenticated to GCP with access to certain resources.
>> Of course the former isn't typically a blocker, but the latter certainly
>> can be.
>>
>> Personally I think these tests should be classified as integration tests
>> (renamed to *IT, and run with the :integrationTest task), but I realized I
>> don't know if we have a formal definition for what should be a unit test vs
>> an integration test. Should we (do we) require unit tests to be hermetic?
>>
>> Brian
>>
>> [1] https://github.com/apache/beam/pull/13308
>>
>

Re: Unit tests vs. Integration Tests

Posted by Kyle Weaver <kc...@google.com>.
> Should we (do we) require unit tests to be hermetic?

We should. Unit tests are hermetic by definition. That begs the definition
of hermetic, but clearly the internet is not.

> Personally I think these tests should be classified as integration tests
(renamed to *IT, and run with the :integrationTest task)

I'm not sure which tests you're talking about, but it may be better to make
them hermetic through mocking, depending on the intent of the test.

On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette <bh...@google.com> wrote:

> I've been working with Niels Basjes on a standardized developer build
> environment that can run `./gradlew check` [1]. We've run into issues
> because several Java unit tests (class *Test, run with :test) are not
> hermetic. They fail unless the environment they're running in has access to
> the internet, and is authenticated to GCP with access to certain resources.
> Of course the former isn't typically a blocker, but the latter certainly
> can be.
>
> Personally I think these tests should be classified as integration tests
> (renamed to *IT, and run with the :integrationTest task), but I realized I
> don't know if we have a formal definition for what should be a unit test vs
> an integration test. Should we (do we) require unit tests to be hermetic?
>
> Brian
>
> [1] https://github.com/apache/beam/pull/13308
>