You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by Eric Yang <ey...@hortonworks.com> on 2017/09/29 17:12:09 UTC

[DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Hi Hadoop-dev,

Long time ago, Hadoop community decided to put Powermock on hold for unit tests.  Both mockito and powermock has evolved a lot in the past 5 years.  There are mature versions of both software, and there are compatibility charts to indicate which versions can work together.  Hadoop has grown a lot in the last 5 years.  It becomes apparent that without ability to instrument lower level classes to contain unit test scope.  Many tests are written to simulate integration test in order to perform unit tests.  The result is slow performance on unit tests, and some parts are not testable strictly in unit test case.  This discussion is to revisit the decision, and see if we would embrace Powermock and allow HADOOP-9122 to be implemented.  Feel free to comment on HADOOP-9122 and this thread to revisit this issue.

Thank you for your time.

Regards,
Eric

Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Posted by Chris Douglas <cd...@apache.org>.
Eric-

It might be easier to grok the benefits of Powermock if you wrote an
example test. As you say, starting a mini*cluster to test input
validation is wasteful, but it's not clear why that would be required
if Powermock were unavailable.

It's not a huge deal to add a dependency, particularly if it improves
our tests. However, "constructor suppression, inner class replacement
and private method override" are aggressive workarounds. These might
be justifiable for testing with a third-party library, but in our
setting it's more likely a sign of a suboptimal design. Some of these,
particularly "private method override" sound like they'll create more
brittle tests, rather than more testable code.

However, if there are some cases where Hadoop is forcing bad design
because of some existing, pervasive code, then maybe judicious use of
Powermock could help. -C


On Fri, Sep 29, 2017 at 2:46 PM, Eric Yang <ey...@hortonworks.com> wrote:
> Hi Chris and Andrew,
>
>
>
> The intend is for new code to have better unit test cases without resort to
> invocation of miniHDFSCluster or miniYarnCluster.  Existing code don’t
> require refactoring, if the test cases already have good coverages.  I am
> currently working on part of YARN to improve YARN and Docker integration.
> There are a lot of code getting triggered for UGI, FileSystem object to Yarn
> job submission.  My code is only responsible to check the logic of the user
> input, and expected output prior to YarnClient job submission.  Starting a
> miniCluster for this test case is excessive for the small piece of code for
> validation.  The submission code was imported from Slider for YARN native
> services, a single class imports various Hadoop services.  In several
> failure cases, it is difficult to simulate exact error conditions because
> the API is several layers deep.  Powermock provides easy way to replace and
> stubbing return object or throw proper exception to simulate the failure
> conditions.  One can argue that the code should have been written easier for
> unit tests, but Hadoop code density is beyond trivial to get simple
> initialization done.  Constructor suppression, inner class replacement and
> private method override are good tools from Powermock that can provide more
> accurate testing without losing sights of multiple stage API calling tests
> while keeping the test case localized to a small piece of the greater
> puzzle.  Hence, I like to request the community to rethink the improvement
> that Powermock can bring to the table.  Thank you for your considation.
>
>
>
> Regards,
>
> Eric
>
>
>
> From: Andrew Wang <an...@cloudera.com>
> Date: Friday, September 29, 2017 at 1:55 PM
> To: Chris Douglas <cd...@apache.org>
> Cc: Eric Yang <ey...@hortonworks.com>, "common-dev@hadoop.apache.org"
> <co...@hadoop.apache.org>
> Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better
> unit tests
>
>
>
> Making code testable is about a lot more than your mocking library. In HDFS,
> the NameNode is very monolithic, so it's hard to instantiate little pieces
> of functionality in isolation and mock things out. I have my doubts that
> Powermock will help with this unless someone's willing to invest in
> significant refactoring effort of our existing code and unit test suites.
>
>
>
> I could see an argument of this being useful for new code being developed,
> but like Chris, I'd like to see an example of where Mockito falls short, and
> what additional capabilities Powermock brings to the table.
>
>
>
> Best,
>
> Andrew
>
>
>
> On Fri, Sep 29, 2017 at 10:38 AM, Chris Douglas <cd...@apache.org> wrote:
>
> Eric-
>
> Can you explain how Powermock differs from/augments Mockito, why we
> should adopt it, and maybe an example of an existing test that could
> be improved using this library? -C
>
>
> On Fri, Sep 29, 2017 at 10:12 AM, Eric Yang <ey...@hortonworks.com> wrote:
>> Hi Hadoop-dev,
>>
>> Long time ago, Hadoop community decided to put Powermock on hold for unit
>> tests.  Both mockito and powermock has evolved a lot in the past 5 years.
>> There are mature versions of both software, and there are compatibility
>> charts to indicate which versions can work together.  Hadoop has grown a lot
>> in the last 5 years.  It becomes apparent that without ability to instrument
>> lower level classes to contain unit test scope.  Many tests are written to
>> simulate integration test in order to perform unit tests.  The result is
>> slow performance on unit tests, and some parts are not testable strictly in
>> unit test case.  This discussion is to revisit the decision, and see if we
>> would embrace Powermock and allow HADOOP-9122 to be implemented.  Feel free
>> to comment on HADOOP-9122 and this thread to revisit this issue.
>>
>> Thank you for your time.
>>
>> Regards,
>> Eric
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: common-dev-help@hadoop.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org


Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Posted by Chris Douglas <cd...@apache.org>.
Sorry, took awhile to get back to this.

In this example, the win with PowerMock is the capability to set the
private static field, so it can be final outside the unit test (in the
version ultimately committed, there's a @VisibleForTesting static
method to overwrite the static field between tests)? I'm not familiar
with how this class is used, but if it's an injected @Singleton, then
what is the virtue of making the field static at all? Would it be
possible to create a new instance for each test, if this were a member
field? Using reflection to set internal state, even if it's a library
with a good API, creates brittle coupling between unit tests and
implementations... this library sounds like syntactic sugar for the
brittle anti-patterns that Steve cited as the bane of mock tests,
generally.

Nonetheless, if you find that the existing tooling is insufficient,
then please feel free to include it. -C

On Mon, Oct 2, 2017 at 4:55 PM, Eric Yang <ey...@hortonworks.com> wrote:
> Chris,
>
> Here is a patch that use powermock.  https://issues.apache.org/jira/secure/attachment/12889144/YARN-7202.yarn-native-services.002.patch
> This was written to verify that when ServiceClient interacts with Hadoop, if it throws the possible Exception types declared by ServiceClient API, does the REST API layer handles the error code correctly.  It can help to simulate internal errors and safe guard the API against the errors.  It seems like a useful approach to reduce the full setup of MiniYarnCluster, and submit job and generate actual failure situations in the backend.
>
> It looks like a useful way to test negative test cases.  The full exercise of positive case is written in another test case in TestYarnNativeServices in Hadoop-yarn-services-api project.
> Without ability to inject fault into the system, it is harder to test negative cases.  However, I found it difficult to attempt this in Hadoop code base.  Suggestion?
>
> Regards,
> Eric
>
> On 10/2/17, 3:09 PM, "Chris Douglas" <cd...@apache.org> wrote:
>
>     Eric/Steve-
>
>     Please pick a test- any test- and demonstrate why Powermock would
>     improve- by any metric- testing in Hadoop. -C
>
>
>
>     On Mon, Oct 2, 2017 at 2:12 PM, Eric Yang <ey...@hortonworks.com> wrote:
>     > Mock provides tool chains to run simulation for a piece of code.  It helps to prevent null pointer exception, and reduce unexpected runtime exceptions.  When a piece of code is finished with a well-defined unit test, it provides great insights to see author’s intention and reasoning to write the code.  However, everyone looks at code from a different perspective, and it is often easier to rewrite the code than modifying and update the tests.   The short coming of writing new code, there is always danger of losing existing purpose, workaround buried deep in the code.  On the other hand, if a test program is filling with several pages of initialization code, and override.  It is hard to get context of the test case, and easy to lose the original meaning of the test case.  Hence, there are drawback for using mock or full integration test.
>     >
>     > I was in favor of using Powermock in favor of giving user the ability to unit test a class and reduce external interference initially.  However, I quickly come to realization that Hadoop usage of protocol buffer serialization technique and java reflection serialization technique have some difference which prevents powermock to work for certain Hadoop classes.
>     >
>     > Hadoop unit tests are written to be bigger than one class, and frequently, a mini-cluster is spawned to test 5-10 lines of code.  Any simple API test will trigger large portion of Hadoop code to be initialized.  Hadoop code base will require too much effort to work with Powermock.  Programs outside of Hadoop can use powermock annotation to prevent mocking Hadoop classes, such as: @powermockignore({"javax.management_", "javax.xml.", "org.w3c.", "org.apache.hadoop._", "com.sun.*"}) .  However, working in Hadoop code base, this technique is not practical because every class in Hadoop prefix with org.apache.hadoop.  It will be heavy upkeep to maintain the list of prefix packages that can not work with powermock reflection.
>     > Hence, I rest my case for re-opening this issue.
>     >
>     > Regards,
>     > Eric
>     >
>     > From: Steve Loughran <st...@hortonworks.com>
>     > Date: Sunday, October 1, 2017 at 12:36 PM
>     > To: Eric Yang <ey...@hortonworks.com>
>     > Cc: Andrew Wang <an...@cloudera.com>, Chris Douglas <cd...@apache.org>, "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
>     > Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests
>     >
>     >
>     > On 29 Sep 2017, at 22:46, Eric Yang <ey...@hortonworks.com>> wrote:
>     >
>     > Hi Chris and Andrew,
>     >
>     > The intend is for new code to have better unit test cases without resort to invocation of miniHDFSCluster or miniYarnCluster.  Existing code don’t require refactoring, if the test cases already have good coverages.  I am currently working on part of YARN to improve YARN and Docker integration.  There are a lot of code getting triggered for UGI, FileSystem object to Yarn job submission.  My code is only responsible to check the logic of the user input, and expected output prior to YarnClient job submission.  Starting a miniCluster for this test case is excessive for the small piece of code for validation.  The submission code was imported from Slider for YARN native services, a single class imports various Hadoop services.  In several failure cases, it is difficult to simulate exact error conditions because the API is several layers deep.  Powermock provides easy way to replace and stubbing return object or throw proper exception to simulate the failure conditions.  One can argue that the code should have been written easier for unit tests, but Hadoop code density is beyond trivial to get simple initialization done.  Constructor suppression, inner class replacement and private method override are good tools from Powermock that can provide more accurate testing without losing sights of multiple stage API calling tests while keeping the test case localized to a small piece of the greater puzzle.  Hence, I like to request the community to rethink the improvement that Powermock can bring to the table.  Thank you for your considation.
>     >
>     > I don't know enough about powermock to have opinions on the matter. I do know I don't like mocking in general https://www.slideshare.net/steve_l/i-hate-mocking , or at least in the one area where I find it most troublesome: maintaining code
>     >
>     >
>     > I' just find that mock code tests to be very brittle to changes in the codepaths of the classes called, so whenever you change the implementation, tests fail. And it's not so much "your code has regressed and we correctly caught it"  failure as "the change in order of invocation caused our test to report a regression when it wasn't really" kind of failure. Which is bad, as you waste time working out that this is the cause, then often fix the problems by moving bits of the test around until it stops failing. Which can hide real regressions.
>     >
>     > Where mocking can be good is in that
>     >
>     > 1. you can make assertions about how thinga were invoked, though note we've moved in S3A towards actually instrumenting the code and asserting on that. This way our shipping code gets to enjoy better instrumentation. [Note, those assertions can be brittle to changes in implementation too]
>     >
>     > 2. You can simulate failure better. But for S3Guard/S3A we've gone and implemented an InconsistentS3Client which can be used downstream (it ships in the hadoop-aws JAR) and so can be used downstream.
>     >
>     > 3. You can test things without needing so much support infra (e.g. in unit tests and on jenkins without needing logins, running services)
>     >
>     > 4. You can have faster tests, because there's no need to set up/tear down things like HDFS
>     >
>     > 5. You can isolate problems to the code under test, rather than looking at the logs of forked processes collected somewhere under target/
>     >
>     > I think Eric's looking @ #4, & 5 which, for tests which need a MiniYARN cluster is significant. If Powermock helps this, I don't see why we should say "don't use it", as long as we are aware of the cost, which is the risk of creating tests which are brittle to changes in the implementation code
>     >
>     >
>     > FWIW, Mocking is why I couldn't make the init/start/stop methods of org.apache.hadoop.service.AbstractService final; the need to test with mocking can impact production code. Is that bad? Well, we do other things to code to aid testability,...
>     >
>     >
>     > -Steve
>     >
>     >
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org


Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Posted by Eric Yang <ey...@hortonworks.com>.
Chris,

Here is a patch that use powermock.  https://issues.apache.org/jira/secure/attachment/12889144/YARN-7202.yarn-native-services.002.patch
This was written to verify that when ServiceClient interacts with Hadoop, if it throws the possible Exception types declared by ServiceClient API, does the REST API layer handles the error code correctly.  It can help to simulate internal errors and safe guard the API against the errors.  It seems like a useful approach to reduce the full setup of MiniYarnCluster, and submit job and generate actual failure situations in the backend.

It looks like a useful way to test negative test cases.  The full exercise of positive case is written in another test case in TestYarnNativeServices in Hadoop-yarn-services-api project.
Without ability to inject fault into the system, it is harder to test negative cases.  However, I found it difficult to attempt this in Hadoop code base.  Suggestion?

Regards,
Eric

On 10/2/17, 3:09 PM, "Chris Douglas" <cd...@apache.org> wrote:

    Eric/Steve-
    
    Please pick a test- any test- and demonstrate why Powermock would
    improve- by any metric- testing in Hadoop. -C
    
    
    
    On Mon, Oct 2, 2017 at 2:12 PM, Eric Yang <ey...@hortonworks.com> wrote:
    > Mock provides tool chains to run simulation for a piece of code.  It helps to prevent null pointer exception, and reduce unexpected runtime exceptions.  When a piece of code is finished with a well-defined unit test, it provides great insights to see author’s intention and reasoning to write the code.  However, everyone looks at code from a different perspective, and it is often easier to rewrite the code than modifying and update the tests.   The short coming of writing new code, there is always danger of losing existing purpose, workaround buried deep in the code.  On the other hand, if a test program is filling with several pages of initialization code, and override.  It is hard to get context of the test case, and easy to lose the original meaning of the test case.  Hence, there are drawback for using mock or full integration test.
    >
    > I was in favor of using Powermock in favor of giving user the ability to unit test a class and reduce external interference initially.  However, I quickly come to realization that Hadoop usage of protocol buffer serialization technique and java reflection serialization technique have some difference which prevents powermock to work for certain Hadoop classes.
    >
    > Hadoop unit tests are written to be bigger than one class, and frequently, a mini-cluster is spawned to test 5-10 lines of code.  Any simple API test will trigger large portion of Hadoop code to be initialized.  Hadoop code base will require too much effort to work with Powermock.  Programs outside of Hadoop can use powermock annotation to prevent mocking Hadoop classes, such as: @powermockignore({"javax.management_", "javax.xml.", "org.w3c.", "org.apache.hadoop._", "com.sun.*"}) .  However, working in Hadoop code base, this technique is not practical because every class in Hadoop prefix with org.apache.hadoop.  It will be heavy upkeep to maintain the list of prefix packages that can not work with powermock reflection.
    > Hence, I rest my case for re-opening this issue.
    >
    > Regards,
    > Eric
    >
    > From: Steve Loughran <st...@hortonworks.com>
    > Date: Sunday, October 1, 2017 at 12:36 PM
    > To: Eric Yang <ey...@hortonworks.com>
    > Cc: Andrew Wang <an...@cloudera.com>, Chris Douglas <cd...@apache.org>, "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
    > Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests
    >
    >
    > On 29 Sep 2017, at 22:46, Eric Yang <ey...@hortonworks.com>> wrote:
    >
    > Hi Chris and Andrew,
    >
    > The intend is for new code to have better unit test cases without resort to invocation of miniHDFSCluster or miniYarnCluster.  Existing code don’t require refactoring, if the test cases already have good coverages.  I am currently working on part of YARN to improve YARN and Docker integration.  There are a lot of code getting triggered for UGI, FileSystem object to Yarn job submission.  My code is only responsible to check the logic of the user input, and expected output prior to YarnClient job submission.  Starting a miniCluster for this test case is excessive for the small piece of code for validation.  The submission code was imported from Slider for YARN native services, a single class imports various Hadoop services.  In several failure cases, it is difficult to simulate exact error conditions because the API is several layers deep.  Powermock provides easy way to replace and stubbing return object or throw proper exception to simulate the failure conditions.  One can argue that the code should have been written easier for unit tests, but Hadoop code density is beyond trivial to get simple initialization done.  Constructor suppression, inner class replacement and private method override are good tools from Powermock that can provide more accurate testing without losing sights of multiple stage API calling tests while keeping the test case localized to a small piece of the greater puzzle.  Hence, I like to request the community to rethink the improvement that Powermock can bring to the table.  Thank you for your considation.
    >
    > I don't know enough about powermock to have opinions on the matter. I do know I don't like mocking in general https://www.slideshare.net/steve_l/i-hate-mocking , or at least in the one area where I find it most troublesome: maintaining code
    >
    >
    > I' just find that mock code tests to be very brittle to changes in the codepaths of the classes called, so whenever you change the implementation, tests fail. And it's not so much "your code has regressed and we correctly caught it"  failure as "the change in order of invocation caused our test to report a regression when it wasn't really" kind of failure. Which is bad, as you waste time working out that this is the cause, then often fix the problems by moving bits of the test around until it stops failing. Which can hide real regressions.
    >
    > Where mocking can be good is in that
    >
    > 1. you can make assertions about how thinga were invoked, though note we've moved in S3A towards actually instrumenting the code and asserting on that. This way our shipping code gets to enjoy better instrumentation. [Note, those assertions can be brittle to changes in implementation too]
    >
    > 2. You can simulate failure better. But for S3Guard/S3A we've gone and implemented an InconsistentS3Client which can be used downstream (it ships in the hadoop-aws JAR) and so can be used downstream.
    >
    > 3. You can test things without needing so much support infra (e.g. in unit tests and on jenkins without needing logins, running services)
    >
    > 4. You can have faster tests, because there's no need to set up/tear down things like HDFS
    >
    > 5. You can isolate problems to the code under test, rather than looking at the logs of forked processes collected somewhere under target/
    >
    > I think Eric's looking @ #4, & 5 which, for tests which need a MiniYARN cluster is significant. If Powermock helps this, I don't see why we should say "don't use it", as long as we are aware of the cost, which is the risk of creating tests which are brittle to changes in the implementation code
    >
    >
    > FWIW, Mocking is why I couldn't make the init/start/stop methods of org.apache.hadoop.service.AbstractService final; the need to test with mocking can impact production code. Is that bad? Well, we do other things to code to aid testability,...
    >
    >
    > -Steve
    >
    >
    
    


Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Posted by Chris Douglas <cd...@apache.org>.
Eric/Steve-

Please pick a test- any test- and demonstrate why Powermock would
improve- by any metric- testing in Hadoop. -C



On Mon, Oct 2, 2017 at 2:12 PM, Eric Yang <ey...@hortonworks.com> wrote:
> Mock provides tool chains to run simulation for a piece of code.  It helps to prevent null pointer exception, and reduce unexpected runtime exceptions.  When a piece of code is finished with a well-defined unit test, it provides great insights to see author’s intention and reasoning to write the code.  However, everyone looks at code from a different perspective, and it is often easier to rewrite the code than modifying and update the tests.   The short coming of writing new code, there is always danger of losing existing purpose, workaround buried deep in the code.  On the other hand, if a test program is filling with several pages of initialization code, and override.  It is hard to get context of the test case, and easy to lose the original meaning of the test case.  Hence, there are drawback for using mock or full integration test.
>
> I was in favor of using Powermock in favor of giving user the ability to unit test a class and reduce external interference initially.  However, I quickly come to realization that Hadoop usage of protocol buffer serialization technique and java reflection serialization technique have some difference which prevents powermock to work for certain Hadoop classes.
>
> Hadoop unit tests are written to be bigger than one class, and frequently, a mini-cluster is spawned to test 5-10 lines of code.  Any simple API test will trigger large portion of Hadoop code to be initialized.  Hadoop code base will require too much effort to work with Powermock.  Programs outside of Hadoop can use powermock annotation to prevent mocking Hadoop classes, such as: @powermockignore({"javax.management_", "javax.xml.", "org.w3c.", "org.apache.hadoop._", "com.sun.*"}) .  However, working in Hadoop code base, this technique is not practical because every class in Hadoop prefix with org.apache.hadoop.  It will be heavy upkeep to maintain the list of prefix packages that can not work with powermock reflection.
> Hence, I rest my case for re-opening this issue.
>
> Regards,
> Eric
>
> From: Steve Loughran <st...@hortonworks.com>
> Date: Sunday, October 1, 2017 at 12:36 PM
> To: Eric Yang <ey...@hortonworks.com>
> Cc: Andrew Wang <an...@cloudera.com>, Chris Douglas <cd...@apache.org>, "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
> Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests
>
>
> On 29 Sep 2017, at 22:46, Eric Yang <ey...@hortonworks.com>> wrote:
>
> Hi Chris and Andrew,
>
> The intend is for new code to have better unit test cases without resort to invocation of miniHDFSCluster or miniYarnCluster.  Existing code don’t require refactoring, if the test cases already have good coverages.  I am currently working on part of YARN to improve YARN and Docker integration.  There are a lot of code getting triggered for UGI, FileSystem object to Yarn job submission.  My code is only responsible to check the logic of the user input, and expected output prior to YarnClient job submission.  Starting a miniCluster for this test case is excessive for the small piece of code for validation.  The submission code was imported from Slider for YARN native services, a single class imports various Hadoop services.  In several failure cases, it is difficult to simulate exact error conditions because the API is several layers deep.  Powermock provides easy way to replace and stubbing return object or throw proper exception to simulate the failure conditions.  One can argue that the code should have been written easier for unit tests, but Hadoop code density is beyond trivial to get simple initialization done.  Constructor suppression, inner class replacement and private method override are good tools from Powermock that can provide more accurate testing without losing sights of multiple stage API calling tests while keeping the test case localized to a small piece of the greater puzzle.  Hence, I like to request the community to rethink the improvement that Powermock can bring to the table.  Thank you for your considation.
>
> I don't know enough about powermock to have opinions on the matter. I do know I don't like mocking in general https://www.slideshare.net/steve_l/i-hate-mocking , or at least in the one area where I find it most troublesome: maintaining code
>
>
> I' just find that mock code tests to be very brittle to changes in the codepaths of the classes called, so whenever you change the implementation, tests fail. And it's not so much "your code has regressed and we correctly caught it"  failure as "the change in order of invocation caused our test to report a regression when it wasn't really" kind of failure. Which is bad, as you waste time working out that this is the cause, then often fix the problems by moving bits of the test around until it stops failing. Which can hide real regressions.
>
> Where mocking can be good is in that
>
> 1. you can make assertions about how thinga were invoked, though note we've moved in S3A towards actually instrumenting the code and asserting on that. This way our shipping code gets to enjoy better instrumentation. [Note, those assertions can be brittle to changes in implementation too]
>
> 2. You can simulate failure better. But for S3Guard/S3A we've gone and implemented an InconsistentS3Client which can be used downstream (it ships in the hadoop-aws JAR) and so can be used downstream.
>
> 3. You can test things without needing so much support infra (e.g. in unit tests and on jenkins without needing logins, running services)
>
> 4. You can have faster tests, because there's no need to set up/tear down things like HDFS
>
> 5. You can isolate problems to the code under test, rather than looking at the logs of forked processes collected somewhere under target/
>
> I think Eric's looking @ #4, & 5 which, for tests which need a MiniYARN cluster is significant. If Powermock helps this, I don't see why we should say "don't use it", as long as we are aware of the cost, which is the risk of creating tests which are brittle to changes in the implementation code
>
>
> FWIW, Mocking is why I couldn't make the init/start/stop methods of org.apache.hadoop.service.AbstractService final; the need to test with mocking can impact production code. Is that bad? Well, we do other things to code to aid testability,...
>
>
> -Steve
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org


Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Posted by Eric Yang <ey...@hortonworks.com>.
Mock provides tool chains to run simulation for a piece of code.  It helps to prevent null pointer exception, and reduce unexpected runtime exceptions.  When a piece of code is finished with a well-defined unit test, it provides great insights to see author’s intention and reasoning to write the code.  However, everyone looks at code from a different perspective, and it is often easier to rewrite the code than modifying and update the tests.   The short coming of writing new code, there is always danger of losing existing purpose, workaround buried deep in the code.  On the other hand, if a test program is filling with several pages of initialization code, and override.  It is hard to get context of the test case, and easy to lose the original meaning of the test case.  Hence, there are drawback for using mock or full integration test.

I was in favor of using Powermock in favor of giving user the ability to unit test a class and reduce external interference initially.  However, I quickly come to realization that Hadoop usage of protocol buffer serialization technique and java reflection serialization technique have some difference which prevents powermock to work for certain Hadoop classes.

Hadoop unit tests are written to be bigger than one class, and frequently, a mini-cluster is spawned to test 5-10 lines of code.  Any simple API test will trigger large portion of Hadoop code to be initialized.  Hadoop code base will require too much effort to work with Powermock.  Programs outside of Hadoop can use powermock annotation to prevent mocking Hadoop classes, such as: @powermockignore({"javax.management_", "javax.xml.", "org.w3c.", "org.apache.hadoop._", "com.sun.*"}) .  However, working in Hadoop code base, this technique is not practical because every class in Hadoop prefix with org.apache.hadoop.  It will be heavy upkeep to maintain the list of prefix packages that can not work with powermock reflection.
Hence, I rest my case for re-opening this issue.

Regards,
Eric

From: Steve Loughran <st...@hortonworks.com>
Date: Sunday, October 1, 2017 at 12:36 PM
To: Eric Yang <ey...@hortonworks.com>
Cc: Andrew Wang <an...@cloudera.com>, Chris Douglas <cd...@apache.org>, "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests


On 29 Sep 2017, at 22:46, Eric Yang <ey...@hortonworks.com>> wrote:

Hi Chris and Andrew,

The intend is for new code to have better unit test cases without resort to invocation of miniHDFSCluster or miniYarnCluster.  Existing code don’t require refactoring, if the test cases already have good coverages.  I am currently working on part of YARN to improve YARN and Docker integration.  There are a lot of code getting triggered for UGI, FileSystem object to Yarn job submission.  My code is only responsible to check the logic of the user input, and expected output prior to YarnClient job submission.  Starting a miniCluster for this test case is excessive for the small piece of code for validation.  The submission code was imported from Slider for YARN native services, a single class imports various Hadoop services.  In several failure cases, it is difficult to simulate exact error conditions because the API is several layers deep.  Powermock provides easy way to replace and stubbing return object or throw proper exception to simulate the failure conditions.  One can argue that the code should have been written easier for unit tests, but Hadoop code density is beyond trivial to get simple initialization done.  Constructor suppression, inner class replacement and private method override are good tools from Powermock that can provide more accurate testing without losing sights of multiple stage API calling tests while keeping the test case localized to a small piece of the greater puzzle.  Hence, I like to request the community to rethink the improvement that Powermock can bring to the table.  Thank you for your considation.

I don't know enough about powermock to have opinions on the matter. I do know I don't like mocking in general https://www.slideshare.net/steve_l/i-hate-mocking , or at least in the one area where I find it most troublesome: maintaining code


I' just find that mock code tests to be very brittle to changes in the codepaths of the classes called, so whenever you change the implementation, tests fail. And it's not so much "your code has regressed and we correctly caught it"  failure as "the change in order of invocation caused our test to report a regression when it wasn't really" kind of failure. Which is bad, as you waste time working out that this is the cause, then often fix the problems by moving bits of the test around until it stops failing. Which can hide real regressions.

Where mocking can be good is in that

1. you can make assertions about how thinga were invoked, though note we've moved in S3A towards actually instrumenting the code and asserting on that. This way our shipping code gets to enjoy better instrumentation. [Note, those assertions can be brittle to changes in implementation too]

2. You can simulate failure better. But for S3Guard/S3A we've gone and implemented an InconsistentS3Client which can be used downstream (it ships in the hadoop-aws JAR) and so can be used downstream.

3. You can test things without needing so much support infra (e.g. in unit tests and on jenkins without needing logins, running services)

4. You can have faster tests, because there's no need to set up/tear down things like HDFS

5. You can isolate problems to the code under test, rather than looking at the logs of forked processes collected somewhere under target/

I think Eric's looking @ #4, & 5 which, for tests which need a MiniYARN cluster is significant. If Powermock helps this, I don't see why we should say "don't use it", as long as we are aware of the cost, which is the risk of creating tests which are brittle to changes in the implementation code


FWIW, Mocking is why I couldn't make the init/start/stop methods of org.apache.hadoop.service.AbstractService final; the need to test with mocking can impact production code. Is that bad? Well, we do other things to code to aid testability,...


-Steve



Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Posted by Steve Loughran <st...@hortonworks.com>.
On 29 Sep 2017, at 22:46, Eric Yang <ey...@hortonworks.com>> wrote:

Hi Chris and Andrew,

The intend is for new code to have better unit test cases without resort to invocation of miniHDFSCluster or miniYarnCluster.  Existing code don’t require refactoring, if the test cases already have good coverages.  I am currently working on part of YARN to improve YARN and Docker integration.  There are a lot of code getting triggered for UGI, FileSystem object to Yarn job submission.  My code is only responsible to check the logic of the user input, and expected output prior to YarnClient job submission.  Starting a miniCluster for this test case is excessive for the small piece of code for validation.  The submission code was imported from Slider for YARN native services, a single class imports various Hadoop services.  In several failure cases, it is difficult to simulate exact error conditions because the API is several layers deep.  Powermock provides easy way to replace and stubbing return object or throw proper exception to simulate the failure conditions.  One can argue that the code should have been written easier for unit tests, but Hadoop code density is beyond trivial to get simple initialization done.  Constructor suppression, inner class replacement and private method override are good tools from Powermock that can provide more accurate testing without losing sights of multiple stage API calling tests while keeping the test case localized to a small piece of the greater puzzle.  Hence, I like to request the community to rethink the improvement that Powermock can bring to the table.  Thank you for your considation.

I don't know enough about powermock to have opinions on the matter. I do know I don't like mocking in general https://www.slideshare.net/steve_l/i-hate-mocking , or at least in the one area where I find it most troublesome: maintaining code


I' just find that mock code tests to be very brittle to changes in the codepaths of the classes called, so whenever you change the implementation, tests fail. And it's not so much "your code has regressed and we correctly caught it"  failure as "the change in order of invocation caused our test to report a regression when it wasn't really" kind of failure. Which is bad, as you waste time working out that this is the cause, then often fix the problems by moving bits of the test around until it stops failing. Which can hide real regressions.

Where mocking can be good is in that

1. you can make assertions about how thinga were invoked, though note we've moved in S3A towards actually instrumenting the code and asserting on that. This way our shipping code gets to enjoy better instrumentation. [Note, those assertions can be brittle to changes in implementation too]

2. You can simulate failure better. But for S3Guard/S3A we've gone and implemented an InconsistentS3Client which can be used downstream (it ships in the hadoop-aws JAR) and so can be used downstream.

3. You can test things without needing so much support infra (e.g. in unit tests and on jenkins without needing logins, running services)

4. You can have faster tests, because there's no need to set up/tear down things like HDFS

5. You can isolate problems to the code under test, rather than looking at the logs of forked processes collected somewhere under target/

I think Eric's looking @ #4, & 5 which, for tests which need a MiniYARN cluster is significant. If Powermock helps this, I don't see why we should say "don't use it", as long as we are aware of the cost, which is the risk of creating tests which are brittle to changes in the implementation code


FWIW, Mocking is why I couldn't make the init/start/stop methods of org.apache.hadoop.service.AbstractService final; the need to test with mocking can impact production code. Is that bad? Well, we do other things to code to aid testability,...


-Steve



Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Posted by Eric Yang <ey...@hortonworks.com>.
Hi Chris and Andrew,

The intend is for new code to have better unit test cases without resort to invocation of miniHDFSCluster or miniYarnCluster.  Existing code don’t require refactoring, if the test cases already have good coverages.  I am currently working on part of YARN to improve YARN and Docker integration.  There are a lot of code getting triggered for UGI, FileSystem object to Yarn job submission.  My code is only responsible to check the logic of the user input, and expected output prior to YarnClient job submission.  Starting a miniCluster for this test case is excessive for the small piece of code for validation.  The submission code was imported from Slider for YARN native services, a single class imports various Hadoop services.  In several failure cases, it is difficult to simulate exact error conditions because the API is several layers deep.  Powermock provides easy way to replace and stubbing return object or throw proper exception to simulate the failure conditions.  One can argue that the code should have been written easier for unit tests, but Hadoop code density is beyond trivial to get simple initialization done.  Constructor suppression, inner class replacement and private method override are good tools from Powermock that can provide more accurate testing without losing sights of multiple stage API calling tests while keeping the test case localized to a small piece of the greater puzzle.  Hence, I like to request the community to rethink the improvement that Powermock can bring to the table.  Thank you for your considation.

Regards,
Eric

From: Andrew Wang <an...@cloudera.com>
Date: Friday, September 29, 2017 at 1:55 PM
To: Chris Douglas <cd...@apache.org>
Cc: Eric Yang <ey...@hortonworks.com>, "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>
Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Making code testable is about a lot more than your mocking library. In HDFS, the NameNode is very monolithic, so it's hard to instantiate little pieces of functionality in isolation and mock things out. I have my doubts that Powermock will help with this unless someone's willing to invest in significant refactoring effort of our existing code and unit test suites.

I could see an argument of this being useful for new code being developed, but like Chris, I'd like to see an example of where Mockito falls short, and what additional capabilities Powermock brings to the table.

Best,
Andrew

On Fri, Sep 29, 2017 at 10:38 AM, Chris Douglas <cd...@apache.org>> wrote:
Eric-

Can you explain how Powermock differs from/augments Mockito, why we
should adopt it, and maybe an example of an existing test that could
be improved using this library? -C

On Fri, Sep 29, 2017 at 10:12 AM, Eric Yang <ey...@hortonworks.com>> wrote:
> Hi Hadoop-dev,
>
> Long time ago, Hadoop community decided to put Powermock on hold for unit tests.  Both mockito and powermock has evolved a lot in the past 5 years.  There are mature versions of both software, and there are compatibility charts to indicate which versions can work together.  Hadoop has grown a lot in the last 5 years.  It becomes apparent that without ability to instrument lower level classes to contain unit test scope.  Many tests are written to simulate integration test in order to perform unit tests.  The result is slow performance on unit tests, and some parts are not testable strictly in unit test case.  This discussion is to revisit the decision, and see if we would embrace Powermock and allow HADOOP-9122 to be implemented.  Feel free to comment on HADOOP-9122 and this thread to revisit this issue.
>
> Thank you for your time.
>
> Regards,
> Eric
---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org<ma...@hadoop.apache.org>
For additional commands, e-mail: common-dev-help@hadoop.apache.org<ma...@hadoop.apache.org>


Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Posted by Andrew Wang <an...@cloudera.com>.
Making code testable is about a lot more than your mocking library. In
HDFS, the NameNode is very monolithic, so it's hard to instantiate little
pieces of functionality in isolation and mock things out. I have my doubts
that Powermock will help with this unless someone's willing to invest in
significant refactoring effort of our existing code and unit test suites.

I could see an argument of this being useful for new code being developed,
but like Chris, I'd like to see an example of where Mockito falls short,
and what additional capabilities Powermock brings to the table.

Best,
Andrew

On Fri, Sep 29, 2017 at 10:38 AM, Chris Douglas <cd...@apache.org> wrote:

> Eric-
>
> Can you explain how Powermock differs from/augments Mockito, why we
> should adopt it, and maybe an example of an existing test that could
> be improved using this library? -C
>
> On Fri, Sep 29, 2017 at 10:12 AM, Eric Yang <ey...@hortonworks.com> wrote:
> > Hi Hadoop-dev,
> >
> > Long time ago, Hadoop community decided to put Powermock on hold for
> unit tests.  Both mockito and powermock has evolved a lot in the past 5
> years.  There are mature versions of both software, and there are
> compatibility charts to indicate which versions can work together.  Hadoop
> has grown a lot in the last 5 years.  It becomes apparent that without
> ability to instrument lower level classes to contain unit test scope.  Many
> tests are written to simulate integration test in order to perform unit
> tests.  The result is slow performance on unit tests, and some parts are
> not testable strictly in unit test case.  This discussion is to revisit the
> decision, and see if we would embrace Powermock and allow HADOOP-9122 to be
> implemented.  Feel free to comment on HADOOP-9122 and this thread to
> revisit this issue.
> >
> > Thank you for your time.
> >
> > Regards,
> > Eric
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: common-dev-help@hadoop.apache.org
>
>

Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Posted by Chris Douglas <cd...@apache.org>.
Eric-

Can you explain how Powermock differs from/augments Mockito, why we
should adopt it, and maybe an example of an existing test that could
be improved using this library? -C

On Fri, Sep 29, 2017 at 10:12 AM, Eric Yang <ey...@hortonworks.com> wrote:
> Hi Hadoop-dev,
>
> Long time ago, Hadoop community decided to put Powermock on hold for unit tests.  Both mockito and powermock has evolved a lot in the past 5 years.  There are mature versions of both software, and there are compatibility charts to indicate which versions can work together.  Hadoop has grown a lot in the last 5 years.  It becomes apparent that without ability to instrument lower level classes to contain unit test scope.  Many tests are written to simulate integration test in order to perform unit tests.  The result is slow performance on unit tests, and some parts are not testable strictly in unit test case.  This discussion is to revisit the decision, and see if we would embrace Powermock and allow HADOOP-9122 to be implemented.  Feel free to comment on HADOOP-9122 and this thread to revisit this issue.
>
> Thank you for your time.
>
> Regards,
> Eric

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org