You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/04/26 01:31:02 UTC

[GitHub] [ozone] neils-dev opened a new pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

neils-dev opened a new pull request #2183:
URL: https://github.com/apache/ozone/pull/2183


   ## What changes were proposed in this pull request?
   
   Fixes for intermittent unit test failures found in  hdds.container-service _TestSchemaOneBackwardsCompatibility_ and in _TestBlockDeletingService_.  Current implementation for unit tests uses the _BackgroundService_ for block deletion.  This service uses a thread pool executor that uses worker thread tasks that run asynchronously.  The unit test callers for this service 'start' the _PeriodicalTask_ of the _BackgroundService_ and expect the workers to finish deleting blocks when they do checks for expected results.  The thread pool executor runs the worker tasks, _however_ they do not run to completion prior to returning control to the unit test callers.  This leads to an **intermittent error** due to results not ready when the unit test caller checks for expected results.
   
   Patch _**extends**_ the _PeriodicalTask_ for the background service and runs the worker tasks async until completion prior to returning control to the unit test caller.  This is realized in the _BlockDeletingServiceTestImpl_ _**PeriodicalTaskTestImpl**_ implementation.
   
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-5099
   
   ## How was this patch tested?
   
   Patch was tested with unit tests : _**TestSchemaOneBackwardsCompatibility**_ and _**TestBlockDeletingService**_.
   
   to see error, reproduce the error reliably (it is intermittent), can force failure though injecting fault into the 
   _BlockDeletingService.java_ implementation - adding a small delay to the BackgroundTask call():
   ```
       public BackgroundTaskResult call() throws Exception {
         ContainerBackgroundTaskResult crr;
         final Container container = ozoneContainer.getContainerSet()
             .getContainer(containerData.getContainerID());
         container.writeLock();
         File dataDir = new File(containerData.getChunksPath());
         long startTime = Time.monotonicNow();
         // Scan container's db and get list of under deletion blocks
         try (ReferenceCountedDB meta = BlockUtils.getDB(containerData, conf)) {
           TimeUnit.MILLISECONDS.sleep(100); 
   ```
   **pre-patch**
   
   `hadoop-hdds/container-service$ mvn -Dtest=TestSchemaOneBackwardsCompatibility#testDelete test` 
   ![TestSchemaOneBackwardsCompatibility_prepatch](https://user-images.githubusercontent.com/81126310/116017324-86fcfb00-a5fc-11eb-8cd7-06924ea9b567.png)
   
   `hadoop-hdds/container-service$ mvn -Dtest=TestBlockDeletingService test
   `
   ![testBlockDeletingService_prepatch](https://user-images.githubusercontent.com/81126310/116017350-9714da80-a5fc-11eb-8f86-840727dc1c8e.png)
   
   **after patch is applied**
   
   `hadoop-hdds/container-service$ mvn -Dtest=TestSchemaOneBackwardsCompatibility#testDelete test` 
   ![TestSchemaOneBackwardsCompatibility_afterpatch](https://user-images.githubusercontent.com/81126310/116017336-8cf2dc00-a5fc-11eb-8a6d-a72d2166789b.png)
   
   `hadoop-hdds/container-service$ mvn -Dtest=TestBlockDeletingService test`
   
   ![TestBlockDeletingService_afterpatch](https://user-images.githubusercontent.com/81126310/116017380-a85de700-a5fc-11eb-829c-bd0973a900b4.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] elek commented on a change in pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#discussion_r625665399



##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestSchemaOneBackwardsCompatibility.java
##########
@@ -279,6 +288,8 @@ public void testDelete() throws Exception {
               refCountedDB.getStore().getMetadataTable();
       assertEquals(expectedRegularBlocks + expectedDeletingBlocks,
               (long)metadataTable.get(OzoneConsts.BLOCK_COUNT));
+    //} catch(IOException ex) {

Review comment:
       Do we need this commented out test? Or can we just remove this part?

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/testutils/BlockDeletingServiceTestImpl.java
##########
@@ -80,6 +80,7 @@ public void start() {
           break;
         }
         Future<?> future = this.getExecutorService().submit(svc);
+

Review comment:
       This line may not be required to modified and this case this test will remain unchanged.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] elek commented on pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#issuecomment-836394353


   All the comments seems to be solved. Merging it now. Thanks the review @symious and @lokeshj1703 and the contribution @neils-dev 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#issuecomment-827459074


   I think the point HDDS-4231 is to remove the waiting for async deletes. So before HDDS-4231, this test won't have intermittent exception, since it will always wait for the deletion to complete.
   Using a different deletion implementation seems a little confused here, better to use the original one for the goal of test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev commented on pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
neils-dev commented on pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#issuecomment-827416873


   > I think one point of these unit tests is to test the feature of `BackgroundService.PeriodicTask`, if we create a new PeriodicTask implementation for tests, it seems a little diverged from the goal of these unit tests.
   > In [HDDS-4231](https://issues.apache.org/jira/browse/HDDS-4231), the waiting for async results is removed, I think we can only focus on solving the intermittent exception here, no need to write the original code for tests.
   > Besides, the comment of `BackgroundService.PeriodicTask` is not accurate since [HDDS-4231](https://issues.apache.org/jira/browse/HDDS-4231), could you help to update the comment?
   
   Thanks for the followup @symious.  I understand your comment on extending the `PeriodicTask`, the goal of the unit test in this case is to test deleting blocks - both with the older schema and the block deleting service.  The goal of the unit test in these cases I believe is not to exercise the `PeriodicTask` of the `background service`.  Further, we are not using the background task in the intended '_periodic_' fashion in these cases as the comment in line 136 implies and how the test operates in a single fashion and _not_ periodic.  
   
   On the [HDDS-4231] issues, the extended `PeriodicalTaskTestImpl` propagates this LOG.warn as in line 118 so will not change affected callers such as the `TestBlockDeletingService`. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev commented on pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
neils-dev commented on pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#issuecomment-831653772


   Changes pushed to reflect comments.  Thanks **@symious**  and **@lokeshj1703** for insights and suggestions on the patch to fix the existing problems with the _BlockDeletionService_ related unit tests.  To exercise the existing _PeriodicTask_ implementation with our unit tests, I _**removed**_ the extended task from the _BlockDeletingServiceTestImpl_.  Instead a timeout is used (3 sec as was defined previously in the _BlockDeletingServiceTestImpl_), for tests involving the background service _BlockDeletionService_.  Here, using the _waitFor()_ test util we poll for the workers to finish.  This in line with existing tests in the _TestBockDeletionService_.  With this implementation we do **_not_** impact the production _backgroundservice_ or any of its subclasses with added variables both volatile and non volatile that can cause unnecessary processing and contention.  Thanks guys.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#discussion_r625695922



##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestBlockDeletingService.java
##########
@@ -713,9 +723,14 @@ public void testBlockThrottle() throws Exception {
       // - current total space of container(it will be zero as all blocks
       // in all the containers are deleted)).
       deleteAndWait(service, 2);
-      Assert.assertEquals(blocksPerContainer * containerCount * blockSpace,
-          (totalContainerSpace - currentBlockSpace(containerData,
-              containerCount)));
+
+      long totalContainerBlocks = blocksPerContainer*containerCount;
+      GenericTestUtils.waitFor(() ->
+              totalContainerBlocks*blockSpace ==
+                      (totalContainerSpace-
+                              currentBlockSpace(containerData, containerCount)),

Review comment:
       NIT. Same as above. 

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestBlockDeletingService.java
##########
@@ -447,6 +447,8 @@ public void testBlockDeletion() throws Exception {
       // An interval will delete 1 * 2 blocks
       deleteAndWait(svc, 1);
 
+      GenericTestUtils.waitFor(() ->
+              containerData.get(0).getBytesUsed() < 3, 100, 3000);

Review comment:
       Unrelated to your changes. Can we set the block deletion limit to 2 at line 401?
   `dnConf.setBlockDeletionLimit(2);`
   Then here we should check `containerData.get(0).getBytesUsed() == 100`

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestBlockDeletingService.java
##########
@@ -702,9 +709,12 @@ public void testBlockThrottle() throws Exception {
       // Deleted space of 10 blocks should be equal to (initial total space
       // of container - current total space of container).
       deleteAndWait(service, 1);
-      Assert.assertEquals(blockLimitPerInterval * blockSpace,
-          (totalContainerSpace - currentBlockSpace(containerData,
-              containerCount)));
+
+      GenericTestUtils.waitFor(() ->
+              blockLimitPerInterval*blockSpace ==
+                      (totalContainerSpace-
+                              currentBlockSpace(containerData, containerCount)),
+              100, 3000);

Review comment:
       Minor NIT: Space between `blockLimitPerInterval * blockSpace` and
   `(totalContainerSpace -`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev commented on a change in pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
neils-dev commented on a change in pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#discussion_r628661916



##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/testutils/BlockDeletingServiceTestImpl.java
##########
@@ -80,6 +80,7 @@ public void start() {
           break;
         }
         Future<?> future = this.getExecutorService().submit(svc);
+

Review comment:
       Left over from previous additions to `BlockDeletingServiceTestImpl` - unnecessary and removed.

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestSchemaOneBackwardsCompatibility.java
##########
@@ -279,6 +288,8 @@ public void testDelete() throws Exception {
               refCountedDB.getStore().getMetadataTable();
       assertEquals(expectedRegularBlocks + expectedDeletingBlocks,
               (long)metadataTable.get(OzoneConsts.BLOCK_COUNT));
+    //} catch(IOException ex) {

Review comment:
       unnecessary, removed.

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestBlockDeletingService.java
##########
@@ -713,9 +723,14 @@ public void testBlockThrottle() throws Exception {
       // - current total space of container(it will be zero as all blocks
       // in all the containers are deleted)).
       deleteAndWait(service, 2);
-      Assert.assertEquals(blocksPerContainer * containerCount * blockSpace,
-          (totalContainerSpace - currentBlockSpace(containerData,
-              containerCount)));
+
+      long totalContainerBlocks = blocksPerContainer*containerCount;
+      GenericTestUtils.waitFor(() ->
+              totalContainerBlocks*blockSpace ==
+                      (totalContainerSpace-
+                              currentBlockSpace(containerData, containerCount)),

Review comment:
       change added.

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestBlockDeletingService.java
##########
@@ -702,9 +709,12 @@ public void testBlockThrottle() throws Exception {
       // Deleted space of 10 blocks should be equal to (initial total space
       // of container - current total space of container).
       deleteAndWait(service, 1);
-      Assert.assertEquals(blockLimitPerInterval * blockSpace,
-          (totalContainerSpace - currentBlockSpace(containerData,
-              containerCount)));
+
+      GenericTestUtils.waitFor(() ->
+              blockLimitPerInterval*blockSpace ==
+                      (totalContainerSpace-
+                              currentBlockSpace(containerData, containerCount)),
+              100, 3000);

Review comment:
       change added.

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestBlockDeletingService.java
##########
@@ -447,6 +447,8 @@ public void testBlockDeletion() throws Exception {
       // An interval will delete 1 * 2 blocks
       deleteAndWait(svc, 1);
 
+      GenericTestUtils.waitFor(() ->
+              containerData.get(0).getBytesUsed() < 3, 100, 3000);

Review comment:
       Thanks @lokeshj1703 - `dfConf.setBlockDeletionLimit` corrected from 3 to 2 for test.  Setting conditional check to equiv 100 (`containerData.get(0).getBytesUsed() == containerSpace / 3`).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#issuecomment-827252271


   I think one point of these unit tests is to test the feature of `BackgroundService.PeriodicTask`, if we create a new PeriodicTask implementation for tests, it seems a little diverged from the goal of these unit tests.
   In [HDDS-4231](https://issues.apache.org/jira/browse/HDDS-4231), the waiting for async results is removed, I think we can only focus on solving the intermittent exception here, no need to write the original code for tests.
   Besides, the comment of `BackgroundService.PeriodicTask` is not accurate since HDDS-4231, could you help to update the comment?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#issuecomment-826527220


   Since we have found the root cause of the intermittent exception in this case, can we simply wait a while before `future.get` in `BlockDeletingServiceTestImpl$start"?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#issuecomment-826527220


   Since we have found the root cause of the intermittent exception in this case, can we simply wait a while before `future.get` in `BlockDeletingServiceTestImpl$start"?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev commented on pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
neils-dev commented on pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#issuecomment-827046293


   > Since we have found the root cause of the intermittent exception in this case, can we simply wait a while before `future.get` in `BlockDeletingServiceTestImpl$start"?
   
   Thanks @symious - true, a delay in BlockDeleteingServiceTestImpl may work provided the delay is sufficient for the workers to complete the tasks; we can choose 10 - 200 ms however this value would vary from platform to platform.  Here we have the thread executor dispatch the completion futures and return when the worker threads are finished, ensuring that the results are available regardless of the platform/environment we are running on.  Another implementation can be through a latch and releasing the caller one the workers have finished, however that would require implementing the counter creating  the same behavior as the patch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev commented on pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
neils-dev commented on pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#issuecomment-834947424


   Changes from reviewers comments added in latest commit - thanks @lokeshj1703 , @symious and @elek ! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] elek merged pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
elek merged pull request #2183:
URL: https://github.com/apache/ozone/pull/2183


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2183: HDDS-5099. Error with unit test for hdds.container-service TestSchemaOneBackwardsCompatibility

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2183:
URL: https://github.com/apache/ozone/pull/2183#issuecomment-831840340


   > Thanks the patch @neils-dev. I have two very minor nit comments, but it looks good to me.
   > 
   > @lokeshj1703 / @symious: are you fine with the latest version?
   
   @neils-dev Thanks for the update. LGTM after solving @elek 's comment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org