Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/10 03:24:27 UTC

[GitHub] [ozone] neils-dev opened a new pull request #2518: HDDS-5358. Intermittent failure in testGetS3SecretAndRevokeS3Secret

neils-dev opened a new pull request #2518:
URL: https://github.com/apache/ozone/pull/2518


   ## What changes were proposed in this pull request?
   Problem: Intermittent failure in `testGetS3SecretAndRevokeS3Secret`.  Occasionally, the s3 secret is still found after it has been revoked.
   On S3 secret revoke, specifically in `S3RevokeSecretRequest`, the s3 secret is removed from the s3 secret cache immediately, but its removal from the s3 table is submitted as a transaction log batch request.  These batch requests are handled by a separate worker, so for a short time the cache reflects the revoke while the s3 table does not.  When a key is not found in the cache, it is looked up in the s3 table, hence the intermittent integration test failure.
   
   This PR proposes a patch that, within `S3RevokeSecretRequest`, repeatedly checks the s3 table entry until it is removed or a timeout occurs.
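   
   A minimal sketch of the "check until removed or timeout" idea, using only the JDK. This is not the code from this PR: `PollUntil`, `waitFor`, the map standing in for the s3 table, and the timeout and interval values are all assumptions made for illustration.
   
   ```java
   import java.time.Duration;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.function.BooleanSupplier;
   
   /** Hypothetical helper for illustration; not the code from this PR. */
   final class PollUntil {
   
     /**
      * Re-checks the condition every interval until it holds or the timeout
      * elapses. Returns true if the condition held, false on timeout or interrupt.
      */
     static boolean waitFor(BooleanSupplier condition, Duration timeout,
         Duration interval) {
       long deadline = System.nanoTime() + timeout.toNanos();
       while (!condition.getAsBoolean()) {
         if (System.nanoTime() - deadline >= 0) {
           return false;
         }
         try {
           Thread.sleep(interval.toMillis());
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
           return false;
         }
       }
       return true;
     }
   
     public static void main(String[] args) throws InterruptedException {
       // Stand-in for the s3 table; the thread plays the batch worker.
       Map<String, String> s3Table = new ConcurrentHashMap<>();
       s3Table.put("user", "secret");
       Thread worker = new Thread(() -> {
         try {
           Thread.sleep(50);
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
         }
         s3Table.remove("user");
       });
       worker.start();
   
       boolean removed = waitFor(() -> !s3Table.containsKey("user"),
           Duration.ofSeconds(5), Duration.ofMillis(10));
       worker.join();
       System.out.println("entry removed before timeout: " + removed);
     }
   }
   ```
   
   The helper reports a timeout instead of throwing, so the caller decides whether a missed deadline is an error.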
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5358
   
   ## How was this patch tested?
   
   Patch tested with the integration test in the CI environment through a GitHub Actions workflow:
   https://github.com/neils-dev/ozone/actions/runs/1115101396
   
   `hadoop-ozone/dev-support/checks/integration.sh`
   with environment variables `$ITERATIONS=60, $MAVEN_OPTS: -Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false -Dmaven.wagon.http.retryHandler.class=standard -Dmaven.wagon.http.retryHandler.count=3`
   
   `hadoop-ozone/dev-support/checks/integration.sh -Dtest=TestSecureOzoneCluster#testGetS3SecretAndRevokeS3Secret`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2518: HDDS-5358. Intermittent failure in testGetS3SecretAndRevokeS3Secret

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2518:
URL: https://github.com/apache/ozone/pull/2518#issuecomment-895987765


   @neils-dev 
   The case you describe is already handled: if the cache explicitly holds null for a key, we consider the key deleted, because the delete from the table is done by the double buffer thread in the background.
   
   ```
     public CacheResult<CACHEVALUE> lookup(CACHEKEY cachekey) {
   
       CACHEVALUE cachevalue = cache.get(cachekey);
       if (cachevalue == null) {
         // No cache entry at all: the key may still exist in the table.
         return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST, null);
       } else {
         if (cachevalue.getCacheValue() != null) {
           return new CacheResult<>(CacheResult.CacheStatus.EXISTS, cachevalue);
         } else {
           // When entity is marked for delete, cacheValue will be set to null.
           return new CacheResult<>(CacheResult.CacheStatus.NOT_EXIST, null);
         }
       }
     }
   ```
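   
   To illustrate how a reader can act on those three statuses, here is a self-contained sketch. Only the status names `EXISTS`, `NOT_EXIST` and `MAY_EXIST` come from the snippet above. `CacheAwareRead`, its fields, and the use of `Optional.empty()` as the delete marker are hypothetical stand-ins, not Ozone's classes.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Optional;
   
   /** Hypothetical stand-ins; only the three status names come from the thread. */
   final class CacheAwareRead {
     enum CacheStatus { EXISTS, NOT_EXIST, MAY_EXIST }
   
     // Optional.empty() plays the role of the explicit-null delete marker.
     final Map<String, Optional<String>> cache = new HashMap<>();
     // Stand-in for the persisted table, updated later by the background flush.
     final Map<String, String> table = new HashMap<>();
   
     CacheStatus lookup(String key) {
       Optional<String> cached = cache.get(key);
       if (cached == null) {
         return CacheStatus.MAY_EXIST;
       }
       return cached.isPresent() ? CacheStatus.EXISTS : CacheStatus.NOT_EXIST;
     }
   
     /** Returns the value, or null if the key is marked deleted or absent. */
     String get(String key) {
       switch (lookup(key)) {
         case EXISTS:
           return cache.get(key).get();
         case NOT_EXIST:
           return null; // marked deleted: do not consult the table
         default:
           return table.get(key); // no cache entry: the table decides
       }
     }
   
     public static void main(String[] args) {
       CacheAwareRead reader = new CacheAwareRead();
       reader.table.put("user", "secret");
       System.out.println("before revoke: " + reader.get("user"));
   
       // Revoke marks the key deleted in the cache; the table still has the row.
       reader.cache.put("user", Optional.empty());
       System.out.println("after revoke, before flush: " + reader.get("user"));
     }
   }
   ```
   
   With the marker in the right cache, the stale row in the table is never read, which is why a delete that has not been flushed yet is not visible to readers.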




[GitHub] [ozone] neils-dev commented on pull request #2518: HDDS-5358. Intermittent failure in testGetS3SecretAndRevokeS3Secret

Posted by GitBox <gi...@apache.org>.
neils-dev commented on pull request #2518:
URL: https://github.com/apache/ozone/pull/2518#issuecomment-897009395


   > > How are we observing intermittent errors in this case? Would the s3 secret integration test always fail then?
   > 
   > We incorrectly update the wrong table cache. If the double buffer flush completes before the assert check happens, the test succeeds; otherwise it fails.
   
   Ok.  When the double buffer flush runs before the assert check, it invalidates the cache entry - _`cleanupCache` with committed transactions during flush_ - **Thanks** @bharatviswa504, @adoroszlai
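   
   A rough model of what a cache cleanup keyed on committed transactions can look like. Every name here (`EpochCache`, `Entry`, `put`, `cleanup`, the epoch numbers) is an assumption for illustration and is not taken from Ozone's code. The idea sketched is that each cache entry remembers the transaction that wrote it, and entries are evicted once that transaction has been flushed.
   
   ```java
   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.Optional;
   
   /** Hypothetical model of epoch-based cache cleanup; not Ozone's API. */
   final class EpochCache {
   
     static final class Entry {
       final Optional<String> value; // Optional.empty() is a delete marker
       final long epoch;             // transaction that wrote this entry
   
       Entry(Optional<String> value, long epoch) {
         this.value = value;
         this.epoch = epoch;
       }
     }
   
     private final Map<String, Entry> entries = new HashMap<>();
   
     void put(String key, Optional<String> value, long epoch) {
       entries.put(key, new Entry(value, epoch));
     }
   
     boolean contains(String key) {
       return entries.containsKey(key);
     }
   
     /** Evicts every entry written by one of the flushed transactions. */
     void cleanup(List<Long> flushedEpochs) {
       entries.values().removeIf(e -> flushedEpochs.contains(e.epoch));
     }
   
     public static void main(String[] args) {
       EpochCache cache = new EpochCache();
       cache.put("user", Optional.of("secret"), 1L); // secret created in txn 1
       cache.cleanup(Arrays.asList(1L));             // txn 1 flushed
       System.out.println("entry cached after flush: " + cache.contains("user"));
     }
   }
   ```
   
   Once an entry is evicted, reads fall through to the table, so what a reader sees depends on whether the flush has already run.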




[GitHub] [ozone] neils-dev commented on pull request #2518: HDDS-5358. Intermittent failure in testGetS3SecretAndRevokeS3Secret

Posted by GitBox <gi...@apache.org>.
neils-dev commented on pull request #2518:
URL: https://github.com/apache/ozone/pull/2518#issuecomment-896502861


   Thanks @bharatviswa504 for the review and comments. Thanks @adoroszlai for spotting the error in invalidating the cache entry.  Have updated the commit to invalidate the correct cache in revoke.
   
   Q. In `S3RevokeSecretRequest`, striking the s3 secret from the wrong cache would **ensure** that the s3 key still **exists** in the s3 secret cache.  Any subsequent 'get' or 'lookup' for the s3 key would then always 'hit' and retrieve the s3 key the user revoked.  How are we observing intermittent errors in this case?  Would the s3 secret integration test always fail then?  Does the batch delete from the s3 table somehow **also** invalidate the cache in the double buffer thread in the background, hence the intermittent failure?




[GitHub] [ozone] smengcl commented on pull request #2518: HDDS-5358. Incorrect cache entry invalidation causes intermittent failure in testGetS3SecretAndRevokeS3Secret

Posted by GitBox <gi...@apache.org>.
smengcl commented on pull request #2518:
URL: https://github.com/apache/ozone/pull/2518#issuecomment-897236067


   Merged. Thanks @neils-dev for the PR and others for reviewing. I have changed the jira title a bit to clarify the issue.




[GitHub] [ozone] smengcl commented on pull request #2518: HDDS-5358. Intermittent failure in testGetS3SecretAndRevokeS3Secret

Posted by GitBox <gi...@apache.org>.
smengcl commented on pull request #2518:
URL: https://github.com/apache/ozone/pull/2518#issuecomment-897153889


   Thanks @neils-dev @adoroszlai @bharatviswa504 for catching this bug.




[GitHub] [ozone] bharatviswa504 commented on pull request #2518: HDDS-5358. Intermittent failure in testGetS3SecretAndRevokeS3Secret

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2518:
URL: https://github.com/apache/ozone/pull/2518#issuecomment-896974775


   >How are we observing intermittent errors in this case? Would the s3 secret integration test always fail then?
   
   We incorrectly update the wrong table cache. If the double buffer flush completes before the assert check happens, the test succeeds; otherwise it fails.
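   
   The timing dependence can be reproduced with a small model. This is a sketch under assumptions, not Ozone code: `WrongCacheRace`, the two cache maps, the pending-delete queue standing in for the double buffer, and the `buggy` flag are all hypothetical.
   
   ```java
   import java.util.ArrayDeque;
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Optional;
   import java.util.Queue;
   
   /** Hypothetical model of the race; names are illustrative, not Ozone's API. */
   final class WrongCacheRace {
     // Optional.empty() is the delete marker in a table cache.
     final Map<String, Optional<String>> s3SecretCache = new HashMap<>();
     final Map<String, Optional<String>> otherTableCache = new HashMap<>();
     final Map<String, String> s3SecretTable = new HashMap<>();
     // Deletes waiting for the double buffer flush.
     final Queue<String> pendingDeletes = new ArrayDeque<>();
   
     /** Marks the secret deleted in a cache and queues the table delete. */
     void revoke(String user, boolean buggy) {
       (buggy ? otherTableCache : s3SecretCache).put(user, Optional.empty());
       pendingDeletes.add(user);
     }
   
     /** Applies all queued deletes to the table. */
     void flush() {
       String user;
       while ((user = pendingDeletes.poll()) != null) {
         s3SecretTable.remove(user);
       }
     }
   
     /** Reads through the s3 secret cache, falling back to the table. */
     boolean secretVisible(String user) {
       Optional<String> cached = s3SecretCache.get(user);
       return cached != null ? cached.isPresent() : s3SecretTable.containsKey(user);
     }
   
     public static void main(String[] args) {
       WrongCacheRace buggy = new WrongCacheRace();
       buggy.s3SecretTable.put("user", "secret");
       buggy.revoke("user", true);
       System.out.println(buggy.secretVisible("user")); // true: assert here fails
       buggy.flush();
       System.out.println(buggy.secretVisible("user")); // false: assert here passes
   
       WrongCacheRace fixed = new WrongCacheRace();
       fixed.s3SecretTable.put("user", "secret");
       fixed.revoke("user", false);
       System.out.println(fixed.secretVisible("user")); // false even before flush
     }
   }
   ```
   
   In the buggy path the result depends only on whether the read happens before or after `flush()`, which matches the intermittent behaviour described here.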




[GitHub] [ozone] smengcl merged pull request #2518: HDDS-5358. Incorrect cache entry invalidation causes intermittent failure in testGetS3SecretAndRevokeS3Secret

Posted by GitBox <gi...@apache.org>.
smengcl merged pull request #2518:
URL: https://github.com/apache/ozone/pull/2518


   

