Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/05/02 04:00:40 UTC

[GitHub] [ozone] ChenSammi opened a new pull request, #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

ChenSammi opened a new pull request, #3376:
URL: https://github.com/apache/ozone/pull/3376

   https://issues.apache.org/jira/browse/HDDS-6685
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
kaijchen commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1118663963

   I have just run 100x `testPrepareDownedOM` with this PR reverted, and surprisingly got a clean run: https://github.com/kaijchen/ozone/actions/runs/2275278052
   
   I guess there might be some issues around restarting the `omRpcServer`.




[GitHub] [ozone] adoroszlai commented on pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1114672199

   Thanks @ChenSammi for working on this.
   
   Can you please check `TestOzoneManagerPrepare`?  Seems to be failing consistently (also locally).




Re: [PR] HDDS-6685. Follower OM crashed when validating S3 auth info. [ozone]

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1902739581

   HDDS-10177 seems to be caused by this change.




[GitHub] [ozone] adoroszlai commented on pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1118437052

   Yes, it seems the bind exception is fixed.  But `testPrepareDownedOM` still times out at `assertClusterNotPrepared`, because the restarted OM never receives the cancel prepare request: it is stuck installing a snapshot.




[GitHub] [ozone] kaijchen commented on pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
kaijchen commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1118161635

   Hi @ChenSammi, it seems this PR breaks `TestOzoneManagerPrepare#testPrepareDownedOM`. Could you please take a look?
   
   * With this change: https://github.com/kaijchen/ozone/actions/runs/2273676445
   * Without this change: https://github.com/kaijchen/ozone/actions/runs/2273677044




[GitHub] [ozone] ChenSammi commented on pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1116861623

   Thanks @nandakumar131 and @adoroszlai for the code review. 




[GitHub] [ozone] ChenSammi commented on pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1114750435

   > Thanks @ChenSammi for working on this.
   > 
   > Can you please check `TestOzoneManagerPrepare`? Seems to be failing consistently (also locally).
   
   `TestOzoneManagerPrepare` passes locally on my laptop. I just pushed a new patch. Let's see if `TestOzoneManagerPrepare` still fails in CI.




[GitHub] [ozone] kaijchen commented on pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
kaijchen commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1118773531

   I ran some experiments and found that the failure of `testPrepareDownedOM` is related to restarting `omRpcServer`.
   
   This change caused CI to fail (not in master, but equivalent): https://github.com/kaijchen/ozone/commit/862cb95a17de06943d6584b8797cf31ffdb4e3d5
   100x CI (before): https://github.com/kaijchen/ozone/actions/runs/2276704678
   100x CI (after): https://github.com/kaijchen/ozone/actions/runs/2276739976
   
   The revert to make CI pass (against master): https://github.com/kaijchen/ozone/commit/04bdf75a66bcaab4cbb53f74d18a9c791b1d4d48
   100x CI: https://github.com/kaijchen/ozone/actions/runs/2276881492




[GitHub] [ozone] adoroszlai commented on pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1118542291

   Upgrading to the current Ratis 2.3.0-SNAPSHOT (current `master`) with Ratis Thirdparty 1.0.0 (pending release) seems to resolve the problem.




[GitHub] [ozone] ChenSammi merged pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
ChenSammi merged PR #3376:
URL: https://github.com/apache/ozone/pull/3376




[GitHub] [ozone] adoroszlai commented on pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1118220046

   @ChenSammi `TestOzoneManagerPrepare` was flaky before (HDDS-5990), but it is definitely much worse now.
   
   It seems the stop/start logic is not entirely fail-safe, or the test needs additional or adjusted `waitFor` checks?
   
   ```
   2022-05-05 01:41:52,188 [pool-2338-thread-1] ERROR om.OzoneManager (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start RPC Server.
   java.net.BindException: Problem binding to [localhost:41487] java.net.BindException: Address already in use
     ...
     at org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:1084)
     at org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:1054)
     at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3366)
     at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3254)
     at org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3231)
   ```
   
   https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14826/it-flaky/hadoop-ozone/integration-test
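
To illustrate the kind of `waitFor` check suggested above, here is a minimal sketch that polls until a port can be bound again before a server is restarted. `PortWait`, `isPortFree`, and `waitFor` are hypothetical names made up for this example; they are not Ozone or Hadoop APIs, and the actual fix in this thread may work quite differently.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of Ozone.
final class PortWait {

  private PortWait() { }

  /** Returns true if a listening socket can currently be bound to the loopback port. */
  static boolean isPortFree(int port) {
    try (ServerSocket socket = new ServerSocket()) {
      socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
      return true;
    } catch (IOException e) {
      // BindException ("Address already in use") is an IOException.
      return false;
    }
  }

  /**
   * Polls the check every intervalMillis until it returns true
   * or timeoutMillis has elapsed. Returns the last known outcome.
   */
  static boolean waitFor(BooleanSupplier check, long intervalMillis, long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (true) {
      if (check.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt status and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A test could call something like `PortWait.waitFor(() -> PortWait.isPortFree(port), 100, 10_000)` between stopping and restarting the server. Such a probe is inherently racy, since another process can take the port between the probe and the real bind, so it reduces the window rather than closing it.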




[GitHub] [ozone] adoroszlai commented on pull request #3376: HDDS-6685. Follower OM crashed when validating S3 auth info.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on PR #3376:
URL: https://github.com/apache/ozone/pull/3376#issuecomment-1118254468

   Idea for fixing the bind exception: https://github.com/adoroszlai/hadoop-ozone/commit/65e11fa696640ba683c3e5a2c961355fe2d0870b
   
   Running 10 iterations:
   https://github.com/adoroszlai/hadoop-ozone/runs/6302199651
   
   Let's see if it's enough.

