Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/03/29 08:38:44 UTC

[GitHub] [ozone] GlenGeng opened a new pull request #2090: HDDS-5033. SCM may not be able to know full port list of Datanode after Datanode is started.

GlenGeng opened a new pull request #2090:
URL: https://github.com/apache/ozone/pull/2090


   ## What changes were proposed in this pull request?
   
   When SCM HA is enabled, the SCM may not know the full port list of a DN after that DN restarts.
   
   The likely cause: SCMNodeManager records the DatanodeDetails only once, during registration.
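
To make the symptom concrete, here is a minimal sketch of how a record-once registry can end up with a stale port list. Everything in it is an assumption for illustration: `NodeInfo`, `RecordOnceRegistry`, and `RefreshingRegistry` are hypothetical stand-ins rather than Ozone classes, and the port names and numbers are made up. The refreshing variant is only one possible remedy and is not necessarily what this patch does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegisterOnceSketch {
  /** Hypothetical, simplified node record: a node ID plus named ports. */
  static final class NodeInfo {
    final String uuid;
    final Map<String, Integer> ports;

    NodeInfo(String uuid, Map<String, Integer> ports) {
      this.uuid = uuid;
      this.ports = new LinkedHashMap<>(ports);
    }
  }

  /** Hypothetical registry that keeps only the first record per node. */
  static final class RecordOnceRegistry {
    private final Map<String, NodeInfo> nodes = new ConcurrentHashMap<>();

    void register(NodeInfo info) {
      // Later registrations of the same node are silently ignored.
      nodes.putIfAbsent(info.uuid, info);
    }

    NodeInfo get(String uuid) {
      return nodes.get(uuid);
    }
  }

  /** Hypothetical registry that overwrites the record on re-registration. */
  static final class RefreshingRegistry {
    private final Map<String, NodeInfo> nodes = new ConcurrentHashMap<>();

    void register(NodeInfo info) {
      nodes.put(info.uuid, info);
    }

    NodeInfo get(String uuid) {
      return nodes.get(uuid);
    }
  }

  public static void main(String[] args) {
    // Illustrative values only: the first registration carries one port.
    Map<String, Integer> partial = new LinkedHashMap<>();
    partial.put("STANDALONE", 9859);
    // A later registration carries the full list.
    Map<String, Integer> full = new LinkedHashMap<>(partial);
    full.put("RATIS", 9858);

    RecordOnceRegistry once = new RecordOnceRegistry();
    once.register(new NodeInfo("dn-1", partial));
    once.register(new NodeInfo("dn-1", full));
    System.out.println("record-once: " + once.get("dn-1").ports.keySet());

    RefreshingRegistry fresh = new RefreshingRegistry();
    fresh.register(new NodeInfo("dn-1", partial));
    fresh.register(new NodeInfo("dn-1", full));
    System.out.println("refreshing:  " + fresh.get("dn-1").ports.keySet());
  }
}
```

In this sketch the record-once registry still reports only the first port after the second registration, while the refreshing registry reports both.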
   
   ## What is the link to the Apache JIRA
   
   (Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HDDS-XXXX. Fix a typo in YYY.)
   
   Please replace this section with the link to the Apache JIRA)
   
   ## How was this patch tested?
   
   (Please explain how this patch was tested. Ex: unit tests, manual tests)
   (If this patch involves UI changes, please attach a screen-shot; otherwise, remove this)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant edited a comment on pull request #2090: HDDS-5033. SCM may not be able to know full port list of Datanode after Datanode is started.

Posted by GitBox <gi...@apache.org>.
bshashikant edited a comment on pull request #2090:
URL: https://github.com/apache/ozone/pull/2090#issuecomment-811738656


   @GlenGeng , can you explain why OzoneContainer#start() needs to be called multiple times if SCM HA is enabled for registering to the same SCM?


[GitHub] [ozone] GlenGeng edited a comment on pull request #2090: HDDS-5033. SCM may not be able to know full port list of Datanode after Datanode is started.

Posted by GitBox <gi...@apache.org>.
GlenGeng edited a comment on pull request #2090:
URL: https://github.com/apache/ozone/pull/2090#issuecomment-811774644


   > @GlenGeng , can you explain why OzoneContainer#start() needs to be called multiple times if SCM HA is enabled for registering to the same SCM?
   
   Please check 
   ```
   public class SCMConnectionManager
       implements Closeable, SCMConnectionManagerMXBean {
     private static final Logger LOG =
         LoggerFactory.getLogger(SCMConnectionManager.class);
   
     private final ReadWriteLock mapLock;
     private final Map<InetSocketAddress, EndpointStateMachine> scmMachines;
   ```
   
   Each SCM connection goes through `VersionEndpointTask`, `RegisterEndpointTask` and `HeartbeatEndpointTask`.
   
   Previously, `VersionEndpointTask#call()` executed
   ```
             // Start the container services after getting the version information
             ozoneContainer.start(clusterId);
   ``` 
   
   If SCM HA is enabled, 3 `VersionEndpointTask` instances are created, one for each SCM.
   
   The DN calls `VersionEndpointTask#call` for each of them, yet we need to ensure that `OzoneContainer` is started only once.
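
As a rough illustration of the "started only once" requirement, the sketch below guards a start method with an atomic compare-and-set on a status value, so that only the first of several callers performs the start-up work. The names `SharedService` and `InitStatus` and the three states are assumptions made for this example; they are not taken from the patch, whose actual guard is only partially visible in the review diff.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class StartOnceSketch {
  /** Hypothetical lifecycle states; not the values used in the patch. */
  enum InitStatus { UNINITIALIZED, INITIALIZING, INITIALIZED }

  /** Hypothetical stand-in for a service shared by several tasks. */
  static class SharedService {
    private final AtomicReference<InitStatus> status =
        new AtomicReference<>(InitStatus.UNINITIALIZED);
    private final AtomicInteger startCount = new AtomicInteger();

    /** Returns true only for the caller that performed the start. */
    boolean start(String clusterId) {
      // Exactly one caller can move UNINITIALIZED -> INITIALIZING.
      if (!status.compareAndSet(
          InitStatus.UNINITIALIZED, InitStatus.INITIALIZING)) {
        return false;
      }
      startCount.incrementAndGet(); // placeholder for real start-up work
      status.set(InitStatus.INITIALIZED);
      return true;
    }

    int startCount() {
      return startCount.get();
    }

    InitStatus status() {
      return status.get();
    }
  }

  public static void main(String[] args) {
    SharedService service = new SharedService();
    // Simulate one version task per SCM, all sharing the same service.
    for (int scm = 0; scm < 3; scm++) {
      boolean started = service.start("cluster-1");
      System.out.println("scm" + scm + " started=" + started);
    }
    System.out.println("start count = " + service.startCount());
  }
}
```

Only the first call reports `started=true`; the other two return without repeating the work, and the start count stays at 1. A status value with an intermediate state, rather than a plain boolean, also lets other code tell "start in progress" apart from "start finished".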


[GitHub] [ozone] GlenGeng commented on pull request #2090: HDDS-5033. SCM may not be able to know full port list of Datanode after Datanode is started.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #2090:
URL: https://github.com/apache/ozone/pull/2090#issuecomment-810922877


   @bshashikant @nandakumar131 Please take a look at this bug fix for SCM HA. Thanks.


[GitHub] [ozone] GlenGeng commented on pull request #2090: HDDS-5033. SCM may not be able to know full port list of Datanode after Datanode is started.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #2090:
URL: https://github.com/apache/ozone/pull/2090#issuecomment-811785713


   > @GlenGeng , can you explain why OzoneContainer#start() needs to be called multiple times if SCM HA is enabled for registering to the same SCM?
   
   Actually no, each call is for a different SCM.


[GitHub] [ozone] bshashikant commented on a change in pull request #2090: HDDS-5033. SCM may not be able to know full port list of Datanode after Datanode is started.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2090:
URL: https://github.com/apache/ozone/pull/2090#discussion_r604777985



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
##########
@@ -257,10 +268,21 @@ private void stopContainerScrub() {
    * @throws IOException
    */
   public void start(String clusterId) throws IOException {
-    if (!isStarted.compareAndSet(false, true)) {
+    if (!initializingStatus.compareAndSet(

Review comment:
       Please add some documentation here.




[GitHub] [ozone] GlenGeng commented on a change in pull request #2090: HDDS-5033. SCM may not be able to know full port list of Datanode after Datanode is started.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #2090:
URL: https://github.com/apache/ozone/pull/2090#discussion_r605350148



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
##########
@@ -257,10 +268,21 @@ private void stopContainerScrub() {
    * @throws IOException
    */
   public void start(String clusterId) throws IOException {
-    if (!isStarted.compareAndSet(false, true)) {
+    if (!initializingStatus.compareAndSet(

Review comment:
       Done. Please take another look!




[GitHub] [ozone] bshashikant merged pull request #2090: HDDS-5033. SCM may not be able to know full port list of Datanode after Datanode is started.

Posted by GitBox <gi...@apache.org>.
bshashikant merged pull request #2090:
URL: https://github.com/apache/ozone/pull/2090


   

