Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/06/15 09:30:59 UTC

[GitHub] [ozone] bharatviswa504 opened a new pull request #2338: HDDS-5341. Container report processing is single threaded

bharatviswa504 opened a new pull request #2338:
URL: https://github.com/apache/ozone/pull/2338


   ## What changes were proposed in this pull request?
   
   Make container report processing multi-threaded.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5341
   
   ## How was this patch tested?
   
   Added UT.
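
The idea of moving event handling from one thread to a fixed pool can be sketched as follows. This is a minimal illustration only, not the class added by this pull request: `PooledEventExecutorSketch`, its methods, and the `AtomicLong` counters (standing in for the Hadoop metrics counters `queued`, `scheduled`, `done` and `failed`) are all hypothetical names chosen for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical sketch of a pooled event executor; not the class from this PR. */
class PooledEventExecutorSketch<P> {
  private final ExecutorService executor;
  // Plain counters standing in for the metrics fields (assumption).
  private final AtomicLong queued = new AtomicLong();
  private final AtomicLong scheduled = new AtomicLong();
  private final AtomicLong done = new AtomicLong();
  private final AtomicLong failed = new AtomicLong();

  PooledEventExecutorSketch(int poolSize) {
    this.executor = Executors.newFixedThreadPool(poolSize);
  }

  /** Hands one payload to the pool; handlers may now run concurrently. */
  void onMessage(Consumer<P> handler, P payload) {
    queued.incrementAndGet();
    executor.execute(() -> {
      scheduled.incrementAndGet();
      try {
        handler.accept(payload);
        done.incrementAndGet();
      } catch (RuntimeException e) {
        failed.incrementAndGet();
      }
    });
  }

  /** Stops accepting work and waits for queued events to finish. */
  boolean shutdownAndAwait(long timeoutMillis) {
    executor.shutdown();
    try {
      return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  long queuedEvents() { return queued.get(); }
  long scheduledEvents() { return scheduled.get(); }
  long doneEvents() { return done.get(); }
  long failedEvents() { return failed.get(); }

  public static void main(String[] args) {
    PooledEventExecutorSketch<String> pool = new PooledEventExecutorSketch<>(4);
    for (int i = 0; i < 8; i++) {
      // The handler rejects one made-up report to exercise the failed counter.
      pool.onMessage(report -> {
        if (report.endsWith("-3")) {
          throw new IllegalStateException("bad report");
        }
      }, "report-" + i);
    }
    pool.shutdownAndAwait(5000);
    System.out.println("queued=" + pool.queuedEvents()
        + " done=" + pool.doneEvents() + " failed=" + pool.failedEvents());
  }
}
```

With a pool size of 1 this behaves like a single-threaded executor; larger pools let several reports be processed at once, so handlers must be safe to call concurrently.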
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#discussion_r654737149



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/events/FixedThreadPoolExecutor.java
##########
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdds.server.events;
+
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.util.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import org.apache.hadoop.metrics2.annotation.Metric;
+import org.apache.hadoop.metrics2.annotation.Metrics;
+import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
+import org.apache.hadoop.metrics2.lib.MutableCounterLong;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_EVENT_PREFIX;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT;
+
+/**
+ * Fixed thread pool EventExecutor to call all the event handler one-by-one.
+ *
+ * @param <P> the payload type of events
+ */
+@Metrics(context = "EventQueue")
+public class FixedThreadPoolExecutor<P> implements EventExecutor<P> {
+
+  private static final String EVENT_QUEUE = "EventQueue";
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(SingleThreadExecutor.class);
+
+  private final String name;
+
+  private final ExecutorService executor;
+
+  @Metric
+  private MutableCounterLong queued;
+
+  @Metric
+  private MutableCounterLong done;
+
+  @Metric
+  private MutableCounterLong failed;
+
+  @Metric
+  private MutableCounterLong scheduled;
+
+  /**
+   * Create SingleThreadExecutor.

Review comment:
       ```suggestion
      * Create FixedThreadPoolExecutor.
   ```

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/events/FixedThreadPoolExecutor.java
##########
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdds.server.events;
+
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.util.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import org.apache.hadoop.metrics2.annotation.Metric;
+import org.apache.hadoop.metrics2.annotation.Metrics;
+import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
+import org.apache.hadoop.metrics2.lib.MutableCounterLong;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_EVENT_PREFIX;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT;
+
+/**
+ * Fixed thread pool EventExecutor to call all the event handler one-by-one.
+ *
+ * @param <P> the payload type of events
+ */
+@Metrics(context = "EventQueue")
+public class FixedThreadPoolExecutor<P> implements EventExecutor<P> {
+
+  private static final String EVENT_QUEUE = "EventQueue";
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(SingleThreadExecutor.class);

Review comment:
       ```suggestion
         LoggerFactory.getLogger(FixedThreadPoolExecutor.class);
   ```
   






[GitHub] [ozone] bharatviswa504 commented on pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#issuecomment-865578214


   Thank you @JacksonYao287 for the review. I have addressed the review comments.






[GitHub] [ozone] JacksonYao287 edited a comment on pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
JacksonYao287 edited a comment on pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#issuecomment-866498478


   @sodonnel 
   > 1. Does it make sense to thread pool the Incremental Reports too? I know they are much smaller, but they are also much more frequent. I have not seen any evidence of whether they are backing up, but it's worth considering / checking if we can.
   
   As far as I know, when a heartbeat from a datanode arrives at SCM, it is queued for processing with the timestamp of its arrival. There is a heartbeat processing thread inside SCM that runs at a specified interval. So I think the point is how many reports are queued at SCM within that interval and how fast the report handler can deal with them. The total number of Incremental Reports in one interval (default 3s) is not very large, but it makes sense to move them to a thread pool if needed in the future.
   
   > 2. I believe the DNs send a FCR every 60 - 90 seconds. Is that frequency really needed? Unknown bugs aside, is there any reason to send a FCR after startup and first registration? On HDFS, datanodes only send full block reports every 6 hours by default. If ICRs carry all the required information for SCM, perhaps we should increase the FCR interval to an hour or more?
   
   Yeah, I think this makes sense. Too many FCRs will increase the burden on SCM.
   
   




[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#discussion_r658459386



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfigKeys.java
##########
@@ -535,6 +535,13 @@
       "ozone.scm.ca.list.retry.interval";
   public static final long OZONE_SCM_CA_LIST_RETRY_INTERVAL_DEFAULT = 10;
 
+
+  public static final String OZONE_SCM_EVENT_PREFIX = "ozone.scm.event.";
+
+  public static final String OZONE_SCM_EVENT_CONTAINER_REPORT_THREAD_POOL_SIZE =
+      OZONE_SCM_EVENT_PREFIX + "ContainerReport.thread.pool.size";
+  public static final int OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT = 10;
+

Review comment:
       Since the class SingleThreadExecutor is common to all events, I have named the default variable generically, while the config name is constructed from the event name. When more events need different defaults, this can be revisited if needed.
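
A rough sketch of how a per-event config name could be built from the event name while sharing one generic default. The two constants are copied from the diff above; `THREAD_POOL_SIZE_SUFFIX`, `poolSizeKey`, `poolSize`, and the use of a plain `Map` in place of the real configuration object are assumptions made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; only the two OZONE_SCM_EVENT_* constants come from the diff. */
class EventPoolSizeConfigSketch {
  static final String OZONE_SCM_EVENT_PREFIX = "ozone.scm.event.";
  static final int OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT = 10;
  // Assumed suffix, chosen so the ContainerReport key matches the one in the diff.
  static final String THREAD_POOL_SIZE_SUFFIX = ".thread.pool.size";

  /** Builds the config key for a given event name. */
  static String poolSizeKey(String eventName) {
    return OZONE_SCM_EVENT_PREFIX + eventName + THREAD_POOL_SIZE_SUFFIX;
  }

  /** Reads the pool size for an event, falling back to the generic default. */
  static int poolSize(Map<String, String> conf, String eventName) {
    String raw = conf.get(poolSizeKey(eventName));
    return raw == null
        ? OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT
        : Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(poolSizeKey("ContainerReport"), "20");
    System.out.println(poolSizeKey("ContainerReport"));
    System.out.println(poolSize(conf, "ContainerReport"));
    // An event with no explicit setting falls back to the shared default.
    System.out.println(poolSize(conf, "SomeOtherEvent"));
  }
}
```

A per-event default, as suggested in the review, would replace the single shared constant with a lookup keyed by event name.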






[GitHub] [ozone] bshashikant commented on a change in pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#discussion_r658457808



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfigKeys.java
##########
@@ -535,6 +535,13 @@
       "ozone.scm.ca.list.retry.interval";
   public static final long OZONE_SCM_CA_LIST_RETRY_INTERVAL_DEFAULT = 10;
 
+
+  public static final String OZONE_SCM_EVENT_PREFIX = "ozone.scm.event.";
+
+  public static final String OZONE_SCM_EVENT_CONTAINER_REPORT_THREAD_POOL_SIZE =
+      OZONE_SCM_EVENT_PREFIX + "ContainerReport.thread.pool.size";
+  public static final int OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT = 10;
+

Review comment:
       OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT -> OZONE_SCM_EVENT_CONTAINER_REPORT_THREAD_POOL_SIZE_DEFAULT??






[GitHub] [ozone] JacksonYao287 commented on pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#issuecomment-866498478


   > 1. Does it make sense to thread pool the Incremental Reports too? I know they are much smaller, but they are also much more frequent. I have not seen any evidence of whether they are backing up, but it's worth considering / checking if we can.
   
   As far as I know, when a heartbeat from a datanode arrives at SCM, it is queued for processing with the timestamp of its arrival. There is a heartbeat processing thread inside SCM that runs at a specified interval. So I think the point is how many reports are queued at SCM within that interval and how fast the report handler can deal with them. The total number of Incremental Reports in one interval (default 3s) is not very large, but it makes sense to move them to a thread pool if needed in the future.
   
   > 2. I believe the DNs send a FCR every 60 - 90 seconds. Is that frequency really needed? Unknown bugs aside, is there any reason to send a FCR after startup and first registration? On HDFS, datanodes only send full block reports every 6 hours by default. If ICRs carry all the required information for SCM, perhaps we should increase the FCR interval to an hour or more?
   
   Yeah, I think this makes sense. Too many FCRs will increase the burden on SCM.
   
   




[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#discussion_r658459386



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfigKeys.java
##########
@@ -535,6 +535,13 @@
       "ozone.scm.ca.list.retry.interval";
   public static final long OZONE_SCM_CA_LIST_RETRY_INTERVAL_DEFAULT = 10;
 
+
+  public static final String OZONE_SCM_EVENT_PREFIX = "ozone.scm.event.";
+
+  public static final String OZONE_SCM_EVENT_CONTAINER_REPORT_THREAD_POOL_SIZE =
+      OZONE_SCM_EVENT_PREFIX + "ContainerReport.thread.pool.size";
+  public static final int OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT = 10;
+

Review comment:
       As the class SingleThreadExecutor is common for all Events, I have named the default variable in a generic way, but the config name construction is done from the event name. When we have more events that need different defaults, this can be revisited if needed.






[GitHub] [ozone] bharatviswa504 edited a comment on pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#issuecomment-866524893


   > I've got a couple of questions on this topic:
   > 
   > 1. Does it make sense to thread pool the Incremental Reports too? I know they are much smaller, but they are also much more frequent. I have not seen any evidence of whether they are backing up, but it's worth considering / checking if we can.
   
   Previously, ICRs used to send the full container report; that is fixed by HDDS-5111. We will be testing this PR together with HDDS-5111 using huge container reports from each DN. If we observe an issue with ICRs, we can add a thread pool for ICR processing as well.
   
   > 2. I believe the DNs send a FCR every 60 - 90 seconds. Is that frequency really needed? Unknown bugs aside, is there any reason to send a FCR after startup and first registration? On HDFS, datanodes only send full block reports every 6 hours by default. If ICRs carry all the required information for SCM, perhaps we should increase the FCR interval to an hour or more?
   
   During startup/registration we need to send a full container report, as the ContainerSafeMode rule depends on it for validation. We also fire a container report event, in which we process container reports and build the container replica set.
   
   But I completely agree with you that we can change the full container report interval to a larger value. I don't think we need a value as large as in HDFS, since our container reports should be much smaller than HDFS block reports.
   
   From our scale testing:
   With 9 DNs, each filled with 500 TB of data, we have seen around 350K containers in the cluster. So a total of about 1 million replicas will be reported from all DNs. (Compared with HDFS, our container report is far smaller.)
   
   > 3. Have we been able to capture any profiles (e.g. flame charts) of processing a large FCR to see if there is anything to be optimized in that flow?
   
   We have not debugged at this level; we will look into this in future testing.
   
   




[GitHub] [ozone] bharatviswa504 commented on pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#issuecomment-868395965


   Thank you @JacksonYao287, @bshashikant and @sodonnel for the review/comments.
   I will commit this shortly.
   
   @sodonnel In our scale testing, we have not observed an issue with ICR processing. If it is needed in the future, we can add that. For now, this PR gives us the general framework needed to support multi-threaded processing.




[GitHub] [ozone] bharatviswa504 merged pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
bharatviswa504 merged pull request #2338:
URL: https://github.com/apache/ozone/pull/2338


   




[GitHub] [ozone] sodonnel commented on pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#issuecomment-866174814


   I've got a couple of questions on this topic:
   
   1. Does it make sense to thread pool the Incremental Reports too? I know they are much smaller, but they are also much more frequent. I have not seen any evidence of whether they are backing up, but it's worth considering / checking if we can.
   2. I believe the DNs send a FCR every 60 - 90 seconds. Is that frequency really needed? Unknown bugs aside, is there any reason to send a FCR after startup and first registration? On HDFS, datanodes only send full block reports every 6 hours by default. If ICRs carry all the required information for SCM, perhaps we should increase the FCR interval to an hour or more?
   3. Have we been able to capture any profiles (e.g. flame charts) of processing a large FCR to see if there is anything to be optimised in that flow?
   




[GitHub] [ozone] bharatviswa504 edited a comment on pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#issuecomment-866524893


   > I've got a couple of questions on this topic:
   > 
   > 1. Does it make sense to thread pool the Incremental Reports too? I know they are much smaller, but they are also much more frequent. I have not seen any evidence of whether they are backing up, but it's worth considering / checking if we can.
   
   Previously, ICRs used to send the full container report; that is fixed by HDDS-5111. We will be testing this PR together with HDDS-5111 using huge container reports from each DN. If we observe an issue with ICRs, we can add a thread pool for ICR processing as well.
   
   > 2. I believe the DNs send a FCR every 60 - 90 seconds. Is that frequency really needed? Unknown bugs aside, is there any reason to send a FCR after startup and first registration? On HDFS, datanodes only send full block reports every 6 hours by default. If ICRs carry all the required information for SCM, perhaps we should increase the FCR interval to an hour or more?
   
   During startup/registration we need to send a full container report, as the ContainerSafeMode rule depends on it for validation. We also fire a container report event, in which we process container reports and build the container replica set.
   
   But I completely agree with you that we can change the full container report interval to a larger value. I don't think we need a value as large as in HDFS, since our container reports should be much smaller than HDFS block reports.
   
   From our scale testing:
   With 9 DNs, each filled with 500 TB of data, we have seen around 350K containers in the cluster. So a total of about 1 million replicas will be reported from all DNs. (Compared with HDFS, our container report is far smaller.)
   
   > 3. Have we been able to capture any profiles (e.g. flame charts) of processing a large FCR to see if there is anything to be optimized in that flow?
   
   We have not debugged at this level; we will look into this in future testing.
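
As a back-of-the-envelope check of the scale-testing figures above, the arithmetic can be sketched as below. The replication factor of 3 is an assumption (it is what makes 350K containers come to roughly 1 million replicas), as is the simplification that every DN sends exactly one FCR per interval; `ReportLoadSketch` and its methods are hypothetical names.

```java
/** Hypothetical arithmetic sketch; replication factor 3 is an assumption. */
class ReportLoadSketch {
  /** Total replicas reported across the cluster in one round of FCRs. */
  static long totalReplicas(long containers, int replicationFactor) {
    return containers * replicationFactor;
  }

  /** Average replicas SCM must process per second for a given FCR interval. */
  static double replicasPerSecond(long totalReplicas, long fcrIntervalSeconds) {
    return (double) totalReplicas / fcrIntervalSeconds;
  }

  public static void main(String[] args) {
    long replicas = totalReplicas(350_000L, 3);
    System.out.println("replicas per FCR round: " + replicas);
    System.out.println("average per DN (9 DNs): " + replicas / 9);
    // Compare the FCR interval mentioned in the thread (60 s) with one hour.
    System.out.println("replicas/s at 60 s:   " + replicasPerSecond(replicas, 60));
    System.out.println("replicas/s at 3600 s: " + replicasPerSecond(replicas, 3600));
  }
}
```

Under these assumptions, lengthening the FCR interval from one minute to one hour lowers the average full-report processing rate by a factor of 60.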
   
   




[GitHub] [ozone] bharatviswa504 commented on pull request #2338: HDDS-5341. Container report processing is single threaded

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2338:
URL: https://github.com/apache/ozone/pull/2338#issuecomment-866524893


   > I've got a couple of questions on this topic:
   > 
   > 1. Does it make sense to thread pool the Incremental Reports too? I know they are much smaller, but they are also much more frequent. I have not seen any evidence of whether they are backing up, but it's worth considering / checking if we can.
   
   Previously, ICRs used to send the full container report; that is fixed by HDDS-5111. We will be testing this PR together with HDDS-5111 using huge container reports from each DN. If we observe an issue with ICRs, we can add a thread pool for ICR processing as well.
   
   > 2. I believe the DNs send a FCR every 60 - 90 seconds. Is that frequency really needed? Unknown bugs aside, is there any reason to send a FCR after startup and first registration? On HDFS, datanodes only send full block reports every 6 hours by default. If ICRs carry all the required information for SCM, perhaps we should increase the FCR interval to an hour or more?
   
   During startup/registration we need to send a full container report, as the ContainerSafeMode rule depends on it for validation. We also fire a container report event, in which we process container reports and build the container replica set.
   
   But I completely agree with you that we can change the full container report interval to a larger value. I don't think we need a value as large as in HDFS, since our container reports should be much smaller than HDFS block reports.
   
   From our scale testing:
   With 9 DNs, each filled with 500 TB of data, we have seen around 350K containers in the cluster. So a total of about 1 million replicas will be reported from all DNs. (Compared with HDFS, our container report is far smaller.)
   
   > 3. Have we been able to capture any profiles (e.g. flame charts) of processing a large FCR to see if there is anything to be optimized in that flow?
   
   We have not debugged at this level; we will look into this in future testing.
   
   

