You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "hemantk-12 (via GitHub)" <gi...@apache.org> on 2023/01/27 05:40:04 UTC

[GitHub] [ozone] hemantk-12 opened a new pull request, #4214: HDDS-7845. Added wait for checkpoint directory creation in rocksDB

hemantk-12 opened a new pull request, #4214:
URL: https://github.com/apache/ozone/pull/4214

   ## What changes were proposed in this pull request?
   This change is to add the wait for 10 secs for checkpoint directory creation with 100 millis interval poll. We can't not use [wait()](https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#wait()) and [notify()](https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#notify()) because checkpointing directory creation happens inside RocksDB and RocksDB doesn't provide any [listen](https://javadoc.io/static/org.rocksdb/rocksdbjni/7.4.5/org/rocksdb/EventListener.html) which can be overloaded to notify.
   Hence we are using `awaitility` polling mechanism to check if checkpoint directory gets created in 10 secs.
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-7845
   
   ## How was this patch tested?
   Existing tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] smengcl commented on pull request #4214: HDDS-7845. [Snapshot] Wait for RocksDB checkpoint directory creation

Posted by "smengcl (via GitHub)" <gi...@apache.org>.
smengcl commented on PR #4214:
URL: https://github.com/apache/ozone/pull/4214#issuecomment-1410904973

   And thanks @adoroszlai and @prashantpogde for the reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] smengcl merged pull request #4214: HDDS-7845. [Snapshot] Wait for RocksDB checkpoint directory creation

Posted by "smengcl (via GitHub)" <gi...@apache.org>.
smengcl merged PR #4214:
URL: https://github.com/apache/ozone/pull/4214


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a diff in pull request #4214: HDDS-7845. Added wait for checkpoint directory creation in rocksDB

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on code in PR #4214:
URL: https://github.com/apache/ozone/pull/4214#discussion_r1088669532


##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -96,6 +100,21 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
     return null;
   }
 
+  /**
+   * Wait for checkpoint directory gets created for 10 secs.
+   * <p>
+   * By default, Awaitility waits for 10 seconds and then throw
+   * a ConditionTimeoutException if the condition has not been fulfilled.
+   * The default poll interval and poll delay is 100 milliseconds.
+   */
+  private void waitForCheckpointDirectoryExist(File file) throws IOException {
+    try {
+      await().until(file::exists);
+    } catch (ConditionTimeoutException exception) {
+      LOG.info("Checkpoint directory didn't get created in 10 secs.");

Review Comment:
   Would it be useful to include the dir name in the log message?



##########
hadoop-hdds/framework/pom.xml:
##########
@@ -166,7 +166,11 @@ https://maven.apache.org/xsd/maven-4.0.0.xsd">
       <artifactId>rocksdb-checkpoint-differ</artifactId>
       <version>${hdds.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.awaitility</groupId>
+      <artifactId>awaitility</artifactId>
+      <version>4.2.0</version>
+    </dependency>

Review Comment:
   When adding a new dependency, please:
   
   1. update `hadoop-ozone/dist/src/main/license/bin/LICENSE.txt` (also for any new transitive dependencies)
   2. update `hadoop-ozone/dist/src/main/license/jar-report.txt` with versionless jar names (also for any new transitive dependencies)
   3. create a [property in root POM](https://github.com/apache/ozone/blob/ec1e098694d019043f7340e06b54bfc57af6ed47/pom.xml#L139-L140) for the version number
   4. define the dependency in [root POM's `dependencyManagement`](https://github.com/apache/ozone/blob/ec1e098694d019043f7340e06b54bfc57af6ed47/pom.xml#L756-L760) section using the property
   5. add it to the submodule POM(s) without version definition
   
   
   ```
   Changed jars:
   
   --- /dev/fd/63	2023-01-27 06:[10](https://github.com/hemantk-12/ozone/actions/runs/4021901854/jobs/6911260833#step:5:11):29.3776[12](https://github.com/hemantk-12/ozone/actions/runs/4021901854/jobs/6911260833#step:5:13)253 +0000
   +++ /dev/fd/62	[20](https://github.com/hemantk-12/ozone/actions/runs/4021901854/jobs/6911260833#step:5:21)23-01-27 06:10:29.37761[22](https://github.com/hemantk-12/ozone/actions/runs/4021901854/jobs/6911260833#step:5:23)53 +0000
   @@ -9,6 +9,7 @@
    share/ozone/lib/asm.jar
    share/ozone/lib/aspectjrt.jar
    share/ozone/lib/aspectjweaver.jar
   +share/ozone/lib/awaitility.jar
    share/ozone/lib/aws-java-sdk-core.jar
    share/ozone/lib/aws-java-sdk-kms.jar
    share/ozone/lib/aws-java-sdk-s3.jar
   @@ -60,6 +61,7 @@
    share/ozone/lib/hadoop-hdfs.jar
    share/ozone/lib/hadoop-shaded-guava.jar
    share/ozone/lib/hadoop-shaded-protobuf_3_7.jar
   +share/ozone/lib/hamcrest.jar
    share/ozone/lib/hdds-annotation-processing.jar
    share/ozone/lib/hdds-client.jar
    share/ozone/lib/hdds-common.jar
   ```
   
   https://github.com/hemantk-12/ozone/actions/runs/4021901854/jobs/6911260833#step:5:19



##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -85,6 +87,8 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
       LOG.info("Created checkpoint at {} in {} milliseconds",
               checkpointPath, duration);

Review Comment:
   Should this "success" log and `end`/`duration` be moved into the new method, after successful wait?



##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -85,6 +87,8 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
       LOG.info("Created checkpoint at {} in {} milliseconds",
               checkpointPath, duration);
 
+      waitForCheckpointDirectoryExist(checkpointPath.toFile());
+
       return new RocksDBCheckpoint(
           checkpointPath,
           currentTime,

Review Comment:
   What if the wait times out?  Is the checkpoint object being returned still valid?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] prashantpogde commented on a diff in pull request #4214: HDDS-7845. Added wait for checkpoint directory creation in rocksDB

Posted by "prashantpogde (via GitHub)" <gi...@apache.org>.
prashantpogde commented on code in PR #4214:
URL: https://github.com/apache/ozone/pull/4214#discussion_r1091010552


##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -79,12 +85,17 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
       checkpoint.createCheckpoint(checkpointPath);
       //Best guesstimate here. Not accurate.
       final long latest = checkpoint.getLatestSequenceNumber();
-      Instant end = Instant.now();
 
+      Instant end = Instant.now();
       long duration = Duration.between(start, end).toMillis();
-      LOG.info("Created checkpoint at {} in {} milliseconds",
+      LOG.info("Created checkpoint in rocksDB at {} in {} milliseconds",
               checkpointPath, duration);
 
+      // Wait for checkpoint directory to be created if it doesn't exist.
+      if (!checkpointPath.toFile().exists()) {

Review Comment:
   redundant



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4214: HDDS-7845. Added wait for checkpoint directory creation in rocksDB

Posted by "hemantk-12 (via GitHub)" <gi...@apache.org>.
hemantk-12 commented on code in PR #4214:
URL: https://github.com/apache/ozone/pull/4214#discussion_r1089334227


##########
hadoop-hdds/framework/pom.xml:
##########
@@ -166,7 +166,11 @@ https://maven.apache.org/xsd/maven-4.0.0.xsd">
       <artifactId>rocksdb-checkpoint-differ</artifactId>
       <version>${hdds.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.awaitility</groupId>
+      <artifactId>awaitility</artifactId>
+      <version>4.2.0</version>
+    </dependency>

Review Comment:
   Thanks for the detailed steps to add dependency.
   
   Updated as suggested.



##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -85,6 +87,8 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
       LOG.info("Created checkpoint at {} in {} milliseconds",
               checkpointPath, duration);
 
+      waitForCheckpointDirectoryExist(checkpointPath.toFile());
+
       return new RocksDBCheckpoint(
           checkpointPath,
           currentTime,

Review Comment:
   Checkpoint object is always valid. It is just that RocksDB has not flushed the queue and created the actual directory.



##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -96,6 +100,21 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
     return null;
   }
 
+  /**
+   * Wait for checkpoint directory gets created for 10 secs.
+   * <p>
+   * By default, Awaitility waits for 10 seconds and then throw
+   * a ConditionTimeoutException if the condition has not been fulfilled.
+   * The default poll interval and poll delay is 100 milliseconds.
+   */
+  private void waitForCheckpointDirectoryExist(File file) throws IOException {
+    try {
+      await().until(file::exists);
+    } catch (ConditionTimeoutException exception) {
+      LOG.info("Checkpoint directory didn't get created in 10 secs.");

Review Comment:
   Added dir name.



##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -85,6 +87,8 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
       LOG.info("Created checkpoint at {} in {} milliseconds",
               checkpointPath, duration);
 
+      waitForCheckpointDirectoryExist(checkpointPath.toFile());
+
       return new RocksDBCheckpoint(
           checkpointPath,
           currentTime,

Review Comment:
   As discussed offline, it is not good idea to throw exception in case of timeout. We have to look into more details how we can make sure that rocksDB flush the queue and create checkpointing dir.



##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -85,6 +87,8 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
       LOG.info("Created checkpoint at {} in {} milliseconds",
               checkpointPath, duration);

Review Comment:
   Checkpoint dir creation is a async task.  As Prashant mentioned it is better to keep this separate from wait method.
   Added another success log in `waitForCheckpointDirectoryExist` function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] prashantpogde commented on a diff in pull request #4214: HDDS-7845. Added wait for checkpoint directory creation in rocksDB

Posted by "prashantpogde (via GitHub)" <gi...@apache.org>.
prashantpogde commented on code in PR #4214:
URL: https://github.com/apache/ozone/pull/4214#discussion_r1089398388


##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -85,6 +87,8 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
       LOG.info("Created checkpoint at {} in {} milliseconds",
               checkpointPath, duration);
 
+      waitForCheckpointDirectoryExist(checkpointPath.toFile());
+
       return new RocksDBCheckpoint(
           checkpointPath,
           currentTime,

Review Comment:
   I believe checkpoint object returned should still be valid if the rocksDB api returned success. However, we should throw the timeout exception all the way back to the application. Application can wait for the snapshot dir creation when this happens before using the snapshot dir. The exception thrown should contain enough details so that application can wait on the dir creation. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] prashantpogde commented on a diff in pull request #4214: HDDS-7845. Added wait for checkpoint directory creation in rocksDB

Posted by "prashantpogde (via GitHub)" <gi...@apache.org>.
prashantpogde commented on code in PR #4214:
URL: https://github.com/apache/ozone/pull/4214#discussion_r1089353370


##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -85,6 +87,8 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
       LOG.info("Created checkpoint at {} in {} milliseconds",
               checkpointPath, duration);

Review Comment:
   +1. Let us Log both the time durations, once the RocksDB checkpoint API returns and also after the checkpoint directory comes into namespace.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4214: HDDS-7845. Added wait for checkpoint directory creation in rocksDB

Posted by "hemantk-12 (via GitHub)" <gi...@apache.org>.
hemantk-12 commented on code in PR #4214:
URL: https://github.com/apache/ozone/pull/4214#discussion_r1091021207


##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -79,12 +85,17 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
       checkpoint.createCheckpoint(checkpointPath);
       //Best guesstimate here. Not accurate.
       final long latest = checkpoint.getLatestSequenceNumber();
-      Instant end = Instant.now();
 
+      Instant end = Instant.now();
       long duration = Duration.between(start, end).toMillis();
-      LOG.info("Created checkpoint at {} in {} milliseconds",
+      LOG.info("Created checkpoint in rocksDB at {} in {} milliseconds",
               checkpointPath, duration);
 
+      // Wait for checkpoint directory to be created if it doesn't exist.
+      if (!checkpointPath.toFile().exists()) {

Review Comment:
   Removed this check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hemantk-12 commented on a diff in pull request #4214: HDDS-7845. Added wait for checkpoint directory creation in rocksDB

Posted by "hemantk-12 (via GitHub)" <gi...@apache.org>.
hemantk-12 commented on code in PR #4214:
URL: https://github.com/apache/ozone/pull/4214#discussion_r1089589210


##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBCheckpointManager.java:
##########
@@ -85,6 +87,8 @@ public RocksDBCheckpoint createCheckpoint(String parentDir, String name) {
       LOG.info("Created checkpoint at {} in {} milliseconds",
               checkpointPath, duration);
 
+      waitForCheckpointDirectoryExist(checkpointPath.toFile());
+
       return new RocksDBCheckpoint(
           checkpointPath,
           currentTime,

Review Comment:
   As discussed offline, it is not good idea to throw exception in case of timeout because checkpoint directory will get create eventually.
   Instead, we will flush the DB.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org