Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/28 22:04:03 UTC

[GitHub] [hudi] nsivabalan opened a new pull request #4715: [MINOR] Adding savepoint and restore docs to website

nsivabalan opened a new pull request #4715:
URL: https://github.com/apache/hudi/pull/4715


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #4715: [HUDI-1273] Adding savepoint and restore docs to website

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #4715:
URL: https://github.com/apache/hudi/pull/4715#issuecomment-1024762640


   <img width="918" alt="Screen Shot 2022-01-28 at 7 10 36 PM" src="https://user-images.githubusercontent.com/513218/151638020-0712b72e-0af3-4196-b446-16e3f07c3035.png">
   <img width="900" alt="Screen Shot 2022-01-28 at 7 10 50 PM" src="https://user-images.githubusercontent.com/513218/151638021-9b4f2663-143e-4244-a4e2-6b499701d1fb.png">
   <img width="897" alt="Screen Shot 2022-01-28 at 7 11 00 PM" src="https://user-images.githubusercontent.com/513218/151638022-ac55c452-4669-4b74-8dd8-b56ba5aed742.png">
   <img width="913" alt="Screen Shot 2022-01-28 at 7 11 07 PM" src="https://user-images.githubusercontent.com/513218/151638023-5aa3e663-68dd-4bbf-b2cc-25e82bb4ac5c.png">
   <img width="905" alt="Screen Shot 2022-01-28 at 7 11 17 PM" src="https://user-images.githubusercontent.com/513218/151638024-0d0775f9-6160-4e23-87d3-8dafe4ffe958.png">
   <img width="897" alt="Screen Shot 2022-01-28 at 7 11 26 PM" src="https://user-images.githubusercontent.com/513218/151638027-90bfdc24-726e-4342-9a53-bcad6007418e.png">
   <img width="901" alt="Screen Shot 2022-01-28 at 7 11 34 PM" src="https://user-images.githubusercontent.com/513218/151638028-564a415e-7572-4421-869c-7f59bc4f28ca.png">
   <img width="894" alt="Screen Shot 2022-01-28 at 7 11 43 PM" src="https://user-images.githubusercontent.com/513218/151638029-31b3b704-151b-40b4-ac71-2fa14de93bd5.png">
   <img width="902" alt="Screen Shot 2022-01-28 at 7 11 50 PM" src="https://user-images.githubusercontent.com/513218/151638030-0e6a5362-7a7d-494c-b98c-2a7b1912f7a2.png">
   <img width="893" alt="Screen Shot 2022-01-28 at 7 11 58 PM" src="https://user-images.githubusercontent.com/513218/151638031-ab350910-a2cc-4a5c-9e04-74f5bbecd07f.png">
   





[GitHub] [hudi] kywe665 commented on a change in pull request #4715: [HUDI-1273] Adding savepoint and restore docs to website

Posted by GitBox <gi...@apache.org>.
kywe665 commented on a change in pull request #4715:
URL: https://github.com/apache/hudi/pull/4715#discussion_r810430779



##########
File path: website/docs/disaster_recovery.md
##########
@@ -0,0 +1,303 @@
+---
+title: Disaster and Recovery with Apache Hudi

Review comment:
       "Disaster Recovery" is used commonly without "and"







[GitHub] [hudi] nsivabalan merged pull request #4715: [HUDI-1273] Adding savepoint and restore docs to website

Posted by GitBox <gi...@apache.org>.
nsivabalan merged pull request #4715:
URL: https://github.com/apache/hudi/pull/4715


   





[GitHub] [hudi] abhijeetkushe removed a comment on pull request #4715: [HUDI-1273] Adding savepoint and restore docs to website

Posted by GitBox <gi...@apache.org>.
abhijeetkushe removed a comment on pull request #4715:
URL: https://github.com/apache/hudi/pull/4715#issuecomment-1045009737


   Actually, we were able to get the job running again. When we started with a small volume of data, it was able to roll back the previous commit and catch up to the current time. We can close this issue. Thanks for sending the Savepoint documentation @nsivabalan





[GitHub] [hudi] abhijeetkushe commented on pull request #4715: [HUDI-1273] Adding savepoint and restore docs to website

Posted by GitBox <gi...@apache.org>.
abhijeetkushe commented on pull request #4715:
URL: https://github.com/apache/hudi/pull/4715#issuecomment-1045009737


   Actually, we were able to get the job running again. When we started with a small volume of data, it was able to roll back the previous commit and catch up to the current time. I am going to close this issue. Thanks for sending the Savepoint documentation @nsivabalan





[GitHub] [hudi] abhijeetkushe edited a comment on pull request #4715: [HUDI-1273] Adding savepoint and restore docs to website

Posted by GitBox <gi...@apache.org>.
abhijeetkushe edited a comment on pull request #4715:
URL: https://github.com/apache/hudi/pull/4715#issuecomment-1045009737


   Actually, we were able to get the job running again. When we started with a small volume of data, it was able to roll back the previous commit and catch up to the current time. We can close this issue. Thanks for sending the Savepoint documentation @nsivabalan





[GitHub] [hudi] kywe665 commented on a change in pull request #4715: [HUDI-1273] Adding savepoint and restore docs to website

Posted by GitBox <gi...@apache.org>.
kywe665 commented on a change in pull request #4715:
URL: https://github.com/apache/hudi/pull/4715#discussion_r810430878



##########
File path: website/docs/disaster_recovery.md
##########
@@ -0,0 +1,303 @@
+---
+title: Disaster and Recovery with Apache Hudi

Review comment:
       "Disaster Recovery" is used commonly without the "and"

##########
File path: website/docs/disaster_recovery.md
##########
@@ -0,0 +1,303 @@
+---
+title: Disaster and Recovery with Apache Hudi
+toc: true
+---
+
+Disaster and Recovery is very much mission critical for any software. Especially when it comes to data systems, the impact could be very serious
+leading to delay in business decisions or even wrong business decisions at times. So, Apache Hudi has built-in features to assist
+you in such situations. Apache Hudi supports two write operations, namely "savepoint" and "restore", to assist you in this.
+
+## Savepoint
+
+As the name suggests, "savepoint" saves the table as of the commit time, so that you can restore the table to this
+savepoint at a later point in time if need be. Users can run some validations on few candidate commits and trigger a savepoint 
+as applicable. Care is taken to ensure the cleaner will not clean up any files that are savepointed. Along similar lines,
+a savepoint cannot be triggered on a commit that has already been cleaned up. In simpler terms, this is similar to taking a backup,
+except that we don't make a new copy of the table; we just save the state of the table so that we can restore it later
+when needed.
+
+## Restore
+
+This operation lets you restore your table to one of the savepointed commit. This is a destructive operation and so care 
+should be taken before doing a restore. Hudi will delete all data files and commit files (timeline files) greater than the
+savepointed commit to which the table is being restored. Also, users should bring down all writer processes while triggering 

Review comment:
       "savepoint commit"?

##########
File path: website/docs/disaster_recovery.md
##########
@@ -0,0 +1,303 @@
+---
+title: Disaster and Recovery with Apache Hudi
+toc: true
+---
+
+Disaster and Recovery is very much mission critical for any software. Especially when it comes to data systems, the impact could be very serious

Review comment:
       You can simplify this paragraph into the below. In docs we don't have to justify motivations, just give the info on how to use:
   
   Apache Hudi has two operations to assist you in recovering data from a previous state:  "savepoint" and "restore".

##########
File path: website/docs/disaster_recovery.md
##########
@@ -0,0 +1,303 @@
+---
+title: Disaster and Recovery with Apache Hudi
+toc: true
+---
+
+Disaster and Recovery is very much mission critical for any software. Especially when it comes to data systems, the impact could be very serious
+leading to delay in business decisions or even wrong business decisions at times. So, Apache Hudi has built-in features to assist
+you in such situations. Apache Hudi supports two write operations, namely "savepoint" and "restore", to assist you in this.
+
+## Savepoint
+
+As the name suggests, "savepoint" saves the table as of the commit time, so that you can restore the table to this
+savepoint at a later point in time if need be. Users can run some validations on few candidate commits and trigger a savepoint 
+as applicable. Care is taken to ensure the cleaner will not clean up any files that are savepointed. Along similar lines,
+a savepoint cannot be triggered on a commit that has already been cleaned up. In simpler terms, this is similar to taking a backup,
+except that we don't make a new copy of the table; we just save the state of the table so that we can restore it later
+when needed.
+
+## Restore
+
+This operation lets you restore your table to one of the savepointed commit. This is a destructive operation and so care 

Review comment:
       can we say "savepoint commits" instead of "savepointed commit"?

##########
File path: website/docs/disaster_recovery.md
##########
@@ -0,0 +1,303 @@
+---
+title: Disaster and Recovery with Apache Hudi
+toc: true
+---
+
+Disaster and Recovery is very much mission critical for any software. Especially when it comes to data systems, the impact could be very serious
+leading to delay in business decisions or even wrong business decisions at times. So, Apache Hudi has built-in features to assist
+you in such situations. Apache Hudi supports two write operations, namely "savepoint" and "restore", to assist you in this.
+
+## Savepoint
+
+As the name suggests, "savepoint" saves the table as of the commit time, so that you can restore the table to this
+savepoint at a later point in time if need be. Users can run some validations on few candidate commits and trigger a savepoint 

Review comment:
       You can remove the sentence "Users can run some validations..."
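   The Savepoint section above describes saving the table as of a commit time. Later, the runbook passes that time to `savepoint create --commit`. Below is a minimal sketch of picking the newest completed commit from a local `.hoodie` directory listing. It is an illustration only, not part of Hudi. `LatestCommit` and `latestCompletedCommit` are hypothetical names. The sketch assumes a copy-on-write table whose completed writes appear as `<instant>.commit` files, as in the listings in this doc. It ignores `.commit.requested` and `.inflight` files.

```scala
import java.io.File

// Hypothetical helper (not a Hudi API): finds the newest completed commit
// instant in a table's local .hoodie directory.
object LatestCommit {
  // Scala's Regex extractor must match the whole file name, so
  // "<instant>.commit.requested" and "<instant>.inflight" are skipped and
  // only completed "<instant>.commit" files are considered.
  private val CompletedCommit = """(\d+)\.commit""".r

  def latestCompletedCommit(hoodieDir: File): Option[String] = {
    // listFiles() returns null when the path is not a readable directory.
    val names = Option(hoodieDir.listFiles()).map(_.toSeq.map(_.getName)).getOrElse(Seq.empty)
    // Assumes all instant times have the same width, so string order is time order.
    names.collect { case CompletedCommit(instant) => instant }.sorted.lastOption
  }
}
```

   Consider the listing taken after the five batches of inserts. There, `LatestCommit.latestCompletedCommit(new File("/tmp/hudi_trips_cow/.hoodie"))` would return `Some("20220128160245447")`, which is the instant the runbook savepoints.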

##########
File path: website/docs/disaster_recovery.md
##########
@@ -0,0 +1,303 @@
+---
+title: Disaster and Recovery with Apache Hudi
+toc: true
+---
+
+Disaster and Recovery is very much mission critical for any software. Especially when it comes to data systems, the impact could be very serious
+leading to delay in business decisions or even wrong business decisions at times. So, Apache Hudi has built-in features to assist
+you in such situations. Apache Hudi supports two write operations, namely "savepoint" and "restore", to assist you in this.
+
+## Savepoint
+
+As the name suggests, "savepoint" saves the table as of the commit time, so that you can restore the table to this
+savepoint at a later point in time if need be. Users can run some validations on few candidate commits and trigger a savepoint 
+as applicable. Care is taken to ensure the cleaner will not clean up any files that are savepointed. Along similar lines,
+a savepoint cannot be triggered on a commit that has already been cleaned up. In simpler terms, this is similar to taking a backup,
+except that we don't make a new copy of the table; we just save the state of the table so that we can restore it later
+when needed.
+
+## Restore
+
+This operation lets you restore your table to one of the savepointed commit. This is a destructive operation and so care 
+should be taken before doing a restore. Hudi will delete all data files and commit files (timeline files) greater than the
+savepointed commit to which the table is being restored. Also, users should bring down all writer processes while triggering 

Review comment:
       Replace "Also, users should..." all the way to end of paragraph with:
   
   "You should pause all writes and reads to the table when performing a restore since they are likely to fail while the restore is in progress."
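   The Restore section says Hudi deletes data files and timeline files greater than the savepointed commit. Below is a simplified model of that selection over timeline file names. `RestorePlan` is hypothetical and reasons only about the instant times in file names. It does not look at data files. The real restore also writes its own `.restore` instant, as the runbook's listing after the restore shows.

```scala
// Hypothetical model (not Hudi code) of which timeline instants a restore to a
// savepoint is described as removing: every instant strictly newer than it.
object RestorePlan {
  // Timeline files start with the instant time, e.g. "20220128160620557.commit.requested".
  // Non-instant entries such as "hoodie.properties" or "archived" do not match.
  private val InstantFile = """(\d+)\..+""".r

  def instantOf(fileName: String): Option[String] = fileName match {
    case InstantFile(instant) => Some(instant)
    case _                    => None
  }

  def instantsRemovedByRestore(timelineFiles: Seq[String], savepoint: String): Seq[String] =
    timelineFiles
      .flatMap(instantOf)
      .distinct
      // Same-width timestamps compare chronologically as strings.
      .filter(_.compareTo(savepoint) > 0)
      .sorted
}
```

   Apply it to the file names in the runbook's listing just before the restore, with savepoint `20220128160245447`. It would select the three instants written after the savepoint: `20220128160620557`, `20220128160627501` and `20220128160630785`. These are the instants missing from the listing after the restore.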

##########
File path: website/docs/disaster_recovery.md
##########
@@ -0,0 +1,303 @@
+---
+title: Disaster and Recovery with Apache Hudi
+toc: true
+---
+
+Disaster and Recovery is very much mission critical for any software. Especially when it comes to data systems, the impact could be very serious
+leading to delay in business decisions or even wrong business decisions at times. So, Apache Hudi has built-in features to assist
+you in such situations. Apache Hudi supports two write operations, namely "savepoint" and "restore", to assist you in this.
+
+## Savepoint
+
+As the name suggests, "savepoint" saves the table as of the commit time, so that you can restore the table to this
+savepoint at a later point in time if need be. Users can run some validations on few candidate commits and trigger a savepoint 
+as applicable. Care is taken to ensure the cleaner will not clean up any files that are savepointed. Along similar lines,
+a savepoint cannot be triggered on a commit that has already been cleaned up. In simpler terms, this is similar to taking a backup,
+except that we don't make a new copy of the table; we just save the state of the table so that we can restore it later
+when needed.
+
+## Restore
+
+This operation lets you restore your table to one of the savepointed commit. This is a destructive operation and so care 

Review comment:
       Can we say, "this operation cannot be undone" (or "reversed"), instead of "destructive"? Unless that is an industry term.

##########
File path: website/docs/disaster_recovery.md
##########
@@ -0,0 +1,303 @@
+---
+title: Disaster and Recovery with Apache Hudi
+toc: true
+---
+
+Disaster and Recovery is very much mission critical for any software. Especially when it comes to data systems, the impact could be very serious
+leading to delay in business decisions or even wrong business decisions at times. So, Apache Hudi has built-in features to assist
+you in such situations. Apache Hudi supports two write operations, namely "savepoint" and "restore", to assist you in this.
+
+## Savepoint
+
+As the name suggests, "savepoint" saves the table as of the commit time, so that you can restore the table to this
+savepoint at a later point in time if need be. Users can run some validations on few candidate commits and trigger a savepoint 
+as applicable. Care is taken to ensure the cleaner will not clean up any files that are savepointed. Along similar lines,
+a savepoint cannot be triggered on a commit that has already been cleaned up. In simpler terms, this is similar to taking a backup,
+except that we don't make a new copy of the table; we just save the state of the table so that we can restore it later
+when needed.
+
+## Restore
+
+This operation lets you restore your table to one of the savepointed commit. This is a destructive operation and so care 
+should be taken before doing a restore. Hudi will delete all data files and commit files (timeline files) greater than the
+savepointed commit to which the table is being restored. Also, users should bring down all writer processes while triggering 
+a restore for a given Hudi table. Note that queries are likely to fail while the restore is in progress, since they may be
+reading the latest files, which get deleted during the restore operation.
+
+## Runbook
+
+Savepoint and restore can only be triggered from hudi-cli. Let's walk through an example of how to take a savepoint
+and later restore the table to that state.
+
+Let's create a Hudi table via spark-shell and trigger a few batches of inserts.
+
+```scala
+import org.apache.hudi.QuickstartUtils._
+import scala.collection.JavaConversions._
+import org.apache.spark.sql.SaveMode._
+import org.apache.hudi.DataSourceReadOptions._
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.config.HoodieWriteConfig._
+
+val tableName = "hudi_trips_cow"
+val basePath = "file:///tmp/hudi_trips_cow"
+val dataGen = new DataGenerator
+
+// spark-shell
+val inserts = convertToStringList(dataGen.generateInserts(10))
+val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Overwrite).
+  save(basePath)
+```
+
+Each batch inserts 10 records. Repeat the following for 4 more batches.
+```scala
+
+val inserts = convertToStringList(dataGen.generateInserts(10))
+val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Append).
+  save(basePath)
+```
+
+The total record count should be 50.
+```scala
+val tripsSnapshotDF = spark.
+  read.
+  format("hudi").
+  load(basePath)
+tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
+
+spark.sql("select count(partitionpath, uuid) from hudi_trips_snapshot").show()
+
++--------------------------+
+|count(partitionpath, uuid)|
++--------------------------+
+|                        50|
++--------------------------+
+```
+Let's take a look at the timeline after 5 batches of inserts.
+```shell
+ls -ltr /tmp/hudi_trips_cow/.hoodie 
+total 128
+drwxr-xr-x  2 nsb  wheel    64 Jan 28 16:00 archived
+-rw-r--r--  1 nsb  wheel   546 Jan 28 16:00 hoodie.properties
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:00 20220128160040171.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:00 20220128160040171.inflight
+-rw-r--r--  1 nsb  wheel  4374 Jan 28 16:00 20220128160040171.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:01 20220128160124637.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:01 20220128160124637.inflight
+-rw-r--r--  1 nsb  wheel  4414 Jan 28 16:01 20220128160124637.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160226172.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160226172.inflight
+-rw-r--r--  1 nsb  wheel  4427 Jan 28 16:02 20220128160226172.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160229636.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160229636.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160229636.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160245447.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160245447.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160245447.commit
+```
+
+Let's trigger a savepoint as of the latest commit. Savepoint can only be done via hudi-cli.
+
+```sh
+./hudi-cli.sh
+
+connect --path /tmp/hudi_trips_cow/
+commits show
+set --conf SPARK_HOME=<SPARK_HOME>
+savepoint create --commit 20220128160245447 --sparkMaster local[2]
+```
+
+Let's check the timeline after the savepoint.
+```shell
+ls -ltr /tmp/hudi_trips_cow/.hoodie
+total 136
+drwxr-xr-x  2 nsb  wheel    64 Jan 28 16:00 archived
+-rw-r--r--  1 nsb  wheel   546 Jan 28 16:00 hoodie.properties
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:00 20220128160040171.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:00 20220128160040171.inflight
+-rw-r--r--  1 nsb  wheel  4374 Jan 28 16:00 20220128160040171.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:01 20220128160124637.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:01 20220128160124637.inflight
+-rw-r--r--  1 nsb  wheel  4414 Jan 28 16:01 20220128160124637.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160226172.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160226172.inflight
+-rw-r--r--  1 nsb  wheel  4427 Jan 28 16:02 20220128160226172.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160229636.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160229636.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160229636.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160245447.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160245447.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160245447.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:05 20220128160245447.savepoint.inflight
+-rw-r--r--  1 nsb  wheel  1168 Jan 28 16:05 20220128160245447.savepoint
+```
+
+Notice that savepoint meta files have been added; they keep track of the files that are part of the latest table snapshot.
+
+Now, let's continue adding a few more batches of inserts.
+Repeat the commands below 3 times.
+```scala
+val inserts = convertToStringList(dataGen.generateInserts(10))
+val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Append).
+  save(basePath)
+```
+
+The total record count will be 80, since we have done 8 batches in total (5 before the savepoint and 3 after it).
+```scala
+val tripsSnapshotDF = spark.
+  read.
+  format("hudi").
+  load(basePath)
+tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
+
+spark.sql("select count(partitionpath, uuid) from hudi_trips_snapshot").show()
++--------------------------+
+|count(partitionpath, uuid)|
++--------------------------+
+|                        80|
++--------------------------+
+```
+
+Let's say something bad happened and you want to restore your table to an older snapshot. As called out earlier, a restore
+can only be triggered from hudi-cli. Also, remember to bring down all of your writer processes while doing a restore.
+
+Let's check the timeline once more before we trigger the restore.
+```shell
+ls -ltr /tmp/hudi_trips_cow/.hoodie
+total 208
+drwxr-xr-x  2 nsb  wheel    64 Jan 28 16:00 archived
+-rw-r--r--  1 nsb  wheel   546 Jan 28 16:00 hoodie.properties
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:00 20220128160040171.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:00 20220128160040171.inflight
+-rw-r--r--  1 nsb  wheel  4374 Jan 28 16:00 20220128160040171.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:01 20220128160124637.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:01 20220128160124637.inflight
+-rw-r--r--  1 nsb  wheel  4414 Jan 28 16:01 20220128160124637.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160226172.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160226172.inflight
+-rw-r--r--  1 nsb  wheel  4427 Jan 28 16:02 20220128160226172.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160229636.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160229636.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160229636.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160245447.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160245447.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160245447.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:05 20220128160245447.savepoint.inflight
+-rw-r--r--  1 nsb  wheel  1168 Jan 28 16:05 20220128160245447.savepoint
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:06 20220128160620557.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:06 20220128160620557.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:06 20220128160620557.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:06 20220128160627501.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:06 20220128160627501.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:06 20220128160627501.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:06 20220128160630785.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:06 20220128160630785.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:06 20220128160630785.commit
+```
+
+If you are continuing in the same hudi-cli session, you can just execute "refresh" so that the table state is refreshed to
+its latest state. If not, connect to the table again.
+
+```shell
+./hudi-cli.sh
+
+connect --path /tmp/hudi_trips_cow/
+commits show
+set --conf SPARK_HOME=<SPARK_HOME>
+savepoints show
+╔═══════════════════╗
+║ SavepointTime     ║
+╠═══════════════════╣
+║ 20220128160245447 ║
+╚═══════════════════╝
+savepoint rollback --savepoint 20220128160245447 --sparkMaster local[2]
+```
+
+The Hudi table should now be restored to the savepointed commit 20220128160245447. The data files and timeline files of the commits
+made after the savepoint should have been deleted.
+```shell
+ls -ltr /tmp/hudi_trips_cow/.hoodie
+total 152
+drwxr-xr-x  2 nsb  wheel    64 Jan 28 16:00 archived
+-rw-r--r--  1 nsb  wheel   546 Jan 28 16:00 hoodie.properties
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:00 20220128160040171.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:00 20220128160040171.inflight
+-rw-r--r--  1 nsb  wheel  4374 Jan 28 16:00 20220128160040171.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:01 20220128160124637.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:01 20220128160124637.inflight
+-rw-r--r--  1 nsb  wheel  4414 Jan 28 16:01 20220128160124637.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160226172.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160226172.inflight
+-rw-r--r--  1 nsb  wheel  4427 Jan 28 16:02 20220128160226172.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160229636.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160229636.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160229636.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:02 20220128160245447.commit.requested
+-rw-r--r--  1 nsb  wheel  2594 Jan 28 16:02 20220128160245447.inflight
+-rw-r--r--  1 nsb  wheel  4428 Jan 28 16:02 20220128160245447.commit
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:05 20220128160245447.savepoint.inflight
+-rw-r--r--  1 nsb  wheel  1168 Jan 28 16:05 20220128160245447.savepoint
+-rw-r--r--  1 nsb  wheel     0 Jan 28 16:07 20220128160732437.restore.inflight
+-rw-r--r--  1 nsb  wheel  4152 Jan 28 16:07 20220128160732437.restore
+```
+
+Let's check the total record count in the table. It should match the count we had just before we triggered the savepoint.
+```scala
+val tripsSnapshotDF = spark.
+  read.
+  format("hudi").
+  load(basePath)
+tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
+
+spark.sql("select count(partitionpath, uuid) from hudi_trips_snapshot").show()
++--------------------------+
+|count(partitionpath, uuid)|
++--------------------------+
+|                        50|
++--------------------------+
+```
+
+As you can see, the entire table state is restored to the savepointed commit. Users can choose to trigger savepoints
+at a regular cadence and delete older savepoints as new ones are created. Hudi-cli has a "savepoint delete" command
+to assist in deleting a savepoint. Remember that the cleaner will not clean up files that are savepointed, so users
+should delete savepoints from time to time. Otherwise, storage will not be reclaimed.
+
+## Conclusion

Review comment:
       You can remove conclusion, not needed for a doc
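   The runbook suggests two things: trigger savepoints at a regular cadence, and delete older ones with "savepoint delete", since the cleaner does not reclaim savepointed files. Below is a minimal sketch of a keep-the-newest-N retention rule over the savepoint instants listed by `savepoints show`. `SavepointRetention`, `savepointsToDelete` and the policy itself are hypothetical. The actual deletion would still go through hudi-cli.

```scala
// Hypothetical retention helper (not a Hudi API).
object SavepointRetention {
  /** Returns the savepoint instants outside the newest `keepLatest`, oldest first. */
  def savepointsToDelete(savepoints: Seq[String], keepLatest: Int): Seq[String] = {
    require(keepLatest >= 0, s"keepLatest must be >= 0, got $keepLatest")
    // Sort ascending (same-width instant times sort chronologically as strings)
    // and drop the newest `keepLatest`; whatever remains is older.
    savepoints.distinct.sorted.dropRight(keepLatest)
  }
}
```

   You would then delete each returned instant with hudi-cli's "savepoint delete" command. Check the hudi-cli help for that command's exact arguments rather than relying on this sketch.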



