Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/17 00:58:12 UTC

[GitHub] [hudi] abhijeetkushe opened a new issue #4831: [SUPPORT] Help with Hudi Cli Savepoint and restore

abhijeetkushe opened a new issue #4831:
URL: https://github.com/apache/hudi/issues/4831


   **Describe the problem you faced**
   
   We have been running Hudi DeltaStreamer on emr-5.31.0 (details below) in production for more than a year, and the dataset holds 45 TB of data. The dataset was out of sync for almost the entire month of February (the last commit was Feb 1, 19:05 UTC). To bring it back to the present day, we decided to write 15 days' worth of data in one run.
   
   But the job was taking a long time (4 hrs), so I decided to kill it and start a new EMR cluster with more executors. After restarting, I found that the job resumes from the Feb 1 19:05 UTC checkpoint but immediately stops all the executors. The .hoodie folder also contains a **commit.requested** and an **inflight** file. I tried deleting both files, but I still get the same behavior. Can I use the Hudi CLI to restore the Hudi table to the last successful commit and start from that checkpoint?
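
A minimal sketch of how the pending instants described above could be spotted from a listing of the .hoodie folder. It assumes timeline files are named `<instant>.<suffix>` and that only commit-type actions are of interest; the function name `pending_instants` and the timestamps in the sample listing are hypothetical. It only reads names and does not delete or modify anything.

```python
from collections import defaultdict

# Suffixes treated as "this instant finished" (assumption: commit actions only).
COMPLETED_SUFFIXES = (".commit", ".deltacommit")


def pending_instants(file_names):
    """Return instant timestamps that have timeline files but no completed marker."""
    states = defaultdict(set)
    for name in file_names:
        instant, dot, rest = name.partition(".")
        if not dot or not instant.isdigit():
            continue  # skip hoodie.properties and other non-instant entries
        states[instant].add("." + rest)
    return sorted(
        instant
        for instant, suffixes in states.items()
        if not any(s in COMPLETED_SUFFIXES for s in suffixes)
    )


# Hypothetical listing; the timestamps are invented for illustration.
listing = [
    "hoodie.properties",
    "20220201190500.commit",
    "20220201190500.commit.requested",
    "20220201190500.inflight",
    "20220216201500.commit.requested",
    "20220216201500.inflight",
]
print(pending_instants(listing))
```

An instant that appears here is one a restarted writer would be expected to roll back, which is the behavior asked about in this issue.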
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run the Hudi DeltaStreamer on emr-5.31.0 with 1 master and 4 core nodes (all m5.4xlarge), 12 executors with 5g memory each,
   spark.executor.cores set to 4, and spark.task.cpus set to 1. I can provide more details if needed, but this configuration has been
   tested and has been running efficiently for a year. The number of executors was very low.
   2. Read 178,536 files (1000.7 GB in total) and write them to the Hudi table (571,478 files, 45.0 TB)
   3. Kill the job after 4 hours
   4. Restart the Hudi DeltaStreamer with 28 executors and otherwise the same configuration as in step 1
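
As a rough illustration of the two configurations in the steps above, the sketch below computes how many tasks can run concurrently with 12 versus 28 executors. The executor counts, cores, and file count come from the issue; the helper names and the "one task per input file" figure are simplifying assumptions for illustration only, since the real number of tasks depends on how the job partitions its input.

```python
import math


def task_slots(executors, cores_per_executor, cpus_per_task=1):
    """Concurrent task slots: each executor runs cores // cpus_per_task tasks at once."""
    return executors * (cores_per_executor // cpus_per_task)


def scheduling_waves(num_tasks, slots):
    """Number of rounds needed if every slot takes one task per round."""
    return math.ceil(num_tasks / slots)


INPUT_FILES = 178_536  # from the issue; treating one file as one task is an assumption

for executors in (12, 28):
    slots = task_slots(executors, cores_per_executor=4, cpus_per_task=1)
    waves = scheduling_waves(INPUT_FILES, slots)
    print(f"{executors} executors -> {slots} slots, about {waves} waves of tasks")
```

This only compares parallelism; it says nothing about why the restarted job stopped its executors.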
   
   **Expected behavior**
   
   I expected the Hudi DeltaStreamer to roll back the previous inflight commit, start from the Feb 1 checkpoint, and write all 1000.7 GB of files successfully.
   
   **Environment Description**
   
   Hudi version : 0.6.0
   
   Spark version : 2.4.6
   
   Hive version : 2.3.7
   
   Hadoop version : Amazon 2.10.0
   
   Storage (HDFS/S3/GCS..) : S3
   
   Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   I did see this issue, https://github.com/apache/hudi/issues/2072, which refers to
   using the Hudi CLI to create a savepoint and reset the table back to that point. Is that possible?
   
   **Stacktrace**
   
   I did not see any exception in the log.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] abhijeetkushe commented on issue #4831: [SUPPORT] Help with Hudi Cli Savepoint and restore Hudi Table (Blocker)

Posted by GitBox <gi...@apache.org>.
abhijeetkushe commented on issue #4831:
URL: https://github.com/apache/hudi/issues/4831#issuecomment-1045011848


   Actually, we were able to get the job running again. When we started with a small volume of data, it was able to roll back the previous commit and catch up to the current time. I am closing this issue. Thanks for sending the savepoint documentation, @nsivabalan.





[GitHub] [hudi] abhijeetkushe closed issue #4831: [SUPPORT] Help with Hudi Cli Savepoint and restore Hudi Table (Blocker)

Posted by GitBox <gi...@apache.org>.
abhijeetkushe closed issue #4831:
URL: https://github.com/apache/hudi/issues/4831


   





[GitHub] [hudi] nsivabalan commented on issue #4831: [SUPPORT] Help with Hudi Cli Savepoint and restore Hudi Table (Blocker)

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4831:
URL: https://github.com/apache/hudi/issues/4831#issuecomment-1043965883


   I have added documentation in this patch: https://github.com/apache/hudi/pull/4715. It should help you with adding a savepoint and restoring to one of the savepointed commits.
   

