You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/11/16 13:24:00 UTC

[jira] [Updated] (HUDI-2772) Deltastreamer fails to read checkpoint from previous commit metadata by spark writer intermittantly

     [ https://issues.apache.org/jira/browse/HUDI-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-2772:
--------------------------------------
    Description: 
Even after setting the right config to copy over deltastreamer checkpoint, deltastreamer fails to read the checkpoint from previous commit metadata intermittantly.  

Setup:

Deltastreamer in continuous mode. 

And triggered a concurrent write from spark-datasource. 

 

I inspected the last commit.completed instant(that was reported by deltastreamer) made by spark writer and it looks ok to me. 
{code:java}
grep "checkpoint" /tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074129737.deltacommit
    "deltastreamer.checkpoint.key" : "1637066483000" {code}
But after the below exception, if I restart deltastreamer, it just runs fine. Very strange? I was able to reprod this 2 times out of 5.  

here is the checkpoint from last delta commit by deltastreamer (which matches the entry found by delta commit by spark writer above)
{code:java}
grep "checkpoint" /tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074123384.deltacommit
    "deltastreamer.checkpoint.key" : "1637066483000" {code}
 

I also check detlastreamer code and we do look at only completed instants and the completed commit metadata. So, not sure why is this happening. 

stacktrace: 
{code:java}
21/11/16 07:41:31 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception
org.apache.hudi.utilities.exception.HoodieDeltaStreamerException: Unable to find previous checkpoint. Please double check if this table was indeed built via delta streamer. Last Commit :Option{val=[20211116074129737__deltacommit__COMPLETED]}, Instants :[[20211116072710523__deltacommit__COMPLETED], [20211116072748906__deltacommit__COMPLETED], [20211116072910768__deltacommit__COMPLETED], [20211116074114874__deltacommit__COMPLETED], [20211116074123384__deltacommit__COMPLETED], [20211116074129737__deltacommit__COMPLETED]], CommitMetadata={
  "partitionToWriteStats" : { },
  "compacted" : false,
  "extraMetadata" : { },
  "operationType" : "UNKNOWN",
  "fileIdAndRelativePaths" : { },
  "totalRecordsDeleted" : 0,
  "totalLogRecordsCompacted" : 0,
  "totalLogFilesCompacted" : 0,
  "totalCompactedRecordsUpdated" : 0,
  "totalLogFilesSize" : 0,
  "totalScanTime" : 0,
  "totalCreateTime" : 0,
  "totalUpsertTime" : 0,
  "minAndMaxEventTime" : {
    "Optional.empty" : {
      "val" : null,
      "present" : false
    }
  },
  "writePartitionPaths" : [ ]
}
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:346)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:281)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:634)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/11/16 07:41:31 WARN HoodieDeltaStreamer: Gracefully shutting down compactor
21/11/16 07:41:39 WARN SparkRDDWriteClient: Slept for 20 secs, proceeding 
21/11/16 07:41:40 ERROR HoodieAsyncService: Monitor noticed one or more threads failed. Requesting graceful shutdown of other threads
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Unable to find previous checkpoint. Please double check if this table was indeed built via delta streamer. Last Commit :Option{val=[20211116074129737__deltacommit__COMPLETED]}, Instants :[[20211116072710523__deltacommit__COMPLETED], [20211116072748906__deltacommit__COMPLETED], [20211116072910768__deltacommit__COMPLETED], [20211116074114874__deltacommit__COMPLETED], [20211116074123384__deltacommit__COMPLETED], [20211116074129737__deltacommit__COMPLETED]], CommitMetadata={{code}

  was:
Even after setting the right config to copy over deltastreamer checkpoint, deltastreamer fails to read the checkpoint from previous commit metadata intermittantly.  

Setup:

Deltastreamer in continuous mode. 

And triggered a concurrent write from spark-datasource. 

 

I inspected the last commit.completed instant(that was reported by deltastreamer) and it looks good. 
{code:java}
grep "checkpoint" /tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074129737.deltacommit
    "deltastreamer.checkpoint.key" : "1637066483000" {code}
But after the below exception, if I restart deltastreamer, it just runs fine. Very strange? I was able to reprod this 2 times out of 5.  

here is the checkpoint from last delta commit by deltastreamer (which matches the entry found by delta commit by spark writer above)
{code:java}
grep "checkpoint" /tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074123384.deltacommit
    "deltastreamer.checkpoint.key" : "1637066483000" {code}
 

I also check detlastreamer code and we do look at only completed instants and the completed commit metadata. So, not sure why is this happening. 

stacktrace: 
{code:java}
21/11/16 07:41:31 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception
org.apache.hudi.utilities.exception.HoodieDeltaStreamerException: Unable to find previous checkpoint. Please double check if this table was indeed built via delta streamer. Last Commit :Option{val=[20211116074129737__deltacommit__COMPLETED]}, Instants :[[20211116072710523__deltacommit__COMPLETED], [20211116072748906__deltacommit__COMPLETED], [20211116072910768__deltacommit__COMPLETED], [20211116074114874__deltacommit__COMPLETED], [20211116074123384__deltacommit__COMPLETED], [20211116074129737__deltacommit__COMPLETED]], CommitMetadata={
  "partitionToWriteStats" : { },
  "compacted" : false,
  "extraMetadata" : { },
  "operationType" : "UNKNOWN",
  "fileIdAndRelativePaths" : { },
  "totalRecordsDeleted" : 0,
  "totalLogRecordsCompacted" : 0,
  "totalLogFilesCompacted" : 0,
  "totalCompactedRecordsUpdated" : 0,
  "totalLogFilesSize" : 0,
  "totalScanTime" : 0,
  "totalCreateTime" : 0,
  "totalUpsertTime" : 0,
  "minAndMaxEventTime" : {
    "Optional.empty" : {
      "val" : null,
      "present" : false
    }
  },
  "writePartitionPaths" : [ ]
}
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:346)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:281)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:634)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/11/16 07:41:31 WARN HoodieDeltaStreamer: Gracefully shutting down compactor
21/11/16 07:41:39 WARN SparkRDDWriteClient: Slept for 20 secs, proceeding 
21/11/16 07:41:40 ERROR HoodieAsyncService: Monitor noticed one or more threads failed. Requesting graceful shutdown of other threads
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Unable to find previous checkpoint. Please double check if this table was indeed built via delta streamer. Last Commit :Option{val=[20211116074129737__deltacommit__COMPLETED]}, Instants :[[20211116072710523__deltacommit__COMPLETED], [20211116072748906__deltacommit__COMPLETED], [20211116072910768__deltacommit__COMPLETED], [20211116074114874__deltacommit__COMPLETED], [20211116074123384__deltacommit__COMPLETED], [20211116074129737__deltacommit__COMPLETED]], CommitMetadata={{code}


> Deltastreamer fails to read checkpoint from previous commit metadata by spark writer intermittantly
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2772
>                 URL: https://issues.apache.org/jira/browse/HUDI-2772
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>
> Even after setting the right config to copy over deltastreamer checkpoint, deltastreamer fails to read the checkpoint from previous commit metadata intermittantly.  
> Setup:
> Deltastreamer in continuous mode. 
> And triggered a concurrent write from spark-datasource. 
>  
> I inspected the last commit.completed instant(that was reported by deltastreamer) made by spark writer and it looks ok to me. 
> {code:java}
> grep "checkpoint" /tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074129737.deltacommit
>     "deltastreamer.checkpoint.key" : "1637066483000" {code}
> But after the below exception, if I restart deltastreamer, it just runs fine. Very strange? I was able to reprod this 2 times out of 5.  
> here is the checkpoint from last delta commit by deltastreamer (which matches the entry found by delta commit by spark writer above)
> {code:java}
> grep "checkpoint" /tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074123384.deltacommit
>     "deltastreamer.checkpoint.key" : "1637066483000" {code}
>  
> I also check detlastreamer code and we do look at only completed instants and the completed commit metadata. So, not sure why is this happening. 
> stacktrace: 
> {code:java}
> 21/11/16 07:41:31 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception
> org.apache.hudi.utilities.exception.HoodieDeltaStreamerException: Unable to find previous checkpoint. Please double check if this table was indeed built via delta streamer. Last Commit :Option{val=[20211116074129737__deltacommit__COMPLETED]}, Instants :[[20211116072710523__deltacommit__COMPLETED], [20211116072748906__deltacommit__COMPLETED], [20211116072910768__deltacommit__COMPLETED], [20211116074114874__deltacommit__COMPLETED], [20211116074123384__deltacommit__COMPLETED], [20211116074129737__deltacommit__COMPLETED]], CommitMetadata={
>   "partitionToWriteStats" : { },
>   "compacted" : false,
>   "extraMetadata" : { },
>   "operationType" : "UNKNOWN",
>   "fileIdAndRelativePaths" : { },
>   "totalRecordsDeleted" : 0,
>   "totalLogRecordsCompacted" : 0,
>   "totalLogFilesCompacted" : 0,
>   "totalCompactedRecordsUpdated" : 0,
>   "totalLogFilesSize" : 0,
>   "totalScanTime" : 0,
>   "totalCreateTime" : 0,
>   "totalUpsertTime" : 0,
>   "minAndMaxEventTime" : {
>     "Optional.empty" : {
>       "val" : null,
>       "present" : false
>     }
>   },
>   "writePartitionPaths" : [ ]
> }
> 	at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:346)
> 	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:281)
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:634)
> 	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> 21/11/16 07:41:31 WARN HoodieDeltaStreamer: Gracefully shutting down compactor
> 21/11/16 07:41:39 WARN SparkRDDWriteClient: Slept for 20 secs, proceeding 
> 21/11/16 07:41:40 ERROR HoodieAsyncService: Monitor noticed one or more threads failed. Requesting graceful shutdown of other threads
> java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Unable to find previous checkpoint. Please double check if this table was indeed built via delta streamer. Last Commit :Option{val=[20211116074129737__deltacommit__COMPLETED]}, Instants :[[20211116072710523__deltacommit__COMPLETED], [20211116072748906__deltacommit__COMPLETED], [20211116072910768__deltacommit__COMPLETED], [20211116074114874__deltacommit__COMPLETED], [20211116074123384__deltacommit__COMPLETED], [20211116074129737__deltacommit__COMPLETED]], CommitMetadata={{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)