Posted to reviews@spark.apache.org by "sahnib (via GitHub)" <gi...@apache.org> on 2024/01/23 16:20:31 UTC

Re: [PR] [SPARK-46796][SS] Ensure the correct remote files (mentioned in metadata.zip) are used on RocksDB version load [spark]

sahnib commented on code in PR #44837:
URL: https://github.com/apache/spark/pull/44837#discussion_r1463548230


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:
##########
@@ -162,6 +162,8 @@ class RocksDB(
         val metadata = fileManager.loadCheckpointFromDfs(latestSnapshotVersion, workingDir)
         loadedVersion = latestSnapshotVersion
 
+        // reset last snapshot version
+        lastSnapshotVersion = 0L

Review Comment:
   We need to reset `lastSnapshotVersion` on load to discard any local snapshots taken at versions higher than the loaded version. This is necessary because the SST files downloaded during the load might not be compatible with those existing snapshots. Consider the scenario below:
   
   1. Executor 1 commits v1 and v2. RocksDB snapshots are taken, but only changelog files are committed. 
   2. Executor 1 performs maintenance and uploads the checkpoint snapshot for v2 to DFS.
   3. Executor 1 commits v3 and v4. RocksDB snapshots are taken, but only changelog files are committed.
   4. Executor 2 starts from v0, commits v1 and v2, performs maintenance, and overwrites the checkpoint snapshot for v2.
   5. Executor 1 reloads v2. At this point, it downloads the files committed by executor 2 in (4). **However, notice that lastSnapshotVersion is still 4**.
   6. Executor 1 commits v3 and v4. Changelog files are committed, but no RocksDB snapshots are taken because **lastSnapshotVersion >= newVersion**.
   7. Executor 1 uploads snapshot v4 to DFS as part of maintenance. This uploads the snapshot taken in (3) together with metadata pointing to the current state from (6), and the two are not compatible.
   
   The test case `time travel 2 (with changelog checkpointing)` covers this scenario.
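
   To make the sequence easier to follow, here is a rough toy model of the bookkeeping in steps (1) to (7). It is only a sketch and not the code in `RocksDB.scala`: apart from `lastSnapshotVersion`, `loadedVersion` and `newVersion`, every name (`ToyStateStore`, `fileGeneration`, `pendingSnapshot`, `resetOnLoad`, `runScenario`) is hypothetical, and "generation" is just a label for which executor's files the local state is built on.

   ```scala
   // Toy model of the snapshot bookkeeping described in the scenario above.
   // NOT the real RocksDB.scala; all names except lastSnapshotVersion,
   // loadedVersion and newVersion are hypothetical.
   final class ToyStateStore(resetOnLoad: Boolean) {
     var lastSnapshotVersion: Long = 0L
     var loadedVersion: Long = 0L
     // Label for whose remote files the local state is currently built on.
     var fileGeneration: String = "executor-1"
     // Local snapshot waiting for maintenance: (version, generation it was taken from).
     var pendingSnapshot: Option[(Long, String)] = None

     def commit(): Unit = {
       val newVersion = loadedVersion + 1
       // A snapshot is skipped when lastSnapshotVersion >= newVersion.
       if (lastSnapshotVersion < newVersion) {
         pendingSnapshot = Some((newVersion, fileGeneration))
         lastSnapshotVersion = newVersion
       }
       loadedVersion = newVersion
     }

     def load(version: Long, generationOnDfs: String): Unit = {
       fileGeneration = generationOnDfs
       loadedVersion = version
       // The change under review: forget snapshots taken before the load.
       if (resetOnLoad) lastSnapshotVersion = 0L
     }

     // True if the snapshot that maintenance would upload matches the current state.
     def uploadIsConsistent: Boolean = pendingSnapshot.forall {
       case (version, generation) =>
         version == loadedVersion && generation == fileGeneration
     }
   }

   object SnapshotResetSketch {
     def runScenario(resetOnLoad: Boolean): Boolean = {
       val executor1 = new ToyStateStore(resetOnLoad)
       executor1.commit(); executor1.commit() // (1) v1, v2; (2) upload of v2 not modeled
       executor1.commit(); executor1.commit() // (3) v3, v4
       executor1.load(2L, "executor-2")       // (4) + (5) reload v2 written by executor 2
       executor1.commit(); executor1.commit() // (6) v3, v4 again
       executor1.uploadIsConsistent           // (7) maintenance upload
     }

     def main(args: Array[String]): Unit = {
       println(s"without reset: consistent=${runScenario(resetOnLoad = false)}") // false
       println(s"with reset: consistent=${runScenario(resetOnLoad = true)}")     // true
     }
   }
   ```

   In this model, without the reset the pending snapshot from (3) survives the reload and is paired with state built on executor 2's files. With the reset, the commits in (6) take fresh snapshots, so the v4 snapshot and the current state agree.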



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

