Posted to reviews@spark.apache.org by "anishshri-db (via GitHub)" <gi...@apache.org> on 2024/01/17 03:39:42 UTC

Re: [PR] [SPARK-46705][SS] Make RocksDB State Store Compaction Less Likely to fall behind [spark]

anishshri-db commented on code in PR #44712:
URL: https://github.com/apache/spark/pull/44712#discussion_r1454494100


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:
##########
@@ -105,6 +105,17 @@ class RocksDB(
   }
 
   columnFamilyOptions.setCompressionType(getCompressionType(conf.compression))
+  // We can easily accumulate many small L0 files if the changelog is not enabled, or if the
+  // full snapshot interval is not large enough. Triggering compactions for those small files
+  // is expensive, so we make compaction harder to trigger and allow more L0 files without
+  // write stalling. Raising the L0->L1 compaction threshold reduces the overhead of L0->L1
+  // compaction. Given how small the L0 files are, even a value of 16 is low for most
+  // workloads in terms of L0->L1 write amplification. However, if the value is too large,
+  // we risk that in some workloads with very large batch sizes, some data might take a very
+  // long time to be compacted.
+  columnFamilyOptions.setLevel0FileNumCompactionTrigger(16)
+  columnFamilyOptions.setLevel0SlowdownWritesTrigger(200)

Review Comment:
   Do we need to make any of these configurable?
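If the thresholds were made configurable, one minimal sketch would be to read them from the state store's string-keyed conf map with the PR's hardcoded values as defaults. The config key names and the `CompactionConf` shape below are hypothetical illustrations, not Spark's actual `RocksDBConf` API:

```scala
// Hypothetical sketch: surfacing the two compaction thresholds as configs.
// The key names below are assumptions for illustration only.
case class CompactionConf(
    level0FileNumCompactionTrigger: Int,
    level0SlowdownWritesTrigger: Int)

object CompactionConf {
  private val TriggerKey =
    "spark.sql.streaming.stateStore.rocksdb.level0FileNumCompactionTrigger"
  private val SlowdownKey =
    "spark.sql.streaming.stateStore.rocksdb.level0SlowdownWritesTrigger"

  // Fall back to the PR's hardcoded defaults (16 and 200) when a key is absent.
  def fromMap(conf: Map[String, String]): CompactionConf = CompactionConf(
    level0FileNumCompactionTrigger =
      conf.get(TriggerKey).map(_.toInt).getOrElse(16),
    level0SlowdownWritesTrigger =
      conf.get(SlowdownKey).map(_.toInt).getOrElse(200))
}
```

The resulting values would then be passed to `columnFamilyOptions.setLevel0FileNumCompactionTrigger` and `columnFamilyOptions.setLevel0SlowdownWritesTrigger` instead of the literals.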



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org