Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/18 07:42:56 UTC

[GitHub] [spark] dongjoon-hyun opened a new pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

dongjoon-hyun opened a new pull request #34942:
URL: https://github.com/apache/spark/pull/34942


   ### What changes were proposed in this pull request?
   
   This PR aims to support `RocksDB` backend in Spark History Server via a new configuration,
   `spark.history.store.hybridStore.diskBackend`.
   
   ### Why are the changes needed?
   
   Currently, Spark History Server's `HybridStore` uses `LevelDB`, which does not work on the native Java 17 VM for Apple Silicon.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, this introduces a new configuration.
   
   ### How was this patch tested?
   
   Pass the CIs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #34942:
URL: https://github.com/apache/spark/pull/34942#discussion_r774234750



##########
File path: core/src/main/scala/org/apache/spark/internal/config/History.scala
##########
@@ -211,4 +211,11 @@ private[spark] object History {
     .version("3.1.0")
     .bytesConf(ByteUnit.BYTE)
     .createWithDefaultString("2g")
+
+  val HYBRID_STORE_DISK_BACKEND = ConfigBuilder("spark.history.store.hybridStore.diskBackend")
+    .doc("Specifies a disk-based store used in hybrid store; 'leveldb' or 'rocksdb'.")
+    .version("3.3.0")
+    .stringConf
+    .checkValues(Set("leveldb", "rocksdb"))

Review comment:
       I forgot to reply to this part.
   > How do we provide a smooth migration from LevelDB to RocksDB? End users have already loaded their old applications via LevelDB. This applies to the LevelDB KVStore and the current Hybrid KVStore backed by the LevelDB KVStore.
   
   It depends on the definition of `smooth` migration. 1) Currently, Spark drops and rebuilds the local DB when corruption happens. We can treat the migration the same way. 2) We can add some copying logic from LevelDB to RocksDB, but not vice versa.
   
   BTW, the `RocksDB` backend needs to catch up on performance before we discuss the migration. In short, it's too early to consider the migration. Let me re-initiate the discussion when it's ready.






[GitHub] [spark] HeartSaVioR commented on a change in pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #34942:
URL: https://github.com/apache/spark/pull/34942#discussion_r774226247



##########
File path: core/src/main/scala/org/apache/spark/internal/config/History.scala
##########
@@ -211,4 +211,11 @@ private[spark] object History {
     .version("3.1.0")
     .bytesConf(ByteUnit.BYTE)
     .createWithDefaultString("2g")
+
+  val HYBRID_STORE_DISK_BACKEND = ConfigBuilder("spark.history.store.hybridStore.diskBackend")
+    .doc("Specifies a disk-based store used in hybrid store; 'leveldb' or 'rocksdb'.")
+    .version("3.3.0")
+    .stringConf
+    .checkValues(Set("leveldb", "rocksdb"))

Review comment:
       And please propose "deprecation" before "removal", so that no one is surprised by the proposal.






[GitHub] [spark] dongjoon-hyun commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997165075


   cc @viirya , @mridulm , @LuciferYang 




[GitHub] [spark] viirya commented on a change in pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #34942:
URL: https://github.com/apache/spark/pull/34942#discussion_r771798813



##########
File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##########
@@ -130,10 +131,16 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
   private val fastInProgressParsing = conf.get(FAST_IN_PROGRESS_PARSING)
 
   private val hybridStoreEnabled = conf.get(History.HYBRID_STORE_ENABLED)
+  private val hybridStoreDiskBackend = conf.get(History.HYBRID_STORE_DISK_BACKEND)
 
   // Visible for testing.
   private[history] val listing: KVStore = storePath.map { path =>
-    val dbPath = Files.createDirectories(new File(path, "listing.ldb").toPath()).toFile()
+    val dir = hybridStoreDiskBackend match {
+      case "leveldb" => "listing.ldb"
+      case "rocksdb" => "listing.rdb"
+      case db => throw new IllegalStateException(s"$db is not supported.")

Review comment:
       nit: `IllegalArgumentException`? KVUtils uses `IllegalArgumentException` for wrong disk backend config.
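
For illustration, the suggested change could look like the following minimal sketch. It lifts the `match` from the diff above into a standalone helper; the `ListingDir` object and the `listingDirFor` name are assumptions made for this example and do not appear in the PR.

```scala
// Sketch only: `ListingDir` and `listingDirFor` are hypothetical names, not part of Spark.
object ListingDir {
  // Maps the configured disk backend to the listing directory name used in the diff.
  def listingDirFor(backend: String): String = backend match {
    case "leveldb" => "listing.ldb"
    case "rocksdb" => "listing.rdb"
    // An unsupported config value is a bad argument rather than a bad object state.
    case db => throw new IllegalArgumentException(s"$db is not supported.")
  }
}
```

With this shape, `ListingDir.listingDirFor("rocksdb")` yields `listing.rdb`, and any other string fails with the same exception type that the comment says KVUtils uses.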






[GitHub] [spark] SparkQA commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997234936


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50837/
   




[GitHub] [spark] HeartSaVioR commented on a change in pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #34942:
URL: https://github.com/apache/spark/pull/34942#discussion_r774225422



##########
File path: core/src/main/scala/org/apache/spark/internal/config/History.scala
##########
@@ -211,4 +211,11 @@ private[spark] object History {
     .version("3.1.0")
     .bytesConf(ByteUnit.BYTE)
     .createWithDefaultString("2g")
+
+  val HYBRID_STORE_DISK_BACKEND = ConfigBuilder("spark.history.store.hybridStore.diskBackend")
+    .doc("Specifies a disk-based store used in hybrid store; 'leveldb' or 'rocksdb'.")
+    .version("3.3.0")
+    .stringConf
+    .checkValues(Set("leveldb", "rocksdb"))

Review comment:
       >  I'm going to propose the removal of LevelDB dependency completely. Do you have any concern about the removal of LevelDB JNI, @HeartSaVioR ?
   
   Two points I can think of:
   
   1. How should I interpret the benchmark in the PR description of #34913? Since it takes a histogram and uses a timer, I'd guess it is about latency, and then it looks like RocksDB is "slower" than LevelDB. It's good if I'm mistaken, but if I'm not, I'm pretty sure I don't support a plan that brings a performance regression. "It doesn't work with Apple Silicon" is not a sufficient reason to drop it.
   
   2. How do we provide a smooth migration from LevelDB to RocksDB? End users have already loaded their old applications via LevelDB. This applies to the LevelDB KVStore and the current Hybrid KVStore backed by the LevelDB KVStore.






[GitHub] [spark] SparkQA commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997242280


   **[Test build #146363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146363/testReport)** for PR 34942 at commit [`eae923d`](https://github.com/apache/spark/commit/eae923d63b7f2fe9d0666d54126fbbaf702df932).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997184531


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146352/
   




[GitHub] [spark] mridulm commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997343837


   FYI @thejdeep, @shardulm94 
   You guys investigated RocksDB as a replacement for LevelDB in SHS.




[GitHub] [spark] dongjoon-hyun closed pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #34942:
URL: https://github.com/apache/spark/pull/34942


   




[GitHub] [spark] LuciferYang commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997543951


   LGTM




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #34942:
URL: https://github.com/apache/spark/pull/34942#discussion_r774233077



##########
File path: core/src/main/scala/org/apache/spark/internal/config/History.scala
##########
@@ -211,4 +211,11 @@ private[spark] object History {
     .version("3.1.0")
     .bytesConf(ByteUnit.BYTE)
     .createWithDefaultString("2g")
+
+  val HYBRID_STORE_DISK_BACKEND = ConfigBuilder("spark.history.store.hybridStore.diskBackend")
+    .doc("Specifies a disk-based store used in hybrid store; 'leveldb' or 'rocksdb'.")
+    .version("3.3.0")
+    .stringConf
+    .checkValues(Set("leveldb", "rocksdb"))

Review comment:
       Thank you for your feedback, @HeartSaVioR .
   1. Of course, yes for the deprecation!
   2. And, yes, `RocksDB` didn't outperform `LevelDB` in my benchmark, although I tried several storage optimizations before making this PR. So, I left it out of this PR and haven't proposed the deprecation yet. The performance tuning is also on my roadmap for the Apache Spark 3.3 timeframe.






[GitHub] [spark] SparkQA commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997225167


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50837/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #34942:
URL: https://github.com/apache/spark/pull/34942#discussion_r774026184



##########
File path: core/src/main/scala/org/apache/spark/internal/config/History.scala
##########
@@ -211,4 +211,11 @@ private[spark] object History {
     .version("3.1.0")
     .bytesConf(ByteUnit.BYTE)
     .createWithDefaultString("2g")
+
+  val HYBRID_STORE_DISK_BACKEND = ConfigBuilder("spark.history.store.hybridStore.diskBackend")
+    .doc("Specifies a disk-based store used in hybrid store; 'leveldb' or 'rocksdb'.")
+    .version("3.3.0")
+    .stringConf
+    .checkValues(Set("leveldb", "rocksdb"))

Review comment:
       Here is the follow-up PR.
   - https://github.com/apache/spark/pull/34986






[GitHub] [spark] HeartSaVioR commented on a change in pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #34942:
URL: https://github.com/apache/spark/pull/34942#discussion_r773582447



##########
File path: core/src/main/scala/org/apache/spark/internal/config/History.scala
##########
@@ -211,4 +211,11 @@ private[spark] object History {
     .version("3.1.0")
     .bytesConf(ByteUnit.BYTE)
     .createWithDefaultString("2g")
+
+  val HYBRID_STORE_DISK_BACKEND = ConfigBuilder("spark.history.store.hybridStore.diskBackend")
+    .doc("Specifies a disk-based store used in hybrid store; 'leveldb' or 'rocksdb'.")
+    .version("3.3.0")
+    .stringConf
+    .checkValues(Set("leveldb", "rocksdb"))

Review comment:
       Sorry for the late post-review.
   
   It looks like we tend to normalize the value to all lowercase or all uppercase before applying checks and using it. Preferably, we would also define an enum for the config values.
   
   https://github.com/apache/spark/blob/7d88f1c5c7f38c0f1a2bd5e3116c668d9cbd98b1/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala#L153-L169
   
   Would you mind addressing this in a follow-up PR? Thanks in advance!
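
As a rough sketch of that suggestion (not the actual follow-up change), the value could be normalized to one case and then checked against an enumeration. `HybridStoreDiskBackend` and `parse` are hypothetical names, and plain Scala is used here instead of Spark's `ConfigBuilder` so that the example stands alone.

```scala
import java.util.Locale

// Sketch only: the object and method names are assumptions for illustration, not Spark's.
object HybridStoreDiskBackend extends Enumeration {
  val LEVELDB, ROCKSDB = Value

  // Normalize case first, then validate against the enumerated names.
  def parse(raw: String): Value = {
    val normalized = raw.trim.toUpperCase(Locale.ROOT)
    values.find(_.toString == normalized).getOrElse {
      throw new IllegalArgumentException(
        s"$raw is not supported; expected one of ${values.mkString(", ")}.")
    }
  }
}
```

Under this sketch, `HybridStoreDiskBackend.parse("RocksDB")` and `parse("rocksdb")` both resolve to `ROCKSDB`, so callers can match on enum values instead of raw strings.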






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #34942:
URL: https://github.com/apache/spark/pull/34942#discussion_r771834228



##########
File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##########
@@ -130,10 +131,16 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
   private val fastInProgressParsing = conf.get(FAST_IN_PROGRESS_PARSING)
 
   private val hybridStoreEnabled = conf.get(History.HYBRID_STORE_ENABLED)
+  private val hybridStoreDiskBackend = conf.get(History.HYBRID_STORE_DISK_BACKEND)
 
   // Visible for testing.
   private[history] val listing: KVStore = storePath.map { path =>
-    val dbPath = Files.createDirectories(new File(path, "listing.ldb").toPath()).toFile()
+    val dir = hybridStoreDiskBackend match {
+      case "leveldb" => "listing.ldb"
+      case "rocksdb" => "listing.rdb"
+      case db => throw new IllegalStateException(s"$db is not supported.")

Review comment:
       Sure!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997235488


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50837/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #34942:
URL: https://github.com/apache/spark/pull/34942#discussion_r774001367



##########
File path: core/src/main/scala/org/apache/spark/internal/config/History.scala
##########
@@ -211,4 +211,11 @@ private[spark] object History {
     .version("3.1.0")
     .bytesConf(ByteUnit.BYTE)
     .createWithDefaultString("2g")
+
+  val HYBRID_STORE_DISK_BACKEND = ConfigBuilder("spark.history.store.hybridStore.diskBackend")
+    .doc("Specifies a disk-based store used in hybrid store; 'leveldb' or 'rocksdb'.")
+    .version("3.3.0")
+    .stringConf
+    .checkValues(Set("leveldb", "rocksdb"))

Review comment:
       Sure, I'll do that. Thank you for the review.
   
   BTW, after one or two releases, I'm going to propose the removal of `LevelDB` dependency completely. Do you have any concern about the removal of LevelDB JNI, @HeartSaVioR ?








[GitHub] [spark] AmplabJenkins commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997176163


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50826/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997219115


   **[Test build #146363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146363/testReport)** for PR 34942 at commit [`eae923d`](https://github.com/apache/spark/commit/eae923d63b7f2fe9d0666d54126fbbaf702df932).




[GitHub] [spark] AmplabJenkins commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997235488


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50837/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997162421


   **[Test build #146352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146352/testReport)** for PR 34942 at commit [`d6d0428`](https://github.com/apache/spark/commit/d6d0428f4b310c9fb4b8ed016e69b1c51078c80a).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997261623


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146363/
   




[GitHub] [spark] SparkQA commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997169495


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50826/
   




[GitHub] [spark] viirya commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997233589


   lgtm




[GitHub] [spark] SparkQA commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997162421


   **[Test build #146352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146352/testReport)** for PR 34942 at commit [`d6d0428`](https://github.com/apache/spark/commit/d6d0428f4b310c9fb4b8ed016e69b1c51078c80a).




[GitHub] [spark] AmplabJenkins commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997261623


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146363/
   




[GitHub] [spark] SparkQA commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997183928


   **[Test build #146352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146352/testReport)** for PR 34942 at commit [`d6d0428`](https://github.com/apache/spark/commit/d6d0428f4b310c9fb4b8ed016e69b1c51078c80a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `  trait SwitchToDiskStoreListener `




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997176163


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50826/
   




[GitHub] [spark] SparkQA commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997175666


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50826/
   




[GitHub] [spark] dongjoon-hyun commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997230927


   The last commit is irrelevant to unit tests. Thank you, @viirya 
   Merged to master for Apache Spark 3.3.




[GitHub] [spark] AmplabJenkins commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997184531


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146352/
   




[GitHub] [spark] SparkQA commented on pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34942:
URL: https://github.com/apache/spark/pull/34942#issuecomment-997219115


   **[Test build #146363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146363/testReport)** for PR 34942 at commit [`eae923d`](https://github.com/apache/spark/commit/eae923d63b7f2fe9d0666d54126fbbaf702df932).




[GitHub] [spark] HeartSaVioR commented on a change in pull request #34942: [SPARK-37680][CORE] Support RocksDB backend in Spark History Server

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #34942:
URL: https://github.com/apache/spark/pull/34942#discussion_r774249559



##########
File path: core/src/main/scala/org/apache/spark/internal/config/History.scala
##########
@@ -211,4 +211,11 @@ private[spark] object History {
     .version("3.1.0")
     .bytesConf(ByteUnit.BYTE)
     .createWithDefaultString("2g")
+
+  val HYBRID_STORE_DISK_BACKEND = ConfigBuilder("spark.history.store.hybridStore.diskBackend")
+    .doc("Specifies a disk-based store used in hybrid store; 'leveldb' or 'rocksdb'.")
+    .version("3.3.0")
+    .stringConf
+    .checkValues(Set("leveldb", "rocksdb"))

Review comment:
       Sounds great. Let's defer that discussion until we have enough justification to replace LevelDB with RocksDB.
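
       For readers following the thread, here is a minimal sketch of how the option from the diff above might be set in `spark-defaults.conf` on the History Server host. The two directory paths are placeholders and are assumptions, not values from this PR. The neighbouring `spark.history.*` keys are existing History Server settings, included only to show where the new key would sit. The default for `diskBackend` is not visible in the quoted hunk, so it is set explicitly here.

       ```
       # Placeholder path: where the History Server reads event logs from (assumption)
       spark.history.fs.logDirectory                  hdfs:///path/to/spark-events
       # Placeholder path: local directory for the disk-based store (assumption)
       spark.history.store.path                       /path/to/history-store
       # Enable the hybrid (in-memory first, then disk) store
       spark.history.store.hybridStore.enabled        true
       # New in this PR; accepted values per checkValues are 'leveldb' or 'rocksdb'
       spark.history.store.hybridStore.diskBackend    rocksdb
       ```

       Per the `checkValues` call in the hunk, a value other than `leveldb` or `rocksdb` should be rejected when the configuration is read.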



