You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/04 20:18:54 UTC

[GitHub] [spark] mridulm commented on a diff in pull request #36200: [SPARK-38909][CORE][YARN] Encapsulate `LevelDB` used by `ExternalShuffleBlockResolver`,`YarnShuffleService` and `RemoteBlockPushResolver` as `DB`

mridulm commented on code in PR #36200:
URL: https://github.com/apache/spark/pull/36200#discussion_r938207311


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -930,15 +922,15 @@ List<byte[]> reloadActiveAppAttemptsPathInfo(DB db) throws IOException {
                   try {
                     // Add the former outdated DB key to deletion list
                     dbKeysToBeRemoved.add(getDbAppAttemptPathsKey(existingAppAttemptId));
-                  } catch (IOException e) {
-                    logger.error("Failed to get the DB key for {}", existingAppAttemptId, e);
+                  } catch (IOException ioe) {
+                    logger.error("Failed to get the DB key for {}", existingAppAttemptId, ioe);
                   }

Review Comment:
   nit: Let us minimize change and avoid renames unless they are badly named earlier



##########
common/network-common/src/main/java/org/apache/spark/network/util/DBProvider.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.network.util;
+
+import java.io.File;
+import java.io.IOException;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.annotations.VisibleForTesting;
+
+import org.apache.spark.network.shuffledb.LevelDB;
+import org.apache.spark.network.shuffledb.DB;
+import org.apache.spark.network.shuffledb.StoreVersion;
+
+public class DBProvider {
+    public static DB initDB(File dbFile, StoreVersion version, ObjectMapper mapper)
+        throws IOException {
+        if (dbFile != null) {
+            if (dbFile.getName().endsWith(".ldb")) {
+                org.iq80.leveldb.DB levelDB = LevelDBProvider.initLevelDB(dbFile, version, mapper);
+                return levelDB != null ? new LevelDB(levelDB) : null;
+            } else {
+                return null;
+            }
+        }

Review Comment:
   Instead of relying on extension, we should make it explicit - for example what is in SPARK-37680.
   Any thoughts on how to refactor this out @dongjoon-hyun ?



##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -417,9 +417,7 @@ void removeAppAttemptPathInfoFromDB(String appId, int attemptId) {
     if (db != null) {
       try {
         byte[] key = getDbAppAttemptPathsKey(appAttemptId);
-        if (db.get(key) != null) {

Review Comment:
   Did you remove this change ? I thought it made sense to do this.



##########
common/network-common/src/main/java/org/apache/spark/network/shuffledb/DB.java:
##########
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffledb;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.Map;
+
+/**
+ * The local KV storage used to persist the shuffle state,
+ * the implementations may include leveldb, rocksdb, etc.
+ */
+public interface DB extends Closeable {
+    /**
+     * Set the DB entry for "key" to "value".
+     */
+    void put(byte[] key, byte[] value) throws RuntimeException;
+
+    /**
+     * Get which returns a new byte array storing the value associated
+     * with the specified input key if any.
+     */
+    byte[] get(byte[] key) throws RuntimeException;
+
+    /**
+     * Delete the DB entry (if any) for "key".
+     */
+    void delete(byte[] key) throws RuntimeException;
+
+    /**
+     * Read KV prefixed with `prefix` into a Map from DB.
+     */
+    Map<String, byte[]> readKVToMap(String prefix) throws IOException;

Review Comment:
   Discussion:
   
   * The number of keys matching a prefix can be large.
   The current code does the load using an iterator, and so can stream them.
   With the change to `Map`, we double the memory needed (`for creation of the Map + the subsequent datastructure built using this Map` vs `max-memory-per-row + the subsequent datastructure built using this Iterator`).
   
   * Ordering expectation
   There is an ordering expectation in level db, I dont think the current code leverages it, but future evolution could ?
   
   Given both, do we want to make this an `Iterator` instead of `Map` ?
   Any reasons why this might not be preferable ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org