You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/24 13:23:47 UTC

[GitHub] [spark] LuciferYang opened a new pull request, #37648: [SPARK-38909][CORE][YARN][FOLLOWUP] Name later

LuciferYang opened a new pull request, #37648:
URL: https://github.com/apache/spark/pull/37648

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r959280415


##########
common/network-common/src/main/java/org/apache/spark/network/util/LevelDBProvider.java:
##########
@@ -85,14 +84,6 @@ public static DB initLevelDB(File dbFile, StoreVersion version, ObjectMapper map
     return tmpDb;
   }
 
-  @VisibleForTesting
-  static DB initLevelDB(File file) throws IOException {
-    Options options = new Options();
-    options.createIfMissing(true);
-    JniDBFactory factory = new JniDBFactory();
-    return factory.open(file, options);
-  }
-

Review Comment:
   This is mainly used for tests, any particular reason to remove it ?
   (There is a lack of error handling in this method, and would fail in case of corruption or other issues - which I believe `YarnShuffleIntegrationSuite` is leveraging)



##########
common/network-common/pom.xml:
##########
@@ -151,7 +151,6 @@
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-tags_${scala.binary.version}</artifactId>
-      <scope>test</scope>

Review Comment:
   `@Private` rather.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r963302692


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   If the original case does not test the `execStateCopy exists but open failed` scenario, it seems that there is no change.
   
   For the `initDB(dbBackend, file) ` method:
   
   https://github.com/apache/spark/blob/19b1780c4abed984b2223e6bbe7998b32a2ebad8/common/network-common/src/main/java/org/apache/spark/network/util/LevelDBProvider.java#L88-L94
   
   - if `execStateCopy` exists, it will re-open it and load data from `execStateCopy`
   - else if `execStateCopy` not exists, it will  open a new db and return empty data due to `options.createIfMissing(true)`, then assert failed
   
   For the `initDB(dbBackend, dbFile, version, mapper) ` method:
   
   -  if `execStateCopy` exists, it will re-open it and load data from `execStateCopy`
   -  else if `execStateCopy` not exists, it will open a new db and return empty data, then assert failed
   
   For `execStateCopy exists but open failed` scenario, use `initDB(dbBackend, file)` will throw an unhandled exception, 
   use `initDB(dbBackend, dbFile, version, mapper) ` will assert failed. Is this unacceptable?
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1232555858

   Do you have time to further review this one? Thanks @mridulm 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r963271711


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   This is a behavior change for test, right ([this ?](https://github.com/apache/spark/pull/37648#discussion_r959280415))
   Essentially, `YarnShuffleIntegrationSuite` would expect it to fail with an exception and not do error handling (which `initDB` does).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1238973755

   [Run / Base image build](https://github.com/LuciferYang/spark/runs/8220878852?check_suite_focus=true#logs) failed as follows:
   
   ```
   2022-09-07T04:35:50.8451091Z #32 exporting to image
   2022-09-07T04:35:50.8451363Z #32 exporting layers done
   2022-09-07T04:35:50.8452103Z #32 exporting manifest sha256:1738df8ea1b7b1d839a5fdaa47f4a3d414d44c331688eddf60df5e9529e1fad5 done
   2022-09-07T04:35:50.8452943Z #32 exporting config sha256:9e4187b41de14caaad1f643e02cb8ac50ec0dbb65d3dea2dfa4397b86addfb1d
   2022-09-07T04:35:50.9955812Z #32 exporting config sha256:9e4187b41de14caaad1f643e02cb8ac50ec0dbb65d3dea2dfa4397b86addfb1d done
   2022-09-07T04:35:50.9956540Z #32 pushing layers
   2022-09-07T04:35:51.2964868Z #32 ...
   2022-09-07T04:35:51.2965252Z 
   2022-09-07T04:35:51.2966013Z #33 [auth] luciferyang/apache-spark-ci-image:pull,push token for ghcr.io
   2022-09-07T04:35:51.2966363Z #33 DONE 0.0s
   2022-09-07T04:35:51.4476109Z 
   2022-09-07T04:35:51.4479341Z #32 exporting to image
   2022-09-07T04:35:51.8931704Z #32 ...
   2022-09-07T04:35:51.8932307Z 
   2022-09-07T04:35:51.8933413Z #34 [auth] apache/spark/apache-spark-github-action-image-cache:pull luciferyang/apache-spark-ci-image:pull,push token for ghcr.io
   2022-09-07T04:35:51.8934022Z #34 DONE 0.0s
   2022-09-07T04:35:52.0437234Z 
   2022-09-07T04:35:52.0437900Z #32 exporting to image
   2022-09-07T04:35:52.4041282Z #32 pushing layers 1.5s done
   2022-09-07T04:35:52.4041635Z #32 ERROR: unexpected status: 403 Forbidden
   2022-09-07T04:35:52.4042165Z ------
   2022-09-07T04:35:52.4042409Z  > exporting to image:
   2022-09-07T04:35:52.4042677Z ------
   2022-09-07T04:35:52.4081024Z ERROR: failed to solve: unexpected status: 403 Forbidden
   2022-09-07T04:35:52.4215414Z ##[error]buildx failed with: ERROR: failed to solve: unexpected status: 403 Forbidden
   ```
   
   friendly ping @Yikun  for help, I should not have changed GitHub's configuration
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r963595703


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   @mridulm any suggestions?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
tgravescs commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r964891901


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   sorry didn't get a chance to look before your filed the other issue, happy to look if you make the change under the other issue



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r963302692


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   If the original case does not test the `execStateCopy exists but open failed` scenario, it seems that there is no change.
   
   For the `initDB(dbBackend, file) ` method:
   
   https://github.com/apache/spark/blob/19b1780c4abed984b2223e6bbe7998b32a2ebad8/common/network-common/src/main/java/org/apache/spark/network/util/LevelDBProvider.java#L88-L94
   
   - if `execStateCopy` exists, it will re-open it and load data from `execStateCopy`
   -if `execStateCopy` not exists, it will  open a new db and return empty data due to `options.createIfMissing(true)`, then assert failed
   
   For the `initDB(dbBackend, dbFile, version, mapper) ` method:
   
   -  if `execStateCopy` exists, it will re-open it and load data from `execStateCopy`
   -  else if `execStateCopy` not exists, it will open a new db and return empty data, then assert failed
   
   For `execStateCopy exists but open failed` scenario, use `initDB(dbBackend, file)` will throw an unhandled exception, 
   use `initDB(dbBackend, dbFile, version, mapper) ` will assert failed. Is this unacceptable?
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #37648: [WIP][SPARK-38909][CORE][YARN][FOLLOWUP] Name later

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1225725248

   cc @Ngone51  this pr used to fix the comments you left in https://github.com/apache/spark/pull/36200,
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [WIP][SPARK-38909][BUILD][CORE][YARN][FOLLOWUP]

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r955690388


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##########
@@ -125,15 +125,17 @@ public ShuffleIndexInformation load(String filePath) throws IOException {
       .weigher((Weigher<String, ShuffleIndexInformation>)
         (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
       .build(indexCacheLoader);
-    DBBackend dbBackend = null;
+    DBBackend dbBackend;
     if (registeredExecutorFile != null) {

Review Comment:
   EDIT ~~And the log will look strange without this condition~~
   
   



##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##########
@@ -125,15 +125,17 @@ public ShuffleIndexInformation load(String filePath) throws IOException {
       .weigher((Weigher<String, ShuffleIndexInformation>)
         (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
       .build(indexCacheLoader);
-    DBBackend dbBackend = null;
+    DBBackend dbBackend;
     if (registeredExecutorFile != null) {

Review Comment:
   Wait CI



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [WIP][SPARK-38909][CORE][YARN][FOLLOWUP]

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r954701832


##########
common/network-common/pom.xml:
##########
@@ -151,7 +151,6 @@
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-tags_${scala.binary.version}</artifactId>
-      <scope>test</scope>

Review Comment:
   remove this due to @DeveloperApi



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
mridulm commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1240121977

   Merged to master.
   Thanks for working on this @LuciferYang !
   Thanks for flagging this issue and the review @Ngone51 :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r964286098


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   OK, if this change is controversial or wrong, I will revert the code and keep the two `initLevelDB ` methods as before this pr.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r964905409


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   Thanks @tgravescs , I will give another pr later ~
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1238988334

   > pls rebase this PR, after #37815 merged.
   
   Thanks ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r963302692


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   If the original case does not test the `execStateCopy exists but open failed` scenario, it seems that there is no change.
   
   For the `initDB(dbBackend, file) ` method:
   
   https://github.com/apache/spark/blob/19b1780c4abed984b2223e6bbe7998b32a2ebad8/common/network-common/src/main/java/org/apache/spark/network/util/LevelDBProvider.java#L88-L94
   
   - if `execStateCopy` exists, it will re-open it and load data from `execStateCopy`
   - else if `execStateCopy` not exists, it will  open a new db and return empty data due to `options.createIfMissing(true)`, then assert failed
   
   For the `initDB(dbBackend, dbFile, version, mapper) ` method:
   
   -  if `execStateCopy` exists, it will re-open it and load data from `execStateCopy`
   -  else if `execStateCopy` not exists, it will open a new db and return empty data, then assert failed
   
   For `execStateCopy exists but open failed` scenario, use `initDB(dbBackend, file)` will throw an unhandled exception,  use `initDB(dbBackend, dbFile, version, mapper) ` will assert failed. Is this unacceptable?
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #37648: [WIP][SPARK-38909][BUILD][CORE][YARN][FOLLOWUP]

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r956351701


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##########
@@ -125,18 +125,13 @@ public ShuffleIndexInformation load(String filePath) throws IOException {
       .weigher((Weigher<String, ShuffleIndexInformation>)
         (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
       .build(indexCacheLoader);
-    DBBackend dbBackend;
-    if (registeredExecutorFile != null) {
-      String dbBackendName =
-        conf.get(Constants.SHUFFLE_SERVICE_DB_BACKEND, DBBackend.LEVELDB.name());
-      dbBackend = DBBackend.byName(dbBackendName);
+    String dbBackendName =
+      conf.get(Constants.SHUFFLE_SERVICE_DB_BACKEND, DBBackend.LEVELDB.name());
+    DBBackend dbBackend = DBBackend.byName(dbBackendName);
+    db = DBProvider.initDB(dbBackend, this.registeredExecutorFile, CURRENT_VERSION, mapper);

Review Comment:
   Review note: Recovery need not be enabled for node managers - in which case `registeredExecutorFile` will be `null` (in addition to tests).
   
   `DBProvider.initDB` does handle null input though.
   
   So the main change in this file and `RemoteBlockPushResolver` is moving the log message to `if (db != null)`



##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##########
@@ -125,18 +125,13 @@ public ShuffleIndexInformation load(String filePath) throws IOException {
       .weigher((Weigher<String, ShuffleIndexInformation>)
         (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
       .build(indexCacheLoader);
-    DBBackend dbBackend;
-    if (registeredExecutorFile != null) {
-      String dbBackendName =
-        conf.get(Constants.SHUFFLE_SERVICE_DB_BACKEND, DBBackend.LEVELDB.name());
-      dbBackend = DBBackend.byName(dbBackendName);
+    String dbBackendName =
+      conf.get(Constants.SHUFFLE_SERVICE_DB_BACKEND, DBBackend.LEVELDB.name());
+    DBBackend dbBackend = DBBackend.byName(dbBackendName);
+    db = DBProvider.initDB(dbBackend, this.registeredExecutorFile, CURRENT_VERSION, mapper);

Review Comment:
   Review note: Recovery need not be enabled for node managers - in which case `registeredExecutorFile` will be `null` (in addition to tests).
   
   `DBProvider.initDB` does handle null input though.
   
   So the main change in this file and `RemoteBlockPushResolver` is moving the log message into `if (db != null)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #37648: [WIP][SPARK-38909][BUILD][CORE][YARN][FOLLOWUP]

Posted by GitBox <gi...@apache.org>.
mridulm commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1228821866

   Is this still WIP @LuciferYang ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [WIP][SPARK-38909][BUILD][CORE][YARN][FOLLOWUP]

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r955704207


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##########
@@ -125,15 +125,17 @@ public ShuffleIndexInformation load(String filePath) throws IOException {
       .weigher((Weigher<String, ShuffleIndexInformation>)
         (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
       .build(indexCacheLoader);
-    DBBackend dbBackend = null;
+    DBBackend dbBackend;
     if (registeredExecutorFile != null) {

Review Comment:
   [e3500cf](https://github.com/apache/spark/pull/37648/commits/e3500cf714f9789604c1b977a5c64b82bd9337bb) remove the condition



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r959312369


##########
common/network-common/src/main/java/org/apache/spark/network/util/LevelDBProvider.java:
##########
@@ -85,14 +84,6 @@ public static DB initLevelDB(File dbFile, StoreVersion version, ObjectMapper map
     return tmpDb;
   }
 
-  @VisibleForTesting
-  static DB initLevelDB(File file) throws IOException {
-    Options options = new Options();
-    options.createIfMissing(true);
-    JniDBFactory factory = new JniDBFactory();
-    return factory.open(file, options);
-  }
-

Review Comment:
   Yes, this method is only called by `ShuffleTestAccessor#reloadRegisteredExecutors`. Currently, it is only used for one test case to verify a special test scenario:
   
   https://github.com/apache/spark/blob/6c3c9575cf06fb0537b22909d784bc6bc1d61b85/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnShuffleIntegrationSuite.scala#L152-L170
   
   This method  was originally in `ShuffleTestAccessor` and it was moved to `LevelDBProvider` in the previous PR, but it is not a general method, so I want move it back to `ShuffleTestAccessor`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r964332080


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1240148842

   Thanks for your review @mridulm @tgravescs ~
   Thanks for helping fix GA @Yikun ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r961620059


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala:
##########
@@ -45,15 +45,14 @@ import org.apache.spark.SecurityManager
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.internal.config._
 import org.apache.spark.network.server.BlockPushNonFatalFailure
-import org.apache.spark.network.shuffle.{MergedShuffleFileManager, NoOpMergedShuffleFileManager, RemoteBlockPushResolver, ShuffleTestAccessor}
+import org.apache.spark.network.shuffle.{Constants, MergedShuffleFileManager, NoOpMergedShuffleFileManager, RemoteBlockPushResolver, ShuffleTestAccessor}
 import org.apache.spark.network.shuffle.RemoteBlockPushResolver._
 import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo
 import org.apache.spark.network.shuffledb.DBBackend
 import org.apache.spark.network.util.TransportConf
 import org.apache.spark.network.yarn.util.HadoopConfigProvider
 import org.apache.spark.tags.ExtendedLevelDBTest
 import org.apache.spark.util.Utils
-

Review Comment:
   revert this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1229100012

   Removed [WIP], and need @Ngone51 confirm whether there are new comments of https://github.com/apache/spark/pull/36200.
   
   And I have a point to discuss:
   
   There are two `initLevelDB` method in `LevelDBProvider`,  one of them is only for testing code and only used by `org.apache.spark.network.shuffle.ShuffleTestAccessor` now, so should we move the test only `initLevelDB` method  to `ShuffleTestAccessor`?  The implementation of RocksDB will also encounter this issue
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a diff in pull request #37648: [WIP][SPARK-38909][BUILD][CORE][YARN][FOLLOWUP]

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r955680841


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##########
@@ -125,15 +125,17 @@ public ShuffleIndexInformation load(String filePath) throws IOException {
       .weigher((Weigher<String, ShuffleIndexInformation>)
         (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
       .build(indexCacheLoader);
-    DBBackend dbBackend = null;
+    DBBackend dbBackend;
     if (registeredExecutorFile != null) {

Review Comment:
   Do we really need this condition? I feel like it's useless.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1236241254

   should we merge this one ?  @Ngone51 @mridulm 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
Yikun commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1238978278

   pls rebase this PR, after https://github.com/apache/spark/pull/37815 merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r963299931


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   https://github.com/apache/spark/blob/19b1780c4abed984b2223e6bbe7998b32a2ebad8/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnShuffleIntegrationSuite.scala#L155-L170
   
   Do you mean the behavior change is `execStateCopy exists but open failed`? The original method will throw an exception directly,  and the new method will return a new instance?  It seems that the case just to test whether an existing `execStateCopy` can be loaded again?  
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r959301662


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -220,4 +219,21 @@ object ShuffleTestAccessor {
       db: DB): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
     ExternalShuffleBlockResolver.reloadRegisteredExecutors(db)
   }
+
+  private def initDB(dbBackend: DBBackend, file: File): DB = {
+    if (file != null) {
+      // TODO: SPARK-38888, add rocksdb implementation.
+      dbBackend match {
+        case DBBackend.LEVELDB =>
+          val options = new org.iq80.leveldb.Options()
+          options.createIfMissing(true)
+          val factory = new org.fusesource.leveldbjni.JniDBFactory()
+          new LevelDB(factory.open(file, options))
+        case _ =>
+          throw new IllegalArgumentException("Unsupported DBBackend: " + dbBackend)
+      }
+    } else {
+      null
+    }

Review Comment:
   OK, let me revert this change first and then see if there is a better way
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r959341662


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -220,4 +219,21 @@ object ShuffleTestAccessor {
       db: DB): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
     ExternalShuffleBlockResolver.reloadRegisteredExecutors(db)
   }
+
+  private def initDB(dbBackend: DBBackend, file: File): DB = {
+    if (file != null) {
+      // TODO: SPARK-38888, add rocksdb implementation.
+      dbBackend match {
+        case DBBackend.LEVELDB =>
+          val options = new org.iq80.leveldb.Options()
+          options.createIfMissing(true)
+          val factory = new org.fusesource.leveldbjni.JniDBFactory()
+          new LevelDB(factory.open(file, options))
+        case _ =>
+          throw new IllegalArgumentException("Unsupported DBBackend: " + dbBackend)
+      }
+    } else {
+      null
+    }

Review Comment:
   do some check, I think we can directly use another `initDB` and delete this one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r961638092


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala:
##########
@@ -45,15 +45,14 @@ import org.apache.spark.SecurityManager
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.internal.config._
 import org.apache.spark.network.server.BlockPushNonFatalFailure
-import org.apache.spark.network.shuffle.{MergedShuffleFileManager, NoOpMergedShuffleFileManager, RemoteBlockPushResolver, ShuffleTestAccessor}
+import org.apache.spark.network.shuffle.{Constants, MergedShuffleFileManager, NoOpMergedShuffleFileManager, RemoteBlockPushResolver, ShuffleTestAccessor}
 import org.apache.spark.network.shuffle.RemoteBlockPushResolver._
 import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo
 import org.apache.spark.network.shuffledb.DBBackend
 import org.apache.spark.network.util.TransportConf
 import org.apache.spark.network.yarn.util.HadoopConfigProvider
 import org.apache.spark.tags.ExtendedLevelDBTest
 import org.apache.spark.util.Utils
-

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r964327262


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   OK to me, let me file a new jira to tracking this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
Yikun commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1239131114

   You can retrigger the ci, the issue github already fix the https://www.githubstatus.com/incidents/d181frs643d4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r964335790


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   @mridulm add [SPARK-40364](https://issues.apache.org/jira/browse/SPARK-40364) to tracking this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r964336980


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   I will give a separately pr to continue to explore the feasibility about this change
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #37648: [WIP][SPARK-38909][BUILD][CORE][YARN][FOLLOWUP]

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r955688868


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##########
@@ -125,15 +125,17 @@ public ShuffleIndexInformation load(String filePath) throws IOException {
       .weigher((Weigher<String, ShuffleIndexInformation>)
         (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
       .build(indexCacheLoader);
-    DBBackend dbBackend = null;
+    DBBackend dbBackend;
     if (registeredExecutorFile != null) {

Review Comment:
   hmm... `registeredExecutorFile` may be null, for example, the following code path:
   
   https://github.com/apache/spark/blob/68c47d5e74b8be481318388b3cc8b40ead35beea/core/src/main/scala/org/apache/spark/deploy/ExternalShuffleService.scala#L81-L92
   
   https://github.com/apache/spark/blob/68c47d5e74b8be481318388b3cc8b40ead35beea/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java#L79-L84
   
   However, without adding this condition, the result will not change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37648:
URL: https://github.com/apache/spark/pull/37648#issuecomment-1229703838

   > Removed [WIP], and need @Ngone51 confirm whether there are new comments of #36200.
   > 
   > And I have a point to discuss:
   > 
   > There are two `initLevelDB` method in `LevelDBProvider`, one of them is only for testing code and only used by `org.apache.spark.network.shuffle.ShuffleTestAccessor` now, so should we move the test only `initLevelDB` method to `ShuffleTestAccessor`? The implementation of RocksDB will also encounter this issue
   
   @Ngone51 @mridulm  [7a22aeb](https://github.com/apache/spark/pull/37648/commits/7a22aebf2bb3121ce52bc2c3939d3593e6d21a16) Move `initDB` method used only by test back to `ShuffleTestAccessor`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #37648: [WIP][SPARK-38909][BUILD][CORE][YARN][FOLLOWUP]

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r956351701


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##########
@@ -125,18 +125,13 @@ public ShuffleIndexInformation load(String filePath) throws IOException {
       .weigher((Weigher<String, ShuffleIndexInformation>)
         (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
       .build(indexCacheLoader);
-    DBBackend dbBackend;
-    if (registeredExecutorFile != null) {
-      String dbBackendName =
-        conf.get(Constants.SHUFFLE_SERVICE_DB_BACKEND, DBBackend.LEVELDB.name());
-      dbBackend = DBBackend.byName(dbBackendName);
+    String dbBackendName =
+      conf.get(Constants.SHUFFLE_SERVICE_DB_BACKEND, DBBackend.LEVELDB.name());
+    DBBackend dbBackend = DBBackend.byName(dbBackendName);
+    db = DBProvider.initDB(dbBackend, this.registeredExecutorFile, CURRENT_VERSION, mapper);

Review Comment:
   Review note: Recovery need not be enabled for node managers - in which case `registeredExecutorFile` will be `null` (in addition to tests).
   
   `DBProvider.initDB` does handle null input though.
   
   So the main change in this file is moving the log message to `if (db != null)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #37648: [WIP][SPARK-38909][BUILD][CORE][YARN][FOLLOWUP]

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r956351701


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##########
@@ -125,18 +125,13 @@ public ShuffleIndexInformation load(String filePath) throws IOException {
       .weigher((Weigher<String, ShuffleIndexInformation>)
         (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
       .build(indexCacheLoader);
-    DBBackend dbBackend;
-    if (registeredExecutorFile != null) {
-      String dbBackendName =
-        conf.get(Constants.SHUFFLE_SERVICE_DB_BACKEND, DBBackend.LEVELDB.name());
-      dbBackend = DBBackend.byName(dbBackendName);
+    String dbBackendName =
+      conf.get(Constants.SHUFFLE_SERVICE_DB_BACKEND, DBBackend.LEVELDB.name());
+    DBBackend dbBackend = DBBackend.byName(dbBackendName);
+    db = DBProvider.initDB(dbBackend, this.registeredExecutorFile, CURRENT_VERSION, mapper);

Review Comment:
   Review note: Recovery need not be enabled for node managers - in which case `registeredExecutorFile` will be `null` (in addition to tests).
   
   `DBProvider.initDB` does handle null input though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r964325864


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   Let us preserve the behavior in this PR, and explore the change in a different jira. Thoughts ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm closed pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
mridulm closed pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db
URL: https://github.com/apache/spark/pull/37648


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r959299052


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -220,4 +219,21 @@ object ShuffleTestAccessor {
       db: DB): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
     ExternalShuffleBlockResolver.reloadRegisteredExecutors(db)
   }
+
+  private def initDB(dbBackend: DBBackend, file: File): DB = {
+    if (file != null) {
+      // TODO: SPARK-38888, add rocksdb implementation.
+      dbBackend match {
+        case DBBackend.LEVELDB =>
+          val options = new org.iq80.leveldb.Options()
+          options.createIfMissing(true)
+          val factory = new org.fusesource.leveldbjni.JniDBFactory()
+          new LevelDB(factory.open(file, options))
+        case _ =>
+          throw new IllegalArgumentException("Unsupported DBBackend: " + dbBackend)
+      }
+    } else {
+      null
+    }

Review Comment:
   We will end up duplicating logic between `DBProvider` and `ShuffleTestAccessor` - let us consolidate it in `DBProvider`.
   This also impacts the integration test anyway



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #37648:
URL: https://github.com/apache/spark/pull/37648#discussion_r964072617


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/shuffle/ShuffleTestAccessor.scala:
##########
@@ -208,9 +208,12 @@ object ShuffleTestAccessor {
   }
 
   def reloadRegisteredExecutors(
-    dbBackend: DBBackend,
-    file: File): ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
-    val db = DBProvider.initDB(dbBackend, file)
+      dbBackend: DBBackend,
+      file: File,
+      version: StoreVersion,
+      mapper: ObjectMapper)
+    : ConcurrentMap[ExternalShuffleBlockResolver.AppExecId, ExecutorShuffleInfo] = {
+    val db = DBProvider.initDB(dbBackend, file, version, mapper)

Review Comment:
   @squito or @tgravescs should have more context about this, and comment better - looking at the test, I would expect this test code path to additionally test if the DB is missing/invalid/etc and fail in case there is any issues (due to lack of error handling/fallback in `LevelDBProvider.initLevelDB`) - which changes with in PR, that no longer happens; right ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org