Posted to reviews@spark.apache.org by "dongjoon-hyun (via GitHub)" <gi...@apache.org> on 2024/02/24 00:32:01 UTC

[PR] [SPARK-47152][SQL][BUILD] Provide Apache Hive Jackson dependency via a new optional directory [spark]

dongjoon-hyun opened a new pull request, #45237:
URL: https://github.com/apache/spark/pull/45237

   …
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'common/utils/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501324976


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -64,8 +65,16 @@ package object config {
       .stringConf
       .createOptional
 
+  private[spark] val DRIVER_DEFAULT_EXTRA_CLASS_PATH =
+    ConfigBuilder(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH)
+      .internal()
+      .version("4.0.0")
+      .stringConf
+      .createWithDefault(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE)
+
   private[spark] val DRIVER_CLASS_PATH =
     ConfigBuilder(SparkLauncher.DRIVER_EXTRA_CLASSPATH)
+      .withPrepended(DRIVER_DEFAULT_EXTRA_CLASS_PATH.key, File.pathSeparator)

Review Comment:
   Given that a company has many Spark jobs, it's not that simple because we would need to add `SparkLauncher.DRIVER_EXTRA_CLASSPATH` and `SparkLauncher.EXECUTOR_EXTRA_CLASSPATH` for every Spark job.
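   To make the per-job burden concrete, here is a rough sketch of the two extra flags every submission would have to carry if no default were shipped. This is illustrative only; `SPARK_HOME` and the flag values are assumptions for the example, not from the PR:

   ```shell
   # Illustrative sketch: the per-job flags each spark-submit would need
   # if no default classpath entry were shipped with the distribution.
   SPARK_HOME="${SPARK_HOME:-/opt/spark}"

   build_submit_args() {
     # Emit the extra --conf flags for both driver and executor classpaths.
     printf '%s\n' \
       "--conf spark.driver.extraClassPath=${SPARK_HOME}/hive-jackson/*" \
       "--conf spark.executor.extraClassPath=${SPARK_HOME}/hive-jackson/*"
   }

   build_submit_args
   ```

   Multiplied across hundreds of jobs and teams, keeping these two flags in sync is exactly the maintenance cost the comment describes.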





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45237:
URL: https://github.com/apache/spark/pull/45237#issuecomment-1962303767

   Actually, we are the one who asked that, @pan3793 . 😄 




Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501334313


##########
dev/make-distribution.sh:
##########
@@ -189,6 +189,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/"
 
+# Only create the hive-jackson directory if they exist.
+for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
+  mkdir -p "$DISTDIR"/hive-jackson
+  mv $f "$DISTDIR"/hive-jackson/
+done

Review Comment:
   There are 5 main benefits like `yarn` directory, @viirya .
   
   1. The following Apache Spark daemons (which use `bin/spark-class`) will ignore the `hive-jackson` directory.
       - Spark Master
       - Spark Worker
       - Spark History Server
   
   2. **Recoverability**: AS-IS, Spark 3 users can achieve the same goal by deleting those two files from Spark's `jars` directory manually. However, it's difficult to recover the deleted files when a production job fails due to a Hive UDF. This PR provides a more robust and safer way via a configuration.
   
   3. **Communication**: We (and the security team) can easily communicate that `hive-jackson` is not used like `yarn` directory because it's physically split from the distribution. Also, they can delete the directory easily (if they need) without knowing the details of dependency lists inside that directory.
   
   4. **Robustness**: If Apache Spark has everything in `jars`, it's difficult to prevent those jars from loading. Of course, we could choose a tricky way to filter them out of the class file lists via a naming pattern, but that is still less robust from a long-term perspective.
   
   5. **Compatibility with `hive-jackson-provided`**: With the existing `hive-jackson-provided` profile, this PR provides a cleaner injection point for the provided dependencies. For example, custom-built Jackson dependencies can be placed in `hive-jackson` (after creating it) instead of `jars`. We are very reluctant to have anyone put custom jar files directly into Apache Spark's `jars` directory; the `hive-jackson` directory is a more recommendable destination than Spark's `jars` directory.
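   The recoverability point (benefit 2) can be sketched in a few lines: disabling the CodeHaus Jackson jars becomes a reversible directory move rather than an unrecoverable jar deletion. The temp directory below is a stand-in for a real installation, and the jar name is just the one matched by the `jackson-*-asl-*.jar` glob:

   ```shell
   # Sketch of benefit 2 (recoverability), using a temp dir as a stand-in
   # for a real Spark installation. Nothing is deleted, so rollback is trivial.
   SPARK_HOME="$(mktemp -d)"
   mkdir -p "$SPARK_HOME/hive-jackson"
   touch "$SPARK_HOME/hive-jackson/jackson-core-asl-1.9.13.jar"

   # Disable: move the whole directory aside.
   mv "$SPARK_HOME/hive-jackson" "$SPARK_HOME/hive-jackson.disabled"

   # Recover: move it back when a production job needs Hive UDFs again.
   mv "$SPARK_HOME/hive-jackson.disabled" "$SPARK_HOME/hive-jackson"

   ls "$SPARK_HOME/hive-jackson"
   ```

   Compare this with hand-deleting individual jars from `jars/`, where recovery means tracking down the exact artifacts again.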





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501680651


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   This is required because, when it's loaded from the file, `p.stringPropertyNames()` doesn't have `SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH`. So, `.forEach` is not invoked at all.
   ```
   Properties p = loadPropertiesFile();
         p.stringPropertyNames().
   ```
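   The `putIfAbsent` behavior being described has a familiar shell analogue, `${VAR:-default}`: the built-in default applies only when nothing was supplied for that key. A rough sketch of the semantics (the variable name here is invented for the analogy, not a real Spark setting):

   ```shell
   # Shell analogy for effectiveConfig.putIfAbsent(key, defaultValue):
   # the default classpath entry is used only when the user set no value.
   effective_default_classpath() {
     # ${VAR:-fallback} substitutes the fallback only when VAR is unset
     # or empty, mirroring putIfAbsent.
     printf '%s' "${DRIVER_DEFAULT_EXTRA_CLASSPATH:-hive-jackson/*}"
   }

   unset DRIVER_DEFAULT_EXTRA_CLASSPATH
   effective_default_classpath; echo      # unset: falls back to the default
   (
     DRIVER_DEFAULT_EXTRA_CLASSPATH="custom/*"
     effective_default_classpath; echo    # set: the user value wins
   )
   ```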





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45237:
URL: https://github.com/apache/spark/pull/45237#issuecomment-1962561365

   Thank you so much, @viirya !
   Merged to master for Apache Spark 4.0.0.




Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501306714


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -64,8 +65,16 @@ package object config {
       .stringConf
       .createOptional
 
+  private[spark] val DRIVER_DEFAULT_EXTRA_CLASS_PATH =
+    ConfigBuilder(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH)
+      .internal()
+      .version("4.0.0")
+      .stringConf
+      .createWithDefault(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE)
+
   private[spark] val DRIVER_CLASS_PATH =
     ConfigBuilder(SparkLauncher.DRIVER_EXTRA_CLASSPATH)
+      .withPrepended(DRIVER_DEFAULT_EXTRA_CLASS_PATH.key, File.pathSeparator)

Review Comment:
   Hmm, so it looks like users can simply set `SparkLauncher.DRIVER_EXTRA_CLASSPATH` to achieve the same effect, no?





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45237:
URL: https://github.com/apache/spark/pull/45237#issuecomment-1963002775

   > Should we change `mv $f "$DISTDIR"/hive-jackson/` to `mv "$f" "$DISTDIR"/hive-jackson/` ?
   
   Feel free to make a followup, @bjornjorgensen .




Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #45237:
URL: https://github.com/apache/spark/pull/45237#issuecomment-1962298387

   FYI, I noticed that https://github.com/apache/hive/pull/4564 already cut the Jackson 1.x deps out, pending 2.3.10 release ...
   cc @sunchao @wangyum




Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501326520


##########
dev/make-distribution.sh:
##########
@@ -189,6 +189,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/"
 
+# Only create the hive-jackson directory if they exist.
+for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
+  mkdir -p "$DISTDIR"/hive-jackson
+  mv $f "$DISTDIR"/hive-jackson/
+done

Review Comment:
   Btw, what's benefit to have separate class path for hive-jackson jars?





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501374915


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   Please see here.
   - https://spark.apache.org/docs/latest/configuration.html#runtime-environment
   
   ![Screenshot 2024-02-24 at 01 08 02](https://github.com/apache/spark/assets/9700541/a40a38bb-ed2b-4a2e-83c4-6b81fe4729e0)
   





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501683233


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   `getEffectiveConfig` is shared by other classes, not only by `SparkSubmitCommandBuilder`. That's why I fixed it in `getEffectiveConfig`.
   ```
   $ git grep getEffectiveConfig
   launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:  Map<String, String> getEffectiveConfig() throws IOException {
   launcher/src/main/java/org/apache/spark/launcher/InProcessLauncher.java:    if (builder.isClientMode(builder.getEffectiveConfig())) {
   launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java:    return builder.getEffectiveConfig().get(CHILD_PROCESS_LOGGER_NAME);
   launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:    Map<String, String> config = getEffectiveConfig();
   launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:      getEffectiveConfig().get(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH));
   ```





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501679831


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   Hmm, why do we need to deal with `SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH` specially here? I think Spark configs should be dealt with together above:
   
   ```
   p.stringPropertyNames().forEach(key ->
            effectiveConfig.computeIfAbsent(key, p::getProperty));
   ```
   
   





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "bjornjorgensen (via GitHub)" <gi...@apache.org>.
bjornjorgensen commented on PR #45237:
URL: https://github.com/apache/spark/pull/45237#issuecomment-1962944309

   Should we change `mv $f "$DISTDIR"/hive-jackson/` to `mv "$f" "$DISTDIR"/hive-jackson/` ? 
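   For context on why the quoting matters: an unquoted `$f` undergoes word splitting, so a path containing whitespace is passed to `mv` as several arguments. A contrived demo (jar names rarely contain spaces, so this is a defensive fix rather than a live bug):

   ```shell
   # Demo of the word-splitting hazard behind `mv $f` vs `mv "$f"`.
   # The space in the filename is contrived to make the failure visible.
   dir="$(mktemp -d)"
   mkdir -p "$dir/dest"
   touch "$dir/a b.jar"

   for f in "$dir"/*.jar; do
     # Unquoted: $f splits into "$dir/a" and "b.jar", neither of which exists.
     mv $f "$dir/dest/" 2>/dev/null || echo "unquoted mv failed"
     # Quoted: the full path is passed as one argument, so the move works.
     mv "$f" "$dir/dest/" && echo "quoted mv succeeded"
   done
   ```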




Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501330897


##########
launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java:
##########
@@ -54,6 +54,10 @@ public class SparkLauncher extends AbstractLauncher<SparkLauncher> {
 
   /** Configuration key for the driver memory. */
   public static final String DRIVER_MEMORY = "spark.driver.memory";
+  /** Configuration key for the driver default extra class path. */
+  public static final String DRIVER_DEFAULT_EXTRA_CLASS_PATH =
+    "spark.driver.defaultExtraClassPath";
+  public static final String DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE = "hive-jackson/*";

Review Comment:
   Yes, the configuration simply points to a non-existent directory in that case. As you can see in `make-distribution.sh` and the PR description, the `hive-jackson` directory is created only when `jackson-*-asl-*.jar` files exist.
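   A self-contained sketch of that conditional layout follows. Note that `nullglob` is enabled here so the sketch also behaves when no jars match (an empty glob expands to nothing and the loop is skipped); the real `make-distribution.sh` may instead rely on the jars always being present in a Hive-enabled build:

   ```shell
   # Sketch of the conditional hive-jackson layout from make-distribution.sh.
   # nullglob (bash) makes an unmatched glob expand to nothing, so the
   # directory is only created when matching jars actually exist.
   shopt -s nullglob 2>/dev/null || true
   DISTDIR="$(mktemp -d)"
   mkdir -p "$DISTDIR/jars"
   touch "$DISTDIR/jars/jackson-core-asl-1.9.13.jar" \
         "$DISTDIR/jars/jackson-mapper-asl-1.9.13.jar"

   for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
     mkdir -p "$DISTDIR/hive-jackson"
     mv "$f" "$DISTDIR/hive-jackson/"
   done

   ls "$DISTDIR/hive-jackson"
   ```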





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501363507


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   Classes not triggered via SparkLauncher (e.g., the Spark daemons you mentioned) are not affected, right?







Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501363556


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   And, same question: why is only the driver default extra class path specified here?





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501373835


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   No~ This is `--driver-class-path`, which is used as a special case because the driver JVM has already started.
   
   We already have the more general `spark.driver.extraClassPath` and `spark.executor.extraClassPath` for the driver and executor. This PR extends them with the following.
   ```
   spark.driver.defaultExtraClassPath
   spark.executor.defaultExtraClassPath
   ```
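As a rough behavioral sketch (not Spark's actual code), the `default` variants are prepended to the user-provided extra class path with the platform path separator, so setting `spark.driver.extraClassPath` does not clobber `hive-jackson/*`:

```shell
# Sketch of the prepend semantics: the default extra class path is joined
# in front of the user-provided one with the path separator (':' on Unix).
default_cp='hive-jackson/*'          # spark.driver.defaultExtraClassPath
user_cp='/opt/udfs/my-udf.jar'       # spark.driver.extraClassPath (illustrative)

if [ -n "$default_cp" ]; then
  effective_cp="$default_cp:$user_cp"
else
  effective_cp="$user_cp"
fi
```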





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501682599


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   I see. Although this is only used by `SparkSubmitCommandBuilder` and not `SparkClassCommandBuilder`, so I wonder if we should put it directly in `SparkSubmitCommandBuilder`.
   





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501324272


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -64,8 +65,16 @@ package object config {
       .stringConf
       .createOptional
 
+  private[spark] val DRIVER_DEFAULT_EXTRA_CLASS_PATH =
+    ConfigBuilder(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH)
+      .internal()
+      .version("4.0.0")
+      .stringConf
+      .createWithDefault(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE)
+
   private[spark] val DRIVER_CLASS_PATH =
     ConfigBuilder(SparkLauncher.DRIVER_EXTRA_CLASSPATH)
+      .withPrepended(DRIVER_DEFAULT_EXTRA_CLASS_PATH.key, File.pathSeparator)

Review Comment:
   Yes, right. However, it's a breaking change if we enforce its use.
   >  so looks like users can simply set SparkLauncher.DRIVER_EXTRA_CLASSPATH to achieve same effect, no?





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501325580


##########
launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java:
##########
@@ -54,6 +54,10 @@ public class SparkLauncher extends AbstractLauncher<SparkLauncher> {
 
   /** Configuration key for the driver memory. */
   public static final String DRIVER_MEMORY = "spark.driver.memory";
+  /** Configuration key for the driver default extra class path. */
+  public static final String DRIVER_DEFAULT_EXTRA_CLASS_PATH =
+    "spark.driver.defaultExtraClassPath";
+  public static final String DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE = "hive-jackson/*";

Review Comment:
   So even users of non-Hive distributions will also have this default class path value in the extra class path, right?





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501334313


##########
dev/make-distribution.sh:
##########
@@ -189,6 +189,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/"
 
+# Only create the hive-jackson directory if such jars exist.
+for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
+  mkdir -p "$DISTDIR"/hive-jackson
+  mv "$f" "$DISTDIR"/hive-jackson/
+done

Review Comment:
   There are 4 main benefits, similar to the `yarn` directory, @viirya.
   
   1. **Recoverability**: As-is, Spark 3 users can achieve the same goal by deleting those two files from Spark's `jars` directory manually. However, it's difficult to recover the deleted files when a production job fails due to a Hive UDF. This PR provides a more robust and safer way via a configuration.
   
   2. **Communication**: We (and the security team) can easily communicate that `hive-jackson` is not used, just like the `yarn` directory, because it's physically separated in the distribution. Also, users can delete the directory easily (if they need to) without knowing the details of the dependency list inside it.
   
   3. **Robustness**: If Apache Spark has everything in `jars`, it's difficult to prevent those jars from loading. Of course, we could choose a tricky way to filter them out of the class-file lists via a naming pattern, but that is still less robust from a long-term perspective.
   
   4. **Compatibility with `hive-jackson-provided`**: Together with the existing `hive-jackson-provided` profile, this PR provides a cleaner injection point for the provided dependencies. For example, custom-built Jackson dependencies can be placed in `hive-jackson` (after users create it) instead of `jars`. We are very reluctant to have anyone put custom jar files directly into Apache Spark's `jars` directory; the `hive-jackson` directory is a more recommendable place than Spark's `jars` directory.
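The `dev/make-distribution.sh` hunk quoted above can be exercised standalone; the sketch below replays the same loop against a temporary directory (the jar names are illustrative):

```shell
# Replay of the distribution-script loop against a scratch directory.
DISTDIR="$(mktemp -d)"
mkdir -p "$DISTDIR/jars"
touch "$DISTDIR/jars/jackson-core-asl-1.9.13.jar" \
      "$DISTDIR/jars/jackson-mapper-asl-1.9.13.jar"

# The directory is only created when a matching jar actually exists.
for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
  [ -e "$f" ] || continue            # guard against a non-matching glob
  mkdir -p "$DISTDIR/hive-jackson"
  mv "$f" "$DISTDIR/hive-jackson/"
done
```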





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501679601


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   Hmm, I mean that `--driver-default-class-path` is added to the spark-submit parameter list. How about the executor's default class path? Is it configured like other configs (i.e., `-c ...`)? I think the driver default class path should also be configurable like that. Just wondering why `--driver-default-class-path` needs to be added specifically.
   
   I can see there is only `--driver-class-path` and no `--executor-class-path` either. 🤔 
   





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501680578


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   Yea, `-c spark.driver.defaultExtraClassPath=""` should be the same as `--driver-default-class-path ""`, so I wonder why we need the latter specifically. Also, in what case can we not specify it through `-c spark.driver.defaultExtraClassPath`?
   
   It's probably okay, since we already have `--driver-class-path` and this addition follows it. Just a question in case you know the exact reason.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501681695


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   `--driver-class-path` is required because Spark's driver JVM is started before Spark's property files or config files are loaded.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501363073


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   Hmm, why can spark-submit specify only the driver default extra class path? Can't it specify the executor default extra class path?





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501374915


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   Please see here for the details.
   - https://spark.apache.org/docs/latest/configuration.html#runtime-environment
   
   ![Screenshot 2024-02-24 at 01 08 02](https://github.com/apache/spark/assets/9700541/a40a38bb-ed2b-4a2e-83c4-6b81fe4729e0)
   





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501334313


##########
dev/make-distribution.sh:
##########
@@ -189,6 +189,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/"
 
+# Only create the hive-jackson directory if such jars exist.
+for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
+  mkdir -p "$DISTDIR"/hive-jackson
+  mv "$f" "$DISTDIR"/hive-jackson/
+done

Review Comment:
   There are 4 main benefits, similar to the `yarn` directory, @viirya.
   
   1. **Recoverability**: As-is, Spark 3 users can achieve the same goal by deleting those two files from Spark's `jars` directory manually. However, it's difficult to recover the deleted files when a production job fails due to a Hive UDF. This PR provides a more robust and safer way via a configuration.
   
   2. **Communication**: We (and the security team) can easily communicate that `hive-jackson` is not used, just like the `yarn` directory, because it's physically separated in the distribution. Also, users can delete the directory easily (if they need to) without knowing the details of the dependency list.
   
   3. **Robustness**: If Apache Spark has everything in `jars`, it's difficult to prevent those jars from loading. Of course, we could choose a tricky way to filter them out of the class-file lists via a naming pattern, but that is still less robust from a long-term perspective.
   
   4. **Compatibility with `hive-jackson-provided`**: Together with the existing `hive-jackson-provided` profile, this provides a cleaner injection point for the provided dependencies. For example, custom-built Jackson dependencies can be placed in `hive-jackson` instead of `jars`. We are very reluctant to have anyone put custom jar files directly into Apache Spark's `jars` directory.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501681850


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   We cannot create the `SparkConf` class if we cannot load Spark's jar files. This is a chicken-and-egg situation. To solve it, we blindly load `jars/*`. So, if we have other jars, we need to use `--driver-class-path`.
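A simplified sketch of that ordering constraint (not the actual launcher code): the launch classpath is assembled as plain strings before the JVM starts, so only literal wildcard entries are available at that point:

```shell
# The launcher can only concatenate literal classpath entries here,
# because no Spark class (e.g. SparkConf) has been loaded yet.
SPARK_HOME='/opt/spark'              # illustrative path
default_extra_cp='hive-jackson/*'    # spark.driver.defaultExtraClassPath
launch_cp="$default_extra_cp:$SPARK_HOME/jars/*"
```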





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #45237: [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory
URL: https://github.com/apache/spark/pull/45237




Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501334313


##########
dev/make-distribution.sh:
##########
@@ -189,6 +189,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/"
 
+# Only create the hive-jackson directory if such jars exist.
+for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
+  mkdir -p "$DISTDIR"/hive-jackson
+  mv "$f" "$DISTDIR"/hive-jackson/
+done

Review Comment:
   There are 5 main benefits, similar to the `yarn` directory, @viirya.
   
   1. The following Apache Spark daemons (which use `bin/spark-daemon.sh start`) will ignore the `hive-jackson` directory:
       - Spark Master
       - Spark Worker
       - Spark History Server
   ```
   $ grep 'spark-daemon.sh start' *
   start-history-server.sh:exec "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 "$@"
   start-master.sh:"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
   start-worker.sh:  "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS $WORKER_NUM \
   ```
   
   2. **Recoverability**: As-is, Spark 3 users can achieve the same goal by deleting those two files from Spark's `jars` directory manually. However, it's difficult to recover the deleted files when a production job fails due to a Hive UDF. This PR provides a more robust and safer way via a configuration.
   
   3. **Communication**: We (and the security team) can easily communicate that `hive-jackson` is not used, just like the `yarn` directory, because it's physically separated in the distribution. Also, users can delete the directory easily (if they need to) without knowing the details of the dependency list inside it.
   
   4. **Robustness**: If Apache Spark has everything in `jars`, it's difficult to prevent those jars from loading. Of course, we could choose a tricky way to filter them out of the class-file lists via a naming pattern, but that is still less robust from a long-term perspective.
   
   5. **Compatibility with `hive-jackson-provided`**: Together with the existing `hive-jackson-provided` profile, this PR provides a cleaner injection point for the provided dependencies. For example, custom-built Jackson dependencies can be placed in `hive-jackson` (after users create it) instead of `jars`. We are very reluctant to have anyone put custom jar files directly into Apache Spark's `jars` directory; the `hive-jackson` directory is a more recommendable place than Spark's `jars` directory.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide Apache Hive Jackson dependency via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45237:
URL: https://github.com/apache/spark/pull/45237#issuecomment-1962186669

   cc @viirya 




Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501334313


##########
dev/make-distribution.sh:
##########
@@ -189,6 +189,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/"
 
+# Only create the hive-jackson directory if such jars exist.
+for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
+  mkdir -p "$DISTDIR"/hive-jackson
+  mv "$f" "$DISTDIR"/hive-jackson/
+done

Review Comment:
   There are 4 main benefits, similar to the `yarn` directory, @viirya.
   
   1. **Recoverability**: As-is, Spark 3 users can achieve the same goal by deleting those two files from Spark's `jars` directory manually. However, it's difficult to recover the deleted files when a production job fails due to a Hive UDF. This PR provides a more robust and safer way via a configuration.
   
   2. **Communication**: We (and the security team) can easily communicate that `hive-jackson` is not used, just like the `yarn` directory, because it's physically separated in the distribution. Also, users can delete the directory easily (if they need to) without knowing the details of the dependency list inside it.
   
   3. **Robustness**: If Apache Spark has everything in `jars`, it's difficult to prevent those jars from loading. Of course, we could choose a tricky way to filter them out of the class-file lists via a naming pattern, but that is still less robust from a long-term perspective.
   
   4. **Compatibility with `hive-jackson-provided`**: Together with the existing `hive-jackson-provided` profile, this provides a cleaner injection point for the provided dependencies. For example, custom-built Jackson dependencies can be placed in `hive-jackson` instead of `jars`. We are very reluctant to have anyone put custom jar files directly into Apache Spark's `jars` directory.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501679826


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   To @viirya ,
   - In general, `bin/spark-submit ... -c spark.driver.defaultExtraClassPath="" -c spark.executor.defaultExtraClassPath=""` is the way to launch without the hive-jackson dependency, because `hive-jackson/*` is provided by those configurations. This always works for drivers in `cluster` submission mode and for executors in both `client` and `cluster` submission modes.
   - Apache Spark already provides `--driver-class-path`, which is used only for the cases where we cannot use `spark.driver.extraClassPath`. In those cases, `spark.driver.defaultExtraClassPath` is also not applicable. So, this PR adds `--driver-default-class-path ""` in the same way.
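A behavioral sketch of the opt-out described above (not the launcher code): the default is applied only when the key is absent, so an explicitly empty value such as `-c spark.driver.defaultExtraClassPath=""` keeps `hive-jackson/*` off the classpath:

```shell
builtin_default='hive-jackson/*'

# Mimics putIfAbsent: an unset key falls back to the built-in default,
# while a key explicitly set to "" disables the default.
resolve_default_cp() {
  # "${1+set}" distinguishes "unset" from "set to the empty string".
  if [ "${1+set}" = set ]; then
    printf '%s' "$1"                 # user-specified value wins, even if empty
  else
    printf '%s' "$builtin_default"
  fi
}

with_default="$(resolve_default_cp)"   # key absent
disabled="$(resolve_default_cp '')"    # -c spark.driver.defaultExtraClassPath=""
```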





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501683488


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   Hmm, but I saw you only changed `SparkSubmitCommandBuilder` to explicitly pick it up? If `SparkLauncher.DRIVER_EXTRA_CLASSPATH` is not set at all, these other classes won't use `SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH` either.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501684446


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   Let me double-check `InProcessLauncher` and `SparkLauncher` once more. Last time I checked, it was okay, but your concern rings a bell.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501334313


##########
dev/make-distribution.sh:
##########
@@ -189,6 +189,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/"
 
+# Only create the hive-jackson directory if they exist.
+for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
+  mkdir -p "$DISTDIR"/hive-jackson
+  mv $f "$DISTDIR"/hive-jackson/
+done

Review Comment:
   There are 5 main benefits, similar to the `yarn` directory, @viirya.
   
   1. Apache Spark daemons will ignore the `hive-jackson` directory.
   
   2. **Recoverability**: As-is, Spark 3 users can achieve the same goal by manually deleting those two files from Spark's `jars` directory. However, it's difficult to recover the deleted files when a production job fails due to a Hive UDF. This PR provides a more robust and safer way with a configuration.
   
   3. **Communication**: We (and the security team) can easily communicate that `hive-jackson` is not used, like the `yarn` directory, because it's physically split from the distribution. Also, users can delete the directory easily (if they need to) without knowing the details of the dependency list inside that directory.
   
   4. **Robustness**: If Apache Spark has everything in `jars`, it's difficult to prevent those jars from being loaded. Of course, we could choose a tricky way to filter them out of the class file list via a naming pattern, but that is still less robust from a long-term perspective.
   
   5. **Compatibility with `hive-jackson-provided`**: With the existing `hive-jackson-provided` profile, this PR provides a cleaner injection point for the provided dependencies. For example, custom-built Jackson dependencies can be placed in `hive-jackson` (after users create it) instead of `jars`. We are very reluctant to have someone put custom jar files directly into Apache Spark's `jars` directory; the `hive-jackson` directory is a more recommendable place than Spark's `jars` directory.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45237:
URL: https://github.com/apache/spark/pull/45237#issuecomment-1962305486

   To @pan3793 , for the record, we've been waiting for that. There is no ETA for now. In addition, every dependency update carries a risk of being reverted, as you can see in the history. No matter what happens with `2.3.10` in the Hive community and the Spark community, we will delete this dependency in Apache Spark 4.0.0.
   ```
   [SPARK-44197][BUILD] Upgrade Hadoop to 3.3.6
   [SPARK-44678][BUILD][3.5] Downgrade Hadoop to 3.3.4
   ```




Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501376844


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   Let me double-check it.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501374189


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   Yes, the following `spark-daemon.sh start` commands are not affected, @viirya.
   
   ```
   $ grep 'spark-daemon.sh start' *
   start-history-server.sh:exec "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 "$@"
   start-master.sh:"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
   start-worker.sh:  "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS $WORKER_NUM \
   ```
   
   This is a command on the driver side. The executors are supposed to use `spark.executor.extraClassPath` and `spark.executor.defaultExtraClassPath`.
   > And, same question, why only driver default extra class is specified here?





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501375581


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   BTW, is your concern that the following is insufficient?
   ```
     private[spark] val EXECUTOR_CLASS_PATH =
       ConfigBuilder(SparkLauncher.EXECUTOR_EXTRA_CLASSPATH)
         .withPrepended(EXECUTOR_DEFAULT_EXTRA_CLASS_PATH.key, File.pathSeparator)
   ```





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501681409


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   Yes, here.
   
   When `getEffectiveConfig` is used like this in `SparkSubmitCommandBuilder.java`, `config` is a Java Map.
   So, `String defaultExtraClassPath = config.get(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH);` becomes `null` if we don't have this default.
   
   ```
       Map<String, String> config = getEffectiveConfig();
       boolean isClientMode = isClientMode(config);
       String extraClassPath = isClientMode ? config.get(SparkLauncher.DRIVER_EXTRA_CLASSPATH) : null;
       String defaultExtraClassPath = config.get(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH);
       if (extraClassPath == null || extraClassPath.trim().isEmpty()) {
   ```
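   The `null`-vs-default distinction above can be reproduced with a plain `java.util.Map` (a minimal sketch; the key and default value strings below are placeholders mirroring the Spark constants, not the constants themselves):
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   public class EffectiveConfigSketch {
     public static void main(String[] args) {
       Map<String, String> effectiveConfig = new HashMap<>();
   
       // Without seeding a default, a missing key yields null.
       System.out.println(effectiveConfig.get("spark.driver.defaultExtraClassPath") == null);
   
       // Mirroring the getEffectiveConfig() change: putIfAbsent seeds the
       // default only when the user has not set the key explicitly.
       effectiveConfig.putIfAbsent("spark.driver.defaultExtraClassPath", "hive-jackson/*");
       System.out.println(effectiveConfig.get("spark.driver.defaultExtraClassPath"));
   
       // An explicit (even empty) user value survives putIfAbsent.
       effectiveConfig.put("spark.executor.defaultExtraClassPath", "");
       effectiveConfig.putIfAbsent("spark.executor.defaultExtraClassPath", "hive-jackson/*");
       System.out.println(effectiveConfig.get("spark.executor.defaultExtraClassPath").isEmpty());
     }
   }
   ```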





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501679601


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   Hmm, I think I mean that `--driver-default-class-path` is added to the spark-submit parameter list. How about the executor's default class path? Is it configured like other configs (i.e., `-c ...`)? I think the driver default class path should be configurable like that too. Just wondering why we especially need to add `--driver-default-class-path`.
   
   I can see there is only `--driver-class-path` and no `--executor-class-path` too. 🤔 
   





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501679942


##########
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:
##########
@@ -498,6 +504,8 @@ protected boolean handle(String opt, String value) {
         case DRIVER_MEMORY -> conf.put(SparkLauncher.DRIVER_MEMORY, value);
         case DRIVER_JAVA_OPTIONS -> conf.put(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, value);
         case DRIVER_LIBRARY_PATH -> conf.put(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH, value);
+        case DRIVER_DEFAULT_CLASS_PATH ->
+          conf.put(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH, value);

Review Comment:
   This is identical to how the existing Apache Spark `spark.*.extraClassPath` configurations work.
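   The prepend behavior those configurations rely on can be sketched with a hypothetical helper (`joinClassPath` is illustrative only, not Spark code, and the default value string is a placeholder): the default entries come first, joined to any user-provided value with the platform path separator, so setting the default to `""` removes them.
   
   ```java
   import java.io.File;
   
   public class ClassPathPrependSketch {
     // Hypothetical helper mimicking ConfigBuilder.withPrepended: default
     // entries go first, separated from the user value by the path separator.
     static String joinClassPath(String defaultCp, String userCp) {
       if (userCp == null || userCp.trim().isEmpty()) return defaultCp;
       if (defaultCp == null || defaultCp.isEmpty()) return userCp;
       return defaultCp + File.pathSeparator + userCp;
     }
   
     public static void main(String[] args) {
       // Default plus a user-provided entry.
       System.out.println(joinClassPath("hive-jackson/*", "/opt/extra.jar"));
       // An empty default drops hive-jackson from the effective path.
       System.out.println(joinClassPath("", "/opt/extra.jar"));
     }
   }
   ```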





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501680909


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   Let me recheck.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501326535


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -64,8 +65,16 @@ package object config {
       .stringConf
       .createOptional
 
+  private[spark] val DRIVER_DEFAULT_EXTRA_CLASS_PATH =
+    ConfigBuilder(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH)
+      .internal()
+      .version("4.0.0")
+      .stringConf
+      .createWithDefault(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE)
+
   private[spark] val DRIVER_CLASS_PATH =
     ConfigBuilder(SparkLauncher.DRIVER_EXTRA_CLASSPATH)
+      .withPrepended(DRIVER_DEFAULT_EXTRA_CLASS_PATH.key, File.pathSeparator)

Review Comment:
   Got it. Thanks.





Re: [PR] [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45237:
URL: https://github.com/apache/spark/pull/45237#discussion_r1501683233


##########
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:
##########
@@ -271,6 +271,8 @@ Map<String, String> getEffectiveConfig() throws IOException {
       Properties p = loadPropertiesFile();
       p.stringPropertyNames().forEach(key ->
         effectiveConfig.computeIfAbsent(key, p::getProperty));
+      effectiveConfig.putIfAbsent(SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH,
+        SparkLauncher.DRIVER_DEFAULT_EXTRA_CLASS_PATH_VALUE);

Review Comment:
   `getEffectiveConfig` is shared by other classes, not only by `SparkSubmitCommandBuilder`. That's why I fixed it in `getEffectiveConfig`.
   ```
   $ git grep getEffectiveConfig
   launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java:  Map<String, String> getEffectiveConfig() throws IOException {
   launcher/src/main/java/org/apache/spark/launcher/InProcessLauncher.java:    if (builder.isClientMode(builder.getEffectiveConfig())) {
   launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java:    return builder.getEffectiveConfig().get(CHILD_PROCESS_LOGGER_NAME);
   launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:    Map<String, String> config = getEffectiveConfig();
   launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java:      getEffectiveConfig().get(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH));
   ```


