You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "advancedxy (via GitHub)" <gi...@apache.org> on 2023/05/15 04:04:22 UTC

[GitHub] [spark] advancedxy opened a new pull request, #41168: [SPARK-43454] support substitution for SparkConf's get and getAllWithPrefix

advancedxy opened a new pull request, #41168:
URL: https://github.com/apache/spark/pull/41168

   ### What changes were proposed in this pull request?
   1. sparkConf's getOption/get supports substitution for raw keys
   2. sparkConf's getAllWithPrefix supports substitution for values
   
   ### Why are the changes needed?
   To be consistent between `sparkConf.get(CONFIG_ENTRY_A)` and `sparkConf.get(CONFIG_ENTRY_A.key)`. 
   It's also a todo left for: https://github.com/apache/spark/pull/18394#discussion_r126241078
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Some values contain env/sys variables, such as `${env:xxx}`, `${sys:yyy}` might be substituted by the real one. 
   However I doubt that's rarely possibel
   
   
   ### How was this patch tested?
   Added new UTs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] nsuke commented on a diff in pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix

Posted by "nsuke (via GitHub)" <gi...@apache.org>.
nsuke commented on code in PR #41168:
URL: https://github.com/apache/spark/pull/41168#discussion_r1194385787


##########
core/src/main/scala/org/apache/spark/SparkConf.scala:
##########
@@ -403,7 +403,7 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
    */
   def getAllWithPrefix(prefix: String): Array[(String, String)] = {

Review Comment:
   Wouldn't that mean the current inconsistent behavior would only be partially resolved by this PR? We may want to be more explicit about when to substitute and when not to.
   
   A few interesting cases are `spark.hadoop.*` and `spark.hive.*`:
   https://github.com/apache/spark/blob/c8e85eab3fca0e4e5f4bdf9d1d6d1702ecf3fd07/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L506
   https://github.com/apache/spark/blob/c8e85eab3fca0e4e5f4bdf9d1d6d1702ecf3fd07/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L534
   these particular cases may be easily covered by switching to `getAllWithPrefix` though. (assuming that the expected behavior is to do substitution here)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] advancedxy commented on pull request #41168: [SPARK-43454] support substitution for SparkConf's get and getAllWithPrefix

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on PR #41168:
URL: https://github.com/apache/spark/pull/41168#issuecomment-1547370686

   The CI should pass. Would you mind to take a look at this @cloud-fan @vanzin?  Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] advancedxy commented on a diff in pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #41168:
URL: https://github.com/apache/spark/pull/41168#discussion_r1194522841


##########
core/src/main/scala/org/apache/spark/SparkConf.scala:
##########
@@ -403,7 +403,7 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
    */
   def getAllWithPrefix(prefix: String): Array[(String, String)] = {

Review Comment:
   > We may want to be more explicit about when to substitute and when not to. 
   
   Sure and agreed. In my opinion, `getOption` should be substituted which is consistent with `get(xx_config_entry)`. 
   
   `getAll` shall not be substituted, as it's also used to create spark.properties file when submitted by spark-submit to resource managers, if substituted, the replaced value may reference the submit client's environment, which might not work as expected. We may add  `private[spark] def getAllWithSubstitution` method though.
   
   Variants deriving from `getAll`, such as `getAllWithPrefix`, as long as that's a query only operation, should be substituted, so they are consistent with get one by one.
   
   > shouldn't that mean the current inconsistent behavior would only be partially resolved by this PR?
   
   The `spark.hadoop.*` and `spark.hive.*` cases didn't occur to me. I'm not sure whether they should be substituted or not as they are also passed to hadoop/hive configuration, which may already have substitution enabled.
   
   For other cases, we may solve the inconsistent behavior case by case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] advancedxy commented on pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on PR #41168:
URL: https://github.com/apache/spark/pull/41168#issuecomment-1560607516

   @LuciferYang would you mind to take a look at this? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] advancedxy commented on a diff in pull request #41168: [SPARK-43454] support substitution for SparkConf's get and getAllWithPrefix

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #41168:
URL: https://github.com/apache/spark/pull/41168#discussion_r1193306532


##########
core/src/main/scala/org/apache/spark/SparkConf.scala:
##########
@@ -403,7 +403,7 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
    */
   def getAllWithPrefix(prefix: String): Array[(String, String)] = {

Review Comment:
   `getAll` is left intact as it's also used to serialized configs for persistence and distributed between driver and executors. It's too risk to change the behaviour.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] advancedxy commented on a diff in pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #41168:
URL: https://github.com/apache/spark/pull/41168#discussion_r1194522841


##########
core/src/main/scala/org/apache/spark/SparkConf.scala:
##########
@@ -403,7 +403,7 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
    */
   def getAllWithPrefix(prefix: String): Array[(String, String)] = {

Review Comment:
   > We may want to be more explicit about when to substitute and when not to. 
   
   Sure and agreed. In my opinion, `getOption` should be substituted which is consistent with `get(xx_config_entry)`. 
   
   `getAll` shall not be substituted, as it's also used to create spark.properties file when submitted by spark-submit to resource managers, if substituted, the replaced value may reference the submit client's environment, which might not work as expected. We may add  `private[spark] def getAllWithSubstitution` method though.
   
   Variants deriving from `getAll`, such as `getAllWithPrefix`, as long as that's a query only operation, should be substituted by principle, so they are consistent with get one by one. However for some corner cases, we may need to resolve them case by case
   
   > shouldn't that mean the current inconsistent behavior would only be partially resolved by this PR?
   
   The `spark.hadoop.*` and `spark.hive.*` cases didn't occur to me. I'm not sure whether they should be substituted or not as they are also passed to hadoop/hive configuration, which may already have substitution enabled.
   
   For other cases, we may solve the inconsistent behavior case by case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] advancedxy commented on pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on PR #41168:
URL: https://github.com/apache/spark/pull/41168#issuecomment-1549554217

   gently ping @cloud-fan @vanzin.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #41168:
URL: https://github.com/apache/spark/pull/41168#issuecomment-1703544873

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix
URL: https://github.com/apache/spark/pull/41168


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org