You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/25 01:57:38 UTC

[GitHub] [spark] manuzhang opened a new pull request, #36658: [SPARK-39278][CORE] Fix backward compatibility of alternative configs of Hadoop Filesystems to access

manuzhang opened a new pull request, #36658:
URL: https://github.com/apache/spark/pull/36658

   ### What changes were proposed in this pull request?
   Change precedence of alternative configs of Hadoop Filesystems to access (`spark.kerberos.access.hadoopFileSystems`).
   
   
   ### Why are the changes needed?
   Before https://github.com/apache/spark/pull/23698, the precedence of configuring Hadoop Filesystems to access is
   ```
   spark.yarn.access.hadoopFileSystems -> spark.yarn.access.namenodes
   ```
   Afterwards, it's
   ```
   spark.kerberos.access.hadoopFileSystems -> spark.yarn.access.namenodes -> spark.yarn.access.hadoopFileSystems
   ```
   When both `spark.yarn.access.hadoopFileSystems` and `spark.yarn.access.namenodes` are configured with different values, the PR will break backward compatibility and cause application failure.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Fix backward compatibility.
   
   
   ### How was this patch tested?
   Updated UT.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #36658: [SPARK-39278][CORE] Fix backward compatibility of alternative configs of Hadoop Filesystems to access

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #36658: [SPARK-39278][CORE] Fix backward compatibility of alternative configs of Hadoop Filesystems to access
URL: https://github.com/apache/spark/pull/36658


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] manuzhang commented on pull request #36658: [SPARK-39278][CORE] Fix backward compatibility of alternative configs of Hadoop Filesystems to access

Posted by GitBox <gi...@apache.org>.
manuzhang commented on PR #36658:
URL: https://github.com/apache/spark/pull/36658#issuecomment-1136722951

   cc @gaborgsomogyi @vanzin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gaborgsomogyi commented on pull request #36658: [SPARK-39278][CORE] Fix backward compatibility of alternative configs of Hadoop Filesystems to access

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on PR #36658:
URL: https://github.com/apache/spark/pull/36658#issuecomment-1136868593

   Let's take a look at how the configs were evolving:
   * `spark.yarn.access.namenodes` introduced
   * `spark.yarn.access.hadoopFileSystems` introduced so `spark.yarn.access.namenodes` deprecated
   * `spark.kerberos.access.hadoopFileSystems` introduced so `spark.yarn.access.hadoopFileSystems` deprecated
   
   So from my perspective the correct order is:
   * `spark.yarn.access.namenodes` must be overwritten by `spark.yarn.access.hadoopFileSystems`
   * `spark.yarn.access.namenodes` and `spark.yarn.access.hadoopFileSystems` must be overwritten by `spark.kerberos.access.hadoopFileSystems`
   
   I understand that it was different previously but I wouldn't change it because of the following reasons:
   * If we take a look at the deprecation history then the actual behavior makes sense(newer configs must take precedence). Personally I would consider https://github.com/apache/spark/pull/23698 as a fix.
   * We're fixing things on master branch but there these 2 configs are just deprecated and `spark.kerberos.access.hadoopFileSystems` is the preferred way so I suggest to migrate to that.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] manuzhang commented on pull request #36658: [SPARK-39278][CORE] Fix backward compatibility of alternative configs of Hadoop Filesystems to access

Posted by GitBox <gi...@apache.org>.
manuzhang commented on PR #36658:
URL: https://github.com/apache/spark/pull/36658#issuecomment-1156005030

   > newer configs must take precedence
   
   This is not true currently. Meanwhile, deprecated configs are not removed yet and as long as the configs are used they should not break users' applications.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #36658: [SPARK-39278][CORE] Fix backward compatibility of alternative configs of Hadoop Filesystems to access

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #36658:
URL: https://github.com/apache/spark/pull/36658#issuecomment-1256814940

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] manuzhang commented on a diff in pull request #36658: [SPARK-39278][CORE] Fix backward compatibility of alternative configs of Hadoop Filesystems to access

Posted by GitBox <gi...@apache.org>.
manuzhang commented on code in PR #36658:
URL: https://github.com/apache/spark/pull/36658#discussion_r897550040


##########
core/src/test/scala/org/apache/spark/SparkConfSuite.scala:
##########
@@ -277,8 +277,11 @@ class SparkConfSuite extends SparkFunSuite with LocalSparkContext with ResetSyst
     conf.set("spark.yarn.access.namenodes", "testNode")
     assert(conf.get(KERBEROS_FILESYSTEMS_TO_ACCESS) === Array("testNode"))
 
-    conf.set("spark.yarn.access.hadoopFileSystems", "testNode")

Review Comment:
   I'd like to point out this test is meaningless before this change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org