You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/24 23:10:54 UTC

[GitHub] [spark] JoshRosen commented on a change in pull request #34100: [SPARK-36835][FOLLOWUP][BUILD][TEST-HADOOP2.7] Fix maven issue for Hadoop 2.7 profile after enabling dependency reduced pom

JoshRosen commented on a change in pull request #34100:
URL: https://github.com/apache/spark/pull/34100#discussion_r715946127



##########
File path: pom.xml
##########
@@ -3273,7 +3273,7 @@
         <curator.version>2.7.1</curator.version>
         <commons-io.version>2.4</commons-io.version>
         <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
-        <hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
+        <hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>

Review comment:
       Ahhh, this is a clever fix:
   
   Instead of the `hadoop-2.7` profile resulting in a duplicate direct dependency on `hadoop-client`, we now just declare an explicit dependency on one of `hadoop-client`'s transitive dependencies (`hadoop-yarn-api` in this case). Anything which depends on `hadoop-client-runtime.artifact` must also depend on `hadoop-client-api.artifact`, so this doesn't end up changing the set of dependencies pulled in.
   
   It looks like we didn't need to do that for `hadoop-client-minicluster.artifact` because that's only used in the `resource-managers/yarn` POM and that's already using Maven profiles to control the dependency selection (so the other workaround is less invasive in that context). In principle, though, I guess we could have changed that to some other transitive dep.
   
   ---
   
   Could you maybe add a one or two line comment above these Hadoop 2.7 lines to explain what's going on? And maybe edit the comment at https://github.com/apache/spark/blob/d73562ed3635bb3454ac67029ca6541b30ae0c02/pom.xml#L251-L255 to reflect this change? This fix is clever but a little subtle, so I think a comment calling it out (and maybe mentioning SPARK-36835) might help future readers.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org