Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/28 02:45:14 UTC

[GitHub] [spark] JoshRosen edited a comment on pull request #33989: [SPARK-36676][SQL][BUILD] Create shaded Hive module and upgrade Guava version to 30.1.1-jre

JoshRosen edited a comment on pull request #33989:
URL: https://github.com/apache/spark/pull/33989#issuecomment-928616164


   I think the test failures seen above are due to the tests not picking up the shaded classes: I wouldn't expect to see references to non-relocated Guava class names if the shaded Hive were actually being used by the tests.
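   
   As a point of reference, here's a quick (purely hypothetical) way to confirm which Guava the test JVM actually resolves; the `org.sparkproject.guava` relocation prefix below is an assumption on my part, not necessarily the prefix the `hive-shaded` module uses:
   
   ```scala
   // Hypothetical diagnostic, e.g. pasted into spark-shell or a scratch test:
   // load a Guava class and print which jar supplied it.
   val unshaded = Class.forName("com.google.common.base.Preconditions")
   println(unshaded.getProtectionDomain.getCodeSource.getLocation)
   
   // If the shaded Hive classes were actually on the test classpath, we'd expect the
   // relocated name (assumed prefix) to be the one resolving inside the Hive client:
   // Class.forName("org.sparkproject.guava.base.Preconditions")
   ```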
   
   The SBT build doesn't perform any shading (because the `sbt-pom-reader` plugin only captures the project dependency structure, not build plugins). To date this hasn't been a problem, since Spark's existing shading was only for the benefit of downstream consumers of Spark itself and wasn't necessary for Spark's internal functioning (and official releases are published with Maven, not SBT). With this new `hive-shaded` module, though, we'll need to find a way to wire the shaded classes into the SBT build.
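   
   For context, the relocation in the Maven build is handled by the shade plugin during packaging; the closest SBT analogue I know of is sbt-assembly's shade rules, so the SBT-side wiring might look roughly like the sketch below (illustrative only; the relocation prefix and where these settings would attach are assumptions):
   
   ```scala
   // Rough sketch of SBT-side shading via sbt-assembly's ShadeRule API; this is not what
   // SparkBuild.scala does today, and the relocation prefix is an assumption.
   import sbt._
   import sbtassembly._
   import sbtassembly.AssemblyPlugin.autoImport._
   
   lazy val hiveShadedSettings: Seq[Setting[_]] = Seq(
     assembly / assemblyShadeRules := Seq(
       ShadeRule.rename("com.google.common.**" -> "org.sparkproject.guava.@1").inAll
     )
   )
   ```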
   
   I think we'll also run into similar problems in the Maven build. According to [Maven's build lifecycle docs](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html):
   
   > `test` - test the compiled source code using a suitable unit testing framework. These tests should not require the code be packaged or deployed
   
   The shading runs during the `package` phase, which comes after the `test` phase, so the tests won't be able to use the shaded artifacts.
   
   The `hive-shaded` module doesn't depend on any source code from Spark. Given this, a clean path forward might be to publish this Maven artifact separately from Spark's main build, then depend on the published artifact. This would be similar to the approach taken in https://github.com/apache/hbase-thirdparty and https://github.com/apache/hadoop-thirdparty . This would require a bunch of new infra setup, though, so if we're considering pursuing this then we should open a new thread on the dev@ mailing list to discuss.
   
   For now, you could test out that approach via manual local publishing: that'd unblock progress on testing the `IsolatedClientLoader` and other things.
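   
   Concretely, that manual workaround might look something like the following (a sketch only: the module path, artifact coordinates, and version are placeholders, not the real ones from this PR):
   
   ```scala
   // build.sbt-style sketch. Assumes something like `build/mvn -pl <path-to-hive-shaded> install`
   // has already published the shaded jar to the local ~/.m2 repository; the coordinates below
   // are placeholders, not the actual artifact name/version.
   resolvers += Resolver.mavenLocal
   
   libraryDependencies += "org.apache.spark" % "spark-hive-shaded" % "1.0.0-SNAPSHOT"
   ```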
   
   Why did the tests pass before? It looks like your most recent commit (c2ec1f8122ea9b871a1679571f0b20009be39cb7) included the `DependencyOverrides` version bump in SparkBuild.scala as well as the change to the `IsolatedClientLoader`. Prior to the `DependencyOverrides` change, the SBT build was using both unshaded Hive and the old Guava version, so I think the previous successful tests were passing accidentally and that the new failures aren't caused by the `IsolatedClientLoader` change.
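   
   (For anyone following along: the `DependencyOverrides` bump is the SBT-side mechanism that pins the Guava version across all modules. A rough sketch of what that looks like in project/SparkBuild.scala, with the caveat that the surrounding structure may differ:)
   
   ```scala
   import sbt._
   import sbt.Keys._
   
   // Sketch of the kind of override in project/SparkBuild.scala. Before this change the
   // pinned version was an old Guava release, which is why the earlier runs exercised
   // unshaded Hive together with old Guava and happened to pass.
   object DependencyOverrides {
     lazy val settings = Seq(
       dependencyOverrides += "com.google.guava" % "guava" % "30.1.1-jre")
   }
   ```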


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


