You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "JoshRosen (via GitHub)" <gi...@apache.org> on 2023/08/21 21:50:17 UTC

[GitHub] [spark] JoshRosen commented on pull request #42493: [SPARK-44811][BUILD] Upgrade Guava to 32+

JoshRosen commented on PR #42493:
URL: https://github.com/apache/spark/pull/42493#issuecomment-1687092403

   Regarding the proposed removal of `IsolatedClientLoader`, I am concerned that doing so might break some users' ability to interface with very old Hive metastores:
   
   Although newer Hive client versions are compatible with a wider range of metastore versions, I am unsure whether they cover _all_ old versions. If users are relying on Hive 0.13, for example, then I am concerned that removing `IsolatedClientLoader` might prevent those users from being able to upgrade Spark without first upgrading their metastore.
   
   As was pointed out upthread in https://github.com/apache/spark/pull/42493#issuecomment-1678857243, though, those old Hive versions require Java 8. If Spark 4.0 drops JDK8 support then we'd already lose support for Hive < 2.0 using out-of-the-box Hive metastore clients.
   
   However, some users run forked versions of old Hive which _do_ support newer JDKs and for them the `IsolatedClientLoader` removal would create pains for their Spark 4.0 upgrade. Since those users are _already_ forking Hive, though, I suppose they could simply recompile their Hive fork against Spark's newer Guava version.
   
   ---
   
   That said, I wonder whether we can decouple the `IsolatedClientLoader` removal decision from the Guava upgrade by removing Guava from `sharedClasses` (previously discussed at  https://github.com/apache/spark/pull/33989#issuecomment-928616327). This could also prove useful in case we uncover some other reason why we might want to continue using `IsolatedClientLoader`.
   
   I think Guava would only need to be shared if we were passing instances of Guava classes across the classloader boundary, but I don't think we're doing that.
   
   I tried removing Guava from `sharedClasses` and used SBT to run `org.apache.spark.sql.hive.client.HiveClientSuites` and tests seem to pass. Given this, maybe we could try kicking off a full CI run to see if we get any failures there? I think it may be worth re-testing here in order to verify our assumptions about Guava and the `IsolatedClientLoader`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org