You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "pan3793 (via GitHub)" <gi...@apache.org> on 2024/03/04 11:54:29 UTC

[PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

pan3793 opened a new pull request, #45372:
URL: https://github.com/apache/spark/pull/45372

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'common/utils/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   This PR aims to do a pre-test before the Apache Hive 2.3.10 official release, it includes
   
   - it should unblock Spark to upgrade to Guava 32+
   - it should unblock Spark to upgrade to Thrift 0.16+
   - it should allow Spark to drop Jackson 1.x while Hive UDF still works well
   - it should allow Spark to drop `javax.servlet`
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   Pre-integration test of Hive 2.3.10 before official release.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   Pass GA.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

Posted by "sunchao (via GitHub)" <gi...@apache.org>.
sunchao commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-1977151289

   Thanks, that's good to know. I was informed that I should consider the following JIRAs to. Plan to check soon.
   
   * https://issues.apache.org/jira/browse/HIVE-25665
   * https://issues.apache.org/jira/browse/HIVE-27466
   * https://issues.apache.org/jira/browse/HIVE-27467
   * https://issues.apache.org/jira/browse/HIVE-27468
   * https://issues.apache.org/jira/browse/HIVE-27478
   * https://issues.apache.org/jira/browse/HIVE-27504
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1534204469


##########
pom.xml:
##########
@@ -199,14 +197,14 @@
     <!-- org.apache.commons/commons-pool2/-->
     <commons-pool2.version>2.12.0</commons-pool2.version>
     <datanucleus-core.version>4.1.17</datanucleus-core.version>
-    <guava.version>14.0.1</guava.version>
+    <guava.version>33.0.0-jre</guava.version>

Review Comment:
   We actually cannot avoid exposing vanilla Guava to classpath, except to Spark internal usage, 3-rd party libs still require Guava to work, e.g. Curator, Hive. Then, should we continue to do shading after upgrading?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-2067976499

   also cc @wangyum @HyukjinKwon @yaooqinn 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1575125465


##########
project/SparkBuild.scala:
##########
@@ -952,7 +955,7 @@ object Unsafe {
 object DockerIntegrationTests {
   // This serves to override the override specified in DependencyOverrides:
   lazy val settings = Seq(
-    dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre"
+    dependencyOverrides += "com.google.guava" % "guava" % "33.1.0-jre"

Review Comment:
   Let me unblock this like 
   - #45088



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1512146209


##########
pom.xml:
##########
@@ -199,14 +197,14 @@
     <!-- org.apache.commons/commons-pool2/-->
     <commons-pool2.version>2.12.0</commons-pool2.version>
     <datanucleus-core.version>4.1.17</datanucleus-core.version>
-    <guava.version>14.0.1</guava.version>
+    <guava.version>33.0.0-jre</guava.version>

Review Comment:
   @pan3793 If we upgrade the version of Guava, and it is also packaged into the `newwork-shuffle module` through the shaded+relocation method, then we need to package `com.google.guava:failureaccess` at the same time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1512146209


##########
pom.xml:
##########
@@ -199,14 +197,14 @@
     <!-- org.apache.commons/commons-pool2/-->
     <commons-pool2.version>2.12.0</commons-pool2.version>
     <datanucleus-core.version>4.1.17</datanucleus-core.version>
-    <guava.version>14.0.1</guava.version>
+    <guava.version>33.0.0-jre</guava.version>

Review Comment:
   @pan3793 If we upgrade the version of Guava, and it is also packaged into the `newwork-shuffle` and `core` module through the shaded+relocation method, then we need to package `com.google.guava:failureaccess` at the same time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574062976


##########
sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java:
##########
@@ -16,7 +16,7 @@
  */
 package org.apache.spark.sql.hive.test;
 
-import org.apache.commons.lang.builder.HashCodeBuilder;
+import org.apache.commons.lang3.builder.HashCodeBuilder;

Review Comment:
   I think this can be submitted as an independent pr. Also, do you know why check-style didn't detect this issue?
   
   https://github.com/apache/spark/blob/9f34b8eca2f3578367352ec28532e29fe3a91864/dev/checkstyle.xml#L199-L202



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-1977981587

   happy to see this hive upgrade, thanks to @pan3793  and @sunchao 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574127899


##########
sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java:
##########
@@ -16,7 +16,7 @@
  */
 package org.apache.spark.sql.hive.test;
 
-import org.apache.commons.lang.builder.HashCodeBuilder;
+import org.apache.commons.lang3.builder.HashCodeBuilder;

Review Comment:
   opened https://github.com/apache/spark/pull/46154



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-2012818437

   @sunchao I checked the JIRA tickets you listed, all are related to LICENSE, I created those PRs except for the last one (obviously it has conflicts with previous patches), should be easy to review, please take a look when you have time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574094232


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:
##########
@@ -1626,6 +1625,7 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
     val testData = TestHive.getHiveFile("data/files/sample.json").toURI
+    val hiveVersion = "2.3.9" // TODO remove after Hive 2.3.10 jar available in maven central

Review Comment:
   Ditto.  Do you know why `DEFAULT_ARTIFACT_REPOSITORY` change isn't enough here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1575140785


##########
dev/test-dependencies.sh:
##########
@@ -49,7 +49,7 @@ OLD_VERSION=$($MVN -q \
     --non-recursive \
     org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E '[0-9]+\.[0-9]+\.[0-9]+')
 # dependency:get for guava and jetty-io are workaround for SPARK-37302.
-GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q -DforceStdout | grep -E "^[0-9.]+$")
+GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q -DforceStdout | grep -E "^[0-9\.]+")

Review Comment:
   Just a question. Why do we need this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1575536345


##########
dev/test-dependencies.sh:
##########
@@ -49,7 +49,7 @@ OLD_VERSION=$($MVN -q \
     --non-recursive \
     org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E '[0-9]+\.[0-9]+\.[0-9]+')
 # dependency:get for guava and jetty-io are workaround for SPARK-37302.
-GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q -DforceStdout | grep -E "^[0-9.]+$")
+GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q -DforceStdout | grep -E "^[0-9\.]+")

Review Comment:
   The recent Guava version has the suffix `-jre` or `-android`, the change is required to match the new version



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574143381


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:
##########
@@ -1627,10 +1627,8 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
     val testData = TestHive.getHiveFile("data/files/sample.json").toURI
     withTable("t") {
-      // hive-catalog-core has some transitive dependencies which dont exist on maven central

Review Comment:
   The "some transitive dependencies which dont exist on maven central" is "org.pentaho:pentaho-aggdesigner-algorithm", it's not a problem in Hive 2.3.10, see https://github.com/apache/hive/pull/4486



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574066564


##########
project/SparkBuild.scala:
##########
@@ -952,7 +955,7 @@ object Unsafe {
 object DockerIntegrationTests {
   // This serves to override the override specified in DependencyOverrides:
   lazy val settings = Seq(
-    dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre"
+    dependencyOverrides += "com.google.guava" % "guava" % "33.1.0-jre"

Review Comment:
   fine to me



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574093194


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -3748,7 +3748,7 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
     val sc = spark.sparkContext
-    val hiveVersion = "2.3.9"
+    val hiveVersion = "2.3.9" // TODO update to 2.3.10 after jar available in maven central

Review Comment:
   I'm not sure if this TODO is really required when we have a staging repository, @pan3793 .
   
   To be sure, we need to pass all the tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-2096652579

   Could you update the PR description with RC1 link, @pan3793 ?
   - https://lists.apache.org/thread/3ton9oxlkkb1v7tbqqw1rhjpz9y8ymv5


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-2070373629

   For Docker Integration test dependency, I made the following PR with @pan3793 's authorship. 
   - https://github.com/apache/spark/pull/46167


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574342250


##########
common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala:
##########
@@ -192,6 +192,13 @@ private[spark] object MavenUtils extends Logging {
       sys.env.getOrElse("DEFAULT_ARTIFACT_REPOSITORY", "https://repos.spark-packages.org/"))
     sp.setName("spark-packages")
     cr.add(sp)
+
+    val staging: IBiblioResolver = new IBiblioResolver
+    staging.setM2compatible(true)
+    staging.setUsepoms(true)
+    staging.setRoot("https://repository.apache.org/content/repositories/orgapachehive-1128/")
+    staging.setName("hive-staging-repo")

Review Comment:
   will be removed



##########
common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala:
##########
@@ -78,11 +78,12 @@ class MavenUtilsSuite
     val settings = new IvySettings
     val res1 = MavenUtils.createRepoResolvers(settings.getDefaultIvyUserDir)
     // should have central and spark-packages by default
-    assert(res1.getResolvers.size() === 4)
+    assert(res1.getResolvers.size() === 5)
     assert(res1.getResolvers.get(0).asInstanceOf[IBiblioResolver].getName === "local-m2-cache")
     assert(res1.getResolvers.get(1).asInstanceOf[FileSystemResolver].getName === "local-ivy-cache")
     assert(res1.getResolvers.get(2).asInstanceOf[IBiblioResolver].getName === "central")
     assert(res1.getResolvers.get(3).asInstanceOf[IBiblioResolver].getName === "spark-packages")
+    assert(res1.getResolvers.get(4).asInstanceOf[IBiblioResolver].getName === "hive-staging-repo")

Review Comment:
   will be removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574056705


##########
dev/deps/spark-deps-hadoop-3-hive-2.3:
##########
@@ -33,6 +33,7 @@ breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar
 breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar
 bundle/2.24.6//bundle-2.24.6.jar
 cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar
+checker-qual/3.42.0//checker-qual-3.42.0.jar

Review Comment:
   Is this a cascading dependency brought by the new version of Guava?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574059438


##########
project/SparkBuild.scala:
##########
@@ -952,7 +955,7 @@ object Unsafe {
 object DockerIntegrationTests {
   // This serves to override the override specified in DependencyOverrides:
   lazy val settings = Seq(
-    dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre"
+    dependencyOverrides += "com.google.guava" % "guava" % "33.1.0-jre"

Review Comment:
   Will similar configurations still be needed after this pr?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-1977230837

   Thank you for making progress, @pan3793 and @sunchao .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574065984


##########
project/SparkBuild.scala:
##########
@@ -952,7 +955,7 @@ object Unsafe {
 object DockerIntegrationTests {
   // This serves to override the override specified in DependencyOverrides:
   lazy val settings = Seq(
-    dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre"
+    dependencyOverrides += "com.google.guava" % "guava" % "33.1.0-jre"

Review Comment:
   This PR is mainly of PoC, so I just make minimal changes to verify it works. I think we can unify the Guava version in all places after upgrading Hive 2.3.10, in an independent PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574144603


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:
##########
@@ -1626,6 +1625,7 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
     val testData = TestHive.getHiveFile("data/files/sample.json").toURI
+    val hiveVersion = "2.3.9" // TODO remove after Hive 2.3.10 jar available in maven central

Review Comment:
   I searched `DEFAULT_ARTIFACT_REPOSITORY`, it looks like affects multiple places, I addressed the `ADD JAR` in another approach https://github.com/apache/spark/pull/45372/commits/ec9848f108ce529e5e4f2d55bc66169474a0eb95



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-2067976202

   @dongjoon-hyun @sunchao @LuciferYang the integration test with Hive RC0 passed, do you have other concerns or extra cases that need to be verified?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574116717


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:
##########
@@ -1626,6 +1625,7 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
     val testData = TestHive.getHiveFile("data/files/sample.json").toURI
+    val hiveVersion = "2.3.9" // TODO remove after Hive 2.3.10 jar available in maven central

Review Comment:
   I remember it did not work on my first try, let me try again



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-1977144850

   cc @sunchao, so far, the Hive branch-2.3 looks good on the Spark side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574116295


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -3748,7 +3748,7 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
     val sc = spark.sparkContext
-    val hiveVersion = "2.3.9"
+    val hiveVersion = "2.3.9" // TODO update to 2.3.10 after jar available in maven central

Review Comment:
   this one is actually irrelevant, see SPARK-47928(https://github.com/apache/spark/pull/46150)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-2068475033

   @dongjoon-hyun This one is mostly for verification, let's try any ideas on it. I plan to split it into several PRs once Hive 2.3.10 is officially released.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574066575


##########
sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java:
##########
@@ -16,7 +16,7 @@
  */
 package org.apache.spark.sql.hive.test;
 
-import org.apache.commons.lang.builder.HashCodeBuilder;
+import org.apache.commons.lang3.builder.HashCodeBuilder;

Review Comment:
   not sure, why checkstyle does not complain about that. let me fix it independently.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574066664


##########
dev/deps/spark-deps-hadoop-3-hive-2.3:
##########
@@ -33,6 +33,7 @@ breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar
 breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar
 bundle/2.24.6//bundle-2.24.6.jar
 cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar
+checker-qual/3.42.0//checker-qual-3.42.0.jar

Review Comment:
   Yes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-2096665513

   @dongjoon-hyun thanks for checking, updated


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org