You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "LuciferYang (via GitHub)" <gi...@apache.org> on 2023/12/04 08:26:59 UTC

[PR] [SPARK-46237][SQL][TESTS] Fix test failed of `HiveDDLSuite` [spark]

LuciferYang opened a new pull request, #44153:
URL: https://github.com/apache/spark/pull/44153

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.

LuciferYang commented on code in PR #44153:
URL: https://github.com/apache/spark/pull/44153#discussion_r1413692187


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala:
##########
@@ -3270,7 +3270,10 @@ class HiveDDLSuite
       val jarName = "TestUDTF.jar"
       val jar = spark.asInstanceOf[TestHiveSparkSession].getHiveFile(jarName).toURI.toString
       spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
-        .foreach(spark.sparkContext.addedJars("default").remove)
+        .foreach(k => spark.sparkContext.addedJars.get("default") match {

Review Comment:
   fine to me ~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46237][SQL][TESTS] Fix test failed of `HiveDDLSuite` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.

LuciferYang commented on code in PR #44153:
URL: https://github.com/apache/spark/pull/44153#discussion_r1413534980


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala:
##########
@@ -3270,7 +3270,10 @@ class HiveDDLSuite
       val jarName = "TestUDTF.jar"
       val jar = spark.asInstanceOf[TestHiveSparkSession].getHiveFile(jarName).toURI.toString
       spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
-        .foreach(spark.sparkContext.addedJars("default").remove)
+        .foreach(k => spark.sparkContext.addedJars.get("default") match {

Review Comment:
   @HyukjinKwon 
   
   I noticed that this logic was added in https://github.com/apache/spark/pull/41495. If I'm not mistaken, this is a defensive deletion to avoid including `TestUDTF.jar` in `default` during testing? However, I found that running only this test:
   
   ```
   build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
   ```
   
   throws an error:
   
   ```
   [info] - SPARK-34261: Avoid side effect if create exists temporary function *** FAILED *** (4 milliseconds)
   [info]   java.util.NoSuchElementException: key not found: default
   ...
   ```
   
   I manually printed the contents of `spark.sparkContext.addedJars` and it's an empty Map.
   
   But when I execute:
   
   ```
   build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.SQLQuerySuite org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
   ```
   
   all tests pass, and the content of `spark.sparkContext.addedJars` is:
   
   ```
   Map(default -> Map(spark://localhost:54875/jars/SPARK-21101-1.0.jar -> 1701676986594, spark://localhost:54875/jars/hive-contrib-2.3.9.jar -> 1701676944590, spark://localhost:54875/jars/TestUDTF.jar -> 1701676921340))
   ```
   
   In GitHub Action tests, `SQLQuerySuite` does indeed execute before `HiveDDLSuite`, so the failure is not reproduced.
   
   So in this PR, I added a `case match` to only execute `remove` when `default` is not `None`.
   
   I didn't directly clear all the contents in `spark.sparkContext.allAddedJars` after `SQLQuerySuite` in this PR, because I'm not sure about the impact of this behavior.
   
   
   Do you think this is ok? Or do you have any better suggestions?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun commented on PR #44153:
URL: https://github.com/apache/spark/pull/44153#issuecomment-1839065283

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46237][SQL][TESTS] Fix test failed of `HiveDDLSuite` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.

LuciferYang commented on code in PR #44153:
URL: https://github.com/apache/spark/pull/44153#discussion_r1413534980


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala:
##########
@@ -3270,7 +3270,10 @@ class HiveDDLSuite
       val jarName = "TestUDTF.jar"
       val jar = spark.asInstanceOf[TestHiveSparkSession].getHiveFile(jarName).toURI.toString
       spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
-        .foreach(spark.sparkContext.addedJars("default").remove)
+        .foreach(k => spark.sparkContext.addedJars.get("default") match {

Review Comment:
   @HyukjinKwon 
   
   I noticed that this logic was added in https://github.com/apache/spark/pull/41495. If I'm not mistaken, this is a defensive deletion to avoid including `TestUDTF.jar` in `default` during testing? However, I found that running only this test:
   
   ```
   build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
   ```
   
   throws an error:
   
   ```
   [info] - SPARK-34261: Avoid side effect if create exists temporary function *** FAILED *** (4 milliseconds)
   [info]   java.util.NoSuchElementException: key not found: default
   [info]   at scala.collection.MapOps.default(Map.scala:274)
   [info]   at scala.collection.MapOps.default$(Map.scala:273)
   [info]   at scala.collection.AbstractMap.default(Map.scala:405)
   [info]   at scala.collection.MapOps.apply(Map.scala:176)
   [info]   at scala.collection.MapOps.apply$(Map.scala:175)
   [info]   at scala.collection.AbstractMap.apply(Map.scala:405)
   [info]   at org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275)
   [info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256)
   [info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254)
   [info]   at org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326)
   [info]   at org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267)
   ```
   
   I manually printed the contents of `spark.sparkContext.addedJars` and it's an empty Map.
   
   But when I execute:
   
   ```
   build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.SQLQuerySuite org.apache.spark.sql.hive.execution.HiveDDLSuite" -Phive
   ```
   
   all tests pass, and the content of `spark.sparkContext.addedJars` is:
   
   ```
   Map(default -> Map(spark://localhost:54875/jars/SPARK-21101-1.0.jar -> 1701676986594, spark://localhost:54875/jars/hive-contrib-2.3.9.jar -> 1701676944590, spark://localhost:54875/jars/TestUDTF.jar -> 1701676921340))
   ```
   
   In GitHub Action tests, `SQLQuerySuite` does indeed execute before `HiveDDLSuite`, so the failure is not reproduced.
   
   So in this PR, I added a `case match` to only execute `remove` when `default` is not `None`.
   
   I didn't directly clear all the contents in `spark.sparkContext.allAddedJars` after `SQLQuerySuite` in this PR, because I'm not sure about the impact of this behavior.
   
   
   Do you think this is ok? Or do you have any better suggestions?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.

LuciferYang commented on PR #44153:
URL: https://github.com/apache/spark/pull/44153#issuecomment-1839065769

   Thanks @dongjoon-hyun @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun closed pull request #44153: [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable
URL: https://github.com/apache/spark/pull/44153


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #44153:
URL: https://github.com/apache/spark/pull/44153#discussion_r1413660735


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala:
##########
@@ -3270,7 +3270,10 @@ class HiveDDLSuite
       val jarName = "TestUDTF.jar"
       val jar = spark.asInstanceOf[TestHiveSparkSession].getHiveFile(jarName).toURI.toString
       spark.sparkContext.allAddedJars.keys.find(_.contains(jarName))
-        .foreach(spark.sparkContext.addedJars("default").remove)
+        .foreach(k => spark.sparkContext.addedJars.get("default") match {

Review Comment:
   The change seems fine. Can we do as below though?
   
   ```scala
   foreach(k => spark.sparkContext.addedJars.get("default").foreach(_.remove(k)))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org