Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/12 05:39:43 UTC

[GitHub] [spark] LuciferYang opened a new pull request, #37487: [SPARK-40053][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

LuciferYang opened a new pull request, #37487:
URL: https://github.com/apache/spark/pull/37487

   ### What changes were proposed in this pull request?
   https://github.com/apache/spark/blob/d4c58159925133771d305cc7ac4f1248f215812c/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala#L269-L273
   
   The `testingVersions` filter shown above seems out of date. If we run the following command with Java 8 and Python 2.7.x:
   
   ```
   build/mvn clean test -Phadoop-3 -Phive -pl sql/hive -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite
   ```
   
   `testingVersions` will include 2.4.8, 3.0.3, 3.1.3, 3.2.2, and 3.3.0, and the test will be `ABORTED` instead of `CANCELED`, because the spark-submit failure makes the nested suite throw.
   
   So this PR makes the following changes to the version filter:
   
   - Ignore tests of Spark 2.x, since Spark 2.4.8 is already EOL.
   - Require Python 3.7+ for testing, because the documentation states `Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.`
   
   After this PR, if the test Python version is below 3.7, the tests in `HiveExternalCatalogVersionsSuite` will be ignored.
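   
   For illustration, a minimal sketch of the new filter logic (simplified from the actual change in `HiveExternalCatalogVersionsSuite.scala`; `isPythonVersionAtLeast37` is the `TestUtils` helper this PR introduces):
   
   ```scala
   // Keep only 3.x release lines, and only when the local Python
   // meets the documented 3.7+ minimum.
   versions
     .filter(v => v.startsWith("3") && isPythonVersionAtLeast37)
   ```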
   
   ### Why are the changes needed?
   `HiveExternalCatalogVersionsSuite` should not test Spark 2.4.8 and should not be `ABORTED` when Python is unavailable.
   
   ### Does this PR introduce _any_ user-facing change?
   No, this is a test-only change.
   
   
   ### How was this patch tested?
   
   - Pass GitHub Actions
   - Manual test with Java 8 while python3 is unavailable:
   
   ```
   build/mvn clean test -Phadoop-3 -Phive -pl sql/hive -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite
   ```
   
   **Before**
   
   ```
   org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED ***
     Exception encountered when invoking run on a nested suite - spark-submit returned with exit code 1.
     Command line: '/home/disk0/spark-source/spark/sql/hive/target/tmp/test-spark-075ffae9-27ce-40bb-82d9-247651ccb6fb/spark-2.4.8/bin/spark-submit' '--name' 'prepare testing tables' '--master' 'local[2]' '--conf' 'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' '--conf' 'spark.sql.hive.metastore.version=1.2' '--conf' 'spark.sql.hive.metastore.jars=maven' '--conf' 'spark.sql.warehouse.dir=/home/disk0/spark-source/spark/sql/hive/target/tmp/warehouse-f885ee00-27c4-4db2-92a0-3d6131275c8b' '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' '-Dderby.system.home=/home/disk0/spark-source/spark/sql/hive/target/tmp/warehouse-f885ee00-27c4-4db2-92a0-3d6131275c8b -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNN
 AMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED' '/home/disk0/spark-source/spark/sql/hive/target/tmp/test6906586602292435904.py'
     
     2022-08-11 20:10:23.407 - stderr> 22/08/12 11:10:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
     2022-08-11 20:10:23.671 - stdout> Traceback (most recent call last):
     2022-08-11 20:10:23.672 - stdout>   File "/home/disk0/spark-source/spark/sql/hive/target/tmp/test6906586602292435904.py", line 2, in <module>
     2022-08-11 20:10:23.672 - stdout>     from pyspark.sql import SparkSession
     2022-08-11 20:10:23.672 - stdout>   File "/home/disk0/spark-source/spark/sql/hive/target/tmp/test-spark-075ffae9-27ce-40bb-82d9-247651ccb6fb/spark-2.4.8/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
     2022-08-11 20:10:23.672 - stdout>   File "/home/disk0/spark-source/spark/sql/hive/target/tmp/test-spark-075ffae9-27ce-40bb-82d9-247651ccb6fb/spark-2.4.8/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
     2022-08-11 20:10:23.672 - stdout>   File "/home/disk0/spark-source/spark/sql/hive/target/tmp/test-spark-075ffae9-27ce-40bb-82d9-247651ccb6fb/spark-2.4.8/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
     2022-08-11 20:10:23.672 - stdout>   File "/home/disk0/spark-source/spark/sql/hive/target/tmp/test-spark-075ffae9-27ce-40bb-82d9-247651ccb6fb/spark-2.4.8/python/lib/pyspark.zip/pyspark/serializers.py", line 72, in <module>
     2022-08-11 20:10:23.672 - stdout>   File "/home/disk0/spark-source/spark/sql/hive/target/tmp/test-spark-075ffae9-27ce-40bb-82d9-247651ccb6fb/spark-2.4.8/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 246, in <module>
     2022-08-11 20:10:23.672 - stdout>   File "/home/disk0/spark-source/spark/sql/hive/target/tmp/test-spark-075ffae9-27ce-40bb-82d9-247651ccb6fb/spark-2.4.8/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 270, in CloudPickler
     2022-08-11 20:10:23.672 - stdout> NameError: name 'memoryview' is not defined
     2022-08-11 20:10:23.682 - stderr> log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
     2022-08-11 20:10:23.682 - stderr> log4j:WARN Please initialize the log4j system properly.
     2022-08-11 20:10:23.682 - stderr> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. (SparkSubmitTestUtils.scala:96)
   ```
   
   **After**
   ```
   HiveExternalCatalogVersionsSuite:
   11:55:34.369 ERROR org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: Python version <  3.7.0, the running environment is unavailable.
   - backward compatibility !!! CANCELED !!!
     PROCESS_TABLES.isPythonVersionAtLeast37 was false (HiveExternalCatalogVersionsSuite.scala:240)
   11:55:34.528 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: 
   
   ===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.hive.HiveExternalCatalogVersionsSuite, threads: Keep-Alive-Timer (daemon=true) =====
   
   Run completed in 2 seconds, 87 milliseconds.
   Total number of tests run: 0
   Suites: completed 2, aborted 0
   Tests: succeeded 0, failed 0, canceled 1, ignored 0, pending 0
   ```
   




[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r946296623


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -201,7 +201,11 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     // scalastyle:on line.size.limit
 
     if (PROCESS_TABLES.testingVersions.isEmpty) {
-      logError("Fail to get the latest Spark versions to test.")
+      if (PROCESS_TABLES.isPythonVersionAtLeast37) {
+        logError("Fail to get the latest Spark versions to test.")
+      } else {
+        logError("Python version <  3.7.0, the running environment is unavailable.")

Review Comment:
   Yes, you understood correctly; this is exactly what I want to discuss:
   
   Should we only check the availability of python3, regardless of the version, or should we specify the minimum available version?
   
   In this PR I set the condition under which `HiveExternalCatalogVersionsSuite` is allowed to run to `isPythonVersionAtLeast37`, because the documentation states [Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.](https://github.com/apache/spark/blob/master/docs/index.md), so I also made `Python version < 3.7.0` explicit in this message. Under this condition, users testing with Python 3.0 ~ Python 3.6 will clearly know that the test was ignored because the Python version is too low, not because Python 3 is unavailable.
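   
   For illustration, a hypothetical sketch (not the actual `TestUtils` code) of how a version probe can distinguish "python3 missing" from "python3 too old":
   
   ```scala
   import scala.sys.process._
   import scala.util.Try
   
   // Returns None when python3 cannot be executed at all, otherwise the
   // parsed (major, minor) pair. `python3 --version` prints e.g. "Python 3.6.8".
   def pythonVersion: Option[(Int, Int)] = Try {
     val out = Seq("python3", "--version").!!.trim
     val Array(major, minor) =
       out.stripPrefix("Python ").split('.').take(2).map(_.toInt)
     (major, minor)
   }.toOption
   
   // None         => python3 unavailable
   // Some((3, 6)) => available, but below the documented 3.7 minimum
   ```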
   
   
   
   
   
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r944130393


##########
core/src/main/scala/org/apache/spark/TestUtils.scala:
##########
@@ -281,9 +281,13 @@ private[spark] object TestUtils {
     attempt.isSuccess && attempt.get == 0
   }
 
-  def isPythonVersionAtLeast38(): Boolean = {
+  def isPythonVersionAtLeast37(): Boolean = isPythonVersionAtLeast(3, 7, 0)
+
+  def isPythonVersionAtLeast38(): Boolean = isPythonVersionAtLeast(3, 8, 0)

Review Comment:
   I kept this method, although it is no longer called after this PR.
   
   





[GitHub] [spark] dongjoon-hyun commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore EOL versions test and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1213771174

   We keep them for older branch testing, but Apache Spark 3.1.1 was also released on March 2, 2021, and will reach EOL soon, on September 2nd.
   
   So, we can focus on Apache Spark 3.4/3.3/3.2.






[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1213666921

   **The PR description is out of date; I will update it once the code is finalized.**
   
   




[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r944130024


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -267,7 +273,7 @@ object PROCESS_TABLES extends QueryTest with SQLTestUtils {
       case NonFatal(_) => Nil
     }
     versions
-      .filter(v => v.startsWith("3") || !TestUtils.isPythonVersionAtLeast38())
+      .filter(v => v.startsWith("3") && isPythonVersionAtLeast37)

Review Comment:
   Should we strictly restrict the test Python version to 3.7 or higher? I manually tested Python 3.6 with Spark 3.3, and it also succeeded.





[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1212749951

   cc @HyukjinKwon @dongjoon-hyun @srowen 




[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r945066651


##########
core/src/main/scala/org/apache/spark/TestUtils.scala:
##########
@@ -281,9 +281,13 @@ private[spark] object TestUtils {
     attempt.isSuccess && attempt.get == 0
   }
 
-  def isPythonVersionAtLeast38(): Boolean = {
+  def isPythonVersionAtLeast37(): Boolean = isPythonVersionAtLeast(3, 7, 0)
+
+  def isPythonVersionAtLeast38(): Boolean = isPythonVersionAtLeast(3, 8, 0)

Review Comment:
   [21ce6c7](https://github.com/apache/spark/pull/37487/commits/21ce6c7fbaf51caa52d00c19e197600c574cd69a) removes this.





[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore EOL versions test and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r945090727


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -249,11 +254,18 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
 }
 
 object PROCESS_TABLES extends QueryTest with SQLTestUtils {
+  val isPythonVersionAtLeast37 = TestUtils.isPythonVersionAtLeast37()
   val releaseMirror = sys.env.getOrElse("SPARK_RELEASE_MIRROR",
     "https://dist.apache.org/repos/dist/release")
   // Tests the latest version of every release line.
-  val testingVersions: Seq[String] = {
+  val testingVersions: Seq[String] = if (isPythonVersionAtLeast37) {
     import scala.io.Source
+    // SPARK-40053: This set needs to be redefined when there are version releases or EOL.
+    val testableVersions = if (SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)) {
+      Set("3.2, 3.3")
+    } else {
+      Set("3.1, 3.2, 3.3")

Review Comment:
   ```
   versions.filter(v => !v.startsWith("2.4") && !v.startsWith("3.0"))
       .filter(v => !(v.startsWith("3.1") && SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)))
   ```
   
   If EOL versions are not deleted from the data source, it seems that conditions like `!v.startsWith("2.4") && !v.startsWith("3.0")` will keep accumulating; this is similar to:
   
   ```scala
   // needs to be redefined eolVersions when there are EOL
   val eolVersions = Set("2.4", "3.0")
   versions.filterNot(v => eolVersions.exists(v.startsWith))
      .filter(v => !(v.startsWith("3.1") && SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)))
   ```
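   
   For illustration, a quick REPL check of the `eolVersions` approach against a hypothetical version list:
   
   ```scala
   scala> val versions = Seq("2.4.8", "3.0.3", "3.1.2", "3.2.2", "3.3.0")
   val versions: Seq[String] = List(2.4.8, 3.0.3, 3.1.2, 3.2.2, 3.3.0)
   
   scala> val eolVersions = Set("2.4", "3.0")
   val eolVersions: Set[String] = Set(2.4, 3.0)
   
   scala> versions.filterNot(v => eolVersions.exists(v.startsWith))
   val res0: Seq[String] = List(3.1.2, 3.2.2, 3.3.0)
   ```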





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r946200456


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -201,7 +201,11 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     // scalastyle:on line.size.limit
 
     if (PROCESS_TABLES.testingVersions.isEmpty) {
-      logError("Fail to get the latest Spark versions to test.")
+      if (PROCESS_TABLES.isPythonVersionAtLeast37) {
+        logError("Fail to get the latest Spark versions to test.")
+      } else {
+        logError("Python version <  3.7.0, the running environment is unavailable.")

Review Comment:
   It seems that this PR shows `Python version <  3.7.0` when there is no python executable. Did I understand correctly?







[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1214587140

   @dongjoon-hyun @srowen @HyukjinKwon I have added two similar cases that need `assume(isPythonAvailable)`, updated the PR description, and will continue checking for any omissions. I will complete all the work in this PR to ensure that the command
   
   ```
   mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
   ```
   
   can succeed even when the Python 3 runtime environment is unavailable.
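   
   For context, a minimal sketch of the `assume` guard pattern (hypothetical suite name and availability probe; the PR's actual guards live in the affected test suites and use `TestUtils` helpers):
   
   ```scala
   import org.scalatest.funsuite.AnyFunSuite
   
   class PythonDependentSuite extends AnyFunSuite {
     // Hypothetical probe; returns false when python3 cannot be executed.
     private def isPythonAvailable: Boolean =
       scala.util.Try(scala.sys.process.Process(Seq("python3", "--version")).! == 0)
         .getOrElse(false)
   
     test("backward compatibility") {
       // `assume` throws TestCanceledException when the condition is false,
       // so the test is reported as CANCELED instead of failing or
       // aborting the whole suite.
       assume(isPythonAvailable, "python3 runtime is unavailable")
       // ... test body that launches spark-submit with a Python script ...
     }
   }
   ```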
   
   




[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1244978022

   > We keep them for older branch testing, but Apache Spark 3.1.1 was also released on March 2, 2021 and will be EOL soon on September 2nd.
   > 
   > So, we can focus on Apache Spark 3.4/3.3/3.2 from September.
   
   Should we also remove 3.1.x from dist.apache.org?




[GitHub] [spark] dongjoon-hyun commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Simplify `testingVersions` filter conditions and add Python version assume for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1214247833

   Thank you so much, @srowen !




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r945048394


##########
core/src/main/scala/org/apache/spark/TestUtils.scala:
##########
@@ -281,9 +281,13 @@ private[spark] object TestUtils {
     attempt.isSuccess && attempt.get == 0
   }
 
-  def isPythonVersionAtLeast38(): Boolean = {
+  def isPythonVersionAtLeast37(): Boolean = isPythonVersionAtLeast(3, 7, 0)
+
+  def isPythonVersionAtLeast38(): Boolean = isPythonVersionAtLeast(3, 8, 0)

Review Comment:
   Let's remove if this isn't used now in the codebase





[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r946297189


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -201,7 +201,11 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     // scalastyle:on line.size.limit
 
     if (PROCESS_TABLES.testingVersions.isEmpty) {
-      logError("Fail to get the latest Spark versions to test.")
+      if (PROCESS_TABLES.isPythonVersionAtLeast37) {
+        logError("Fail to get the latest Spark versions to test.")
+      } else {
+        logError("Python version <  3.7.0, the running environment is unavailable.")

Review Comment:
   I'm not sure whether we need to make this restriction. If not, I can weaken this condition.





[GitHub] [spark] srowen commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by "srowen (via GitHub)" <gi...@apache.org>.
srowen commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1427258422

   Yeah I think that's fine now. I can do it, let me remember how




[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1214899365

   > @dongjoon-hyun @srowen @HyukjinKwon I have added two similar cases that need `assume(isPythonAvailable)`, updated the PR description, and will continue checking for any omissions. I will complete all the work in this PR to ensure that the command
   > 
   > ```
   > mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
   > ```
   > 
   > can succeed even when the Python 3 runtime environment is unavailable.
   
   ```
   [INFO] ------------------------------------------------------------------------
   [INFO] Reactor Summary for Spark Project Parent POM 3.4.0-SNAPSHOT:
   [INFO] 
   [INFO] Spark Project Parent POM ........................... SUCCESS [  3.790 s]
   [INFO] Spark Project Tags ................................. SUCCESS [ 30.902 s]
   [INFO] Spark Project Sketch ............................... SUCCESS [ 24.123 s]
   [INFO] Spark Project Local DB ............................. SUCCESS [ 11.130 s]
   [INFO] Spark Project Networking ........................... SUCCESS [01:02 min]
   [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 16.631 s]
   [INFO] Spark Project Unsafe ............................... SUCCESS [ 12.289 s]
   [INFO] Spark Project Launcher ............................. SUCCESS [  8.497 s]
   [INFO] Spark Project Core ................................. SUCCESS [33:12 min]
   [INFO] Spark Project ML Local Library ..................... SUCCESS [ 33.611 s]
   [INFO] Spark Project GraphX ............................... SUCCESS [03:27 min]
   [INFO] Spark Project Streaming ............................ SUCCESS [06:19 min]
   [INFO] Spark Project Catalyst ............................. SUCCESS [13:03 min]
   [INFO] Spark Project SQL .................................. SUCCESS [  01:43 h]
   [INFO] Spark Project ML Library ........................... SUCCESS [24:16 min]
   [INFO] Spark Project Tools ................................ SUCCESS [ 10.749 s]
   [INFO] Spark Project Hive ................................. SUCCESS [  01:40 h]
   [INFO] Spark Project REPL ................................. SUCCESS [01:50 min]
   [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 13.176 s]
   [INFO] Spark Project YARN ................................. SUCCESS [10:20 min]
   [INFO] Spark Project Mesos ................................ SUCCESS [01:07 min]
   [INFO] Spark Project Kubernetes ........................... SUCCESS [03:52 min]
   [INFO] Spark Project Hive Thrift Server ................... SUCCESS [30:29 min]
   [INFO] Spark Ganglia Integration .......................... SUCCESS [ 13.533 s]
   [INFO] Spark Project Hadoop Cloud Integration ............. SUCCESS [ 30.778 s]
   [INFO] Spark Project Assembly ............................. SUCCESS [ 33.197 s]
   [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 48.943 s]
   [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [02:26 min]
   [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [35:31 min]
   [INFO] Spark Kinesis Integration .......................... SUCCESS [01:31 min]
   [INFO] Spark Project Examples ............................. SUCCESS [44:00 min]
   [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 14.179 s]
   [INFO] Spark Avro ......................................... SUCCESS [04:29 min]
   [INFO] Spark Project Kinesis Assembly ..................... SUCCESS [ 17.180 s]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  07:06 h
   [INFO] Finished at: 2022-08-15T17:34:05+08:00
   [INFO] ------------------------------------------------------------------------
   ```
   
   The build passed as expected.




[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r945075217


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -249,11 +254,18 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
 }
 
 object PROCESS_TABLES extends QueryTest with SQLTestUtils {
+  val isPythonVersionAtLeast37 = TestUtils.isPythonVersionAtLeast37()
   val releaseMirror = sys.env.getOrElse("SPARK_RELEASE_MIRROR",
     "https://dist.apache.org/repos/dist/release")
   // Tests the latest version of every release line.
-  val testingVersions: Seq[String] = {
+  val testingVersions: Seq[String] = if (isPythonVersionAtLeast37) {
     import scala.io.Source
+    // SPARK-40053: This set needs to be redefined when there are version releases or EOL.
+    val testableVersions = if (SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)) {

Review Comment:
   @srowen In order not to test the EOL versions, I made this change instead of multiple filters; it is maybe a little clumsy. Is it OK?





[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r946299837


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -201,7 +201,11 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     // scalastyle:on line.size.limit
 
     if (PROCESS_TABLES.testingVersions.isEmpty) {
-      logError("Fail to get the latest Spark versions to test.")
+      if (PROCESS_TABLES.isPythonVersionAtLeast37) {
+        logError("Fail to get the latest Spark versions to test.")
+      } else {
+        logError("Python version <  3.7.0, the running environment is unavailable.")

Review Comment:
   @dongjoon-hyun In addition, are you worried that the string `3.7.0` here is not easy to maintain?





[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1226860560

   Thanks @HyukjinKwon @dongjoon-hyun @srowen 




[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore EOL versions test and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r945089983


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -249,11 +254,18 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
 }
 
 object PROCESS_TABLES extends QueryTest with SQLTestUtils {
+  val isPythonVersionAtLeast37 = TestUtils.isPythonVersionAtLeast37()
   val releaseMirror = sys.env.getOrElse("SPARK_RELEASE_MIRROR",
     "https://dist.apache.org/repos/dist/release")
   // Tests the latest version of every release line.
-  val testingVersions: Seq[String] = {
+  val testingVersions: Seq[String] = if (isPythonVersionAtLeast37) {
     import scala.io.Source
+    // SPARK-40053: This set needs to be redefined when there are version releases or EOL.
+    val testableVersions = if (SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)) {
+      Set("3.2, 3.3")
+    } else {
+      Set("3.1, 3.2, 3.3")

Review Comment:
   Reverted to:
   ```scala
   versions.filter(v => !v.startsWith("2.4") && !v.startsWith("3.0"))
       .filter(v => !(v.startsWith("3.1") && SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)))
   ```
   
   Manual test:
   
   ```scala
   scala> val versions = Seq("2.4.8", "3.0.3", "3.1.2", "3.2.2", "3.3.0")
   val versions: Seq[String] = List(2.4.8, 3.0.3, 3.1.2, 3.2.2, 3.3.0)
   
   // SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17) is true
   scala> versions.filter(v => !v.startsWith("2.4") && !v.startsWith("3.0"))
        |   .filter(v => !(v.startsWith("3.1") && true))
   val res0: Seq[String] = List(3.2.2, 3.3.0)
   
   // SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17) is false
   scala> versions.filter(v => !v.startsWith("2.4") && !v.startsWith("3.0"))
        |       .filter(v => !(v.startsWith("3.1") && false))
   val res1: Seq[String] = List(3.1.2, 3.2.2, 3.3.0)
   ```





[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r945065704


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -267,7 +273,7 @@ object PROCESS_TABLES extends QueryTest with SQLTestUtils {
       case NonFatal(_) => Nil
     }
     versions
-      .filter(v => v.startsWith("3") || !TestUtils.isPythonVersionAtLeast38())
+      .filter(v => v.startsWith("3") && isPythonVersionAtLeast37)
       .filter(v => v.startsWith("3") || !SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9))

Review Comment:
   `.filter(v => v.startsWith("3") || !SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9))`
   
   After thinking about it, I will delete this condition because it will not filter out any data: every version remaining after the previous filter already starts with "3", so this predicate is always true.
   





[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1213648353

   > If the idea is to test only vs supported versions, yes that's fine, and 3.0.x is also EOL
   
   So 3.0.x doesn't need to be tested either, right?
   
   




[GitHub] [spark] srowen commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
srowen commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1245314592

   Maybe; I mean, we haven't really clearly said there won't be a 3.1.2. I'd wait until the end of the year.




[GitHub] [spark] srowen commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by "srowen (via GitHub)" <gi...@apache.org>.
srowen commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1427330042

   Done




[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1427257688

   Should we remove 3.1.x from dist.apache.org? @srowen 




[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1221862289

   Friendly ping @HyukjinKwon @dongjoon-hyun @srowen, should we continue this PR?
   
   




[GitHub] [spark] HyukjinKwon commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1226859513

   Merged to master.




[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore EOL versions test and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1213769704

   If it can be deleted from the data source, that would be better.






[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r951033711


##########
core/src/main/scala/org/apache/spark/TestUtils.scala:
##########
@@ -281,9 +281,19 @@ private[spark] object TestUtils {
     attempt.isSuccess && attempt.get == 0
   }
 
-  def isPythonVersionAtLeast38(): Boolean = {
+  // SPARK-40053: This string needs to be updated when the
+  // minimum python supported version changes.
+  val minimumPythonSupportedVersion: String = "3.7.0"
+
+  def isPythonVersionAvailable: Boolean = {

Review Comment:
   https://github.com/apache/spark/blob/cf1a80eeae8bf815270fb39568b1846c2bd8d437/sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala#L108
   
   Should we replace `TestUtils.testCommandAvailable(pythonExec)` with `TestUtils.isPythonVersionAvailable`?
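   
   For context, a minimal sketch of how `isPythonVersionAvailable` could be built on the existing helper (assumed body; the diff above is truncated, and the actual implementation may differ):
   
   ```scala
   // SPARK-40053: This string needs to be updated when the
   // minimum supported Python version changes.
   val minimumPythonSupportedVersion: String = "3.7.0"
   
   // Hypothetical sketch: parse the minimum version string and delegate
   // to the existing three-argument check.
   def isPythonVersionAvailable: Boolean = {
     val Array(major, minor, micro) =
       minimumPythonSupportedVersion.split('.').map(_.toInt)
     isPythonVersionAtLeast(major, minor, micro)
   }
   ```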
   
   





[GitHub] [spark] HyukjinKwon closed pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment
URL: https://github.com/apache/spark/pull/37487




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore EOL versions test and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r945085339


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -249,11 +254,18 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
 }
 
 object PROCESS_TABLES extends QueryTest with SQLTestUtils {
+  val isPythonVersionAtLeast37 = TestUtils.isPythonVersionAtLeast37()
   val releaseMirror = sys.env.getOrElse("SPARK_RELEASE_MIRROR",
     "https://dist.apache.org/repos/dist/release")
   // Tests the latest version of every release line.
-  val testingVersions: Seq[String] = {
+  val testingVersions: Seq[String] = if (isPythonVersionAtLeast37) {
     import scala.io.Source
+    // SPARK-40053: This set needs to be redefined when there are version releases or EOL.
+    val testableVersions = if (SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)) {
+      Set("3.2, 3.3")
+    } else {
+      Set("3.1, 3.2, 3.3")

Review Comment:
   So, does this mean we should add this for every release? I think we should better have a filter.





[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore EOL versions test and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r945089983


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -249,11 +254,18 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
 }
 
 object PROCESS_TABLES extends QueryTest with SQLTestUtils {
+  val isPythonVersionAtLeast37 = TestUtils.isPythonVersionAtLeast37()
   val releaseMirror = sys.env.getOrElse("SPARK_RELEASE_MIRROR",
     "https://dist.apache.org/repos/dist/release")
   // Tests the latest version of every release line.
-  val testingVersions: Seq[String] = {
+  val testingVersions: Seq[String] = if (isPythonVersionAtLeast37) {
     import scala.io.Source
+    // SPARK-40053: This set needs to be redefined when there are version releases or EOL.
+    val testableVersions = if (SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)) {
+      Set("3.2, 3.3")
+    } else {
+      Set("3.1, 3.2, 3.3")

Review Comment:
   Revert to:
   ```scala
   versions.filter(v => !v.startsWith("2.4") && !v.startsWith("3.0"))
     .filter(v => !(v.startsWith("3.1") && SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)))
   ```
   
   Manual test:
   
   ```
   scala> val versions = Seq("2.4.8", "3.0.3", "3.1.2", "3.2.2", "3.3.0")
   val versions: Seq[String] = List(2.4.8, 3.0.3, 3.1.2, 3.2.2, 3.3.0)
   
   // SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17) is true
   scala> versions.filter(v => !v.startsWith("2.4") && !v.startsWith("3.0"))
        |   .filter(v => !(v.startsWith("3.1") && true))
   val res0: Seq[String] = List(3.2.2, 3.3.0)
   
   // SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17) is false
   scala> versions.filter(v => !v.startsWith("2.4") && !v.startsWith("3.0"))
        |       .filter(v => !(v.startsWith("3.1") && false))
   val res1: Seq[String] = List(3.1.2, 3.2.2, 3.3.0)
   ```
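   
   For reference, a self-contained sketch of the same filter (assuming commons-lang3 is on the classpath; `VersionFilterDemo` and the version list are illustrative):
   
   ```scala
   import org.apache.commons.lang3.{JavaVersion, SystemUtils}
   
   object VersionFilterDemo {
     def main(args: Array[String]): Unit = {
       val versions = Seq("2.4.8", "3.0.3", "3.1.2", "3.2.2", "3.3.0")
       // Drop the EOL 2.4.x/3.0.x lines, and drop 3.1.x when running on Java 17+.
       val testable = versions
         .filter(v => !v.startsWith("2.4") && !v.startsWith("3.0"))
         .filter(v => !(v.startsWith("3.1") && SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)))
       println(testable) // Java 17+: List(3.2.2, 3.3.0); Java 8/11: List(3.1.2, 3.2.2, 3.3.0)
     }
   }
   ```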





[GitHub] [spark] dongjoon-hyun commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1213302457

   Thank you for pinging me, @LuciferYang.




[GitHub] [spark] srowen commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore EOL versions test and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
srowen commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1214153247

   I removed the 2.4.x and 3.0.x releases from dist.apache.org




[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore test of Spark 2.x and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r945074317


##########
core/src/main/scala/org/apache/spark/TestUtils.scala:
##########
@@ -281,9 +281,13 @@ private[spark] object TestUtils {
     attempt.isSuccess && attempt.get == 0
   }
 
-  def isPythonVersionAtLeast38(): Boolean = {
+  def isPythonVersionAtLeast37(): Boolean = isPythonVersionAtLeast(3, 7, 0)
+
+  def isPythonVersionAtLeast38(): Boolean = isPythonVersionAtLeast(3, 8, 0)

Review Comment:
   @HyukjinKwon I just have a question, not related to this PR:
   
   https://github.com/apache/spark/blob/1a1e4938f97ecedbb99a440df27d365439a9a621/sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala#L108
   
   Should the above code call `TestUtils.isPythonVersionAtLeast37()` instead? For example:
   
   ```
   private lazy val isPythonAvailable: Boolean = TestUtils.isPythonVersionAtLeast37()
   ```
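   
   Separately, a hypothetical standalone sketch of such a version check (not necessarily how Spark's `TestUtils` implements it; it assumes a `python3` executable on the PATH):
   
   ```scala
   import scala.sys.process._
   import scala.util.Try
   
   // Parse "Python X.Y.Z" from `python3 --version` and compare numerically;
   // any failure (missing binary, unparsable output) counts as unavailable.
   def isPythonVersionAtLeast(major: Int, minor: Int, micro: Int): Boolean =
     Try("python3 --version".!!.trim).toOption.flatMap { line =>
       """(\d+)\.(\d+)\.(\d+)""".r.findFirstMatchIn(line).map { m =>
         val current = (m.group(1).toInt, m.group(2).toInt, m.group(3).toInt)
         Ordering[(Int, Int, Int)].gteq(current, (major, minor, micro))
       }
     }.getOrElse(false)
   ```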
   
    





[GitHub] [spark] LuciferYang commented on a diff in pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore EOL versions test and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #37487:
URL: https://github.com/apache/spark/pull/37487#discussion_r945090727


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##########
@@ -249,11 +254,18 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
 }
 
 object PROCESS_TABLES extends QueryTest with SQLTestUtils {
+  val isPythonVersionAtLeast37 = TestUtils.isPythonVersionAtLeast37()
   val releaseMirror = sys.env.getOrElse("SPARK_RELEASE_MIRROR",
     "https://dist.apache.org/repos/dist/release")
   // Tests the latest version of every release line.
-  val testingVersions: Seq[String] = {
+  val testingVersions: Seq[String] = if (isPythonVersionAtLeast37) {
     import scala.io.Source
+    // SPARK-40053: This set needs to be redefined when there are version releases or EOL.
+    val testableVersions = if (SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)) {
+      Set("3.2, 3.3")
+    } else {
+      Set("3.1, 3.2, 3.3")

Review Comment:
   ```
   versions.filter(v => !v.startsWith("2.4") && !v.startsWith("3.0"))
       .filter(v => !(v.startsWith("3.1") && SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)))
   ```
   
   If EOL versions are not deleted from the data source, conditions like `!v.startsWith("2.4") && !v.startsWith("3.0")` will keep accumulating; an equivalent, more maintainable form would be
   
   ```scala
   // needs to be updated whenever a release line reaches EOL
   val eolVersions = Set("2.4", "3.0")
   versions.filterNot(v => eolVersions.exists(v.startsWith))
      .filter(v => !(v.startsWith("3.1") && SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)))
   ```
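   
   A quick REPL check of the `filterNot` variant (version list illustrative):
   
   ```
   scala> val eolVersions = Set("2.4", "3.0")
   val eolVersions: Set[String] = Set(2.4, 3.0)
   
   scala> Seq("2.4.8", "3.0.3", "3.1.2", "3.2.2", "3.3.0").filterNot(v => eolVersions.exists(v.startsWith))
   val res0: Seq[String] = List(3.1.2, 3.2.2, 3.3.0)
   ```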





[GitHub] [spark] LuciferYang commented on pull request #37487: [SPARK-40053][CORE][SQL][TESTS] Ignore EOL versions test and restrict test Python version for `HiveExternalCatalogVersionsSuite`

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37487:
URL: https://github.com/apache/spark/pull/37487#issuecomment-1214168035

   [9966eeb](https://github.com/apache/spark/pull/37487/commits/9966eeb881a1ed6e03faf683344b2493ca32a62c) simplifies the filter conditions; it only keeps the rule that Spark 3.1 is not tested when running on Java 17.
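   
   Presumably the remaining filter is just the following (an inference from the commit description above, not a quote of the commit):
   
   ```scala
   versions.filter(v => !(v.startsWith("3.1") && SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17)))
   ```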

