You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/24 21:24:53 UTC

[GitHub] [spark] MaxGekk opened a new pull request #35655: [WIP][SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

MaxGekk opened a new pull request #35655:
URL: https://github.com/apache/spark/pull/35655


   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   By running the existing test suites:
   ```
   $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *RowEncoderSuite"
   $ build/sbt -Phive -Phive-thriftserver "test:testOnly *HiveThriftBinaryServerSuite"
   $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly org.apache.spark.sql.hive.thriftserver.CliSuite"
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #35655:
URL: https://github.com/apache/spark/pull/35655#issuecomment-1050841710


   @sadikovi @cloud-fan @HyukjinKwon @gengliangwang Could you review this PR, please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a change in pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
sadikovi commented on a change in pull request #35655:
URL: https://github.com/apache/spark/pull/35655#discussion_r815502277



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/sources/TableScanSuite.scala
##########
@@ -420,4 +434,32 @@ class TableScanSuite extends DataSourceTest with SharedSparkSession {
     val comments = planned.schema.fields.map(_.getComment().getOrElse("NO_COMMENT")).mkString(",")
     assert(comments === "SN,SA,NO_COMMENT")
   }
+
+  test("SPARK-38315: control decoding of datetime as Java 8 classes") {
+    val tableName = "relationProviderWithLegacyTimestamps"
+    def scanTable(): Unit = {
+      withTable (tableName) {
+        sql(
+          s"""
+             |CREATE TABLE $tableName (col TIMESTAMP)
+             |USING org.apache.spark.sql.sources.LegacyTimestampSource
+             """.stripMargin)
+        checkAnswer(
+          spark.table(tableName),
+          Row(java.sql.Timestamp.valueOf("2020-01-01 12:34:56")) :: Nil)
+      }
+    }
+
+    withSQLConf(SQLConf.DATETIME_JAVA8API_ENABLED.key -> "true") {
+      val e = intercept[SparkException] {
+        scanTable()
+      }.getCause
+      assert(e.isInstanceOf[java.lang.RuntimeException])
+      assert(e.getMessage.contains("Error while encoding"))
+
+      withSQLConf(SQLConf.LEGACY_DATETIME_ENCODER_ENABLED.key -> "true") {

Review comment:
       Could you add a comment explaining that this test case is supposed to pass and should not throw any exceptions?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35655:
URL: https://github.com/apache/spark/pull/35655#discussion_r814818470



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/sources/TableScanSuite.scala
##########
@@ -109,6 +110,19 @@ case class AllDataTypesScan(
   }
 }
 
+class LegacyTimestampSource extends RelationProvider {

Review comment:
       I am not quite sure that I got your point. You proposed to add a config for `collect()` (which can control ONLY decoding, obviously) and to write a test using the DS write API? How (does `collect()` involve the write API)? Could you clarify, please.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35655:
URL: https://github.com/apache/spark/pull/35655#discussion_r814781040



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/sources/TableScanSuite.scala
##########
@@ -109,6 +110,19 @@ case class AllDataTypesScan(
   }
 }
 
+class LegacyTimestampSource extends RelationProvider {

Review comment:
       can we use the data source write API as an example? For read API, another fix we should do later is to let Spark accept both old and new datetime classes, which will make this test useless.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35655:
URL: https://github.com/apache/spark/pull/35655#discussion_r815039384



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -3083,6 +3083,17 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val LEGACY_DATETIME_ENCODER_ENABLED =
+    buildConf("spark.sql.legacy.datetimeEncoder.enabled")

Review comment:
       +1 for the new config name.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #35655:
URL: https://github.com/apache/spark/pull/35655


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35655:
URL: https://github.com/apache/spark/pull/35655#discussion_r815039921



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -3083,6 +3083,17 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val LEGACY_DATETIME_ENCODER_ENABLED =
+    buildConf("spark.sql.legacy.datetimeEncoder.enabled")
+      .doc("If the configuration property is set to true, java.sql.Timestamp and " +
+        "java.sql.Date are used as external types for " +
+        "Catalyst's TimestampType and DateType in rows encoding and decoding when the property " +
+        s"${DATETIME_JAVA8API_ENABLED.key} is true (otherwise it is ignored).")
+      .version("3.3.0")
+      .internal()

Review comment:
       For the new config name, we had better remove this `internal` because the parent config is exposed already.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35655:
URL: https://github.com/apache/spark/pull/35655#discussion_r814779179



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -3083,6 +3083,17 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val LEGACY_DATETIME_ENCODER_ENABLED =
+    buildConf("spark.sql.legacy.datetimeEncoder.enabled")

Review comment:
       How about `spark.sql.datetime.java8API.collect.enabled`? This should be a sub-config of `spark.sql.datetime.java8API.enabled`, which enables java8 datetime API everywhere, including data source, UDF and df.collect. It's pretty reasonable to have a sub-config of it to enable the datetime API in a specific place.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35655:
URL: https://github.com/apache/spark/pull/35655#discussion_r815081330



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -3083,6 +3083,17 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val LEGACY_DATETIME_ENCODER_ENABLED =
+    buildConf("spark.sql.legacy.datetimeEncoder.enabled")

Review comment:
       I wouldn't introduce new **public** config if it is needed only in corner cases. I especially put it into the **legacy** namespace because it is need during the migration new Spark version when a datasource connecter still doesn't support all possible Spark SQL types.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35655:
URL: https://github.com/apache/spark/pull/35655#discussion_r815082922



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -3083,6 +3083,17 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val LEGACY_DATETIME_ENCODER_ENABLED =
+    buildConf("spark.sql.legacy.datetimeEncoder.enabled")
+      .doc("If the configuration property is set to true, java.sql.Timestamp and " +
+        "java.sql.Date are used as external types for " +
+        "Catalyst's TimestampType and DateType in rows encoding and decoding when the property " +
+        s"${DATETIME_JAVA8API_ENABLED.key} is true (otherwise it is ignored).")
+      .version("3.3.0")
+      .internal()

Review comment:
       The "legacy" configs must be `internal`. Why it is a legacy one is another question.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35655:
URL: https://github.com/apache/spark/pull/35655#discussion_r815081330



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -3083,6 +3083,17 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val LEGACY_DATETIME_ENCODER_ENABLED =
+    buildConf("spark.sql.legacy.datetimeEncoder.enabled")

Review comment:
       I wouldn't introduce new **public** config if it is needed only in corner cases. I especially put it into the **legacy** namespace because it is need during the migration to new Spark version when a datasource connecter still doesn't support all possible Spark SQL types.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #35655: [SPARK-38315][SQL] Add a config to control decoding of datetime as Java 8 classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #35655:
URL: https://github.com/apache/spark/pull/35655#issuecomment-1066439449


   I am closing this because https://github.com/apache/spark/pull/35756 should solve the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org