You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/30 08:40:54 UTC

[GitHub] [spark] sadikovi opened a new pull request, #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

sadikovi opened a new pull request, #36726:
URL: https://github.com/apache/spark/pull/36726

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   This PR adds support for TimestampNTZ (TIMESTAMP WITHOUT TIME ZONE) in JDBC data source. It also introduces a new configuration option `inferTimestampNTZType` which allows to read written timestamps as timestamp without time zone. By default this is set to `false`, i.e. all timestamps are read as legacy timestamp type.
   
   Here is the state of timestamp without time zone support in the built-in dialects:
   - H2: timestamp without time zone, seems to map to timestamp type
   - Derby: only has timestamp type
   - MySQL: only has timestamp type
   - Postgres: has timestamp without time zone, which maps to timestamp
   - SQL Server: only datetime/datetime2, neither are time zone aware
   - Oracle: seems to only have timestamp and timestamp with time zone
   - Teradata: similar to Oracle but I could not verify
   - DB2: has TIMESTAMP WITHOUT TIME ZONE but I could not make this type work in my test, only TIMESTAMP seems to work
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   Adds support for the new TimestampNTZ type, see https://issues.apache.org/jira/browse/SPARK-35662.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   JDBC data source is now capable of writing and reading TimestampNTZ types. When reading timestamp values, configuration option `inferTimestampNTZType` allows to infer those values as TIMESTAMP WITHOUT TIME ZONE. By default the option is set to `false` so the behaviour is unchanged and all timestamps are read TIMESTAMP WITH LOCAL TIME ZONE.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   
   I added a unit test to ensure the general functionality works. I also manually verified the write/read test for TimestampNTZ in the following databases (all I could get access to):
   - H2, `jdbc:h2:mem:testdb0`
   - Derby, `jdbc:derby:<filepath>`
   - MySQL, `docker run --name mysql -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=db -e MYSQL_USER=user -e MYSQL_PASSWORD=secret -p 3306:3306 -d mysql:5.7`, `jdbc:mysql://127.0.0.1:3306/db?user=user&password=secret`
   - PostgreSQL, `docker run -d --name postgres -e POSTGRES_PASSWORD=secret -e POSTGRES_USER=user -e POSTGRES_DB=db -p 5432:5432 postgres:12.11`, `jdbc:postgresql://127.0.0.1:5432/db?user=user&password=secret`
   - SQL Server, `docker run -e "ACCEPT_EULA=Y" -e SA_PASSWORD='yourStrong(!)Password' -p 1433:1433 -d mcr.microsoft.com/mssql/server:2019-CU15-ubuntu-20.04`, `jdbc:sqlserver://127.0.0.1:1433;user=sa;password=yourStrong(!)Password`
   - DB2, ` docker run -itd --name mydb2 --privileged=true -p 50000:50000 -e LICENSE=accept -e DB2INST1_PASSWORD=secret -e DBNAME=db ibmcom/db2`, `jdbc:db2://127.0.0.1:50000/db:user=db2inst1;password=secret;`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r894185741


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -1879,5 +1880,53 @@ class JDBCSuite extends QueryTest
     val fields = schema.fields
     assert(fields.length === 1)
     assert(fields(0).dataType === StringType)
-   }
+  }
+
+  test("SPARK-39339: Handle TimestampNTZType null values") {
+    val tableName = "timestamp_ntz_null_table"
+
+    val df = Seq(null.asInstanceOf[LocalDateTime]).toDF("col1")
+
+    df.write.format("jdbc")
+      .option("url", urlWithUserAndPass)
+      .option("dbtable", tableName).save()
+
+    val res = spark.read.format("jdbc")
+      .option("inferTimestampNTZType", "true")
+      .option("url", urlWithUserAndPass)
+      .option("dbtable", tableName)
+      .load()
+
+    checkAnswer(res, Seq(Row(null)))
+  }
+
+  test("SPARK-39339: TimestampNTZType with different local time zones") {
+    val tableName = "timestamp_ntz_diff_tz_support_table"
+
+    DateTimeTestUtils.outstandingTimezonesIds.foreach { timeZone =>
+      withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> timeZone) {
+        Seq(
+          "1972-07-04 03:30:00",
+          "2019-01-20 12:00:00.502",
+          "2019-01-20T00:00:00.123456",
+          "1500-01-20T00:00:00.123456"
+        ).foreach { case datetime =>
+          val df = spark.sql(s"select timestamp_ntz '$datetime'")
+          df.write.format("jdbc")
+            .mode("overwrite")
+            .option("url", urlWithUserAndPass)
+            .option("dbtable", tableName)
+            .save()
+
+          val res = spark.read.format("jdbc")
+            .option("inferTimestampNTZType", "true")
+            .option("url", urlWithUserAndPass)
+            .option("dbtable", tableName)
+            .load()

Review Comment:
   The test case still read and write to JDBC with the same time zone.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1141738160

   @beliefer Can you review this PR from JDBC perspective? I think you have contributed extensively to this part of the code. Also, cc @gengliangwang.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1140942369

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r887638039


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -150,6 +150,9 @@ object JdbcUtils extends Logging with SQLConfHelper {
       case StringType => Option(JdbcType("TEXT", java.sql.Types.CLOB))
       case BinaryType => Option(JdbcType("BLOB", java.sql.Types.BLOB))
       case TimestampType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))
+      // Most of the databases either don't support TIMESTAMP WITHOUT TIME ZONE or map it to
+      // TIMESTAMP type. This will be overwritten in dialects.
+      case TimestampNTZType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))

Review Comment:
   This is a common use case of treating TIMESTAMP as timestamp without time zone. JDBC dialects can override this setting if need be. For example, SQL Server uses DATETIME instead. I have verified that most of the jdbc data sources work fine with TIMESTAMP. 
   
   I left a comment above to explain.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1154742475

   I think we can't support timestamp ntz with the option. 
   We should let JDBC dialect to decide how to supports timestamp ntz.
   If one table have ts1 is timestamp and ts2 is timestamp ntz, what is the output when we specify the `inferTimestampNTZType` option ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r887638039


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -150,6 +150,9 @@ object JdbcUtils extends Logging with SQLConfHelper {
       case StringType => Option(JdbcType("TEXT", java.sql.Types.CLOB))
       case BinaryType => Option(JdbcType("BLOB", java.sql.Types.BLOB))
       case TimestampType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))
+      // Most of the databases either don't support TIMESTAMP WITHOUT TIME ZONE or map it to
+      // TIMESTAMP type. This will be overwritten in dialects.
+      case TimestampNTZType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))

Review Comment:
   This is a generic case. JDBC dialects can override this setting if they want to. I confirm that most of them work fine with TIMESTAMP.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] 1104056452 commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
1104056452 commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1141086440

   Can someone help review this patch?thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1145652717

   @gengliangwang Can you review?
   
   Thanks to the comments from the reviewers, I noticed that there could be inconsistencies when writing timestamp with local time zone and reading as timestamp_ntz. For example, writing 2020-01-01 00:00:00 in local time zone (UTC+1), timestamp_ntz will be read as 2019-12-31 23:00:00. This is because we always store timestamp in UTC and we don't have information on what time zone was used when writing timestamp.
   
   Could you advise on how to proceed? I am not sure we can do much about it because we don't store time zone information.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1154752252

   @beliefer Maybe we can address your concerns in the follow-up work, what do you think? We can open a follow-up ticket and try to polish the implementation - it is not perfect by any means!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1154692745

   Thanks, merging to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1154737984

   @beliefer Your test case is testing ORC, while this PR is about JDBC...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r887638039


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -150,6 +150,9 @@ object JdbcUtils extends Logging with SQLConfHelper {
       case StringType => Option(JdbcType("TEXT", java.sql.Types.CLOB))
       case BinaryType => Option(JdbcType("BLOB", java.sql.Types.BLOB))
       case TimestampType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))
+      // Most of the databases either don't support TIMESTAMP WITHOUT TIME ZONE or map it to
+      // TIMESTAMP type. This will be overwritten in dialects.
+      case TimestampNTZType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))

Review Comment:
   This is a common use case of treating TIMESTAMP as timestamp without time zone. JDBC dialects can override this setting if need be. For example, SQL Server uses DATETIME instead. I have verified that most of the jdbc data sources work fine with TIMESTAMP. 
   
   I am going to update the comment to elaborate in more detail.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1140874400

   @gengliangwang @beliefer Can you review this PR? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r885186409


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -472,6 +481,15 @@ object JdbcUtils extends Logging with SQLConfHelper {
           row.update(pos, null)
         }
 
+    case TimestampNTZType =>
+      (rs: ResultSet, row: InternalRow, pos: Int) =>
+        val t = rs.getTimestamp(pos + 1)
+        if (t != null) {
+          row.setLong(pos, DateTimeUtils.fromJavaTimestamp(t))
+        } else {
+          row.update(pos, null)
+        }

Review Comment:
   Seems same as `TimestampType` branch, should we merge them?
   
   https://github.com/apache/spark/blob/8bbbdb5a4667d90b27b202870cd73c9a19b74781/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L466-L473
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r885115376


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala:
##########
@@ -226,6 +226,9 @@ class JDBCOptions(
   // The prefix that is added to the query sent to the JDBC database.
   // This is required to support some complex queries with some JDBC databases.
   val prepareQuery = parameters.get(JDBC_PREPARE_QUERY).map(_ + " ").getOrElse("")
+
+  // Infers timestamp values as TimestampNTZ type when reading data.
+  val inferTimestampNTZType = parameters.getOrElse(JDBC_INFER_TIMESTAMP_NTZ, "false").toBoolean

Review Comment:
   Yes, I thought about it, let's ask @gengliangwang.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1154735759

   > Thanks, merging to master
   
   I update this test case and it will fail !
   ```
     test("SPARK-37463: read/write Timestamp ntz to Orc with different time zone") {
       DateTimeTestUtils.withDefaultTimeZone(DateTimeTestUtils.LA) {
         val sqlText = """
                         |select
                         | timestamp_ntz '2021-06-01 00:00:00' ts_ntz1,
                         | timestamp_ntz '1883-11-16 00:00:00.0' as ts_ntz2,
                         | timestamp_ntz '2021-03-14 02:15:00.0' as ts_ntz3
                         |""".stripMargin
   
         withTempPath { dir =>
           val path = dir.getCanonicalPath
           val df = sql(sqlText)
   
           df.write.mode("overwrite").orc(path)
   
           val query = s"select * from `orc`.`$path`"
   
           DateTimeTestUtils.outstandingZoneIds.foreach { zoneId =>
             DateTimeTestUtils.withDefaultTimeZone(zoneId) {
               withAllNativeOrcReaders {
                 checkAnswer(sql(query), df)
               }
             }
           }
         }
       }
     }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] codecov-commenter commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1141997915

   # [Codecov](https://codecov.io/gh/apache/spark/pull/36726?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#36726](https://codecov.io/gh/apache/spark/pull/36726?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (8bbbdb5) into [master](https://codecov.io/gh/apache/spark/commit/f45cdda2cd55012fab85a17baec71f5ab637c400?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f45cdda) will **decrease** coverage by `10.40%`.
   > The diff coverage is `95.00%`.
   
   > :exclamation: Current head 8bbbdb5 differs from pull request most recent head 3cd7eb4. Consider uploading reports for the commit 3cd7eb4 to get more accurate results
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #36726       +/-   ##
   ===========================================
   - Coverage   86.89%   76.49%   -10.41%     
   ===========================================
     Files         259      217       -42     
     Lines       57743    51963     -5780     
     Branches     9156     8543      -613     
   ===========================================
   - Hits        50177    39748    -10429     
   - Misses       6277    11159     +4882     
   + Partials     1289     1056      -233     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `76.47% <95.00%> (-10.40%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/spark/pull/36726?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [python/pyspark/pandas/window.py](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL3dpbmRvdy5weQ==) | `88.88% <ø> (-0.07%)` | :arrow_down: |
   | [python/pyspark/sql/session.py](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3Bhcmsvc3FsL3Nlc3Npb24ucHk=) | `60.62% <33.33%> (-7.56%)` | :arrow_down: |
   | [python/pyspark/pandas/groupby.py](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2dyb3VwYnkucHk=) | `95.39% <93.93%> (-0.17%)` | :arrow_down: |
   | [python/pyspark/pandas/generic.py](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL2dlbmVyaWMucHk=) | `87.94% <100.00%> (-0.07%)` | :arrow_down: |
   | [python/pyspark/pandas/series.py](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL3Nlcmllcy5weQ==) | `94.81% <100.00%> (+0.02%)` | :arrow_up: |
   | [python/pyspark/pandas/spark/functions.py](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL3NwYXJrL2Z1bmN0aW9ucy5weQ==) | `95.12% <100.00%> (+0.83%)` | :arrow_up: |
   | [python/pyspark/pandas/tests/test\_groupby.py](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL3Rlc3RzL3Rlc3RfZ3JvdXBieS5weQ==) | `95.30% <100.00%> (+0.06%)` | :arrow_up: |
   | [python/pyspark/pandas/tests/test\_series.py](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvcGFuZGFzL3Rlc3RzL3Rlc3Rfc2VyaWVzLnB5) | `96.36% <100.00%> (+0.01%)` | :arrow_up: |
   | [python/pyspark/join.py](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3Bhcmsvam9pbi5weQ==) | `12.12% <0.00%> (-81.82%)` | :arrow_down: |
   | [python/pyspark/ml/tuning.py](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3B5c3BhcmsvbWwvdHVuaW5nLnB5) | `25.03% <0.00%> (-67.46%)` | :arrow_down: |
   | ... and [100 more](https://codecov.io/gh/apache/spark/pull/36726/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/spark/pull/36726?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/spark/pull/36726?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [1e194d2...3cd7eb4](https://codecov.io/gh/apache/spark/pull/36726?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r885186409


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -472,6 +481,15 @@ object JdbcUtils extends Logging with SQLConfHelper {
           row.update(pos, null)
         }
 
+    case TimestampNTZType =>
+      (rs: ResultSet, row: InternalRow, pos: Int) =>
+        val t = rs.getTimestamp(pos + 1)
+        if (t != null) {
+          row.setLong(pos, DateTimeUtils.fromJavaTimestamp(t))
+        } else {
+          row.update(pos, null)
+        }

Review Comment:
   Same as `TimestampType` branch, should we merge them?
   
   https://github.com/apache/spark/blob/8bbbdb5a4667d90b27b202870cd73c9a19b74781/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L466-L473
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r885267600


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -472,6 +481,15 @@ object JdbcUtils extends Logging with SQLConfHelper {
           row.update(pos, null)
         }
 
+    case TimestampNTZType =>
+      (rs: ResultSet, row: InternalRow, pos: Int) =>
+        val t = rs.getTimestamp(pos + 1)
+        if (t != null) {
+          row.setLong(pos, DateTimeUtils.fromJavaTimestamp(t))
+        } else {
+          row.update(pos, null)
+        }

Review Comment:
   Yes, we can merge. I will do that, thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang closed pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
gengliangwang closed pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source
URL: https://github.com/apache/spark/pull/36726


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1152675607

   @gengliangwang I made a few small changes. Can you review again? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r889503602


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -1879,5 +1880,53 @@ class JDBCSuite extends QueryTest
     val fields = schema.fields
     assert(fields.length === 1)
     assert(fields(0).dataType === StringType)
-   }
+  }
+
+  test("SPARK-39339: Handle TimestampNTZType null values") {
+    val tableName = "timestamp_ntz_null_table"
+
+    val df = Seq(null.asInstanceOf[LocalDateTime]).toDF("col1")
+
+    df.write.format("jdbc")
+      .option("url", urlWithUserAndPass)
+      .option("dbtable", tableName).save()
+
+    val res = spark.read.format("jdbc")
+      .option("inferTimestampNTZType", "true")
+      .option("url", urlWithUserAndPass)
+      .option("dbtable", tableName)
+      .load()
+
+    checkAnswer(res, Seq(Row(null)))
+  }
+
+  test("SPARK-39339: TimestampNTZType with different local time zones") {
+    val tableName = "timestamp_ntz_diff_tz_support_table"
+
+    DateTimeTestUtils.outstandingTimezonesIds.foreach { timeZone =>
+      withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> timeZone) {
+        Seq(
+          "1972-07-04 03:30:00",
+          "2019-01-20 12:00:00.502",
+          "2019-01-20T00:00:00.123456",
+          "1500-01-20T00:00:00.123456"
+        ).foreach { case datetime =>
+          val df = spark.sql(s"select timestamp_ntz '$datetime'")
+          df.write.format("jdbc")
+            .mode("overwrite")
+            .option("url", urlWithUserAndPass)
+            .option("dbtable", tableName)
+            .save()
+
+          val res = spark.read.format("jdbc")
+            .option("inferTimestampNTZType", "true")
+            .option("url", urlWithUserAndPass)
+            .option("dbtable", tableName)
+            .load()

Review Comment:
   This test case always read/write with the same time zone. You can reference https://github.com/apache/spark/blob/e410d98f57750080ad46932cc9211d2cf5154c24/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala#L808



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1145676931

   @sadikovi yes will do. I just moved home. Sorry for the late reply.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r887638039


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -150,6 +150,9 @@ object JdbcUtils extends Logging with SQLConfHelper {
       case StringType => Option(JdbcType("TEXT", java.sql.Types.CLOB))
       case BinaryType => Option(JdbcType("BLOB", java.sql.Types.BLOB))
       case TimestampType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))
+      // Most of the databases either don't support TIMESTAMP WITHOUT TIME ZONE or map it to
+      // TIMESTAMP type. This will be overwritten in dialects.
+      case TimestampNTZType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))

Review Comment:
   This is a generic case. JDBC dialects can override this setting if they want to. I confirm that most of them work fine with TIMESTAMP. See the inline comment in the code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1154747718

   I think we can. JDBC dialects can configure how they map TimestampNTZ type.
   In the case you mentioned, both timestamps will be read as timestamp_ntz in MySQL and Postgres. In fact, the current timestamp type is stored as timestamp_ntz in those database systems.
   
   Even with dialects managing timestamp_ntz writes and reads, this would be the same problem unless you store them as different types.
   
   Also, the test passes in master:
   ```
   [ivan.sadikov@C02DV1TGMD6R spark-oss (master)]$ git log -n1
   commit 2349175e1b81b0a61e1ed90c2d051c01cf78de9b (HEAD -> master, upstream/master)
   Author: Ivan Sadikov <iv...@databricks.com>
   Date:   Mon Jun 13 21:22:15 2022 -0700
   
       [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source
       
   
   
   
   [info] JDBCSuite:
   05:53:04.233 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   05:53:07.936 ERROR org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider: Failed to load built-in provider.
   [info] - SPARK-39339: Handle TimestampNTZType null values (1 second, 555 milliseconds)
   [info] - SPARK-39339: TimestampNTZType with different local time zones (4 seconds, 48 milliseconds)
   05:53:14.022 WARN org.apache.spark.sql.jdbc.JDBCSuite: 
   
   ===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.jdbc.JDBCSuite, threads: Timer-2 (daemon=true), rpc-boss-3-1 (daemon=true), shuffle-boss-6-1 (daemon=true) =====
   [info] Run completed in 11 seconds, 724 milliseconds.
   [info] Total number of tests run: 2
   [info] Suites: completed 1, aborted 0
   [info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
   [info] All tests passed.
   [success] Total time: 81 s (01:21), completed Jun 14, 2022 5:53:14 AM
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1154742731

   > I updated the test case as you suggested and it passes on my machine. Can you share the error message? It also passed the build.
   
   ```
   == Results ==
   !== Correct Answer - 1 ==                                           == Spark Answer - 1 ==
    struct<TIMESTAMP_NTZ '1500-01-20 00:00:00.123456':timestamp_ntz>   struct<TIMESTAMP_NTZ '1500-01-20 00:00:00.123456':timestamp_ntz>
   ![1500-01-20T00:00:00.123456]                                       [1500-01-20T00:16:08.123456]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1154740681

   I updated the test case as you suggested and it passes on my machine. Can you share the error message? It also passed the build.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r884667716


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala:
##########
@@ -226,6 +226,9 @@ class JDBCOptions(
   // The prefix that is added to the query sent to the JDBC database.
   // This is required to support some complex queries with some JDBC databases.
   val prepareQuery = parameters.get(JDBC_PREPARE_QUERY).map(_ + " ").getOrElse("")
+
+  // Infers timestamp values as TimestampNTZ type when reading data.
+  val inferTimestampNTZType = parameters.getOrElse(JDBC_INFER_TIMESTAMP_NTZ, "false").toBoolean

Review Comment:
   Should we maybe check if `spark.sql.timestampType` is `TIMESTAMP_NTZ` if `inferTimestampNTZType` is not set? That's how CSV type inference and Python type inference do.



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala:
##########
@@ -226,6 +226,9 @@ class JDBCOptions(
   // The prefix that is added to the query sent to the JDBC database.
   // This is required to support some complex queries with some JDBC databases.
   val prepareQuery = parameters.get(JDBC_PREPARE_QUERY).map(_ + " ").getOrElse("")
+
+  // Infers timestamp values as TimestampNTZ type when reading data.
+  val inferTimestampNTZType = parameters.getOrElse(JDBC_INFER_TIMESTAMP_NTZ, "false").toBoolean

Review Comment:
   cc @gengliangwang FYI



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r887615192


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -1879,5 +1880,27 @@ class JDBCSuite extends QueryTest
     val fields = schema.fields
     assert(fields.length === 1)
     assert(fields(0).dataType === StringType)
-   }
+  }
+
+  test("SPARK-39339: TimestampNTZType support") {
+    val tableName = "timestamp_ntz_table"

Review Comment:
   Could you add tests write/read timestamp_ntz with different time zone ? I doubt the result is not correct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r887638399


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -1879,5 +1880,27 @@ class JDBCSuite extends QueryTest
     val fields = schema.fields
     assert(fields.length === 1)
     assert(fields(0).dataType === StringType)
-   }
+  }
+
+  test("SPARK-39339: TimestampNTZType support") {
+    val tableName = "timestamp_ntz_table"

Review Comment:
   Timestamp NTZ type is independent of the time zone. Sure, I can add a test to confirm.



##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -1879,5 +1880,27 @@ class JDBCSuite extends QueryTest
     val fields = schema.fields
     assert(fields.length === 1)
     assert(fields(0).dataType === StringType)
-   }
+  }
+
+  test("SPARK-39339: TimestampNTZType support") {
+    val tableName = "timestamp_ntz_table"

Review Comment:
   Timestamp NTZ type is independent of the time zone with timestamp rebased to UTC. Sure, I can add a test to confirm.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r887631084


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -150,6 +150,9 @@ object JdbcUtils extends Logging with SQLConfHelper {
       case StringType => Option(JdbcType("TEXT", java.sql.Types.CLOB))
       case BinaryType => Option(JdbcType("BLOB", java.sql.Types.BLOB))
       case TimestampType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))
+      // Most of the databases either don't support TIMESTAMP WITHOUT TIME ZONE or map it to
+      // TIMESTAMP type. This will be overwritten in dialects.
+      case TimestampNTZType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))

Review Comment:
   We cannot do this. We should let JDBC dialect decide how to do the mapping.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
sadikovi commented on code in PR #36726:
URL: https://github.com/apache/spark/pull/36726#discussion_r889742470


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -1879,5 +1880,53 @@ class JDBCSuite extends QueryTest
     val fields = schema.fields
     assert(fields.length === 1)
     assert(fields(0).dataType === StringType)
-   }
+  }
+
+  test("SPARK-39339: Handle TimestampNTZType null values") {
+    val tableName = "timestamp_ntz_null_table"
+
+    val df = Seq(null.asInstanceOf[LocalDateTime]).toDF("col1")
+
+    df.write.format("jdbc")
+      .option("url", urlWithUserAndPass)
+      .option("dbtable", tableName).save()
+
+    val res = spark.read.format("jdbc")
+      .option("inferTimestampNTZType", "true")
+      .option("url", urlWithUserAndPass)
+      .option("dbtable", tableName)
+      .load()
+
+    checkAnswer(res, Seq(Row(null)))
+  }
+
+  test("SPARK-39339: TimestampNTZType with different local time zones") {
+    val tableName = "timestamp_ntz_diff_tz_support_table"
+
+    DateTimeTestUtils.outstandingTimezonesIds.foreach { timeZone =>
+      withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> timeZone) {
+        Seq(
+          "1972-07-04 03:30:00",
+          "2019-01-20 12:00:00.502",
+          "2019-01-20T00:00:00.123456",
+          "1500-01-20T00:00:00.123456"
+        ).foreach { case datetime =>
+          val df = spark.sql(s"select timestamp_ntz '$datetime'")
+          df.write.format("jdbc")
+            .mode("overwrite")
+            .option("url", urlWithUserAndPass)
+            .option("dbtable", tableName)
+            .save()
+
+          val res = spark.read.format("jdbc")
+            .option("inferTimestampNTZType", "true")
+            .option("url", urlWithUserAndPass)
+            .option("dbtable", tableName)
+            .load()

Review Comment:
   Yes, I will update, thanks 👍.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1154758012

   @sadikovi You can run the test case I added above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org