Posted to reviews@spark.apache.org by "tianhanhu (via GitHub)" <gi...@apache.org> on 2023/04/05 18:50:06 UTC

[GitHub] [spark] tianhanhu opened a new pull request, #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

tianhanhu opened a new pull request, #40678:
URL: https://github.com/apache/spark/pull/40678

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   https://github.com/apache/spark/pull/36726 added TimestampNTZ type support in the JDBC data source, and https://github.com/apache/spark/pull/37013 applied a fix to pass more test cases with H2.
   
   The problem is that java.sql.Timestamp is a loosely defined class, and different JDBC drivers implement `getTimestamp` and `setTimestamp` with different expected behaviors in mind. The current general-purpose conversion works with some JDBC dialects and their drivers but not with others. This issue was discovered when testing against a PostgreSQL database.
   
   This PR adds a `dialect` parameter to `makeGetter` so that dialect-specific conversions can be applied when reading a Java Timestamp into TimestampNTZType. `makeSetter` already has a `dialect` field, and we use that for converting back to a Java Timestamp.
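   As an illustrative sketch (editorial addition, not the exact code in this PR), the new hook pair looks roughly like the following. The method names match the diff excerpts discussed in the review; the trait name and the default write-path body (`Timestamp.valueOf`) are assumptions for illustration only.
   
   ```scala
   import java.sql.Timestamp
   import java.time.LocalDateTime
   import org.apache.spark.sql.catalyst.util.DateTimeUtils
   
   // Editorial sketch of the dialect-level hooks; not necessarily the final implementation.
   trait TimestampNTZConversions {
     // Read path: driver Timestamp -> wall-clock LocalDateTime for TimestampNTZType.
     def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime =
       DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))
   
     // Write path: LocalDateTime -> driver Timestamp (assumed default shown here;
     // dialects such as PostgresDialect can override both hooks).
     def convertTimestampNTZToJavaTimestamp(ldt: LocalDateTime): Timestamp =
       Timestamp.valueOf(ldt)
   }
   ```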
   
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   Fixes TimestampNTZ support for PostgreSQL and allows other JDBC dialects to provide dialect-specific implementations for converting between Java Timestamp and Spark TimestampNTZType.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   Existing unit tests.
   I added new test cases to `PostgresIntegrationSuite` to cover TimestampNTZ reads and writes.




[GitHub] [spark] cloud-fan commented on pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #40678:
URL: https://github.com/apache/spark/pull/40678#issuecomment-1498981627

   also cc @yaooqinn 




[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1166519004


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suite their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))

Review Comment:
   For `convertTimestampNTZToJavaTimestamp`, Spark stores the timestamp as a long internally, then `df.collect` converts the long to a `LocalDateTime`, and then the dialect converts the `LocalDateTime` back to a long to create the `Timestamp`.
   
   If we don't want to change that, we should not change `convertJavaTimestampToTimestampNTZ` either, so that the two stay symmetric.





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1159090934


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)
+  }
+
+  override  def convertTimestampNTZToJavaTimestamp(ldt: LocalDateTime): Timestamp = {

Review Comment:
   Done





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160376168


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   I tried the Postgres-specific solution for the existing H2 test and it does not work.
   
   I checked the H2 driver as well, and I think what happens is that H2 creates the Timestamp from the milliseconds since the epoch and THEN renders the wall-clock time in the local time zone. This difference in ordering is what matters. Taking the previous case as an example, the resulting Timestamp would be "2023-04-05 01:00:00 America/Los_Angeles". That represents the same instant as "2023-04-05 08:00:00 UTC", which is why storing its microseconds since the epoch works. It also explains why converting to LocalDateTime (the Postgres-specific solution) would not work.
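   
   A small self-contained illustration of the ordering described above (editorial sketch; assumes the JVM default time zone is America/Los_Angeles and uses the example instant from this thread):
   
   ```scala
   import java.sql.Timestamp
   
   // H2-style construction: start from the instant (epoch millis), then the wall clock
   // is rendered in the JVM default zone.
   val utcEpochMillis = 1680681600000L          // 2023-04-05 08:00:00 UTC
   val fromH2 = new Timestamp(utcEpochMillis)   // toString in LA: "2023-04-05 01:00:00.0"
   
   // Storing its epoch microseconds therefore preserves the original instant,
   // while fromH2.toLocalDateTime would yield the shifted wall clock 01:00.
   val micros = fromH2.getTime * 1000L          // 1680681600000000L
   ```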





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1159289217


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,23 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a Long value (internal representation of a TimestampNTZType)
+   * holding the microseconds since the epoch of 1970-01-01 00:00:00Z for this timestamp.
+   */
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.fromJavaTimestampNoRebase(t)
+  }
+
+  /**
+   * Converts a LocalDateTime representing a TimestampNTZ type to an
+   * instance of `java.sql.Timestamp`.
+   */
+  def convertTimestampNTZToJavaTimestamp(ldt: LocalDateTime): Timestamp = {

Review Comment:
   these two are not symmetric. Shall we let `convertJavaTimestampToTimestampNTZ` return `LocalDateTime` and then have Spark convert the `LocalDateTime` to a long?





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160407471


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -138,6 +139,12 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
       "c0 money)").executeUpdate()
     conn.prepareStatement("INSERT INTO money_types VALUES " +
       "('$1,000.00')").executeUpdate()
+
+    conn.prepareStatement(s"CREATE TABLE timestamp_ntz(v timestamp)").executeUpdate()

Review Comment:
   isn't it the default of `timestamp`?





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1162355247


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,23 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a Long value (internal representation of a TimestampNTZType)
+   * holding the microseconds since the epoch of 1970-01-01 00:00:00Z for this timestamp.
+   */
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.fromJavaTimestampNoRebase(t)
+  }
+
+  /**
+   * Converts a LocalDateTime representing a TimestampNTZ type to an
+   * instance of `java.sql.Timestamp`.
+   */
+  def convertTimestampNTZToJavaTimestamp(ldt: LocalDateTime): Timestamp = {

Review Comment:
   Done!





[GitHub] [spark] beliefer commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160406281


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -138,6 +139,12 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
       "c0 money)").executeUpdate()
     conn.prepareStatement("INSERT INTO money_types VALUES " +
       "('$1,000.00')").executeUpdate()
+
+    conn.prepareStatement(s"CREATE TABLE timestamp_ntz(v timestamp)").executeUpdate()

Review Comment:
   It seems Postgres has timestamp without time zone. I think we can map timestamp -> timestamp and timestampNTZ -> timestampNTZ.
   H2 does not have timestampNTZ.





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160407528


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -379,4 +386,32 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(row(0).length === 1)
     assert(row(0).getString(0) === "$1,000.00")
   }
+
+  test("SPARK-43040: timestamp_ntz read test") {
+    val prop = new Properties
+    prop.setProperty("preferTimestampNTZ", "true")
+    val df = sqlContext.read.jdbc(jdbcUrl, "timestamp_ntz", prop)
+    val row = df.collect()

Review Comment:
   ah I see





[GitHub] [spark] sadikovi commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "sadikovi (via GitHub)" <gi...@apache.org>.
sadikovi commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1159092513


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -328,21 +328,25 @@ object JdbcUtils extends Logging with SQLConfHelper {
   /**
    * Convert a [[ResultSet]] into an iterator of Catalyst Rows.
    */
-  def resultSetToRows(resultSet: ResultSet, schema: StructType): Iterator[Row] = {
+  def resultSetToRows(
+      resultSet: ResultSet,
+      schema: StructType,
+      dialect: Option[JdbcDialect] = None): Iterator[Row] = {

Review Comment:
   For backward compatibility, this solution would require a separate method overload; just having the default value would not work for Java callers.
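   
   A generic illustration of the point (editorial sketch with hypothetical stand-in types, not the actual Spark code): a Scala default argument does not generate an extra Java-visible overload, so an explicit overload is what keeps existing Java callers compiling.
   
   ```scala
   object JdbcUtilsSketch {
     // Hypothetical stand-ins so the sketch compiles on its own; the real types live in Spark.
     type ResultSet = AnyRef; type StructType = AnyRef; type Row = AnyRef
     trait JdbcDialect
     object DefaultDialect extends JdbcDialect
   
     // New dialect-aware variant.
     def resultSetToRows(rs: ResultSet, schema: StructType, dialect: JdbcDialect): Iterator[Row] =
       Iterator.empty
   
     // Explicit two-argument overload kept for Java callers; a Scala default value on
     // `dialect` alone would not be visible as a two-argument method from Java.
     def resultSetToRows(rs: ResultSet, schema: StructType): Iterator[Row] =
       resultSetToRows(rs, schema, DefaultDialect)
   }
   ```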





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1163925624


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suite their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))

Review Comment:
   which is just a no-op, right?





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160368163


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   Here is a concrete Postgres read example where the general implementation fails.
   
   Say a timestamp of "2023-04-05 08:00:00" is stored in a Postgres database and we want to read it as a Spark TimestampNTZType from a session in the America/Los_Angeles time zone. The expected result is "2023-04-05 08:00:00".
   
   When we call `PostgresDriver.getTimestamp`, what happens under the hood is that Postgres uses the default JVM time zone and creates a Timestamp whose wall-clock reading is interpreted in that zone. Thus, the Java Timestamp effectively represents "2023-04-05 08:00:00 America/Los_Angeles".
   
   With our general conversion, we just store the underlying microseconds since the epoch to represent the TimestampNTZType. This is problematic because when displaying the TimestampNTZType we convert it to a LocalDateTime using UTC as the time zone, which gives the erroneous result "2023-04-05 15:00:00".
   
   The Postgres-specific conversion first converts the Java Timestamp to a LocalDateTime before taking its microseconds since the epoch. This effectively restores the Timestamp to represent "2023-04-05 08:00:00 UTC", so converting back yields the correct result.
   
   For writes it is a similar story. @cloud-fan @beliefer
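   
   A self-contained sketch of the two conversions described above (editorial illustration; assumes the JVM default time zone is America/Los_Angeles):
   
   ```scala
   import java.sql.Timestamp
   import java.time.LocalDateTime
   
   // What the Postgres driver effectively hands back for a stored "2023-04-05 08:00:00":
   // a Timestamp whose wall clock reads 08:00 in the JVM default zone (PDT here).
   val fromDriver = Timestamp.valueOf("2023-04-05 08:00:00")
   
   // General conversion: keep the instant's epoch microseconds; rendered later in UTC
   // this displays as 15:00, which is wrong for an NTZ value.
   val generalMicros = fromDriver.getTime * 1000L
   
   // Postgres-specific conversion: recover the wall clock first, then take its micros
   // relative to UTC, so the value displays as 08:00 as expected.
   val wallClock: LocalDateTime = fromDriver.toLocalDateTime
   ```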





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1162360814


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suite their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {

Review Comment:
   they are both timestamps in Java. How about `def toLocalDateTime` and `def toLegacyTimestamp`?





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1159289981


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   can we add a bit comments to explain why pgsql needs to override it?





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160407865


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,23 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a Long value (internal representation of a TimestampNTZType)
+   * holding the microseconds since the epoch of 1970-01-01 00:00:00Z for this timestamp.
+   */
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.fromJavaTimestampNoRebase(t)
+  }
+
+  /**
+   * Converts a LocalDateTime representing a TimestampNTZ type to an
+   * instance of `java.sql.Timestamp`.
+   */
+  def convertTimestampNTZToJavaTimestamp(ldt: LocalDateTime): Timestamp = {

Review Comment:
   People can use `LocalDateTime.ofEpochSecond`; there is no need to specify the year, month, day, ... fields
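   
   For example (editorial sketch; the micros value is just the example instant used in this thread):
   
   ```scala
   import java.time.{LocalDateTime, ZoneOffset}
   
   // Build a LocalDateTime straight from the TimestampNTZ micros, with no per-field math.
   val micros = 1680681600000000L  // 2023-04-05 08:00:00
   val ldt = LocalDateTime.ofEpochSecond(
     Math.floorDiv(micros, 1000000L),
     (Math.floorMod(micros, 1000000L) * 1000L).toInt,
     ZoneOffset.UTC)
   // ldt == LocalDateTime.of(2023, 4, 5, 8, 0, 0)
   ```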





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1162360814


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suite their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {

Review Comment:
   they are both timestamps in Java. How about `def toLocalDateTime` and `def toSQLTimestamp`?





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1166237216


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suite their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {

Review Comment:
   SGTM





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1166516727


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -368,12 +378,22 @@ object JdbcUtils extends Logging with SQLConfHelper {
    * Creates `JDBCValueGetter`s according to [[StructType]], which can set
    * each value from `ResultSet` to each field of [[InternalRow]] correctly.
    */
+<<<<<<< HEAD
+  private def makeGetters(
+      dialect: JdbcDialect,
+      schema: StructType): Array[JDBCValueGetter] =
+    schema.fields.map(sf => makeGetter(sf.dataType, dialect, sf.metadata))
+=======

Review Comment:
   please fix conflicts





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160376168


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   I tried the Postgres-specific solution for the existing H2 test and it does not work.
   
   I checked the H2 driver as well, and I think what happens is that H2 creates the Timestamp from the milliseconds since the epoch and THEN renders the wall-clock time to represent that instant in the local time zone. This difference in ordering is what matters. Taking the previous case as an example, the resulting Timestamp would be "2023-04-05 01:00:00 America/Los_Angeles". That represents the same instant as "2023-04-05 08:00:00 UTC", which is why storing its microseconds since the epoch works. It also explains why converting to LocalDateTime (the Postgres-specific solution) would not work.





[GitHub] [spark] sadikovi commented on pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "sadikovi (via GitHub)" <gi...@apache.org>.
sadikovi commented on PR #40678:
URL: https://github.com/apache/spark/pull/40678#issuecomment-1498401644

   cc @cloud-fan @dongjoon-hyun @beliefer 




[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1159289722


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,23 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a Long value (internal representation of a TimestampNTZType)
+   * holding the microseconds since the epoch of 1970-01-01 00:00:00Z for this timestamp.
+   */
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {

Review Comment:
   please add `@param`, `@return` and `@Since("3.5.0")`. Dialect is a developer API and is user-facing.





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160408842


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -138,6 +139,12 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
       "c0 money)").executeUpdate()
     conn.prepareStatement("INSERT INTO money_types VALUES " +
       "('$1,000.00')").executeUpdate()
+
+    conn.prepareStatement(s"CREATE TABLE timestamp_ntz(v timestamp)").executeUpdate()

Review Comment:
   It is actually the same case with H2 as well. `timestamp == timestamp without timezone` http://www.h2database.com/html/datatypes.html#timestamp_type





[GitHub] [spark] beliefer commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1163518157


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suite their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))

Review Comment:
   The default dialect calls `DateTimeUtils.microsToLocalDateTime` first and then calls `DateTimeUtils.localDateTimeToMicros`.





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1162752090


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suite their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))

Review Comment:
   why is the conversion invalid?





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1166162467


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suite their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {

Review Comment:
   I would prefer to keep the names as they are.
   
   `toLocalDateTime` would collide with the existing `Timestamp.toLocalDateTime` method. Also, I would like to emphasize that this is not a general Java-type-to-Java-type conversion; it is special-cased for the use case/interpretation of Spark's TimestampNTZType.





[GitHub] [spark] cloud-fan closed pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source
URL: https://github.com/apache/spark/pull/40678




[GitHub] [spark] beliefer commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1162439129


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suite their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))

Review Comment:
   How about `DateTimeUtils.fromJavaTimestampNoRebase(t)` only?





[GitHub] [spark] beliefer commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160377410


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   Does Postgres have TimestampNTZ?





[GitHub] [spark] sadikovi commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "sadikovi (via GitHub)" <gi...@apache.org>.
sadikovi commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1159080729


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -328,21 +328,25 @@ object JdbcUtils extends Logging with SQLConfHelper {
   /**
    * Convert a [[ResultSet]] into an iterator of Catalyst Rows.
    */
-  def resultSetToRows(resultSet: ResultSet, schema: StructType): Iterator[Row] = {
+  def resultSetToRows(
+      resultSet: ResultSet,
+      schema: StructType,
+      dialect: Option[JdbcDialect] = None): Iterator[Row] = {

Review Comment:
   Why is dialect optional? I think you can just pass dialect as JdbcDialect.



##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -379,4 +386,32 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(row(0).length === 1)
     assert(row(0).getString(0) === "$1,000.00")
   }
+
+  test("SPARK-43040: timestamp_ntz read test") {
+    val prop = new Properties
+    prop.setProperty("preferTimestampNTZ", "true")
+    val df = sqlContext.read.jdbc(jdbcUrl, "timestamp_ntz", prop)
+    val row = df.collect()
+    assert(row.length === 3)
+    assert(row(0).length === 1)
+    assert(row(0) === Row(LocalDateTime.of(2013, 4, 5, 12, 1, 2)))

Review Comment:
   What result do you get without these changes?



##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)
+  }
+
+  override  def convertTimestampNTZToJavaTimestamp(ldt: LocalDateTime): Timestamp = {

Review Comment:
   nit: extra space should be removed.





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160408228


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   In Postgres, `timestamp` is equivalent to timestamp without time zone.
   It has `timestamptz` to represent timestamp with time zone.





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160408278


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -138,6 +139,12 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
       "c0 money)").executeUpdate()
     conn.prepareStatement("INSERT INTO money_types VALUES " +
       "('$1,000.00')").executeUpdate()
+
+    conn.prepareStatement(s"CREATE TABLE timestamp_ntz(v timestamp)").executeUpdate()

Review Comment:
   Yes, in Postgres `timestamp` is equivalent to timestamp without time zone.





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160368163


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   Here is a concrete Postgres read example where the general implementation fails.
   
   Say a timestamp of "2023-04-05 08:00:00" is stored in a Postgres database and we want to read it as a Spark TimestampNTZType from a session in the America/Los_Angeles time zone. The expected result is "2023-04-05 08:00:00".
   
   When we call `PostgresDriver.getTimestamp`, what happens under the hood is that Postgres uses the default JVM time zone and creates a Timestamp whose wall-clock reading is interpreted in that zone. Thus, the Java Timestamp effectively represents "2023-04-05 08:00:00 America/Los_Angeles".
   
   With our general conversion, we just store the underlying microseconds since the epoch to represent the TimestampNTZType. This is problematic because when displaying the TimestampNTZType we convert it to a LocalDateTime using UTC as the time zone, which gives the erroneous result "2023-04-05 15:00:00".
   
   The Postgres-specific conversion first converts the Java Timestamp to a LocalDateTime before taking its microseconds since the epoch. This effectively restores the Timestamp to represent "2023-04-05 08:00:00 UTC", so converting back yields the correct result.
   
   For writes it is a similar story.





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160376444


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   To conclude, JDBC drivers have different expected behaviors when implementing `getTimestamp` and `setTimestamp`, so a single general conversion strategy would not work for all of them.





[GitHub] [spark] yaooqinn commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160399361


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -138,6 +139,12 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
       "c0 money)").executeUpdate()
     conn.prepareStatement("INSERT INTO money_types VALUES " +
       "('$1,000.00')").executeUpdate()
+
+    conn.prepareStatement(s"CREATE TABLE timestamp_ntz(v timestamp)").executeUpdate()

Review Comment:
   can we also add a column with `timestamp without time zone`?
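   
   For example (a sketch only; the table and column names are hypothetical), mirroring the setup statement above:
   
   ```scala
   // Cover the explicit spelling as well, since it maps to the same Postgres type.
   conn.prepareStatement(
     "CREATE TABLE timestamp_ntz_explicit(v timestamp without time zone)").executeUpdate()
   conn.prepareStatement(
     "INSERT INTO timestamp_ntz_explicit VALUES ('2013-04-05 12:01:02')").executeUpdate()
   ```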





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160368163


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   I will give a concrete Postgres read example where the general implementation would fail.
   
   Say there is a Timestamp of "2023-04-05 08:00:00" stored in a Postgres database, and we want to read it as a Spark TimestampNTZType while the JVM time zone is America/Los_Angeles. The expected result would be "2023-04-05 08:00:00".
   
   When we call `PostgresDriver.getTimestamp`, the driver uses the default JVM time zone under the hood and creates a Timestamp whose instant corresponds to that wall-clock time in that zone. Thus, the Java Timestamp effectively represents "2023-04-05 08:00:00 America/Los_Angeles".
   
   With the general conversion, we simply store the Timestamp's underlying microseconds since the epoch as the TimestampNTZType value. This is problematic because, when displaying a TimestampNTZType, we convert it to a LocalDateTime using UTC as the time zone. This gives the erroneous result "2023-04-05 15:00:00".
   
   The Postgres-specific conversion first converts the Java Timestamp to a LocalDateTime before taking its microseconds since the epoch. This effectively restores the value to represent "2023-04-05 08:00:00 UTC", so converting back yields the correct result.
   
   The write path is the same story in reverse. @cloud-fan @beliefer 





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160408076


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   almost every database has timestamp ntz.





[GitHub] [spark] beliefer commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1159302383


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)

Review Comment:
   +1





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1159090252


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -328,21 +328,25 @@ object JdbcUtils extends Logging with SQLConfHelper {
   /**
    * Convert a [[ResultSet]] into an iterator of Catalyst Rows.
    */
-  def resultSetToRows(resultSet: ResultSet, schema: StructType): Iterator[Row] = {
+  def resultSetToRows(
+      resultSet: ResultSet,
+      schema: StructType,
+      dialect: Option[JdbcDialect] = None): Iterator[Row] = {

Review Comment:
   I am not sure how and where this function is used.
   Since it is a public function, I am thinking we may want to keep it backward compatible?
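   
   With the default value in the diff above, existing call sites keep compiling; a sketch (the wrapper functions and the `url` parameter here are purely illustrative):
   
   ```scala
   import java.sql.ResultSet
   import org.apache.spark.sql.Row
   import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
   import org.apache.spark.sql.jdbc.JdbcDialects
   import org.apache.spark.sql.types.StructType
   
   // Existing two-argument call sites keep compiling thanks to the default value...
   def oldStyle(rs: ResultSet, schema: StructType): Iterator[Row] =
     JdbcUtils.resultSetToRows(rs, schema)
   
   // ...while new call sites can pass the dialect for the URL they are reading from.
   def newStyle(rs: ResultSet, schema: StructType, url: String): Iterator[Row] =
     JdbcUtils.resultSetToRows(rs, schema, Some(JdbcDialects.get(url)))
   ```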





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1159101781


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -379,4 +386,32 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(row(0).length === 1)
     assert(row(0).getString(0) === "$1,000.00")
   }
+
+  test("SPARK-43040: timestamp_ntz read test") {
+    val prop = new Properties
+    prop.setProperty("preferTimestampNTZ", "true")
+    val df = sqlContext.read.jdbc(jdbcUrl, "timestamp_ntz", prop)
+    val row = df.collect()
+    assert(row.length === 3)
+    assert(row(0).length === 1)
+    assert(row(0) === Row(LocalDateTime.of(2013, 4, 5, 12, 1, 2)))

Review Comment:
   The result would be 2013-04-05T19:01:02, i.e. 7 hours ahead.





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1164427398


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suit their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))

Review Comment:
   You are right that converting from long to LocalDateTime and back is duplicated work, 
   but the goal is to make `convertJavaTimestampToTimestampNTZ` and `convertTimestampNTZToJavaTimestamp` symmetric.
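   
   For illustration, a symmetric pair for a wall-clock-oriented driver could look like this (a sketch, not necessarily the exact code in this PR):
   
   ```scala
   import java.sql.Timestamp
   import java.time.LocalDateTime
   
   // Read side: trust the wall-clock fields the driver materialized.
   def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime =
     t.toLocalDateTime
   
   // Write side: rebuild a Timestamp from the same wall-clock fields,
   // so a round trip through the two methods is lossless.
   def convertTimestampNTZToJavaTimestamp(ldt: LocalDateTime): Timestamp =
     Timestamp.valueOf(ldt)
   ```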





[GitHub] [spark] beliefer commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1166466549


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suit their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))

Review Comment:
   So we should make `Long` the return type.





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1166162783


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suit their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))

Review Comment:
   I think I am missing something here.
   
   Using `DateTimeUtils.fromJavaTimestampNoRebase(t)` alone would result in a Long value, whereas we want a `LocalDateTime` as the return type.





[GitHub] [spark] cloud-fan commented on pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #40678:
URL: https://github.com/apache/spark/pull/40678#issuecomment-1535607944

   thanks, merging to master!




[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160377729


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,23 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a Long value (internal representation of a TimestampNTZType)
+   * holding the microseconds since the epoch of 1970-01-01 00:00:00Z for this timestamp.
+   */
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): Long = {
+    DateTimeUtils.fromJavaTimestampNoRebase(t)
+  }
+
+  /**
+   * Converts a LocalDateTime representing a TimestampNTZ type to an
+   * instance of `java.sql.Timestamp`.
+   */
+  def convertTimestampNTZToJavaTimestamp(ldt: LocalDateTime): Timestamp = {

Review Comment:
   It would actually be hard to make them symmetric.
   For example, the neutral conversion solution here stores the Java Timestamp's underlying microseconds since the epoch directly into the long value. It would be wrong to construct a `LocalDateTime` from the Timestamp's year, day, ..., minute, second fields.
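   
   A tiny sketch of the mismatch (the `DateTimeUtils` helpers are the ones from the surrounding diff; the comparison itself is illustrative):
   
   ```scala
   import java.sql.Timestamp
   import java.time.LocalDateTime
   import org.apache.spark.sql.catalyst.util.DateTimeUtils
   
   val t = Timestamp.valueOf("2023-04-05 08:00:00")
   
   // Neutral read path: the long value holds the instant's microseconds since the epoch.
   val micros: Long = DateTimeUtils.fromJavaTimestampNoRebase(t)
   
   // Spark later interprets those micros as a UTC wall-clock time...
   val viaInstant: LocalDateTime = DateTimeUtils.microsToLocalDateTime(micros)
   
   // ...which generally differs from a field-based reading of the same Timestamp
   // unless the JVM default time zone happens to be UTC.
   val viaFields: LocalDateTime = t.toLocalDateTime
   ```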





[GitHub] [spark] tianhanhu-db commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "tianhanhu-db (via GitHub)" <gi...@apache.org>.
tianhanhu-db commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160380014


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -379,4 +386,32 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(row(0).length === 1)
     assert(row(0).getString(0) === "$1,000.00")
   }
+
+  test("SPARK-43040: timestamp_ntz read test") {
+    val prop = new Properties
+    prop.setProperty("preferTimestampNTZ", "true")
+    val df = sqlContext.read.jdbc(jdbcUrl, "timestamp_ntz", prop)
+    val row = df.collect()

Review Comment:
   Do you mean `spark.sql.datetime.java8API.enabled`? If I am not mistaken, that only controls `TimestampType`, not `TimestampNTZType`.
   `TimestampNTZType` always converts to `LocalDateTime`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1162441277


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala:
##########
@@ -98,6 +100,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
     case _ => None
   }
 
+  override def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    t.toLocalDateTime

Review Comment:
   After the change referred to in https://github.com/apache/spark/pull/40678/files#r1162437868,
   we can update this to `DateTimeUtils.localDateTimeToMicros(t.toLocalDateTime)`.





[GitHub] [spark] beliefer commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1164945683


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suit their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))

Review Comment:
   Yeah. The goal is good, but we should improve the code.





[GitHub] [spark] beliefer commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1160410345


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -138,6 +139,12 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
       "c0 money)").executeUpdate()
     conn.prepareStatement("INSERT INTO money_types VALUES " +
       "('$1,000.00')").executeUpdate()
+
+    conn.prepareStatement(s"CREATE TABLE timestamp_ntz(v timestamp)").executeUpdate()

Review Comment:
   > It is actually the same case with H2 as well. `timestamp == timestamp without timezone` http://www.h2database.com/html/datatypes.html#timestamp_type
   
   Yeah, I forgot that. Because the H2 dialect uses the mapping `Spark timestampNTZ <-> H2 timestamp without timezone`, I think the Postgres dialect should follow the same behavior.





[GitHub] [spark] beliefer commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1162437868


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala:
##########
@@ -104,6 +105,31 @@ abstract class JdbcDialect extends Serializable with Logging {
    */
   def getJDBCType(dt: DataType): Option[JdbcType] = None
 
+  /**
+   * Convert java.sql.Timestamp to a LocalDateTime representing the same wall-clock time as the
+   * value stored in a remote database.
+   * JDBC dialects should override this function to provide implementations that suit their
+   * JDBC drivers.
+   * @param t Timestamp returned from JDBC driver getTimestamp method.
+   * @return A LocalDateTime representing the same wall clock time as the timestamp in database.
+   */
+  @Since("3.5.0")
+  def convertJavaTimestampToTimestampNTZ(t: Timestamp): LocalDateTime = {
+    DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))

Review Comment:
   The default dialect seems to call `DateTimeUtils.microsToLocalDateTime` first and then `DateTimeUtils.localDateTimeToMicros`. We should avoid this unnecessary conversion.
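   
   Spelled out, the round trip being flagged looks like this (a sketch; both helpers appear in the diff above):
   
   ```scala
   import java.sql.Timestamp
   import org.apache.spark.sql.catalyst.util.DateTimeUtils
   
   val t = Timestamp.valueOf("2023-04-05 08:00:00")
   
   // Default dialect: micros -> LocalDateTime ...
   val ldt = DateTimeUtils.microsToLocalDateTime(DateTimeUtils.fromJavaTimestampNoRebase(t))
   // ... and the caller immediately goes back: LocalDateTime -> micros.
   val micros = DateTimeUtils.localDateTimeToMicros(ldt)
   // micros equals DateTimeUtils.fromJavaTimestampNoRebase(t), so the two extra conversions can be skipped.
   ```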





[GitHub] [spark] cloud-fan commented on a diff in pull request #40678: [SPARK-43040][SQL] Improve TimestampNTZ type support in JDBC data source

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40678:
URL: https://github.com/apache/spark/pull/40678#discussion_r1159287620


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala:
##########
@@ -379,4 +386,32 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(row(0).length === 1)
     assert(row(0).getString(0) === "$1,000.00")
   }
+
+  test("SPARK-43040: timestamp_ntz read test") {
+    val prop = new Properties
+    prop.setProperty("preferTimestampNTZ", "true")
+    val df = sqlContext.read.jdbc(jdbcUrl, "timestamp_ntz", prop)
+    val row = df.collect()

Review Comment:
   Do we enable the Java 8 datetime API somewhere? Otherwise `df.collect` should return `java.sql.Timestamp`.


