Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/21 13:16:37 UTC

[GitHub] [spark] MaxGekk opened a new pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet file metadata

MaxGekk opened a new pull request #34973:
URL: https://github.com/apache/spark/pull/34973


   ### What changes were proposed in this pull request?
   In this PR, I propose adding a new metadata key, `org.apache.spark.timeZone`, which Spark writes to Parquet metadata when it rebases timestamps or dates.
   
   ### Why are the changes needed?
   Before the changes, Spark assumes that the writer used the default JVM time zone when rebasing dates/timestamps. If the reader and the writer have different JVM time zone settings, the reader cannot load such columns correctly. After the changes, the reader can take the writer's time zone from the file metadata instead of assuming it.
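
To illustrate why the rebase result depends on a time zone, here is a minimal sketch of Julian-to-Proleptic-Gregorian rebasing that takes the zone as an explicit argument. It is not the code from this PR: the object name `RebaseSketch` and its method are hypothetical, it handles Common Era values only, and it relies only on standard `java.util` and `java.time` calls.

```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoField
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Hypothetical helper for illustration; not part of Spark.
object RebaseSketch {
  private val MicrosPerMilli = 1000L
  private val MicrosPerSecond = 1000000L

  /**
   * Rebases `micros` from the hybrid Julian/Gregorian calendar to the
   * Proleptic Gregorian calendar, reading local fields in `timeZoneId`.
   * Assumption: Common Era values only (no BCE handling).
   */
  def rebaseJulianToGregorian(timeZoneId: String, micros: Long): Long = {
    // Note: TimeZone.getTimeZone silently returns GMT for unknown IDs.
    val tz = TimeZone.getTimeZone(timeZoneId)
    // java.util.GregorianCalendar is the hybrid calendar: Julian before 1582-10-15.
    val cal = new GregorianCalendar(tz)
    cal.setTimeInMillis(Math.floorDiv(micros, MicrosPerMilli))
    // Re-read the same wall-clock fields in java.time's Proleptic Gregorian calendar.
    // Start from day 1 and add days so that Julian-only dates (1000-02-29) do not throw.
    val local = LocalDateTime
      .of(cal.get(Calendar.YEAR), cal.get(Calendar.MONTH) + 1, 1,
        cal.get(Calendar.HOUR_OF_DAY), cal.get(Calendar.MINUTE), cal.get(Calendar.SECOND))
      .plusDays(cal.get(Calendar.DAY_OF_MONTH) - 1L)
      .`with`(ChronoField.MICRO_OF_SECOND, Math.floorMod(micros, MicrosPerSecond))
    val instant = local.atZone(tz.toZoneId).toInstant
    instant.getEpochSecond * MicrosPerSecond + instant.getNano / 1000L
  }
}
```

Because the wall-clock fields are read in `tz`, a reader that passes a different zone than the writer used can get different microseconds back for ancient values, which is the mismatch described above.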
   
   ### Does this PR introduce _any_ user-facing change?
   Not in the default case, but the behavior can differ when the JVM time zone differs from the session time zone.
   
   ### How was this patch tested?
   By running new tests:
   ```
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000783702


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51036/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000763559


   **[Test build #146561 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146561/testReport)** for PR 34973 at commit [`68a3504`](https://github.com/apache/spark/commit/68a3504a2eafd2f94ec04326355bb475905a6dce).




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000847915


   **[Test build #146566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146566/testReport)** for PR 34973 at commit [`e06f5f4`](https://github.com/apache/spark/commit/e06f5f42b0cb5d65fa9602c4011b691ec10b6b97).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000885553


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51041/
   




[GitHub] [spark] cloud-fan closed pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #34973:
URL: https://github.com/apache/spark/pull/34973


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000796444


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146551/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000421982


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146517/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000708386


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51026/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-998886084


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50915/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775466900



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1786,8 +1787,8 @@ abstract class AvroSuite
   }
 
   // It generates input files for the test below:
-  // "SPARK-31183: compatibility with Spark 2.4 in reading dates/timestamps"
-  ignore("SPARK-31855: generate test files for checking compatibility with Spark 2.4") {
+  // "SPARK-31183, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps"
+  ignore("SPARK-31855: generate test files for checking compatibility with Spark 2.4/3.2") {

Review comment:
       How do you use this test to generate the new test data files?






[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775473908



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
##########
@@ -464,30 +485,43 @@ object RebaseDateTime {
   final val lastSwitchJulianTs: Long = getLastSwitchTs(julianGregRebaseMap)
 
   /**
-   * An optimized version of [[rebaseJulianToGregorianMicros(ZoneId, Long)]]. This method leverages
-   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
-   * information about the current JVM system time zone or `micros` is related to Before Common Era,
-   * the function falls back to the regular unoptimized version.
-   *
-   * Note: The function assumes that the input micros was derived from a local timestamp
-   *       at the default system JVM time zone in the Julian calendar.
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the given time zone `timeZoneId` or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
    *
+   * @param timeZoneId A string identifier of a time zone.
    * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
    *               in the Julian calendar. It can be negative.
    * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
    */
-  def rebaseJulianToGregorianMicros(micros: Long): Long = {
+  def rebaseJulianToGregorianMicros(timeZoneId: String, micros: Long): Long = {
     if (micros >= lastSwitchJulianTs) {
       micros
     } else {
-      val timeZone = TimeZone.getDefault
-      val tzId = timeZone.getID
-      val rebaseRecord = julianGregRebaseMap.getOrNull(tzId)
+      val rebaseRecord = julianGregRebaseMap.getOrNull(timeZoneId)
       if (rebaseRecord == null || micros < rebaseRecord.switches(0)) {
-        rebaseJulianToGregorianMicros(timeZone, micros)
+        rebaseJulianToGregorianMicros(TimeZone.getTimeZone(timeZoneId), micros)
       } else {
         rebaseMicros(rebaseRecord, micros)
       }
     }
   }
+
+  /**
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the current JVM system time zone or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
+   *
+   * Note: The function assumes that the input micros was derived from a local timestamp
+   *       at the default system JVM time zone in the Julian calendar.
+   *
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in the Julian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
+   */
+  def rebaseJulianToGregorianMicros(micros: Long): Long = {

Review comment:
       Actually, it should be used everywhere except for rebasing in Parquet/Avro, for example in timestamp formatting.
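
The lookup-with-fallback shape of the `timeZoneId` overload in the diff above can be sketched in isolation as follows. Everything here is a stand-in: `RebaseLookupSketch`, `RebaseRecord`, the placeholder table values, and the counting slow path are all hypothetical and are not Spark's real rebase tables or code.

```scala
import java.util.TimeZone

// Hypothetical stand-ins for illustration; not Spark's RebaseDateTime.
object RebaseLookupSketch {
  // `diffs(i)` is applied to values in the range [switches(i), switches(i + 1)).
  final case class RebaseRecord(switches: Array[Long], diffs: Array[Long])

  // Placeholder data only; these are NOT real calendar differences.
  private val records: Map[String, RebaseRecord] = Map(
    "UTC" -> RebaseRecord(Array(-1000L, -500L, 0L), Array(20L, 10L, 0L)))

  // Values at or after the last switch need no rebasing.
  private val lastSwitch: Long = 0L

  // Stand-in for the unoptimized path; it counts calls so the fallback is observable.
  var slowPathCalls: Int = 0
  private def slowRebase(tz: TimeZone, micros: Long): Long = {
    slowPathCalls += 1
    micros
  }

  // Finds the last switch point that is <= micros and applies its diff.
  private def rebaseWithRecord(r: RebaseRecord, micros: Long): Long = {
    var i = r.switches.length - 1
    while (i > 0 && micros < r.switches(i)) i -= 1
    micros + r.diffs(i)
  }

  def rebase(timeZoneId: String, micros: Long): Long = {
    if (micros >= lastSwitch) {
      micros
    } else {
      records.get(timeZoneId) match {
        // Fast path: the zone has a record and the value is covered by it.
        case Some(r) if micros >= r.switches(0) => rebaseWithRecord(r, micros)
        // Fallback: unknown zone, or value older than the first switch.
        case _ => slowRebase(TimeZone.getTimeZone(timeZoneId), micros)
      }
    }
  }
}
```

Keying the table by an explicit zone ID rather than `TimeZone.getDefault` is what lets a caller pass a zone recorded in file metadata, while a one-argument overload can keep delegating with the JVM default.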






[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775807894



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
##########
@@ -464,30 +485,43 @@ object RebaseDateTime {
   final val lastSwitchJulianTs: Long = getLastSwitchTs(julianGregRebaseMap)
 
   /**
-   * An optimized version of [[rebaseJulianToGregorianMicros(ZoneId, Long)]]. This method leverages
-   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
-   * information about the current JVM system time zone or `micros` is related to Before Common Era,
-   * the function falls back to the regular unoptimized version.
-   *
-   * Note: The function assumes that the input micros was derived from a local timestamp
-   *       at the default system JVM time zone in the Julian calendar.
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the given time zone `timeZoneId` or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
    *
+   * @param timeZoneId A string identifier of a time zone.
    * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
    *               in the Julian calendar. It can be negative.
    * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
    */
-  def rebaseJulianToGregorianMicros(micros: Long): Long = {
+  def rebaseJulianToGregorianMicros(timeZoneId: String, micros: Long): Long = {
     if (micros >= lastSwitchJulianTs) {
       micros
     } else {
-      val timeZone = TimeZone.getDefault
-      val tzId = timeZone.getID
-      val rebaseRecord = julianGregRebaseMap.getOrNull(tzId)
+      val rebaseRecord = julianGregRebaseMap.getOrNull(timeZoneId)
       if (rebaseRecord == null || micros < rebaseRecord.switches(0)) {
-        rebaseJulianToGregorianMicros(timeZone, micros)
+        rebaseJulianToGregorianMicros(TimeZone.getTimeZone(timeZoneId), micros)
       } else {
         rebaseMicros(rebaseRecord, micros)
       }
     }
   }
+
+  /**
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the current JVM system time zone or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
+   *
+   * Note: The function assumes that the input micros was derived from a local timestamp
+   *       at the default system JVM time zone in the Julian calendar.
+   *
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in the Julian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
+   */
+  def rebaseJulianToGregorianMicros(micros: Long): Long = {

Review comment:
       Then we are fine:
   1. When converting to or from `java.sql.Timestamp`, we must rely on the JVM time zone.
   2. When reading/writing JSON/CSV in legacy mode, we need to follow the legacy behavior and rebase the timestamp. We should have used the session time zone to rebase, but that ship has already sailed, so we have to keep using the JVM time zone.
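
The first point can be seen with a small sketch: `java.sql.Timestamp` renders its fields in the JVM default time zone, so the same instant prints differently when the default changes. The object `JvmZoneSketch` and its `render` method are hypothetical names for this illustration, and mutating the process-wide default as done here is not thread-safe, so treat it as a demo only.

```scala
import java.sql.Timestamp
import java.util.TimeZone

// Hypothetical helper for illustration; not part of Spark.
object JvmZoneSketch {
  /** Renders an epoch-millis instant via java.sql.Timestamp under a temporary JVM default zone. */
  def render(zoneId: String, epochMillis: Long): String = {
    val saved = TimeZone.getDefault
    // Demo only: changing the JVM default affects the whole process.
    TimeZone.setDefault(TimeZone.getTimeZone(zoneId))
    try new Timestamp(epochMillis).toString
    finally TimeZone.setDefault(saved)
  }
}
```

For example, `JvmZoneSketch.render("UTC", 0L)` and `JvmZoneSketch.render("America/Los_Angeles", 0L)` give different strings for the same instant, with no session setting involved.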






[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775722102



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1786,8 +1787,8 @@ abstract class AvroSuite
   }
 
   // It generates input files for the test below:
-  // "SPARK-31183: compatibility with Spark 2.4 in reading dates/timestamps"
-  ignore("SPARK-31855: generate test files for checking compatibility with Spark 2.4") {
+  // "SPARK-31183, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps"
+  ignore("SPARK-31855: generate test files for checking compatibility with Spark 2.4/3.2") {

Review comment:
       Hmm, looking at the code, it seems the version is hard-coded as `2_4_6`.






[GitHub] [spark] AmplabJenkins commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000421982


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146517/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000763559


   **[Test build #146561 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146561/testReport)** for PR 34973 at commit [`68a3504`](https://github.com/apache/spark/commit/68a3504a2eafd2f94ec04326355bb475905a6dce).




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000556311


   **[Test build #146538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146538/testReport)** for PR 34973 at commit [`2a3fd19`](https://github.com/apache/spark/commit/2a3fd19fa32328b6728d2c470ce4f0b5035e43d8).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000669744


   **[Test build #146551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146551/testReport)** for PR 34973 at commit [`6282a9e`](https://github.com/apache/spark/commit/6282a9e40535f30933889fa90cdc5f8a85ec9a82).




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999584141


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50957/
   




[GitHub] [spark] MaxGekk commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999619805


   @cloud-fan @sadikovi Could you have a look at this PR? I am still working on it, but the existing tests already pass. I just need to add more tests for the cases where the JVM and session time zones differ in `LEGACY` mode, and to check backward compatibility (maybe something more; any ideas are welcome).




[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775472996



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1933,7 +1934,8 @@ abstract class AvroSuite
       // By default we should fail to read ancient datetime values when parquet files don't
       // contain Spark version.
       "2_4_5" -> failInRead _,
-      "2_4_6" -> successInRead _
+      "2_4_6" -> successInRead _,
+      "3_2_2" -> successInRead _

Review comment:
       I used `branch-3.2` to generate the golden files. See the version in PR description:
   ```
   $ java -jar parquet-tools-1.12.0.jar meta sql/core/src/test/resources/test-data/before_1582_timestamp_micros_v3_2_2.snappy.parquet
   file:        file:/Users/maximgekk/proj/parquet-rebase-save-tz/sql/core/src/test/resources/test-data/before_1582_timestamp_micros_v3_2_2.snappy.parquet
   creator:     parquet-mr version 1.12.2 (build 77e30c8093386ec52c3cfa6c34b7ef3321322c94)
   extra:       org.apache.spark.version = 3.2.2
   ```






[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775792988



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1786,8 +1787,8 @@ abstract class AvroSuite
   }
 
   // It generates input files for the test below:
-  // "SPARK-31183: compatibility with Spark 2.4 in reading dates/timestamps"
-  ignore("SPARK-31855: generate test files for checking compatibility with Spark 2.4") {
+  // "SPARK-31183, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps"
+  ignore("SPARK-31855: generate test files for checking compatibility with Spark 2.4/3.2") {

Review comment:
       Yep, made it more flexible by using `SPARK_VERSION_SHORT`






[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775826435



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
##########
@@ -464,30 +485,43 @@ object RebaseDateTime {
   final val lastSwitchJulianTs: Long = getLastSwitchTs(julianGregRebaseMap)
 
   /**
-   * An optimized version of [[rebaseJulianToGregorianMicros(ZoneId, Long)]]. This method leverages
-   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
-   * information about the current JVM system time zone or `micros` is related to Before Common Era,
-   * the function falls back to the regular unoptimized version.
-   *
-   * Note: The function assumes that the input micros was derived from a local timestamp
-   *       at the default system JVM time zone in the Julian calendar.
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the given time zone `timeZoneId` or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
    *
+   * @param timeZoneId A string identifier of a time zone.
    * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
    *               in the Julian calendar. It can be negative.
    * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
    */
-  def rebaseJulianToGregorianMicros(micros: Long): Long = {
+  def rebaseJulianToGregorianMicros(timeZoneId: String, micros: Long): Long = {
     if (micros >= lastSwitchJulianTs) {
       micros
     } else {
-      val timeZone = TimeZone.getDefault
-      val tzId = timeZone.getID
-      val rebaseRecord = julianGregRebaseMap.getOrNull(tzId)
+      val rebaseRecord = julianGregRebaseMap.getOrNull(timeZoneId)
       if (rebaseRecord == null || micros < rebaseRecord.switches(0)) {
-        rebaseJulianToGregorianMicros(timeZone, micros)
+        rebaseJulianToGregorianMicros(TimeZone.getTimeZone(timeZoneId), micros)
       } else {
         rebaseMicros(rebaseRecord, micros)
       }
     }
   }
+
+  /**
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the current JVM system time zone or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
+   *
+   * Note: The function assumes that the input micros was derived from a local timestamp
+   *       at the default system JVM time zone in the Julian calendar.
+   *
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in the Julian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
+   */
+  def rebaseJulianToGregorianMicros(micros: Long): Long = {

Review comment:
       > ParquetFilters.timestampToMicros <-- NEED TO RE-CHECK THIS ONE
   
   This one is fine too, since the rebasing happens locally on the same JVM. Users can also enable the Java 8 time API for filters; see https://github.com/apache/spark/pull/28696
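
For readers who want to see why the time zone matters for rebasing at all, the following is a simplified sketch of the unoptimized idea: read the local date-time fields of an instant in the hybrid Julian + Gregorian calendar, then re-interpret the same fields in the Proleptic Gregorian calendar. It is not Spark's implementation. The object name `RebaseSketch` is hypothetical, and the sketch assumes Common Era input only and ignores local-time overlaps at zone transitions.

```scala
import java.time.LocalDateTime
import java.util.{Calendar, TimeZone}

object RebaseSketch {
  private val MicrosPerSecond = 1000000L

  def rebaseJulianToGregorianMicros(tz: TimeZone, micros: Long): Long = {
    // "gregory" is java.util's hybrid calendar: Julian before the 1582 cutover,
    // Gregorian after it.
    val cal = new Calendar.Builder()
      .setCalendarType("gregory")
      .setTimeZone(tz)
      .setInstant(Math.floorDiv(micros, 1000L))
      .build()
    // Re-interpret the same local fields in the Proleptic Gregorian calendar.
    // Start from day 1 and add the remaining days, so that a Julian-only date
    // such as 1000-02-29 rolls over instead of throwing.
    val local = LocalDateTime.of(
        cal.get(Calendar.YEAR),
        cal.get(Calendar.MONTH) + 1,
        1,
        cal.get(Calendar.HOUR_OF_DAY),
        cal.get(Calendar.MINUTE),
        cal.get(Calendar.SECOND),
        (Math.floorMod(micros, MicrosPerSecond) * 1000L).toInt)
      .plusDays(cal.get(Calendar.DAY_OF_MONTH) - 1L)
    // The local fields are resolved in the same zone they were read in.
    val instant = local.atZone(tz.toZoneId).toInstant
    Math.addExact(
      Math.multiplyExact(instant.getEpochSecond, MicrosPerSecond),
      instant.getNano / 1000L)
  }
}
```

Both the field extraction and the final conversion depend on `tz`, so a reader and a writer that assume different zones can disagree on the rebased value. In UTC, for example, the instant that the hybrid calendar labels 1582-10-04T00:00:00 moves 10 days earlier.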






[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000881032


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51041/
   




[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775475285



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRebaseDatetimeSuite.scala
##########
@@ -170,7 +173,8 @@ abstract class ParquetRebaseDatetimeSuite
       // By default we should fail to read ancient datetime values when parquet files don't
       // contain Spark version.
       "2_4_5" -> failInRead _,
-      "2_4_6" -> successInRead _).foreach { case (version, checkDefaultRead) =>
+      "2_4_6" -> successInRead _,
+      "3_2_2" -> successInRead _).foreach { case (version, checkDefaultRead) =>

Review comment:
       The generated files do in fact have `3.2.2` in their metadata, so I would keep it as is. What matters more for us is that the files have the keys:
   ```
   extra:       org.apache.spark.legacyINT96 =
   extra:       org.apache.spark.legacyDateTime =
   ```
   but **no time zone**.
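
As a rough illustration of why files without a time zone key matter, here is a sketch of how a reader might choose the zone to rebase in. The key name `org.apache.spark.timeZone` comes from the PR description; the object `RebaseZoneSketch`, the method `rebaseZoneId`, and the fallback to the JVM default zone are assumptions drawn from that description, not the PR's actual code.

```scala
import java.util.TimeZone

object RebaseZoneSketch {
  // Metadata key proposed in the PR description.
  val TimeZoneKey = "org.apache.spark.timeZone"

  /**
   * Picks the zone ID to rebase in, given a file's key-value metadata.
   * Files written before the change carry no such key, so the sketch falls
   * back to the reader's JVM default zone (assumed legacy behavior).
   */
  def rebaseZoneId(fileMetadata: Map[String, String]): String =
    fileMetadata.getOrElse(TimeZoneKey, TimeZone.getDefault.getID)
}
```

Test files with the legacy keys but no zone key exercise the fallback branch, which is the compatibility case under discussion here.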






[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775802722



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
##########
@@ -464,30 +485,43 @@ object RebaseDateTime {
   final val lastSwitchJulianTs: Long = getLastSwitchTs(julianGregRebaseMap)
 
   /**
-   * An optimized version of [[rebaseJulianToGregorianMicros(ZoneId, Long)]]. This method leverages
-   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
-   * information about the current JVM system time zone or `micros` is related to Before Common Era,
-   * the function falls back to the regular unoptimized version.
-   *
-   * Note: The function assumes that the input micros was derived from a local timestamp
-   *       at the default system JVM time zone in the Julian calendar.
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the given time zone `timeZoneId` or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
    *
+   * @param timeZoneId A string identifier of a time zone.
    * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
    *               in the Julian calendar. It can be negative.
    * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
    */
-  def rebaseJulianToGregorianMicros(micros: Long): Long = {
+  def rebaseJulianToGregorianMicros(timeZoneId: String, micros: Long): Long = {
     if (micros >= lastSwitchJulianTs) {
       micros
     } else {
-      val timeZone = TimeZone.getDefault
-      val tzId = timeZone.getID
-      val rebaseRecord = julianGregRebaseMap.getOrNull(tzId)
+      val rebaseRecord = julianGregRebaseMap.getOrNull(timeZoneId)
       if (rebaseRecord == null || micros < rebaseRecord.switches(0)) {
-        rebaseJulianToGregorianMicros(timeZone, micros)
+        rebaseJulianToGregorianMicros(TimeZone.getTimeZone(timeZoneId), micros)
       } else {
         rebaseMicros(rebaseRecord, micros)
       }
     }
   }
+
+  /**
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the current JVM system time zone or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
+   *
+   * Note: The function assumes that the input micros was derived from a local timestamp
+   *       at the default system JVM time zone in the Julian calendar.
+   *
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in the Julian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
+   */
+  def rebaseJulianToGregorianMicros(micros: Long): Long = {

Review comment:
       `rebaseGregorianToJulianMicros` is used from:
   1. `toJavaTimestamp` which is used (33 times) from:
      1. `CatalystTypeConverter.TimestampConverter`
      2. `Iso8601TimestampFormatter.format()`
      3. `LegacySimpleTimestampFormatter.format`
      4. `HiveInspectors`
      5. `BaseScriptTransformationExec.outputFieldWriters`
      6. `JdbcUtils.makeSetter`
      7. `OrcFilters.castLiteralValue`
      8. `OrcSerializer.newConverter` 
   2. `LegacyFastTimestampFormatter.format()` (the same as for `parse()` above).






[GitHub] [spark] cloud-fan commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1002111054


   @MaxGekk can you open a backport PR for 3.2? There are conflicts.




[GitHub] [spark] SparkQA removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000220584


   **[Test build #146517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146517/testReport)** for PR 34973 at commit [`ece9afc`](https://github.com/apache/spark/commit/ece9afc1c406c4390b39825a0fc888163e52ef82).




[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775467784



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRebaseDatetimeSuite.scala
##########
@@ -170,7 +173,8 @@ abstract class ParquetRebaseDatetimeSuite
       // By default we should fail to read ancient datetime values when parquet files don't
       // contain Spark version.
       "2_4_5" -> failInRead _,
-      "2_4_6" -> successInRead _).foreach { case (version, checkDefaultRead) =>
+      "2_4_6" -> successInRead _,
+      "3_2_2" -> successInRead _).foreach { case (version, checkDefaultRead) =>

Review comment:
       Ditto: should this be `3_2_1`?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000553756


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51013/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775708833



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1933,7 +1934,8 @@ abstract class AvroSuite
       // By default we should fail to read ancient datetime values when parquet files don't
       // contain Spark version.
       "2_4_5" -> failInRead _,
-      "2_4_6" -> successInRead _
+      "2_4_6" -> successInRead _,
+      "3_2_2" -> successInRead _

Review comment:
       @gengliangwang, can we fix it? Otherwise, we will have a problem when releasing 3.2.1.






[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775799409



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
##########
@@ -464,30 +485,43 @@ object RebaseDateTime {
   final val lastSwitchJulianTs: Long = getLastSwitchTs(julianGregRebaseMap)
 
   /**
-   * An optimized version of [[rebaseJulianToGregorianMicros(ZoneId, Long)]]. This method leverages
-   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
-   * information about the current JVM system time zone or `micros` is related to Before Common Era,
-   * the function falls back to the regular unoptimized version.
-   *
-   * Note: The function assumes that the input micros was derived from a local timestamp
-   *       at the default system JVM time zone in the Julian calendar.
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the given time zone `timeZoneId` or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
    *
+   * @param timeZoneId A string identifier of a time zone.
    * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
    *               in the Julian calendar. It can be negative.
    * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
    */
-  def rebaseJulianToGregorianMicros(micros: Long): Long = {
+  def rebaseJulianToGregorianMicros(timeZoneId: String, micros: Long): Long = {
     if (micros >= lastSwitchJulianTs) {
       micros
     } else {
-      val timeZone = TimeZone.getDefault
-      val tzId = timeZone.getID
-      val rebaseRecord = julianGregRebaseMap.getOrNull(tzId)
+      val rebaseRecord = julianGregRebaseMap.getOrNull(timeZoneId)
       if (rebaseRecord == null || micros < rebaseRecord.switches(0)) {
-        rebaseJulianToGregorianMicros(timeZone, micros)
+        rebaseJulianToGregorianMicros(TimeZone.getTimeZone(timeZoneId), micros)
       } else {
         rebaseMicros(rebaseRecord, micros)
       }
     }
   }
+
+  /**
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the current JVM system time zone or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
+   *
+   * Note: The function assumes that the input micros was derived from a local timestamp
+   *       at the default system JVM time zone in the Julian calendar.
+   *
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in the Julian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
+   */
+  def rebaseJulianToGregorianMicros(micros: Long): Long = {

Review comment:
       `rebaseJulianToGregorianMicros` is used from:
   1. `fromJavaTimestamp()` which is used (55 times) from:
       1. `CatalystTypeConverter.TimestampConverter`
       2. `CatalogColumnStat.fromExternalString()`
       3. `Literal.apply()`
       4. `LegacyFastTimestampFormatter.format()`
       5. `LegacySimpleTimestampFormatter.parse()`
       6. `HiveInspectors`
       7. `HadoopTableReader`
       8. `KafkaRecordToRowConverter.toInternalRowWithoutHeaders` & `toInternalRowWithHeaders`
       9. `JdbcUtils.makeGetter()`
       10. `OrcAtomicColumnVector`
       11. `ParquetFilters.timestampToMicros` <-- NEED TO RE-CHECK THIS ONE
       12. In many tests
   2. `LegacyFastTimestampFormatter.parse()` which is used (20 times) from:
       1. `CSVInferSchema`
       2. `UnivocityGenerator`
       3. `UnivocityParser`
       4. `JacksonGenerator`
       5. `JacksonParser`
       6. `JsonInferSchema` 






[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775479636



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1933,7 +1934,8 @@ abstract class AvroSuite
       // By default we should fail to read ancient datetime values when parquet files don't
       // contain Spark version.
       "2_4_5" -> failInRead _,
-      "2_4_6" -> successInRead _
+      "2_4_6" -> successInRead _,
+      "3_2_2" -> successInRead _

Review comment:
       I checked out `branch-3.2` and ran:
   ```
   $ ./build/mvn -Phive -Phive-thriftserver -Dskip -DskipTests package
   ```
   and got `core/target/extra-resources/spark-version-info.properties`:
   ```
   $ cat ./core/target/extra-resources/spark-version-info.properties
   version=3.2.2-SNAPSHOT
   user=maximgekk
   revision=d1cd110c20817eb1ccd716e099be5712df1f670c
   branch=branch-3.2
   date=2021-12-23T19:12:40Z
   url=https://github.com/apache/spark.git
   ```
   Shouldn't it be **3.2.1-SNAPSHOT**? cc @gengliangwang @cloud-fan






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000556424


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146538/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999466099


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50950/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000923272


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146566/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000813084


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146561/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999635345


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146474/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999631298


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50957/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000283648


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50993/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000553756


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51013/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-998886084


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50915/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999010943


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146440/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999438450


   **[Test build #146474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146474/testReport)** for PR 34973 at commit [`e01a29f`](https://github.com/apache/spark/commit/e01a29f161d0e41f5155121b1d455256d425a595).




[GitHub] [spark] SparkQA removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-998793337


   **[Test build #146440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146440/testReport)** for PR 34973 at commit [`7e3ee92`](https://github.com/apache/spark/commit/7e3ee92a7b643cfbc61d7c2f5e99c420c5914807).




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000552588


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51013/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000813018


   **[Test build #146561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146561/testReport)** for PR 34973 at commit [`68a3504`](https://github.com/apache/spark/commit/68a3504a2eafd2f94ec04326355bb475905a6dce).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775474340



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1786,8 +1787,8 @@ abstract class AvroSuite
   }
 
   // It generates input files for the test below:
-  // "SPARK-31183: compatibility with Spark 2.4 in reading dates/timestamps"
-  ignore("SPARK-31855: generate test files for checking compatibility with Spark 2.4") {
+  // "SPARK-31183, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps"
+  ignore("SPARK-31855: generate test files for checking compatibility with Spark 2.4/3.2") {

Review comment:
       Just replace `ignore` with `test` and run it. For example, I checked out `branch-3.2` and ran the test to generate the golden files. After that, I copied the new files to the PR's branch.






[GitHub] [spark] gengliangwang commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775566663



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1933,7 +1934,8 @@ abstract class AvroSuite
       // By default we should fail to read ancient datetime values when parquet files don't
       // contain Spark version.
       "2_4_5" -> failInRead _,
-      "2_4_6" -> successInRead _
+      "2_4_6" -> successInRead _,
+      "3_2_2" -> successInRead _

Review comment:
       Yes, it should be 3.2.1-SNAPSHOT. The RC script sometimes aborted, and this was probably a mistake from when I reran the script.






[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775722586



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
##########
@@ -464,30 +485,43 @@ object RebaseDateTime {
   final val lastSwitchJulianTs: Long = getLastSwitchTs(julianGregRebaseMap)
 
   /**
-   * An optimized version of [[rebaseJulianToGregorianMicros(ZoneId, Long)]]. This method leverages
-   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
-   * information about the current JVM system time zone or `micros` is related to Before Common Era,
-   * the function falls back to the regular unoptimized version.
-   *
-   * Note: The function assumes that the input micros was derived from a local timestamp
-   *       at the default system JVM time zone in the Julian calendar.
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the given time zone `timeZoneId` or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
    *
+   * @param timeZoneId A string identifier of a time zone.
    * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
    *               in the Julian calendar. It can be negative.
    * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
    */
-  def rebaseJulianToGregorianMicros(micros: Long): Long = {
+  def rebaseJulianToGregorianMicros(timeZoneId: String, micros: Long): Long = {
     if (micros >= lastSwitchJulianTs) {
       micros
     } else {
-      val timeZone = TimeZone.getDefault
-      val tzId = timeZone.getID
-      val rebaseRecord = julianGregRebaseMap.getOrNull(tzId)
+      val rebaseRecord = julianGregRebaseMap.getOrNull(timeZoneId)
       if (rebaseRecord == null || micros < rebaseRecord.switches(0)) {
-        rebaseJulianToGregorianMicros(timeZone, micros)
+        rebaseJulianToGregorianMicros(TimeZone.getTimeZone(timeZoneId), micros)
       } else {
         rebaseMicros(rebaseRecord, micros)
       }
     }
   }
+
+  /**
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the current JVM system time zone or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
+   *
+   * Note: The function assumes that the input micros was derived from a local timestamp
+   *       at the default system JVM time zone in the Julian calendar.
+   *
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in the Julian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
+   */
+  def rebaseJulianToGregorianMicros(micros: Long): Long = {

Review comment:
       How many such places do we have? We should understand which places rely on the JVM time zone.
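
       A minimal sketch of the overload pattern in the diff above, to make the question concrete. It is not the PR's code: `RebaseSketch`, `shiftMicrosByZone`, `lastSwitchTs` and `slowRebase` are hypothetical stand-ins for the real rebasing maps and calendar arithmetic, and the body of the JVM-default overload is an assumption, since the diff is cut off before it. The point is only that when the one-argument overload is the single place calling `TimeZone.getDefault`, every remaining caller of that overload is a place that depends on the JVM time zone.

```scala
import java.util.TimeZone

// Hypothetical stand-in for RebaseDateTime. The real rebasing maps and the
// Julian-to-Gregorian arithmetic are NOT reproduced here.
object RebaseSketch {
  // Hypothetical per-zone shifts in microseconds, keyed by time zone ID.
  private val shiftMicrosByZone: Map[String, Long] = Map(
    "UTC" -> 0L,
    "America/Los_Angeles" -> 1000L)

  // Hypothetical threshold: values at or after it are returned unchanged.
  val lastSwitchTs: Long = 0L

  // Placeholder for the unoptimized path; it simply returns its input.
  private def slowRebase(tz: TimeZone, micros: Long): Long = micros

  // Explicit-zone overload: never touches the JVM default time zone.
  def rebase(timeZoneId: String, micros: Long): Long = {
    if (micros >= lastSwitchTs) {
      micros
    } else {
      shiftMicrosByZone.get(timeZoneId) match {
        case Some(shift) => micros + shift
        case None => slowRebase(TimeZone.getTimeZone(timeZoneId), micros)
      }
    }
  }

  // JVM-default overload (assumed body): the only place that reads
  // TimeZone.getDefault, so its call sites are the JVM-dependent ones.
  def rebase(micros: Long): Long = rebase(TimeZone.getDefault.getID, micros)
}
```

       With this shape, listing the call sites of the one-argument overload would give the set of places that still rely on the JVM time zone.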






[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999007300


   **[Test build #146440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146440/testReport)** for PR 34973 at commit [`7e3ee92`](https://github.com/apache/spark/commit/7e3ee92a7b643cfbc61d7c2f5e99c420c5914807).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-998824312


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50915/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999791323


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146481/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775466900



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1786,8 +1787,8 @@ abstract class AvroSuite
   }
 
   // It generates input files for the test below:
-  // "SPARK-31183: compatibility with Spark 2.4 in reading dates/timestamps"
-  ignore("SPARK-31855: generate test files for checking compatibility with Spark 2.4") {
+  // "SPARK-31183, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps"
+  ignore("SPARK-31855: generate test files for checking compatibility with Spark 2.4/3.2") {

Review comment:
       How do we use this test to generate the new test data files?






[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000807274


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51036/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999635345


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146474/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999631258


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50957/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000793176


   **[Test build #146551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146551/testReport)** for PR 34973 at commit [`6282a9e`](https://github.com/apache/spark/commit/6282a9e40535f30933889fa90cdc5f8a85ec9a82).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000796444


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146551/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000885553


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51041/
   




[GitHub] [spark] sadikovi commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
sadikovi commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999912440


   Thanks, I will take a look later. I noticed that we are changing an OSS Spark method signature. Would it be possible to somehow avoid breaking binary compatibility between DBR and OSS classes/methods? At the least, it would be good to assess the changes from this point of view.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000813084


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146561/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-998793337


   **[Test build #146440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146440/testReport)** for PR 34973 at commit [`7e3ee92`](https://github.com/apache/spark/commit/7e3ee92a7b643cfbc61d7c2f5e99c420c5914807).




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000682819


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51026/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000863519


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51041/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000669744


   **[Test build #146551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146551/testReport)** for PR 34973 at commit [`6282a9e`](https://github.com/apache/spark/commit/6282a9e40535f30933889fa90cdc5f8a85ec9a82).




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000919234


   **[Test build #146566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146566/testReport)** for PR 34973 at commit [`e06f5f4`](https://github.com/apache/spark/commit/e06f5f42b0cb5d65fa9602c4011b691ec10b6b97).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000283619


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50993/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999516312


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50950/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999770333


   **[Test build #146481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146481/testReport)** for PR 34973 at commit [`64c39a7`](https://github.com/apache/spark/commit/64c39a739a9623fea50f2b1a4649a26bd1a14a5b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999553148


   **[Test build #146481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146481/testReport)** for PR 34973 at commit [`64c39a7`](https://github.com/apache/spark/commit/64c39a739a9623fea50f2b1a4649a26bd1a14a5b).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999631298


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50957/
   




[GitHub] [spark] cloud-fan commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1002110966


   thanks, merging to master!




[GitHub] [spark] MaxGekk commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000527663


   > Would it be possible to somehow avoid breaking binary compatibility between DBR and OSS classes/methods?
   
   @sadikovi As far as I can see, I modified only private/internal methods. Please correct me if I am wrong. On the OSS side, we care mostly about public APIs and not so much about internal methods. If you have any ideas on how to avoid changing the signatures of internal functions/methods, you are welcome to share them.




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000520508


   **[Test build #146538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146538/testReport)** for PR 34973 at commit [`2a3fd19`](https://github.com/apache/spark/commit/2a3fd19fa32328b6728d2c470ce4f0b5035e43d8).




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000697150


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51026/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000283648


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50993/
   




[GitHub] [spark] sadikovi edited a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
sadikovi edited a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999912440


   Thanks, I will take a look later. I noticed that we are changing OSS Spark method signatures. Would it be possible to somehow avoid breaking binary compatibility between OSS classes/methods? At the least, it would be good to assess the changes from this point of view.




[GitHub] [spark] sadikovi commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
sadikovi commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000528309


   @MaxGekk You can ignore my comment above. Let's continue with the current approach. I will take a look next week, after the holidays.




[GitHub] [spark] SparkQA commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000534717


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51013/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000520508


   **[Test build #146538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146538/testReport)** for PR 34973 at commit [`2a3fd19`](https://github.com/apache/spark/commit/2a3fd19fa32328b6728d2c470ce4f0b5035e43d8).




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999010943


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146440/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999631425


   **[Test build #146474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146474/testReport)** for PR 34973 at commit [`e01a29f`](https://github.com/apache/spark/commit/e01a29f161d0e41f5155121b1d455256d425a595).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-998883535


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50915/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775462814



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1933,7 +1934,8 @@ abstract class AvroSuite
       // By default we should fail to read ancient datetime values when parquet files don't
       // contain Spark version.
       "2_4_5" -> failInRead _,
-      "2_4_6" -> successInRead _
+      "2_4_6" -> successInRead _,
+      "3_2_2" -> successInRead _

Review comment:
       3.2.2? We have not released 3.2.1 yet.






[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775463439



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
##########
@@ -363,33 +371,46 @@ object RebaseDateTime {
   }
 
   /**
-   * An optimized version of [[rebaseGregorianToJulianMicros(ZoneId, Long)]]. This method leverages
-   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
-   * information about the current JVM system time zone or `micros` is related to Before Common Era,
-   * the function falls back to the regular unoptimized version.
-   *
-   * Note: The function assumes that the input micros was derived from a local timestamp
-   *       at the default system JVM time zone in Proleptic Gregorian calendar.
+   * An optimized version of [[rebaseGregorianToJulianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the given time zone `timeZoneId` or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
    *
+   * @param timeZoneId A string identifier of a time zone.
    * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
    *               in Proleptic Gregorian calendar. It can be negative.
    * @return The rebased microseconds since the epoch in Julian calendar.
    */
-  def rebaseGregorianToJulianMicros(micros: Long): Long = {
+  def rebaseGregorianToJulianMicros(timeZoneId: String, micros: Long): Long = {
     if (micros >= lastSwitchGregorianTs) {
       micros
     } else {
-      val timeZone = TimeZone.getDefault
-      val tzId = timeZone.getID
-      val rebaseRecord = gregJulianRebaseMap.getOrNull(tzId)
+      val rebaseRecord = gregJulianRebaseMap.getOrNull(timeZoneId)
       if (rebaseRecord == null || micros < rebaseRecord.switches(0)) {
-        rebaseGregorianToJulianMicros(timeZone, micros)
+        rebaseGregorianToJulianMicros(TimeZone.getTimeZone(timeZoneId), micros)
       } else {
         rebaseMicros(rebaseRecord, micros)
       }
     }
   }
 
+  /**
+   * An optimized version of [[rebaseGregorianToJulianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the current JVM system time zone or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
+   *
+   * Note: The function assumes that the input micros was derived from a local timestamp
+   *       at the default system JVM time zone in Proleptic Gregorian calendar.
+   *
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in Proleptic Gregorian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Julian calendar.
+   */
+  def rebaseGregorianToJulianMicros(micros: Long): Long = {

Review comment:
       is it used in test only?






[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775464249



##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##########
@@ -218,7 +222,9 @@ public final void readLongsWithRebase(
         throw DataSourceUtils.newRebaseExceptionInRead("Parquet");
       } else {
         for (int i = 0; i < total; i += 1) {
-          c.putLong(rowId + i, RebaseDateTime.rebaseJulianToGregorianMicros(buffer.getLong()));
+          c.putLong(
+              rowId + i,

Review comment:
       nit: 2 spaces indentation






[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775473655



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
##########
@@ -363,33 +371,46 @@ object RebaseDateTime {
   }
 
   /**
-   * An optimized version of [[rebaseGregorianToJulianMicros(ZoneId, Long)]]. This method leverages
-   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
-   * information about the current JVM system time zone or `micros` is related to Before Common Era,
-   * the function falls back to the regular unoptimized version.
-   *
-   * Note: The function assumes that the input micros was derived from a local timestamp
-   *       at the default system JVM time zone in Proleptic Gregorian calendar.
+   * An optimized version of [[rebaseGregorianToJulianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the given time zone `timeZoneId` or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
    *
+   * @param timeZoneId A string identifier of a time zone.
    * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
    *               in Proleptic Gregorian calendar. It can be negative.
    * @return The rebased microseconds since the epoch in Julian calendar.
    */
-  def rebaseGregorianToJulianMicros(micros: Long): Long = {
+  def rebaseGregorianToJulianMicros(timeZoneId: String, micros: Long): Long = {
     if (micros >= lastSwitchGregorianTs) {
       micros
     } else {
-      val timeZone = TimeZone.getDefault
-      val tzId = timeZone.getID
-      val rebaseRecord = gregJulianRebaseMap.getOrNull(tzId)
+      val rebaseRecord = gregJulianRebaseMap.getOrNull(timeZoneId)
       if (rebaseRecord == null || micros < rebaseRecord.switches(0)) {
-        rebaseGregorianToJulianMicros(timeZone, micros)
+        rebaseGregorianToJulianMicros(TimeZone.getTimeZone(timeZoneId), micros)
       } else {
         rebaseMicros(rebaseRecord, micros)
       }
     }
   }
 
+  /**
+   * An optimized version of [[rebaseGregorianToJulianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the current JVM system time zone or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
+   *
+   * Note: The function assumes that the input micros was derived from a local timestamp
+   *       at the default system JVM time zone in Proleptic Gregorian calendar.
+   *
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in Proleptic Gregorian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Julian calendar.
+   */
+  def rebaseGregorianToJulianMicros(micros: Long): Long = {

Review comment:
       Not only in tests, see
   https://github.com/apache/spark/blob/f7dabd8e57fc4edf057b9219d6d9382bc6adf749/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala#L328
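
   For readers following the thread, the lookup-with-fallback shape of the new `rebaseGregorianToJulianMicros(timeZoneId, micros)` overload in the hunk above can be sketched roughly as follows. This is a minimal, self-contained Java illustration and not Spark's code: the class `RebaseLookupSketch`, the record layout, the `Demo/Zone` entry and its numbers, and the identity `slowRebase` placeholder are all assumptions made for the example.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.TimeZone;

   // Illustrative sketch only; every name and number here is hypothetical, not Spark's.
   class RebaseLookupSketch {
     // Hypothetical pre-computed record: diffs[i] applies from switches[i] (inclusive)
     // up to the next switch point.
     static final class RebaseRecord {
       final long[] switches;
       final long[] diffs;

       RebaseRecord(long[] switches, long[] diffs) {
         this.switches = switches;
         this.diffs = diffs;
       }
     }

     // Keyed by time zone ID, mirroring the role of gregJulianRebaseMap in the diff.
     static final Map<String, RebaseRecord> REBASE_MAP = new HashMap<>();
     // Made-up threshold: values at or after it are returned unchanged.
     static final long LAST_SWITCH_TS = 50L;
     // Counts slow-path invocations, for demonstration only.
     static int slowPathCalls = 0;

     static {
       // Made-up table entry for a made-up zone ID.
       REBASE_MAP.put(
         "Demo/Zone",
         new RebaseRecord(new long[] {-100L, 0L, 50L}, new long[] {10L, 5L, 0L}));
     }

     // Placeholder for the calendar-based conversion; returns the input unchanged.
     static long slowRebase(TimeZone tz, long micros) {
       slowPathCalls += 1;
       return micros;
     }

     // Scan from the last switch point backwards and apply the matching diff.
     static long rebaseMicros(RebaseRecord record, long micros) {
       int i = record.switches.length;
       do {
         i -= 1;
       } while (i > 0 && micros < record.switches[i]);
       return micros + record.diffs[i];
     }

     // The time zone is an explicit argument instead of TimeZone.getDefault().
     static long rebase(String timeZoneId, long micros) {
       if (micros >= LAST_SWITCH_TS) {
         return micros;
       }
       RebaseRecord record = REBASE_MAP.get(timeZoneId);
       if (record == null || micros < record.switches[0]) {
         // No table for this zone, or the value precedes the table: use the slow path.
         return slowRebase(TimeZone.getTimeZone(timeZoneId), micros);
       }
       return rebaseMicros(record, micros);
     }

     public static void main(String[] args) {
       System.out.println(rebase("Demo/Zone", 20L));    // table hit
       System.out.println(rebase("Unknown/Zone", 20L)); // fallback to the slow path
     }
   }
   ```

   Taking the zone ID as a parameter, rather than reading `TimeZone.getDefault` inside the function as the removed lines did, is what lets a caller pass the session time zone.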






[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000253357


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50993/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000708386


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51026/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775463526



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
##########
@@ -464,30 +485,43 @@ object RebaseDateTime {
   final val lastSwitchJulianTs: Long = getLastSwitchTs(julianGregRebaseMap)
 
   /**
-   * An optimized version of [[rebaseJulianToGregorianMicros(ZoneId, Long)]]. This method leverages
-   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
-   * information about the current JVM system time zone or `micros` is related to Before Common Era,
-   * the function falls back to the regular unoptimized version.
-   *
-   * Note: The function assumes that the input micros was derived from a local timestamp
-   *       at the default system JVM time zone in the Julian calendar.
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the given time zone `timeZoneId` or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
    *
+   * @param timeZoneId A string identifier of a time zone.
    * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
    *               in the Julian calendar. It can be negative.
    * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
    */
-  def rebaseJulianToGregorianMicros(micros: Long): Long = {
+  def rebaseJulianToGregorianMicros(timeZoneId: String, micros: Long): Long = {
     if (micros >= lastSwitchJulianTs) {
       micros
     } else {
-      val timeZone = TimeZone.getDefault
-      val tzId = timeZone.getID
-      val rebaseRecord = julianGregRebaseMap.getOrNull(tzId)
+      val rebaseRecord = julianGregRebaseMap.getOrNull(timeZoneId)
       if (rebaseRecord == null || micros < rebaseRecord.switches(0)) {
-        rebaseJulianToGregorianMicros(timeZone, micros)
+        rebaseJulianToGregorianMicros(TimeZone.getTimeZone(timeZoneId), micros)
       } else {
         rebaseMicros(rebaseRecord, micros)
       }
     }
   }
+
+  /**
+   * An optimized version of [[rebaseJulianToGregorianMicros(TimeZone, Long)]]. This method
+   * leverages the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't
+   * contain information about the current JVM system time zone or `micros` is related to Before
+   * Common Era, the function falls back to the regular unoptimized version.
+   *
+   * Note: The function assumes that the input micros was derived from a local timestamp
+   *       at the default system JVM time zone in the Julian calendar.
+   *
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in the Julian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
+   */
+  def rebaseJulianToGregorianMicros(micros: Long): Long = {

Review comment:
       ditto






[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000220584


   **[Test build #146517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146517/testReport)** for PR 34973 at commit [`ece9afc`](https://github.com/apache/spark/commit/ece9afc1c406c4390b39825a0fc888163e52ef82).




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000403809


   **[Test build #146517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146517/testReport)** for PR 34973 at commit [`ece9afc`](https://github.com/apache/spark/commit/ece9afc1c406c4390b39825a0fc888163e52ef82).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000809549


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51036/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000809549


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51036/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000556424


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146538/
   




[GitHub] [spark] MaxGekk commented on a change in pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34973:
URL: https://github.com/apache/spark/pull/34973#discussion_r775480447



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -1933,7 +1934,8 @@ abstract class AvroSuite
       // By default we should fail to read ancient datetime values when parquet files don't
       // contain Spark version.
       "2_4_5" -> failInRead _,
-      "2_4_6" -> successInRead _
+      "2_4_6" -> successInRead _,
+      "3_2_2" -> successInRead _

Review comment:
       Let me take the commit https://github.com/apache/spark/commit/5d45a415f3a29898d92380380cfd82bfc7f579ea and generate the golden files using it.






[GitHub] [spark] SparkQA removed a comment on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000847915


   **[Test build #146566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146566/testReport)** for PR 34973 at commit [`e06f5f4`](https://github.com/apache/spark/commit/e06f5f42b0cb5d65fa9602c4011b691ec10b6b97).




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [SPARK-37705][SQL] Rebase timestamps in the session time zone saved in Parquet/Avro metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-1000923272


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146566/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999500338


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50950/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999516312


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50950/
   




[GitHub] [spark] SparkQA commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999553148


   **[Test build #146481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146481/testReport)** for PR 34973 at commit [`64c39a7`](https://github.com/apache/spark/commit/64c39a739a9623fea50f2b1a4649a26bd1a14a5b).




[GitHub] [spark] SparkQA removed a comment on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999438450


   **[Test build #146474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146474/testReport)** for PR 34973 at commit [`e01a29f`](https://github.com/apache/spark/commit/e01a29f161d0e41f5155121b1d455256d425a595).




[GitHub] [spark] AmplabJenkins commented on pull request #34973: [WIP][SPARK-37705][SQL] Write the session time zone in Parquet/Avro file metadata

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34973:
URL: https://github.com/apache/spark/pull/34973#issuecomment-999791323


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146481/
   

