Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/02 09:20:19 UTC

[GitHub] [spark] MaxGekk opened a new pull request #28705: [WIP][SPARK-30808][SQL] Enable Java 8 time API in Thrift server

MaxGekk opened a new pull request #28705:
URL: https://github.com/apache/spark/pull/28705


   ### What changes were proposed in this pull request?
   Set `spark.sql.datetime.java8API.enabled` to `true` in `hiveResultString()`, and restore it to its previous value at the end of the call.
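
   A minimal sketch of the set-and-restore idea, not the actual patch (`spark` here stands for whichever SparkSession `hiveResultString()` uses):
   ```scala
   import org.apache.spark.sql.SparkSession

   def withJava8TimeApi[T](spark: SparkSession)(body: => T): T = {
     val key = "spark.sql.datetime.java8API.enabled"
     val prev = spark.conf.getOption(key)
     spark.conf.set(key, "true")
     try {
       body
     } finally {
       prev match {
         case Some(v) => spark.conf.set(key, v) // restore the previous value
         case None    => spark.conf.unset(key)  // or clear it if it was never set
       }
     }
   }
   ```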
   
   ### Why are the changes needed?
   TODO
   
   ### Does this PR introduce any user-facing change?
   Yes. Before:
   ```sql
   spark-sql> select make_date(-44, 3, 15);
   0045-03-15
   ```
   After:
   ```sql
   spark-sql> select make_date(-44, 3, 15);
   -0044-03-15
   ```
   
   
   ### How was this patch tested?
   By running the hive-thriftserver tests. In particular:
   ```
   ./build/sbt -Phadoop-2.7 -Phive-2.3 -Phive-thriftserver "hive-thriftserver/test:testOnly *SparkThriftServerProtocolVersionsSuite"
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434120332



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/HiveResultSuite.scala
##########
@@ -17,36 +17,53 @@
 
 package org.apache.spark.sql.execution
 
+import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.connector.InMemoryTableCatalog
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.{ExamplePoint, ExamplePointUDT, SharedSparkSession}
 
 class HiveResultSuite extends SharedSparkSession {
   import testImplicits._
 
+  private def withOutstandingZoneIds(f: => Unit): Unit = {
+    for {
+      jvmZoneId <- outstandingZoneIds
+      sessionZoneId <- outstandingZoneIds

Review comment:
       This should test the cases when the JVM default time zone and the session time zone are not in sync.






[GitHub] [spark] AmplabJenkins commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-638421176








[GitHub] [spark] SparkQA commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-638420519


   **[Test build #123500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123500/testReport)** for PR 28705 at commit [`74dfe4d`](https://github.com/apache/spark/commit/74dfe4d428000ec31c6cdaac8c35301b904d1d1d).




[GitHub] [spark] juliuszsompolski commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
juliuszsompolski commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434480076



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       @MaxGekk, thinking about what I just wrote to @cloud-fan, would the toHiveString here already handle the conversion using the correct session time zone, not the JVM time zone?
   Or is there some other case that doesn't work, e.g. the hybrid vs. proleptic calendar?
   
   If that is the case, then we should also set DATETIME_JAVA8API_ENABLED in the withLocalProperties wrapper around the Thriftserver JDBC/ODBC operations, to make it work correctly there as well.






[GitHub] [spark] juliuszsompolski commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
juliuszsompolski commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434136228



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))
-        .map(_.mkString("\t"))
+  def hiveResultString(ds: Dataset[_]): Seq[String] = {
+    val executedPlan = ds.queryExecution.executedPlan
+    executedPlan match {
+      case ExecutedCommandExec(_: DescribeCommandBase) =>
+        formatDescribeTableOutput(executedPlan.executeCollectPublic())
+      case _: DescribeTableExec =>
+        formatDescribeTableOutput(executedPlan.executeCollectPublic())
+      // SHOW TABLES in Hive only output table names while our v1 command outputs
+      // database, table name, isTemp.
+      case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
+        command.executeCollect().map(_.getString(1))
+      // SHOW TABLES in Hive only output table names while our v2 command outputs
+      // namespace and table name.
+      case command : ShowTablesExec =>
+        command.executeCollect().map(_.getString(1))
+      // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
+      // namespace, viewName, and isTemporary.
+      case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
+        command.executeCollect().map(_.getString(1))
+      case _ =>
+        val sessionWithJava8DatetimeEnabled = {
+          val cloned = ds.sparkSession.cloneSession()
+          cloned.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, true)
+          cloned
+        }
+        sessionWithJava8DatetimeEnabled.withActive {
+          // We cannot collect the original dataset because its encoders could be created
+          // with disabled Java 8 date-time API.
+          val result: Seq[Seq[Any]] = Dataset.ofRows(ds.sparkSession, ds.logicalPlan)
+            .queryExecution
+            .executedPlan
+            .executeCollectPublic().map(_.toSeq).toSeq
+          // We need the types so we can output struct field names
+          val types = executedPlan.output.map(_.dataType)
+          // Reformat to match hive tab delimited output.
+          result.map(_.zip(types).map(e => toHiveString(e)))
+            .map(_.mkString("\t"))
+        }
+    }

Review comment:
       SparkSQLCLIDriver is the only non-test user of this function, and if we want the CLI to always use the new Java 8 date-time APIs, I think it would be better to set it explicitly there, rather than cloning the session and the Dataset here to do it.
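
   A minimal sketch of that alternative, assuming `sparkSession` is the CLI driver's session (the actual wiring in SparkSQLCLIDriver may differ):
   ```scala
   // Enable the Java 8 time API once for the CLI session at startup,
   // so hiveResultString does not need to clone a session per query.
   sparkSession.conf.set("spark.sql.datetime.java8API.enabled", "true")
   ```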






[GitHub] [spark] juliuszsompolski commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
juliuszsompolski commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434543531



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       Following that example:
   ```
   $ export TZ="Europe/Moscow"
   $ ./bin/spark-sql -S
   spark-sql> set spark.sql.session.timeZone=America/Los_Angeles;
   spark.sql.session.timeZone	America/Los_Angeles
   spark-sql> select date '2020-06-03';
   2020-06-02
   spark-sql> select make_date(2020, 6, 3);
   2020-06-02
   ```
   Could you explain why make_date(2020, 6, 3) -> 2020-06-02 happens?
   Does make_date create a date at midnight 2020-06-03 in the Moscow time zone, which is then returned in America/Los_Angeles, where it is still 2020-06-02?
   Could you explain, step by step with examples, which type and which time zone are used during parsing, during collecting, and for the string display, before and after the changes?






[GitHub] [spark] SparkQA removed a comment on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-638420519


   **[Test build #123500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123500/testReport)** for PR 28705 at commit [`74dfe4d`](https://github.com/apache/spark/commit/74dfe4d428000ec31c6cdaac8c35301b904d1d1d).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-638421176








[GitHub] [spark] cloud-fan commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434600243



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       So everything is OK now? Is enabling the Java 8 time API only for better performance and to support negative years?
   
   BTW, I doubt we can support negative years in the Thrift server. Even if the server side can generate the datetime string correctly, the client side parses the string using `Timestamp.of`, which doesn't support negative years.
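
   For illustration only (plain JDK code, not the Thrift client itself, and assuming the client goes through the legacy `java.sql` parsers): `java.time` accepts proleptic negative years, while the legacy `valueOf` parser rejects a signed year.
   ```scala
   import java.time.LocalDate

   // ISO-8601 proleptic parsing accepts a signed year.
   println(LocalDate.parse("-0044-03-15"))   // -0044-03-15
   // The legacy parser does not accept a leading sign:
   // java.sql.Date.valueOf("-0044-03-15")   // throws IllegalArgumentException
   ```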






[GitHub] [spark] SparkQA commented on pull request #28705: [WIP][SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637702326


   **[Test build #123445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123445/testReport)** for PR 28705 at commit [`ee69a1b`](https://github.com/apache/spark/commit/ee69a1b630ddba4cb5120a742d52b2956bc9a3f0).




[GitHub] [spark] MaxGekk closed pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #28705:
URL: https://github.com/apache/spark/pull/28705


   




[GitHub] [spark] cloud-fan commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434360189



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))
-        .map(_.mkString("\t"))
+  def hiveResultString(ds: Dataset[_]): Seq[String] = {
+    val executedPlan = ds.queryExecution.executedPlan
+    executedPlan match {
+      case ExecutedCommandExec(_: DescribeCommandBase) =>
+        formatDescribeTableOutput(executedPlan.executeCollectPublic())
+      case _: DescribeTableExec =>
+        formatDescribeTableOutput(executedPlan.executeCollectPublic())
+      // SHOW TABLES in Hive only output table names while our v1 command outputs
+      // database, table name, isTemp.
+      case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
+        command.executeCollect().map(_.getString(1))
+      // SHOW TABLES in Hive only output table names while our v2 command outputs
+      // namespace and table name.
+      case command : ShowTablesExec =>
+        command.executeCollect().map(_.getString(1))
+      // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
+      // namespace, viewName, and isTemporary.
+      case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
+        command.executeCollect().map(_.getString(1))
+      case _ =>
+        val sessionWithJava8DatetimeEnabled = {
+          val cloned = ds.sparkSession.cloneSession()
+          cloned.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, true)
+          cloned
+        }
+        sessionWithJava8DatetimeEnabled.withActive {
+          // We cannot collect the original dataset because its encoders could be created
+          // with disabled Java 8 date-time API.
+          val result: Seq[Seq[Any]] = Dataset.ofRows(ds.sparkSession, ds.logicalPlan)
+            .queryExecution
+            .executedPlan
+            .executeCollectPublic().map(_.toSeq).toSeq
+          // We need the types so we can output struct field names
+          val types = executedPlan.output.map(_.dataType)
+          // Reformat to match hive tab delimited output.
+          result.map(_.zip(types).map(e => toHiveString(e)))
+            .map(_.mkString("\t"))
+        }
+    }

Review comment:
       +1. This also reminds me of https://github.com/apache/spark/pull/28671. Is it possible to always enable the Java 8 time API in the Thrift server?






[GitHub] [spark] SparkQA removed a comment on pull request #28705: [WIP][SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637411536


   **[Test build #123428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123428/testReport)** for PR 28705 at commit [`4ea85ad`](https://github.com/apache/spark/commit/4ea85add56af405fa99e64dcc321c7cc2087660b).




[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434623653



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       Everything is OK when the JVM time zone on the executors (where `toJavaDate` is called) is equal to the JVM time zone on the driver (where `HiveResult` is initialized), and both JVM time zones are equal to Spark's session time zone.






[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434578570



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       - The date literal '2020-06-03' (and make_date(2020, 6, 3)) is converted to the number of days since the epoch '1970-01-01'. The result is 18416, and it doesn't depend on time zone. You get the same via Java 8 API:
   ```scala
   scala> println(LocalDate.of(2020, 6, 3).toEpochDay)
   18416
   ```
   The number is stored as a date value internally in Spark.
   
   - To print it out, we have to collect it and convert it to a string. The following steps are for Java 8 API OFF:
   1. The days are converted to `java.sql.Date` by `toJavaDate()` which is called from https://github.com/apache/spark/blob/b917a6593dc969b9b766259eb8cbbd6e90e0dc53/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L306-L309 .
     1. `toJavaDate()` has to create an instance of java.sql.Date from milliseconds since the epoch 1970-01-01 00:00:00Z in the UTC time zone. It converts the 18416 days to milliseconds via 18416 * 86400000 and gets 1591142400000.
     2. 1591142400000 is interpreted as local milliseconds in the JVM time zone `Europe/Moscow`, which has a wall-clock offset of 10800000 millis (3 hours). So, 1591142400000 is shifted by 10800000 to get the "UTC timestamp". The result is 1591131600000, which is:
         - `2020-06-02T21:00:00` in UTC
         - `2020-06-03T00:00:00` in Europe/Moscow
         - `2020-06-02T14:00:00` in America/Los_Angeles
   
   2. new Date(1591131600000) is collected and formatted in `toHiveString` by the legacy date formatter. Currently, the legacy date formatter ignores the Spark session time zone `America/Los_Angeles` and uses the JVM time zone `Europe/Moscow`. In this way, it converts `new Date(1591131600000)` = `2020-06-03T00:00:00` in Europe/Moscow to `2020-06-03`. That looks fine, but after the PR https://github.com/apache/spark/pull/28709, it takes `America/Los_Angeles` and performs the conversion `2020-06-02T14:00:00 America/Los_Angeles` to `2020-06-02`.
   
   So, the problem is in `toJavaDate()`, which still uses the default JVM time zone.
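
   A standalone sketch (plain JDK calls, not Spark's actual `toJavaDate`) that reproduces the arithmetic above; running it with `TZ=Europe/Moscow` vs. `TZ=America/Los_Angeles` shows the day shift:
   ```scala
   import java.time.{Instant, LocalDate, ZoneId}

   val days = LocalDate.of(2020, 6, 3).toEpochDay               // 18416, time-zone independent
   val localMillis = days * 86400000L                           // 1591142400000
   val zone = ZoneId.of("Europe/Moscow")
   val offsetMillis =                                           // 10800000 millis (3 hours)
     zone.getRules.getOffset(Instant.ofEpochMilli(localMillis)).getTotalSeconds * 1000L
   val legacyDate = new java.sql.Date(localMillis - offsetMillis)  // 1591131600000
   // java.sql.Date.toString formats in the JVM default time zone:
   // prints 2020-06-03 under Europe/Moscow, but 2020-06-02 under America/Los_Angeles.
   println(legacyDate)
   ```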






[GitHub] [spark] AmplabJenkins commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-638554388








[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434072557



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/HiveResultSuite.scala
##########
@@ -17,36 +17,53 @@
 
 package org.apache.spark.sql.execution
 
+import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.connector.InMemoryTableCatalog
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.{ExamplePoint, ExamplePointUDT, SharedSparkSession}
 
 class HiveResultSuite extends SharedSparkSession {
   import testImplicits._
 
+  private def withOutstandingZoneIds(f: => Unit): Unit = {
+    for {
+      jvmZoneId <- outstandingZoneIds
+      sessionZoneId <- outstandingZoneIds
+    } {
+      withDefaultTimeZone(jvmZoneId) {
+        withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> sessionZoneId.getId) {
+          f
+        }
+      }
+    }
+  }
+
   test("date formatting in hive result") {
-    val dates = Seq("2018-12-28", "1582-10-03", "1582-10-04", "1582-10-15")
-    val df = dates.toDF("a").selectExpr("cast(a as date) as b")
-    val executedPlan1 = df.queryExecution.executedPlan
-    val result = HiveResult.hiveResultString(executedPlan1)
-    assert(result == dates)
-    val executedPlan2 = df.selectExpr("array(b)").queryExecution.executedPlan
-    val result2 = HiveResult.hiveResultString(executedPlan2)
-    assert(result2 == dates.map(x => s"[$x]"))
+    withOutstandingZoneIds {

Review comment:
       So, when the Java 8 API is off, the test will fail because Java 7-based date/timestamp conversions depend on the JVM time zone on the executor side and on the HiveResult side. If they are not in sync with each other and with the Spark session time zone, the result can be wrong.






[GitHub] [spark] SparkQA commented on pull request #28705: [WIP][SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637411536


   **[Test build #123428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123428/testReport)** for PR 28705 at commit [`4ea85ad`](https://github.com/apache/spark/commit/4ea85add56af405fa99e64dcc321c7cc2087660b).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637849248








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28705: [WIP][SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637572071








[GitHub] [spark] SparkQA commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637848596


   **[Test build #123445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123445/testReport)** for PR 28705 at commit [`ee69a1b`](https://github.com/apache/spark/commit/ee69a1b630ddba4cb5120a742d52b2956bc9a3f0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-638553604


   **[Test build #123500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123500/testReport)** for PR 28705 at commit [`74dfe4d`](https://github.com/apache/spark/commit/74dfe4d428000ec31c6cdaac8c35301b904d1d1d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class CheckOverflowInSum(`




[GitHub] [spark] MaxGekk commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-639612804


   I am closing this PR because https://github.com/apache/spark/pull/28729 has already enabled the Java 8 time API in the Thrift server.




[GitHub] [spark] AmplabJenkins commented on pull request #28705: [WIP][SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637699325








[GitHub] [spark] juliuszsompolski commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
juliuszsompolski commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-638798320


   Thanks for the explanations, @MaxGekk. I agree it's a good fix to use the Java 8 APIs all across the Thriftserver.
   I would, however, do it in SparkSQLDriver like you did in https://github.com/apache/spark/commit/916838a3d43aeac59cdc799fed0de8d279b0ad66 instead of changing toHiveResult, and I would also add it to the withLocalProperties wrapper in SparkOperation, which will apply it to the Thriftserver JDBC/ODBC operations.
   WDYT, @cloud-fan?




[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434519207



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       I have tried to make the legacy date formatter independent of the JVM time zone in the PR https://github.com/apache/spark/pull/28709, but it brings some issues too: https://github.com/apache/spark/pull/28709#issuecomment-638123922 . We need to modify more legacy functions to make them independent of the JVM time zone.






[GitHub] [spark] cloud-fan commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-638816114


   I agree. Both the SQL CLI and the Thrift server have their own protocols, and Spark won't leak datetime objects to end users. Turning on the Java 8 time API is only for internal consistency.




[GitHub] [spark] MaxGekk commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-639007307


   Here is the PR https://github.com/apache/spark/pull/28729 where I enabled the Java 8 time API as @juliuszsompolski proposed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28705: [WIP][SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637699325








[GitHub] [spark] MaxGekk commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637710202


   @cloud-fan @HyukjinKwon @juliuszsompolski Please, review this PR.




[GitHub] [spark] MaxGekk commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-639629998


   FYI, we run `ThriftServerQueryTestSuite` (the SQL tests in `sql/core/src/test/resources/sql-tests`) with the Java 8 time API **ON**, but `SQLQueryTestSuite` with the Java 8 time API **OFF**. So, we run the same tests for both values of `spark.sql.datetime.java8API.enabled`.




[GitHub] [spark] SparkQA commented on pull request #28705: [WIP][SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637570502


   **[Test build #123428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123428/testReport)** for PR 28705 at commit [`4ea85ad`](https://github.com/apache/spark/commit/4ea85add56af405fa99e64dcc321c7cc2087660b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #28705: [WIP][SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637412239








[GitHub] [spark] cloud-fan edited a comment on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
cloud-fan edited a comment on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-639616539


   How about the SQL CLI (the SQL shell)?




[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [WIP][SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434053054



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/HiveResultSuite.scala
##########
@@ -17,36 +17,53 @@
 
 package org.apache.spark.sql.execution
 
+import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.connector.InMemoryTableCatalog
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.{ExamplePoint, ExamplePointUDT, SharedSparkSession}
 
 class HiveResultSuite extends SharedSparkSession {
   import testImplicits._
 
+  private def withOutstandingZoneIds(f: => Unit): Unit = {
+    for {
+      jvmZoneId <- outstandingZoneIds
+      sessionZoneId <- outstandingZoneIds
+    } {
+      withDefaultTimeZone(jvmZoneId) {
+        withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> sessionZoneId.getId) {
+          f
+        }
+      }
+    }
+  }
+
   test("date formatting in hive result") {
-    val dates = Seq("2018-12-28", "1582-10-03", "1582-10-04", "1582-10-15")
-    val df = dates.toDF("a").selectExpr("cast(a as date) as b")
-    val executedPlan1 = df.queryExecution.executedPlan
-    val result = HiveResult.hiveResultString(executedPlan1)
-    assert(result == dates)
-    val executedPlan2 = df.selectExpr("array(b)").queryExecution.executedPlan
-    val result2 = HiveResult.hiveResultString(executedPlan2)
-    assert(result2 == dates.map(x => s"[$x]"))
+    withOutstandingZoneIds {

Review comment:
       After enabling the Java 8 API, the result is always the same regardless of the JVM and Spark session time zones. That wasn't possible before.
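
   As a sketch of how the helper above is meant to be used, assuming the suite's `testImplicits` are in scope and the new `hiveResultString(Dataset[_])` signature from this PR:
   ```scala
   withOutstandingZoneIds {
     val df = Seq("1582-10-03").toDF("a").selectExpr("cast(a as date) as b")
     // Once the Java 8 API is enabled inside hiveResultString, the formatted output
     // is identical for every combination of JVM default zone and session time zone.
     assert(HiveResult.hiveResultString(df) == Seq("1582-10-03"))
   }
   ```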








[GitHub] [spark] juliuszsompolski commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
juliuszsompolski commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-638116989


   @cloud-fan
   > Is it possible to always enable the Java 8 time API in the Thrift server?
   
   We could; the cleanest way would be to add that config to the session config in `withLocalProperties` in https://github.com/apache/spark/pull/28671.
   But the session timezone issues there are already resolved by that PR anyway: the Timestamp object collected from the query gets converted to a string using HiveResult.toHiveString. Whether it's a Timestamp or an Instant, the result is the same string, handled correctly with the session timezone.
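
   To make that concrete, a JDK-only sketch (no Spark internals; the zone and timestamp are arbitrary) of why both external types render to the same string once formatting goes through the session time zone:
   ```scala
   import java.sql.Timestamp
   import java.time.{Instant, ZoneId}
   import java.time.format.DateTimeFormatter

   val sessionZone = ZoneId.of("America/Los_Angeles")  // stands in for spark.sql.session.timeZone
   val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(sessionZone)

   val instant = Instant.parse("2020-06-02T09:20:19Z") // value collected with the Java 8 API on
   val legacy  = Timestamp.from(instant)               // value collected with the API off

   // Both representations describe the same instant, so zone-aware formatting agrees.
   assert(formatter.format(instant) == formatter.format(legacy.toInstant))
   ```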






[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434513532



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       @juliuszsompolski See the comment https://github.com/apache/spark/pull/28687/files#diff-303a0a0e4383242d22307edcdd82e1f1R75-R82 in the PR https://github.com/apache/spark/pull/28687








[GitHub] [spark] juliuszsompolski commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
juliuszsompolski commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434508113



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       Does date formatting depend on the timezone for output formatting?
   I thought the timezone is only needed for date parsing, for special cases such as 'now', 'today', or 'yesterday'.
   Or does the hybrid/proleptic calendar output formatting depend on the timezone?






[GitHub] [spark] MaxGekk commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-639623917


   @cloud-fan The PR #28729 enabled the Java 8 time API in the CLI as well; see the test https://github.com/apache/spark/pull/28729/files#diff-f3b00321aca176ae6c1aa38ba034141eR555-R559




[GitHub] [spark] SparkQA removed a comment on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637702326


   **[Test build #123445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123445/testReport)** for PR 28705 at commit [`ee69a1b`](https://github.com/apache/spark/commit/ee69a1b630ddba4cb5120a742d52b2956bc9a3f0).




[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434490357



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       I do believe the proper solution is to switch to the Java 8 API.






[GitHub] [spark] cloud-fan commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-639616539


   how about the SQL CLI?




[GitHub] [spark] juliuszsompolski commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in in Thriftserver SQL CLI

Posted by GitBox <gi...@apache.org>.
juliuszsompolski commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-639634143


   > FYI, we run ThriftServerQueryTestSuite - SQL tests in sql/core/src/test/resources/sql-tests under Java 8 time API ON but SQLQueryTestSuite with Java 8 time API OFF. So, we run the same tests for both values of spark.sql.datetime.java8API.enabled.
   
   There is another difference: SQLQueryTestSuite runs through the Spark DataFrame API, so results are collected directly with the Spark APIs, while ThriftServerQueryTestSuite runs the queries over a JDBC connection via the Hive JDBC driver, so Dates/Timestamps are printed to strings and parsed back from String into java.sql.Date / java.sql.Timestamp by the Hive JDBC driver.
   So there are more differences than just one suite running with the Java 8 API ON and the other with it OFF.
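
   Roughly, the client side of that Thrift server path looks like the sketch below (the JDBC URL and query are illustrative), which is why the client always ends up with java.sql types no matter what the server collects internally:
   ```scala
   import java.sql.DriverManager

   // Requires the Hive JDBC driver on the classpath; the URL is an example.
   val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
   try {
     val rs = conn.createStatement().executeQuery("SELECT DATE '2020-06-02' AS d")
     rs.next()
     // The value travels as a string and is parsed back into java.sql.Date by the driver.
     val d: java.sql.Date = rs.getDate(1)
     println(d)
   } finally {
     conn.close()
   }
   ```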




[GitHub] [spark] juliuszsompolski commented on pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
juliuszsompolski commented on pull request #28705:
URL: https://github.com/apache/spark/pull/28705#issuecomment-637759308


   Could you make the PR title more precise by changing "in Thrift server" to "in Thriftserver SQL CLI"?
   This change actually affects only the SQL CLI, not the Thriftserver used as an ODBC/JDBC server.




[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434443354



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))
-        .map(_.mkString("\t"))
+  def hiveResultString(ds: Dataset[_]): Seq[String] = {
+    val executedPlan = ds.queryExecution.executedPlan
+    executedPlan match {
+      case ExecutedCommandExec(_: DescribeCommandBase) =>
+        formatDescribeTableOutput(executedPlan.executeCollectPublic())
+      case _: DescribeTableExec =>
+        formatDescribeTableOutput(executedPlan.executeCollectPublic())
+      // SHOW TABLES in Hive only output table names while our v1 command outputs
+      // database, table name, isTemp.
+      case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
+        command.executeCollect().map(_.getString(1))
+      // SHOW TABLES in Hive only output table names while our v2 command outputs
+      // namespace and table name.
+      case command : ShowTablesExec =>
+        command.executeCollect().map(_.getString(1))
+      // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
+      // namespace, viewName, and isTemporary.
+      case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
+        command.executeCollect().map(_.getString(1))
+      case _ =>
+        val sessionWithJava8DatetimeEnabled = {
+          val cloned = ds.sparkSession.cloneSession()
+          cloned.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, true)
+          cloned
+        }
+        sessionWithJava8DatetimeEnabled.withActive {
+          // We cannot collect the original dataset because its encoders could be created
+          // with disabled Java 8 date-time API.
+          val result: Seq[Seq[Any]] = Dataset.ofRows(ds.sparkSession, ds.logicalPlan)
+            .queryExecution
+            .executedPlan
+            .executeCollectPublic().map(_.toSeq).toSeq
+          // We need the types so we can output struct field names
+          val types = executedPlan.output.map(_.dataType)
+          // Reformat to match hive tab delimited output.
+          result.map(_.zip(types).map(e => toHiveString(e)))
+            .map(_.mkString("\t"))
+        }
+    }

Review comment:
       This is what I did originally in the previous PR https://github.com/apache/spark/pull/27552/commits/916838a3d43aeac59cdc799fed0de8d279b0ad66, but somehow we came to the conclusion of cloning the session and setting the Java 8 API config locally in `hiveResultString()`.






[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434490030



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       The problem is in `dateFormatter`. Currently, the legacy formatter, which is used for `java.sql.Date`, doesn't respect the Spark session time zone and depends on the JVM time zone. Formatting works fine for the Java 8 `LocalDate`, where it does respect the session time zone.
   
   I tried to fix the issue in https://github.com/apache/spark/pull/28709, but the fix brings more trouble than it solves.
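
   A JDK-only illustration of that asymmetry (no Spark code; the zone and day are arbitrary):
   ```scala
   import java.sql.Date
   import java.time.LocalDate
   import java.util.TimeZone

   val epochDay = 18415L  // 2020-06-02
   TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))

   // java.sql.Date wraps UTC millis, so its rendering shifts with the JVM default zone.
   println(new Date(epochDay * 86400000L))  // prints 2020-06-01
   // java.time.LocalDate carries no zone, so it cannot shift.
   println(LocalDate.ofEpochDay(epochDay))  // prints 2020-06-02
   ```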








[GitHub] [spark] MaxGekk commented on a change in pull request #28705: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434585919



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))

Review comment:
       And one more nuance: the legacy type `java.sql.Date` is not a local date the way the Java 8 type `java.time.LocalDate` is. It is actually a UTC timestamp tied to the JVM time zone. Using it as a local date is not a good idea at all, but this is Spark's legacy code.
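
   A quick JDK-only demonstration of that point (the chosen zones are arbitrary): the "same" local date maps to different UTC instants depending on the JVM default zone.
   ```scala
   import java.sql.Date
   import java.util.TimeZone

   def epochMillis(zone: String): Long = {
     TimeZone.setDefault(TimeZone.getTimeZone(zone))
     // valueOf interprets the string as midnight in the current JVM default zone.
     Date.valueOf("2020-06-02").getTime
   }

   // Same textual date, different underlying instants.
   assert(epochMillis("UTC") != epochMillis("America/Los_Angeles"))
   ```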



