You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "Yikf (via GitHub)" <gi...@apache.org> on 2023/03/15 09:42:42 UTC

[GitHub] [spark] Yikf opened a new pull request, #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Yikf opened a new pull request, #40437:
URL: https://github.com/apache/spark/pull/40437

### What changes were proposed in this pull request?

The spark-sql shell result output of Apache Spark is compatible with the hive, but the schema output is not processed.

Let's say have a case like the following with `spark.sql.cli.print.header true`:
```shell
spark-sql> create table tbl1 (id string) using parquet;
Response code
Time taken: 1.076 seconds
```

Before this PR:
```shell
spark-sql> show tables;
namespace tableName isTemporary
tbl1
```
After this PR:
```shell
spark-sql> show tables;
tableName
tbl1
```

### Why are the changes needed?

Bugfix

### Does this PR introduce _any_ user-facing change?

Yes, if the user uses the spark-sql shell and enables `spark.sql.cli.print.header`, will see the values and the corresponding schema.

### How was this patch tested?

Exist tests and new test.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1499051074

   Seems like we just need a small change: only invoke `hiveResultString` if a testing config is true, and we only set that testing config in `HiveComparisonTest`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver use the spark result string that is consistent with that of `df.show`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1169539279


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkResult.scala:
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, EmptyRow, Literal}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, StringType}
+import org.apache.spark.sql.util.SchemaUtils
+
+object SparkResult {

Review Comment:
   shall we use it in `Dataset.show` as well to deduplicate code?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1155883913


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
##########
@@ -59,18 +59,6 @@ object HiveResult {
         formatDescribeTableOutput(executedPlan.executeCollectPublic())
       case _: DescribeTableExec =>

Review Comment:
   OK so we still need it for `HiveComparisonTest`. Shall we refactor the code so that we only invoke `hiveResultString` in `HiveComparisonTest`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1500018627

   Like following:
   <img width="1440" alt="image" src="https://user-images.githubusercontent.com/51110188/230561881-177790a2-3b0b-4081-ba95-7e902af6b05f.png">
   
   And expression describes `ExpressionInfo`, it uses NULL instead of null
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1500014177

   It would be best if we had a separate 'df.show' method to format the results.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1500054156

   I'm looking for consistency. `df.show` is what users see, and `hiveResultString` is for golden files. Shouldn't the golden file match what users really see? Why do we test something that is almost invisible to users?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1501408587

   > Some data values do not have a nice string representation yet
   
   Let's fix `df.show` first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1500122229

   After code validation, ThriftServerQueryTestSuite and SQLQueryTestSuite depend on goldgen files;
   
   If goldgen file follows the format of df.show (the format of df.show depends on the DF rowEncoder);
   
   For SQLQueryTestSuite, you can easily generate df.show format;
   
   For ThriftServerQueryTestSuite, as a result of this suite ThriftServer, based on the client side output as the Object, not the DataSet, It is difficult to match the format of df.show (unless you copy the logic of a DataSet encoder);
   In addition to ThriftServerQueryTestSuite, it seems that it does not have to follow the df. The format of the show?
   
   In both of these kits, which are internal tests, is the format of output based on the DataSet only required to be consistent, rather than having to use the format of df.show? Because it's not testing at df.show


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver use the spark result string that is consistent with that of `df.show`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1169537708


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
##########
@@ -50,27 +50,14 @@ object HiveResult {
   }
 
   /**
-   * Returns the result as a hive compatible sequence of strings. This is used in tests and
-   * `SparkSQLDriver` for CLI applications.
+   * Returns the result as a hive compatible sequence of strings. This is used in tests.
    */
   def hiveResultString(executedPlan: SparkPlan): Seq[String] =
     stripRootCommandResult(executedPlan) match {
       case ExecutedCommandExec(_: DescribeCommandBase) =>
         formatDescribeTableOutput(executedPlan.executeCollectPublic())
       case _: DescribeTableExec =>
         formatDescribeTableOutput(executedPlan.executeCollectPublic())
-      // SHOW TABLES in Hive only output table names while our v1 command outputs

Review Comment:
   do we still need to change it given it's only used for hive compatibility test?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] yaooqinn commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1489781086

   I am not sure why we must keep consistent with Hive for such a case,
   
   1. this is just output from the command line interface, not a programming API.
   2. the `hive` CLI itself is already deprecated.
   
   Why do we not always use schemas of spark's commands as the CLI header?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1501781527

   @AngersZhuuuu are you programmatically parsing the output of the CLI? I thought it displays values for humans only.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] yaooqinn commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1155683471


##########
sql/core/src/test/resources/sql-tests/results/show-tables.sql.out:
##########
@@ -60,58 +60,58 @@ SHOW TABLES
 -- !query schema
 struct<namespace:string,tableName:string,isTemporary:boolean>
 -- !query output
-show_t1
-show_t2
-show_t3
+	show_t3	true

Review Comment:
   It did not go through the Hive CLI path directly, but mocked the steps 
    - For `-- !query schema`, we use Dataframe.schema
    - For `-- !query output`, 
        - `SQLQueryTestSuite` uses hiveResultString directly
        - `ThriftServerQueryTestSuite` bypasses validation from 'show xxx' (cc @Yikf we shall enable these in this PR)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] yaooqinn commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1500113298

   Adjusting `df.show` may need to change the output of `show` first. Some data values do not have a nice string representation yet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1489876875

   @yaooqinn this is a good point. If we are sure this is only for CLI display, not thriftserver protocol, I agree we don't need to follow Hive.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1147228669


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
##########
@@ -50,36 +51,44 @@ object HiveResult {
   }
 
   /**
-   * Returns the result as a hive compatible sequence of strings. This is used in tests and
-   * `SparkSQLDriver` for CLI applications.
+   * Returns the hive compatible results, which include output schemas and sequence of strings.
+   * This is used in tests and `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] =

Review Comment:
   Please take a look again, thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1152923808


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
##########
@@ -59,18 +59,6 @@ object HiveResult {
         formatDescribeTableOutput(executedPlan.executeCollectPublic())

Review Comment:
   Can we check the commit history and figure out why did we add this add in the first place?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] yaooqinn commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1155683471


##########
sql/core/src/test/resources/sql-tests/results/show-tables.sql.out:
##########
@@ -60,58 +60,58 @@ SHOW TABLES
 -- !query schema
 struct<namespace:string,tableName:string,isTemporary:boolean>
 -- !query output
-show_t1
-show_t2
-show_t3
+	show_t3	true

Review Comment:
   It did not go through the Hive CLI path
    - For `-- !query schema`, we use Dataframe.schema
    - For `-- !query output`, 
        - `SQLQueryTestSuite` uses hiveResultString directly
        - `ThriftServerQueryTestSuite` bypasses validation from 'show xxx' (cc @Yikf we shall enable these in this PR)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1155867225


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
##########
@@ -59,18 +59,6 @@ object HiveResult {
         formatDescribeTableOutput(executedPlan.executeCollectPublic())
       case _: DescribeTableExec =>

Review Comment:
   The previous purpose of hiveResultString was twofold
   1. Keep the schema output of show tables/views consistent with the hive
   2. Ensure that the result string is compatible with the hive
   
   This PR intended to modify the first point that the result string sequence is compatible with hive should we keep it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver use the spark result string that is consistent with that of `df.show`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1525140564

   https://github.com/apache/spark/pull/40922 has been merged, @Yikf can you open a PR to update `ToPrettyString` for all the needed changes to adjust the behavior of `df.show`? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver use the spark result string that is consistent with that of `df.show`

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1534149477

   > #40922 has been merged, @Yikf can you open a PR to update `ToPrettyString` for all the needed changes to adjust the behavior of `df.show`?
   
   Sure, I will open PR to make `ToPrettyString` eval nice strings.
   
   And, I will close this PR since I think `spark-sql` and `df.show` are two different REPLs, don't have to make them exactly the same to break compatibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1489962469

   > I'm not sure why spark-sql CLI has to be compatible with hive output, personally, I don't think it's necessary. Maybe we should display spark's schema as is, just like thriftSever?
   
   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AngersZhuuuu commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.

AngersZhuuuu commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1501660226

   Hmm, how about add a legacy conf to indicate whether we should follow the hive format here? and default we use false.  I'm not sure if there will be users who use this place to connect to the internal system strictly in accordance with the hive format


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] yaooqinn commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1489909241

   > If we are sure this is only for CLI display,
   
   Yes. hiveResultString is only used in spark-sql CLI. The thrift server-side always uses command output schema. Maybe this is the inconsistent issue we face internally


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1499967555

   > Spark generates golden files for SQLQueryTestSuite;
   
   I think golden files there should match `df.show` instead of hive result.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1500082745

   also cc @AngersZhuuuu 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver use the spark result string that is consistent with that of `df.show`

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1169601459


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
##########
@@ -50,27 +50,14 @@ object HiveResult {
   }
 
   /**
-   * Returns the result as a hive compatible sequence of strings. This is used in tests and
-   * `SparkSQLDriver` for CLI applications.
+   * Returns the result as a hive compatible sequence of strings. This is used in tests.
    */
   def hiveResultString(executedPlan: SparkPlan): Seq[String] =
     stripRootCommandResult(executedPlan) match {
       case ExecutedCommandExec(_: DescribeCommandBase) =>
         formatDescribeTableOutput(executedPlan.executeCollectPublic())
       case _: DescribeTableExec =>
         formatDescribeTableOutput(executedPlan.executeCollectPublic())
-      // SHOW TABLES in Hive only output table names while our v1 command outputs

Review Comment:
   I think that it's necessary since the content schema and values of the golden files involving these commands are also consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1155677323


##########
sql/core/src/test/resources/sql-tests/results/show-tables.sql.out:
##########
@@ -60,58 +60,58 @@ SHOW TABLES
 -- !query schema
 struct<namespace:string,tableName:string,isTemporary:boolean>
 -- !query output
-show_t1
-show_t2
-show_t3
+	show_t3	true

Review Comment:
   And there was something strange about the previous golden file about `SHOW TABLES/VIEWS`.
   
   The file display schema is like this:
   ```
   -- !query
   SHOW TABLES IN showdb
   -- !query schema
   struct<namespace:string,tableName:string,isTemporary:boolean>
   ```
   
   but the output is like this
   ```
   -- !query output
   show_t1
   show_t2
   show_t3
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1493720082

   Please take a look, thanks @cloud-fan @yaooqinn 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1155697807


##########
sql/core/src/test/resources/sql-tests/results/show-tables.sql.out:
##########
@@ -60,58 +60,58 @@ SHOW TABLES
 -- !query schema
 struct<namespace:string,tableName:string,isTemporary:boolean>
 -- !query output
-show_t1
-show_t2
-show_t3
+	show_t3	true

Review Comment:
   Sure



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1506229279

   https://github.com/apache/spark/pull/40699 is merged, can we continue this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1506258450

   > #40699 is merged, can we continue this PR?
   
   Sure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1484545471

   Friendly ping @cloud-fan , Please take a look if you find a time, thanks 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1489933393

   Yes. `hiveResultString` is added to ensure compatibility with hive output.
   
   `hiveResultString` is only used by the spark-sql CLI. It is used only as the CLI display.
   
   `thriftServer` always outputs as spark's schema,
   1. `hiveResultString` is not used for the thrift protocol.
   2. spark-sql output from the CLI and `thriftServer` is inconsistent.
   
   I'm not sure why spark-sql CLI has to be compatible with hive output, personally, I don't think it's necessary. Maybe we should display spark's schema as is, just like thriftSever?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1469756155

   @cloud-fan Please take a look if you have time, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1500073818

   > I'm looking for consistency. `df.show` is what users see, and `hiveResultString` is for golden files. Shouldn't the golden file match what users really see? Why do we test something that is almost invisible to users?
   
   In fact, `SQLQueryTestSuite` does not test `df.show`, it is actually something under the `DataSet`, and `hiveResultString` only prints out the data of the `DataSet`, but I agree with you. It is better to print with spark's resultString


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1501690683

   Does `ThriftServerQueryTestSuite` also need to use sparkResultString golden file format?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1155655030


##########
sql/core/src/test/resources/sql-tests/results/show-tables.sql.out:
##########
@@ -60,58 +60,58 @@ SHOW TABLES
 -- !query schema
 struct<namespace:string,tableName:string,isTemporary:boolean>
 -- !query output
-show_t1
-show_t2
-show_t3
+	show_t3	true

Review Comment:
   @yaooqinn so the golden file test framework also go through the Hive CLI code path?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1155852609


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
##########
@@ -59,18 +59,6 @@ object HiveResult {
         formatDescribeTableOutput(executedPlan.executeCollectPublic())
       case _: DescribeTableExec =>

Review Comment:
   shall we remove the `hiveResultString` completely?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1499861262

   > Seems like we just need a small change: only invoke `hiveResultString` if a testing config is true, and we only set that testing config in `HiveComparisonTest`.
   
   Return the result as a hive compatible sequence of strings of `hiveResultString`  currently does two things:
   1. `HiveComparisonTest` used to compare compatibility with hive output;
   2. Spark generates golden files for `SQLQueryTestSuite`;
   
   2 is a friendly form of output, and I think we should keep it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1500012982

   > > Spark generates golden files for SQLQueryTestSuite;
   > 
   > I think golden files there should match `df.show` instead of hive result.
   
   This may be a bit of a regression, maybe...  first, golden generation and validation are done with `hiveResultString `, it won't have what problem, then, the output is more readable, finally, `ThriftServerQueryTestSuite` test case generation also uses the independent `hiveString`, this also needs to reconstruct. So what's the benefit of removal?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1139919994


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
##########
@@ -50,36 +51,44 @@ object HiveResult {
   }
 
   /**
-   * Returns the result as a hive compatible sequence of strings. This is used in tests and
-   * `SparkSQLDriver` for CLI applications.
+   * Returns the hive compatible results, which include output schemas and sequence of strings.
+   * This is used in tests and `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] =

Review Comment:
   Instead of collecting the data as rows and adjusting it to follow Hive, we can add an analyzer rule to adjust the plan. The rule can add a `Project` over certain commands, to select columns, or run a UDF to adjust the string. This rule is only applied in thriftserver environment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver use the spark result string that is consistent with that of `df.show`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1169698221


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
##########
@@ -50,27 +50,14 @@ object HiveResult {
   }
 
   /**
-   * Returns the result as a hive compatible sequence of strings. This is used in tests and
-   * `SparkSQLDriver` for CLI applications.
+   * Returns the result as a hive compatible sequence of strings. This is used in tests.
    */
   def hiveResultString(executedPlan: SparkPlan): Seq[String] =
     stripRootCommandResult(executedPlan) match {
       case ExecutedCommandExec(_: DescribeCommandBase) =>
         formatDescribeTableOutput(executedPlan.executeCollectPublic())
       case _: DescribeTableExec =>
         formatDescribeTableOutput(executedPlan.executeCollectPublic())
-      // SHOW TABLES in Hive only output table names while our v1 command outputs

Review Comment:
   But this breaks Hive compatibility and we need to exclude more tests in the hive comparison test, which means we are losing test coverage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver use the spark result string that is consistent with that of `df.show`

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on PR #40437:
URL: https://github.com/apache/spark/pull/40437#issuecomment-1508909227

   @cloud-fan This PR currently makes spark-sql CLI use the same spark result string as df.show.
   
   And remove the use of hiveResultString by non-HiveComparisonTest. Like ExpressionInfoSuite, currently, only SQLQueryTestHelper and HiveComparisonTest use hiveResultString, and SQLQueryTestHelper is used for golden file generation. I try to make SQLQueryTestHelper also use sparkResultString, but ThriftServerQueryTestSuite also uses golden files.
   
   And Beautify the output of cast StructType to StringType, Such as {1, 2} -> {a:1, b:2}.  (It has nothing to do with the theme of this PR, Just in passing when making ExpressionInfoSuite to use sparkResultString, If you mind, I can send another PR to do this).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1155673813


##########
sql/core/src/test/resources/sql-tests/results/show-tables.sql.out:
##########
@@ -60,58 +60,58 @@ SHOW TABLES
 -- !query schema
 struct<namespace:string,tableName:string,isTemporary:boolean>
 -- !query output
-show_t1
-show_t2
-show_t3
+	show_t3	true

Review Comment:
   Yes, It use `hiveResultString`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1152653649


##########
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/KeepCommandOutputWithHive.scala:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, ShowTables}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.command.{ShowTablesCommand, ShowViewsCommand}
+import org.apache.spark.sql.internal.SQLConf
+
+object KeepCommandOutputWithHive extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA_WITH_HIVE)) {

Review Comment:
   We don't need to add a new config. We should just apply this rule in thriftserver. @yaooqinn do you know how to check if the current environment is thriftserver or not?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] yaooqinn commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on code in PR #40437:
URL: https://github.com/apache/spark/pull/40437#discussion_r1152809362


##########
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/KeepCommandOutputWithHive.scala:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, ShowTables}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.command.{ShowTablesCommand, ShowViewsCommand}
+import org.apache.spark.sql.internal.SQLConf
+
+object KeepCommandOutputWithHive extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA_WITH_HIVE)) {

Review Comment:
   If this rule is thrift server specific and we do need to adjust, we can add it as a spark session extension in SparkSQLDriver 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yikf closed pull request #40437: [SPARK-41259][SQL] SparkSQLDriver use the spark result string that is consistent with that of `df.show`

Posted by "Yikf (via GitHub)" <gi...@apache.org>.

Yikf closed pull request #40437: [SPARK-41259][SQL] SparkSQLDriver use the spark result string that is consistent with that of `df.show`
URL: https://github.com/apache/spark/pull/40437


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org