You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/02/28 12:38:55 UTC

[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/20692

    [SPARK-23531][SQL] Show attribute type in explain

    ## What changes were proposed in this pull request?
    
    Added the attribute data type in order to be shown in the explain output.
    
    ## How was this patch tested?
    
    added UT

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-23531

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20692.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20692
    
----
commit 0cbd0c5368115c2ac06928e163dcbc3ccaa10b83
Author: Marco Gaido <ma...@...>
Date:   2018-02-28T12:36:02Z

    [SPARK-23531][SQL] Show attribute type in explain

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87800/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87791/testReport)** for PR 20692 at commit [`e076732`](https://github.com/apache/spark/commit/e07673284027b3cf9d13dc9fc8527f7d7c7d31c2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 closed the pull request at:

    https://github.com/apache/spark/pull/20692


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1157/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    @rdblue The types are important, but we do not need to embed it in the plan. Each attribute has the unique identifier. How about outputting the operator details after the plan? We need a clean plan. 
    
    Below is an example.
    http://db2commerce.com/2013/06/11/explain-part-2-command-line-explain-plans-using-db2exfmt/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    @mgaido91 It sounds like you are interested in the work on improvement of EXPLAIN. Maybe we can first hold this PR and do the design first? Could you investigate how the other DBs present their plans? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Thanks for your suggestion. I think different communities have different cultures. For new contributors, we always try to encourage them and assign them some trivial JIRAs and fixes. 
    
    @mgaido91 is not a new contributor. Basically, I trust him and believe he can make a good investigation and finish the whole work before the next release. If he has no bandwidth to do it, we still can take it over or find the other contributors to continue it. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20692#discussion_r171344948
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2264,4 +2264,18 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         checkAnswer(df, Row(0, 10) :: Nil)
         assert(df.queryExecution.executedPlan.isInstanceOf[WholeStageCodegenExec])
       }
    +
    +  test("SPARK-23531: explain should show attributes' type") {
    --- End diff --
    
    thanks for the review @dongjoon-hyun, I added it too


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87781/testReport)** for PR 20692 at commit [`e076732`](https://github.com/apache/spark/commit/e07673284027b3cf9d13dc9fc8527f7d7c7d31c2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87801/testReport)** for PR 20692 at commit [`c3e4227`](https://github.com/apache/spark/commit/c3e4227b27a15aa8dda8dae66f66806b22d4e819).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    sure, @gatorsmile, no problem. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    > this is not urgent based on our release schedule.
    
    Marco is contributing this right now. It is a bad idea to ask contributors to show up in 4 months, if we don't have a better option by then. And in my experience, it's really hard to show up right before a release and get anything in because the focus is on the release.
    
    > I do not like adding the types into the plans that are already very complex.
    
    I see this as a critical piece of information that is missing from Spark plans. I've had to resort to a debugger several times to see what Spark thinks types are, and I think it is worth the extra size.
    
    I'm all for debating the value of adding these types, but we should focus on that question. Implementations can be fixed. And when have we ever avoided a contribution because it was too early?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87773/testReport)** for PR 20692 at commit [`0cbd0c5`](https://github.com/apache/spark/commit/0cbd0c5368115c2ac06928e163dcbc3ccaa10b83).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    I agree with @cloud-fan that the type information doesn't need to be in every node of the plan, just in the scans and generators. We want enough information that we can tell what types are there, but we want to keep the output small if possible so there is a balance.
    
    I'm in favor of an option for extended types if it isn't too much trouble, but I don't think it is a very important feature.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20692#discussion_r171626542
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
    @@ -94,7 +94,7 @@ case class CreateHiveTableAsSelectCommand(
         Seq.empty[Row]
       }
     
    -  override def argString: String = {
    +  override def argString(isLeaf: Boolean): String = {
    --- End diff --
    
    What is the value of adding `isLeaf`? Don't leaf nodes already know they are? Why not just have leaf nodes call `outputStringWithTypes` or something and avoid all of the API changes?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87800/testReport)** for PR 20692 at commit [`67f297e`](https://github.com/apache/spark/commit/67f297e76c18aed21e3f5ef1472f26d4444ef99c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    @mgaido91, I'm happy without another option. Let's add it in the future if we find that we need it, but not assume that we will. This change should give most of the information needed to debug types.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87781/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    @mgaido91 Thanks for you investigation! These two weeks I am swamped. Will get back to you next week. Sorry for the delay.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20692#discussion_r171464091
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/inline-table.sql.out ---
    @@ -166,18 +166,18 @@ struct<plan:string>
     
     == Analyzed Logical Plan ==
     col1: string, col2: int, col1: string, col2: int
    -Project [col1#x, col2#x, col1#x, col2#x]
    +Project [col1#x: string, col2#x: int, col1#x: string, col2#x: int]
     +- Join Cross
    -   :- LocalRelation [col1#x, col2#x]
    -   +- LocalRelation [col1#x, col2#x]
    +   :- LocalRelation [col1#x: string, col2#x: int]
    +   +- LocalRelation [col1#x: string, col2#x: int]
    --- End diff --
    
    Repeatedly showing the data types for the same attributes may not useful. Seems too verbose. For a big query plan, it will be filled with redundant info.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    I don't think it is a good idea to block a fix like this on a redesign of explain plans. If that redesign was currently being worked on and this made it more difficult, it might be reasonable. But this can be a small interim fix and doesn't really affect larger plans.
    
    If this patch needs to be fixed to be more clean, then let's get that feedback to Marco. He has been really responsive on this contribution.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1150/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    thanks @gatorsmile and @rdblue for your discussion and thanks @gatorsmile for trusting me. I hope I will be worthy of your trust.
    
    I agree that since we are not close to a new release, having a throughout approach is better.
    
    Here is my analysis. Let me know if something else is needed. Basically, no other DB shows data type for columns. I won't include in this description the output for the various DBs, since for DB2 it is particularly verbose.
    Anyway, I confirm that DB2's output is what is shown in the article posted by @gatorsmile.
    Postgres's examples can be found here: https://www.postgresql.org/docs/10/static/using-explain.html.
    Oracle's ones here: https://docs.oracle.com/cd/B10501_01/server.920/a96540/statements_911a.htm#2061798.
    MySQL is not very different too, here it is a JSON representation of an example:
    ```
    {
      "query_block": {
        "select_id": 1,
        "nested_loop": [
          {
            "table": {
              "table_name": "u",
              "access_type": "ALL",
              "possible_keys": [
                "PRIMARY"
              ],
              "rows": 4,
              "filtered": 100
            }
          },
          {
            "table": {
              "table_name": "v",
              "access_type": "ref",
              "possible_keys": [
                "PRIMARY"
              ],
              "key": "PRIMARY",
              "used_key_parts": [
                "username"
              ],
              "key_length": "258",
              "ref": [
                "DBNAME.u.username"
              ],
              "rows": 1,
              "filtered": 100,
              "using_index": true
            }
          }
        ]
      }
    }
    ```
    Here is SQLServer output (quite similar to the others too):
    ```
    StmtText                                                                                                                                           StmtId      NodeId      Parent      PhysicalOp                     LogicalOp                      Argument                                                                                                              DefinedValues                                     EstimateRows   EstimateIO     EstimateCPU    AvgRowSize  TotalSubtreeCost OutputList                                                                                          Warnings Type                                                             Parallel EstimateExecutions
    -------------------------------------------------------------------------------------------------------------------------------------------------- ----------- ----------- ----------- ------------------------------ ------------------------------ --------------------------------------------------------------------------------------------------------------------- ------------------------------------------------- -------------- -------------- -------------- ----------- ---------------- --------------------------------------------------------------------------------------------------- -------- ---------------------------------------------------------------- -------- ------------------
    select * from t1 join t2 on t1.a=t2.a;                                                                                                                       1           1           0 NULL                           NULL                           1                                                                                                                     NULL                                                         1.0           NULL           NULL        NULL     2.4343444E-2 NULL                                                                                                NULL     SELECT                                                                  0               NULL
      |--Hash Match(Inner Join, HASH:([master].[dbo].[t2].[a])=([master].[dbo].[t1].[a]), RESIDUAL:([master].[dbo].[t2].[a]=[master].[dbo].[t1].[a]))            1           2           1 Hash Match                     Inner Join                     HASH:([master].[dbo].[t2].[a])=([master].[dbo].[t1].[a]), RESIDUAL:([master].[dbo].[t2].[a]=[master].[dbo].[t1].[a])  NULL                                                         1.0            0.0   1.7774245E-2          23     2.4343444E-2 [master].[dbo].[t1].[a], [master].[dbo].[t1].[b], [master].[dbo].[t2].[a], [master].[dbo].[t2].[b]  NULL     PLAN_ROW                                                                0                1.0
           |--Table Scan(OBJECT:([master].[dbo].[t2]))                                                                                                           1           3           2 Table Scan                     Table Scan                     OBJECT:([master].[dbo].[t2])                                                                                          [master].[dbo].[t2].[a], [master].[dbo].[t2].[b]             1.0       0.003125      0.0001581          16        0.0032831 [master].[dbo].[t2].[a], [master].[dbo].[t2].[b]                                                    NULL     PLAN_ROW                                                                0                1.0
           |--Table Scan(OBJECT:([master].[dbo].[t1]))                                                                                                           1           4           2 Table Scan                     Table Scan                     OBJECT:([master].[dbo].[t1])                                                                                          [master].[dbo].[t1].[a], [master].[dbo].[t1].[b]             1.0       0.003125      0.0001581          16        0.0032831 [master].[dbo].[t1].[a], [master].[dbo].[t1].[b]                                                    NULL     PLAN_ROW                                                                0                1.0
    ```
    
    The only one which shows datatype information is Hive. Hive's EXPLAIN actually shows a lot of information for every operation and it shows the datatype for every projection:
    ```
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 depends on stages: Stage-1
    
    STAGE PLANS:
      Stage: Stage-1
        Tez
          Edges:
            Reducer 2 <- Map 1 (SIMPLE_EDGE)
            Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
          DagName: root_20180302103255_e34d204d-6a8d-46b0-9932-ff7e0b024493:1
          Vertices:
            Map 1 
                Map Operator Tree:
                    TableScan
                      alias: store_sales_orc
                      Statistics: Num rows: 1000 Data size: 88276 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: ss_sold_date_sk (type: int), ss_quantity (type: int)
                        outputColumnNames: ss_sold_date_sk, ss_quantity
                        Statistics: Num rows: 1000 Data size: 88276 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          aggregations: count(ss_quantity)
                          keys: ss_sold_date_sk (type: int)
                          mode: hash
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 1000 Data size: 88276 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 1000 Data size: 88276 Basic stats: COMPLETE Column stats: NONE
                            value expressions: _col1 (type: bigint)
                Execution mode: vectorized
            Reducer 2 
                Reduce Operator Tree:
                  Group By Operator
                    aggregations: count(VALUE._col0)
                    keys: KEY._col0 (type: int)
                    mode: mergepartial
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 500 Data size: 44138 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: _col1 (type: bigint)
                      sort order: -
                      Statistics: Num rows: 500 Data size: 44138 Basic stats: COMPLETE Column stats: NONE
                      value expressions: _col0 (type: int)
                Execution mode: vectorized
            Reducer 3 
                Reduce Operator Tree:
                  Select Operator
                    expressions: VALUE._col0 (type: int), KEY.reducesinkkey0 (type: bigint)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 500 Data size: 44138 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 500 Data size: 44138 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                Execution mode: vectorized
    
      Stage: Stage-0
        Fetch Operator
          limit: -1
          Processor Tree:
            ListSink
    ```
    
    What do you think? What should be Spark behavior according to you?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1163/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87847/testReport)** for PR 20692 at commit [`3e91755`](https://github.com/apache/spark/commit/3e91755975c09d4f91ff3320981f4c14e492c03a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87847/testReport)** for PR 20692 at commit [`3e91755`](https://github.com/apache/spark/commit/3e91755975c09d4f91ff3320981f4c14e492c03a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    any thoughts @cloud-fan @dongjoon-hyun @gatorsmile @rdblue about the above analysis? Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    BTW I think we do need a story to improve the explain result. The current explain works pretty well with simple queries. But for complex queries, that a single plan node across many lines, it's really hard to read.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    @gatorsmile, yes, sure. Do you want me to investigate their behavior only related to how they show data types or more in general?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1145/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20692#discussion_r171421471
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/window.sql.out ---
    @@ -211,7 +211,7 @@ RANGE BETWEEN CURRENT ROW AND 1 FOLLOWING) FROM testData ORDER BY cate, val
     struct<>
     -- !query 13 output
     org.apache.spark.sql.AnalysisException
    -cannot resolve '(PARTITION BY testdata.`cate` ORDER BY testdata.`val` ASC NULLS FIRST, testdata.`cate` ASC NULLS FIRST RANGE BETWEEN CURRENT ROW AND 1 FOLLOWING)' due to data type mismatch: A range window frame with value boundaries cannot be used in a window specification with multiple order by expressions: val#x ASC NULLS FIRST,cate#x ASC NULLS FIRST; line 1 pos 33
    +cannot resolve '(PARTITION BY testdata.`cate` ORDER BY testdata.`val` ASC NULLS FIRST, testdata.`cate` ASC NULLS FIRST RANGE BETWEEN CURRENT ROW AND 1 FOLLOWING)' due to data type mismatch: A range window frame with value boundaries cannot be used in a window specification with multiple order by expressions: val#x: int ASC NULLS FIRST,cate#x: string ASC NULLS FIRST; line 1 pos 33
    --- End diff --
    
    The extra data types are pretty confusing. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    First, this is not urgent based on our release schedule. If no better solution is proposed, we can still revisit this PR before the release.
    
    Second, adding more extra string functions looks not good to me. It is just hard for us to maintain. We already have many xyzString functions. We do not want to complicate the codes for this. 
    
    Third, to really resolve this issue, the current solution is not clean. I do not like adding the types into the plans that are already very complex. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87801/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20692#discussion_r171387647
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
    @@ -182,9 +182,9 @@ class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
              |GlobalLimit 1
              |+- LocalLimit 1
              |   +- AnalysisBarrier
    -         |         +- Aggregate [a#0], [a#0, count(1) AS count#0L]
    -         |            +- Project [_1#0 AS a#0, _2#0 AS b#0]
    -         |               +- LocalRelation [_1#0, _2#0]
    +         |         +- Aggregate [a#0: int], [a#0: int, count(1) AS count#0L]
    +         |            +- Project [_1#0: int AS a#0, _2#0: int AS b#0]
    --- End diff --
    
    thanks for the comment @gatorsmile. In some cases, the leaf nodes are not the first occurrence of an attribute (es. think to the Range generator): so in those cases the types would not be displayed. Is this fine? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20692#discussion_r171630852
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
    @@ -94,7 +94,7 @@ case class CreateHiveTableAsSelectCommand(
         Seq.empty[Row]
       }
     
    -  override def argString: String = {
    +  override def argString(isLeaf: Boolean): String = {
    --- End diff --
    
    thanks, you are right. I won't change the current implementation, since if we need to change behavior according to a flag (for instance if we decide to add the info only if the extended mode is set) these changes are needed. If not, I will follow this comment, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    My intent is just to advocate for clear feedback on the content of PRs. Good to hear your confidence in @mgaido91, and if he wants to work on a better explain, that's great too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1165/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    The current solution looks not clean to me. I would like to hold it first since the next release is after another 6 months. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    thanks @rdblue, I updated the PR and now only the leaf nodes contain the data type information.
    
    Honestly I would avoid to have another option, since we already have the extended one. I think it could become confusing if we add too many options.
    But if you all agree on a different solution let me know.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    @cloud-fan I think it is not needed a new debug mode for it. I think it is good to have the attribute type always, but in case I think we can re-use the extended falg, without introducing a new level. What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87801/testReport)** for PR 20692 at commit [`c3e4227`](https://github.com/apache/spark/commit/c3e4227b27a15aa8dda8dae66f66806b22d4e819).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87847/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    @maropu this is a bit stuck because we first need to decide how to go on. Above you can see my analysis and comparison with other tool and now we have to choose what to do before going on. If you or anybody else has any feedback on the analysis we can move it forward. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    I don't see a compelling reason to block this work, which is well-defined and a reasonably small patch. What is here would help users with debugging.
    
    @gatorsmile, If you have an alternative approach, could you write it up and propose it on the dev list?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87773/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20692#discussion_r171421396
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
    @@ -182,9 +182,9 @@ class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
              |GlobalLimit 1
              |+- LocalLimit 1
              |   +- AnalysisBarrier
    -         |         +- Aggregate [a#0], [a#0, count(1) AS count#0L]
    -         |            +- Project [_1#0 AS a#0, _2#0 AS b#0]
    -         |               +- LocalRelation [_1#0, _2#0]
    +         |         +- Aggregate [a#0: int], [a#0: int, count(1) AS count#0L]
    +         |            +- Project [_1#0: int AS a#0, _2#0: int AS b#0]
    --- End diff --
    
    The input schema and the output schemas are what we really care. Our output of Explain is already pretty complex. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87781/testReport)** for PR 20692 at commit [`e076732`](https://github.com/apache/spark/commit/e07673284027b3cf9d13dc9fc8527f7d7c7d31c2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    We should clearly define when and where we need to display attribute data type. I think leaf nodes and some nodes that produce new data like `Generate` are good places. And we may also need to introduce a debug mode for explain. Personally most of the time I only focus on the shape of the query plan, not each attribute. The data type info is only needed when doing some deep debugging.
    
    also cc @rdblue


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87791/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87773/testReport)** for PR 20692 at commit [`0cbd0c5`](https://github.com/apache/spark/commit/0cbd0c5368115c2ac06928e163dcbc3ccaa10b83).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Normally, in our Spark community, if the contributor is not available when we decide to merge, the others (including the committers) will take it over and give the credit back the contributor. 
    
    If this is critical in your usage scenario before our Spark 2.4 is out, I think you can add the patch to your fork. 
    
    We should output the data types, but not embed them in the plans. However, I do not know which is the best solution and that is why I am asking @mgaido91 to do more investigation. http://db2commerce.com/2013/06/11/explain-part-2-command-line-explain-plans-using-db2exfmt/ is just an example. 
    
    Hopefully, you can be also involved in the design discussion of our new formatted output of EXPLAIN. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1198/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87800/testReport)** for PR 20692 at commit [`67f297e`](https://github.com/apache/spark/commit/67f297e76c18aed21e3f5ef1472f26d4444ef99c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20692#discussion_r171334033
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2264,4 +2264,18 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         checkAnswer(df, Row(0, 10) :: Nil)
         assert(df.queryExecution.executedPlan.isInstanceOf[WholeStageCodegenExec])
       }
    +
    +  test("SPARK-23531: explain should show attributes' type") {
    --- End diff --
    
    Hi, @mgaido91 .
    For the test case, could you
    - Add a test coverage for `explain(true)`, too?
    - Try to add `explain.sql` into `SQLQueryTestSuite`?
    
    cc @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    > I think you can add the patch to your fork.
    
    My primary concern isn't the feature, although I do think it is useful. My concern is how we work with contributors. My worry here is that the objections are not on the merits of this PR.
    
    Telling a contributor to either take on something much larger or come back in a few months is discouraging and stifles contributions. This patch is here now, and the focus should be on either why the idea is a not good one (i.e., *why* types should not go in plans) or what should be improved to fix the implementation. That could also be to propose a better alternative, but without an alternate proposal, asking for something different without constructive criticism of the PR leaves me, at least, simply confused.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    The failures of `SQLQueryTestSuite` are legitimate.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    @mgaido91 Can you update this? Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Good point on nested types. I don't think heavy nesting is the usual case, but we can definitely improve the explain result in the long term by separating it out. Maybe just using a high-level type (e.g. struct<...>) for now would work?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    **[Test build #87791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87791/testReport)** for PR 20692 at commit [`e076732`](https://github.com/apache/spark/commit/e07673284027b3cf9d13dc9fc8527f7d7c7d31c2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20692#discussion_r171624644
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala ---
    @@ -106,7 +106,7 @@ abstract class Attribute extends LeafExpression with NamedExpression with NullIn
     
       override def toAttribute: Attribute = this
       def newInstance(): Attribute
    -
    +  def stringWithType: String = s"$toString: ${dataType.simpleString}"
    --- End diff --
    
    Nit: the `:` isn't needed and will accumulate to quite a few extra characters in plans. SQL uses `(id bigint, data string)` without `:` so I think we should mirror that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    If possible, we need the investigation first and then need more discussions before finalizing the design. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20692#discussion_r171373420
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
    @@ -182,9 +182,9 @@ class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
              |GlobalLimit 1
              |+- LocalLimit 1
              |   +- AnalysisBarrier
    -         |         +- Aggregate [a#0], [a#0, count(1) AS count#0L]
    -         |            +- Project [_1#0 AS a#0, _2#0 AS b#0]
    -         |               +- LocalRelation [_1#0, _2#0]
    +         |         +- Aggregate [a#0: int], [a#0: int, count(1) AS count#0L]
    +         |            +- Project [_1#0: int AS a#0, _2#0: int AS b#0]
    --- End diff --
    
    Except the leaf nodes, I think we do not need to add the types for the other nodes. cc @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    I am closing this as there is still some discussion to go on in order to choose the right approach. We can reopen it later/open a new one once we choose the approach. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20692
  
    My 2 cents: a data source may have a lot of columns, and a column may be a complex struct type. So embedding the type in the plan may make the plan unreadable.
    
    If I'm debugging and see a complex plan with attribute types inside the plan, it will be really hard to extract the information I want from the plan, as the plan node may across many lines and hard to read.
    
    Since attributes are unique across the query plan, I think put a list of "attribute -> type" before or after the explain result is acceptable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org