You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/13 06:07:34 UTC

[GitHub] [spark] amaliujia opened a new pull request, #38638: [SPARK-41122][CONNECT] Explain API can support different modes

amaliujia opened a new pull request, #38638:
URL: https://github.com/apache/spark/pull/38638

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   This PR extends `Explain` API that can support `formatted`, `cost`, `codegen`, `extended` and `simple` modes.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   Improve Debuggability
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   NO
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1022266985


##########
connector/connect/src/main/protobuf/spark/connect/base.proto:
##########
@@ -48,6 +72,9 @@ message Request {
   // The logical plan to be executed / analyzed.
   Plan plan = 3;
 
+  // (Required) For analyzePlan rpc calls, configure the mode to explain plan in strings.
+  ExplainMode explain_mode= 4;

Review Comment:
   > df.show can be a proto message, why not df.explain?
   
   I think we can put all functions into `relations.proto`, but one difference is that `df.show` will trigger a job, while `df.explain` only analyzes, maybe better to move it to `AnalyzePlan`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1024541016


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -667,12 +668,70 @@ def schema(self) -> StructType:
         else:
             return self._schema
 
-    def explain(self) -> str:
+    def explain(
+        self, extended: Optional[Union[bool, str]] = None, mode: Optional[str] = None

Review Comment:
   Correct me if I am wrong: 
   
   `keyword-only argument` means to not be compatible with PySpark `explain` API?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dengziming commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
dengziming commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1021527911


##########
connector/connect/src/main/protobuf/spark/connect/base.proto:
##########
@@ -38,6 +38,30 @@ message Plan {
   }
 }
 
+// Plan explanation mode.
+enum ExplainMode {
+  MODE_UNSPECIFIED = 0;
+
+  // Generates only physical plan.
+  SIMPLE = 1;
+
+  // Generates parsed logical plan, analyzed logical plan, optimized logical plan and physical plan.
+  // Parsed Logical plan is a unresolved plan that extracted from the query. Analyzed logical plans
+  // transforms which translates unresolvedAttribute and unresolvedRelation into fully typed objects.
+  // The optimized logical plan transforms through a set of optimization rules, resulting in the
+  // physical plan.
+  EXTENDED = 2;
+
+  //Generates code for the statement, if any and a physical plan.

Review Comment:
   nit: space after "//"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1021980330


##########
connector/connect/src/main/protobuf/spark/connect/base.proto:
##########
@@ -48,6 +72,9 @@ message Request {
   // The logical plan to be executed / analyzed.
   Plan plan = 3;
 
+  // (Required) For analyzePlan rpc calls, configure the mode to explain plan in strings.
+  ExplainMode explain_mode= 4;

Review Comment:
   I make `Explain` as a message now. However given how we split  the rpc call, this message needs to stay in `base.proto`:
   ```
   // Main interface for the SparkConnect service.
   service SparkConnectService {
   
     // Executes a request that contains the query and returns a stream of [[Response]].
     rpc ExecutePlan(Request) returns (stream Response) {}
   
     // Analyzes a query and returns a [[AnalyzeResponse]] containing metadata about the query.
     rpc AnalyzePlan(Request) returns (AnalyzeResponse) {}
   }
   ```
   
   There might be better way to organize it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes
URL: https://github.com/apache/spark/pull/38638


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1021075427


##########
connector/connect/src/main/protobuf/spark/connect/base.proto:
##########
@@ -48,6 +72,9 @@ message Request {
   // The logical plan to be executed / analyzed.
   Plan plan = 3;
 
+  // (Required) For analyzePlan rpc calls, configure the mode to explain plan in strings.
+  ExplainMode explain_mode= 4;

Review Comment:
   hmmm yeah I guess we can go to the flatten approach which extracts sutff into message.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38638:
URL: https://github.com/apache/spark/pull/38638#issuecomment-1319539398

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1022286561


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -667,12 +668,70 @@ def schema(self) -> StructType:
         else:
             return self._schema
 
-    def explain(self) -> str:
+    def explain(
+        self, extended: Optional[Union[bool, str]] = None, mode: Optional[str] = None

Review Comment:
   We could leverage keyword-only argument to simplify this for now. cc @zhengruifeng and @ueshin 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1025039998


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -667,12 +668,70 @@ def schema(self) -> StructType:
         else:
             return self._schema
 
-    def explain(self) -> str:
+    def explain(
+        self, extended: Optional[Union[bool, str]] = None, mode: Optional[str] = None

Review Comment:
   Actually NVM. Let's just go ahead as is for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1021009742


##########
connector/connect/src/main/protobuf/spark/connect/base.proto:
##########
@@ -48,6 +72,9 @@ message Request {
   // The logical plan to be executed / analyzed.
   Plan plan = 3;
 
+  // (Required) For analyzePlan rpc calls, configure the mode to explain plan in strings.
+  ExplainMode explain_mode= 4;

Review Comment:
   I actually feel like we should start to add more RPC methods. Maybe we should flatten AnalyzePlan to more methods so we don't overload protos....



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1022344836


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -667,12 +668,70 @@ def schema(self) -> StructType:
         else:
             return self._schema
 
-    def explain(self) -> str:
+    def explain(
+        self, extended: Optional[Union[bool, str]] = None, mode: Optional[str] = None

Review Comment:
   Let me take a look.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1025851202


##########
connector/connect/src/main/protobuf/spark/connect/base.proto:
##########
@@ -38,16 +38,50 @@ message Plan {
   }
 }
 
+// Explains the input plan based on a configurable mode.
+message Explain {
+  // Plan explanation mode.
+  enum ExplainMode {
+    MODE_UNSPECIFIED = 0;
+
+    // Generates only physical plan.
+    SIMPLE = 1;
+
+    // Generates parsed logical plan, analyzed logical plan, optimized logical plan and physical plan.
+    // Parsed Logical plan is a unresolved plan that extracted from the query. Analyzed logical plans
+    // transforms which translates unresolvedAttribute and unresolvedRelation into fully typed objects.
+    // The optimized logical plan transforms through a set of optimization rules, resulting in the
+    // physical plan.
+    EXTENDED = 2;
+
+    // Generates code for the statement, if any and a physical plan.
+    CODEGEN = 3;
+
+    // If plan node statistics are available, generates a logical plan and also the statistics.
+    COST = 4;
+
+    // Generates a physical plan outline and also node details.
+    FORMATTED = 5;
+  }
+
+  // (Required) For analyzePlan rpc calls, configure the mode to explain plan in strings.
+  ExplainMode explain_mode= 1;
+}
+
 // A request to be executed by the service.
 message Request {
   // The client_id is set by the client to be able to collate streaming responses from
   // different queries.
   string client_id = 1;
   // User context
   UserContext user_context = 2;
-  // The logical plan to be executed / analyzed.
+
+  // (Required) The logical plan to be executed / analyzed.
   Plan plan = 3;
 
+  // Used when invoking AnalyzePlan rpc calls.
+  Explain explain = 4;

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1021002542


##########
connector/connect/src/main/protobuf/spark/connect/base.proto:
##########
@@ -48,6 +72,9 @@ message Request {
   // The logical plan to be executed / analyzed.
   Plan plan = 3;
 
+  // (Required) For analyzePlan rpc calls, configure the mode to explain plan in strings.
+  ExplainMode explain_mode= 4;

Review Comment:
   should we group the fields in `Request`?
   
   I notice that there are more methods seems should be placed in `AnalyzePlan`:
   
   - isLocal
   - isStreaming
   - inputFiles
   - semanticHash
   - catalog APIs (not sure about this)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1024541016


##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -667,12 +668,70 @@ def schema(self) -> StructType:
         else:
             return self._schema
 
-    def explain(self) -> str:
+    def explain(
+        self, extended: Optional[Union[bool, str]] = None, mode: Optional[str] = None

Review Comment:
   Correct me if I am wrong: 
   
   `keyword-only argument` means to not be compatible with PySpark API?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1021068631


##########
connector/connect/src/main/protobuf/spark/connect/base.proto:
##########
@@ -48,6 +72,9 @@ message Request {
   // The logical plan to be executed / analyzed.
   Plan plan = 3;
 
+  // (Required) For analyzePlan rpc calls, configure the mode to explain plan in strings.
+  ExplainMode explain_mode= 4;

Review Comment:
   `df.show` can be a proto message, why not `df.explain`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38638:
URL: https://github.com/apache/spark/pull/38638#issuecomment-1314598771

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #38638:
URL: https://github.com/apache/spark/pull/38638#issuecomment-1319362671

   @HyukjinKwon @cloud-fan @grundprinzip @zhengruifeng please take another look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1021002618


##########
connector/connect/src/main/protobuf/spark/connect/base.proto:
##########
@@ -48,6 +72,9 @@ message Request {
   // The logical plan to be executed / analyzed.
   Plan plan = 3;
 
+  // (Required) For analyzePlan rpc calls, configure the mode to explain plan in strings.
+  ExplainMode explain_mode= 4;

Review Comment:
   also cc @HyukjinKwon 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #38638:
URL: https://github.com/apache/spark/pull/38638#issuecomment-1312647905

   @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38638:
URL: https://github.com/apache/spark/pull/38638#discussion_r1022036015


##########
connector/connect/src/main/protobuf/spark/connect/base.proto:
##########
@@ -38,16 +38,50 @@ message Plan {
   }
 }
 
+// Explains the input plan based on a configurable mode.
+message Explain {
+  // Plan explanation mode.
+  enum ExplainMode {
+    MODE_UNSPECIFIED = 0;
+
+    // Generates only physical plan.
+    SIMPLE = 1;
+
+    // Generates parsed logical plan, analyzed logical plan, optimized logical plan and physical plan.
+    // Parsed Logical plan is a unresolved plan that extracted from the query. Analyzed logical plans
+    // transforms which translates unresolvedAttribute and unresolvedRelation into fully typed objects.
+    // The optimized logical plan transforms through a set of optimization rules, resulting in the
+    // physical plan.
+    EXTENDED = 2;
+
+    // Generates code for the statement, if any and a physical plan.
+    CODEGEN = 3;
+
+    // If plan node statistics are available, generates a logical plan and also the statistics.
+    COST = 4;
+
+    // Generates a physical plan outline and also node details.
+    FORMATTED = 5;
+  }
+
+  // (Required) For analyzePlan rpc calls, configure the mode to explain plan in strings.
+  ExplainMode explain_mode= 1;
+}
+
 // A request to be executed by the service.
 message Request {
   // The client_id is set by the client to be able to collate streaming responses from
   // different queries.
   string client_id = 1;
   // User context
   UserContext user_context = 2;
-  // The logical plan to be executed / analyzed.
+
+  // (Required) The logical plan to be executed / analyzed.
   Plan plan = 3;
 
+  // Used when invoking AnalyzePlan rpc calls.
+  Explain explain = 4;

Review Comment:
   I think this is probably the right time to split of the Request for Analyze and Execute because the explain message should probably not be part of the overall request.
   
   In the same way this actually, would follow the proto guidelines for the naming of the request objects to be named after their RPC method name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org