Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/26 05:53:08 UTC

[GitHub] [spark] amaliujia opened a new pull request, #37994: [SPARK-40454] Initial DSL framework for protobuf testing

amaliujia opened a new pull request, #37994:
URL: https://github.com/apache/spark/pull/37994

   
   ### What changes were proposed in this pull request?
   
   Implement an approach to testing the proto-to-Scala conversion with a DSL, according to the proposal in [spark-connect-testing-plan](https://docs.google.com/document/d/1n6EgS5vcmbwJUs5KGX4PzjKZVcSKd0qf0gLNZ6NFvOE/edit#bookmark=id.v673sn6dm9k4).
   
   
   More specifically, we need to decouple testing of the clients' DataFrame API from testing of the proto contract. The proto tests protect against core SQL changes breaking the proto contract. If necessary, a change to SQL plans has to justify that it still works with the Connect proto, but it does not need to cover clients (given that there could be many clients). This DSL proposal is part of that effort.
   
   
   This is still experimental and may change in the future.
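
   As a rough illustration, the test pattern enabled by this DSL looks like the sketch below. It is based on the `SparkConnectProtoSuite` added in this PR; `transform`, `comparePlans`, and the test relations are that suite's own helpers, not a public API.

   ```
   // Build the same query through the connect proto DSL and through the Catalyst DSL,
   // then compare the analyzed plans.
   import org.apache.spark.sql.catalyst.dsl.expressions._
   import org.apache.spark.sql.catalyst.dsl.plans._

   test("Basic select") {
     val connectPlan = {
       // Proto DSL imports are scoped locally to avoid clashing with the Catalyst DSL implicits.
       import org.apache.spark.sql.catalyst.connect.expressions._
       import org.apache.spark.sql.catalyst.connect.plans._
       transform(connectTestRelation.select("id".protoAttr))
     }
     val sparkPlan = sparkTestRelation.select("id")
     comparePlans(connectPlan.analyze, sparkPlan.analyze, false)
   }
   ```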
   
   
   
   ### Why are the changes needed?
   
   For testing the Spark Connect protobuf contract.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit tests.
   




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980308023


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.connect.proto
+

Review Comment:
   I changed it to match what Catalyst's DSL is doing. I also added a class comment.
   
   However, when I tried to document the usage guide, I found that an import that works in test code does not work in spark-shell:
   
   ```
   
   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
   <console>:23: error: object connect is not a member of package org.apache.spark.sql
          import org.apache.spark.sql.connect.plans.DslLogicalPlan
   ```
   
    I am thinking this is part of the cleanup work that we need to do. Can I create a follow-up JIRA to fix this and then go back to document the DSL usage?
   
   





[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981556235


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql

Review Comment:
   I addressed the package refactoring comment. Please let me know if you also want to revert the shell change.





[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980309040


##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connect.plans.DslLogicalPlan
+import org.apache.spark.sql.test.SharedSparkSession
+
+class SparkConnectProtoSuite extends SparkFunSuite
+  with SharedSparkSession with SparkConnectPlanTest with PlanTest with BeforeAndAfter {
+
+  lazy val connectTestRelation =

Review Comment:
   I tried adding a function to initialize the test data.

   Basically, it creates the tables and then evaluates those `lazy val`s. In the future, when the test data expands, we can simply extend this utility function.

   Do you have a better idea?
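
   Something like the sketch below is what I have in mind (hypothetical; the helper name `setupTestData`, the column, and the `parquet` source are illustrative rather than a finalized API):

   ```
   // Create the backing table once per suite...
   def setupTestData(): Unit = {
     spark.sql("CREATE TABLE student (id INT) USING parquet")
   }

   // ...then both views of it are defined together, so tests only reference these two vals.
   lazy val connectTestRelation =
     proto.Relation.newBuilder()
       .setRead(
         proto.Read.newBuilder()
           .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
           .build())
       .build()

   lazy val sparkTestRelation = spark.table("student")
   ```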



##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connect.plans.DslLogicalPlan
+import org.apache.spark.sql.test.SharedSparkSession
+
+class SparkConnectProtoSuite extends SparkFunSuite

Review Comment:
   Done.





[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980613106


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql

Review Comment:
   Hmm, wouldn't that be a dependency issue? The connect module depends on sql, but this DSL needs to depend on the connect proto.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r979588104


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.connect.proto
+

Review Comment:
   We should probably document this like https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L36-L64. BTW, since we're here, can we match the DSL object/class structure with the Catalyst one?





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r979588691


##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connect.plans.DslLogicalPlan
+import org.apache.spark.sql.test.SharedSparkSession
+
+class SparkConnectProtoSuite extends SparkFunSuite
+  with SharedSparkSession with SparkConnectPlanTest with PlanTest with BeforeAndAfter {
+
+  lazy val connectTestRelation =

Review Comment:
   Can we have a utility to create a table? Otherwise it seems like we always have to define `connectTestRelation`, `sparkTestRelation`, and the `CREATE TABLE` statement together.





[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981519878


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
+
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ *
+ * {{{
+ *   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
+ *
+ *   // Standard way to construct connect proto
+ *   scala> import org.apache.spark.connect.proto
+ *   scala> :paste
+ *   // Entering paste mode (ctrl-D to finish)
+ *   val connectTestRelation =
+ *    proto.Relation.newBuilder()
+ *     .setRead(
+ *       proto.Read.newBuilder()
+ *       .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+ *       .build())
+ *     .build()
+ *  // Exiting paste mode, now interpreting.
+ *    connectTestRelation: org.apache.spark.connect.proto.Relation =
+ *     read {
+ *      named_table {
+ *        parts: "student"
+ *      }
+ *    }
+ *
+ *   // Now we can apply select on the proto relation above
+ *   scala> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+ *   scala> connectTestRelation.select(UnresolvedAttribute(Seq("id")))
+ *   res14: org.apache.spark.connect.proto.Relation =
+ *    project {
+ *      input {
+ *        read {
+ *          named_table {
+ *            parts: "student"
+ *          }
+ *        }
+ *      }
+ *      expressions {
+ *        unresolved_attribute {
+ *          parts: "id"
+ *        }
+ *      }
+ *    }
+ *
+ * }}}
+ *
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {
+      def select(exprs: Expression*): proto.Relation = {

Review Comment:
   This shouldn't be `proto.Expression`. In the near term I don't see a need to add `def select(exprs: proto.Expression*)`.

   We use this to generate the proto and compare it with the normal Spark DataFrame code path, so the inputs here should be the same, namely normal Catalyst expressions.
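
   To illustrate (a sketch, not code from this PR; it assumes `sparkTestRelation` is a Catalyst `LocalRelation` as in the final version of the suite, so that both sides accept the same Catalyst expression):

   ```
   import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
   import org.apache.spark.sql.catalyst.dsl.plans._          // Catalyst DSL: LogicalPlan.select
   import org.apache.spark.sql.connect.plans.DslLogicalPlan  // connect DSL at this revision

   // The same Catalyst expression drives both sides of the comparison.
   val expr = UnresolvedAttribute(Seq("id"))
   val connectProto = connectTestRelation.select(expr)  // connect DSL: builds a proto.Relation
   val sparkPlan = sparkTestRelation.select(expr)        // Catalyst DSL: builds a LogicalPlan
   ```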





[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981927808


##########
connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ *
+ * {{{
+ *   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
+ *
+ *   // Standard way to construct connect proto
+ *   scala> import org.apache.spark.connect.proto
+ *   scala> :paste
+ *   // Entering paste mode (ctrl-D to finish)
+ *   val connectTestRelation =
+ *    proto.Relation.newBuilder()
+ *     .setRead(
+ *       proto.Read.newBuilder()
+ *       .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+ *       .build())
+ *     .build()
+ *  // Exiting paste mode, now interpreting.
+ *    connectTestRelation: org.apache.spark.connect.proto.Relation =
+ *     read {
+ *      named_table {
+ *        parts: "student"
+ *      }
+ *    }
+ *
+ *   // Now we can apply select on the proto relation above
+ *   scala> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+ *   scala> connectTestRelation.select(UnresolvedAttribute(Seq("id")))
+ *   res14: org.apache.spark.connect.proto.Relation =
+ *    project {
+ *      input {
+ *        read {
+ *          named_table {
+ *            parts: "student"
+ *          }
+ *        }
+ *      }
+ *      expressions {
+ *        unresolved_attribute {
+ *          parts: "id"
+ *        }
+ *      }
+ *    }
+ *
+ * }}}
+ *
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {
+      def select(exprs: Expression*): proto.Relation = {
+        val namedExpressions = exprs.map {
+          case e: NamedExpression => e
+          case e => UnresolvedAlias(e)
+        }
+
+        proto.Relation.newBuilder().setProject(
+          proto.Project.newBuilder()
+            .setInput(logicalPlan)
+            .addExpressions(
+              proto.Expression.newBuilder()
+                .setUnresolvedAttribute(

Review Comment:
   I see why you might be confused:

   The code here is definitely not complete (of course, it is hard to make it complete from the beginning).

   I plan to expand this as we start to implement one API after another.

   Basically, when we start to check `Projection` support, we will test every type of expression for a `select` on a relation. After that, this part should be complete (I hope).





[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988643026


##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.types.{DataType, IntegerType}
+
+/**
+ * This suite is based on the Connect DSL and tests that, given the same DataFrame operations,
+ * Connect can construct a proto plan that can be translated back and, after analysis, is the
+ * same as the plan generated by the Spark DataFrame API.
+ */
+class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
+
+  lazy val connectTestRelation = createLocalRelationProto(Map("id" -> IntegerType))
+
+  lazy val sparkTestRelation: LocalRelation = LocalRelation(AttributeReference("id", IntegerType)())
+
+  test("Basic select") {
+    val connectPlan = {
+      // TODO: Scala only allows one implicit per scope, so we keep the proto implicit imports in
+      // this scope. Need to find a better way to make two implicits work in the same scope.
+      import org.apache.spark.sql.catalyst.connect.expressions._
+      import org.apache.spark.sql.catalyst.connect.plans._
+      transform(connectTestRelation.select("id".protoAttr))
+    }
+    val sparkPlan = sparkTestRelation.select("id")
+    comparePlans(connectPlan.analyze, sparkPlan.analyze, false)
+  }
+
+  private def createLocalRelationProto(colNameTypeMap: Map[String, DataType]): proto.Relation = {

Review Comment:
   Actually, `Seq[AttributeReference]` is better, based on your comment below.





[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r982725060


##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.connect.plans.DslLogicalPlan
+import org.apache.spark.sql.catalyst.connect.plans.DslString
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.test.SharedSparkSession
+
+/**
+ * This suite is based on the Connect DSL and tests that, given the same DataFrame operations,
+ * Connect can construct a proto plan that can be translated back and, after analysis, is the
+ * same as the plan generated by the Spark DataFrame API.
+ */
+class SparkConnectProtoSuite extends SparkFunSuite
+  with SharedSparkSession with SparkConnectPlanTest with PlanTest with BeforeAndAfter {
+
+  lazy val connectTestRelation =
+    proto.Relation.newBuilder()
+      .setRead(
+        proto.Read.newBuilder()
+        .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+        .build())
+      .build()
+
+  lazy val sparkTestRelation = spark.table("student")
+
+  before {
+    setupTestData()
+  }
+
+  test("Basic select") {
+    val connectPlan = analyze(connectTestRelation.select("id".protoAttr))
+    val sparkPlan = sparkTestRelation.select("id").queryExecution.analyzed

Review Comment:
   I see what you are saying. In this case we need to support `LocalRelation` in the connect proto and generate the connect plan from a `LocalRelation` proto (otherwise the connect plan will be different from the Spark plan, which is a `LocalRelation`).

   Is this what you prefer?





[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980615123


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql

Review Comment:
   BTW, I don't think we have to make it available in spark-shell. The Catalyst DSL was added before we had the DataFrame API. At that time, it was the only way for end users to build a query plan with a DataFrame-like API. We don't have such a need for the Spark Connect DSL.





[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981920306


##########
connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ *
+ * {{{
+ *   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
+ *
+ *   // Standard way to construct connect proto
+ *   scala> import org.apache.spark.connect.proto
+ *   scala> :paste
+ *   // Entering paste mode (ctrl-D to finish)
+ *   val connectTestRelation =
+ *    proto.Relation.newBuilder()
+ *     .setRead(
+ *       proto.Read.newBuilder()
+ *       .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+ *       .build())
+ *     .build()
+ *  // Exiting paste mode, now interpreting.
+ *    connectTestRelation: org.apache.spark.connect.proto.Relation =
+ *     read {
+ *      named_table {
+ *        parts: "student"
+ *      }
+ *    }
+ *
+ *   // Now we can apply select on the proto relation above
+ *   scala> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+ *   scala> connectTestRelation.select(UnresolvedAttribute(Seq("id")))
+ *   res14: org.apache.spark.connect.proto.Relation =
+ *    project {
+ *      input {
+ *        read {
+ *          named_table {
+ *            parts: "student"
+ *          }
+ *        }
+ *      }
+ *      expressions {
+ *        unresolved_attribute {
+ *          parts: "id"
+ *        }
+ *      }
+ *    }
+ *
+ * }}}
+ *
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {
+      def select(exprs: Expression*): proto.Relation = {
+        val namedExpressions = exprs.map {
+          case e: NamedExpression => e
+          case e => UnresolvedAlias(e)
+        }
+
+        proto.Relation.newBuilder().setProject(
+          proto.Project.newBuilder()
+            .setInput(logicalPlan)
+            .addExpressions(
+              proto.Expression.newBuilder()
+                .setUnresolvedAttribute(

Review Comment:
   I am not familiar enough with the existing analyzer DSL, but:
   1. This DSL matches the analyzer DSL API, so the inputs will be the same.
   2. Then yes, I think we are going to have more conversions: Function -> UnresolvedFunction, Column -> UnresolvedAttribute, Literal -> proto-defined literal, etc.

   With this PR merged in, as we start to work on API coverage, more cases (various expressions) will be covered by adding more tests.
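
   A rough sketch of the conversion layer this implies (hypothetical; only the `UnresolvedAttribute` case exists in this PR, and the other branches are assumptions about future work):

   ```
   import scala.collection.JavaConverters._
   import org.apache.spark.connect.proto
   import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
   import org.apache.spark.sql.catalyst.expressions.Expression

   // Hypothetical conversion from Catalyst expressions to connect proto expressions.
   def toProtoExpr(e: Expression): proto.Expression = e match {
     case a: UnresolvedAttribute =>
       // The only conversion the DSL performs in this PR.
       proto.Expression.newBuilder()
         .setUnresolvedAttribute(
           proto.Expression.UnresolvedAttribute.newBuilder().addAllParts(a.nameParts.asJava))
         .build()
     // case f: UnresolvedFunction => ...  // assumed future conversion
     // case l: Literal => ...             // assumed future conversion (proto-defined literal)
     case other =>
       throw new UnsupportedOperationException(s"Unsupported expression: $other")
   }
   ```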





[GitHub] [spark] AmplabJenkins commented on pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37994:
URL: https://github.com/apache/spark/pull/37994#issuecomment-1258554960

   Can one of the admins verify this patch?




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980404430


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.connect.proto
+

Review Comment:
   I double-checked in case I hadn't included the right DSL classes in the shell, but importing a different connect class gives the same error:
   ```
   scala> import org.apache.spark.sql.connect.service.SparkConnectService
   <console>:25: error: object connect is not a member of package org.apache.spark.sql
          import org.apache.spark.sql.connect.service.SparkConnectService
   ```
   





[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981920306


##########
connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ *
+ * {{{
+ *   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
+ *
+ *   // Standard way to construct connect proto
+ *   scala> import org.apache.spark.connect.proto
+ *   scala> :paste
+ *   // Entering paste mode (ctrl-D to finish)
+ *   val connectTestRelation =
+ *    proto.Relation.newBuilder()
+ *     .setRead(
+ *       proto.Read.newBuilder()
+ *       .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+ *       .build())
+ *     .build()
+ *  // Exiting paste mode, now interpreting.
+ *    connectTestRelation: org.apache.spark.connect.proto.Relation =
+ *     read {
+ *      named_table {
+ *        parts: "student"
+ *      }
+ *    }
+ *
+ *   // Now we can apply select on the proto relation above
+ *   scala> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+ *   scala> connectTestRelation.select(UnresolvedAttribute(Seq("id")))
+ *   res14: org.apache.spark.connect.proto.Relation =
+ *    project {
+ *      input {
+ *        read {
+ *          named_table {
+ *            parts: "student"
+ *          }
+ *        }
+ *      }
+ *      expressions {
+ *        unresolved_attribute {
+ *          parts: "id"
+ *        }
+ *      }
+ *    }
+ *
+ * }}}
+ *
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {
+      def select(exprs: Expression*): proto.Relation = {
+        val namedExpressions = exprs.map {
+          case e: NamedExpression => e
+          case e => UnresolvedAlias(e)
+        }
+
+        proto.Relation.newBuilder().setProject(
+          proto.Project.newBuilder()
+            .setInput(logicalPlan)
+            .addExpressions(
+              proto.Expression.newBuilder()
+                .setUnresolvedAttribute(

Review Comment:
   I am not familiar enough with the existing analyzer DSL, but:
   1. This DSL matches the analyzer DSL API, so the inputs will be the same.
   2. Then yes, I think we are going to have more conversions: Function -> UnresolvedFunction, Column -> UnresolvedAttribute, Literal -> proto-defined literal, etc.





[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r982063973


##########
connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslString(val s: String) {
+      def protoAttr: proto.Expression =
+        proto.Expression.newBuilder()
+          .setUnresolvedAttribute(
+            proto.Expression.UnresolvedAttribute.newBuilder()
+              .addParts(s)

Review Comment:
   shall we call `CatalystSqlParser.parseMultipartIdentifier` to support cases like `"a.b".protoAttr`?
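
   For reference, a minimal sketch of what that could look like in the `DslString` implicit class (an illustration of the suggestion, not code from the PR):

   ```
   import scala.collection.JavaConverters._
   import org.apache.spark.connect.proto
   import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

   implicit class DslString(val s: String) {
     def protoAttr: proto.Expression =
       proto.Expression.newBuilder()
         .setUnresolvedAttribute(
           proto.Expression.UnresolvedAttribute.newBuilder()
             // "a.b" -> Seq("a", "b"), so each name part becomes a separate proto part.
             .addAllParts(CatalystSqlParser.parseMultipartIdentifier(s).asJava))
         .build()
   }
   ```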





[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988644725


##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.types.{DataType, IntegerType}
+
+/**
+ * This suite is based on the Connect DSL and tests that, given the same DataFrame operations,
+ * Connect can construct a proto plan that can be translated back and, after analysis, is the
+ * same as the plan generated by the Spark DataFrame API.
+ */
+class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
+
+  lazy val connectTestRelation = createLocalRelationProto(Map("id" -> IntegerType))
+
+  lazy val sparkTestRelation: LocalRelation = LocalRelation(AttributeReference("id", IntegerType)())

Review Comment:
   Yes, this is perfect.

   There are many tricks to learn about how to use the Catalyst DSL...





[GitHub] [spark] grundprinzip commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988824663


##########
connector/connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto

Review Comment:
   Sorry, I didn't see that.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r979588835


##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connect.plans.DslLogicalPlan
+import org.apache.spark.sql.test.SharedSparkSession
+
+class SparkConnectProtoSuite extends SparkFunSuite

Review Comment:
   Should probably document what this testing base is for.




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980613293


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql

Review Comment:
   Are you actually saying we should move it to `org.apache.spark.sql.catalyst` in the connect module?




[GitHub] [spark] HyukjinKwon commented on pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37994:
URL: https://github.com/apache/spark/pull/37994#issuecomment-1274011873

   I wonder if we can leverage existing test cases.



[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988772280


##########
connector/connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto

Review Comment:
   This is package `org.apache.spark.sql.catalyst` under the `connect` module, so there is no circular reference. But there seems to be no reason to put tests under the `catalyst` package. Maybe we can just put the tests in package `org.apache.spark.connect.proto`, and then we don't need this import.




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r982067325


##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.connect.plans.DslLogicalPlan
+import org.apache.spark.sql.catalyst.connect.plans.DslString
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.test.SharedSparkSession
+
+/**
+ * This suite is based on the connect DSL and tests that, given the same dataframe operations,
+ * connect can construct a proto plan that can be translated back and, after analysis, is the
+ * same as the plan generated by Spark's DataFrame API.
+ */
+class SparkConnectProtoSuite extends SparkFunSuite
+  with SharedSparkSession with SparkConnectPlanTest with PlanTest with BeforeAndAfter {
+
+  lazy val connectTestRelation =
+    proto.Relation.newBuilder()
+      .setRead(
+        proto.Read.newBuilder()
+        .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+        .build())
+      .build()
+
+  lazy val sparkTestRelation = spark.table("student")
+
+  before {
+    setupTestData()
+  }
+
+  test("Basic select") {
+    val connectPlan = analyze(connectTestRelation.select("id".protoAttr))
+    val sparkPlan = sparkTestRelation.select("id").queryExecution.analyzed

Review Comment:
   do we really need DataFrame here? We can use pure logical plans:
   ```
   val testRelation = LocalRelation(...)
   ...
   val connectPlan = transform(...).analyze
   val sparkPlan = testRelation.select... // using the catalyst dsl, not DataFrame APIs
   ```
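
   For illustration, a self-contained sketch of what such a pure logical plan test can look like. It mirrors the later revision of the suite quoted further down in this thread, so the helper names (`createLocalRelationProto`, the proto dsl's `protoAttr`, `SparkConnectPlanTest.transform`) come from there and should be treated as illustrative rather than final:
   ```
   import org.apache.spark.connect.proto
   import org.apache.spark.sql.catalyst.dsl.expressions._
   import org.apache.spark.sql.catalyst.dsl.plans._
   import org.apache.spark.sql.catalyst.plans.PlanTest
   import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
   import org.apache.spark.sql.types.IntegerType

   class PureLogicalPlanExampleSuite extends PlanTest with SparkConnectPlanTest {

     // Proto relation built by the test helper; no tables or SparkSession needed.
     lazy val connectTestRelation: proto.Relation =
       createLocalRelationProto(Map("id" -> IntegerType))

     // Spark side expressed as a plain catalyst logical plan.
     lazy val sparkTestRelation: LocalRelation = LocalRelation($"id".int)

     test("Basic select against pure logical plans") {
       val connectPlan = {
         // The proto dsl implicits are imported locally to avoid clashing with the catalyst dsl.
         import org.apache.spark.sql.catalyst.connect.expressions._
         import org.apache.spark.sql.catalyst.connect.plans._
         transform(connectTestRelation.select("id".protoAttr))
       }
       // Catalyst-dsl plan, no DataFrame API involved.
       val sparkPlan = sparkTestRelation.select("id")
       comparePlans(connectPlan.analyze, sparkPlan.analyze, false)
     }
   }
   ```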




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988626062


##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.types.{DataType, IntegerType}
+
+/**
+ * This suite is based on the connect DSL and tests that, given the same dataframe operations,
+ * connect can construct a proto plan that can be translated back and, after analysis, is the
+ * same as the plan generated by Spark's DataFrame API.
+ */
+class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
+
+  lazy val connectTestRelation = createLocalRelationProto(Map("id" -> IntegerType))
+
+  lazy val sparkTestRelation: LocalRelation = LocalRelation(AttributeReference("id", IntegerType)())

Review Comment:
   We can do the same for the proto local relation to simplify the code:
   ```
   createLocalRelationProto(Seq($"id".int))
   def createLocalRelationProto(attrs: Seq[AttributeReference]) = {
     for (attr <- attrs) {
       .....setName(attr.name)
     }
   }
   ```




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988621244


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -111,8 +122,12 @@ class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) {
     }
   }
 
-  private def transformUnresolvedExpression(exp: proto.Expression): UnresolvedAttribute = {
-    UnresolvedAttribute(exp.getUnresolvedAttribute.getPartsList.asScala.toSeq)
+  private def transformUnresolvedExpression(exp: proto.Expression): Expression = {
+    if (exp.getUnresolvedAttribute.getPartsCount == 1) {
+      Literal.create(exp.getUnresolvedAttribute.getParts(0), StringType)

Review Comment:
   I need to change this to match the catalyst DSL.
   
   The catalyst DSL generates a string literal for `a` in `relation.select("a")`.
   
   If we think the catalyst DSL is the standard way to generate logical plans, then we should also follow it in connect.
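   
   A small sketch of the difference, assuming a hypothetical one-column relation (the implicit conversions are the existing catalyst dsl ones):
   ```
   import org.apache.spark.sql.catalyst.dsl.expressions._
   import org.apache.spark.sql.catalyst.dsl.plans._
   import org.apache.spark.sql.catalyst.plans.logical.LocalRelation

   val relation = LocalRelation($"a".string)  // hypothetical test relation

   // The plain string goes through the string-to-Literal conversion,
   // so this projects the constant "a", not the column a.
   val selectsLiteral = relation.select("a")

   // The $-interpolator builds an attribute, so this selects the column a.
   val selectsColumn = relation.select($"a")
   ```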




[GitHub] [spark] amaliujia commented on pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #37994:
URL: https://github.com/apache/spark/pull/37994#issuecomment-1270888444

   Here is the build job for my last commit; all tests should have passed: https://github.com/amaliujia/spark/actions/runs/3200321726



[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980615123


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql

Review Comment:
   BTW, I don't think we have to make it available in spark-shell. The catalyst dsl was added before we had the DataFrame API. At that time, it was the only way for end-users to build query plans with a dataframe-like API.




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981927808


##########
connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ *
+ * {{{
+ *   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
+ *
+ *   // Standard way to construct connect proto
+ *   scala> import org.apache.spark.connect.proto
+ *   scala> :paste
+ *   // Entering paste mode (ctrl-D to finish)
+ *   val connectTestRelation =
+ *    proto.Relation.newBuilder()
+ *     .setRead(
+ *       proto.Read.newBuilder()
+ *       .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+ *       .build())
+ *     .build()
+ *  // Exiting paste mode, now interpreting.
+ *    connectTestRelation: org.apache.spark.connect.proto.Relation =
+ *     read {
+ *      named_table {
+ *        parts: "student"
+ *      }
+ *    }
+ *
+ *   // Now we can apply select on the proto relation above
+ *   scala> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+ *   scala> connectTestRelation.select(UnresolvedAttribute(Seq("id")))
+ *   res14: org.apache.spark.connect.proto.Relation =
+ *    project {
+ *      input {
+ *        read {
+ *          named_table {
+ *            parts: "student"
+ *          }
+ *        }
+ *      }
+ *      expressions {
+ *        unresolved_attribute {
+ *          parts: "id"
+ *        }
+ *      }
+ *    }
+ *
+ * }}}
+ *
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {
+      def select(exprs: Expression*): proto.Relation = {
+        val namedExpressions = exprs.map {
+          case e: NamedExpression => e
+          case e => UnresolvedAlias(e)
+        }
+
+        proto.Relation.newBuilder().setProject(
+          proto.Project.newBuilder()
+            .setInput(logicalPlan)
+            .addExpressions(
+              proto.Expression.newBuilder()
+                .setUnresolvedAttribute(

Review Comment:
   I see why you might be confused:
   
   The code here is definitely not complete (of course, it is hard to make it complete from the beginning).
   
   I plan to expand this as we start to implement the APIs one after another.
   
   Basically, when we start to work on `Projection` support, we will test every type of expression for a `select` on a relation. After that, this part will be complete (I hope). That PR might be titled `improve projection coverage`.
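   
   For illustration, one shape such an expansion could take is a small catalyst-`Expression`-to-proto converter that the DSL dispatches on. This is only a sketch; the nested proto builder names are assumptions based on the messages quoted above:
   ```
   import scala.collection.JavaConverters._

   import org.apache.spark.connect.proto
   import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
   import org.apache.spark.sql.catalyst.expressions.Expression

   object ProtoExpressionDsl {
     // Convert the catalyst expressions the DSL understands so far; anything else
     // fails loudly until the corresponding proto support is added.
     def toProtoExpression(expr: Expression): proto.Expression = expr match {
       case UnresolvedAttribute(nameParts) =>
         proto.Expression.newBuilder()
           .setUnresolvedAttribute(
             proto.Expression.UnresolvedAttribute.newBuilder().addAllParts(nameParts.asJava))
           .build()
       case other =>
         throw new UnsupportedOperationException(s"Not yet supported by the test DSL: $other")
     }
   }
   ```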




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981922371


##########
repl/pom.xml:
##########
@@ -58,6 +58,11 @@
       <artifactId>spark-sql_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-connect_${scala.binary.version}</artifactId>

Review Comment:
   done!




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r989392831


##########
connector/connect/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -159,4 +159,17 @@ message Expression {
   // UnresolvedStar is used to expand all the fields of a relation or struct.
   message UnresolvedStar {
   }
+
+  // A resolved attribute that can specify a reference (e.g. column) without needing a resolution
+  // by the analyzer.
+  message Attribute {

Review Comment:
   Renamed.




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981913805


##########
connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ *
+ * {{{
+ *   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
+ *
+ *   // Standard way to construct connect proto
+ *   scala> import org.apache.spark.connect.proto
+ *   scala> :paste
+ *   // Entering paste mode (ctrl-D to finish)
+ *   val connectTestRelation =
+ *    proto.Relation.newBuilder()
+ *     .setRead(
+ *       proto.Read.newBuilder()
+ *       .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+ *       .build())
+ *     .build()
+ *  // Exiting paste mode, now interpreting.
+ *    connectTestRelation: org.apache.spark.connect.proto.Relation =
+ *     read {
+ *      named_table {
+ *        parts: "student"
+ *      }
+ *    }
+ *
+ *   // Now we can apply select on the proto relation above
+ *   scala> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+ *   scala> connectTestRelation.select(UnresolvedAttribute(Seq("id")))
+ *   res14: org.apache.spark.connect.proto.Relation =
+ *    project {
+ *      input {
+ *        read {
+ *          named_table {
+ *            parts: "student"
+ *          }
+ *        }
+ *      }
+ *      expressions {
+ *        unresolved_attribute {
+ *          parts: "id"
+ *        }
+ *      }
+ *    }
+ *
+ * }}}
+ *
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {
+      def select(exprs: Expression*): proto.Relation = {
+        val namedExpressions = exprs.map {
+          case e: NamedExpression => e
+          case e => UnresolvedAlias(e)
+        }
+
+        proto.Relation.newBuilder().setProject(
+          proto.Project.newBuilder()
+            .setInput(logicalPlan)
+            .addExpressions(
+              proto.Expression.newBuilder()
+                .setUnresolvedAttribute(

Review Comment:
   Sorry, I'm still a bit confused about this. How are we going to test more expressions in the future? Are we going to put a catalyst-`Expression`-to-proto-expression converter here?




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981913250


##########
repl/pom.xml:
##########
@@ -58,6 +58,11 @@
       <artifactId>spark-sql_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-connect_${scala.binary.version}</artifactId>

Review Comment:
   We don't need to support this dsl in the Spark repl. We can remove this and also move the dsl to `connect/src/test`.




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980480469


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.connect.proto
+

Review Comment:
   Ah, I found that we need to add connect to the deps of the REPL module.
   
   I made the change and also updated the documentation.




[GitHub] [spark] amaliujia commented on pull request #37994: [SPARK-40454][Connect]Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #37994:
URL: https://github.com/apache/spark/pull/37994#issuecomment-1257516316

   @cloud-fan @HyukjinKwon @grundprinzip



[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980615672


##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connect.plans.DslLogicalPlan
+import org.apache.spark.sql.test.SharedSparkSession
+
+/**
+ * This suite is based on the connect DSL and tests that, given the same dataframe operations,
+ * connect can construct a proto plan that can be translated back and, after analysis, is the
+ * same as the plan generated by Spark's DataFrame API.
+ */
+class SparkConnectProtoSuite extends SparkFunSuite
+  with SharedSparkSession with SparkConnectPlanTest with PlanTest with BeforeAndAfter {
+
+  lazy val connectTestRelation =
+    proto.Relation.newBuilder()
+      .setRead(
+        proto.Read.newBuilder()
+        .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+        .build())
+      .build()
+
+  lazy val sparkTestRelation = spark.table("student")
+
+  before {
+    setupTestData()
+  }
+
+  test("Basic select") {
+    val connectPlan = analyze(connectTestRelation.select(UnresolvedAttribute(Seq("id"))))
+    val sparkPlan = sparkTestRelation.select("id").queryExecution.analyzed
+    comparePlans(connectPlan, sparkPlan)
+  }
+
+  private def analyze(plan: proto.Relation): LogicalPlan = {
+    spark.sessionState.executePlan(transform(plan)).analyzed
+  }
+
+  protected override def comparePlans(
+    plan1: LogicalPlan,

Review Comment:
   nit: 4 spaces indentation




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r982067988


##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.connect.plans.DslLogicalPlan
+import org.apache.spark.sql.catalyst.connect.plans.DslString
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.test.SharedSparkSession
+
+/**
+ * This suite is based on the connect DSL and tests that, given the same dataframe operations,
+ * connect can construct a proto plan that can be translated back and, after analysis, is the
+ * same as the plan generated by Spark's DataFrame API.
+ */
+class SparkConnectProtoSuite extends SparkFunSuite
+  with SharedSparkSession with SparkConnectPlanTest with PlanTest with BeforeAndAfter {
+
+  lazy val connectTestRelation =
+    proto.Relation.newBuilder()
+      .setRead(
+        proto.Read.newBuilder()
+        .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+        .build())
+      .build()
+
+  lazy val sparkTestRelation = spark.table("student")
+
+  before {
+    setupTestData()
+  }
+
+  test("Basic select") {
+    val connectPlan = analyze(connectTestRelation.select("id".protoAttr))
+    val sparkPlan = sparkTestRelation.select("id").queryExecution.analyzed

Review Comment:
   For example, `AnalysisSuite` is a pure logical plan test using the dsl.




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980610466


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql

Review Comment:
   The dsl should not be a public API. Can we move it to `org.apache.spark.sql.catalyst`, which is a private package?




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980614606


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql

Review Comment:
   yup




[GitHub] [spark] amaliujia commented on pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #37994:
URL: https://github.com/apache/spark/pull/37994#issuecomment-1270510395

   The Maven build should have worked, but I don't know why the SBT build has an issue. Taking a look.



[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988624766


##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.types.{DataType, IntegerType}
+
+/**
+ * This suite is based on the connect DSL and tests that, given the same dataframe operations,
+ * connect can construct a proto plan that can be translated back and, after analysis, is the
+ * same as the plan generated by Spark's DataFrame API.
+ */
+class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
+
+  lazy val connectTestRelation = createLocalRelationProto(Map("id" -> IntegerType))
+
+  lazy val sparkTestRelation: LocalRelation = LocalRelation(AttributeReference("id", IntegerType)())

Review Comment:
   We can use the dsl here: `LocalRelation($"id".int)`
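   
   A minimal sketch with the imports that make this compile (it is effectively equivalent to `LocalRelation(AttributeReference("id", IntegerType)())`, just shorter):
   ```
   import org.apache.spark.sql.catalyst.dsl.expressions._  // provides the $"..." attribute syntax
   import org.apache.spark.sql.catalyst.plans.logical.LocalRelation

   val sparkTestRelation: LocalRelation = LocalRelation($"id".int)
   ```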




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988619447


##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.types.{DataType, IntegerType}
+
+/**
+ * This suite is based on the connect DSL and tests that, given the same dataframe operations,
+ * connect can construct a proto plan that can be translated back and, after analysis, is the
+ * same as the plan generated by Spark's DataFrame API.
+ */
+class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
+
+  lazy val connectTestRelation = createLocalRelationProto(Map("id" -> IntegerType))
+
+  lazy val sparkTestRelation: LocalRelation = LocalRelation(AttributeReference("id", IntegerType)())
+
+  test("Basic select") {
+    val connectPlan = {
+      // TODO: Scala only allows one implicit per scope so we keep proto implicit imports in
+      // this scope. Need to find a better way to make two implicits work in the same scope.
+      import org.apache.spark.sql.catalyst.connect.expressions._
+      import org.apache.spark.sql.catalyst.connect.plans._
+      transform(connectTestRelation.select("id".protoAttr))
+    }
+    val sparkPlan = sparkTestRelation.select("id")
+    comparePlans(connectPlan.analyze, sparkPlan.analyze, false)
+  }
+
+  private def createLocalRelationProto(colNameTypeMap: Map[String, DataType]): proto.Relation = {

Review Comment:
   It's weird to pass a `Map` here, as the order of the elements is not deterministic. We should use `Seq[(String, DataType)]` or simply `Seq[AttributeReference]`.




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988618331


##########
connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -109,8 +120,12 @@ class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) {
     }
   }
 
-  private def transformUnresolvedExpression(exp: proto.Expression): UnresolvedAttribute = {
-    UnresolvedAttribute(exp.getUnresolvedAttribute.getPartsList.asScala.toSeq)
+  private def transformUnresolvedExpression(exp: proto.Expression): Expression = {
+    if (exp.getUnresolvedAttribute.getPartsCount == 1) {
+      Literal.create(exp.getUnresolvedAttribute.getParts(0), StringType)

Review Comment:
   We can clean up this part later. This special case for string literals is very weird.
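   
   For reference, the cleanup would roughly mean restoring the plain mapping from the removed line above, i.e. something like:
   ```
   // Sketch: map the proto unresolved attribute straight back to a catalyst
   // UnresolvedAttribute, without the single-part string-literal special case.
   private def transformUnresolvedExpression(exp: proto.Expression): Expression = {
     UnresolvedAttribute(exp.getUnresolvedAttribute.getPartsList.asScala.toSeq)
   }
   ```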




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988622964


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -111,8 +122,12 @@ class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) {
     }
   }
 
-  private def transformUnresolvedExpression(exp: proto.Expression): UnresolvedAttribute = {
-    UnresolvedAttribute(exp.getUnresolvedAttribute.getPartsList.asScala.toSeq)
+  private def transformUnresolvedExpression(exp: proto.Expression): Expression = {
+    if (exp.getUnresolvedAttribute.getPartsCount == 1) {
+      Literal.create(exp.getUnresolvedAttribute.getParts(0), StringType)

Review Comment:
   Ideally the catalyst dsl should use `relation.select($"a")` to indicate that this selects a column.




[GitHub] [spark] grundprinzip commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988657189


##########
connector/connect/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -159,4 +159,17 @@ message Expression {
   // UnresolvedStar is used to expand all the fields of a relation or struct.
   message UnresolvedStar {
   }
+
+  // A resolved attribute that can specify a reference (e.g. column) without needing a resolution
+  // by the analyzer.
+  message Attribute {
+    string name = 1;
+    // TODO: Full definition of an attribute is at https://github.com/apache/spark/blob/4f4a080a78c3979961e8483aace663bdc45f489d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L105
+    // bool Nullability = 2;
+    //  repeated string Qualifier = 3;
+    //  int64 exprId = 4;
+    //  string UUID = 6;

Review Comment:
   Those should probably never be set from the spark connect side.



##########
connector/connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto

Review Comment:
   Doesn't this create a circular reference?
   
   Connect depends on catalyst, which depends on connect?



##########
connector/connect/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -159,4 +159,17 @@ message Expression {
   // UnresolvedStar is used to expand all the fields of a relation or struct.
   message UnresolvedStar {
   }
+
+  // A resolved attribute that can specify a reference (e.g. column) without needing a resolution
+  // by the analyzer.
+  message Attribute {

Review Comment:
   Why do we need a resolved attribute in the proto? This should never occur?



##########
connector/connect/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -159,4 +159,17 @@ message Expression {
   // UnresolvedStar is used to expand all the fields of a relation or struct.
   message UnresolvedStar {
   }
+
+  // A resolved attribute that can specify a reference (e.g. column) without needing a resolution

Review Comment:
   As mentioned below, "resolved" is not the right way to name this.



##########
connector/connect/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -159,4 +159,17 @@ message Expression {
   // UnresolvedStar is used to expand all the fields of a relation or struct.
   message UnresolvedStar {
   }
+
+  // A resolved attribute that can specify a reference (e.g. column) without needing a resolution
+  // by the analyzer.
+  message Attribute {

Review Comment:
   Actually, I think we should rename this to something like "Qualified Attribute" or similar.
   
   The reason is that it's not a resolved attribute, because the analyzer has never seen it. It only looks like a resolved attribute in that it carries all of the fields of a resolved one.




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988787565


##########
connector/connect/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -159,4 +159,17 @@ message Expression {
   // UnresolvedStar is used to expand all the fields of a relation or struct.
   message UnresolvedStar {
   }
+
+  // An resolved attribute that can specify a reference (e.g. column) without needing a resolution
+  // by the analyzer.
+  message Attribute {

Review Comment:
   The only reason to have this is to support `LocalRelation`. It's unclear how the client can send local data (e.g. a Java array or a pandas DataFrame) to the server, which is the opposite direction of `df.collect`, so this will most likely remain a testing util for the near future.




[GitHub] [spark] grundprinzip commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988826397


##########
connector/connect/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -159,4 +159,17 @@ message Expression {
   // UnresolvedStar is used to expand all the fields of a relation or struct.
   message UnresolvedStar {
   }
+
+  // An resolved attribute that can specify a reference (e.g. column) without needing a resolution
+  // by the analyzer.
+  message Attribute {

Review Comment:
   We will need to add an optional field to the `Request` https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/base.proto#L40 message to support Arrow-serialized data, in the same way we do for the response.




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980206002


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.connect.proto
+

Review Comment:
   Yeah, `def select` should take `Expression` instead of `String`.
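   A minimal sketch (not the PR's final code) of what a `select` that accepts expressions rather than strings could look like, built only from the proto builder calls already quoted in this thread; the implicit-class name `DslRelation` is just a placeholder:
   
   ```scala
   import scala.collection.JavaConverters._
   
   import org.apache.spark.connect.proto
   
   object plans { // scalastyle:ignore
     implicit class DslRelation(val relation: proto.Relation) {
       // Wrap the input relation in a Project and attach the given proto expressions.
       def select(exprs: proto.Expression*): proto.Relation = {
         proto.Relation
           .newBuilder()
           .setProject(
             proto.Project
               .newBuilder()
               .setInput(relation)
               .addAllExpressions(exprs.asJava)
               .build())
           .build()
       }
     }
   }
   ```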




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r989392574


##########
connector/connect/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -159,4 +159,17 @@ message Expression {
   // UnresolvedStar is used to expand all the fields of a relation or struct.
   message UnresolvedStar {
   }
+
+  // An resolved attribute that can specify a reference (e.g. column) without needing a resolution
+  // by the analyzer.
+  message Attribute {
+    string name = 1;
+    // TODO: Full definition of an attribute is at https://github.com/apache/spark/blob/4f4a080a78c3979961e8483aace663bdc45f489d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L105
+    // bool Nullability = 2;
+    //  repeated string Qualifier = 3;
+    //  int64 exprId = 4;
+    //  string UUID = 6;

Review Comment:
   removed



##########
connector/connect/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -159,4 +159,17 @@ message Expression {
   // UnresolvedStar is used to expand all the fields of a relation or struct.
   message UnresolvedStar {
   }
+
+  // An resolved attribute that can specify a reference (e.g. column) without needing a resolution

Review Comment:
   done




[GitHub] [spark] cloud-fan closed pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing
URL: https://github.com/apache/spark/pull/37994



[GitHub] [spark] cloud-fan commented on pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #37994:
URL: https://github.com/apache/spark/pull/37994#issuecomment-1271358433

   thanks, merging to master!



[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981520979


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
+
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ *
+ * {{{
+ *   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
+ *
+ *   // Standard way to construct connect proto
+ *   scala> import org.apache.spark.connect.proto
+ *   scala> :paste
+ *   // Entering paste mode (ctrl-D to finish)
+ *   val connectTestRelation =
+ *    proto.Relation.newBuilder()
+ *     .setRead(
+ *       proto.Read.newBuilder()
+ *       .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+ *       .build())
+ *     .build()
+ *  // Exiting paste mode, now interpreting.
+ *    connectTestRelation: org.apache.spark.connect.proto.Relation =
+ *     read {
+ *      named_table {
+ *        parts: "student"
+ *      }
+ *    }
+ *
+ *   // Now we can apply select on the proto relation above
+ *   scala> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+ *   scala> connectTestRelation.select(UnresolvedAttribute(Seq("id")))
+ *   res14: org.apache.spark.connect.proto.Relation =
+ *    project {
+ *      input {
+ *        read {
+ *          named_table {
+ *            parts: "student"
+ *          }
+ *        }
+ *      }
+ *      expressions {
+ *        unresolved_attribute {
+ *          parts: "id"
+ *        }
+ *      }
+ *    }
+ *
+ * }}}
+ *
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {

Review Comment:
   In the current codebase, there are `proto.Relation` and `proto.Command`. `proto.Relation` is used for non-command logical plans.




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r981517158


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql

Review Comment:
   @cloud-fan would you prefer that I revert the change for the shell (and remove the Javadoc about how to use the shell to play with this DSL)?




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r982038430


##########
connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ *
+ * {{{
+ *   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
+ *
+ *   // Standard way to construct connect proto
+ *   scala> import org.apache.spark.connect.proto
+ *   scala> :paste
+ *   // Entering paste mode (ctrl-D to finish)
+ *   val connectTestRelation =
+ *    proto.Relation.newBuilder()
+ *     .setRead(
+ *       proto.Read.newBuilder()
+ *       .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+ *       .build())
+ *     .build()
+ *  // Exiting paste mode, now interpreting.
+ *    connectTestRelation: org.apache.spark.connect.proto.Relation =
+ *     read {
+ *      named_table {
+ *        parts: "student"
+ *      }
+ *    }
+ *
+ *   // Now we can apply select on the proto relation above
+ *   scala> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+ *   scala> connectTestRelation.select(UnresolvedAttribute(Seq("id")))
+ *   res14: org.apache.spark.connect.proto.Relation =
+ *    project {
+ *      input {
+ *        read {
+ *          named_table {
+ *            parts: "student"
+ *          }
+ *        }
+ *      }
+ *      expressions {
+ *        unresolved_attribute {
+ *          parts: "id"
+ *        }
+ *      }
+ *    }
+ *
+ * }}}
+ *
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {
+      def select(exprs: Expression*): proto.Relation = {
+        val namedExpressions = exprs.map {
+          case e: NamedExpression => e
+          case e => UnresolvedAlias(e)
+        }
+
+        proto.Relation.newBuilder().setProject(
+          proto.Project.newBuilder()
+            .setInput(logicalPlan)
+            .addExpressions(
+              proto.Expression.newBuilder()
+                .setUnresolvedAttribute(

Review Comment:
   Added the DSL part for expressions.
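   A rough sketch of what that expression DSL might look like, assuming the `"id".protoAttr` usage shown in the test suite later in this thread; the nested builder name `proto.Expression.UnresolvedAttribute` is inferred from the printed proto (`unresolved_attribute { parts: "id" }`) and may not match the generated class exactly:
   
   ```scala
   import org.apache.spark.connect.proto
   
   object expressions { // scalastyle:ignore
     implicit class DslString(val s: String) {
       // Turn a plain column name into an unresolved-attribute proto expression.
       def protoAttr: proto.Expression =
         proto.Expression
           .newBuilder()
           .setUnresolvedAttribute(
             proto.Expression.UnresolvedAttribute
               .newBuilder()
               .addParts(s)
               .build())
           .build()
     }
   }
   ```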




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r982062517


##########
connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore

Review Comment:
   Hmmm, do we need this `object plans`? If we want to follow the catalyst DSL completely, we should have both `object plans` and `object expressions`, so that callers can import the plan and expression DSLs individually.
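   A rough sketch of the layout being suggested here, mirroring the catalyst DSL so callers can import each part individually (method bodies elided; the names follow the imports used in the test suite elsewhere in this thread):
   
   ```scala
   package org.apache.spark.sql.catalyst
   
   import org.apache.spark.connect.proto
   
   package object connect {
   
     object expressions { // scalastyle:ignore
       implicit class DslString(val s: String) {
         def protoAttr: proto.Expression = ???
       }
     }
   
     object plans { // scalastyle:ignore
       implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {
         def select(exprs: proto.Expression*): proto.Relation = ???
       }
     }
   }
   ```
   
   Callers could then pick only what they need, e.g. `import org.apache.spark.sql.catalyst.connect.expressions._` for the expression DSL alone.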




[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r988641745


##########
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.planner
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.types.{DataType, IntegerType}
+
+/**
+ * This suite is based on connect DSL and test that given same dataframe operations, whether
+ * connect could construct a proto plan that can be translated back, and after analyzed, be the
+ * same as Spark dataframe's generated plan.
+ */
+class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
+
+  lazy val connectTestRelation = createLocalRelationProto(Map("id" -> IntegerType))
+
+  lazy val sparkTestRelation: LocalRelation = LocalRelation(AttributeReference("id", IntegerType)())
+
+  test("Basic select") {
+    val connectPlan = {
+      // TODO: Scala only allows one implicit per scope so we keep proto implicit imports in
+      // this scope. Need to find a better way to make two implicits work in the same scope.
+      import org.apache.spark.sql.catalyst.connect.expressions._
+      import org.apache.spark.sql.catalyst.connect.plans._
+      transform(connectTestRelation.select("id".protoAttr))
+    }
+    val sparkPlan = sparkTestRelation.select("id")
+    comparePlans(connectPlan.analyze, sparkPlan.analyze, false)
+  }
+
+  private def createLocalRelationProto(colNameTypeMap: Map[String, DataType]): proto.Relation = {

Review Comment:
   Good point on the ordering.
   
   I changed it to use `Seq[(String, DataType)]`. I guess eventually I will change it to `Seq[AttributeReference]` when we are ready to test all the fields of an `AttributeReference`. Right now only the name string is really used, since there is no data type support in the proto yet.
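   A sketch of the revised helper under that change; `proto.LocalRelation` and its builder methods are assumptions here (the relation proto is not shown in this diff), the wrapper object name is a placeholder, and only the attribute name is carried since the proto has no data type support yet:
   
   ```scala
   import org.apache.spark.connect.proto
   import org.apache.spark.sql.types.DataType
   
   object LocalRelationTestHelper {
     // Seq[(String, DataType)] keeps the column order deterministic, unlike Map.
     def createLocalRelationProto(attrs: Seq[(String, DataType)]): proto.Relation = {
       val localRelation = proto.LocalRelation.newBuilder()
       attrs.foreach { case (name, _) =>
         // Only the name is set for now; data types are not representable in the proto yet.
         localRelation.addAttributes(
           proto.Expression.Attribute.newBuilder().setName(name).build())
       }
       proto.Relation.newBuilder().setLocalRelation(localRelation.build()).build()
     }
   }
   ```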



##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -111,8 +122,12 @@ class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) {
     }
   }
 
-  private def transformUnresolvedExpression(exp: proto.Expression): UnresolvedAttribute = {
-    UnresolvedAttribute(exp.getUnresolvedAttribute.getPartsList.asScala.toSeq)
+  private def transformUnresolvedExpression(exp: proto.Expression): Expression = {
+    if (exp.getUnresolvedAttribute.getPartsCount == 1) {
+      Literal.create(exp.getUnresolvedAttribute.getParts(0), StringType)

Review Comment:
   done.




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980615483


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
+
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ *
+ * {{{
+ *   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
+ *
+ *   // Standard way to construct connect proto
+ *   scala> import org.apache.spark.connect.proto
+ *   scala> :paste
+ *   // Entering paste mode (ctrl-D to finish)
+ *   val connectTestRelation =
+ *    proto.Relation.newBuilder()
+ *     .setRead(
+ *       proto.Read.newBuilder()
+ *       .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+ *       .build())
+ *     .build()
+ *  // Exiting paste mode, now interpreting.
+ *    connectTestRelation: org.apache.spark.connect.proto.Relation =
+ *     read {
+ *      named_table {
+ *        parts: "student"
+ *      }
+ *    }
+ *
+ *   // Now we can apply select on the proto relation above
+ *   scala> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+ *   scala> connectTestRelation.select(UnresolvedAttribute(Seq("id")))
+ *   res14: org.apache.spark.connect.proto.Relation =
+ *    project {
+ *      input {
+ *        read {
+ *          named_table {
+ *            parts: "student"
+ *          }
+ *        }
+ *      }
+ *      expressions {
+ *        unresolved_attribute {
+ *          parts: "id"
+ *        }
+ *      }
+ *    }
+ *
+ * }}}
+ *
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {
+      def select(exprs: Expression*): proto.Relation = {

Review Comment:
   should this be `proto.Expression`? Will we ever add `proto.Expression`?




[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980616391


##########
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
+import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
+
+
+/**
+ * A collection of implicit conversions that create a DSL for constructing connect protos.
+ *
+ * {{{
+ *   scala> import org.apache.spark.sql.connect.plans.DslLogicalPlan
+ *
+ *   // Standard way to construct connect proto
+ *   scala> import org.apache.spark.connect.proto
+ *   scala> :paste
+ *   // Entering paste mode (ctrl-D to finish)
+ *   val connectTestRelation =
+ *    proto.Relation.newBuilder()
+ *     .setRead(
+ *       proto.Read.newBuilder()
+ *       .setNamedTable(proto.Read.NamedTable.newBuilder().addParts("student").build())
+ *       .build())
+ *     .build()
+ *  // Exiting paste mode, now interpreting.
+ *    connectTestRelation: org.apache.spark.connect.proto.Relation =
+ *     read {
+ *      named_table {
+ *        parts: "student"
+ *      }
+ *    }
+ *
+ *   // Now we can apply select on the proto relation above
+ *   scala> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+ *   scala> connectTestRelation.select(UnresolvedAttribute(Seq("id")))
+ *   res14: org.apache.spark.connect.proto.Relation =
+ *    project {
+ *      input {
+ *        read {
+ *          named_table {
+ *            parts: "student"
+ *          }
+ *        }
+ *      }
+ *      expressions {
+ *        unresolved_attribute {
+ *          parts: "id"
+ *        }
+ *      }
+ *    }
+ *
+ * }}}
+ *
+ */
+package object connect {
+
+  object plans { // scalastyle:ignore
+    implicit class DslLogicalPlan(val logicalPlan: proto.Relation) {

Review Comment:
   Does `proto.Relation` mean logical plan in the proto world?




[GitHub] [spark] amaliujia commented on pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #37994:
URL: https://github.com/apache/spark/pull/37994#issuecomment-1274024522

   @HyukjinKwon yeah, let me think about it...
   
   With this DSL framework we have at least increased the chances of reusing existing tests.

