You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/27 06:55:07 UTC

[GitHub] [spark] pan3793 opened a new pull request, #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

pan3793 opened a new pull request, #36697:
URL: https://github.com/apache/spark/pull/36697

   ### What changes were proposed in this pull request?
   
   `V2ExpressionUtils.toCatalystOrdering` should fail if V2Expression can not be translated instead of returning empty Seq.
   
   Before fixing it, I bring a simple test to demonstrate the bug.
   
   ### Why are the changes needed?
   
   `V2ExpressionUtils.toCatalystOrdering` is used by `DistributionAndOrderingUtils`, the current behavior will break the semantics of `RequiresDistributionAndOrdering#requiredOrdering` in some cases.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New UT.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1145480808

   Thank you, @pan3793 , @sunchao , @cloud-fan !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r883780623


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala:
##########
@@ -40,7 +40,7 @@ object V2ScanPartitioning extends Rule[LogicalPlan] with SQLConfHelper {
 
       val catalystPartitioning = scan.outputPartitioning() match {
         case kgp: KeyGroupedPartitioning => sequenceToOption(kgp.keys().map(
-          V2ExpressionUtils.toCatalyst(_, relation, funCatalogOpt)))
+          V2ExpressionUtils.toCatalyst(_, relation, false, funCatalogOpt)))

Review Comment:
   If you don't mind, shall we use the following style, @pan3793 ?
   ```scala
   - V2ExpressionUtils.toCatalyst(_, relation, false, funCatalogOpt)))
   + V2ExpressionUtils.toCatalyst(_, relation, strict = false, funCatalogOpt)))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884617691


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala:
##########
@@ -32,15 +32,15 @@ import org.apache.spark.util.collection.Utils.sequenceToOption
  */
 object V2ScanPartitioning extends Rule[LogicalPlan] with SQLConfHelper {
   override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
-    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, _) =>
+    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, None) =>
       val funCatalogOpt = relation.catalog.flatMap {
         case c: FunctionCatalog => Some(c)
         case _ => None
       }
 
       val catalystPartitioning = scan.outputPartitioning() match {
         case kgp: KeyGroupedPartitioning => sequenceToOption(kgp.keys().map(
-          V2ExpressionUtils.toCatalyst(_, relation, funCatalogOpt)))
+          V2ExpressionUtils.toCatalystOpt(_, relation, funCatalogOpt)))

Review Comment:
   I'm wondering if we should also fail here. If a data source uses an invalid partitioning, we should fail the query and let users know, so that they can debug and fix the data source. Otherwise, users may live with a performance bug for a while as it's hard to figure out where the problem is. cc @sunchao 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140133553

   ```
   [info] - SPARK-30289 Create: partitioned by nested column *** FAILED *** (363 milliseconds)
   [info]   java.lang.RuntimeException: Once strategy's idempotence is broken for batch Early Filter and Projection Push-Down
   [info]  RelationV2[ts#1494] testcat.table_name   RelationV2[ts#1494] testcat.table_name
   [info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.onceStrategyIdempotenceIsBrokenForBatchError(QueryExecutionErrors.scala:1302)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.checkBatchIdempotence(RuleExecutor.scala:168)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:254)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
   [info]   at scala.collection.immutable.List.foreach(List.scala:431)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
   [info]   at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
   [info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:125)
   [info]   at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
   [info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
   [info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
   [info]   at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
   [info]   at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:121)
   [info]   at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:117)
   [info]   at org.apache.spark.sql.QueryTest.assertEmptyMissingInput(QueryTest.scala:226)
   [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:148)
   [info]   at org.apache.spark.sql.DataFrameWriterV2Suite.$anonfun$new$97(DataFrameWriterV2Suite.scala:692)
   ```
   
   In `RuleExecutor.checkBatchIdempotence`, the `Alias` node of `keyGroupedPartitioning` changes exprId.
   
   <img width="888" alt="image" src="https://user-images.githubusercontent.com/26535726/170804934-b36c3c7b-32f3-4303-9620-622b2d83a998.png">


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r885194978


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala:
##########
@@ -32,15 +32,15 @@ import org.apache.spark.util.collection.Utils.sequenceToOption
  */
 object V2ScanPartitioning extends Rule[LogicalPlan] with SQLConfHelper {
   override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
-    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, _) =>
+    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, None) =>
       val funCatalogOpt = relation.catalog.flatMap {
         case c: FunctionCatalog => Some(c)
         case _ => None
       }
 
       val catalystPartitioning = scan.outputPartitioning() match {
         case kgp: KeyGroupedPartitioning => sequenceToOption(kgp.keys().map(
-          V2ExpressionUtils.toCatalyst(_, relation, funCatalogOpt)))
+          V2ExpressionUtils.toCatalystOpt(_, relation, funCatalogOpt)))

Review Comment:
   Yes, w/o error messages the user must infer the issue by query plan, but I think it's a little bit aggressive to fail the query because it's only a performance issue but not for correctness, how about adding warning logs here? ping @sunchao 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r883783961


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2ExpressionUtilsSuite.scala:
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, V2ExpressionUtils}
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.connector.expressions._
+import org.apache.spark.sql.types.StringType
+
+class V2ExpressionUtilsSuite extends SparkFunSuite {

Review Comment:
   `V2ExpressionUtils` is `package org.apache.spark.sql.catalyst.expressions` while this `V2ExpressionUtilsSuite` is in `package org.apache.spark.sql.execution.datasources.v2`.
   
   If this test suite depends on another stuffs (outside `org.apache.spark.sql.catalyst.expressions` package), we had better avoid creating a new `V2ExpressionUtilsSuite` here. If you don't mind, shall we move this test case into another suite?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r883432683


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2ExpressionUtilsSuite.scala:
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, V2ExpressionUtils}
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.connector.expressions._
+import org.apache.spark.sql.types.StringType
+
+class V2ExpressionUtilsSuite extends SparkFunSuite {
+
+  test("SPARK-39313: toCatalystOrdering should fail if V2Expression can not be translated") {
+    val supportedV2Sort = SortValue(
+      FieldReference("a"), SortDirection.ASCENDING, NullOrdering.NULLS_FIRST)
+    val unsupportedV2Sort = supportedV2Sort.copy(
+      expression = ApplyTransform("v2Fun", FieldReference("a") :: Nil))
+    val exc = intercept[AnalysisException] {
+      V2ExpressionUtils.toCatalystOrdering(
+        Array(supportedV2Sort, unsupportedV2Sort),
+        LocalRelation.apply(AttributeReference("a", StringType)()))

Review Comment:
   The current master/3.3 return `Seq.empty` instead of throwing `AnalysisException`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

sunchao commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140132786

   @pan3793 hmm how? that change is test only and I don't see any test failure in the latest CI run


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] sunchao commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

sunchao commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884084376


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala:
##########
@@ -32,15 +32,15 @@ import org.apache.spark.util.collection.Utils.sequenceToOption
  */
 object V2ScanPartitioning extends Rule[LogicalPlan] with SQLConfHelper {
   override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
-    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, _) =>
+    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, None) =>

Review Comment:
   Yes, I think it's better to match `None` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884616808


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala:
##########
@@ -54,19 +53,24 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
    * Converts the array of input V2 [[V2SortOrder]] into their counterparts in catalyst.
    */
   def toCatalystOrdering(ordering: Array[V2SortOrder], query: LogicalPlan): Seq[SortOrder] = {
-    sequenceToOption(ordering.map(toCatalyst(_, query))).asInstanceOf[Option[Seq[SortOrder]]]
-      .getOrElse(Seq.empty)
+    ordering.map(toCatalystOpt(_, query).get.asInstanceOf[SortOrder])

Review Comment:
   Oops, sorry, fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884053249


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala:
##########
@@ -32,15 +32,15 @@ import org.apache.spark.util.collection.Utils.sequenceToOption
  */
 object V2ScanPartitioning extends Rule[LogicalPlan] with SQLConfHelper {
   override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
-    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, _) =>
+    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, None) =>

Review Comment:
   Do you think it's sufficient to only match keyGroupedPartitioning = None?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884612608


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala:
##########
@@ -54,19 +53,24 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
    * Converts the array of input V2 [[V2SortOrder]] into their counterparts in catalyst.
    */
   def toCatalystOrdering(ordering: Array[V2SortOrder], query: LogicalPlan): Seq[SortOrder] = {
-    sequenceToOption(ordering.map(toCatalyst(_, query))).asInstanceOf[Option[Seq[SortOrder]]]
-      .getOrElse(Seq.empty)
+    ordering.map(toCatalystOpt(_, query).get.asInstanceOf[SortOrder])

Review Comment:
   shouldn't this call `toCatalyst`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140010474

   Thanks @sunchao and @dongjoon-hyun, changed as suggested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140130220

   > @pan3793 [this commit](https://github.com/sunchao/spark/commit/c2a97be651df747f4edf19f46f2c2d41cd89b230) should help to fix the tests.
   
   The change breaks "SPARK-30289 Create: partitioned by nested column"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884557336


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala:
##########
@@ -54,19 +53,19 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
    * Converts the array of input V2 [[V2SortOrder]] into their counterparts in catalyst.
    */
   def toCatalystOrdering(ordering: Array[V2SortOrder], query: LogicalPlan): Seq[SortOrder] = {
-    sequenceToOption(ordering.map(toCatalyst(_, query))).asInstanceOf[Option[Seq[SortOrder]]]
-      .getOrElse(Seq.empty)
+    ordering.map(toCatalyst(_, query).get.asInstanceOf[SortOrder])
   }
 
   def toCatalyst(
       expr: V2Expression,
       query: LogicalPlan,
+      strict: Boolean = true,

Review Comment:
   instead of adding a new parameter, I think it's clearer and more type-safe to have 2 methods
   ```
   def toCatalystOpt ... : Option[Expression]
   def toCatalyst(...): Expression = toCatalystOpt(...).getOrElse(throw error)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] sunchao commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

sunchao commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r885234555


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala:
##########
@@ -32,15 +32,15 @@ import org.apache.spark.util.collection.Utils.sequenceToOption
  */
 object V2ScanPartitioning extends Rule[LogicalPlan] with SQLConfHelper {
   override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
-    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, _) =>
+    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, None) =>
       val funCatalogOpt = relation.catalog.flatMap {
         case c: FunctionCatalog => Some(c)
         case _ => None
       }
 
       val catalystPartitioning = scan.outputPartitioning() match {
         case kgp: KeyGroupedPartitioning => sequenceToOption(kgp.keys().map(
-          V2ExpressionUtils.toCatalyst(_, relation, funCatalogOpt)))
+          V2ExpressionUtils.toCatalystOpt(_, relation, funCatalogOpt)))

Review Comment:
   I'm also a bit more inclined to use warning messages here. To debug performance issue, it should be relatively straightforward to check out the query plan and see whether there's any shuffle in it, combined with checking query logs. In addition, existing V2 data source tables could already have custom transforms such as bucket, days, etc, and we don't want queries against these tables start to fail (because there is no function catalog) after upgrading to Spark 3.3.
   
   There is already warning message when function catalog exists but the transform function doesn't, in the method `loadV2FunctionOpt`. It seems we don't have warning message for the case when a custom transform function is used but there is no function catalog.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

sunchao commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140327360

   Hmm thinking more about this, I think maybe we should fail the analysis on the write path, even if a V2 transform exist in the function catalog. Otherwise, the write may fail at a later stage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

sunchao commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139971681

   @pan3793 [this commit](https://github.com/sunchao/spark/commit/c2a97be651df747f4edf19f46f2c2d41cd89b230) should help to fix the tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r883789326


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2ExpressionUtilsSuite.scala:
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, V2ExpressionUtils}
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.connector.expressions._
+import org.apache.spark.sql.types.StringType
+
+class V2ExpressionUtilsSuite extends SparkFunSuite {

Review Comment:
   Thanks for the suggestion, the current changes are not the final solution, I will make the following changes as suggested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140402977

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] sunchao closed pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

sunchao closed pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated
URL: https://github.com/apache/spark/pull/36697


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139436652

   cc @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r883783961


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2ExpressionUtilsSuite.scala:
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, V2ExpressionUtils}
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.connector.expressions._
+import org.apache.spark.sql.types.StringType
+
+class V2ExpressionUtilsSuite extends SparkFunSuite {

Review Comment:
   `V2ExpressionUtils` is `package org.apache.spark.sql.catalyst.expressions` while this `V2ExpressionUtilsSuite` is in `package org.apache.spark.sql.execution.datasources.v2`.
   
   If this test suite depends on another stuffs (outside `org.apache.spark.sql.catalyst.expressions` package), we had better avoid creating `V2ExpressionUtilsSuite` here. If you don't mind, shall we move this test case into another suite?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884056303


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanPartitioning.scala:
##########
@@ -32,15 +32,15 @@ import org.apache.spark.util.collection.Utils.sequenceToOption
  */
 object V2ScanPartitioning extends Rule[LogicalPlan] with SQLConfHelper {
   override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
-    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, _) =>
+    case d @ DataSourceV2ScanRelation(relation, scan: SupportsReportPartitioning, _, None) =>

Review Comment:
   After second thought, I think it should only match `None` here, otherwise `catalystPartitioning` will be calculated every round, if the generated `catalystPartitioning` contains `Alias`, will cause `!plan.fastEquals(reOptimized)` and fail `checkBatchIdempotence`.
   
   "SPARK-30289 Create: partitioned by nested column" happen to cover this case after we changed the `InMemoryTable#outputPartitioning`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884557336


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala:
##########
@@ -54,19 +53,19 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
    * Converts the array of input V2 [[V2SortOrder]] into their counterparts in catalyst.
    */
   def toCatalystOrdering(ordering: Array[V2SortOrder], query: LogicalPlan): Seq[SortOrder] = {
-    sequenceToOption(ordering.map(toCatalyst(_, query))).asInstanceOf[Option[Seq[SortOrder]]]
-      .getOrElse(Seq.empty)
+    ordering.map(toCatalyst(_, query).get.asInstanceOf[SortOrder])
   }
 
   def toCatalyst(
       expr: V2Expression,
       query: LogicalPlan,
+      strict: Boolean = true,

Review Comment:
   instead of adding a new parameter, I think it's clearer to have 2 methods
   ```
   def toCatalystOpt ...
   def toCatalyst = toCatalystOpt(...).getOrElse(throw error)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

sunchao commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1143869351

   Merged, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139338616

   cc @sunchao, and @MaxGekk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884613073


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala:
##########
@@ -54,19 +53,24 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
    * Converts the array of input V2 [[V2SortOrder]] into their counterparts in catalyst.
    */
   def toCatalystOrdering(ordering: Array[V2SortOrder], query: LogicalPlan): Seq[SortOrder] = {
-    sequenceToOption(ordering.map(toCatalyst(_, query))).asInstanceOf[Option[Seq[SortOrder]]]
-      .getOrElse(Seq.empty)
+    ordering.map(toCatalystOpt(_, query).get.asInstanceOf[SortOrder])
   }
 
-  def toCatalyst(
+  def toCatalyst(expr: V2Expression,

Review Comment:
   nit: please follow the indentation of other methods in this file



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139798918

   Thanks to @sunchao for the confirmation, since `KeyGroupedPartitioningSuite` depends on the bug code to write data, the current change breaks it totally.
   
   I'm trying to get V2Write to support V2Function, but it seems more complicated than I thought at first. Would you please give me some suggestions? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] MaxGekk commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

MaxGekk commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139832757

   > I think this could cause data corruption ...
   
   If so, I will fail RC3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

sunchao commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139985698

   I think that can be a separate PR for master only to support function catalog on the write path.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140387555

   `DistributionAndOrderingUtils` does not pass `FunctionCatalog` into `toCatalyst`, so only the identity transform will be translated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

sunchao commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139803476

   Sure. Let me check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139976216

   > @pan3793 [this commit](https://github.com/sunchao/spark/commit/c2a97be651df747f4edf19f46f2c2d41cd89b230) should help to fix the tests.
   
   Thanks, seems my approach is too overkill to fix it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1139664769

   My change breaks `KeyGroupedPartitioningSuite` because it try to write data to V2Relation which claims unsupported distributions and orderings.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884617246


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala:
##########
@@ -54,19 +53,24 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
    * Converts the array of input V2 [[V2SortOrder]] into their counterparts in catalyst.
    */
   def toCatalystOrdering(ordering: Array[V2SortOrder], query: LogicalPlan): Seq[SortOrder] = {
-    sequenceToOption(ordering.map(toCatalyst(_, query))).asInstanceOf[Option[Seq[SortOrder]]]
-      .getOrElse(Seq.empty)
+    ordering.map(toCatalystOpt(_, query).get.asInstanceOf[SortOrder])
   }
 
-  def toCatalyst(
+  def toCatalyst(expr: V2Expression,

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on a diff in pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on code in PR #36697:
URL: https://github.com/apache/spark/pull/36697#discussion_r884591939


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala:
##########
@@ -54,19 +53,19 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
    * Converts the array of input V2 [[V2SortOrder]] into their counterparts in catalyst.
    */
   def toCatalystOrdering(ordering: Array[V2SortOrder], query: LogicalPlan): Seq[SortOrder] = {
-    sequenceToOption(ordering.map(toCatalyst(_, query))).asInstanceOf[Option[Seq[SortOrder]]]
-      .getOrElse(Seq.empty)
+    ordering.map(toCatalyst(_, query).get.asInstanceOf[SortOrder])
   }
 
   def toCatalyst(
       expr: V2Expression,
       query: LogicalPlan,
+      strict: Boolean = true,

Review Comment:
   Good suggestion, updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

Posted by GitBox <gi...@apache.org>.

pan3793 commented on PR #36697:
URL: https://github.com/apache/spark/pull/36697#issuecomment-1140265452

   CI is green, please take another look @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org