Posted to reviews@spark.apache.org by "xil-db (via GitHub)" <gi...@apache.org> on 2023/11/10 23:20:27 UTC

[PR] [SPARK-45892] Refactor optimizer plan validation [spark]

xil-db opened a new pull request, #43761:
URL: https://github.com/apache/spark/pull/43761

   ### What changes were proposed in this pull request?
   
   Currently, the expression ID uniqueness validation is tightly [coupled with output schema validation](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L403C7-L411C8).
   
   This PR refactors the code to decouple the two checks into separate validation functions, `validateSchemaOutput` and `validateExprIdUniqueness`, which improves readability and reuse.
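
As a rough illustration of the decoupling described above, the sketch below uses simplified stand-in types rather than Spark's actual `LogicalPlan`. `ToyPlan`, `exprIds`, `ToyPlanIntegrity`, and `validateOptimizedPlan` are hypothetical names introduced here, and the signature of `validateExprIdUniqueness` is an assumption. Each check returns `Option[String]` (an error message, or `None` when the check passes), mirroring the `validateSchemaOutput` signature shown in this PR's diff, so independent checks can be chained with `orElse`.

```scala
// Simplified stand-in for illustration only; this is not Spark's LogicalPlan.
final case class ToyPlan(schema: Seq[(String, String)], exprIds: Seq[Long])

object ToyPlanIntegrity {
  // Schema check: compares the output schema before and after an optimizer rule.
  def validateSchemaOutput(previousPlan: ToyPlan, currentPlan: ToyPlan): Option[String] = {
    if (previousPlan.schema != currentPlan.schema) {
      Some(s"The plan output schema has changed from ${previousPlan.schema} to ${currentPlan.schema}")
    } else {
      None
    }
  }

  // Expression ID check (assumed signature): inspects only the current plan.
  def validateExprIdUniqueness(plan: ToyPlan): Option[String] = {
    // IDs that occur more than once, each reported a single time.
    val duplicates = plan.exprIds.diff(plan.exprIds.distinct).distinct
    if (duplicates.nonEmpty) {
      Some(s"Duplicate expression IDs: ${duplicates.mkString(", ")}")
    } else {
      None
    }
  }

  // Hypothetical composition: returns the first failing check, if any.
  def validateOptimizedPlan(previousPlan: ToyPlan, currentPlan: ToyPlan): Option[String] =
    validateSchemaOutput(previousPlan, currentPlan)
      .orElse(validateExprIdUniqueness(currentPlan))
}
```

Because each check is a standalone function, a caller that only cares about one property (for example, expression ID uniqueness) can invoke that check alone without running the schema comparison.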
   
   ### Why are the changes needed?
   
   Improve code readability and maintainability.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-45892][SQL] Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness` [spark]

Posted by "xil-db (via GitHub)" <gi...@apache.org>.

xil-db commented on code in PR #43761:
URL: https://github.com/apache/spark/pull/43761#discussion_r1391442595


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala:
##########
@@ -381,6 +381,15 @@ object LogicalPlanIntegrity {
     }.flatten
   }
 
+  def validateSchemaOutput(previousPlan: LogicalPlan, currentPlan: LogicalPlan): Option[String] = {

Review Comment:
   I left it non-private to be consistent with the other validation functions, but I'm happy to change it if preferred.




Re: [PR] [SPARK-45892][SQL] Refactor optimizer plan validation [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.

LuciferYang commented on code in PR #43761:
URL: https://github.com/apache/spark/pull/43761#discussion_r1390668300


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala:
##########
@@ -381,6 +381,15 @@ object LogicalPlanIntegrity {
     }.flatten
   }
 
+  def validateSchemaOutput(previousPlan: LogicalPlan, currentPlan: LogicalPlan): Option[String] = {
+    if (!DataTypeUtils.equalsIgnoreNullability(previousPlan.schema, currentPlan.schema)) {
+      Some(s"The plan output schema has changed from ${previousPlan.schema.sql} to " +

Review Comment:
   I think we can use string block syntax instead of manually adding `\n` for new lines.
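
For readers unfamiliar with the two styles being compared, here is a minimal sketch. The message text and the names `MessageStyles`, `concatenated`, and `stringBlock` are hypothetical and are not taken from the PR; only the language features (string concatenation with `\n`, and triple-quoted strings with `stripMargin`) are standard Scala.

```scala
object MessageStyles {
  // Style 1: concatenated string literals with explicit "\n" escapes.
  def concatenated(before: String, after: String): String =
    "The plan output schema has changed\n" +
      s"from: $before\n" +
      s"to: $after"

  // Style 2: a triple-quoted string block. Line breaks are written literally,
  // and stripMargin drops the leading whitespace up to and including each '|'.
  def stringBlock(before: String, after: String): String =
    s"""The plan output schema has changed
       |from: $before
       |to: $after""".stripMargin
}
```

Both functions should produce the same three-line message, assuming the source file uses `\n` line endings; the difference is purely in how the line breaks are spelled in the source.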



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala:
##########
@@ -381,6 +381,15 @@ object LogicalPlanIntegrity {
     }.flatten
   }
 
+  def validateSchemaOutput(previousPlan: LogicalPlan, currentPlan: LogicalPlan): Option[String] = {

Review Comment:
   This function can be `private`.




Re: [PR] [SPARK-45892][SQL] Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.

LuciferYang commented on code in PR #43761:
URL: https://github.com/apache/spark/pull/43761#discussion_r1391446442


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala:
##########
@@ -381,6 +381,15 @@ object LogicalPlanIntegrity {
     }.flatten
   }
 
+  def validateSchemaOutput(previousPlan: LogicalPlan, currentPlan: LogicalPlan): Option[String] = {
+    if (!DataTypeUtils.equalsIgnoreNullability(previousPlan.schema, currentPlan.schema)) {
+      Some(s"The plan output schema has changed from ${previousPlan.schema.sql} to " +

Review Comment:
   This is not a very important issue, so it's not worth making batch changes. It's just that this PR touches a related case, so I hope it can be fixed here conveniently. Personally, I think the string block syntax is clearer; for example, there is no need to pay attention to escape characters :)




Re: [PR] [SPARK-45892][SQL] Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness` [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #43761:
URL: https://github.com/apache/spark/pull/43761#issuecomment-1809154932

   Thanks, merging to master!



Re: [PR] [SPARK-45892][SQL] Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness` [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43761:
URL: https://github.com/apache/spark/pull/43761#discussion_r1391724675


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala:
##########
@@ -381,6 +381,15 @@ object LogicalPlanIntegrity {
     }.flatten
   }
 
+  def validateSchemaOutput(previousPlan: LogicalPlan, currentPlan: LogicalPlan): Option[String] = {

Review Comment:
   Yeah, in general we should keep the code style consistent with the existing code in the same file.




Re: [PR] [SPARK-45892][SQL] Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness` [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan closed pull request #43761: [SPARK-45892][SQL] Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness`
URL: https://github.com/apache/spark/pull/43761



Re: [PR] [SPARK-45892][SQL] Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness` [spark]

Posted by "xil-db (via GitHub)" <gi...@apache.org>.

xil-db commented on code in PR #43761:
URL: https://github.com/apache/spark/pull/43761#discussion_r1391441829


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala:
##########
@@ -381,6 +381,15 @@ object LogicalPlanIntegrity {
     }.flatten
   }
 
+  def validateSchemaOutput(previousPlan: LogicalPlan, currentPlan: LogicalPlan): Option[String] = {
+    if (!DataTypeUtils.equalsIgnoreNullability(previousPlan.schema, currentPlan.schema)) {
+      Some(s"The plan output schema has changed from ${previousPlan.schema.sql} to " +

Review Comment:
   Yes, we can do that. The current style is consistent with the other validation functions, so if we want to switch to string block syntax, I think we should change the other validation functions, too. I don't have a strong opinion on this; please let me know if that's preferred.


