You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/07 20:03:30 UTC

[GitHub] [spark] ahshahid opened a new pull request, #37824: Spark 40362 Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

ahshahid opened a new pull request, #37824:
URL: https://github.com/apache/spark/pull/37824

   The change in the canonicalization of expressions to split in 2 parts ( precanonicalization & canonicalization) has broken the canonicalization logic involving commutative expressions if they are sub-expressions.
   The reason is that the Main expression invokes precanonicalization of children and if the child is a commutative type expression, the re-ordering gets skipped, which will not be the case if that child's canonicalize is called directly.
   so if a main expression is canonicalized and if it is compared to the main expression built from leaves to top with children's canonicalization explicitly invoked, the two main expression's canonicalization will differ.
   
   The fix is to ensure that for commutative expressions , the canonicalize and pre-canonicalize are same.
   Not sure if this fix will completely nullify the perf improvement intended by splitting canonicalization in two parts.
   If it completely nullifies the perf, then it would be better to ignore this PR and revert the code change of splitting the canonicalization in 2 parts.
   But if we keep the two step canonicalization, then this PR is valid and is needed.
   
   Introduced a trait which is implemented by Commutative expressions which equalizes precanonicalize and canonnicalize to fix the issue.
   
   There are bug tests added which show the issue on master.
   
   No user facing changes
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #37824: Spark 40362 Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
peter-toth commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1240770754

   As we know that `BinaryComparison` needs (fully) canonicalized children to get the right ordering of its children why don't we simply change `BinaryComparison.preCanonicalized` to `BinaryComparison.canonicalized` and modify it to use the canonicalized form of the nodes children:
   ```
   abstract class BinaryComparison extends BinaryOperator with Predicate {
   
     ...
   
     override lazy val canonicalized: Expression = {
       withNewChildren(children.map(_.canonicalized)) match {
       ...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1242223041

   > Agree with @peter-toth that we should do all ordering in the 2nd pass, in a bottom-up way.
   
   I suppose @cloud-fan @peter-toth  you want to code the change...? Or you want me to generate a new PR .. I think that making it bottom-up would result in many  tree traversals for commutative expressions resulting in perf degradation ..
   The reason I think is this:
   In your current code of top to bottom, the first encountered commutative expression will flatten all the consecutive similar commutative expressions ,so only one such recurring traversal will be needed. But if you go from bottom  to top, at the re-order will occur at each  child commutative expression...
   May be I can test the recommended change in a test & gauge the perf..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on a diff in pull request #37824: Spark 40362 Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on code in PR #37824:
URL: https://github.com/apache/spark/pull/37824#discussion_r966173357


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala:
##########
@@ -65,6 +65,6 @@ object Canonicalize {
       val newChildren = orderCommutative(l, { case Least(children) => children })
       Least(newChildren)
 
-    case _ => e.withNewChildren(e.children.map(reorderCommutativeOperators))
+    case _ => e

Review Comment:
   @attilapiros  btw just realized that the case you mentioned as such is what is being tested in the bug tests as GreaterThan , EqualTo etc are non commutative expressions.. 
   But I will use abs & also have a deeply nested expression tree containing a mix of commutative & non commutative expressions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #37824: Spark 40362 Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
peter-toth commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1240804232

   I think we could simply move the ordering logic from `BinaryComparison.preCanonicalized` to `Canonicalize.reorderCommutativeOperators`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1243878958

   @peter-toth ..It just occured to me that even the precanonicalize variable and going through children is not needed.
   because reeorder already does that..
   I am testing the new change,..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #37824: Spark 40362 Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
peter-toth commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1240832839

   cc @cloud-fan, as this issue can cause pref regression from 3.2 to 3.3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on a diff in pull request #37824: Spark 40362 Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on code in PR #37824:
URL: https://github.com/apache/spark/pull/37824#discussion_r966162761


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala:
##########
@@ -65,6 +65,6 @@ object Canonicalize {
       val newChildren = orderCommutative(l, { case Least(children) => children })
       Least(newChildren)
 
-    case _ => e.withNewChildren(e.children.map(reorderCommutativeOperators))
+    case _ => e

Review Comment:
   That should not be an issue. The children traversal is not needed, because the precanonicalize on children would ensure that all the elements of subtree have the precanonicalize invoked, which would mean that any commutative expression's precanonicalize is also invoked, and for commutative expressions, the precanonicalize & canonicalize are identical & involves reordering.
   
   I will add tests for that too,.
   
   Also I think now there is no need for precanonicalize variable at all.  whatever code is in precanonicalize can directly be  assigned to canonicalize variable.
   The main issue was  tree traversal at each level ( & mostly redundant) caused by reorderCall.  that is taken care of by not dwelving into children and reorder is being called only for commutative expressions.
   I will also measure the perf impact. that should not change.
   That will simplify the code too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1241064961

   Have replaced precanonicalize with canonicalize.
   I will check in some time if the perf benefit is maintained ( which I think it should)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1241281980

   Though this PR fixes the bug, but performance benchmark is now 14 sec instead of 200ms. Which is not good.
   I also now appreciate much better the requirement of precanonicalize phase ( the cost of canonicalizing  an expression like
   a + b + c+ d + e + f   which is a nested tree of Add and for proper canonicalization needs to be flatenned hence recursive cost which precanonicalize avoids by having only 1 such deep call.
   I will try the alternate approach of ensuring hashCode is symmetric for commutative expressions which will mean minimal changes & fix the bug.
   The ugly part there will be specific handling of Seq[Expression] for the Least & Greatest.
   Once I modify this PR will elicit your inputs...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a diff in pull request #37824: Spark 40362 Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
attilapiros commented on code in PR #37824:
URL: https://github.com/apache/spark/pull/37824#discussion_r965247849


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##########
@@ -563,7 +564,8 @@ case class Subtract(
 case class Multiply(
     left: Expression,
     right: Expression,
-    evalMode: EvalMode.Value = EvalMode.fromSQLConf(SQLConf.get)) extends BinaryArithmetic {
+    evalMode: EvalMode.Value = EvalMode.fromSQLConf(SQLConf.get)) extends BinaryArithmetic with

Review Comment:
   Nit: indentation the `with` should be moved to the next line



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala:
##########
@@ -65,6 +65,6 @@ object Canonicalize {
       val newChildren = orderCommutative(l, { case Least(children) => children })
       Least(newChildren)
 
-    case _ => e.withNewChildren(e.children.map(reorderCommutativeOperators))
+    case _ => e

Review Comment:
   I have some doubts regarding this change. 
   
   What about a tree where a commutative child has a non-commutative parent which is not listed above for example `abs('a + 'b)`? As with this change the recursive call stops at `abs`...
   
   can you extend the test cases which such an example?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
peter-toth commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1242780204

   @ahshahid, please check https://github.com/apache/spark/pull/37851 as an anternative to this PR. It moves reorder of childen of `BinaryComparison`s to the 2nd phase. As reorder already happens bottom-up way in `Canonicalize. reorderCommutativeOperators `, this is just a small addition.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
peter-toth commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1242909357

   > In case it is nested "And" then the perf is obtained because only the top would call reorder.. with this, it will happen on each child & may have adverse impact on perf.. But again that is just my hypothesis , need to verify..
   
   Reorder always happenes on each (non `And`) child of those consecutive `And`s at: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala#L68


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on pull request #37824: Spark 40362 Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1240934709

   @peter-toth as mentioned above, I am thinking of removing precanonicalize as  canonicalize itself should achieve the goal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1244140015

   > > @peter-toth , @cloud-fan @attilapiros ..It just occured to me that even the precanonicalize variable( and hence going through children in it) is not needed.
   > 
   > I'm not sure that `preCanonicalized` is being a lazy val is bad as if you already have it calulated for an expression `e` and you just add a new root expression `r` on the top of `e`, you can reuse `e.preCanonicalized` to canonicalize `r` (if `r` is not a commutative operator).
   
   I thought about it, the thing is in any case precanonicalize has to undergo tree traversal at time of canonicalization, so there should hardly be any difference. More over I think currently precanonicalized is being called unnecessarily from many places.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #37824: Spark 40362 Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
attilapiros commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1240191235

   @ahshahid Thanks for your contribution!
   The PR title should follow a pattern:
   >The PR title should be of the form [SPARK-xxxx][COMPONENT] Title, where SPARK-xxxx is the relevant JIRA number, COMPONENT is one of the PR categories shown at [spark-prs.appspot.com](https://spark-prs.appspot.com/) and Title may be the JIRA’s title or a more specific title describing the PR itself.
   
   For details see: https://spark.apache.org/contributing.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1242792830

   > @ahshahid, please check #37851 as an anternative to this PR. It moves reorder of childen of `BinaryComparison`s to the 2nd phase. As reorder already happens bottom-up way in `Canonicalize. reorderCommutativeOperators `, this is just a small addition.
   
   I will go through it @peter-toth .. Will check perf impact.. if perf is maintained, then I agree its a simple & better fix..
   The thing is Canonicalize. reorderCommutativeOperators only happens bottoms up if the commutative operators are of same type. In case it is nested "And"  then the perf is obtained because only the top would call reorder.. with this, it will happen on each child & may have adverse impact on perf.. But again that is just my hypothesis , need to verify..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
peter-toth commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1244102590

   > @peter-toth , @cloud-fan @attilapiros ..It just occured to me that even the precanonicalize variable( and hence going through children in it) is not needed. 
   
   I'm not sure that `preCanonicalized` is being a lazy val is bad as if you already have it calulated for an expression `e` and you just add a new root expression `r` on the top of `e`, you can reuse `e.preCanonicalized` to canonicalize `r` (if `r` is not a comutative operator).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on a diff in pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on code in PR #37824:
URL: https://github.com/apache/spark/pull/37824#discussion_r966280137


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##########
@@ -563,7 +564,8 @@ case class Subtract(
 case class Multiply(
     left: Expression,
     right: Expression,
-    evalMode: EvalMode.Value = EvalMode.fromSQLConf(SQLConf.get)) extends BinaryArithmetic {
+    evalMode: EvalMode.Value = EvalMode.fromSQLConf(SQLConf.get)) extends BinaryArithmetic with

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1241065477

   > 
   
   done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
peter-toth commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1243740986

   Hmm looks very interresting idea. If you don't mind I incomporated it into the latest https://github.com/apache/spark/pull/37851 to keep things simple.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1241316795

   > I think we could simply move the ordering logic from `BinaryComparison.preCanonicalized` to `Canonicalize.reorderCommutativeOperators` (and rename it to `reorderOperators`) and so keep the performance improvement of #34883.
   
   I suppose that will also work... thinking on it..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1241713531

   Agree with @peter-toth that we should do all ordering in the 2nd pass, in a bottom-up way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
peter-toth commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1241638796

   > > I think we could simply move the ordering logic from `BinaryComparison.preCanonicalized` to `Canonicalize.reorderCommutativeOperators` (and rename it to `reorderOperators`) and so keep the performance improvement of #34883.
   > 
   > I suppose that will also work... thinking on it.. Actually I think it wont work., BinaryComparison to get correct results , needs hashCode of the operands, and the hashCode wiill not be correct , till canonicalize of the children is called, so it wont solve the issue... right?
   
   If we do all ordering in the 2nd pass then by the time we are ordering BinaryComparison's children its childrens have been fully canonicalized (ordered) so their hashcode will be correct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1241676122

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid commented on pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid commented on PR #37824:
URL: https://github.com/apache/spark/pull/37824#issuecomment-1242997023

   @peter-toth @cloud-fan  Extending your fixes little further ( about moving the reorder of the binary comparison to second phase i.e after children's reordering) ,  it occured that gatherCommutative is called on precanonicalized children, it eliminates  the need for second pass for reordering.
   This halves the current benchmark perf to 75 ms.
   and another benchmark test to 130ms compared to approc 500 ms


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ahshahid closed pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Posted by GitBox <gi...@apache.org>.
ahshahid closed pull request #37824: [SPARK-40362][SQL] Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators
URL: https://github.com/apache/spark/pull/37824


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org