Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/12 14:02:39 UTC

[GitHub] [spark] peter-toth commented on pull request #37496: [SPARK-39887][SQL][3.1] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique

peter-toth commented on PR #37496:
URL: https://github.com/apache/spark/pull/37496#issuecomment-1213144688

   @cloud-fan, this one is a bit different from the 3.4, 3.3 and 3.2 versions.
   As you can see, after the original change (https://github.com/apache/spark/pull/37496/commits/41acae298176640288208a5c4dd383a2afab6432) the plan regeneration (https://github.com/apache/spark/pull/37496/commits/ae87e3686bdff1834afbf829b7c056ef555fef57) didn't modify the expected `sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14a.sf100/explain.txt` and `sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14a/explain.txt` files, but it did modify a lot of golden files under `sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/`, which didn't happen in the other versions.
   
   It turned out that
   - the missing changes under `v1_4` are because in 3.1 we compare only the simplified golden files to detect changes.
   - the new changes under `v2_7` are because those queries have a parent `Union` whose first child is also a `Union` node, and the child `Union`'s children have intersecting output sets, so the aliases in the 2nd and later children of the child `Union` were also kept. I've fixed this issue with this change: https://github.com/apache/spark/pull/37496/commits/b508ca23bcd0e96c7c5b664d67bbe626262317ac, which no longer keeps those attributes in the 2nd and later children of a `Union` node that intersect with the 1st child's output. This fix allowed me to revert the plan regeneration: https://github.com/apache/spark/pull/37496/commits/90c2a8c7d6226cc0e8b6e03c6d6e237b0c72719e
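
   A toy model of the second point, in plain Python rather than Spark's Scala, may make the nested `Union` case easier to follow. It is only a sketch under assumptions: it supposes the rule carries a set of "excluded" attributes whose aliases must be kept, and that a `Union` exposes its first child's output. The function names, the `skip_first_child_attrs` flag and the attribute names `a`, `b`, `c` are hypothetical and are not taken from the commits linked above.

```python
def excluded_for_union_children(excluded, child_outputs, skip_first_child_attrs=True):
    """Return, per Union child, the set of attributes whose aliases must be kept.

    excluded: attributes the parent already requires to stay aliased.
    child_outputs: one set of output attribute names per Union child.
    skip_first_child_attrs: True models the behaviour described for the fix,
        False models the behaviour before it (an assumption of this sketch).
    """
    first_output = child_outputs[0]
    result = []
    for index, output in enumerate(child_outputs):
        if index == 0:
            # The Union inherits the first child's output, so those
            # attributes stay protected in the first child's subtree.
            result.append(set(excluded) | set(output))
        elif skip_first_child_attrs:
            # Later children do not define the Union's output, so attributes
            # shared with the first child need no protection there.
            result.append(set(excluded) - set(first_output))
        else:
            result.append(set(excluded))
    return result


def kept_aliases(excluded_per_child, child_outputs):
    """Attributes of each child that would keep their alias in this model."""
    return [set(ex) & set(out) for ex, out in zip(excluded_per_child, child_outputs)]


# Child Union with two children whose outputs intersect on "a".
# The parent Union passes down its first child's output as excluded.
outputs = [{"a", "b"}, {"a", "c"}]
parent_excluded = {"a", "b"}

before = kept_aliases(
    excluded_for_union_children(parent_excluded, outputs, skip_first_child_attrs=False),
    outputs,
)
after = kept_aliases(
    excluded_for_union_children(parent_excluded, outputs, skip_first_child_attrs=True),
    outputs,
)
print(before[1], after[1])
```

   In this model the second child keeps an alias for `a` before the fix and keeps none after it, while the first child is treated the same way in both cases.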
   
   I think we should probably land the fix commit https://github.com/apache/spark/pull/37496/commits/b508ca23bcd0e96c7c5b664d67bbe626262317ac in 3.4, 3.3 and 3.2 too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

