You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2023/07/20 04:52:07 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

zhengruifeng opened a new pull request, #42086:
URL: https://github.com/apache/spark/pull/42086

   ### What changes were proposed in this pull request?
   Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `
   
   
   ### Why are the changes needed?
   In https://github.com/apache/spark/pull/39925, we introduced a new mechanism to resolve expression with specified plan.
   
   However, sometimes the plan ID might be eliminated by the analyzer, and then some expressions can not be correctly resolved, this issue is the main blocker of PS on Connect.
   
   
   ### Does this PR introduce _any_ user-facing change?
   yes, a lot of Pandas APIs enabled
   
   
   ### How was this patch tested?
   Enable 113 out of all the 122 related UTs 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1644169830

   CI link: https://github.com/zhengruifeng/spark/actions/runs/5612996302/job/15208077409


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG`

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1658159235

   @HyukjinKwon not sure, maybe not needed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1643892630

   I saw all tests in `pyspark-pandas-slow-connect` successed
   
   ![image](https://github.com/apache/spark/assets/7322292/b0f1d7b7-5520-4988-a0ba-0004ee082f9a)
   
   not sure why it suddenly became
   ![image](https://github.com/apache/spark/assets/7322292/943db764-7a65-454d-8953-8e75d7fc8eb2)
   
   let me disable a few UT to see what happens
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG`

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1658113936

   Should we merge this to 3.5 too?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42086:
URL: https://github.com/apache/spark/pull/42086#discussion_r1270665267


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -3124,7 +3124,9 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
 
         // Finally, generate output columns according to the original projectList.
         val finalProjectList = aggregateExprs.map(_.toAttribute)
-        Project(finalProjectList, withWindow)
+        val newProject = Project(finalProjectList, withWindow)
+        newProject.copyTagFrom(f, LogicalPlan.PLAN_ID_TAG)

Review Comment:
   shall we copy all tags?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1643320025

   > > The good news is that 90% UTs can be resolved by this single one, and I think we only need to touch 3~4 more rules.
   > 
   > Great! Could you help creating tickets for remaining 3~4 more rules under [SPARK-42497](https://issues.apache.org/jira/browse/SPARK-42497)??
   
   TBH, I am not very sure about which rules to modify, it needs further investigation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG`

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1651677693

   cc @cloud-fan all tests passed, would you mind taking another look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1643363137

   I am fine with this as a workaround for now but such implementation depending on tags is sort of flaky. The tags are easily lost when you, e.g., copy the expressions IIRC.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG`

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG`
URL: https://github.com/apache/spark/pull/42086


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #42086:
URL: https://github.com/apache/spark/pull/42086#discussion_r1272864370


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -3124,7 +3124,9 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
 
         // Finally, generate output columns according to the original projectList.
         val finalProjectList = aggregateExprs.map(_.toAttribute)
-        Project(finalProjectList, withWindow)
+        val newProject = Project(finalProjectList, withWindow)
+        newProject.copyTagFrom(f, LogicalPlan.PLAN_ID_TAG)

Review Comment:
   sounds good, let me have a try



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG`

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1652841100

   thanks! merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42086:
URL: https://github.com/apache/spark/pull/42086#discussion_r1270663966


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala:
##########
@@ -151,6 +151,12 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]]
     }
   }
 
+  def copyTagFrom[T](other: BaseType, tag: TreeNodeTag[T]): Unit = {
+    // copy a specified tag if exists
+    other.getTagValue(tag)
+      .foreach(this.setTagValue(tag, _))

Review Comment:
   ```suggestion
       other.getTagValue(tag).foreach(this.setTagValue(tag, _))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1643306997

   > The good news is that 90% UTs can be resolved by this single one, and I think we only need to touch 3~4 more rules.
   
   Great! Could you help creating tickets for remaining 3~4 more rules??


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG`

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1652854519

   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1643187065

   I have checked with @cloud-fan that we might have to modify the rules one by one.
   The good news is that 90% UTs can be resolved by this single one, and I think we only need to touch another 3~4 rules.
   
   @HyukjinKwon @itholic 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #42086:
URL: https://github.com/apache/spark/pull/42086#issuecomment-1643329346

   I got it. Just created single ticket here: SPARK-44492 for addressing undefined remaining tests so we don't miss it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org