You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/03/06 01:59:27 UTC

[GitHub] [spark] beliefer opened a new pull request, #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

beliefer opened a new pull request, #40287:
URL: https://github.com/apache/spark/pull/40287

   ### What changes were proposed in this pull request?
   UnresolvedNamedLambdaVariable do not need unique names in python. We already did this for the scala client, and it is good to have parity between the two implementations.
   
   
   ### Why are the changes needed?
   Try to avoid unique names for UnresolvedNamedLambdaVariable.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   New feature
   
   
   ### How was this patch tested?
   N/A
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1458109080

   > @beliefer here is the thing. When this was designed it was mainly aimed at sql, and there we definitely do not generate unique names in lambda functions either. This is all done in the analyzer. We should be able to follow the same path.
   >
   
   It seems only the lambda functions in SQL will be transformed with analyzer. But the scala, pyspark API will not through analyzer.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1458193681

   > Ehhhh... SQL/scala/Python all use the analyzer; they are all just frontends to the same thing.
   
   I found the reason. Although the scala API use analyzer too. `object ResolveLambdaVariables extends Rule[LogicalPlan]` can't resolve the issue.
   See at: https://github.com/apache/spark/blob/2e7207f96e1ff848def135de63f63bcda7402517/sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala#L5250
   ![image](https://user-images.githubusercontent.com/8486025/223439443-efa6346d-a829-4dd2-a430-df5f24fbd819.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1595508574

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1458116337

   Ehhhh... SQL/scala/Python all use the analyzer; they are all just frontends to the same thing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1459594072

   If the nested lambda issue also exists in the Scala Client, do we need to fix it in the same way?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1457433571

   @beliefer here is the thing. When this was designed it was mainly aimed at sql, and there we definitely do not generate unique names in lambda functions either. This is all done in the analyzer. We should be able to follow the same path.
   
   Do you happen to know if test failing for python also fail for scala?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1455390728

   ![image](https://user-images.githubusercontent.com/8486025/223014232-bf9b26ee-d0e8-4de4-a8fe-2d252813ac4d.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1455392063

   > I guess we will need to rewrite the lamda function in spark connect planner.
   
   Yeah.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names
URL: https://github.com/apache/spark/pull/40287


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1456026258

   It seems pyspark supports the nested lambda variables and two PR fix the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1459329942

   @hvanhovell Do we still need this change ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1457418420

   @hvanhovell Scala also uses `UnresolvedNamedLambdaVariable.freshVarName("x")` to get the unique names. see: 
   https://github.com/apache/spark/blob/201e08c03a31c763e3120540ac1b1ca8ef252e6b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L4096


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1456054467

   @beliefer scala does support nested lambda variables as well, and they actually work. So either (my) testing on the scala side is incomplete (which might well be the case), or something weird is going on here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1455384317

   @hvanhovell After my test, `python/run-tests --testnames 'pyspark.sql.connect.dataframe'` will not passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1455388960

   I guess we will need to rewrite the lamda function in spark connect planner.
   
   cc @ueshin as well, since existing implementation follows the fix in https://github.com/apache/spark/pull/32523


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #40287:
URL: https://github.com/apache/spark/pull/40287#issuecomment-1455366786

   @HyukjinKwon @zhengruifeng the rationale for this change is that analyzer takes care of making lambda variables unique.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org