You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "xinrong-meng (via GitHub)" <gi...@apache.org> on 2023/03/20 11:26:03 UTC
[GitHub] [spark] xinrong-meng opened a new pull request, #40487: [WIP] Implement CoGrouped Map API
xinrong-meng opened a new pull request, #40487:
URL: https://github.com/apache/spark/pull/40487
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] xinrong-meng commented on pull request #40487: [SPARK-42891][CONNECT][PYTHON] Implement CoGrouped Map API
Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on PR #40487:
URL: https://github.com/apache/spark/pull/40487#issuecomment-1480372625
cc [LuciferYang](https://github.com/LuciferYang) thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on a diff in pull request #40487: [WIP] Implement CoGrouped Map API
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40487:
URL: https://github.com/apache/spark/pull/40487#discussion_r1142115606
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -509,6 +511,26 @@ class SparkConnectPlanner(val session: SparkSession) {
.logicalPlan
}
+ private def transformCoGroupMap(rel: proto.GroupMap): LogicalPlan = {
Review Comment:
`rel: proto.GroupMap` -> `rel: proto.CoGroupMap`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #40487: [SPARK-42891][CONNECT][PYTHON] Implement CoGrouped Map API
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #40487:
URL: https://github.com/apache/spark/pull/40487#issuecomment-1480555708
Merged to master and branch-3.4.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] xinrong-meng commented on a diff in pull request #40487: [WIP] Implement CoGrouped Map API
Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on code in PR #40487:
URL: https://github.com/apache/spark/pull/40487#discussion_r1142823170
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -509,6 +511,26 @@ class SparkConnectPlanner(val session: SparkSession) {
.logicalPlan
}
+ private def transformCoGroupMap(rel: proto.GroupMap): LogicalPlan = {
Review Comment:
Thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #40487: [SPARK-42891][CONNECT][PYTHON] Implement CoGrouped Map API
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40487:
URL: https://github.com/apache/spark/pull/40487#issuecomment-1480687213
late LGTM
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] xinrong-meng commented on pull request #40487: [SPARK-42891][CONNECT][PYTHON] Implement CoGrouped Map API
Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on PR #40487:
URL: https://github.com/apache/spark/pull/40487#issuecomment-1480372425
May I get a review please @zhengruifeng @HyukjinKwon ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on a diff in pull request #40487: [WIP] Implement CoGrouped Map API
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40487:
URL: https://github.com/apache/spark/pull/40487#discussion_r1142087439
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -509,6 +511,26 @@ class SparkConnectPlanner(val session: SparkSession) {
.logicalPlan
}
+ private def transformCoGroupMap(rel: proto.GroupMap): LogicalPlan = {
+ val pythonUdf = transformPythonUDF(rel.getFunc)
+
+ val input_cols =
Review Comment:
For Scala code, I suggest using Camel-Case
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -509,6 +511,26 @@ class SparkConnectPlanner(val session: SparkSession) {
.logicalPlan
}
+ private def transformCoGroupMap(rel: proto.GroupMap): LogicalPlan = {
+ val pythonUdf = transformPythonUDF(rel.getFunc)
+
+ val input_cols =
+ rel.getInputGroupingExpressionsList.asScala.toSeq.map(expr =>
+ Column(transformExpression(expr)))
+ val other_cols =
Review Comment:
ditto
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon closed pull request #40487: [SPARK-42891][CONNECT][PYTHON] Implement CoGrouped Map API
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #40487: [SPARK-42891][CONNECT][PYTHON] Implement CoGrouped Map API
URL: https://github.com/apache/spark/pull/40487
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #40487: [SPARK-42891][CONNECT][PYTHON] Implement CoGrouped Map API
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #40487:
URL: https://github.com/apache/spark/pull/40487#issuecomment-1480556026
It has a conflict w/ branch-3.4. mind creating a backport PR please?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-42891][CONNECT][PYTHON] Implement CoGrouped Map API [spark]
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #40487:
URL: https://github.com/apache/spark/pull/40487#discussion_r1403809250
##########
python/pyspark/sql/connect/plan.py:
##########
@@ -1950,6 +1950,46 @@ def plan(self, session: "SparkConnectClient") -> proto.Relation:
return plan
+class CoGroupMap(LogicalPlan):
+ """Logical plan object for a CoGroup Map API: applyInPandas."""
+
+ def __init__(
+ self,
+ input: Optional["LogicalPlan"],
+ input_grouping_cols: Sequence[Column],
+ other: Optional["LogicalPlan"],
+ other_grouping_cols: Sequence[Column],
+ function: "UserDefinedFunction",
+ cols: List[Column],
Review Comment:
@xinrong-meng just realised that we don't use this variable. Should we remove? or should it be assigned to somewhere?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org