Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/29 21:03:21 UTC

[GitHub] [spark] grundprinzip opened a new pull request, #38842: [SPARK-41326] [CONNECT] Fix deduplicate is missing input

grundprinzip opened a new pull request, #38842:
URL: https://github.com/apache/spark/pull/38842

   ### What changes were proposed in this pull request?
   The plan transformation for `Deduplicate` in Spark Connect did not copy the input relation into the plan. This caused an exception on the server, and the query failed.
   
   This patch fixes that bug.
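
The effect of the missing copy can be illustrated with a minimal sketch. It does not use the real `proto.Relation` or `SparkConnectClient`; `Relation`, `Read`, and the method `plan_without_input` are hypothetical stand-ins introduced only for illustration, and the assignment to `input` plays the role of `plan.deduplicate.input.CopyFrom(self._child.plan(session))` in the actual diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Relation:
    """Hypothetical stand-in for proto.Relation (not the real protobuf class)."""

    name: str = ""
    input: Optional["Relation"] = None
    column_names: List[str] = field(default_factory=list)


class Read:
    """Hypothetical leaf plan node that has no child."""

    def __init__(self, table: str) -> None:
        self._table = table

    def plan(self) -> Relation:
        return Relation(name=f"read:{self._table}")


class Deduplicate:
    """Stand-in for the client-side Deduplicate plan node."""

    def __init__(self, child: Read, column_names: List[str]) -> None:
        self._child = child
        self._column_names = column_names

    def plan_without_input(self) -> Relation:
        # Behaviour before the fix: the child plan is never attached,
        # so the receiver sees a deduplicate node with no input.
        rel = Relation(name="deduplicate")
        rel.column_names = list(self._column_names)
        return rel

    def plan(self) -> Relation:
        # Behaviour after the fix: attach the child's plan as the input.
        assert self._child is not None
        rel = Relation(name="deduplicate")
        rel.input = self._child.plan()
        rel.column_names = list(self._column_names)
        return rel


dedup = Deduplicate(Read("t"), ["id"])
broken = dedup.plan_without_input()
fixed = dedup.plan()
```

In this sketch `broken.input` stays `None` while `fixed.input` carries the child relation, which is the difference the one-line patch makes.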
   
   ### Why are the changes needed?
   Bugfix
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Unit tests (UT)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #38842: [SPARK-41326] [CONNECT] Fix deduplicate is missing input

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38842:
URL: https://github.com/apache/spark/pull/38842#discussion_r1035289551


##########
python/pyspark/sql/connect/plan.py:
##########
@@ -466,6 +466,7 @@ def __init__(
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
         assert self._child is not None
         plan = proto.Relation()
+        plan.deduplicate.input.CopyFrom(self._child.plan(session))

Review Comment:
   This is a case that a test in `test_connect_basic` should probably guard against. Maybe add a test case there as well?
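
A regression test along these lines might look like the following sketch. It is not the test from the Spark repository: `build_deduplicate_plan` is a hypothetical helper, and `SimpleNamespace` objects stand in for protobuf messages. With real protobuf messages, the presence check would more likely use `HasField("input")`.

```python
import unittest
from types import SimpleNamespace


def build_deduplicate_plan(child_plan, column_names):
    """Hypothetical builder standing in for Deduplicate.plan()."""
    return SimpleNamespace(
        deduplicate=SimpleNamespace(
            input=child_plan,  # the field the original code forgot to set
            column_names=list(column_names),
        )
    )


class DeduplicateInputTest(unittest.TestCase):
    def test_input_is_copied(self):
        child = SimpleNamespace(read=SimpleNamespace(table="t"))
        plan = build_deduplicate_plan(child, ["id"])
        # The deduplicate node must carry its child relation as input.
        self.assertIsNotNone(plan.deduplicate.input)
        self.assertEqual(plan.deduplicate.input.read.table, "t")
        self.assertEqual(plan.deduplicate.column_names, ["id"])
```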





[GitHub] [spark] HyukjinKwon commented on pull request #38842: [SPARK-41326] [CONNECT] Fix deduplicate is missing input

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38842:
URL: https://github.com/apache/spark/pull/38842#issuecomment-1331510191

   Merged to master.




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38842: [SPARK-41326] [CONNECT] Fix deduplicate is missing input

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38842:
URL: https://github.com/apache/spark/pull/38842#discussion_r1035429164


##########
python/pyspark/sql/connect/plan.py:
##########
@@ -466,6 +466,7 @@ def __init__(
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
         assert self._child is not None
         plan = proto.Relation()
+        plan.deduplicate.input.CopyFrom(self._child.plan(session))

Review Comment:
   Let me just merge this for now and go forward.





[GitHub] [spark] AmplabJenkins commented on pull request #38842: [SPARK-41326] [CONNECT] Fix deduplicate is missing input

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38842:
URL: https://github.com/apache/spark/pull/38842#issuecomment-1331425881

   Can one of the admins verify this patch?




[GitHub] [spark] grundprinzip commented on a diff in pull request #38842: [SPARK-41326] [CONNECT] Fix deduplicate is missing input

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #38842:
URL: https://github.com/apache/spark/pull/38842#discussion_r1035295893


##########
python/pyspark/sql/connect/plan.py:
##########
@@ -466,6 +466,7 @@ def __init__(
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
         assert self._child is not None
         plan = proto.Relation()
+        plan.deduplicate.input.CopyFrom(self._child.plan(session))

Review Comment:
   The existing tests are exhaustive; they just missed that the input was never copied.





[GitHub] [spark] amaliujia commented on a diff in pull request #38842: [SPARK-41326] [CONNECT] Fix deduplicate is missing input

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #38842:
URL: https://github.com/apache/spark/pull/38842#discussion_r1035302228


##########
python/pyspark/sql/connect/plan.py:
##########
@@ -466,6 +466,7 @@ def __init__(
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
         assert self._child is not None
         plan = proto.Relation()
+        plan.deduplicate.input.CopyFrom(self._child.plan(session))

Review Comment:
   I won't block this PR by my comment.





[GitHub] [spark] HyukjinKwon closed pull request #38842: [SPARK-41326] [CONNECT] Fix deduplicate is missing input

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #38842: [SPARK-41326] [CONNECT] Fix deduplicate is missing input
URL: https://github.com/apache/spark/pull/38842

