Posted to reviews@spark.apache.org by "ueshin (via GitHub)" <gi...@apache.org> on 2023/03/08 00:16:31 UTC

[GitHub] [spark] ueshin opened a new pull request, #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

ueshin opened a new pull request, #40323:
URL: https://github.com/apache/spark/pull/40323

   ### What changes were proposed in this pull request?
   
   Fixes `spark.sql` to return values from the command.
   
   ### Why are the changes needed?
   
   Currently, `spark.sql` doesn't return the results of commands:
   
   ```py
   >>> spark.sql("show functions").show()
   +--------+
   |function|
   +--------+
   +--------+
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   `spark.sql` with commands will return their results.
   
   ### How was this patch tested?
   
   Added a test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a diff in pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #40323:
URL: https://github.com/apache/spark/pull/40323#discussion_r1129020413


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -1508,8 +1508,10 @@ class SparkConnectPlanner(val session: SparkSession) {
         maxRecordsPerBatch,
         maxBatchSize,
         timeZoneId)
-      assert(batches.size == 1)
-      batches.next()
+      assert(batches.hasNext)

Review Comment:
   The `batches` is an iterator, so `batches.size` consumes all the data in the iterator to calculate the size.
   Then `batches.next()` would return empty data? Usually it should throw an exception, though.
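A minimal sketch of the removed pattern, assuming a plain Scala `Iterator` as a stand-in for the Arrow batch iterator (the names `SizeThenNext` and `oneBatch` are hypothetical). It shows that `size` drains the iterator, so the following `next()` throws, which is the usual behavior the comment expects. Whether Spark's own Arrow batch iterator throws or returns empty data is not shown here.

```scala
import scala.util.Try

object SizeThenNext {
  // Stand-in for the Arrow batch iterator: a one-shot Iterator holding a single batch.
  def oneBatch(): Iterator[Array[Byte]] = Iterator(Array[Byte](1, 2, 3))

  // Mirrors the removed lines: `size` walks to the end before `next()` is called.
  def oldPattern(): Try[Array[Byte]] = Try {
    val batches = oneBatch()
    assert(batches.size == 1) // passes, but consumes the only batch
    batches.next()            // nothing left: throws NoSuchElementException
  }
}
```

Calling `SizeThenNext.oldPattern()` yields a `Failure` wrapping `NoSuchElementException` rather than the batch.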





[GitHub] [spark] LuciferYang commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40323:
URL: https://github.com/apache/spark/pull/40323#issuecomment-1459252795

   > @LuciferYang This PR fixes it in the connect planner, so it should also work for the Scala client.
   
   OK, got it




[GitHub] [spark] zhengruifeng commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #40323:
URL: https://github.com/apache/spark/pull/40323#issuecomment-1459251014

   @LuciferYang This PR fixes it in the connect planner, so it should also work for the Scala client.




[GitHub] [spark] grundprinzip commented on a diff in pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #40323:
URL: https://github.com/apache/spark/pull/40323#discussion_r1129021709


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -1508,8 +1508,10 @@ class SparkConnectPlanner(val session: SparkSession) {
         maxRecordsPerBatch,
         maxBatchSize,
         timeZoneId)
-      assert(batches.size == 1)
-      batches.next()
+      assert(batches.hasNext)

Review Comment:
   I see, thank you. My understanding was that non-rewindable iterators simply do not have a size method.
   
   But I learned something new.
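Scala's `Iterator` does define `size`. It comes from `IterableOnceOps` in Scala 2.13 and from `TraversableOnce` in 2.12, and in both versions it counts by traversing the iterator. The sketch below keeps the intent of the removed `assert(batches.size == 1)`, which is "exactly one batch", without draining the data. It uses a hypothetical helper named `expectSingle` that is not part of Spark. The helper pulls one element with `next()` and only peeks with `hasNext` before and after.

```scala
object SingleBatch {
  // Hypothetical helper: returns the only element, failing if there are zero or several.
  // Unlike `assert(it.size == 1)`, it consumes just the element it returns.
  def expectSingle[A](it: Iterator[A]): A = {
    require(it.hasNext, "expected exactly one batch, got none")
    val head = it.next()
    require(!it.hasNext, "expected exactly one batch, got more")
    head
  }
}
```

`require` throws `IllegalArgumentException`. A production check might prefer a dedicated exception type, but the ordering of `hasNext`, `next()` and `hasNext` is the point here.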





[GitHub] [spark] ueshin commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on PR #40323:
URL: https://github.com/apache/spark/pull/40323#issuecomment-1459184767

   > Is there a similar case in the Scala Connect client?
   
   I haven't tried the Scala client, but yes, it would happen there too, and this fixes both.




[GitHub] [spark] LuciferYang commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40323:
URL: https://github.com/apache/spark/pull/40323#issuecomment-1459158853

   Is there a similar case in the Scala Connect client?
   
   




[GitHub] [spark] LuciferYang commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40323:
URL: https://github.com/apache/spark/pull/40323#issuecomment-1459249689

   Is there a chance to add a similar case in `ClientE2ETestSuite`?
   
   




[GitHub] [spark] zhengruifeng closed pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
URL: https://github.com/apache/spark/pull/40323




[GitHub] [spark] grundprinzip commented on a diff in pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #40323:
URL: https://github.com/apache/spark/pull/40323#discussion_r1129008040


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -1508,8 +1508,10 @@ class SparkConnectPlanner(val session: SparkSession) {
         maxRecordsPerBatch,
         maxBatchSize,
         timeZoneId)
-      assert(batches.size == 1)
-      batches.next()
+      assert(batches.hasNext)

Review Comment:
   Sorry for the late reply, but how is this code different from the existing one?
   
   ```scala
   val bytes = batches.next()
   bytes
   ```
   
   Is the same as
   
   ```scala
   batches.next()
   ```
   
   The asserts in between don't count as they don't have side effects. 
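The "no side effects" reasoning holds for the new condition `batches.hasNext`, but it does not hold for the removed `batches.size == 1`. An assert's condition is an ordinary expression that gets evaluated, and `size` pulls every element. The sketch below makes the difference visible. It wraps a plain Scala `Iterator` in a hypothetical `CountingIterator`, which is illustrative only and not a Spark class, and counts the `next()` calls each assert triggers.

```scala
object PullCounter {
  // Hypothetical wrapper that counts how many elements are pulled via next().
  final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A] {
    var pulled = 0
    def hasNext: Boolean = underlying.hasNext // peeks without calling next()
    def next(): A = { pulled += 1; underlying.next() }
  }

  // Returns (elements pulled by the assert, whether any element is still available).
  def afterSizeAssert(n: Int): (Int, Boolean) = {
    val it = new CountingIterator(Iterator.range(0, n))
    assert(it.size == n) // size traverses the whole iterator
    (it.pulled, it.hasNext)
  }

  def afterHasNextAssert(n: Int): (Int, Boolean) = {
    val it = new CountingIterator(Iterator.range(0, n)) // assumes n >= 1
    assert(it.hasNext) // only checks that an element remains
    (it.pulled, it.hasNext)
  }
}
```

With `n = 3`, the `size` assert pulls all three elements and leaves nothing behind. The `hasNext` assert pulls none.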





[GitHub] [spark] zhengruifeng commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #40323:
URL: https://github.com/apache/spark/pull/40323#issuecomment-1459254286

   Thank you all, merged into master/branch-3.4.

