You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "juliuszsompolski (via GitHub)" <gi...@apache.org> on 2023/08/04 00:13:43 UTC

[GitHub] [spark] juliuszsompolski opened a new pull request, #42331: [SPARK-44656][CONNECT] Make all iterators CloseableIterators

juliuszsompolski opened a new pull request, #42331:
URL: https://github.com/apache/spark/pull/42331

   ### What changes were proposed in this pull request?
   
   This makes sure that all iterators used in Spark Connect scala client are `CloseableIterator`.
   
   1. Makes `CustomSparkConnectBlockingStub.executePlan` return `CloseableIterator` and make all wrappers respect that.
   2. Makes `ExecutePlanResponseReattachableIterator` a `CloseableIterator`, with an implementation that will inform the server that query result can be released with ReleaseExecute.
   3. Makes `SparkResult.iterator` explicitly a `CloseableIterator`, and also register the `SparkResult.responses` iterator as with the `SparkResultCloseable` cleaner, which will make it close upon GC, if not closed explicitly sooner.
   4. Because `Dataset.toLocalIterator` requires a Java iterator, implement a conversion to `java.util.Iterator with AutoCloseable` to be returned there
   5. Using `CloseableIterator` consistently everywhere else removes the need to convert between iterator types.
   
   ### Why are the changes needed?
   
   Properly closeable iterators are needed for resource management, and with reattachable execution to inform server that processing finished.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Exercise current E2E tests.
   
   Co-authored-by: Alice Sayutina <al...@databricks.com>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42331: [SPARK-44656][CONNECT] Make all iterators CloseableIterators

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42331:
URL: https://github.com/apache/spark/pull/42331#issuecomment-1665727848

   > I am just wondering if at this point using the GRPC iterators is the easiest? Would it be easier to use a stream observer instead?
   
   That would imply changing the whole execution model in the client from pull based to push based...
   While the only place where we expose a pull based iterator in an external facing API is Dataset.toLocalIterator, all the internal implementation if pull based currently, so practically the whole client would need to be refactored...
   Maybe push based would make it easier to ensure that our internal consumer pushes everything out... but that would also introduce complications like e.g. doing some flow control of the push, which ultimately could again lead to blocking and not consuming... I see this also as quite complicated.
   In every place except Dataset.toLocalIterator we control the full lifecycle of the pull based iterator. With the CloseableIterator we can now ensure that this gets closed...
   There is however indeed a possible followup. We handle consuming the iterator, and we handle closing the ExecutePlanResponseReattachableIterator after errors from server. But we should also audit if we could do better closing if other errors, including interrupt are thrown. I raised  https://issues.apache.org/jira/browse/SPARK-44676 for that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #42331: [SPARK-44656][CONNECT] Make all iterators CloseableIterators

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #42331:
URL: https://github.com/apache/spark/pull/42331#issuecomment-1664950906

   A bit of a monkey wrench. I am fine with the current approach. I am just wondering if at this point using the GRPC iterators is the easiest? Would it be easier to use a stream observer instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell closed pull request #42331: [SPARK-44656][CONNECT] Make all iterators CloseableIterators

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #42331: [SPARK-44656][CONNECT] Make all iterators CloseableIterators
URL: https://github.com/apache/spark/pull/42331


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #42331: [SPARK-44656][CONNECT] Make all iterators CloseableIterators

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #42331:
URL: https://github.com/apache/spark/pull/42331#issuecomment-1665667540

   Merging


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org