Posted to reviews@spark.apache.org by "juliuszsompolski (via GitHub)" <gi...@apache.org> on 2023/08/04 14:42:57 UTC

[GitHub] [spark] juliuszsompolski commented on pull request #42331: [SPARK-44656][CONNECT] Make all iterators CloseableIterators

juliuszsompolski commented on PR #42331:
URL: https://github.com/apache/spark/pull/42331#issuecomment-1665727848

   > I am just wondering if at this point using the GRPC iterators is the easiest? Would it be easier to use a stream observer instead?
   
   That would imply changing the whole execution model in the client from pull-based to push-based...
   While the only place where we expose a pull-based iterator in an external-facing API is Dataset.toLocalIterator, all the internal implementation is currently pull-based, so practically the whole client would need to be refactored...
   Maybe push-based would make it easier to ensure that our internal consumer pushes everything out... but it would also introduce complications, e.g. flow control of the push, which could ultimately lead again to blocking and not consuming... I see this as quite complicated as well.
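   To make the difference concrete, here is a rough sketch of what the two models look like for a server-streaming RPC. This is just an illustration: `Resp`, `handle`, `consumePull` and `pushObserver` are made-up names, not the actual Spark Connect client API.
   ```scala
   import io.grpc.stub.StreamObserver

   // Illustration only: Resp stands in for ExecutePlanResponse; none of these
   // names come from the actual Spark Connect client.
   object PullVsPush {
     type Resp = String

     def handle(r: Resp): Unit = println(r)

     // Pull-based: the consumer drives the loop, so it also decides when the
     // stream is abandoned and when resources can be released.
     def consumePull(responses: Iterator[Resp]): Unit =
       while (responses.hasNext) handle(responses.next())

     // Push-based: gRPC drives the callbacks; the client would need its own
     // flow control / buffering so a slow consumer doesn't just move the
     // "blocked and not consuming" problem somewhere else.
     def pushObserver(): StreamObserver[Resp] = new StreamObserver[Resp] {
       override def onNext(value: Resp): Unit = handle(value)
       override def onError(t: Throwable): Unit = println(s"stream failed: $t")
       override def onCompleted(): Unit = println("stream done")
     }
   }
   ```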
   In every place except Dataset.toLocalIterator we control the full lifecycle of the pull-based iterator. With CloseableIterator we can now ensure that it gets closed...
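   The shape of the idea is roughly the following. This is a minimal sketch, not the code in this PR; `WrappedCloseableIterator` and `onClose` here are hypothetical names used only for illustration.
   ```scala
   // Minimal sketch of the idea, not the actual implementation in this PR.
   trait CloseableIterator[E] extends Iterator[E] with AutoCloseable

   // Hypothetical wrapper: closes the underlying resources as soon as the
   // iterator is exhausted or the underlying iterator throws.
   class WrappedCloseableIterator[E](underlying: Iterator[E], onClose: () => Unit)
       extends CloseableIterator[E] {
     private var closed = false

     override def hasNext: Boolean = {
       val more =
         try underlying.hasNext
         catch { case t: Throwable => close(); throw t }
       if (!more) close() // release resources once the stream is drained
       more
     }

     override def next(): E =
       try underlying.next()
       catch { case t: Throwable => close(); throw t }

     override def close(): Unit = if (!closed) {
       closed = true
       onClose()
     }
   }
   ```
   With something like that in place, consumers that fully drain the iterator need no special handling, and the ones that may not (like Dataset.toLocalIterator) can call close() explicitly.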
   There is indeed a possible followup, however. We handle consuming the iterator, and we handle closing the ExecutePlanResponseReattachableIterator after errors from the server. But we should also audit whether we could close better when other errors, including interrupts, are thrown. I raised https://issues.apache.org/jira/browse/SPARK-44676 for that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

