Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/12 18:46:44 UTC

[GitHub] [spark] BryanCutler commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope

BryanCutler commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
URL: https://github.com/apache/spark/pull/24070#issuecomment-472133193
 
 
   I just want to highlight that the error this fixes only kills the serving thread; Spark can continue operating normally. Still, the error is pretty ugly and would lead users to think something went terribly wrong. Since it's pretty common not to fully consume an iterator, e.g. when taking a slice, I believe this change is worth making.
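   To illustrate the partial-consumption case, here is a small plain-Python sketch (not PySpark's actual serving code, which streams results from a socket): a generator stands in for `toLocalIterator`, and a `computed` list records which "partitions" were actually fetched when only a slice is taken.

   ```python
   from itertools import islice

   # Records which partitions were actually computed (hypothetical stand-in
   # for the jobs Spark would run; names here are illustrative only).
   computed = []

   def local_iterator(num_partitions, rows_per_partition):
       """Simulates an iterator that fetches one partition at a time."""
       for p in range(num_partitions):
           computed.append(p)  # partition p is only fetched when reached
           for r in range(rows_per_partition):
               yield (p, r)

   # Take only the first 3 rows -- analogous to islice(rdd.toLocalIterator(), 3)
   first_three = list(islice(local_iterator(32, 2), 3))
   print(first_three)  # [(0, 0), (0, 1), (1, 0)]
   print(computed)     # [0, 1] -- only 2 of 32 partitions were touched
   ```

   The point is that abandoning the iterator after a slice is perfectly normal usage, so it shouldn't surface an ugly error from the serving side.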
   
   This change could also be quite beneficial when the iterator is not fully consumed: it can avoid triggering unneeded jobs, whereas the previous behavior eagerly queued jobs for all partitions. In this sense, the change more closely follows the Scala behavior.
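   The eager-versus-lazy difference described above can be sketched in a few lines of plain Python (again a simulation, not Spark's implementation): the eager version computes every partition up front, while the lazy version only computes a partition when the consumer actually reaches it.

   ```python
   computed_eager = []
   computed_lazy = []

   def eager_iterator(num_partitions):
       # Old behavior (as described): work for all partitions is done up front,
       # even if the caller only ever reads one element.
       rows = []
       for p in range(num_partitions):
           computed_eager.append(p)
           rows.append(p)
       return iter(rows)

   def lazy_iterator(num_partitions):
       # Behavior after the change: a partition's work runs only when reached.
       for p in range(num_partitions):
           computed_lazy.append(p)
           yield p

   next(eager_iterator(8))
   next(lazy_iterator(8))
   print(len(computed_eager), len(computed_lazy))  # 8 1
   ```

   With eager queueing, reading a single element still costs all 8 partitions; with lazy queueing it costs only the first, which is the savings the change can provide.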
   
   I'm also not entirely sure why I'm seeing a speedup for the RDD toLocalIterator; when using 8 partitions instead of 32, I noticed a slowdown. I will run some more tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org