Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/13 18:37:03 UTC

[GitHub] [spark] mridulm commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

mridulm commented on PR #38064:
URL: https://github.com/apache/spark/pull/38064#issuecomment-1278026577

   @liuzqt Most task results are very small.
   By moving to `ChunkedByteBufferOutputStream`, we will now over-provision them by a few orders of magnitude, while only a vanishingly small set of cases actually hit the large-buffer path.
   This can affect memory utilization at the executor, so if possible we should look at ways to mitigate it, particularly when we have a good estimate of the output size.
   
   This is not to say I have serious concerns (we do use `ChunkedByteBufferOutputStream` with precisely that size everywhere else!), but it is not without tradeoffs.
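The tradeoff in the comment above can be illustrated with a simplified sketch. The class `SizeHintedChunkedBuffer`, its `estimated_size` parameter, and the 4 MiB chunk size are hypothetical and are not Spark's `ChunkedByteBufferOutputStream` API. The sketch assumes only that a chunked stream allocates each chunk at full capacity as soon as it is needed. It shows how a tiny result reserves a whole chunk, and how a size estimate, when one is available, could cap that first allocation.

```python
class SizeHintedChunkedBuffer:
    """Hypothetical, simplified stand-in for a chunked output stream.

    Bytes are appended into a list of fixed-capacity chunks, so no single
    buffer has to hold the whole result. An optional size estimate caps
    the first chunk so that small outputs do not reserve a full chunk.
    """

    def __init__(self, chunk_size, estimated_size=None):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size
        # Without an estimate the first chunk is full-size (the over-provisioning
        # case); with a positive estimate it is capped at the estimate.
        if estimated_size is not None and estimated_size > 0:
            self.first_chunk_size = min(chunk_size, estimated_size)
        else:
            self.first_chunk_size = chunk_size
        self._chunks = []  # preallocated bytearrays
        self._pos = 0      # bytes used in the last chunk
        self.written = 0   # total bytes written

    def write(self, data):
        view = memoryview(data)
        while len(view) > 0:
            # Allocate a new chunk (at full capacity) when none exists or the last is full.
            if not self._chunks or self._pos == len(self._chunks[-1]):
                size = self.first_chunk_size if not self._chunks else self.chunk_size
                self._chunks.append(bytearray(size))
                self._pos = 0
            chunk = self._chunks[-1]
            n = min(len(view), len(chunk) - self._pos)
            chunk[self._pos:self._pos + n] = view[:n]
            self._pos += n
            self.written += n
            view = view[n:]

    @property
    def allocated(self):
        # Total capacity reserved, regardless of how much was actually written.
        return sum(len(c) for c in self._chunks)

    def to_bytes(self):
        if not self._chunks:
            return b""
        head = b"".join(bytes(c) for c in self._chunks[:-1])
        return head + bytes(self._chunks[-1][:self._pos])


CHUNK = 4 * 1024 * 1024  # hypothetical 4 MiB chunk size
result = b"x" * 100      # a typical tiny task result

plain = SizeHintedChunkedBuffer(CHUNK)
plain.write(result)
print(plain.allocated)   # 4194304: a full chunk reserved for 100 bytes

hinted = SizeHintedChunkedBuffer(CHUNK, estimated_size=len(result))
hinted.write(result)
print(hinted.allocated)  # 100
```

An underestimate is not fatal in this sketch: after the first (smaller) chunk, the buffer falls back to full-size chunks, so the cost of a wrong estimate is one extra allocation rather than a correctness problem.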
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org