Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/22 07:32:58 UTC

[GitHub] [arrow-datafusion] yjshen commented on issue #1637: DiskManager Performs Blocking IO

yjshen commented on issue #1637:
URL: https://github.com/apache/arrow-datafusion/issues/1637#issuecomment-1106108822

   Could this be closed, based on the discussion in the #2226 Google Doc?
   
   > @tustvold I'm suggesting just doing the sync disk IO in the same dedicated threadpool used for all query computation. In my benchmarks, I've not seen a compelling advantage to offloading the IO elsewhere, as the loss of thread-locality hurts performance, the complexity cost is high, and frankly most realistic workloads are not close to being IO bound.
   > In the case of spilling to disk, one could make a convincing case that you're actually memory bound and doing something else concurrently would be actively detrimental...
   
   > @houqp I see, that's certainly an option. I don't have a good intuition on whether this will be a more optimal setup or not without some hands-on benchmarks. My understanding is that to make this work well, we will need to create slightly more threads in the CPU thread pool to reduce core idleness caused by IO, at the cost of slightly more context switches resulting from preemptive scheduling.
   > I also don't like the complexity of async IO, so it's a good idea to start with something simple, then benchmark and iterate from there.
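
For anyone skimming the thread, here is a rough sketch of the "simple" option described in the quotes above: a fixed pool of worker threads where each worker does its compute step and then performs the blocking spill write itself, with the pool sized a little above the core count. This is not DataFusion's DiskManager or its scheduler. `pool_size`, `run_tasks`, the 1 KiB batch, and the number of extra threads are all made up for illustration, and only the Rust standard library is used.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical sizing rule: one thread per core, plus a few extra threads
/// to cover the time workers spend blocked in disk IO.
pub fn pool_size(extra_for_io: usize) -> usize {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    cores + extra_for_io
}

/// Runs `num_tasks` compute-then-spill tasks on `num_workers` threads.
/// Each spill file is written synchronously by the worker that produced it.
/// Returns the total number of bytes spilled.
pub fn run_tasks(num_workers: usize, num_tasks: usize, spill_dir: &Path) -> io::Result<u64> {
    fs::create_dir_all(spill_dir)?;
    let (task_tx, task_rx) = mpsc::channel::<usize>();
    let task_rx = Arc::new(Mutex::new(task_rx));
    let (done_tx, done_rx) = mpsc::channel::<io::Result<u64>>();

    let mut handles = Vec::new();
    for _ in 0..num_workers {
        let task_rx = Arc::clone(&task_rx);
        let done_tx = done_tx.clone();
        let dir: PathBuf = spill_dir.to_path_buf();
        handles.push(thread::spawn(move || loop {
            // Take the next task id; the lock is released before work starts.
            let next = task_rx.lock().unwrap().recv();
            let id = match next {
                Ok(id) => id,
                Err(_) => break, // task channel closed: no more work
            };
            // Compute step: a stand-in for sorting or aggregating a batch.
            let batch: Vec<u8> = (0..1024).map(|i| ((i + id) % 251) as u8).collect();
            // Blocking spill on the same thread, with no separate IO pool.
            let result = File::create(dir.join(format!("spill-{}.bin", id)))
                .and_then(|mut f| f.write_all(&batch).map(|_| batch.len() as u64));
            let _ = done_tx.send(result);
        }));
    }
    drop(done_tx); // only the workers hold result senders now

    for id in 0..num_tasks {
        task_tx.send(id).unwrap();
    }
    drop(task_tx); // lets workers exit once the queue is drained

    let mut total = 0;
    for r in done_rx {
        total += r?;
    }
    for h in handles {
        h.join().unwrap();
    }
    Ok(total)
}
```

Calling `run_tasks(pool_size(2), 8, &dir)` would run eight tasks on a pool with two threads more than the core count. Whether that oversubscription actually pays for the extra context switches is the open question in the quotes, so the extra-thread count is something to benchmark rather than a recommendation.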

