Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/06 23:38:23 UTC

[GitHub] [arrow] westonpace commented on pull request #13028: ARROW-16083: [WIP][C++] Implement AsofJoin execution node

westonpace commented on PR #13028:
URL: https://github.com/apache/arrow/pull/13028#issuecomment-1120077381

   Sure.  "Thread per core" is probably a bit of a misnomer too, but I haven't found a nicer term yet.  The default thread pool size is `std::thread::hardware_concurrency()`, which reports the maximum number of concurrent threads the hardware supports.  So we do not over-allocate threads.
   
   When dealing with I/O you normally want to make sure the system is doing useful work while the I/O is happening.  One possible solution is the synchronous approach, where you create a pool with many more threads than your CPU can run at once.  When a task encounters I/O it simply blocks synchronously, and you let the OS schedule a different thread onto the hardware.
   
   We don't do that today.  Instead we take an asynchronous approach, which we implement with two thread pools.  The I/O thread pool is sized based on how many concurrent I/O requests make sense (e.g. not very many for an HDD, a lot for S3), and its threads are expected to spend most of their time waiting.
   
   The second thread pool (the one that, by default, drives the execution engine) is the CPU thread pool.  This thread pool (again, by default) has a fixed size based on the processor hardware.  It's very important not to block a CPU thread, because doing so usually means you are under-utilizing the hardware.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org