Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/23 08:27:06 UTC

[GitHub] [arrow-datafusion] Dandandan opened a new issue #928: Generate Tokio Runtime in ExecutionContext

Dandandan opened a new issue #928:
URL: https://github.com/apache/arrow-datafusion/issues/928


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Currently, the Tokio runtime is created outside of DataFusion's control: the user has to specify `#[tokio::main]` or set up a runtime some other way.
   
   The downside is that the runtime configuration is out of DataFusion's hands and depends on being set up correctly externally.
   
   **Describe the solution you'd like**
   Create a custom `Runtime` when creating the execution context.
   Here we can also configure the worker thread count and the maximum number of threads for the blocking thread pool, based on a config or on the number of CPU cores.
   
   For some use cases it would also be nice to allow plugging in an existing runtime (e.g. when you want to share the runtime).
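
A minimal sketch of the configuration side of this proposal, using only the standard library. `RuntimeConfig`, its fields, and the "4x cores" default for blocking threads are hypothetical choices for illustration, not DataFusion's API. With Tokio, the two values would be passed to `tokio::runtime::Builder::new_multi_thread()` via its `worker_threads` and `max_blocking_threads` methods.

```rust
use std::thread;

/// Hypothetical runtime settings an execution context could own.
/// The names are illustrative and not part of DataFusion.
#[derive(Debug, Clone, PartialEq)]
pub struct RuntimeConfig {
    pub worker_threads: usize,
    pub max_blocking_threads: usize,
}

impl RuntimeConfig {
    /// Derive defaults from the number of CPU cores reported by the OS,
    /// falling back to 1 if the count cannot be determined.
    pub fn from_cores() -> Self {
        let cores = thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1);
        Self::with_cores(cores)
    }

    /// Derive defaults from an explicit core count.
    /// Assumption: 4 blocking threads per core is an arbitrary example value.
    pub fn with_cores(cores: usize) -> Self {
        let cores = cores.max(1);
        Self {
            worker_threads: cores,
            max_blocking_threads: cores * 4,
        }
    }

    /// Override the worker thread count (at least 1).
    pub fn worker_threads(mut self, n: usize) -> Self {
        self.worker_threads = n.max(1);
        self
    }

    /// Override the blocking thread limit (at least 1).
    pub fn max_blocking_threads(mut self, n: usize) -> Self {
        self.max_blocking_threads = n.max(1);
        self
    }
}

fn main() {
    // Start from core-based defaults, then apply a user override.
    let config = RuntimeConfig::from_cores().worker_threads(4);
    println!("{:?}", config);
}
```

Keeping the settings in a plain struct would also leave room for the "plug in a runtime" case: a caller who supplies their own runtime simply would not use these values.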
   
   **Describe alternatives you've considered**
   
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #928: Generate Tokio Runtime in ExecutionContext

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #928:
URL: https://github.com/apache/arrow-datafusion/issues/928#issuecomment-903553316


   FYI @alamb as this also has impact on IOx





[GitHub] [arrow-datafusion] alamb commented on issue #928: Generate Tokio Runtime in ExecutionContext

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #928:
URL: https://github.com/apache/arrow-datafusion/issues/928#issuecomment-903668640


   If we go with this approach, care must be taken to ensure that any newly created executor does not interfere with one created with `#[tokio::main]`.
   
   Here is an example of such a dedicated tokio executor that we made in IOx: https://github.com/influxdata/influxdb_iox/blob/main/query/src/exec/task.rs#L48-L65
   
   I would be willing to contribute the `DedicatedExecutor` code back upstream to DataFusion if there is interest -- and we could add a configuration setting for thread counts, etc. I think the `DedicatedExecutor` also satisfies the use case of "running DataFusion CPU-bound tasks on a different thread pool than IO-bound tasks".
   





[GitHub] [arrow-datafusion] alamb commented on issue #928: Generate Tokio Runtime in ExecutionContext

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #928:
URL: https://github.com/apache/arrow-datafusion/issues/928#issuecomment-932727267


   In case anyone is interested, the Tokio documentation at https://docs.rs/tokio/1.12.0/tokio/#cpu-bound-tasks-and-blocking-code has been updated to say that using a separate tokio executor is not a bad idea:
   
   > If your code is CPU-bound and you wish to limit the number of threads used to run it, you should use a separate thread pool dedicated to CPU bound tasks. For example, you could consider using the rayon library for CPU-bound tasks. It is also possible to create an extra Tokio runtime dedicated to CPU-bound tasks, but if you do this, you should be careful that the extra runtime runs only CPU-bound tasks, as IO-bound tasks on that runtime will behave poorly.
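
The "separate thread pool dedicated to CPU bound tasks" idea from the quoted documentation can be sketched with the standard library alone. This is not the IOx `DedicatedExecutor` and not a Tokio runtime; `CpuPool` and its methods are hypothetical names, and the sketch only shows the shape of the technique: a fixed set of threads that run nothing but submitted CPU-bound jobs and hand results back over a channel.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool reserved for CPU-bound work.
pub struct CpuPool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl CpuPool {
    /// Start `threads` worker threads (at least 1) that pull jobs from a shared queue.
    pub fn new(threads: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..threads.max(1))
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock is held only while waiting for the next job,
                    // not while the job itself runs.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // sender dropped: shut down
                    }
                })
            })
            .collect();
        Self { sender: Some(sender), workers }
    }

    /// Run `f` on the pool; the returned channel yields its result.
    pub fn spawn<T, F>(&self, f: F) -> Receiver<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller dropped the receiver.
            let _ = tx.send(f());
        });
        self.sender
            .as_ref()
            .expect("pool is running")
            .send(job)
            .expect("worker threads are alive");
        rx
    }
}

impl Drop for CpuPool {
    /// Close the queue and wait for in-flight jobs to finish.
    fn drop(&mut self) {
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}

fn main() {
    let pool = CpuPool::new(2);
    let result = pool.spawn(|| (1..=10u64).sum::<u64>());
    println!("sum = {}", result.recv().unwrap());
}
```

In an async application, the calling side would typically await the result through an async-aware channel rather than blocking on `recv()`; that part is left out here to keep the sketch dependency-free.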
   
   





[GitHub] [arrow-datafusion] alamb edited a comment on issue #928: Generate Tokio Runtime in ExecutionContext

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on issue #928:
URL: https://github.com/apache/arrow-datafusion/issues/928#issuecomment-903668640


   If we go with this approach, care must be taken to ensure that any newly created executor does not interfere with one created with `#[tokio::main]`.
   
   Here is an example of such a dedicated tokio executor that we made in IOx: https://github.com/influxdata/influxdb_iox/blob/main/query/src/exec/task.rs#L48-L65
   
   I would be willing to contribute the `DedicatedExecutor` code back upstream to DataFusion if there is interest -- and we could add a configuration setting for thread counts, etc. I think the `DedicatedExecutor` can also be used to satisfy the use case of "running DataFusion CPU-bound tasks on a different thread pool than IO-bound tasks".
   

