You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "paleolimbot (via GitHub)" <gi...@apache.org> on 2023/06/26 13:40:19 UTC

[GitHub] [arrow] paleolimbot opened a new pull request, #36304: GH-36121: [R] Warn for `set_io_thread_count()` with `num_threads` < 2

paleolimbot opened a new pull request, #36304:
URL: https://github.com/apache/arrow/pull/36304

   ### Rationale for this change
   
   Setting the number of threads in the IO thread pool to 1 causes a hang or crash when using some functions (notably: any Acero exec plan).
   
   ### What changes are included in this PR?
   
   `set_io_thread_count()` now warns for `num_threads == 1`:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   
   # Already errors from C++
   set_io_thread_count(0)
   #> Error: Invalid: ThreadPool capacity must be > 0
   # New warning!
   set_io_thread_count(1)
   #> Warning: `arrow::set_io_thread_count()` with num_threads < 2 may
   #> cause certain operations to hang or crash.
   #> ℹ Use num_threads >= 2 to support all operations
   # No warning!
   set_io_thread_count(2)
   ```
   
   <sup>Created on 2023-06-26 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
   
   ### Are these changes tested?
   
   Yes
   
   ### Are there any user-facing changes?
   
   Yes: some existing code may issue a warning that previously did not. Documentation was added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36304: GH-36121: [R] Warn for `set_io_thread_count()` with `num_threads` < 2

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36304:
URL: https://github.com/apache/arrow/pull/36304#issuecomment-1607498226

   :warning: GitHub issue #36121 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36304: GH-36121: [R] Warn for `set_io_thread_count()` with `num_threads` < 2

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36304:
URL: https://github.com/apache/arrow/pull/36304#issuecomment-1614987439

   Conbench analyzed the 5 benchmark runs on commit `4a93a37a`.
   
   There were 3 benchmark results indicating a performance regression:
   
   - Commit Run on `arm64-m6g-linux-compute` at [2023-06-27 19:19:23Z](http://conbench.ursa.dev/compare/runs/bf5c588c08d24d5eb321d0f668bf6200...792e3c7b4b6e49ea94098c07e6d85c3e/)
     - [params=<Int32Type>/ConvertToSparseCOOTensorInt32, source=cpp-micro, suite=arrow-tensor-conversion-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649b32427667a948000d31339f2a669...0649b36783537dac8000942d52c4eda3)
   
   - Commit Run on `arm64-t4g-linux-compute` at [2023-06-27 19:17:55Z](http://conbench.ursa.dev/compare/runs/f53cd69baef14a3a99b96b0db169ae46...9183bc9e6d4041a0b25a17065da121f2/)
     - [params=1048576/0, source=cpp-micro, suite=arrow-compute-scalar-if-else-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649b335826d7a8980003ca6ad67724b...0649b35f1f1673288000b7c481217640)
   - and 1 more (see the report linked below)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14694431884) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot merged pull request #36304: GH-36121: [R] Warn for `set_io_thread_count()` with `num_threads` < 2

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot merged PR #36304:
URL: https://github.com/apache/arrow/pull/36304


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on pull request #36304: GH-36121: [R] Warn for `set_io_thread_count()` with `num_threads` < 2

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on PR #36304:
URL: https://github.com/apache/arrow/pull/36304#issuecomment-1609418628

   I'd like to keep the scope of this particular PR to the immediate concern, which is that somebody might set the number of IO threads to 1, observe a hang/crash, and not know why! I promise to follow up on this in #36324 because you're right: we don't dedicate much space to documenting these options and they can have a rather large impact on performance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] tdhock commented on pull request #36304: GH-36121: [R] Warn for `set_io_thread_count()` with `num_threads` < 2

Posted by "tdhock (via GitHub)" <gi...@apache.org>.
tdhock commented on PR #36304:
URL: https://github.com/apache/arrow/pull/36304#issuecomment-1608564686

   hi @paleolimbot thanks writing a PR that partially addresses the issue I created. 
   To fully address that issue, can you please add documentation to (1) clarify the difference between IO threads and CPU threads, and (2) to explain which function should be called to control the CSV reading operation?
   For (1) a link to the C++ doc web page https://arrow.apache.org/docs/cpp/threading.html would be very helpful.
   Could a link to that page be added on the R man pages for arrow::cpu_count and arrow::io_thread_count?
   For (2) I would have expected some mention of how to control number of threads used for CSV reading on the man page for read_csv_arrow, but there is no mention of threads on that man page. Something like "use arrow::set_cpu_count(N_CPUS) to tell arrow to use N_CPUS for reading the CSV file" on that man page would be useful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org