You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "kou (via GitHub)" <gi...@apache.org> on 2023/05/24 00:39:58 UTC

[GitHub] [arrow] kou commented on a diff in pull request #35672: GH-35176: [C++] Add support for disabling threading for emscripten (from sub-branch)

kou commented on code in PR #35672:
URL: https://github.com/apache/arrow/pull/35672#discussion_r1203200360


##########
ci/scripts/cpp_build.sh:
##########
@@ -129,6 +129,7 @@ cmake \
   -DARROW_WITH_UTF8PROC=${ARROW_WITH_UTF8PROC:-ON} \
   -DARROW_WITH_ZLIB=${ARROW_WITH_ZLIB:-OFF} \
   -DARROW_WITH_ZSTD=${ARROW_WITH_ZSTD:-OFF} \
+  -DARROW_ENABLE_THREADING=${ARROW_ENABLE_THREADING:-ON} \

Review Comment:
   Could you keep this list in alphabetical order?



##########
cpp/cmake_modules/ThirdpartyToolchain.cmake:
##########
@@ -5190,4 +5190,4 @@ message(STATUS "All bundled static libraries: ${ARROW_BUNDLED_STATIC_LIBS}")
 
 configure_file("src/arrow/util/config.h.cmake" "src/arrow/util/config.h" ESCAPE_QUOTES)
 install(FILES "${ARROW_BINARY_DIR}/src/arrow/util/config.h"
-        DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/arrow/util")
+        DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/arrow/util")

Review Comment:
   Could you revert this needless change?



##########
ci/scripts/cpp_test.sh:
##########
@@ -80,6 +80,12 @@ pushd ${build_dir}
 if [ -z "${PYTHON}" ] && ! which python > /dev/null 2>&1; then
   export PYTHON="${PYTHON:-python3}"
 fi
+if [[ "${ARROW_ENABLE_THREADING}" == "OFF" ]]; then
+# if threading is disabled, some tests take longer to run
+# but we can get away with running more tests in parallel because we know they wont make loads of threads
+  ARROW_CTEST_TIMEOUT=${ARROW_CTEST_TIMEOUT:-900}
+  n_jobs=$((n_jobs*4))

Review Comment:
   I don't think that this is a good idea...
   If we use more test processes than the number of cores, each test process competes CPU resource with others...



##########
ci/scripts/cpp_test.sh:
##########
@@ -80,6 +80,12 @@ pushd ${build_dir}
 if [ -z "${PYTHON}" ] && ! which python > /dev/null 2>&1; then
   export PYTHON="${PYTHON:-python3}"
 fi
+if [[ "${ARROW_ENABLE_THREADING}" == "OFF" ]]; then
+# if threading is disabled, some tests take longer to run
+# but we can get away with running more tests in parallel because we know they wont make loads of threads

Review Comment:
   ```suggestion
     # if threading is disabled, some tests take longer to run
     # but we can get away with running more tests in parallel because we know they won't make loads of threads
   ```



##########
cpp/src/arrow/acero/asof_join_node.cc:
##########
@@ -1514,14 +1514,19 @@ class AsofJoinNode : public ExecNode {
   }
 
   Status StartProducing() override {
-    ARROW_ASSIGN_OR_RAISE(process_task_, plan_->query_context()->BeginExternalTask(
-                                             "AsofJoinNode::ProcessThread"));
-    if (!process_task_.is_valid()) {
-      // Plan has already aborted.  Do not start process thread
+    #ifndef ARROW_ENABLE_THREADING
+      return Status::NotImplemented("ASOF join requires threading enabled");
+    #else

Review Comment:
   Can we use the following approach?
   
   ```cpp
   Status StartProducing() override {
   #ifndef ARROW_ENABLE_THREADING
     return Status::NotImplemented("ASOF join requires threading enabled");
   #endif
     ARROW_ASSIGN_OR_RAISE(process_task_, plan_->query_context()->BeginExternalTask(
     // use existing code as-is.
   ```
   
   Putting existing code to `#else ... #endif` reduces readability.



##########
cpp/src/arrow/acero/bloom_filter_test.cc:
##########
@@ -507,7 +507,11 @@ TEST(BloomFilter, Scaling) {
   num_build.push_back(4000000);
 
   std::vector<BloomFilterBuildStrategy> strategy;
+#ifndef ARROW_ENABLE_THREADING
+  strategy.push_back(BloomFilterBuildStrategy::SINGLE_THREADED);
+#else
   strategy.push_back(BloomFilterBuildStrategy::PARALLEL);
+#endif

Review Comment:
   Could you use `#ifdef ... #else ... #endif` not `#ifndef ... #else ... #endif` as much as possible? A `#ifndef` (invert) and `#else` (invert) pair reduces readability.
   
   ```cpp
   #ifdef ARROW_ENABLE_THREADING
     strategy.push_back(BloomFilterBuildStrategy::PARALLEL);
   #else
     strategy.push_back(BloomFilterBuildStrategy::SINGLE_THREADED);
   #endif
   ```



##########
ci/scripts/cpp_test.sh:
##########
@@ -80,6 +80,12 @@ pushd ${build_dir}
 if [ -z "${PYTHON}" ] && ! which python > /dev/null 2>&1; then
   export PYTHON="${PYTHON:-python3}"
 fi
+if [[ "${ARROW_ENABLE_THREADING}" == "OFF" ]]; then
+# if threading is disabled, some tests take longer to run
+# but we can get away with running more tests in parallel because we know they wont make loads of threads
+  ARROW_CTEST_TIMEOUT=${ARROW_CTEST_TIMEOUT:-900}

Review Comment:
   It seems that max test time is "239.99 sec" in our CI:
   
   https://github.com/ursacomputing/crossbow/actions/runs/5018372445/jobs/8997694994#step:6:3234
   
   > ```text
   > 89/103 Test  #21: arrow-compute-scalar-if-else-test .........   Passed  239.99 sec
   > ```
   
   Why is this needed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org