You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "benibus (via GitHub)" <gi...@apache.org> on 2023/02/22 22:43:11 UTC

[GitHub] [arrow] benibus opened a new pull request, #34295: GH-25025: [C++] Build core compute kernels unconditionally

benibus opened a new pull request, #34295:
URL: https://github.com/apache/arrow/pull/34295

   This includes the core compute machinery in libarrow by default - in addition to all cast kernels and several other kernels that are either dependencies of `cast` (`take`) or utilized in libarrow/libparquet (`unique`, `filter`). The remaining kernels won't be built/registered unless `ARROW_COMPUTE=ON` (note that this would slightly change the option's meaning, as currently, nothing in arrow/compute is built unless it's set).
   
   Initially this was more substantial as the original goal was to build the extra kernels as a shared library (suggested in the orginal issue). After some discussion in the issue thread, I opted not to do that - primarily because I can't personally see the utility of a separate lib here, even ignoring the complexity it introduces. However, there may be a good reason that simply hasn't occured to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34295:
URL: https://github.com/apache/arrow/pull/34295#discussion_r1123857157


##########
cpp/src/arrow/compute/CMakeLists.txt:
##########
@@ -26,8 +26,22 @@ arrow_add_pkg_config("arrow-compute")
 # Unit tests
 #
 
+# As of 2023-02-28, the following kernels are always present:

Review Comment:
   ```suggestion
   # The following kernels are always present:
   ```
   I'm not sure the date is useful (and it will be hard to keep correct).  Users can always use git blame if they need it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1441141781

   cc @felipecrv 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1454906046

   Benchmark runs are scheduled for baseline = 4b31aa4f4d76b629e62f2223cd13906f6c911ccb and contender = be7a76340377cf23d5ee272f05c759160cdeb576. be7a76340377cf23d5ee272f05c759160cdeb576 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/876cfe5fa2e84e5b80b563674dc9dca2...4ceaaac76b5f45cf832c83621fadf1be/)
   [Failed :arrow_down:0.4% :arrow_up:0.06%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/7de908e882894ff8bfe36ce5d3e6f853...d03ee89ebf4f4edaa1493728eb6ce8e3/)
   [Finished :arrow_down:0.51% :arrow_up:0.26%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/11cc99f4fcfa44fc88dcc00c791cf80e...776a43a6560646818d92470c762c4630/)
   [Finished :arrow_down:0.25% :arrow_up:0.03%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/3afd98affba24a09b157529c53ad4566...a7e5a41f5d3b4813a56682f82e542739/)
   Buildkite builds:
   [Finished] [`be7a7634` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2474)
   [Finished] [`be7a7634` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2504)
   [Finished] [`be7a7634` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2472)
   [Finished] [`be7a7634` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2495)
   [Finished] [`4b31aa4f` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2473)
   [Failed] [`4b31aa4f` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2503)
   [Finished] [`4b31aa4f` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2471)
   [Finished] [`4b31aa4f` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2494)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1441138674

   One thing I haven't decided is how to deal with the compute unit tests since most of them make heavy use of the extra kernels, so a good chunk of them will fail without them. Easiest option would be force `ARROW_COMPUTE=ON` if `ARROW_BUILD_TESTS=ON` (not the worst idea in the world, i guess - as this is a packaging-focused feature). Alternatively, we could just not build the tests in question - although that would include most of the tests in `compute/exec`.
   
   (Also, I'll look into the unity build failures - not quite sure what's going wrong there...)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1440934890

   * Closes: #25025


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on code in PR #34295:
URL: https://github.com/apache/arrow/pull/34295#discussion_r1117280086


##########
cpp/src/arrow/compute/kernels/CMakeLists.txt:
##########
@@ -18,10 +18,11 @@
 # ----------------------------------------------------------------------
 # Scalar kernels
 
+add_arrow_compute_test(scalar_cast_test SOURCES scalar_cast_test.cc test_util.cc)

Review Comment:
   It was so that the casts could be tested in isolation with/without the complete registry, given that none of the other scalar-type kernels are present in the base set. Also, the cast kernels seemed logically distinct enough to warrant a separate test IMO.
   
   The testing situation isn't exactly ideal at the moment... The following will fail if ARROW_COMPUTE isn't enabled:
   ```
   arrow-compute-scalar-type-test
   arrow-compute-scalar-if-else-test
   arrow-compute-scalar-temporal-test
   arrow-compute-scalar-math-test
   arrow-compute-scalar-utility-test
   arrow-compute-vector-test
   arrow-compute-aggregate-test
   arrow-compute-expression-test
   arrow-compute-plan-test
   arrow-compute-hash-join-node-test
   arrow-compute-asof-join-node-test
   arrow-compute-groupby-test
   ```
   I went into a bit more detail in an [earlier comment](https://github.com/apache/arrow/pull/34295#issuecomment-1441138674).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1444426004

   This failure looks related: https://github.com/apache/arrow/actions/runs/4247620924/jobs/7385865305
   
   Perhaps just a changing of include orders has angered the unity build gremlins in some way.  Given the two implementations are identical maybe we could put them in util_internal.h?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1448814340

   * Closes: #34388


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1442534401

   The goal is to eventually reduce the size of the "core" library right?  Do we have any idea how slim this set of baseline kernels is compared to the full set?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1442485458

   Well to be fair, that might still be the right move as it'd be easy to make that a follow-up PR if we decide to go down that road (assuming we get the library boundaries right in this one).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1443898863

   > Do we have any idea how slim this set of baseline kernels is compared to the full set?
   
   Here's what would be present in the default build:
   ```
   array_filter
   array_take
   cast
   dictionary_encode
   drop_null
   filter
   indices_nonzero
   take
   unique
   value_counts
   ```
   There are currently 240 kernels in the full set, so it's a pretty deep cut.
   
   > Do you want to add a CI job (or does one already exist?) that builds without ARROW_COMPUTE to ensure basic functionality (e.g. parquet reading/writing and csv reading/writing) still works?
   
   That would be a good idea, yes. AFAIK none of the existing jobs build without ARROW_COMPUTE. Even if they did, the CSV writer/STL tests wouldn't be included and libparquet wouldn't be built at all.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace merged pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace merged PR #34295:
URL: https://github.com/apache/arrow/pull/34295


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34295:
URL: https://github.com/apache/arrow/pull/34295#discussion_r1117600493


##########
cpp/src/arrow/compute/kernels/CMakeLists.txt:
##########
@@ -18,10 +18,11 @@
 # ----------------------------------------------------------------------
 # Scalar kernels
 
+add_arrow_compute_test(scalar_cast_test SOURCES scalar_cast_test.cc test_util.cc)

Review Comment:
   Ah, ok.  That makes sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1448814435

   :warning: GitHub issue #34388 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1448956483

   I still need to add the CI job, but in preparation, I set things up so that certain tests won't be built without the complete kernel registry - so we wouldn't need any special ctest flags to avoid  expected failures.
   
   The unity build redefinition errors should be fixed now. Most of the problematic code in scalar_round.cc was actually completely unused, so I just removed it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] felipecrv commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1442262492

   This isn't enough to close the issue, right? Do you want to associate this one with a sub-issue of #25025 so you can merge it before working on the shared library setup?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34295:
URL: https://github.com/apache/arrow/pull/34295#discussion_r1116332440


##########
cpp/src/arrow/compute/kernels/CMakeLists.txt:
##########
@@ -18,10 +18,11 @@
 # ----------------------------------------------------------------------
 # Scalar kernels
 
+add_arrow_compute_test(scalar_cast_test SOURCES scalar_cast_test.cc test_util.cc)

Review Comment:
   Why did you peel this off into its own test?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1440934990

   :warning: GitHub issue #25025 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1454282642

   @felipecrv / @lidavidm I haven't been following the discussion on #25025 very closely.  However, this change seems good.  I assume we want to proceed with it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1454284392

   Yes, either way, I think this is a necessary first step before we can unbundle the kernels + I hope it is easier to review this way


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org