You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "benibus (via GitHub)" <gi...@apache.org> on 2023/02/22 22:43:11 UTC
[GitHub] [arrow] benibus opened a new pull request, #34295: GH-25025: [C++] Build core compute kernels unconditionally
benibus opened a new pull request, #34295:
URL: https://github.com/apache/arrow/pull/34295
This includes the core compute machinery in libarrow by default - in addition to all cast kernels and several other kernels that are either dependencies of `cast` (`take`) or utilized in libarrow/libparquet (`unique`, `filter`). The remaining kernels won't be built/registered unless `ARROW_COMPUTE=ON` (note that this would slightly change the option's meaning, as currently, nothing in arrow/compute is built unless it's set).
Initially this was more substantial as the original goal was to build the extra kernels as a shared library (suggested in the orginal issue). After some discussion in the issue thread, I opted not to do that - primarily because I can't personally see the utility of a separate lib here, even ignoring the complexity it introduces. However, there may be a good reason that simply hasn't occured to me.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on a diff in pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34295:
URL: https://github.com/apache/arrow/pull/34295#discussion_r1123857157
##########
cpp/src/arrow/compute/CMakeLists.txt:
##########
@@ -26,8 +26,22 @@ arrow_add_pkg_config("arrow-compute")
# Unit tests
#
+# As of 2023-02-28, the following kernels are always present:
Review Comment:
```suggestion
# The following kernels are always present:
```
I'm not sure the date is useful (and it will be hard to keep correct). Users can always use git blame if they need it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] benibus commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1441141781
cc @felipecrv
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally
Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1454906046
Benchmark runs are scheduled for baseline = 4b31aa4f4d76b629e62f2223cd13906f6c911ccb and contender = be7a76340377cf23d5ee272f05c759160cdeb576. be7a76340377cf23d5ee272f05c759160cdeb576 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/876cfe5fa2e84e5b80b563674dc9dca2...4ceaaac76b5f45cf832c83621fadf1be/)
[Failed :arrow_down:0.4% :arrow_up:0.06%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/7de908e882894ff8bfe36ce5d3e6f853...d03ee89ebf4f4edaa1493728eb6ce8e3/)
[Finished :arrow_down:0.51% :arrow_up:0.26%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/11cc99f4fcfa44fc88dcc00c791cf80e...776a43a6560646818d92470c762c4630/)
[Finished :arrow_down:0.25% :arrow_up:0.03%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/3afd98affba24a09b157529c53ad4566...a7e5a41f5d3b4813a56682f82e542739/)
Buildkite builds:
[Finished] [`be7a7634` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2474)
[Finished] [`be7a7634` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2504)
[Finished] [`be7a7634` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2472)
[Finished] [`be7a7634` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2495)
[Finished] [`4b31aa4f` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2473)
[Failed] [`4b31aa4f` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2503)
[Finished] [`4b31aa4f` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2471)
[Finished] [`4b31aa4f` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2494)
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] benibus commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1441138674
One thing I haven't decided is how to deal with the compute unit tests since most of them make heavy use of the extra kernels, so a good chunk of them will fail without them. Easiest option would be force `ARROW_COMPUTE=ON` if `ARROW_BUILD_TESTS=ON` (not the worst idea in the world, i guess - as this is a packaging-focused feature). Alternatively, we could just not build the tests in question - although that would include most of the tests in `compute/exec`.
(Also, I'll look into the unity build failures - not quite sure what's going wrong there...)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1440934890
* Closes: #25025
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] benibus commented on a diff in pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on code in PR #34295:
URL: https://github.com/apache/arrow/pull/34295#discussion_r1117280086
##########
cpp/src/arrow/compute/kernels/CMakeLists.txt:
##########
@@ -18,10 +18,11 @@
# ----------------------------------------------------------------------
# Scalar kernels
+add_arrow_compute_test(scalar_cast_test SOURCES scalar_cast_test.cc test_util.cc)
Review Comment:
It was so that the casts could be tested in isolation with/without the complete registry, given that none of the other scalar-type kernels are present in the base set. Also, the cast kernels seemed logically distinct enough to warrant a separate test IMO.
The testing situation isn't exactly ideal at the moment... The following will fail if ARROW_COMPUTE isn't enabled:
```
arrow-compute-scalar-type-test
arrow-compute-scalar-if-else-test
arrow-compute-scalar-temporal-test
arrow-compute-scalar-math-test
arrow-compute-scalar-utility-test
arrow-compute-vector-test
arrow-compute-aggregate-test
arrow-compute-expression-test
arrow-compute-plan-test
arrow-compute-hash-join-node-test
arrow-compute-asof-join-node-test
arrow-compute-groupby-test
```
I went into a bit more detail in an [earlier comment](https://github.com/apache/arrow/pull/34295#issuecomment-1441138674).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1444426004
This failure looks related: https://github.com/apache/arrow/actions/runs/4247620924/jobs/7385865305
Perhaps just a changing of include orders has angered the unity build gremlins in some way. Given the two implementations are identical maybe we could put them in util_internal.h?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1448814340
* Closes: #34388
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1442534401
The goal is to eventually reduce the size of the "core" library right? Do we have any idea how slim this set of baseline kernels is compared to the full set?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] benibus commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1442485458
Well to be fair, that might still be the right move as it'd be easy to make that a follow-up PR if we decide to go down that road (assuming we get the library boundaries right in this one).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] benibus commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1443898863
> Do we have any idea how slim this set of baseline kernels is compared to the full set?
Here's what would be present in the default build:
```
array_filter
array_take
cast
dictionary_encode
drop_null
filter
indices_nonzero
take
unique
value_counts
```
There are currently 240 kernels in the full set, so it's a pretty deep cut.
> Do you want to add a CI job (or does one already exist?) that builds without ARROW_COMPUTE to ensure basic functionality (e.g. parquet reading/writing and csv reading/writing) still works?
That would be a good idea, yes. AFAIK none of the existing jobs build without ARROW_COMPUTE. Even if they did, the CSV writer/STL tests wouldn't be included and libparquet wouldn't be built at all.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace merged pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace merged PR #34295:
URL: https://github.com/apache/arrow/pull/34295
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on a diff in pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34295:
URL: https://github.com/apache/arrow/pull/34295#discussion_r1117600493
##########
cpp/src/arrow/compute/kernels/CMakeLists.txt:
##########
@@ -18,10 +18,11 @@
# ----------------------------------------------------------------------
# Scalar kernels
+add_arrow_compute_test(scalar_cast_test SOURCES scalar_cast_test.cc test_util.cc)
Review Comment:
Ah, ok. That makes sense.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1448814435
:warning: GitHub issue #34388 **has been automatically assigned in GitHub** to PR creator.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] benibus commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally
Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1448956483
I still need to add the CI job, but in preparation, I set things up so that certain tests won't be built without the complete kernel registry - so we wouldn't need any special ctest flags to avoid expected failures.
The unity build redefinition errors should be fixed now. Most of the problematic code in scalar_round.cc was actually completely unused, so I just removed it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] felipecrv commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1442262492
This isn't enough to close the issue, right? Do you want to associate this one with a sub-issue of #25025 so you can merge it before working on the shared library setup?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on a diff in pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34295:
URL: https://github.com/apache/arrow/pull/34295#discussion_r1116332440
##########
cpp/src/arrow/compute/kernels/CMakeLists.txt:
##########
@@ -18,10 +18,11 @@
# ----------------------------------------------------------------------
# Scalar kernels
+add_arrow_compute_test(scalar_cast_test SOURCES scalar_cast_test.cc test_util.cc)
Review Comment:
Why did you peel this off into its own test?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #34295: GH-25025: [C++] Build core compute kernels unconditionally
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1440934990
:warning: GitHub issue #25025 **has been automatically assigned in GitHub** to PR creator.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1454282642
@felipecrv / @lidavidm I haven't been following the discussion on #25025 very closely. However, this change seems good. I assume we want to proceed with it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #34295: GH-34388: [C++] Build core compute kernels unconditionally
Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #34295:
URL: https://github.com/apache/arrow/pull/34295#issuecomment-1454284392
Yes, either way, I think this is a necessary first step before we can unbundle the kernels + I hope it is easier to review this way
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org