You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/31 07:12:08 UTC
[GitHub] [arrow] jianxind opened a new pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
jianxind opened a new pull request #7871:
URL: https://github.com/apache/arrow/pull/7871
1. Use BitBlockCounter to speedup the performance for typical 0.01% null-able data.
2. Enable AVX compiler auto vectorize version for no-nulls on int types. Float/Double use fmin/fmax to handle NaN which can't be vectorize by compiler.
3. Also add test case to cover different null probability.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r480175509
##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
AssertDatumsEqual(explicit_defaults, no_options_provided);
}
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::max();
+ T max = std::numeric_limits<T>::min();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::infinity();
+ T max = -std::numeric_limits<T>::infinity();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+ using Traits = TypeTraits<ArrowType>;
+ using ScalarType = typename Traits::ScalarType;
+
+ ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+ const StructScalar& value = out.scalar_as<StructScalar>();
+
+ auto expected = NaiveMinMax<ArrowType>(array);
+ const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+ ASSERT_EQ(expected.first, out_min.value);
Review comment:
In this case, I would expect the test to look like this:
```c++
ASSERT_FALSE(out_min.is_valid);
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685615876
> ARROW_USER_SIMD_LEVEL=none
Below is the cmd I used, and compiler vectorise happens only on Int types.
```
ARROW_USER_SIMD_LEVEL=avx2 ./release/arrow-compute-aggregate-benchmark --benchmark_filter=MinMaxKernelInt64
ARROW_USER_SIMD_LEVEL=none ./release/arrow-compute-aggregate-benchmark --benchmark_filter=MinMaxKernelInt64
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685656632
Rebased.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-672689445
Ping. @wesm @pitrou
Could you help to review this? Similar approach to sum kernel, use compiler to vectorise the NoNulls part, use BitBlockCounter on the 0.01% data. https://github.com/apache/arrow/pull/7870 add the benchmark item for MinMax kernel.
Thanks.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou edited a comment on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685637355
Ah, I also had `-DARROW_SIMD_LEVEL=AVX2` in CMake. Without it I do see a difference.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou closed pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #7871:
URL: https://github.com/apache/arrow/pull/7871
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r481101916
##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
AssertDatumsEqual(explicit_defaults, no_options_provided);
}
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::max();
+ T max = std::numeric_limits<T>::min();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::infinity();
+ T max = -std::numeric_limits<T>::infinity();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+ using Traits = TypeTraits<ArrowType>;
+ using ScalarType = typename Traits::ScalarType;
+
+ ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+ const StructScalar& value = out.scalar_as<StructScalar>();
+
+ auto expected = NaiveMinMax<ArrowType>(array);
+ const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+ ASSERT_EQ(expected.first, out_min.value);
Review comment:
I don't think that's sufficient. You should probably add a `bool is_valid` member to `MinMaxResult`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-673192812
> @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark --benchmark-filter=MinMax
Below is the results for null_percent 0.01% and 0% on https://ci.ursalabs.org/#/builders/73/builds/101
```
benchmark baseline contender change % counters
3 MinMaxKernelInt8/1048576/10000 812.254 MiB/sec 7.952 GiB/sec 902.442 {'run_name': 'MinMaxKernelInt8/1048576/10000',...
31 MinMaxKernelInt16/1048576/10000 1.583 GiB/sec 12.895 GiB/sec 714.512 {'run_name': 'MinMaxKernelInt16/1048576/10000'...
16 MinMaxKernelInt32/1048576/10000 3.152 GiB/sec 16.605 GiB/sec 426.876 {'run_name': 'MinMaxKernelInt32/1048576/10000'...
2 MinMaxKernelInt64/1048576/0 5.289 GiB/sec 11.092 GiB/sec 109.708 {'run_name': 'MinMaxKernelInt64/1048576/0', 'r...
14 MinMaxKernelInt64/1048576/10000 6.222 GiB/sec 10.055 GiB/sec 61.610 {'run_name': 'MinMaxKernelInt64/1048576/10000'...
1 MinMaxKernelInt32/1048576/0 18.103 GiB/sec 26.301 GiB/sec 45.282 {'run_name': 'MinMaxKernelInt32/1048576/0', 'r...
15 MinMaxKernelInt16/1048576/0 18.086 GiB/sec 26.274 GiB/sec 45.269 {'run_name': 'MinMaxKernelInt16/1048576/0', 'r...
7 MinMaxKernelInt8/1048576/0 18.112 GiB/sec 26.210 GiB/sec 44.708 {'run_name': 'MinMaxKernelInt8/1048576/0', 'ru...
26 MinMaxKernelDouble/1048576/10000 1.063 GiB/sec 1.315 GiB/sec 23.759 {'run_name': 'MinMaxKernelDouble/1048576/10000...
23 MinMaxKernelFloat/1048576/10000 551.756 MiB/sec 674.455 MiB/sec 22.238 {'run_name': 'MinMaxKernelFloat/1048576/10000'...
0 MinMaxKernelDouble/1048576/0 1.205 GiB/sec 1.332 GiB/sec 10.600 {'run_name': 'MinMaxKernelDouble/1048576/0', '...
12 MinMaxKernelFloat/1048576/0 621.824 MiB/sec 607.146 MiB/sec -2.361 {'run_name': 'MinMaxKernelFloat/1048576/0', 'r...
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685587641
Passing `ARROW_USER_SIMD_LEVEL=none` doesn't seem to impact the results. Is something amiss?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind removed a comment on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind removed a comment on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-669047527
Need a rebase after https://github.com/apache/arrow/pull/7903
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-666974009
https://issues.apache.org/jira/browse/ARROW-9605
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-666978438
I can trigger a benchmark action once https://github.com/apache/arrow/pull/7870 get merged.
Below is the BM data for int types on my setup:
```
Before:
MinMaxKernelInt8/1048576/10000 847 us 845 us 828 bytes_per_second=1.15586G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt8/1048576/0 43.9 us 43.8 us 15738 bytes_per_second=22.294G/s null_percent=0 size=1048.58k
MinMaxKernelInt16/1048576/10000 429 us 428 us 1637 bytes_per_second=2.28348G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt16/1048576/0 42.4 us 42.4 us 15878 bytes_per_second=23.0572G/s null_percent=0 size=1048.58k
MinMaxKernelInt32/1048576/10000 295 us 294 us 2383 bytes_per_second=3.31751G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt32/1048576/0 42.1 us 42.0 us 16620 bytes_per_second=23.2245G/s null_percent=0 size=1048.58k
MinMaxKernelInt64/1048576/10000 112 us 112 us 6309 bytes_per_second=8.70966G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt64/1048576/0 82.2 us 82.1 us 8537 bytes_per_second=11.8992G/s null_percent=0 size=1048.58k
After(AVX2):
MinMaxKernelInt8/1048576/10000 92.9 us 92.6 us 7568 bytes_per_second=10.5421G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt8/1048576/0 31.3 us 31.2 us 21832 bytes_per_second=31.2619G/s null_percent=0 size=1048.58k
MinMaxKernelInt16/1048576/10000 60.7 us 60.5 us 11501 bytes_per_second=16.1388G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt16/1048576/0 31.5 us 31.4 us 22316 bytes_per_second=31.1085G/s null_percent=0 size=1048.58k
MinMaxKernelInt32/1048576/10000 51.0 us 50.9 us 13841 bytes_per_second=19.1853G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt32/1048576/0 31.8 us 31.7 us 22111 bytes_per_second=30.8189G/s null_percent=0 size=1048.58k
MinMaxKernelInt64/1048576/10000 61.1 us 61.0 us 11610 bytes_per_second=16.016G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt64/1048576/0 54.2 us 54.1 us 12935 bytes_per_second=18.0651G/s null_percent=0 size=1048.58k
AVX512:
MinMaxKernelInt32/1048576/10000 40.9 us 40.8 us 17151 bytes_per_second=23.9207G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt32/1048576/0 25.6 us 25.6 us 26669 bytes_per_second=38.2196G/s null_percent=0 size=1048.58k
MinMaxKernelInt64/1048576/10000 34.5 us 34.4 us 20137 bytes_per_second=28.396G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt64/1048576/0 23.7 us 23.7 us 25949 bytes_per_second=41.2537G/s null_percent=0 size=1048.58k
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477425988
##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
AssertDatumsEqual(explicit_defaults, no_options_provided);
}
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::max();
+ T max = std::numeric_limits<T>::min();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::infinity();
+ T max = -std::numeric_limits<T>::infinity();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+ using Traits = TypeTraits<ArrowType>;
+ using ScalarType = typename Traits::ScalarType;
+
+ ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+ const StructScalar& value = out.scalar_as<StructScalar>();
+
+ auto expected = NaiveMinMax<ArrowType>(array);
+ const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+ ASSERT_EQ(expected.first, out_min.value);
+
+ const auto& out_max = checked_cast<const ScalarType&>(*value.value[1]);
+ ASSERT_EQ(expected.second, out_max.value);
+}
+
+template <typename ArrowType>
+class TestRandomNumericMinMaxKernel : public ::testing::Test {};
+
+TYPED_TEST_SUITE(TestRandomNumericMinMaxKernel, NumericArrowTypes);
+TYPED_TEST(TestRandomNumericMinMaxKernel, RandomArrayMinMax) {
+ auto rand = random::RandomArrayGenerator(0x8afc055);
+ // Test size up to 1<<13 (8192).
Review comment:
Sounds a bit large. Why not stop at e.g. 1024?
##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
AssertDatumsEqual(explicit_defaults, no_options_provided);
}
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::max();
+ T max = std::numeric_limits<T>::min();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::infinity();
+ T max = -std::numeric_limits<T>::infinity();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+ using Traits = TypeTraits<ArrowType>;
+ using ScalarType = typename Traits::ScalarType;
+
+ ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+ const StructScalar& value = out.scalar_as<StructScalar>();
+
+ auto expected = NaiveMinMax<ArrowType>(array);
+ const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+ ASSERT_EQ(expected.first, out_min.value);
Review comment:
How does this work when the array is all nulls (`null_probability` = 1.0 below)?
##########
File path: cpp/src/arrow/compute/api_aggregate.h
##########
@@ -130,23 +130,6 @@ Result<Datum> MinMax(const Datum& value,
const MinMaxOptions& options = MinMaxOptions::Defaults(),
ExecContext* ctx = NULLPTR);
-/// \brief Calculate the min / max of a numeric array.
-///
-/// This function returns both the min and max as a collection. The resulting
-/// datum thus consists of two scalar datums: {Datum(min), Datum(max)}
-///
-/// \param[in] array input array
-/// \param[in] options see MinMaxOptions for more information
-/// \param[in] ctx the function execution context, optional
-/// \return resulting datum containing a {min, max} collection
-///
-/// \since 1.0.0
-/// \note API not yet finalized
-ARROW_EXPORT
-Result<Datum> MinMax(const Array& array,
Review comment:
Hmm... is this removal deliberate?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind removed a comment on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind removed a comment on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-673185847
@ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark --benchmark_filter="MinMax"
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-669047527
Need a rebase after https://github.com/apache/arrow/pull/7903
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477781881
##########
File path: cpp/src/arrow/compute/api_aggregate.h
##########
@@ -130,23 +130,6 @@ Result<Datum> MinMax(const Datum& value,
const MinMaxOptions& options = MinMaxOptions::Defaults(),
ExecContext* ctx = NULLPTR);
-/// \brief Calculate the min / max of a numeric array.
-///
-/// This function returns both the min and max as a collection. The resulting
-/// datum thus consists of two scalar datums: {Datum(min), Datum(max)}
-///
-/// \param[in] array input array
-/// \param[in] options see MinMaxOptions for more information
-/// \param[in] ctx the function execution context, optional
-/// \return resulting datum containing a {min, max} collection
-///
-/// \since 1.0.0
-/// \note API not yet finalized
-ARROW_EXPORT
-Result<Datum> MinMax(const Array& array,
Review comment:
Yes. There's no implementation to Array input of MinMax, it will hit a link error if call MinMax with Array data.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r481880440
##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
AssertDatumsEqual(explicit_defaults, no_options_provided);
}
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::max();
+ T max = std::numeric_limits<T>::min();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::infinity();
+ T max = -std::numeric_limits<T>::infinity();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+ using Traits = TypeTraits<ArrowType>;
+ using ScalarType = typename Traits::ScalarType;
+
+ ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+ const StructScalar& value = out.scalar_as<StructScalar>();
+
+ auto expected = NaiveMinMax<ArrowType>(array);
+ const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+ ASSERT_EQ(expected.first, out_min.value);
Review comment:
Thanks, added. Pls help to check again.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477794871
##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
AssertDatumsEqual(explicit_defaults, no_options_provided);
}
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::max();
+ T max = std::numeric_limits<T>::min();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::infinity();
+ T max = -std::numeric_limits<T>::infinity();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+ using Traits = TypeTraits<ArrowType>;
+ using ScalarType = typename Traits::ScalarType;
+
+ ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+ const StructScalar& value = out.scalar_as<StructScalar>();
+
+ auto expected = NaiveMinMax<ArrowType>(array);
+ const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+ ASSERT_EQ(expected.first, out_min.value);
Review comment:
Return null for both the expected test and MinMax.
if (array.length() <= array.null_count()) { // All null values
return {static_cast<T>(0), static_cast<T>(0)};
}
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477809857
##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
AssertDatumsEqual(explicit_defaults, no_options_provided);
}
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::max();
+ T max = std::numeric_limits<T>::min();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::infinity();
+ T max = -std::numeric_limits<T>::infinity();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+ using Traits = TypeTraits<ArrowType>;
+ using ScalarType = typename Traits::ScalarType;
+
+ ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+ const StructScalar& value = out.scalar_as<StructScalar>();
+
+ auto expected = NaiveMinMax<ArrowType>(array);
+ const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+ ASSERT_EQ(expected.first, out_min.value);
+
+ const auto& out_max = checked_cast<const ScalarType&>(*value.value[1]);
+ ASSERT_EQ(expected.second, out_max.value);
+}
+
+template <typename ArrowType>
+class TestRandomNumericMinMaxKernel : public ::testing::Test {};
+
+TYPED_TEST_SUITE(TestRandomNumericMinMaxKernel, NumericArrowTypes);
+TYPED_TEST(TestRandomNumericMinMaxKernel, RandomArrayMinMax) {
+ auto rand = random::RandomArrayGenerator(0x8afc055);
+ // Test size up to 1<<13 (8192).
Review comment:
The new implementation is based on BitBlockCounter, I am afraid 1024 size with 0.001(only one valid value) null_probability doesn't cover enough.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-680420724
> @jianxind Sorry for the delay. Could you please rebase this PR? It looks like there are some conflicts now.
No problem at all. Rebased now. Thanks.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-679993093
@jianxind Sorry for the delay. Could you please rebase this PR? It looks like there are some conflicts now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r480661786
##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
AssertDatumsEqual(explicit_defaults, no_options_provided);
}
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::max();
+ T max = std::numeric_limits<T>::min();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::min(min, values[i]);
+ max = std::max(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+ const Array& array) {
+ using T = typename ArrowType::c_type;
+ using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+ const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+ const auto values = array_numeric.raw_values();
+
+ if (array.length() <= array.null_count()) { // All null values
+ return {static_cast<T>(0), static_cast<T>(0)};
+ }
+
+ T min = std::numeric_limits<T>::infinity();
+ T max = -std::numeric_limits<T>::infinity();
+ if (array.null_count() != 0) { // Some values are null
+ internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+ array.length());
+ for (int64_t i = 0; i < array.length(); i++) {
+ if (reader.IsSet()) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ reader.Next();
+ }
+ } else { // All true values
+ for (int64_t i = 0; i < array.length(); i++) {
+ min = std::fmin(min, values[i]);
+ max = std::fmax(max, values[i]);
+ }
+ }
+
+ return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+ using Traits = TypeTraits<ArrowType>;
+ using ScalarType = typename Traits::ScalarType;
+
+ ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+ const StructScalar& value = out.scalar_as<StructScalar>();
+
+ auto expected = NaiveMinMax<ArrowType>(array);
+ const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+ ASSERT_EQ(expected.first, out_min.value);
Review comment:
Added a RandomNullArrayMinMax to cover 1.0 null_probability, thanks
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685741244
Test failures are unrelated, will merge.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-673186171
@ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark --benchmark-filter=MinMax
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685637355
Ah, I also had {{-DARROW_SIMD_LEVEL=AVX2}} in CMake. Without it I do see a difference.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-673185847
@ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark --benchmark_filter="MinMax"
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-673185853
```
no such option: --benchmark_filter
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org