You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/15 03:23:32 UTC
[GitHub] [arrow] cyb70289 opened a new pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
cyb70289 opened a new pull request #8466:
URL: https://github.com/apache/arrow/pull/8466
Improve variance kernel performance for int32/16/8 by leveraging
textbook one pass algorithm and integer arithmetic.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r507800242
##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
this->m2 = m2;
}
+ // int32/16/8: textbook one pass algorithm with integer arithmetic
+ template <typename T = ArrowType>
+ enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+ const ArrayType& array) {
+ // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+ // for uint32: 0 <= sum < 2^63 (int64 >= 0)
+ // for int32: -2^62 <= sum < 2^62
+ constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+ int64_t start_index = 0;
+ int64_t valid_count = array.length() - array.null_count();
+
+ while (valid_count > 0) {
+ // process in chunks that overflow will never happen
+ const auto slice = array.Slice(start_index, max_length);
+ const int64_t count = slice->length() - slice->null_count();
+ start_index += max_length;
+ valid_count -= count;
+
+ if (count > 0) {
+ int64_t sum = 0;
+ int128_t square_sum = 0;
+ VisitArrayDataInline<ArrowType>(
+ *slice->data(),
+ [&sum, &square_sum](CType value) {
+ sum += value;
+ square_sum += static_cast<uint64_t>(value) * value;
+ },
+ []() {});
+
+ const double mean = static_cast<double>(sum) / count;
+ // calculate m2 = square_sum - sum * sum / count
+ // decompose `sum * sum / count` into integers and fractions
+ const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+ const int128_t integers = sum_square / count;
+ const double fractions = static_cast<double>(sum_square % count) / count;
Review comment:
Hmm, yes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-709861866
Added int64 optimization. Updated benchmark result. Ready for review.
Big improvement for `int32`. Moderate improvement for `int64` with few null values.
Some drop for `int64` with many null values.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r508164101
##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
this->m2 = m2;
}
+ // int32/16/8: textbook one pass algorithm with integer arithmetic
+ template <typename T = ArrowType>
+ enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+ const ArrayType& array) {
+ // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+ // for uint32: 0 <= sum < 2^63 (int64 >= 0)
+ // for int32: -2^62 <= sum < 2^62
+ constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+ int64_t start_index = 0;
+ int64_t valid_count = array.length() - array.null_count();
+
+ while (valid_count > 0) {
+ // process in chunks that overflow will never happen
+ const auto slice = array.Slice(start_index, max_length);
+ const int64_t count = slice->length() - slice->null_count();
+ start_index += max_length;
+ valid_count -= count;
+
+ if (count > 0) {
+ int64_t sum = 0;
+ int128_t square_sum = 0;
+ VisitArrayDataInline<ArrowType>(
+ *slice->data(),
+ [&sum, &square_sum](CType value) {
+ sum += value;
+ square_sum += static_cast<uint64_t>(value) * value;
+ },
+ []() {});
+
+ const double mean = static_cast<double>(sum) / count;
+ // calculate m2 = square_sum - sum * sum / count
+ // decompose `sum * sum / count` into integers and fractions
+ const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+ const int128_t integers = sum_square / count;
+ const double fractions = static_cast<double>(sum_square % count) / count;
Review comment:
Looks compiler is able to figure out the pattern and generate same code, without unnecessary division.
https://godbolt.org/z/dWY9Ex
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-708913631
Will add 64bit integers optimization.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r509381901
##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
this->m2 = m2;
}
+ // int32/16/8: textbook one pass algorithm with integer arithmetic
+ template <typename T = ArrowType>
+ enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+ const ArrayType& array) {
+ // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+ // for uint32: 0 <= sum < 2^63 (int64 >= 0)
+ // for int32: -2^62 <= sum < 2^62
+ constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+ int64_t start_index = 0;
+ int64_t valid_count = array.length() - array.null_count();
+
+ while (valid_count > 0) {
+ // process in chunks that overflow will never happen
+ const auto slice = array.Slice(start_index, max_length);
+ const int64_t count = slice->length() - slice->null_count();
+ start_index += max_length;
+ valid_count -= count;
+
+ if (count > 0) {
+ int64_t sum = 0;
+ int128_t square_sum = 0;
+ VisitArrayDataInline<ArrowType>(
+ *slice->data(),
+ [&sum, &square_sum](CType value) {
+ sum += value;
+ square_sum += static_cast<uint64_t>(value) * value;
+ },
+ []() {});
+
+ const double mean = static_cast<double>(sum) / count;
+ // calculate m2 = square_sum - sum * sum / count
+ // decompose `sum * sum / count` into integers and fractions
+ const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+ const int128_t integers = sum_square / count;
+ const double fractions = static_cast<double>(sum_square % count) / count;
Review comment:
Great, thank you.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-714173837
> I'm curious why Int64 would be faster than Double. Aren't they using the same algorithm? (and Int64 goes through an additional int-to-float conversion for each value)
There's no int-to-float conversion in Int64 summation loop (sum to Int128). It's faster than double summation.
https://quick-bench.com/q/-P9E6tgtXqnVBpVmmN6piaZHeUA
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r507711304
##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
this->m2 = m2;
}
+ // int32/16/8: textbook one pass algorithm with integer arithmetic
+ template <typename T = ArrowType>
+ enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+ const ArrayType& array) {
+ // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+ // for uint32: 0 <= sum < 2^63 (int64 >= 0)
+ // for int32: -2^62 <= sum < 2^62
+ constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+ int64_t start_index = 0;
+ int64_t valid_count = array.length() - array.null_count();
+
+ while (valid_count > 0) {
+ // process in chunks that overflow will never happen
+ const auto slice = array.Slice(start_index, max_length);
+ const int64_t count = slice->length() - slice->null_count();
+ start_index += max_length;
+ valid_count -= count;
+
+ if (count > 0) {
+ int64_t sum = 0;
+ int128_t square_sum = 0;
+ VisitArrayDataInline<ArrowType>(
+ *slice->data(),
+ [&sum, &square_sum](CType value) {
+ sum += value;
+ square_sum += static_cast<uint64_t>(value) * value;
+ },
+ []() {});
+
+ const double mean = static_cast<double>(sum) / count;
+ // calculate m2 = square_sum - sum * sum / count
+ // decompose `sum * sum / count` into integers and fractions
+ const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+ const int128_t integers = sum_square / count;
+ const double fractions = static_cast<double>(sum_square % count) / count;
Review comment:
Or simply `sum_square - integers * count`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-708872803
**NOTE:** Benchmark PR https://github.com/apache/arrow/pull/8407 is not merged yet. Need to manually pull that PR to evaluation performance.
Tested on Xeon Gold 5218, clang-9.
```
benchmark baseline contender change %
8 VarianceKernelInt32/1048576/0 1.798 GiB/sec 6.742 GiB/sec 274.842 'null_percent': 0.0
18 VarianceKernelInt32/1048576/10000 1.742 GiB/sec 5.155 GiB/sec 195.990 'null_percent': 0.01
6 VarianceKernelInt32/1048576/100 1.135 GiB/sec 2.806 GiB/sec 147.285 'null_percent': 1.0
7 VarianceKernelInt32/1048576/10 858.921 MiB/sec 1.551 GiB/sec 84.901 'null_percent': 10.0
19 VarianceKernelInt32/1048576/2 456.209 MiB/sec 771.633 MiB/sec 69.140 'null_percent': 50.0
12 VarianceKernelFloat/1048576/2 396.262 MiB/sec 455.307 MiB/sec 14.900 'null_percent': 50.0
14 VarianceKernelFloat/1048576/1 1.095 TiB/sec 1.210 TiB/sec 10.563 null_percent': 100.0
3 VarianceKernelFloat/1048576/10 788.471 MiB/sec 857.623 MiB/sec 8.770 'null_percent': 10.0
15 VarianceKernelDouble/1048576/1 1.152 TiB/sec 1.213 TiB/sec 5.312 null_percent': 100.0
1 VarianceKernelInt32/1048576/1 1.200 TiB/sec 1.259 TiB/sec 4.901 null_percent': 100.0
23 VarianceKernelFloat/1048576/100 1.129 GiB/sec 1.134 GiB/sec 0.371 'null_percent': 1.0
4 VarianceKernelDouble/1048576/0 3.580 GiB/sec 3.587 GiB/sec 0.195 'null_percent': 0.0
20 VarianceKernelFloat/1048576/0 1.798 GiB/sec 1.800 GiB/sec 0.090 'null_percent': 0.0
10 VarianceKernelFloat/1048576/10000 1.743 GiB/sec 1.743 GiB/sec 0.006 'null_percent': 0.01
2 VarianceKernelInt64/1048576/10000 3.468 GiB/sec 3.468 GiB/sec -0.005 'null_percent': 0.01
17 VarianceKernelInt64/1048576/0 3.586 GiB/sec 3.584 GiB/sec -0.050 'null_percent': 0.0
0 VarianceKernelDouble/1048576/10000 3.476 GiB/sec 3.474 GiB/sec -0.066 'null_percent': 0.01
11 VarianceKernelDouble/1048576/100 2.276 GiB/sec 2.274 GiB/sec -0.122 'null_percent': 1.0
5 VarianceKernelInt64/1048576/100 2.274 GiB/sec 2.269 GiB/sec -0.227 'null_percent': 1.0
22 VarianceKernelDouble/1048576/10 1.652 GiB/sec 1.604 GiB/sec -2.856 'null_percent': 10.0
21 VarianceKernelInt64/1048576/1 1.242 TiB/sec 1.206 TiB/sec -2.919 null_percent': 100.0
13 VarianceKernelDouble/1048576/2 864.392 MiB/sec 832.143 MiB/sec -3.731 'null_percent': 50.0
16 VarianceKernelInt64/1048576/10 1.681 GiB/sec 1.546 GiB/sec -8.006 'null_percent': 10.0
9 VarianceKernelInt64/1048576/2 910.548 MiB/sec 786.507 MiB/sec -13.623 'null_percent': 50.0
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou closed pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8466:
URL: https://github.com/apache/arrow/pull/8466
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r507784895
##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
this->m2 = m2;
}
+ // int32/16/8: textbook one pass algorithm with integer arithmetic
+ template <typename T = ArrowType>
+ enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+ const ArrayType& array) {
+ // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+ // for uint32: 0 <= sum < 2^63 (int64 >= 0)
+ // for int32: -2^62 <= sum < 2^62
+ constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+ int64_t start_index = 0;
+ int64_t valid_count = array.length() - array.null_count();
+
+ while (valid_count > 0) {
+ // process in chunks that overflow will never happen
+ const auto slice = array.Slice(start_index, max_length);
+ const int64_t count = slice->length() - slice->null_count();
+ start_index += max_length;
+ valid_count -= count;
+
+ if (count > 0) {
+ int64_t sum = 0;
+ int128_t square_sum = 0;
+ VisitArrayDataInline<ArrowType>(
+ *slice->data(),
+ [&sum, &square_sum](CType value) {
+ sum += value;
+ square_sum += static_cast<uint64_t>(value) * value;
+ },
+ []() {});
+
+ const double mean = static_cast<double>(sum) / count;
+ // calculate m2 = square_sum - sum * sum / count
+ // decompose `sum * sum / count` into integers and fractions
+ const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+ const int128_t integers = sum_square / count;
+ const double fractions = static_cast<double>(sum_square % count) / count;
Review comment:
`fractions` will be a floating number less than 1. Do you mean changing this line to
`const double fractions = static_cast<double>(sum_square - integers * count) / count;` ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 edited a comment on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
cyb70289 edited a comment on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-708913631
Turn to draft. Will add 64bit integers optimization.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r507711886
##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
this->m2 = m2;
}
+ // int32/16/8: textbook one pass algorithm with integer arithmetic
+ template <typename T = ArrowType>
+ enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+ const ArrayType& array) {
+ // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+ // for uint32: 0 <= sum < 2^63 (int64 >= 0)
+ // for int32: -2^62 <= sum < 2^62
+ constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+ int64_t start_index = 0;
+ int64_t valid_count = array.length() - array.null_count();
+
+ while (valid_count > 0) {
+ // process in chunks that overflow will never happen
+ const auto slice = array.Slice(start_index, max_length);
+ const int64_t count = slice->length() - slice->null_count();
+ start_index += max_length;
+ valid_count -= count;
+
+ if (count > 0) {
+ int64_t sum = 0;
+ int128_t square_sum = 0;
+ VisitArrayDataInline<ArrowType>(
+ *slice->data(),
+ [&sum, &square_sum](CType value) {
+ sum += value;
+ square_sum += static_cast<uint64_t>(value) * value;
+ },
+ []() {});
+
+ const double mean = static_cast<double>(sum) / count;
+ // calculate m2 = square_sum - sum * sum / count
+ // decompose `sum * sum / count` into integers and fractions
+ const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+ const int128_t integers = sum_square / count;
+ const double fractions = static_cast<double>(sum_square % count) / count;
Review comment:
Or simply `sum_square - integers * count`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-713661111
Results on an AMD Zen 2 CPU:
```
VarianceKernelInt32/1048576/10000 140 us 140 us 5030 bytes_per_second=6.98658G/s null_percent=0.01 size=1048.58k
VarianceKernelInt32/1048576/100 216 us 216 us 3267 bytes_per_second=4.5294G/s null_percent=1 size=1048.58k
VarianceKernelInt32/1048576/10 397 us 397 us 1763 bytes_per_second=2.45765G/s null_percent=10 size=1048.58k
VarianceKernelInt32/1048576/2 974 us 974 us 718 bytes_per_second=1026.87M/s null_percent=50 size=1048.58k
VarianceKernelInt32/1048576/1 0.816 us 0.816 us 844145 bytes_per_second=1.1684T/s null_percent=100 size=1048.58k
VarianceKernelInt32/1048576/0 130 us 130 us 5414 bytes_per_second=7.51569G/s null_percent=0 size=1048.58k
VarianceKernelInt64/1048576/10000 135 us 135 us 5174 bytes_per_second=7.22877G/s null_percent=0.01 size=1048.58k
VarianceKernelInt64/1048576/100 260 us 260 us 2682 bytes_per_second=3.7503G/s null_percent=1 size=1048.58k
VarianceKernelInt64/1048576/10 440 us 440 us 1591 bytes_per_second=2.21931G/s null_percent=10 size=1048.58k
VarianceKernelInt64/1048576/2 884 us 884 us 783 bytes_per_second=1.10507G/s null_percent=50 size=1048.58k
VarianceKernelInt64/1048576/1 0.821 us 0.821 us 840316 bytes_per_second=1.16182T/s null_percent=100 size=1048.58k
VarianceKernelInt64/1048576/0 123 us 123 us 5620 bytes_per_second=7.94262G/s null_percent=0 size=1048.58k
VarianceKernelFloat/1048576/10000 366 us 366 us 1909 bytes_per_second=2.66576G/s null_percent=0.01 size=1048.58k
VarianceKernelFloat/1048576/100 751 us 751 us 909 bytes_per_second=1.3003G/s null_percent=1 size=1048.58k
VarianceKernelFloat/1048576/10 1097 us 1097 us 637 bytes_per_second=911.712M/s null_percent=10 size=1048.58k
VarianceKernelFloat/1048576/2 1803 us 1802 us 387 bytes_per_second=554.854M/s null_percent=50 size=1048.58k
VarianceKernelFloat/1048576/1 0.817 us 0.817 us 838993 bytes_per_second=1.1679T/s null_percent=100 size=1048.58k
VarianceKernelFloat/1048576/0 346 us 346 us 2021 bytes_per_second=2.82409G/s null_percent=0 size=1048.58k
VarianceKernelDouble/1048576/10000 184 us 184 us 3751 bytes_per_second=5.30153G/s null_percent=0.01 size=1048.58k
VarianceKernelDouble/1048576/100 372 us 372 us 1869 bytes_per_second=2.62218G/s null_percent=1 size=1048.58k
VarianceKernelDouble/1048576/10 549 us 549 us 1249 bytes_per_second=1.77993G/s null_percent=10 size=1048.58k
VarianceKernelDouble/1048576/2 909 us 909 us 741 bytes_per_second=1099.92M/s null_percent=50 size=1048.58k
VarianceKernelDouble/1048576/1 0.831 us 0.831 us 831173 bytes_per_second=1.14779T/s null_percent=100 size=1048.58k
VarianceKernelDouble/1048576/0 174 us 174 us 4050 bytes_per_second=5.62431G/s null_percent=0 size=1048.58k
```
I'm curious why Int64 would be faster than Double. Aren't they using the same algorithm? (and Int64 goes through an additional int-to-float conversion for each value)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r507711886
##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
this->m2 = m2;
}
+ // int32/16/8: textbook one pass algorithm with integer arithmetic
+ template <typename T = ArrowType>
+ enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+ const ArrayType& array) {
+ // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+ // for uint32: 0 <= sum < 2^63 (int64 >= 0)
+ // for int32: -2^62 <= sum < 2^62
+ constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+ int64_t start_index = 0;
+ int64_t valid_count = array.length() - array.null_count();
+
+ while (valid_count > 0) {
+ // process in chunks that overflow will never happen
+ const auto slice = array.Slice(start_index, max_length);
+ const int64_t count = slice->length() - slice->null_count();
+ start_index += max_length;
+ valid_count -= count;
+
+ if (count > 0) {
+ int64_t sum = 0;
+ int128_t square_sum = 0;
+ VisitArrayDataInline<ArrowType>(
+ *slice->data(),
+ [&sum, &square_sum](CType value) {
+ sum += value;
+ square_sum += static_cast<uint64_t>(value) * value;
+ },
+ []() {});
+
+ const double mean = static_cast<double>(sum) / count;
+ // calculate m2 = square_sum - sum * sum / count
+ // decompose `sum * sum / count` into integers and fractions
+ const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+ const int128_t integers = sum_square / count;
+ const double fractions = static_cast<double>(sum_square % count) / count;
Review comment:
Or simply `sum_square - integers * count`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 edited a comment on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
cyb70289 edited a comment on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-714173837
> I'm curious why Int64 would be faster than Double. Aren't they using the same algorithm? (and Int64 goes through an additional int-to-float conversion for each value)
There's no int-to-float conversion in Int64 [summation loop](https://github.com/apache/arrow/pull/8466/files#diff-dedd9eac8bca7dd5bd0dd5951eab6daa717efb56d8b5d2adc49288c88216a62cR45-R50) (sum to Int128). It's faster than double summation.
https://quick-bench.com/q/-P9E6tgtXqnVBpVmmN6piaZHeUA
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-714320980
Ah, I hadn't noticed the `SumType`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-708872598
https://issues.apache.org/jira/browse/ARROW-10304
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 edited a comment on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers
Posted by GitBox <gi...@apache.org>.
cyb70289 edited a comment on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-708872803
**NOTE:** Benchmark PR https://github.com/apache/arrow/pull/8407 is not merged yet. Need to manually pull that PR to evaluation performance.
Tested on Xeon Gold 5218, clang-9.
```
benchmark baseline contender change %
7 VarianceKernelInt32/1048576/0 1.805 GiB/sec 6.896 GiB/sec 282.033 'null_percent': 0.0
8 VarianceKernelInt32/1048576/10000 1.751 GiB/sec 5.113 GiB/sec 191.898 'null_percent': 0.01
16 VarianceKernelInt32/1048576/100 1.139 GiB/sec 2.748 GiB/sec 141.206 'null_percent': 1.0
18 VarianceKernelInt32/1048576/10 862.260 MiB/sec 1.547 GiB/sec 83.714 'null_percent': 10.0
1 VarianceKernelInt32/1048576/2 457.956 MiB/sec 769.333 MiB/sec 67.993 'null_percent': 50.0
9 VarianceKernelInt64/1048576/0 3.596 GiB/sec 4.949 GiB/sec 37.601 'null_percent': 0.0
5 VarianceKernelInt64/1048576/10000 3.484 GiB/sec 4.627 GiB/sec 32.832 'null_percent': 0.01
22 VarianceKernelInt64/1048576/100 2.285 GiB/sec 2.787 GiB/sec 21.955 'null_percent': 1.0
3 VarianceKernelFloat/1048576/2 397.394 MiB/sec 454.485 MiB/sec 14.366 'null_percent': 50.0
10 VarianceKernelFloat/1048576/10 790.793 MiB/sec 854.667 MiB/sec 8.077 'null_percent': 10.0
0 VarianceKernelInt64/1048576/1 1.220 TiB/sec 1.261 TiB/sec 3.353 null_percent': 100.0
14 VarianceKernelInt32/1048576/1 1.222 TiB/sec 1.254 TiB/sec 2.647 null_percent': 100.0
21 VarianceKernelDouble/1048576/1 1.206 TiB/sec 1.235 TiB/sec 2.379 null_percent': 100.0
17 VarianceKernelFloat/1048576/1 1.180 TiB/sec 1.206 TiB/sec 2.208 null_percent': 100.0
4 VarianceKernelDouble/1048576/10000 3.485 GiB/sec 3.475 GiB/sec -0.277 'null_percent': 0.01
23 VarianceKernelDouble/1048576/0 3.595 GiB/sec 3.575 GiB/sec -0.557 'null_percent': 0.0
2 VarianceKernelFloat/1048576/100 1.133 GiB/sec 1.126 GiB/sec -0.632 'null_percent': 1.0
13 VarianceKernelFloat/1048576/0 1.804 GiB/sec 1.792 GiB/sec -0.643 'null_percent': 0.0
20 VarianceKernelFloat/1048576/10000 1.750 GiB/sec 1.739 GiB/sec -0.677 'null_percent': 0.01
11 VarianceKernelDouble/1048576/100 2.287 GiB/sec 2.262 GiB/sec -1.092 'null_percent': 1.0
19 VarianceKernelDouble/1048576/2 866.210 MiB/sec 836.046 MiB/sec -3.482 'null_percent': 50.0
6 VarianceKernelDouble/1048576/10 1.658 GiB/sec 1.597 GiB/sec -3.672 'null_percent': 10.0
15 VarianceKernelInt64/1048576/10 1.687 GiB/sec 1.564 GiB/sec -7.304 'null_percent': 10.0
12 VarianceKernelInt64/1048576/2 914.036 MiB/sec 789.209 MiB/sec -13.657 'null_percent': 50.0
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org