You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/15 03:23:32 UTC

[GitHub] [arrow] cyb70289 opened a new pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

cyb70289 opened a new pull request #8466:
URL: https://github.com/apache/arrow/pull/8466


   Improve variance kernel performance for int32/16/8 by leveraging
   textbook one pass algorithm and integer arithmetic.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r507800242



##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
     this->m2 = m2;
   }
 
+  // int32/16/8: textbook one pass algorithm with integer arithmetic
+  template <typename T = ArrowType>
+  enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+      const ArrayType& array) {
+    // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+    // for uint32:    0 <= sum < 2^63 (int64 >= 0)
+    // for int32: -2^62 <= sum < 2^62
+    constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+    int64_t start_index = 0;
+    int64_t valid_count = array.length() - array.null_count();
+
+    while (valid_count > 0) {
+      // process in chunks that overflow will never happen
+      const auto slice = array.Slice(start_index, max_length);
+      const int64_t count = slice->length() - slice->null_count();
+      start_index += max_length;
+      valid_count -= count;
+
+      if (count > 0) {
+        int64_t sum = 0;
+        int128_t square_sum = 0;
+        VisitArrayDataInline<ArrowType>(
+            *slice->data(),
+            [&sum, &square_sum](CType value) {
+              sum += value;
+              square_sum += static_cast<uint64_t>(value) * value;
+            },
+            []() {});
+
+        const double mean = static_cast<double>(sum) / count;
+        // calculate m2 = square_sum - sum * sum / count
+        // decompose `sum * sum / count` into integers and fractions
+        const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+        const int128_t integers = sum_square / count;
+        const double fractions = static_cast<double>(sum_square % count) / count;

Review comment:
       Hmm, yes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-709861866


   Added int64 optimization. Updated benchmark result. Ready for review.
   Big improvement for `int32`. Moderate improvement for `int64` with few null values.
   Some drop for `int64` with many null values.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r508164101



##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
     this->m2 = m2;
   }
 
+  // int32/16/8: textbook one pass algorithm with integer arithmetic
+  template <typename T = ArrowType>
+  enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+      const ArrayType& array) {
+    // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+    // for uint32:    0 <= sum < 2^63 (int64 >= 0)
+    // for int32: -2^62 <= sum < 2^62
+    constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+    int64_t start_index = 0;
+    int64_t valid_count = array.length() - array.null_count();
+
+    while (valid_count > 0) {
+      // process in chunks that overflow will never happen
+      const auto slice = array.Slice(start_index, max_length);
+      const int64_t count = slice->length() - slice->null_count();
+      start_index += max_length;
+      valid_count -= count;
+
+      if (count > 0) {
+        int64_t sum = 0;
+        int128_t square_sum = 0;
+        VisitArrayDataInline<ArrowType>(
+            *slice->data(),
+            [&sum, &square_sum](CType value) {
+              sum += value;
+              square_sum += static_cast<uint64_t>(value) * value;
+            },
+            []() {});
+
+        const double mean = static_cast<double>(sum) / count;
+        // calculate m2 = square_sum - sum * sum / count
+        // decompose `sum * sum / count` into integers and fractions
+        const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+        const int128_t integers = sum_square / count;
+        const double fractions = static_cast<double>(sum_square % count) / count;

Review comment:
       Looks compiler is able to figure out the pattern and generate same code, without unnecessary division.
   https://godbolt.org/z/dWY9Ex




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-708913631


   Will add 64bit integers optimization.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r509381901



##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
     this->m2 = m2;
   }
 
+  // int32/16/8: textbook one pass algorithm with integer arithmetic
+  template <typename T = ArrowType>
+  enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+      const ArrayType& array) {
+    // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+    // for uint32:    0 <= sum < 2^63 (int64 >= 0)
+    // for int32: -2^62 <= sum < 2^62
+    constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+    int64_t start_index = 0;
+    int64_t valid_count = array.length() - array.null_count();
+
+    while (valid_count > 0) {
+      // process in chunks that overflow will never happen
+      const auto slice = array.Slice(start_index, max_length);
+      const int64_t count = slice->length() - slice->null_count();
+      start_index += max_length;
+      valid_count -= count;
+
+      if (count > 0) {
+        int64_t sum = 0;
+        int128_t square_sum = 0;
+        VisitArrayDataInline<ArrowType>(
+            *slice->data(),
+            [&sum, &square_sum](CType value) {
+              sum += value;
+              square_sum += static_cast<uint64_t>(value) * value;
+            },
+            []() {});
+
+        const double mean = static_cast<double>(sum) / count;
+        // calculate m2 = square_sum - sum * sum / count
+        // decompose `sum * sum / count` into integers and fractions
+        const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+        const int128_t integers = sum_square / count;
+        const double fractions = static_cast<double>(sum_square % count) / count;

Review comment:
       Great, thank you.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-714173837


   > I'm curious why Int64 would be faster than Double. Aren't they using the same algorithm? (and Int64 goes through an additional int-to-float conversion for each value)
   
   There's no int-to-float conversion in Int64 summation loop (sum to Int128). It's faster than double summation.
   https://quick-bench.com/q/-P9E6tgtXqnVBpVmmN6piaZHeUA


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r507711304



##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
     this->m2 = m2;
   }
 
+  // int32/16/8: textbook one pass algorithm with integer arithmetic
+  template <typename T = ArrowType>
+  enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+      const ArrayType& array) {
+    // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+    // for uint32:    0 <= sum < 2^63 (int64 >= 0)
+    // for int32: -2^62 <= sum < 2^62
+    constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+    int64_t start_index = 0;
+    int64_t valid_count = array.length() - array.null_count();
+
+    while (valid_count > 0) {
+      // process in chunks that overflow will never happen
+      const auto slice = array.Slice(start_index, max_length);
+      const int64_t count = slice->length() - slice->null_count();
+      start_index += max_length;
+      valid_count -= count;
+
+      if (count > 0) {
+        int64_t sum = 0;
+        int128_t square_sum = 0;
+        VisitArrayDataInline<ArrowType>(
+            *slice->data(),
+            [&sum, &square_sum](CType value) {
+              sum += value;
+              square_sum += static_cast<uint64_t>(value) * value;
+            },
+            []() {});
+
+        const double mean = static_cast<double>(sum) / count;
+        // calculate m2 = square_sum - sum * sum / count
+        // decompose `sum * sum / count` into integers and fractions
+        const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+        const int128_t integers = sum_square / count;
+        const double fractions = static_cast<double>(sum_square % count) / count;

Review comment:
       Or simply `sum_square - integers * count`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-708872803


   **NOTE:** Benchmark PR https://github.com/apache/arrow/pull/8407 is not merged yet. Need to manually pull that PR to evaluation performance.
   
   Tested on Xeon Gold 5218, clang-9.
   
   ```
                                benchmark         baseline        contender  change %
   8        VarianceKernelInt32/1048576/0    1.798 GiB/sec    6.742 GiB/sec   274.842   'null_percent': 0.0
   18   VarianceKernelInt32/1048576/10000    1.742 GiB/sec    5.155 GiB/sec   195.990  'null_percent': 0.01
   6      VarianceKernelInt32/1048576/100    1.135 GiB/sec    2.806 GiB/sec   147.285   'null_percent': 1.0
   7       VarianceKernelInt32/1048576/10  858.921 MiB/sec    1.551 GiB/sec    84.901  'null_percent': 10.0
   19       VarianceKernelInt32/1048576/2  456.209 MiB/sec  771.633 MiB/sec    69.140  'null_percent': 50.0
   12       VarianceKernelFloat/1048576/2  396.262 MiB/sec  455.307 MiB/sec    14.900  'null_percent': 50.0
   14       VarianceKernelFloat/1048576/1    1.095 TiB/sec    1.210 TiB/sec    10.563  null_percent': 100.0
   3       VarianceKernelFloat/1048576/10  788.471 MiB/sec  857.623 MiB/sec     8.770  'null_percent': 10.0
   15      VarianceKernelDouble/1048576/1    1.152 TiB/sec    1.213 TiB/sec     5.312  null_percent': 100.0
   1        VarianceKernelInt32/1048576/1    1.200 TiB/sec    1.259 TiB/sec     4.901  null_percent': 100.0
   23     VarianceKernelFloat/1048576/100    1.129 GiB/sec    1.134 GiB/sec     0.371   'null_percent': 1.0
   4       VarianceKernelDouble/1048576/0    3.580 GiB/sec    3.587 GiB/sec     0.195   'null_percent': 0.0
   20       VarianceKernelFloat/1048576/0    1.798 GiB/sec    1.800 GiB/sec     0.090   'null_percent': 0.0
   10   VarianceKernelFloat/1048576/10000    1.743 GiB/sec    1.743 GiB/sec     0.006  'null_percent': 0.01
   2    VarianceKernelInt64/1048576/10000    3.468 GiB/sec    3.468 GiB/sec    -0.005  'null_percent': 0.01
   17       VarianceKernelInt64/1048576/0    3.586 GiB/sec    3.584 GiB/sec    -0.050   'null_percent': 0.0
   0   VarianceKernelDouble/1048576/10000    3.476 GiB/sec    3.474 GiB/sec    -0.066  'null_percent': 0.01
   11    VarianceKernelDouble/1048576/100    2.276 GiB/sec    2.274 GiB/sec    -0.122   'null_percent': 1.0
   5      VarianceKernelInt64/1048576/100    2.274 GiB/sec    2.269 GiB/sec    -0.227   'null_percent': 1.0
   22     VarianceKernelDouble/1048576/10    1.652 GiB/sec    1.604 GiB/sec    -2.856  'null_percent': 10.0
   21       VarianceKernelInt64/1048576/1    1.242 TiB/sec    1.206 TiB/sec    -2.919  null_percent': 100.0
   13      VarianceKernelDouble/1048576/2  864.392 MiB/sec  832.143 MiB/sec    -3.731  'null_percent': 50.0
   16      VarianceKernelInt64/1048576/10    1.681 GiB/sec    1.546 GiB/sec    -8.006  'null_percent': 10.0
   9        VarianceKernelInt64/1048576/2  910.548 MiB/sec  786.507 MiB/sec   -13.623  'null_percent': 50.0
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8466:
URL: https://github.com/apache/arrow/pull/8466


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r507784895



##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
     this->m2 = m2;
   }
 
+  // int32/16/8: textbook one pass algorithm with integer arithmetic
+  template <typename T = ArrowType>
+  enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+      const ArrayType& array) {
+    // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+    // for uint32:    0 <= sum < 2^63 (int64 >= 0)
+    // for int32: -2^62 <= sum < 2^62
+    constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+    int64_t start_index = 0;
+    int64_t valid_count = array.length() - array.null_count();
+
+    while (valid_count > 0) {
+      // process in chunks that overflow will never happen
+      const auto slice = array.Slice(start_index, max_length);
+      const int64_t count = slice->length() - slice->null_count();
+      start_index += max_length;
+      valid_count -= count;
+
+      if (count > 0) {
+        int64_t sum = 0;
+        int128_t square_sum = 0;
+        VisitArrayDataInline<ArrowType>(
+            *slice->data(),
+            [&sum, &square_sum](CType value) {
+              sum += value;
+              square_sum += static_cast<uint64_t>(value) * value;
+            },
+            []() {});
+
+        const double mean = static_cast<double>(sum) / count;
+        // calculate m2 = square_sum - sum * sum / count
+        // decompose `sum * sum / count` into integers and fractions
+        const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+        const int128_t integers = sum_square / count;
+        const double fractions = static_cast<double>(sum_square % count) / count;

Review comment:
       `fractions` will be a floating number less than 1. Do you mean changing this line to
   `const double fractions = static_cast<double>(sum_square - integers * count) / count;` ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 edited a comment on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
cyb70289 edited a comment on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-708913631


   Turn to draft. Will add 64bit integers optimization.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r507711886



##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
     this->m2 = m2;
   }
 
+  // int32/16/8: textbook one pass algorithm with integer arithmetic
+  template <typename T = ArrowType>
+  enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+      const ArrayType& array) {
+    // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+    // for uint32:    0 <= sum < 2^63 (int64 >= 0)
+    // for int32: -2^62 <= sum < 2^62
+    constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+    int64_t start_index = 0;
+    int64_t valid_count = array.length() - array.null_count();
+
+    while (valid_count > 0) {
+      // process in chunks that overflow will never happen
+      const auto slice = array.Slice(start_index, max_length);
+      const int64_t count = slice->length() - slice->null_count();
+      start_index += max_length;
+      valid_count -= count;
+
+      if (count > 0) {
+        int64_t sum = 0;
+        int128_t square_sum = 0;
+        VisitArrayDataInline<ArrowType>(
+            *slice->data(),
+            [&sum, &square_sum](CType value) {
+              sum += value;
+              square_sum += static_cast<uint64_t>(value) * value;
+            },
+            []() {});
+
+        const double mean = static_cast<double>(sum) / count;
+        // calculate m2 = square_sum - sum * sum / count
+        // decompose `sum * sum / count` into integers and fractions
+        const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+        const int128_t integers = sum_square / count;
+        const double fractions = static_cast<double>(sum_square % count) / count;

Review comment:
       Or simply `sum_square - integers * count`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-713661111


   Results on an AMD Zen 2 CPU:
   ```
   VarianceKernelInt32/1048576/10000         140 us          140 us         5030 bytes_per_second=6.98658G/s null_percent=0.01 size=1048.58k
   VarianceKernelInt32/1048576/100           216 us          216 us         3267 bytes_per_second=4.5294G/s null_percent=1 size=1048.58k
   VarianceKernelInt32/1048576/10            397 us          397 us         1763 bytes_per_second=2.45765G/s null_percent=10 size=1048.58k
   VarianceKernelInt32/1048576/2             974 us          974 us          718 bytes_per_second=1026.87M/s null_percent=50 size=1048.58k
   VarianceKernelInt32/1048576/1           0.816 us        0.816 us       844145 bytes_per_second=1.1684T/s null_percent=100 size=1048.58k
   VarianceKernelInt32/1048576/0             130 us          130 us         5414 bytes_per_second=7.51569G/s null_percent=0 size=1048.58k
   
   VarianceKernelInt64/1048576/10000         135 us          135 us         5174 bytes_per_second=7.22877G/s null_percent=0.01 size=1048.58k
   VarianceKernelInt64/1048576/100           260 us          260 us         2682 bytes_per_second=3.7503G/s null_percent=1 size=1048.58k
   VarianceKernelInt64/1048576/10            440 us          440 us         1591 bytes_per_second=2.21931G/s null_percent=10 size=1048.58k
   VarianceKernelInt64/1048576/2             884 us          884 us          783 bytes_per_second=1.10507G/s null_percent=50 size=1048.58k
   VarianceKernelInt64/1048576/1           0.821 us        0.821 us       840316 bytes_per_second=1.16182T/s null_percent=100 size=1048.58k
   VarianceKernelInt64/1048576/0             123 us          123 us         5620 bytes_per_second=7.94262G/s null_percent=0 size=1048.58k
   
   VarianceKernelFloat/1048576/10000         366 us          366 us         1909 bytes_per_second=2.66576G/s null_percent=0.01 size=1048.58k
   VarianceKernelFloat/1048576/100           751 us          751 us          909 bytes_per_second=1.3003G/s null_percent=1 size=1048.58k
   VarianceKernelFloat/1048576/10           1097 us         1097 us          637 bytes_per_second=911.712M/s null_percent=10 size=1048.58k
   VarianceKernelFloat/1048576/2            1803 us         1802 us          387 bytes_per_second=554.854M/s null_percent=50 size=1048.58k
   VarianceKernelFloat/1048576/1           0.817 us        0.817 us       838993 bytes_per_second=1.1679T/s null_percent=100 size=1048.58k
   VarianceKernelFloat/1048576/0             346 us          346 us         2021 bytes_per_second=2.82409G/s null_percent=0 size=1048.58k
   
   VarianceKernelDouble/1048576/10000        184 us          184 us         3751 bytes_per_second=5.30153G/s null_percent=0.01 size=1048.58k
   VarianceKernelDouble/1048576/100          372 us          372 us         1869 bytes_per_second=2.62218G/s null_percent=1 size=1048.58k
   VarianceKernelDouble/1048576/10           549 us          549 us         1249 bytes_per_second=1.77993G/s null_percent=10 size=1048.58k
   VarianceKernelDouble/1048576/2            909 us          909 us          741 bytes_per_second=1099.92M/s null_percent=50 size=1048.58k
   VarianceKernelDouble/1048576/1          0.831 us        0.831 us       831173 bytes_per_second=1.14779T/s null_percent=100 size=1048.58k
   VarianceKernelDouble/1048576/0            174 us          174 us         4050 bytes_per_second=5.62431G/s null_percent=0 size=1048.58k
   ```
   
   I'm curious why Int64 would be faster than Double. Aren't they using the same algorithm? (and Int64 goes through an additional int-to-float conversion for each value)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#discussion_r507711886



##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -57,6 +63,54 @@ struct VarStdState {
     this->m2 = m2;
   }
 
+  // int32/16/8: textbook one pass algorithm with integer arithmetic
+  template <typename T = ArrowType>
+  enable_if_t<is_integer_type<T>::value && (sizeof(CType) <= 4)> Consume(
+      const ArrayType& array) {
+    // max number of elements that sum will not overflow int64 (2Gi int32 elements)
+    // for uint32:    0 <= sum < 2^63 (int64 >= 0)
+    // for int32: -2^62 <= sum < 2^62
+    constexpr int64_t max_length = 1ULL << (63 - sizeof(CType) * 8);
+
+    int64_t start_index = 0;
+    int64_t valid_count = array.length() - array.null_count();
+
+    while (valid_count > 0) {
+      // process in chunks that overflow will never happen
+      const auto slice = array.Slice(start_index, max_length);
+      const int64_t count = slice->length() - slice->null_count();
+      start_index += max_length;
+      valid_count -= count;
+
+      if (count > 0) {
+        int64_t sum = 0;
+        int128_t square_sum = 0;
+        VisitArrayDataInline<ArrowType>(
+            *slice->data(),
+            [&sum, &square_sum](CType value) {
+              sum += value;
+              square_sum += static_cast<uint64_t>(value) * value;
+            },
+            []() {});
+
+        const double mean = static_cast<double>(sum) / count;
+        // calculate m2 = square_sum - sum * sum / count
+        // decompose `sum * sum / count` into integers and fractions
+        const int128_t sum_square = static_cast<int128_t>(sum) * sum;
+        const int128_t integers = sum_square / count;
+        const double fractions = static_cast<double>(sum_square % count) / count;

Review comment:
       Or simply `sum_square - integers * count`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 edited a comment on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
cyb70289 edited a comment on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-714173837


   > I'm curious why Int64 would be faster than Double. Aren't they using the same algorithm? (and Int64 goes through an additional int-to-float conversion for each value)
   
   There's no int-to-float conversion in Int64 [summation loop](https://github.com/apache/arrow/pull/8466/files#diff-dedd9eac8bca7dd5bd0dd5951eab6daa717efb56d8b5d2adc49288c88216a62cR45-R50) (sum to Int128). It's faster than double summation.
   https://quick-bench.com/q/-P9E6tgtXqnVBpVmmN6piaZHeUA


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-714320980


   Ah, I hadn't noticed the `SumType`.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-708872598


   https://issues.apache.org/jira/browse/ARROW-10304


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 edited a comment on pull request #8466: ARROW-10304: [C++][Compute] Optimize variance kernel for integers

Posted by GitBox <gi...@apache.org>.
cyb70289 edited a comment on pull request #8466:
URL: https://github.com/apache/arrow/pull/8466#issuecomment-708872803


   **NOTE:** Benchmark PR https://github.com/apache/arrow/pull/8407 is not merged yet. Need to manually pull that PR to evaluation performance.
   
   Tested on Xeon Gold 5218, clang-9.
   
   ```
                                benchmark         baseline        contender  change %
   7        VarianceKernelInt32/1048576/0    1.805 GiB/sec    6.896 GiB/sec   282.033   'null_percent': 0.0
   8    VarianceKernelInt32/1048576/10000    1.751 GiB/sec    5.113 GiB/sec   191.898  'null_percent': 0.01
   16     VarianceKernelInt32/1048576/100    1.139 GiB/sec    2.748 GiB/sec   141.206   'null_percent': 1.0
   18      VarianceKernelInt32/1048576/10  862.260 MiB/sec    1.547 GiB/sec    83.714  'null_percent': 10.0
   1        VarianceKernelInt32/1048576/2  457.956 MiB/sec  769.333 MiB/sec    67.993  'null_percent': 50.0
   9        VarianceKernelInt64/1048576/0    3.596 GiB/sec    4.949 GiB/sec    37.601   'null_percent': 0.0
   5    VarianceKernelInt64/1048576/10000    3.484 GiB/sec    4.627 GiB/sec    32.832  'null_percent': 0.01
   22     VarianceKernelInt64/1048576/100    2.285 GiB/sec    2.787 GiB/sec    21.955   'null_percent': 1.0
   3        VarianceKernelFloat/1048576/2  397.394 MiB/sec  454.485 MiB/sec    14.366  'null_percent': 50.0
   10      VarianceKernelFloat/1048576/10  790.793 MiB/sec  854.667 MiB/sec     8.077  'null_percent': 10.0
   0        VarianceKernelInt64/1048576/1    1.220 TiB/sec    1.261 TiB/sec     3.353  null_percent': 100.0
   14       VarianceKernelInt32/1048576/1    1.222 TiB/sec    1.254 TiB/sec     2.647  null_percent': 100.0
   21      VarianceKernelDouble/1048576/1    1.206 TiB/sec    1.235 TiB/sec     2.379  null_percent': 100.0
   17       VarianceKernelFloat/1048576/1    1.180 TiB/sec    1.206 TiB/sec     2.208  null_percent': 100.0
   4   VarianceKernelDouble/1048576/10000    3.485 GiB/sec    3.475 GiB/sec    -0.277  'null_percent': 0.01
   23      VarianceKernelDouble/1048576/0    3.595 GiB/sec    3.575 GiB/sec    -0.557   'null_percent': 0.0
   2      VarianceKernelFloat/1048576/100    1.133 GiB/sec    1.126 GiB/sec    -0.632   'null_percent': 1.0
   13       VarianceKernelFloat/1048576/0    1.804 GiB/sec    1.792 GiB/sec    -0.643   'null_percent': 0.0
   20   VarianceKernelFloat/1048576/10000    1.750 GiB/sec    1.739 GiB/sec    -0.677  'null_percent': 0.01
   11    VarianceKernelDouble/1048576/100    2.287 GiB/sec    2.262 GiB/sec    -1.092   'null_percent': 1.0
   19      VarianceKernelDouble/1048576/2  866.210 MiB/sec  836.046 MiB/sec    -3.482  'null_percent': 50.0
   6      VarianceKernelDouble/1048576/10    1.658 GiB/sec    1.597 GiB/sec    -3.672  'null_percent': 10.0
   15      VarianceKernelInt64/1048576/10    1.687 GiB/sec    1.564 GiB/sec    -7.304  'null_percent': 10.0
   12       VarianceKernelInt64/1048576/2  914.036 MiB/sec  789.209 MiB/sec   -13.657  'null_percent': 50.0
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org