Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/12 05:44:13 UTC

[GitHub] [arrow] cyb70289 opened a new pull request #8437: ARROW-10263: [C++][Compute] Improve variance kernel numerical stability

cyb70289 opened a new pull request #8437:
URL: https://github.com/apache/arrow/pull/8437


   Improve the variance merging method to address a stability issue when
   merging short chunks with an approximate mean value.
   
   Improve reference variance accuracy by leveraging Kahan summation.
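
   For illustration only, here is a minimal sketch of how Kahan (compensated) summation can be used in a two-pass reference variance. The names `KahanSum` and `ReferenceVariance` are hypothetical and are not taken from the Arrow sources; the actual test helper in this PR may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compensated (Kahan) summation: carries a correction term that recovers
// low-order bits lost when a small value is added to a large running sum.
// Note: flags such as -ffast-math may optimize the correction away.
double KahanSum(const std::vector<double>& values) {
  double sum = 0.0;
  double compensation = 0.0;  // running estimate of the rounding error
  for (double v : values) {
    const double y = v - compensation;  // apply the pending correction
    const double t = sum + y;           // low-order bits of y may be lost here
    compensation = (t - sum) - y;       // recover what was lost
    sum = t;
  }
  return sum;
}

// Two-pass variance: compute the mean first, then sum squared deviations.
// Both sums use Kahan summation. ddof is the delta degrees of freedom.
double ReferenceVariance(const std::vector<double>& values, int ddof) {
  assert(ddof >= 0);
  assert(values.size() > static_cast<std::size_t>(ddof));
  const double n = static_cast<double>(values.size());
  const double mean = KahanSum(values) / n;

  std::vector<double> squared_deviations;
  squared_deviations.reserve(values.size());
  for (double v : values) {
    squared_deviations.push_back((v - mean) * (v - mean));
  }
  return KahanSum(squared_deviations) / (n - ddof);
}
```

   Calling `ReferenceVariance` on the four values used in the stability test (100000004, 100000007, 100000013, 100000016) with ddof = 1 gives 30, since the mean is 100000010 and the squared deviations sum to 90.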


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #8437: ARROW-10263: [C++][Compute] Improve variance kernel numerical stability

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8437:
URL: https://github.com/apache/arrow/pull/8437


   





[GitHub] [arrow] cyb70289 commented on a change in pull request #8437: ARROW-10263: [C++][Compute] Improve variance kernel numerical stability

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #8437:
URL: https://github.com/apache/arrow/pull/8437#discussion_r503317889



##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -1070,22 +1065,39 @@ TEST_F(TestVarStdKernelStability, Basics) {
   VarianceOptions options{1};  // ddof = 1
   this->AssertVarStdIs("[100000004, 100000007, 100000013, 100000016]", options, 30.0);
   this->AssertVarStdIs("[1000000004, 1000000007, 1000000013, 1000000016]", options, 30.0);
+
+#ifndef __MINGW32__  // MinGW has precision issues

Review comment:
       This test failed on the MinGW 32-bit community CI, and I see a similar comment in the decimal unit test:
   https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/decimal_test.cc#L695
   
   I haven't tested it on my side. Maybe I can start a 32-bit VM to check.







[GitHub] [arrow] pitrou commented on a change in pull request #8437: ARROW-10263: [C++][Compute] Improve variance kernel numerical stability

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8437:
URL: https://github.com/apache/arrow/pull/8437#discussion_r503198538



##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -1070,22 +1065,39 @@ TEST_F(TestVarStdKernelStability, Basics) {
   VarianceOptions options{1};  // ddof = 1
   this->AssertVarStdIs("[100000004, 100000007, 100000013, 100000016]", options, 30.0);
   this->AssertVarStdIs("[1000000004, 1000000007, 1000000013, 1000000016]", options, 30.0);
+
+#ifndef __MINGW32__  // MinGW has precision issues

Review comment:
       This was only the 32-bit MinGW build, i.e. it was perhaps not MinGW but x87 (perhaps you can check with a 32-bit Linux build?).







[GitHub] [arrow] cyb70289 commented on pull request #8437: ARROW-10263: [C++][Compute] Improve variance kernel numerical stability

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8437:
URL: https://github.com/apache/arrow/pull/8437#issuecomment-708095682


   > Are there any before/after benchmarks? It's really nice that we can have extra numerical stability, I'm just curious what's the penalty for it.
   
   This change only affects combining variances from multiple arrays. Its cost is trivial compared with computing the variance of each array.
   The benchmark also shows no difference (the benchmark PR is pending review: https://github.com/apache/arrow/pull/8407).
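
   To make the cost argument concrete, here is a hedged sketch of merging per-array variance states. `VarState`, `MergeVarState` and `Variance` are hypothetical names, and the formula shown (recompute the combined mean, then shift each chunk's sum of squared deviations to it) is one standard way to combine variances, not necessarily the exact code in this PR. The merge is a handful of floating-point operations per array, independent of array length.

```cpp
#include <cassert>
#include <cstdint>

// Per-chunk summary: element count, mean, and sum of squared deviations
// from that chunk's own mean (often called M2).
struct VarState {
  int64_t count = 0;
  double mean = 0.0;
  double m2 = 0.0;
};

// Combine two chunk summaries into the summary of their concatenation.
// The combined mean is computed first; each chunk's M2 is then shifted
// from its own mean to the combined mean.
VarState MergeVarState(const VarState& a, const VarState& b) {
  assert(a.count >= 0 && b.count >= 0);
  if (a.count == 0) return b;
  if (b.count == 0) return a;

  VarState out;
  out.count = a.count + b.count;
  out.mean = (a.mean * a.count + b.mean * b.count) / out.count;
  const double da = a.mean - out.mean;
  const double db = b.mean - out.mean;
  out.m2 = a.m2 + b.m2 + a.count * da * da + b.count * db * db;
  return out;
}

// Variance from a summary; ddof is the delta degrees of freedom.
double Variance(const VarState& s, int ddof) {
  assert(s.count > ddof);
  return s.m2 / (s.count - ddof);
}
```

   For example, splitting the test values into [100000004, 100000007] and [100000013, 100000016] gives two states with means 100000005.5 and 100000014.5 and M2 of 4.5 each; merging them yields a mean of 100000010 and M2 of 90, so the variance with ddof = 1 is 30.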





[GitHub] [arrow] pitrou commented on a change in pull request #8437: ARROW-10263: [C++][Compute] Improve variance kernel numerical stability

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8437:
URL: https://github.com/apache/arrow/pull/8437#discussion_r503379785



##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -1070,22 +1065,39 @@ TEST_F(TestVarStdKernelStability, Basics) {
   VarianceOptions options{1};  // ddof = 1
   this->AssertVarStdIs("[100000004, 100000007, 100000013, 100000016]", options, 30.0);
   this->AssertVarStdIs("[1000000004, 1000000007, 1000000013, 1000000016]", options, 30.0);
+
+#ifndef __MINGW32__  // MinGW has precision issues

Review comment:
       Ok, I've checked and there is no failure on Linux i386. It does seem MinGW-related.







[GitHub] [arrow] alippai commented on pull request #8437: ARROW-10263: [C++][Compute] Improve variance kernel numerical stability

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #8437:
URL: https://github.com/apache/arrow/pull/8437#issuecomment-708063761


   Are there any before/after benchmarks? It's really nice that we can have extra numerical stability, I'm just curious what's the penalty for it.





[GitHub] [arrow] github-actions[bot] commented on pull request #8437: ARROW-10263: [C++][Compute] Improve variance kernel numerical stability

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8437:
URL: https://github.com/apache/arrow/pull/8437#issuecomment-706879466


   https://issues.apache.org/jira/browse/ARROW-10263





[GitHub] [arrow] cyb70289 commented on pull request #8437: ARROW-10263: [C++][Compute] Improve variance kernel numerical stability

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #8437:
URL: https://github.com/apache/arrow/pull/8437#issuecomment-706965043


   The CI failure looks unrelated.





[GitHub] [arrow] alippai commented on pull request #8437: ARROW-10263: [C++][Compute] Improve variance kernel numerical stability

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #8437:
URL: https://github.com/apache/arrow/pull/8437#issuecomment-708300180


   Amazing, thanks!

