Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/31 20:16:59 UTC

[GitHub] [arrow] lidavidm opened a new pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

lidavidm opened a new pull request #11041:
URL: https://github.com/apache/arrow/pull/11041


   Essentially, this failure boils down to: when generating the array of uniques for booleans, we pack 8 bytes at a time into one byte. The bytes are packed from what turns out to be a scratch array allocated by TempVectorStack, which does not initialize its memory. So we may end up packing initialized bytes and uninitialized bytes together into a single garbage byte, resulting in Valgrind complaining.
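The failure mode described above can be sketched in isolation. This is a minimal illustration, not Arrow's actual code: `RoundUp8`, `PackBytesToBits`, and `PackWithZeroedTail` are hypothetical names, and the scratch buffer is assumed to have room for the value count rounded up to a multiple of 8.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helpers for illustration only (not Arrow's implementation).
inline uint32_t RoundUp8(uint32_t n) { return (n + 7) / 8 * 8; }

// Packs one byte per boolean into one bit per boolean, consuming the input
// in groups of 8 bytes. `bytes` must have RoundUp8(num_values) readable bytes.
inline void PackBytesToBits(const uint8_t* bytes, uint32_t num_values,
                            uint8_t* bits) {
  for (uint32_t group = 0; group * 8 < num_values; ++group) {
    uint8_t packed = 0;
    for (uint32_t j = 0; j < 8; ++j) {
      // In the last group this reads past num_values whenever num_values is
      // not a multiple of 8. If those bytes were never written, the packed
      // byte mixes initialized and uninitialized data.
      packed |= static_cast<uint8_t>((bytes[group * 8 + j] != 0) << j);
    }
    bits[group] = packed;
  }
}

// Zeroes the tail of the last group before packing, so every byte that is
// read has a defined value. `scratch` must hold RoundUp8(num_values) bytes.
inline std::vector<uint8_t> PackWithZeroedTail(uint8_t* scratch,
                                               uint32_t num_values) {
  const uint32_t padded = RoundUp8(num_values);
  std::memset(scratch + num_values, 0, padded - num_values);
  std::vector<uint8_t> bits(padded / 8);
  PackBytesToBits(scratch, num_values, bits.data());
  return bits;
}
```

With the tail zeroed, the padding bits of the last packed byte are always 0 instead of depending on whatever the scratch memory happened to contain.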


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#issuecomment-909604343


   @github-actions crossbow submit test-conda-cpp-valgrind





[GitHub] [arrow] cyb70289 closed pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
cyb70289 closed pull request #11041:
URL: https://github.com/apache/arrow/pull/11041


   





[GitHub] [arrow] cyb70289 commented on a change in pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#discussion_r699783080



##########
File path: cpp/src/arrow/compute/exec/key_encode.cc
##########
@@ -427,11 +427,19 @@ void KeyEncoder::EncoderInteger::Decode(uint32_t start_row, uint32_t num_rows,
     row_base += offset_within_row;
     uint8_t* col_base = col_prep.mutable_data(1);
     switch (col_prep.metadata().fixed_length) {
-      case 1:
+      case 1: {
         for (uint32_t i = 0; i < num_rows; ++i) {
           col_base[i] = row_base[i * row_size];
         }
+        // For booleans, we pack 8 bytes at a time, and the buffer we're
+        // writing to here may not be fully initialized - so make sure a
+        // multiple of 8 bytes are initialized to avoid Valgrind errors. The
+        // temp buffer is sized to num_rows uint16_t values, so there's more
+        // than enough space here.
+        uint32_t remainder = 8 - (num_rows % 8);

Review comment:
       Will it overflow the buffer if num_rows is a multiple of 8?







[GitHub] [arrow] lidavidm commented on a change in pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#discussion_r700184494



##########
File path: cpp/src/arrow/compute/exec/key_encode.cc
##########
@@ -427,11 +427,19 @@ void KeyEncoder::EncoderInteger::Decode(uint32_t start_row, uint32_t num_rows,
     row_base += offset_within_row;
     uint8_t* col_base = col_prep.mutable_data(1);
     switch (col_prep.metadata().fixed_length) {
-      case 1:
+      case 1: {
         for (uint32_t i = 0; i < num_rows; ++i) {
           col_base[i] = row_base[i * row_size];
         }
+        // For booleans, we pack 8 bytes at a time, and the buffer we're
+        // writing to here may not be fully initialized - so make sure a
+        // multiple of 8 bytes are initialized to avoid Valgrind errors. The
+        // temp buffer is sized to num_rows uint16_t values, so there's more
+        // than enough space here.
+        uint32_t remainder = 8 - (num_rows % 8);

Review comment:
       It should be OK: the code comment has a typo, and the temporary buffer actually holds num_rows uint32_t values, so it is overallocated by a factor of 4. But I've adjusted it anyway.
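The arithmetic behind the question can be checked directly. In the sketch below, `PaddingAsWritten` mirrors the expression in the diff, while `PaddingToMultipleOf8` is only one possible adjustment; the thread does not show the final code, so that second function is an assumption.

```cpp
#include <cstdint>

// Expression from the diff under review: yields 8, not 0, when num_rows is
// already a multiple of 8, so a full extra group of bytes would be written.
constexpr uint32_t PaddingAsWritten(uint32_t num_rows) {
  return 8 - (num_rows % 8);
}

// One possible adjustment (hypothetical, not necessarily what was merged):
// the outer modulo maps the aligned case back to 0.
constexpr uint32_t PaddingToMultipleOf8(uint32_t num_rows) {
  return (8 - (num_rows % 8)) % 8;
}

static_assert(PaddingAsWritten(5) == 3, "unaligned case pads to the next group");
static_assert(PaddingAsWritten(16) == 8, "aligned case writes 8 extra bytes");
static_assert(PaddingToMultipleOf8(5) == 3, "unaligned case is unchanged");
static_assert(PaddingToMultipleOf8(16) == 0, "aligned case writes nothing extra");
```

The extra 8 bytes are harmless only as long as the buffer really is overallocated, which is why the adjusted form is the safer choice.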







[GitHub] [arrow] pitrou commented on pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#issuecomment-910506925


   @ursabot please benchmark





[GitHub] [arrow] ursabot edited a comment on pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#issuecomment-910507090


   Benchmark runs are scheduled for baseline = bb533fb56ef8ba93a8c5fa9df3366c242570fb04 and contender = c005c9b8103743a57725e7ac4a65fbff89dc5248. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f36b6d36a05640fea3cd9292f7d35e84...33405eb5b08943ae90ae958f3ceeced7/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/08f8aee9379042bfbcf7ac145e0c369b...a3fb33c167004d24a0a86673b2184467/)
   [Finished :arrow_down:0.14% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/adce32d1f1b54a838191d514097a0d00...a6376fef01ea411dae929b4422f8d855/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R, JavaScript
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] pitrou commented on a change in pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#discussion_r700411311



##########
File path: cpp/src/arrow/compute/exec/key_encode.cc
##########
@@ -427,11 +427,19 @@ void KeyEncoder::EncoderInteger::Decode(uint32_t start_row, uint32_t num_rows,
     row_base += offset_within_row;
     uint8_t* col_base = col_prep.mutable_data(1);
     switch (col_prep.metadata().fixed_length) {
-      case 1:
+      case 1: {
         for (uint32_t i = 0; i < num_rows; ++i) {
           col_base[i] = row_base[i * row_size];
         }
+        // For booleans, we pack 8 bytes at a time, and the buffer we're
+        // writing to here may not be fully initialized - so make sure a
+        // multiple of 8 bytes are initialized to avoid Valgrind errors. The
+        // temp buffer is sized to num_rows uint32_t values, so there's more
+        // than enough space here.

Review comment:
       I wonder if it wouldn't be easier to just always zero-initialize the 8 last bytes in the temp buffer?







[GitHub] [arrow] github-actions[bot] commented on pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#issuecomment-909599230












[GitHub] [arrow] ursabot commented on pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#issuecomment-910507090


   Benchmark runs are scheduled for baseline = bb533fb56ef8ba93a8c5fa9df3366c242570fb04 and contender = c005c9b8103743a57725e7ac4a65fbff89dc5248. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f36b6d36a05640fea3cd9292f7d35e84...33405eb5b08943ae90ae958f3ceeced7/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/08f8aee9379042bfbcf7ac145e0c369b...a3fb33c167004d24a0a86673b2184467/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/adce32d1f1b54a838191d514097a0d00...a6376fef01ea411dae929b4422f8d855/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R, JavaScript
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] github-actions[bot] commented on pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#issuecomment-910256798


   Revision: 22cbfd915497a114b45035379b842ef214cd06d8
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-808](https://github.com/ursacomputing/crossbow/branches/all?query=actions-808)
   
   |Task|Status|
   |----|------|
   |test-conda-cpp-valgrind|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-808-azure-test-conda-cpp-valgrind)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-808-azure-test-conda-cpp-valgrind)|





[GitHub] [arrow] lidavidm commented on pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#issuecomment-910256006


   @github-actions crossbow submit test-conda-cpp-valgrind





[GitHub] [arrow] github-actions[bot] commented on pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#issuecomment-909605122


   Revision: 16700058feb3a672d5187cbfbf434b556b3a2fe2
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-807](https://github.com/ursacomputing/crossbow/branches/all?query=actions-807)
   
   |Task|Status|
   |----|------|
   |test-conda-cpp-valgrind|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-807-azure-test-conda-cpp-valgrind)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-807-azure-test-conda-cpp-valgrind)|








[GitHub] [arrow] ursabot edited a comment on pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#issuecomment-910507090


   Benchmark runs are scheduled for baseline = bb533fb56ef8ba93a8c5fa9df3366c242570fb04 and contender = c005c9b8103743a57725e7ac4a65fbff89dc5248. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f36b6d36a05640fea3cd9292f7d35e84...33405eb5b08943ae90ae958f3ceeced7/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/08f8aee9379042bfbcf7ac145e0c369b...a3fb33c167004d24a0a86673b2184467/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/adce32d1f1b54a838191d514097a0d00...a6376fef01ea411dae929b4422f8d855/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R, JavaScript
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   








[GitHub] [arrow] pitrou commented on a change in pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#discussion_r700413374



##########
File path: cpp/src/arrow/compute/exec/key_encode.cc
##########
@@ -427,11 +427,19 @@ void KeyEncoder::EncoderInteger::Decode(uint32_t start_row, uint32_t num_rows,
     row_base += offset_within_row;
     uint8_t* col_base = col_prep.mutable_data(1);
     switch (col_prep.metadata().fixed_length) {
-      case 1:
+      case 1: {
         for (uint32_t i = 0; i < num_rows; ++i) {
           col_base[i] = row_base[i * row_size];
         }
+        // For booleans, we pack 8 bytes at a time, and the buffer we're
+        // writing to here may not be fully initialized - so make sure a
+        // multiple of 8 bytes are initialized to avoid Valgrind errors. The
+        // temp buffer is sized to num_rows uint32_t values, so there's more
+        // than enough space here.

Review comment:
       +1 from me







[GitHub] [arrow] ursabot edited a comment on pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#issuecomment-910507090


   Benchmark runs are scheduled for baseline = bb533fb56ef8ba93a8c5fa9df3366c242570fb04 and contender = c005c9b8103743a57725e7ac4a65fbff89dc5248. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f36b6d36a05640fea3cd9292f7d35e84...33405eb5b08943ae90ae958f3ceeced7/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/08f8aee9379042bfbcf7ac145e0c369b...a3fb33c167004d24a0a86673b2184467/)
   [Finished :arrow_down:0.14% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/adce32d1f1b54a838191d514097a0d00...a6376fef01ea411dae929b4422f8d855/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R, JavaScript
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   





[GitHub] [arrow] lidavidm commented on a change in pull request #11041: ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #11041:
URL: https://github.com/apache/arrow/pull/11041#discussion_r700412732



##########
File path: cpp/src/arrow/compute/exec/key_encode.cc
##########
@@ -427,11 +427,19 @@ void KeyEncoder::EncoderInteger::Decode(uint32_t start_row, uint32_t num_rows,
     row_base += offset_within_row;
     uint8_t* col_base = col_prep.mutable_data(1);
     switch (col_prep.metadata().fixed_length) {
-      case 1:
+      case 1: {
         for (uint32_t i = 0; i < num_rows; ++i) {
           col_base[i] = row_base[i * row_size];
         }
+        // For booleans, we pack 8 bytes at a time, and the buffer we're
+        // writing to here may not be fully initialized - so make sure a
+        // multiple of 8 bytes are initialized to avoid Valgrind errors. The
+        // temp buffer is sized to num_rows uint32_t values, so there's more
+        // than enough space here.

Review comment:
       In that case, since the temp buffer is reused quite a bit, we might as well just initialize the underlying buffer on allocation? It should be a fixed setup cost since one large buffer is allocated on creation of the grouper (TempVectorStack) and then slices of it (TempVectorHolder) are taken as scratch space.
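The initialize-on-allocation idea might look roughly like the following. `ScratchStack` is a hypothetical stand-in written for this illustration; it is not Arrow's TempVectorStack, and its interface is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical stack-style scratch allocator (illustrative names only).
class ScratchStack {
 public:
  explicit ScratchStack(size_t capacity)
      : buffer_(new uint8_t[capacity]), capacity_(capacity), top_(0) {
    // One-time setup cost: zero the whole backing buffer so that no slice
    // handed out later ever exposes uninitialized memory.
    std::memset(buffer_.get(), 0, capacity_);
  }

  // Returns a slice of num_bytes, or nullptr if there is not enough space.
  uint8_t* Push(size_t num_bytes) {
    if (num_bytes > capacity_ - top_) return nullptr;
    uint8_t* slice = buffer_.get() + top_;
    top_ += num_bytes;
    return slice;
  }

  // Releases the most recently pushed slice (LIFO order is the caller's job).
  void Pop(size_t num_bytes) {
    if (num_bytes <= top_) top_ -= num_bytes;
  }

 private:
  std::unique_ptr<uint8_t[]> buffer_;
  size_t capacity_;
  size_t top_;
};
```

One consequence of this design: a reused slice holds stale values from an earlier use rather than zeros. That is enough to keep Valgrind quiet, since the memory is initialized, but any code that relies on the padding being zero would still need to clear it explicitly.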



