You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/07 23:57:02 UTC

[GitHub] [arrow] tachyonwill opened a new pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

tachyonwill opened a new pull request #12365:
URL: https://github.com/apache/arrow/pull/12365


   DeltaBitPackDecoder was using num_values_(which includes) null to compute batch size instead of total_value_count_. This lead to a failed check when comparing those counts. Changed to just use total_value_count_ and get rid of the check. Alternatively, we could throw an exception instead of the check; I think we only enter this state if the file is malformed (assuming no bugs elsewhere). 
   
   Also modified decode arrow to check the return value of DecodeInternal; this might be pointless as its callers only compare the returned value count as a DCHECK. It might be a good idea to standardize at what stage of decoding values should be compared to expected values and the behavior when they are not equal (preferably an exception and not a check).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12365: PARQUET-2119: [C++] Fix DeltaBitPackDecoder fuzzer found issue

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#issuecomment-1032932871


   Benchmark runs are scheduled for baseline = 43efadb26d4bf3f8ec8d9d61f61d4652796229bf and contender = 09c8554d4189cec4da96f3a93266df5cb0718a41. 09c8554d4189cec4da96f3a93266df5cb0718a41 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/bba37131e8704d8492bc52fe12dd0e1f...c8211b7cf41a40cca565a197a796a8cd/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a4fc08cfe08f43b2b424a735e2fa2c71...54d9df2152204a508b8f149469ebc4f9/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/fe10e64a347a4b01b52cbba241e78ba5...973a1ca2e8eb4ead8f00e0d4d500dbe2/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/60ee836c72be467781c027ec041ee8ce...91e25e83cc5a442687ae28fd171ede5b/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tachyonwill commented on a change in pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
tachyonwill commented on a change in pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#discussion_r801181457



##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2121,12 +2121,12 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
       ParquetException::NYI("Delta bit pack DecodeArrow with null slots");
     }
     std::vector<T> values(num_values);
-    GetInternal(values.data(), num_values);
+    int decoded_count = GetInternal(values.data(), num_values);
     PARQUET_THROW_NOT_OK(out->Reserve(num_values));

Review comment:
       yes, changed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tachyonwill commented on a change in pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
tachyonwill commented on a change in pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#discussion_r801164114



##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2181,12 +2181,14 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
   }
 
   int GetInternal(T* buffer, int max_values) {
-    max_values = std::min(max_values, this->num_values_);
+    int total_value_count = static_cast<int>(std::min(
+        total_value_count_, static_cast<uint32_t>(std::numeric_limits<int>::max())));
+    max_values = std::min(max_values, total_value_count);
+    DCHECK_GE(max_values, 0);

Review comment:
       I worry that this is the wrong place to put it though. The basic contract of these decoders seems to be, 'give me how many values you want, I will give you min(what you want, what is left) and then return how many were actually given'. Changing this one to throw an exception if 'what you want > what is left' would make it unlike the others, even if I only expect the caller to request more than is what is available if the caller has a bug or the file is malformed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12365: PARQUET-2119: [C++] Fix DeltaBitPackDecoder fuzzer found issue

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#issuecomment-1032932871


   Benchmark runs are scheduled for baseline = 43efadb26d4bf3f8ec8d9d61f61d4652796229bf and contender = 09c8554d4189cec4da96f3a93266df5cb0718a41. 09c8554d4189cec4da96f3a93266df5cb0718a41 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/bba37131e8704d8492bc52fe12dd0e1f...c8211b7cf41a40cca565a197a796a8cd/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a4fc08cfe08f43b2b424a735e2fa2c71...54d9df2152204a508b8f149469ebc4f9/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/fe10e64a347a4b01b52cbba241e78ba5...973a1ca2e8eb4ead8f00e0d4d500dbe2/)
   [Finished :arrow_down:0.43% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/60ee836c72be467781c027ec041ee8ce...91e25e83cc5a442687ae28fd171ede5b/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12365: PARQUET-2119: [C++] Fix DeltaBitPackDecoder fuzzer found issue

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#issuecomment-1032932871


   Benchmark runs are scheduled for baseline = 43efadb26d4bf3f8ec8d9d61f61d4652796229bf and contender = 09c8554d4189cec4da96f3a93266df5cb0718a41. 09c8554d4189cec4da96f3a93266df5cb0718a41 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/bba37131e8704d8492bc52fe12dd0e1f...c8211b7cf41a40cca565a197a796a8cd/)
   [Failed] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a4fc08cfe08f43b2b424a735e2fa2c71...54d9df2152204a508b8f149469ebc4f9/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/fe10e64a347a4b01b52cbba241e78ba5...973a1ca2e8eb4ead8f00e0d4d500dbe2/)
   [Failed] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/60ee836c72be467781c027ec041ee8ce...91e25e83cc5a442687ae28fd171ede5b/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12365: PARQUET-2119: [C++] Fix DeltaBitPackDecoder fuzzer found issue

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#issuecomment-1032932871


   Benchmark runs are scheduled for baseline = 43efadb26d4bf3f8ec8d9d61f61d4652796229bf and contender = 09c8554d4189cec4da96f3a93266df5cb0718a41. 09c8554d4189cec4da96f3a93266df5cb0718a41 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/bba37131e8704d8492bc52fe12dd0e1f...c8211b7cf41a40cca565a197a796a8cd/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a4fc08cfe08f43b2b424a735e2fa2c71...54d9df2152204a508b8f149469ebc4f9/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/fe10e64a347a4b01b52cbba241e78ba5...973a1ca2e8eb4ead8f00e0d4d500dbe2/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/60ee836c72be467781c027ec041ee8ce...91e25e83cc5a442687ae28fd171ede5b/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tachyonwill commented on a change in pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
tachyonwill commented on a change in pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#discussion_r801183544



##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2181,12 +2181,14 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
   }
 
   int GetInternal(T* buffer, int max_values) {
-    max_values = std::min(max_values, this->num_values_);
+    int total_value_count = static_cast<int>(std::min(
+        total_value_count_, static_cast<uint32_t>(std::numeric_limits<int>::max())));
+    max_values = std::min(max_values, total_value_count);
+    DCHECK_GE(max_values, 0);

Review comment:
       On this subject, I noticed that DeltaByteArrayDecoder was only looking at the number of valid values in its prefix sub-decoder but not its suffix decoder; I added an exception if the suffix sub-decoder does not return the correct number of values.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tachyonwill commented on pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
tachyonwill commented on pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#issuecomment-1032101335


   > Is there a fuzzer file to go along with this?
   
   Yes, https://github.com/apache/arrow-testing/pull/75 . I didn't include the submodule change with this PR to avoid last time's weirdness. Can send another PR with just the submodule update once both are merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12365: PARQUET-2119: [C++] Fix DeltaBitPackDecoder fuzzer found issue

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#issuecomment-1032932871


   Benchmark runs are scheduled for baseline = 43efadb26d4bf3f8ec8d9d61f61d4652796229bf and contender = 09c8554d4189cec4da96f3a93266df5cb0718a41. 09c8554d4189cec4da96f3a93266df5cb0718a41 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/bba37131e8704d8492bc52fe12dd0e1f...c8211b7cf41a40cca565a197a796a8cd/)
   [Finished :arrow_down:0.13% :arrow_up:0.04%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a4fc08cfe08f43b2b424a735e2fa2c71...54d9df2152204a508b8f149469ebc4f9/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/fe10e64a347a4b01b52cbba241e78ba5...973a1ca2e8eb4ead8f00e0d4d500dbe2/)
   [Finished :arrow_down:0.43% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/60ee836c72be467781c027ec041ee8ce...91e25e83cc5a442687ae28fd171ede5b/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#issuecomment-1032062612


   https://issues.apache.org/jira/browse/PARQUET-2119


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#discussion_r801160571



##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2181,12 +2181,14 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
   }
 
   int GetInternal(T* buffer, int max_values) {
-    max_values = std::min(max_values, this->num_values_);
+    int total_value_count = static_cast<int>(std::min(
+        total_value_count_, static_cast<uint32_t>(std::numeric_limits<int>::max())));
+    max_values = std::min(max_values, total_value_count);
+    DCHECK_GE(max_values, 0);

Review comment:
       I think if this can result from a malformed file we should throw an exception (same in general for any checks in the parquet code per your point in the CL description.).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#discussion_r801589517



##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2424,7 +2426,12 @@ class DeltaByteArrayDecoder : public DecoderImpl,
       return max_values;
     }
 
-    suffix_decoder_.Decode(buffer, max_values);
+    int suffix_read = suffix_decoder_.Decode(buffer, max_values);
+    if (ARROW_PREDICT_FALSE(suffix_read != max_values)) {
+      ParquetException::EofException("Read " + std::to_string(suffix_read) +
+                                     "Expecting " + std::to_string(max_values) +
+                                     " from suffix decoder");

Review comment:
       ```suggestion
         ParquetException::EofException("Read " + std::to_string(suffix_read) +
                                        ", expecting " + std::to_string(max_values) +
                                        " from suffix decoder");
   ```

##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2181,12 +2181,14 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
   }
 
   int GetInternal(T* buffer, int max_values) {
-    max_values = std::min(max_values, this->num_values_);
+    int total_value_count = static_cast<int>(std::min(
+        total_value_count_, static_cast<uint32_t>(std::numeric_limits<int>::max())));
+    max_values = std::min(max_values, total_value_count);
+    DCHECK_GE(max_values, 0);

Review comment:
       What is this check for? Is there a situation where `max_values` can be negative (except the caller passing a negative value)?

##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2181,12 +2181,14 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
   }
 
   int GetInternal(T* buffer, int max_values) {
-    max_values = std::min(max_values, this->num_values_);
+    int total_value_count = static_cast<int>(std::min(
+        total_value_count_, static_cast<uint32_t>(std::numeric_limits<int>::max())));
+    max_values = std::min(max_values, total_value_count);

Review comment:
       This is extremely convoluted. How about:
   ```suggestion
       max_values = static_cast<int>(std::min<int64_t>(max_values, total_value_count_));
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tachyonwill commented on a change in pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
tachyonwill commented on a change in pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#discussion_r801236398



##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2181,12 +2181,14 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
   }
 
   int GetInternal(T* buffer, int max_values) {
-    max_values = std::min(max_values, this->num_values_);
+    int total_value_count = static_cast<int>(std::min(
+        total_value_count_, static_cast<uint32_t>(std::numeric_limits<int>::max())));
+    max_values = std::min(max_values, total_value_count);
+    DCHECK_GE(max_values, 0);

Review comment:
       I will probably second a followup PR to this one to change some of those DCHECKs in the calling code to exceptions




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#discussion_r801164860



##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2181,12 +2181,14 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
   }
 
   int GetInternal(T* buffer, int max_values) {
-    max_values = std::min(max_values, this->num_values_);
+    int total_value_count = static_cast<int>(std::min(
+        total_value_count_, static_cast<uint32_t>(std::numeric_limits<int>::max())));
+    max_values = std::min(max_values, total_value_count);
+    DCHECK_GE(max_values, 0);

Review comment:
       Make sense.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12365: PARQUET-2119: [C++] Fix DeltaBitPackDecoder fuzzer found issue

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#issuecomment-1032932871


   Benchmark runs are scheduled for baseline = 43efadb26d4bf3f8ec8d9d61f61d4652796229bf and contender = 09c8554d4189cec4da96f3a93266df5cb0718a41. 09c8554d4189cec4da96f3a93266df5cb0718a41 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/bba37131e8704d8492bc52fe12dd0e1f...c8211b7cf41a40cca565a197a796a8cd/)
   [Finished :arrow_down:0.13% :arrow_up:0.04%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a4fc08cfe08f43b2b424a735e2fa2c71...54d9df2152204a508b8f149469ebc4f9/)
   [Failed :arrow_down:0.36% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/fe10e64a347a4b01b52cbba241e78ba5...973a1ca2e8eb4ead8f00e0d4d500dbe2/)
   [Finished :arrow_down:0.43% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/60ee836c72be467781c027ec041ee8ce...91e25e83cc5a442687ae28fd171ede5b/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12365: PARQUET-2119: [C++] Fix DeltaBitPackDecoder fuzzer found issue

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#issuecomment-1032932871


   Benchmark runs are scheduled for baseline = 43efadb26d4bf3f8ec8d9d61f61d4652796229bf and contender = 09c8554d4189cec4da96f3a93266df5cb0718a41. 09c8554d4189cec4da96f3a93266df5cb0718a41 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/bba37131e8704d8492bc52fe12dd0e1f...c8211b7cf41a40cca565a197a796a8cd/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a4fc08cfe08f43b2b424a735e2fa2c71...54d9df2152204a508b8f149469ebc4f9/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/fe10e64a347a4b01b52cbba241e78ba5...973a1ca2e8eb4ead8f00e0d4d500dbe2/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/60ee836c72be467781c027ec041ee8ce...91e25e83cc5a442687ae28fd171ede5b/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #12365: PARQUET-2119: [C++] Fix DeltaBitPackDecoder fuzzer found issue

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #12365:
URL: https://github.com/apache/arrow/pull/12365


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#discussion_r801165534



##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2121,12 +2121,12 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
       ParquetException::NYI("Delta bit pack DecodeArrow with null slots");
     }
     std::vector<T> values(num_values);
-    GetInternal(values.data(), num_values);
+    int decoded_count = GetInternal(values.data(), num_values);
     PARQUET_THROW_NOT_OK(out->Reserve(num_values));

Review comment:
       should this be decoded count now?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #12365: PARQUET-2119: Fix DeltaBitPackDecoder fuzzer found issue.

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#issuecomment-1032076644


   Is there  a fuzzer file to go along with this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] tachyonwill commented on a change in pull request #12365: PARQUET-2119: [C++] Fix DeltaBitPackDecoder fuzzer found issue

Posted by GitBox <gi...@apache.org>.
tachyonwill commented on a change in pull request #12365:
URL: https://github.com/apache/arrow/pull/12365#discussion_r801874153



##########
File path: cpp/src/parquet/encoding.cc
##########
@@ -2181,12 +2181,14 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
   }
 
   int GetInternal(T* buffer, int max_values) {
-    max_values = std::min(max_values, this->num_values_);
+    int total_value_count = static_cast<int>(std::min(
+        total_value_count_, static_cast<uint32_t>(std::numeric_limits<int>::max())));
+    max_values = std::min(max_values, total_value_count);
+    DCHECK_GE(max_values, 0);

Review comment:
       The caller passing a negative value is what I had in mind. Removed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org