You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/10 12:29:54 UTC

[GitHub] [arrow] sfc-gh-nthimmegowda opened a new pull request, #14359: ARROW-17450 : [C++][Parquet] Add support for uint8 boolean decode in addition to bool array

sfc-gh-nthimmegowda opened a new pull request, #14359:
URL: https://github.com/apache/arrow/pull/14359

   Commit [466018084861f86a9171803c6d689e9c1c1efb5a](https://github.com/apache/arrow/commit/466018084861f86a9171803c6d689e9c1c1efb5a) added support for RLE boolean decoder. We refactored some additional code making it streamlined with other cases for decoder. 
   
   However there was a downstream dependency for a Decode function which taken in an array of `uint8` instead of `bool`. To not break any existing workload, adding back support for decode boolean datatype with array of `uint8`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #14359: ARROW-17450: [C++][Parquet] Add support for uint8 boolean decode in addition to bool array

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #14359:
URL: https://github.com/apache/arrow/pull/14359#issuecomment-1276992131

   Benchmark runs are scheduled for baseline = a02a3369da4c9f9df4d1dbb3c22af046a72d907a and contender = a9d2504b02f7c40a6c2dbed2a69ab6c447c1fa5b. a9d2504b02f7c40a6c2dbed2a69ab6c447c1fa5b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/12dff8009e414e619cf2e7d04c9dda92...85b4a52013c649ad8142fefcf35c0a39/)
   [Failed :arrow_down:1.67% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/0cdcddb8b4334504bba3de3f08bcb00f...b9f270c3b1ba4263a93c379a0a652da3/)
   [Failed :arrow_down:0.82% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/9f104a3aa6c24744b898bb7fe8ba5d21...3afd2c6c5ad348ca9d45ef459fcb65ff/)
   [Finished :arrow_down:0.36% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/46b209fc6e3a4977a9d9e613e19ef547...b86f57dfc5b24c729f07d2ae0992d622/)
   Buildkite builds:
   [Finished] [`a9d2504b` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1673)
   [Failed] [`a9d2504b` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1692)
   [Failed] [`a9d2504b` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1675)
   [Finished] [`a9d2504b` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1686)
   [Finished] [`a02a3369` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1672)
   [Failed] [`a02a3369` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1691)
   [Failed] [`a02a3369` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1674)
   [Finished] [`a02a3369` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1685)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #14359: ARROW-17450 : [C++][Parquet] Add support for uint8 boolean decode in addition to bool array

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14359:
URL: https://github.com/apache/arrow/pull/14359#issuecomment-1273256615

   https://issues.apache.org/jira/browse/ARROW-17450


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sfc-gh-nthimmegowda commented on pull request #14359: ARROW-17450 : [C++][Parquet] Add support for uint8 boolean decode in addition to bool array

Posted by GitBox <gi...@apache.org>.
sfc-gh-nthimmegowda commented on PR #14359:
URL: https://github.com/apache/arrow/pull/14359#issuecomment-1273242491

   @kou @wgtmac Can you guys please take a look. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sfc-gh-nthimmegowda commented on a diff in pull request #14359: ARROW-17450 : [C++][Parquet] Add support for uint8 boolean decode in addition to bool array

Posted by GitBox <gi...@apache.org>.
sfc-gh-nthimmegowda commented on code in PR #14359:
URL: https://github.com/apache/arrow/pull/14359#discussion_r992851767


##########
cpp/src/parquet/encoding.h:
##########
@@ -394,6 +394,12 @@ class DictDecoder : virtual public TypedDecoder<DType> {
 // ----------------------------------------------------------------------
 // TypedEncoder specializations, traits, and factory functions
 
+class BooleanDecoder : virtual public TypedDecoder<BooleanType> {
+ public:
+  using TypedDecoder<BooleanType>::Decode;
+  virtual int Decode(uint8_t* buffer, int max_values) = 0;

Review Comment:
   Done. Added a docstring. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14359: ARROW-17450: [C++][Parquet] Add support for uint8 boolean decode in addition to bool array

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14359:
URL: https://github.com/apache/arrow/pull/14359#discussion_r993338597


##########
cpp/src/parquet/encoding.h:
##########
@@ -394,6 +394,20 @@ class DictDecoder : virtual public TypedDecoder<DType> {
 // ----------------------------------------------------------------------
 // TypedEncoder specializations, traits, and factory functions
 
+class BooleanDecoder : virtual public TypedDecoder<BooleanType> {
+ public:
+  using TypedDecoder<BooleanType>::Decode;
+
+  /// \brief Decode values into a buffer
+  ///
+  /// \param[in] buffer destination for decoded values
+  /// This buffer contains bit-packed values.

Review Comment:
   ```suggestion
     /// This buffer will contain bit-packed values.
   ```



##########
cpp/src/parquet/encoding.h:
##########
@@ -394,6 +394,20 @@ class DictDecoder : virtual public TypedDecoder<DType> {
 // ----------------------------------------------------------------------
 // TypedEncoder specializations, traits, and factory functions
 
+class BooleanDecoder : virtual public TypedDecoder<BooleanType> {
+ public:
+  using TypedDecoder<BooleanType>::Decode;
+
+  /// \brief Decode values into a buffer

Review Comment:
   ```suggestion
     /// \brief Decode and bit-pack values into a buffer
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on pull request #14359: ARROW-17450 : [C++][Parquet] Add support for uint8 boolean decode in addition to bool array

Posted by GitBox <gi...@apache.org>.
wgtmac commented on PR #14359:
URL: https://github.com/apache/arrow/pull/14359#issuecomment-1273413920

   > @kou @wgtmac Can you guys please take a look. Thanks
   
   Thanks for adding the function back.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #14359: ARROW-17450: [C++][Parquet] Add support for uint8 boolean decode in addition to bool array

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #14359:
URL: https://github.com/apache/arrow/pull/14359#issuecomment-1276992263

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/9f104a3aa6c24744b898bb7fe8ba5d21...3afd2c6c5ad348ca9d45ef459fcb65ff/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14359: ARROW-17450 : [C++][Parquet] Add support for uint8 boolean decode in addition to bool array

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14359:
URL: https://github.com/apache/arrow/pull/14359#discussion_r991287728


##########
cpp/src/parquet/encoding.h:
##########
@@ -394,6 +394,12 @@ class DictDecoder : virtual public TypedDecoder<DType> {
 // ----------------------------------------------------------------------
 // TypedEncoder specializations, traits, and factory functions
 
+class BooleanDecoder : virtual public TypedDecoder<BooleanType> {
+ public:
+  using TypedDecoder<BooleanType>::Decode;
+  virtual int Decode(uint8_t* buffer, int max_values) = 0;

Review Comment:
   Can you add a docstring? It should be clear that `buffer` will contain bit-packed values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou merged pull request #14359: ARROW-17450: [C++][Parquet] Add support for uint8 boolean decode in addition to bool array

Posted by GitBox <gi...@apache.org>.
pitrou merged PR #14359:
URL: https://github.com/apache/arrow/pull/14359


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org