You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "mapleFU (via GitHub)" <gi...@apache.org> on 2023/03/22 04:50:19 UTC

[GitHub] [arrow] mapleFU opened a new pull request, #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

mapleFU opened a new pull request, #34676:
URL: https://github.com/apache/arrow/pull/34676

   <!--
   Thanks for opening a pull request!
   If this is your first pull request you can find detailed information on how 
   to contribute here:
     * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
     * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
   
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   -->
   
   ### Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Add boolean encoding benchmark for parquet.
   
   ### What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   Some benchmarks
   
   ### Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   No, they're benchmark
   
   ### Are there any user-facing changes?
   
   No
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
   -->
   <!-- **This PR includes breaking changes to public APIs.** -->
   
   <!--
   Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
   -->
   <!-- **This PR contains a "Critical Fix".** -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478968133

   I guess boost official uses https://boostorg.jfrog.io/artifactory/main/release/1.81.0/source/ , and we have:
   
   ```
   if(DEFINED ENV{ARROW_BOOST_URL})
     set(BOOST_SOURCE_URL "$ENV{ARROW_BOOST_URL}")
   else()
     string(REPLACE "." "_" ARROW_BOOST_BUILD_VERSION_UNDERSCORES
                    ${ARROW_BOOST_BUILD_VERSION})
     set_urls(BOOST_SOURCE_URL
              # These are trimmed boost bundles we maintain.
              # See cpp/build-support/trim-boost.sh
              # FIXME(ARROW-6407) automate uploading this archive to ensure it reflects
              # our currently used packages and doesn't fall out of sync with
              # ${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}
              "${THIRDPARTY_MIRROR_URL}/boost_${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}.tar.gz"
              "https://boostorg.jfrog.io/artifactory/main/release/${ARROW_BOOST_BUILD_VERSION}/source/boost_${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}.tar.gz"
              "https://sourceforge.net/projects/boost/files/boost/${ARROW_BOOST_BUILD_VERSION}/boost_${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}.tar.gz"
     )
   endif()
   ```
   
   Maybe hash for them is inconsistent, we can track it in [#34675](https://github.com/apache/arrow/issues/34675)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478940339

   @kou Mind take a look at boost hash mismatch problem? 
   1. https://github.com/apache/arrow/actions/runs/4486679768/jobs/7889388957?pr=34676
   2. https://github.com/apache/arrow/actions/runs/4486679773/jobs/7889390019?pr=34676
   
   Found another problem in issues: https://github.com/apache/arrow/issues/34675


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144236462


##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -279,22 +316,21 @@ static void BM_PlainDecodingSpaced(benchmark::State& state) {
   const auto null_percent = static_cast<double>(state.range(1)) / 10000.0;
 
   auto rand = ::arrow::random::RandomArrayGenerator(1923);
-  const auto array = rand.Numeric<ArrowType>(num_values, -100, 100, null_percent);
+  std::shared_ptr<::arrow::Array> array;
+  if constexpr (std::is_same_v<ParquetType, BooleanType>) {
+    array = rand.Boolean(num_values, /*true_probability*/0.5, null_percent);
+  } else {
+    array = rand.Numeric<ArrowType>(num_values, -100, 100, null_percent);
+  }
   const auto valid_bits = array->null_bitmap_data();
   const int null_count = static_cast<int>(array->null_count());
   const auto array_actual = ::arrow::internal::checked_pointer_cast<ArrayType>(array);
-  const auto raw_values = array_actual->raw_values();
-  // Guarantee the type cast between raw_values and input of PutSpaced.
-  static_assert(sizeof(CType) == sizeof(*raw_values), "Type mismatch");
-  // Cast only happens for BooleanType as it use UInt8 for the array data to match a bool*
-  // input to PutSpaced.
-  const auto src = reinterpret_cast<const CType*>(raw_values);
 
-  auto encoder = MakeTypedEncoder<ParquetType>(Encoding::PLAIN);
-  encoder->PutSpaced(src, num_values, valid_bits, 0);
+  auto encoder = MakeTypedEncoder<ParquetType>(encoding);
+  encoder->Put(*array);

Review Comment:
   Currently, this benchmark tests `GetSpaced`, so, when `Put` I just call `Put(array)`, and encoder itself will handle null and values



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478914164

   :warning: GitHub issue #34673 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478959072

   We can ignore these failures for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144763362


##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -254,23 +255,59 @@ static void BM_PlainEncodingSpaced(benchmark::State& state) {
   state.SetBytesProcessed(state.iterations() * num_values * sizeof(CType));
 }
 
+template <>
+void BM_EncodingSpaced<BooleanType>(benchmark::State& state, Encoding::type encoding) {
+  using CType = bool;
+
+  const int num_values = static_cast<int>(state.range(0));
+  const double null_percent = static_cast<double>(state.range(1)) / 10000.0;

Review Comment:
   For spaced tests, arguments come from here:
   
   ```
   static void BM_SpacedArgs(benchmark::internal::Benchmark* bench) {
     constexpr auto kPlainSpacedSize = 32 * 1024;  // 32k
   
     bench->Args({/*size*/ kPlainSpacedSize, /*null_in_ten_thousand*/ 1});
     bench->Args({/*size*/ kPlainSpacedSize, /*null_in_ten_thousand*/ 100});
     bench->Args({/*size*/ kPlainSpacedSize, /*null_in_ten_thousand*/ 1000});
     bench->Args({/*size*/ kPlainSpacedSize, /*null_in_ten_thousand*/ 5000});
     bench->Args({/*size*/ kPlainSpacedSize, /*null_in_ten_thousand*/ 10000});
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1481946369

   @wjones127 mind take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478914177

   On My M1 MacOS with Release:
   
   ```
   ----------------------------------------------------------------------------------------------------
   Benchmark                                          Time             CPU   Iterations UserCounters...
   ----------------------------------------------------------------------------------------------------
   BM_PlainEncodingBoolean/1024                    9482 ns         6847 ns       114183 bytes_per_second=142.633M/s
   BM_PlainEncodingBoolean/4096                   26418 ns        22688 ns        30173 bytes_per_second=172.173M/s
   BM_PlainEncodingBoolean/32768                 172280 ns       168138 ns         3905 bytes_per_second=185.86M/s
   BM_PlainEncodingBoolean/65536                 335606 ns       331450 ns         2115 bytes_per_second=188.565M/s
   BM_PlainDecodingBoolean/1024                   12075 ns        12052 ns        58222 bytes_per_second=81.0287M/s
   BM_PlainDecodingBoolean/4096                   46938 ns        46789 ns        14971 bytes_per_second=83.4862M/s
   BM_PlainDecodingBoolean/32768                 371347 ns       370281 ns         1882 bytes_per_second=84.3953M/s
   BM_PlainDecodingBoolean/65536                 751991 ns       746231 ns          950 bytes_per_second=83.7543M/s
   BM_PlainEncodingSpacedBoolean/32768/1         183979 ns       183310 ns         3758 bytes_per_second=170.476M/s null_percent=0.01
   BM_PlainEncodingSpacedBoolean/32768/100       190100 ns       189933 ns         3709 bytes_per_second=164.532M/s null_percent=1
   BM_PlainEncodingSpacedBoolean/32768/1000      234095 ns       233967 ns         2973 bytes_per_second=133.566M/s null_percent=10
   BM_PlainEncodingSpacedBoolean/32768/5000      269608 ns       268846 ns         2565 bytes_per_second=116.238M/s null_percent=50
   BM_PlainEncodingSpacedBoolean/32768/10000       2984 ns         2906 ns       246456 bytes_per_second=10.5026G/s null_percent=100
   BM_PlainDecodingSpacedBoolean/32768/1         376581 ns       375847 ns         1858 bytes_per_second=83.1456M/s null_percent=0.01
   BM_PlainDecodingSpacedBoolean/32768/100       388850 ns       388188 ns         1806 bytes_per_second=80.5022M/s null_percent=1
   BM_PlainDecodingSpacedBoolean/32768/1000      503674 ns       440648 ns         1675 bytes_per_second=70.9182M/s null_percent=10
   BM_PlainDecodingSpacedBoolean/32768/5000      454285 ns       395073 ns         1827 bytes_per_second=79.0993M/s null_percent=50
   BM_PlainDecodingSpacedBoolean/32768/10000        896 ns          769 ns       788075 bytes_per_second=39.678G/s null_percent=100
   BM_RleEncodingBoolean/1024                     21098 ns        14352 ns        50969 bytes_per_second=68.0455M/s
   BM_RleEncodingBoolean/4096                     77047 ns        53142 ns        12606 bytes_per_second=73.5053M/s
   BM_RleEncodingBoolean/32768                   499738 ns       420684 ns         1656 bytes_per_second=74.2838M/s
   BM_RleEncodingBoolean/65536                  1114495 ns       834759 ns          843 bytes_per_second=74.8719M/s
   BM_RleDecodingBoolean/1024                       646 ns          464 ns      1503998 bytes_per_second=2.05328G/s
   BM_RleDecodingBoolean/4096                       856 ns          582 ns      1000000 bytes_per_second=6.55999G/s
   BM_RleDecodingBoolean/32768                     1836 ns         1709 ns       418613 bytes_per_second=17.8598G/s
   BM_RleDecodingBoolean/65536                     3053 ns         2938 ns       217189 bytes_per_second=20.7755G/s
   BM_RleEncodingSpacedBoolean/32768/1           804563 ns       791332 ns          894 bytes_per_second=39.4904M/s null_percent=0.01
   BM_RleEncodingSpacedBoolean/32768/100         818675 ns       803809 ns          889 bytes_per_second=38.8774M/s null_percent=1
   BM_RleEncodingSpacedBoolean/32768/1000        800830 ns       791650 ns          869 bytes_per_second=39.4745M/s null_percent=10
   BM_RleEncodingSpacedBoolean/32768/5000        594831 ns       577282 ns         1218 bytes_per_second=54.1329M/s null_percent=50
   BM_RleEncodingSpacedBoolean/32768/10000         3830 ns         3483 ns       214975 bytes_per_second=8.76202G/s null_percent=100
   BM_RleDecodingSpacedBoolean/32768/1           414478 ns       405644 ns         1729 bytes_per_second=77.0379M/s null_percent=0.01
   BM_RleDecodingSpacedBoolean/32768/100         423042 ns       417850 ns         1635 bytes_per_second=74.7877M/s null_percent=1
   BM_RleDecodingSpacedBoolean/32768/1000        467885 ns       437760 ns         1594 bytes_per_second=71.3862M/s null_percent=10
   BM_RleDecodingSpacedBoolean/32768/5000        377334 ns       374812 ns         1863 bytes_per_second=83.3752M/s null_percent=50
   BM_RleDecodingSpacedBoolean/32768/10000          856 ns          788 ns       881335 bytes_per_second=38.7372G/s null_percent=100
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144737465


##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -254,23 +255,59 @@ static void BM_PlainEncodingSpaced(benchmark::State& state) {
   state.SetBytesProcessed(state.iterations() * num_values * sizeof(CType));
 }
 
+template <>
+void BM_EncodingSpaced<BooleanType>(benchmark::State& state, Encoding::type encoding) {
+  using CType = bool;
+
+  const int num_values = static_cast<int>(state.range(0));
+  const double null_percent = static_cast<double>(state.range(1)) / 10000.0;

Review Comment:
   Given that `state.range(1) = MAX_RANGE = 65536` we're always testing for `null_percent = 6.5536`? Perhaps we'd rather pass this explicitly and benchmark for `0.5` and `1`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1147084614


##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -254,23 +255,59 @@ static void BM_PlainEncodingSpaced(benchmark::State& state) {
   state.SetBytesProcessed(state.iterations() * num_values * sizeof(CType));
 }
 
+template <>
+void BM_EncodingSpaced<BooleanType>(benchmark::State& state, Encoding::type encoding) {
+  using CType = bool;
+
+  const int num_values = static_cast<int>(state.range(0));
+  const double null_percent = static_cast<double>(state.range(1)) / 10000.0;

Review Comment:
   Oh, sorry I didn't follow the whole path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144232519


##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -26,6 +26,7 @@
 #include "arrow/type.h"
 #include "arrow/util/byte_stream_split.h"
 

Review Comment:
   nit: remove empty line



##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -279,22 +316,21 @@ static void BM_PlainDecodingSpaced(benchmark::State& state) {
   const auto null_percent = static_cast<double>(state.range(1)) / 10000.0;
 
   auto rand = ::arrow::random::RandomArrayGenerator(1923);
-  const auto array = rand.Numeric<ArrowType>(num_values, -100, 100, null_percent);
+  std::shared_ptr<::arrow::Array> array;
+  if constexpr (std::is_same_v<ParquetType, BooleanType>) {
+    array = rand.Boolean(num_values, /*true_probability*/0.5, null_percent);
+  } else {
+    array = rand.Numeric<ArrowType>(num_values, -100, 100, null_percent);
+  }
   const auto valid_bits = array->null_bitmap_data();
   const int null_count = static_cast<int>(array->null_count());
   const auto array_actual = ::arrow::internal::checked_pointer_cast<ArrayType>(array);
-  const auto raw_values = array_actual->raw_values();
-  // Guarantee the type cast between raw_values and input of PutSpaced.
-  static_assert(sizeof(CType) == sizeof(*raw_values), "Type mismatch");
-  // Cast only happens for BooleanType as it use UInt8 for the array data to match a bool*
-  // input to PutSpaced.
-  const auto src = reinterpret_cast<const CType*>(raw_values);
 
-  auto encoder = MakeTypedEncoder<ParquetType>(Encoding::PLAIN);
-  encoder->PutSpaced(src, num_values, valid_bits, 0);
+  auto encoder = MakeTypedEncoder<ParquetType>(encoding);
+  encoder->Put(*array);

Review Comment:
   Why changing from `PutSpaced` to `Put`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478914413

   cc @wgtmac @wjones127 @rok 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1482201072

   @rok would you mind check and merge this patch ? :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144757473


##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -254,23 +255,59 @@ static void BM_PlainEncodingSpaced(benchmark::State& state) {
   state.SetBytesProcessed(state.iterations() * num_values * sizeof(CType));
 }
 
+template <>
+void BM_EncodingSpaced<BooleanType>(benchmark::State& state, Encoding::type encoding) {
+  using CType = bool;
+
+  const int num_values = static_cast<int>(state.range(0));
+  const double null_percent = static_cast<double>(state.range(1)) / 10000.0;

Review Comment:
   These code was modified from origin test, and only `state.range(0) = MAX_RANGE` is possible. See https://github.com/apache/arrow/pull/34676#issuecomment-1478914177 . `state.range(1)` is null percent



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144236539


##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -26,6 +26,7 @@
 #include "arrow/type.h"
 #include "arrow/util/byte_stream_split.h"
 

Review Comment:
   Ok



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144737465


##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -254,23 +255,59 @@ static void BM_PlainEncodingSpaced(benchmark::State& state) {
   state.SetBytesProcessed(state.iterations() * num_values * sizeof(CType));
 }
 
+template <>
+void BM_EncodingSpaced<BooleanType>(benchmark::State& state, Encoding::type encoding) {
+  using CType = bool;
+
+  const int num_values = static_cast<int>(state.range(0));
+  const double null_percent = static_cast<double>(state.range(1)) / 10000.0;

Review Comment:
   Given that `state.range(1) = MAX_RANGE = 65536` we're always testing for `null_percent = 6.5536`? Perhaps we'd rather pass this explicitly and benchmark for `0.5` and `1`? I might be misunderstanding this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478914144

   * Closes: #34673


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok merged pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok merged PR #34676:
URL: https://github.com/apache/arrow/pull/34676


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org