You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "mapleFU (via GitHub)" <gi...@apache.org> on 2023/03/22 04:50:19 UTC
[GitHub] [arrow] mapleFU opened a new pull request, #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
mapleFU opened a new pull request, #34676:
URL: https://github.com/apache/arrow/pull/34676
<!--
Thanks for opening a pull request!
If this is your first pull request you can find detailed information on how
to contribute here:
* [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
* [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
Then could you also rename the pull request title in the following format?
GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
or
MINOR: [${COMPONENT}] ${SUMMARY}
In the case of PARQUET issues on JIRA the title also supports:
PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
-->
### Rationale for this change
<!--
Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
-->
Add boolean encoding benchmark for parquet.
### What changes are included in this PR?
<!--
There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
-->
Some benchmarks
### Are these changes tested?
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code
If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
-->
No, they're benchmark
### Are there any user-facing changes?
No
<!--
If there are user-facing changes then we may require documentation to be updated before approving the PR.
-->
<!--
If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
-->
<!-- **This PR includes breaking changes to public APIs.** -->
<!--
Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
-->
<!-- **This PR contains a "Critical Fix".** -->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478968133
I guess boost official uses https://boostorg.jfrog.io/artifactory/main/release/1.81.0/source/ , and we have:
```
if(DEFINED ENV{ARROW_BOOST_URL})
set(BOOST_SOURCE_URL "$ENV{ARROW_BOOST_URL}")
else()
string(REPLACE "." "_" ARROW_BOOST_BUILD_VERSION_UNDERSCORES
${ARROW_BOOST_BUILD_VERSION})
set_urls(BOOST_SOURCE_URL
# These are trimmed boost bundles we maintain.
# See cpp/build-support/trim-boost.sh
# FIXME(ARROW-6407) automate uploading this archive to ensure it reflects
# our currently used packages and doesn't fall out of sync with
# ${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}
"${THIRDPARTY_MIRROR_URL}/boost_${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}.tar.gz"
"https://boostorg.jfrog.io/artifactory/main/release/${ARROW_BOOST_BUILD_VERSION}/source/boost_${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}.tar.gz"
"https://sourceforge.net/projects/boost/files/boost/${ARROW_BOOST_BUILD_VERSION}/boost_${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}.tar.gz"
)
endif()
```
Maybe hash for them is inconsistent, we can track it in [#34675](https://github.com/apache/arrow/issues/34675)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478940339
@kou Mind take a look at boost hash mismatch problem?
1. https://github.com/apache/arrow/actions/runs/4486679768/jobs/7889388957?pr=34676
2. https://github.com/apache/arrow/actions/runs/4486679773/jobs/7889390019?pr=34676
Found another problem in issues: https://github.com/apache/arrow/issues/34675
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144236462
##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -279,22 +316,21 @@ static void BM_PlainDecodingSpaced(benchmark::State& state) {
const auto null_percent = static_cast<double>(state.range(1)) / 10000.0;
auto rand = ::arrow::random::RandomArrayGenerator(1923);
- const auto array = rand.Numeric<ArrowType>(num_values, -100, 100, null_percent);
+ std::shared_ptr<::arrow::Array> array;
+ if constexpr (std::is_same_v<ParquetType, BooleanType>) {
+ array = rand.Boolean(num_values, /*true_probability*/0.5, null_percent);
+ } else {
+ array = rand.Numeric<ArrowType>(num_values, -100, 100, null_percent);
+ }
const auto valid_bits = array->null_bitmap_data();
const int null_count = static_cast<int>(array->null_count());
const auto array_actual = ::arrow::internal::checked_pointer_cast<ArrayType>(array);
- const auto raw_values = array_actual->raw_values();
- // Guarantee the type cast between raw_values and input of PutSpaced.
- static_assert(sizeof(CType) == sizeof(*raw_values), "Type mismatch");
- // Cast only happens for BooleanType as it use UInt8 for the array data to match a bool*
- // input to PutSpaced.
- const auto src = reinterpret_cast<const CType*>(raw_values);
- auto encoder = MakeTypedEncoder<ParquetType>(Encoding::PLAIN);
- encoder->PutSpaced(src, num_values, valid_bits, 0);
+ auto encoder = MakeTypedEncoder<ParquetType>(encoding);
+ encoder->Put(*array);
Review Comment:
Currently, this benchmark tests `GetSpaced`, so, when `Put` I just call `Put(array)`, and encoder itself will handle null and values
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478914164
:warning: GitHub issue #34673 **has been automatically assigned in GitHub** to PR creator.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kou commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478959072
We can ignore these failures for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144763362
##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -254,23 +255,59 @@ static void BM_PlainEncodingSpaced(benchmark::State& state) {
state.SetBytesProcessed(state.iterations() * num_values * sizeof(CType));
}
+template <>
+void BM_EncodingSpaced<BooleanType>(benchmark::State& state, Encoding::type encoding) {
+ using CType = bool;
+
+ const int num_values = static_cast<int>(state.range(0));
+ const double null_percent = static_cast<double>(state.range(1)) / 10000.0;
Review Comment:
For spaced tests, arguments come from here:
```
static void BM_SpacedArgs(benchmark::internal::Benchmark* bench) {
constexpr auto kPlainSpacedSize = 32 * 1024; // 32k
bench->Args({/*size*/ kPlainSpacedSize, /*null_in_ten_thousand*/ 1});
bench->Args({/*size*/ kPlainSpacedSize, /*null_in_ten_thousand*/ 100});
bench->Args({/*size*/ kPlainSpacedSize, /*null_in_ten_thousand*/ 1000});
bench->Args({/*size*/ kPlainSpacedSize, /*null_in_ten_thousand*/ 5000});
bench->Args({/*size*/ kPlainSpacedSize, /*null_in_ten_thousand*/ 10000});
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1481946369
@wjones127 mind take a look?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478914177
On My M1 MacOS with Release:
```
----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------
BM_PlainEncodingBoolean/1024 9482 ns 6847 ns 114183 bytes_per_second=142.633M/s
BM_PlainEncodingBoolean/4096 26418 ns 22688 ns 30173 bytes_per_second=172.173M/s
BM_PlainEncodingBoolean/32768 172280 ns 168138 ns 3905 bytes_per_second=185.86M/s
BM_PlainEncodingBoolean/65536 335606 ns 331450 ns 2115 bytes_per_second=188.565M/s
BM_PlainDecodingBoolean/1024 12075 ns 12052 ns 58222 bytes_per_second=81.0287M/s
BM_PlainDecodingBoolean/4096 46938 ns 46789 ns 14971 bytes_per_second=83.4862M/s
BM_PlainDecodingBoolean/32768 371347 ns 370281 ns 1882 bytes_per_second=84.3953M/s
BM_PlainDecodingBoolean/65536 751991 ns 746231 ns 950 bytes_per_second=83.7543M/s
BM_PlainEncodingSpacedBoolean/32768/1 183979 ns 183310 ns 3758 bytes_per_second=170.476M/s null_percent=0.01
BM_PlainEncodingSpacedBoolean/32768/100 190100 ns 189933 ns 3709 bytes_per_second=164.532M/s null_percent=1
BM_PlainEncodingSpacedBoolean/32768/1000 234095 ns 233967 ns 2973 bytes_per_second=133.566M/s null_percent=10
BM_PlainEncodingSpacedBoolean/32768/5000 269608 ns 268846 ns 2565 bytes_per_second=116.238M/s null_percent=50
BM_PlainEncodingSpacedBoolean/32768/10000 2984 ns 2906 ns 246456 bytes_per_second=10.5026G/s null_percent=100
BM_PlainDecodingSpacedBoolean/32768/1 376581 ns 375847 ns 1858 bytes_per_second=83.1456M/s null_percent=0.01
BM_PlainDecodingSpacedBoolean/32768/100 388850 ns 388188 ns 1806 bytes_per_second=80.5022M/s null_percent=1
BM_PlainDecodingSpacedBoolean/32768/1000 503674 ns 440648 ns 1675 bytes_per_second=70.9182M/s null_percent=10
BM_PlainDecodingSpacedBoolean/32768/5000 454285 ns 395073 ns 1827 bytes_per_second=79.0993M/s null_percent=50
BM_PlainDecodingSpacedBoolean/32768/10000 896 ns 769 ns 788075 bytes_per_second=39.678G/s null_percent=100
BM_RleEncodingBoolean/1024 21098 ns 14352 ns 50969 bytes_per_second=68.0455M/s
BM_RleEncodingBoolean/4096 77047 ns 53142 ns 12606 bytes_per_second=73.5053M/s
BM_RleEncodingBoolean/32768 499738 ns 420684 ns 1656 bytes_per_second=74.2838M/s
BM_RleEncodingBoolean/65536 1114495 ns 834759 ns 843 bytes_per_second=74.8719M/s
BM_RleDecodingBoolean/1024 646 ns 464 ns 1503998 bytes_per_second=2.05328G/s
BM_RleDecodingBoolean/4096 856 ns 582 ns 1000000 bytes_per_second=6.55999G/s
BM_RleDecodingBoolean/32768 1836 ns 1709 ns 418613 bytes_per_second=17.8598G/s
BM_RleDecodingBoolean/65536 3053 ns 2938 ns 217189 bytes_per_second=20.7755G/s
BM_RleEncodingSpacedBoolean/32768/1 804563 ns 791332 ns 894 bytes_per_second=39.4904M/s null_percent=0.01
BM_RleEncodingSpacedBoolean/32768/100 818675 ns 803809 ns 889 bytes_per_second=38.8774M/s null_percent=1
BM_RleEncodingSpacedBoolean/32768/1000 800830 ns 791650 ns 869 bytes_per_second=39.4745M/s null_percent=10
BM_RleEncodingSpacedBoolean/32768/5000 594831 ns 577282 ns 1218 bytes_per_second=54.1329M/s null_percent=50
BM_RleEncodingSpacedBoolean/32768/10000 3830 ns 3483 ns 214975 bytes_per_second=8.76202G/s null_percent=100
BM_RleDecodingSpacedBoolean/32768/1 414478 ns 405644 ns 1729 bytes_per_second=77.0379M/s null_percent=0.01
BM_RleDecodingSpacedBoolean/32768/100 423042 ns 417850 ns 1635 bytes_per_second=74.7877M/s null_percent=1
BM_RleDecodingSpacedBoolean/32768/1000 467885 ns 437760 ns 1594 bytes_per_second=71.3862M/s null_percent=10
BM_RleDecodingSpacedBoolean/32768/5000 377334 ns 374812 ns 1863 bytes_per_second=83.3752M/s null_percent=50
BM_RleDecodingSpacedBoolean/32768/10000 856 ns 788 ns 881335 bytes_per_second=38.7372G/s null_percent=100
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] rok commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144737465
##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -254,23 +255,59 @@ static void BM_PlainEncodingSpaced(benchmark::State& state) {
state.SetBytesProcessed(state.iterations() * num_values * sizeof(CType));
}
+template <>
+void BM_EncodingSpaced<BooleanType>(benchmark::State& state, Encoding::type encoding) {
+ using CType = bool;
+
+ const int num_values = static_cast<int>(state.range(0));
+ const double null_percent = static_cast<double>(state.range(1)) / 10000.0;
Review Comment:
Given that `state.range(1) = MAX_RANGE = 65536` we're always testing for `null_percent = 6.5536`? Perhaps we'd rather pass this explicitly and benchmark for `0.5` and `1`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] rok commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1147084614
##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -254,23 +255,59 @@ static void BM_PlainEncodingSpaced(benchmark::State& state) {
state.SetBytesProcessed(state.iterations() * num_values * sizeof(CType));
}
+template <>
+void BM_EncodingSpaced<BooleanType>(benchmark::State& state, Encoding::type encoding) {
+ using CType = bool;
+
+ const int num_values = static_cast<int>(state.range(0));
+ const double null_percent = static_cast<double>(state.range(1)) / 10000.0;
Review Comment:
Oh, sorry I didn't follow the whole path.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wgtmac commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144232519
##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -26,6 +26,7 @@
#include "arrow/type.h"
#include "arrow/util/byte_stream_split.h"
Review Comment:
nit: remove empty line
##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -279,22 +316,21 @@ static void BM_PlainDecodingSpaced(benchmark::State& state) {
const auto null_percent = static_cast<double>(state.range(1)) / 10000.0;
auto rand = ::arrow::random::RandomArrayGenerator(1923);
- const auto array = rand.Numeric<ArrowType>(num_values, -100, 100, null_percent);
+ std::shared_ptr<::arrow::Array> array;
+ if constexpr (std::is_same_v<ParquetType, BooleanType>) {
+ array = rand.Boolean(num_values, /*true_probability*/0.5, null_percent);
+ } else {
+ array = rand.Numeric<ArrowType>(num_values, -100, 100, null_percent);
+ }
const auto valid_bits = array->null_bitmap_data();
const int null_count = static_cast<int>(array->null_count());
const auto array_actual = ::arrow::internal::checked_pointer_cast<ArrayType>(array);
- const auto raw_values = array_actual->raw_values();
- // Guarantee the type cast between raw_values and input of PutSpaced.
- static_assert(sizeof(CType) == sizeof(*raw_values), "Type mismatch");
- // Cast only happens for BooleanType as it use UInt8 for the array data to match a bool*
- // input to PutSpaced.
- const auto src = reinterpret_cast<const CType*>(raw_values);
- auto encoder = MakeTypedEncoder<ParquetType>(Encoding::PLAIN);
- encoder->PutSpaced(src, num_values, valid_bits, 0);
+ auto encoder = MakeTypedEncoder<ParquetType>(encoding);
+ encoder->Put(*array);
Review Comment:
Why changing from `PutSpaced` to `Put`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478914413
cc @wgtmac @wjones127 @rok
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1482201072
@rok would you mind check and merge this patch ? :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144757473
##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -254,23 +255,59 @@ static void BM_PlainEncodingSpaced(benchmark::State& state) {
state.SetBytesProcessed(state.iterations() * num_values * sizeof(CType));
}
+template <>
+void BM_EncodingSpaced<BooleanType>(benchmark::State& state, Encoding::type encoding) {
+ using CType = bool;
+
+ const int num_values = static_cast<int>(state.range(0));
+ const double null_percent = static_cast<double>(state.range(1)) / 10000.0;
Review Comment:
These code was modified from origin test, and only `state.range(0) = MAX_RANGE` is possible. See https://github.com/apache/arrow/pull/34676#issuecomment-1478914177 . `state.range(1)` is null percent
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144236539
##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -26,6 +26,7 @@
#include "arrow/type.h"
#include "arrow/util/byte_stream_split.h"
Review Comment:
Ok
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] rok commented on a diff in pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34676:
URL: https://github.com/apache/arrow/pull/34676#discussion_r1144737465
##########
cpp/src/parquet/encoding_benchmark.cc:
##########
@@ -254,23 +255,59 @@ static void BM_PlainEncodingSpaced(benchmark::State& state) {
state.SetBytesProcessed(state.iterations() * num_values * sizeof(CType));
}
+template <>
+void BM_EncodingSpaced<BooleanType>(benchmark::State& state, Encoding::type encoding) {
+ using CType = bool;
+
+ const int num_values = static_cast<int>(state.range(0));
+ const double null_percent = static_cast<double>(state.range(1)) / 10000.0;
Review Comment:
Given that `state.range(1) = MAX_RANGE = 65536` we're always testing for `null_percent = 6.5536`? Perhaps we'd rather pass this explicitly and benchmark for `0.5` and `1`? I might be misunderstanding this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34676:
URL: https://github.com/apache/arrow/pull/34676#issuecomment-1478914144
* Closes: #34673
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] rok merged pull request #34676: GH-34673: [C++][Parquet] Add Boolean Encoding benchmark for parquet
Posted by "rok (via GitHub)" <gi...@apache.org>.
rok merged PR #34676:
URL: https://github.com/apache/arrow/pull/34676
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org