You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "mapleFU (via GitHub)" <gi...@apache.org> on 2023/10/06 09:26:53 UTC

[PR] GH-38042: [C++] Benchmark: Add benchmark for non-stream Codec Compression/Decompression [arrow]

mapleFU opened a new pull request, #38067:
URL: https://github.com/apache/arrow/pull/38067

   <!--
   Thanks for opening a pull request!
   If this is your first pull request you can find detailed information on how 
   to contribute here:
     * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
     * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
   
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   -->
   
   ### Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Currently, we will enable compression benchmark with ARROW_WITH_BENCHMARKS_REFERENCE
   
   Note that it only has benchmark for compressor ( make by Codec::MakeCompressor() ) and decompressor ( make by Codec::MakeDecompressor ). However, Parquet uses Codec to encode and decode. So, I'd like to add benchmarks that use Codec directly.
   
   ### What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   Add benchmark for direct compression and decompression
   
   ### Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   no need
   
   ### Are there any user-facing changes?
   
   no
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
   -->
   <!-- **This PR includes breaking changes to public APIs.** -->
   
   <!--
   Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
   -->
   <!-- **This PR contains a "Critical Fix".** -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #38067:
URL: https://github.com/apache/arrow/pull/38067#issuecomment-1778676712

   So we could have added LZ4 and Snappy here. @mapleFU Would you like to do that as a followup PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #38067:
URL: https://github.com/apache/arrow/pull/38067#issuecomment-1778683079

   It's just reasonable to benchmark all available codecs, not a subset of hem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #38067:
URL: https://github.com/apache/arrow/pull/38067#discussion_r1370279111


##########
cpp/src/arrow/util/compression_benchmark.cc:
##########
@@ -175,27 +206,56 @@ static void ReferenceStreamingDecompression(
   StreamingDecompression(COMPRESSION, data, state);
 }
 
+template <Compression::type COMPRESSION>
+static void ReferenceDecompression(
+    benchmark::State& state) {                        // NOLINT non-const reference
+  auto data = MakeCompressibleData(8 * 1024 * 1024);  // 8 MB
+
+  auto codec = *Codec::Create(COMPRESSION);
+
+  std::vector<uint8_t> compressed_data;
+  ARROW_UNUSED(NonStreamingCompress(codec.get(), data, &compressed_data));
+  state.counters["ratio"] =
+      static_cast<double>(data.size()) / static_cast<double>(compressed_data.size());
+
+  std::vector<uint8_t> decompressed_data(data);
+  while (state.KeepRunning()) {
+    auto result = codec->Decompress(compressed_data.size(), compressed_data.data(),
+                                    decompressed_data.size(), decompressed_data.data());
+    ARROW_CHECK(result.ok());
+    ARROW_CHECK(*result == static_cast<int64_t>(decompressed_data.size()));
+  }
+  state.SetBytesProcessed(state.iterations() * data.size());
+}
+
 #ifdef ARROW_WITH_ZLIB
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::GZIP);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::GZIP);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::GZIP);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::GZIP);
 #endif
 
 #ifdef ARROW_WITH_BROTLI
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::BROTLI);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::BROTLI);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::BROTLI);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::BROTLI);
 #endif
 
 #ifdef ARROW_WITH_ZSTD
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::ZSTD);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::ZSTD);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::ZSTD);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::ZSTD);
 #endif
 
 #ifdef ARROW_WITH_LZ4
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::LZ4_FRAME);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::LZ4_FRAME);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::LZ4_FRAME);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::LZ4_FRAME);

Review Comment:
   And I don't think they have too many differences...
   
   Currently I didn't add `LZ4`. But feel free to add if neccesssary



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++] Benchmark: Add benchmark for non-stream Codec Compression/Decompression [arrow]

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #38067:
URL: https://github.com/apache/arrow/pull/38067#issuecomment-1772145107

   @pitrou @kou Since parquet uses Compression/Decompression in `Codec`, I've add group of test here. Would you mind take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou merged PR #38067:
URL: https://github.com/apache/arrow/pull/38067


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #38067:
URL: https://github.com/apache/arrow/pull/38067#issuecomment-1778681560

   Let me rush it :-)
   
   (Just curiously, is it related to https://github.com/apache/arrow/issues/38389 ) ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #38067:
URL: https://github.com/apache/arrow/pull/38067#discussion_r1369004030


##########
cpp/src/arrow/util/compression_benchmark.cc:
##########
@@ -175,27 +206,56 @@ static void ReferenceStreamingDecompression(
   StreamingDecompression(COMPRESSION, data, state);
 }
 
+template <Compression::type COMPRESSION>
+static void ReferenceDecompression(
+    benchmark::State& state) {                        // NOLINT non-const reference
+  auto data = MakeCompressibleData(8 * 1024 * 1024);  // 8 MB
+
+  auto codec = *Codec::Create(COMPRESSION);
+
+  std::vector<uint8_t> compressed_data;
+  ARROW_UNUSED(NonStreamingCompress(codec.get(), data, &compressed_data));
+  state.counters["ratio"] =
+      static_cast<double>(data.size()) / static_cast<double>(compressed_data.size());
+
+  std::vector<uint8_t> decompressed_data(data);
+  while (state.KeepRunning()) {
+    auto result = codec->Decompress(compressed_data.size(), compressed_data.data(),
+                                    decompressed_data.size(), decompressed_data.data());
+    ARROW_CHECK(result.ok());
+    ARROW_CHECK(*result == static_cast<int64_t>(decompressed_data.size()));
+  }
+  state.SetBytesProcessed(state.iterations() * data.size());
+}
+
 #ifdef ARROW_WITH_ZLIB
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::GZIP);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::GZIP);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::GZIP);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::GZIP);
 #endif
 
 #ifdef ARROW_WITH_BROTLI
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::BROTLI);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::BROTLI);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::BROTLI);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::BROTLI);
 #endif
 
 #ifdef ARROW_WITH_ZSTD
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::ZSTD);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::ZSTD);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::ZSTD);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::ZSTD);
 #endif
 
 #ifdef ARROW_WITH_LZ4
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::LZ4_FRAME);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::LZ4_FRAME);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::LZ4_FRAME);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::LZ4_FRAME);

Review Comment:
   > It seems that Parquet doesn't use LZ4_FRAME
   
   Aha I remember parquet-mr first implement LZ4. And arrow implement a different version ( LZ4_FRAME ). `LZ4` stores an extra-length here.
   
   Maybe https://github.com/apache/parquet-format/pull/168 helps



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++] Benchmark: Add benchmark for non-stream Codec Compression/Decompression [arrow]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #38067:
URL: https://github.com/apache/arrow/pull/38067#issuecomment-1750275856

   :warning: GitHub issue #38042 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on code in PR #38067:
URL: https://github.com/apache/arrow/pull/38067#discussion_r1367496792


##########
cpp/src/arrow/util/compression_benchmark.cc:
##########
@@ -133,6 +132,38 @@ static void ReferenceStreamingCompression(
   StreamingCompression(COMPRESSION, data, state);
 }
 
+int64_t NonStreamingCompress(Codec* codec, const std::vector<uint8_t>& data,
+                             std::vector<uint8_t>* compressed_data) {
+  const uint8_t* input = data.data();
+  int64_t input_len = data.size();
+  int64_t compressed_size = 0;
+  int64_t max_compress_len =
+      codec->MaxCompressedLen(static_cast<int64_t>(data.size()), data.data());

Review Comment:
   Can we use defined variables here?
   
   ```suggestion
         codec->MaxCompressedLen(input_len, input);
   ```



##########
cpp/src/arrow/util/compression_benchmark.cc:
##########
@@ -133,6 +132,38 @@ static void ReferenceStreamingCompression(
   StreamingCompression(COMPRESSION, data, state);
 }
 
+int64_t NonStreamingCompress(Codec* codec, const std::vector<uint8_t>& data,
+                             std::vector<uint8_t>* compressed_data) {
+  const uint8_t* input = data.data();
+  int64_t input_len = data.size();
+  int64_t compressed_size = 0;
+  int64_t max_compress_len =
+      codec->MaxCompressedLen(static_cast<int64_t>(data.size()), data.data());
+  compressed_data->resize(max_compress_len);
+
+  if (input_len > 0) {
+    compressed_size = *codec->Compress(input_len, input, compressed_data->size(),
+                                       compressed_data->data());
+    compressed_data->resize(compressed_size);
+  }
+  return compressed_size;
+}
+
+template <Compression::type COMPRESSION>
+static void ReferenceCompression(benchmark::State& state) {  // NOLINT non-const reference
+  auto data = MakeCompressibleData(8 * 1024 * 1024);         // 8 MB
+
+  auto codec = *Codec::Create(COMPRESSION);
+
+  while (state.KeepRunning()) {
+    std::vector<uint8_t> compressed_data;
+    ARROW_UNUSED(NonStreamingCompress(codec.get(), data, &compressed_data));
+    state.counters["ratio"] =
+        static_cast<double>(data.size()) / static_cast<double>(compressed_data.size());

Review Comment:
   How about using return value?
   
   ```suggestion
       auto compressed_size = NonStreamingCompress(codec.get(), data, &compressed_data);
       state.counters["ratio"] =
           static_cast<double>(data.size()) / static_cast<double>(compressed_size);
   ```



##########
cpp/src/arrow/util/compression_benchmark.cc:
##########
@@ -133,6 +132,38 @@ static void ReferenceStreamingCompression(
   StreamingCompression(COMPRESSION, data, state);
 }
 
+int64_t NonStreamingCompress(Codec* codec, const std::vector<uint8_t>& data,

Review Comment:
   How about removing `NonStreaming` because `ReferenceCompression` doesn't have `NonStreaming` prefix?
   
   ```suggestion
   int64_t Compress(Codec* codec, const std::vector<uint8_t>& data,
   ```



##########
cpp/src/arrow/util/compression_benchmark.cc:
##########
@@ -175,27 +206,56 @@ static void ReferenceStreamingDecompression(
   StreamingDecompression(COMPRESSION, data, state);
 }
 
+template <Compression::type COMPRESSION>
+static void ReferenceDecompression(
+    benchmark::State& state) {                        // NOLINT non-const reference
+  auto data = MakeCompressibleData(8 * 1024 * 1024);  // 8 MB
+
+  auto codec = *Codec::Create(COMPRESSION);
+
+  std::vector<uint8_t> compressed_data;
+  ARROW_UNUSED(NonStreamingCompress(codec.get(), data, &compressed_data));
+  state.counters["ratio"] =
+      static_cast<double>(data.size()) / static_cast<double>(compressed_data.size());
+
+  std::vector<uint8_t> decompressed_data(data);
+  while (state.KeepRunning()) {
+    auto result = codec->Decompress(compressed_data.size(), compressed_data.data(),
+                                    decompressed_data.size(), decompressed_data.data());
+    ARROW_CHECK(result.ok());
+    ARROW_CHECK(*result == static_cast<int64_t>(decompressed_data.size()));
+  }
+  state.SetBytesProcessed(state.iterations() * data.size());
+}
+
 #ifdef ARROW_WITH_ZLIB
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::GZIP);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::GZIP);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::GZIP);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::GZIP);
 #endif
 
 #ifdef ARROW_WITH_BROTLI
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::BROTLI);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::BROTLI);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::BROTLI);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::BROTLI);
 #endif
 
 #ifdef ARROW_WITH_ZSTD
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::ZSTD);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::ZSTD);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::ZSTD);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::ZSTD);
 #endif
 
 #ifdef ARROW_WITH_LZ4
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::LZ4_FRAME);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::LZ4_FRAME);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::LZ4_FRAME);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::LZ4_FRAME);

Review Comment:
   Is `LZ4_FRAME` OK?
   It seems that Parquet doesn't use `LZ4_FRAME`.



##########
cpp/src/arrow/util/compression_benchmark.cc:
##########
@@ -133,6 +132,38 @@ static void ReferenceStreamingCompression(
   StreamingCompression(COMPRESSION, data, state);
 }
 
+int64_t NonStreamingCompress(Codec* codec, const std::vector<uint8_t>& data,
+                             std::vector<uint8_t>* compressed_data) {
+  const uint8_t* input = data.data();
+  int64_t input_len = data.size();
+  int64_t compressed_size = 0;
+  int64_t max_compress_len =

Review Comment:
   ```suggestion
     int64_t max_compressed_len =
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #38067:
URL: https://github.com/apache/arrow/pull/38067#discussion_r1368996548


##########
cpp/src/arrow/util/compression_benchmark.cc:
##########
@@ -175,27 +206,56 @@ static void ReferenceStreamingDecompression(
   StreamingDecompression(COMPRESSION, data, state);
 }
 
+template <Compression::type COMPRESSION>
+static void ReferenceDecompression(
+    benchmark::State& state) {                        // NOLINT non-const reference
+  auto data = MakeCompressibleData(8 * 1024 * 1024);  // 8 MB
+
+  auto codec = *Codec::Create(COMPRESSION);
+
+  std::vector<uint8_t> compressed_data;
+  ARROW_UNUSED(NonStreamingCompress(codec.get(), data, &compressed_data));
+  state.counters["ratio"] =
+      static_cast<double>(data.size()) / static_cast<double>(compressed_data.size());
+
+  std::vector<uint8_t> decompressed_data(data);
+  while (state.KeepRunning()) {
+    auto result = codec->Decompress(compressed_data.size(), compressed_data.data(),
+                                    decompressed_data.size(), decompressed_data.data());
+    ARROW_CHECK(result.ok());
+    ARROW_CHECK(*result == static_cast<int64_t>(decompressed_data.size()));
+  }
+  state.SetBytesProcessed(state.iterations() * data.size());
+}
+
 #ifdef ARROW_WITH_ZLIB
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::GZIP);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::GZIP);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::GZIP);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::GZIP);
 #endif
 
 #ifdef ARROW_WITH_BROTLI
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::BROTLI);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::BROTLI);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::BROTLI);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::BROTLI);
 #endif
 
 #ifdef ARROW_WITH_ZSTD
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::ZSTD);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::ZSTD);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::ZSTD);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::ZSTD);
 #endif
 
 #ifdef ARROW_WITH_LZ4
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::LZ4_FRAME);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::LZ4_FRAME);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::LZ4_FRAME);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::LZ4_FRAME);

Review Comment:
   We can even benchmark both LZ4 variants.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #38067:
URL: https://github.com/apache/arrow/pull/38067#discussion_r1369004030


##########
cpp/src/arrow/util/compression_benchmark.cc:
##########
@@ -175,27 +206,56 @@ static void ReferenceStreamingDecompression(
   StreamingDecompression(COMPRESSION, data, state);
 }
 
+template <Compression::type COMPRESSION>
+static void ReferenceDecompression(
+    benchmark::State& state) {                        // NOLINT non-const reference
+  auto data = MakeCompressibleData(8 * 1024 * 1024);  // 8 MB
+
+  auto codec = *Codec::Create(COMPRESSION);
+
+  std::vector<uint8_t> compressed_data;
+  ARROW_UNUSED(NonStreamingCompress(codec.get(), data, &compressed_data));
+  state.counters["ratio"] =
+      static_cast<double>(data.size()) / static_cast<double>(compressed_data.size());
+
+  std::vector<uint8_t> decompressed_data(data);
+  while (state.KeepRunning()) {
+    auto result = codec->Decompress(compressed_data.size(), compressed_data.data(),
+                                    decompressed_data.size(), decompressed_data.data());
+    ARROW_CHECK(result.ok());
+    ARROW_CHECK(*result == static_cast<int64_t>(decompressed_data.size()));
+  }
+  state.SetBytesProcessed(state.iterations() * data.size());
+}
+
 #ifdef ARROW_WITH_ZLIB
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::GZIP);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::GZIP);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::GZIP);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::GZIP);
 #endif
 
 #ifdef ARROW_WITH_BROTLI
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::BROTLI);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::BROTLI);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::BROTLI);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::BROTLI);
 #endif
 
 #ifdef ARROW_WITH_ZSTD
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::ZSTD);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::ZSTD);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::ZSTD);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::ZSTD);
 #endif
 
 #ifdef ARROW_WITH_LZ4
 BENCHMARK_TEMPLATE(ReferenceStreamingCompression, Compression::LZ4_FRAME);
+BENCHMARK_TEMPLATE(ReferenceCompression, Compression::LZ4_FRAME);
 BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::LZ4_FRAME);
+BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::LZ4_FRAME);

Review Comment:
   > It seems that Parquet doesn't use LZ4_FRAME
   
   Aha I remember parquet-mr first implement LZ4. And arrow implement a different version ( LZ4_FRAME ). `LZ4` stores an extra-length here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #38067:
URL: https://github.com/apache/arrow/pull/38067#issuecomment-1780544121

   After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 3be5e603dab1fbb48b5fae11df8e1d6d1cef9e03.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/18071374334) has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression [arrow]

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #38067:
URL: https://github.com/apache/arrow/pull/38067#issuecomment-1778716374

   @pitrou added https://github.com/apache/arrow/pull/38453


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org