You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/12/15 16:55:21 UTC

[GitHub] [nifi-minifi-cpp] lordgamez opened a new pull request, #1480: MINIFICPP-2007 Add compression options for flowfile and content repo

lordgamez opened a new pull request, #1480:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1480

   https://issues.apache.org/jira/browse/MINIFICPP-2007
   
   - Add bundled zstd and lz4 thirdparty libraries
   - Upgrade rocksdb to version 7.7.3
   - Add compression options bzip2, zlib, zstd and lz4 on Unix and xpress on Windows
   
   ----------------------
   Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
        in the commit message?
   
   - [ ] Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically main)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   ### For code changes:
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the LICENSE file?
   - [ ] If applicable, have you updated the NOTICE file?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI results for build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] szaszm closed pull request #1480: MINIFICPP-2007 Add compression options for flowfile and content repo

Posted by "szaszm (via GitHub)" <gi...@apache.org>.
szaszm closed pull request #1480: MINIFICPP-2007 Add compression options for flowfile and content repo
URL: https://github.com/apache/nifi-minifi-cpp/pull/1480


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] szaszm commented on a diff in pull request #1480: MINIFICPP-2007 Add compression options for flowfile and content repo

Posted by "szaszm (via GitHub)" <gi...@apache.org>.
szaszm commented on code in PR #1480:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1480#discussion_r1100041964


##########
extensions/rocksdb-repos/FlowFileRepository.h:
##########
@@ -131,11 +131,35 @@ class FlowFileRepository : public ThreadedRepository, public SwapManager {
     // To avoid DB write issues during heavy load it's recommended to have high number of buffer.
     // Rocksdb's stall feature can also trigger in case the number of buffers is >= 3.
     // The more buffers we have the more memory rocksdb can utilize without significant memory consumption under low load.
-    auto cf_options = [] (rocksdb::ColumnFamilyOptions& cf_opts) {
+    auto cf_options = [&configure] (rocksdb::ColumnFamilyOptions& cf_opts) {
       cf_opts.OptimizeForPointLookup(4);
       cf_opts.write_buffer_size = 8ULL << 20U;
       cf_opts.max_write_buffer_number = 20;
       cf_opts.min_write_buffer_number_to_merge = 1;
+      std::string value;
+      if (configure->get(Configure::nifi_flow_repository_rocksdb_compression, value) && !value.empty()) {
+#ifdef WIN32
+        if (value == "xpress") {
+          cf_opts.compression = rocksdb::CompressionType::kXpressCompression;
+        } else {
+          throw Exception(REPOSITORY_EXCEPTION, "RocksDB compression type not supported: " + value);
+        }
+#else
+        if (value == "zlib") {
+          cf_opts.compression = rocksdb::CompressionType::kZlibCompression;
+        } else if (value == "bzip2") {
+          cf_opts.compression = rocksdb::CompressionType::kBZip2Compression;

Review Comment:
   I propose to add an option like "auto" or "yes", which chooses a good default compression algorithm on all platforms. This way we can keep the config platform-independent, and people don't need to become experts in compression algorithms to make a good choice.
   On Windows, it obviously has to be xpress, since there's nothing else. On others, I'd go for zstd.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] lordgamez commented on a diff in pull request #1480: MINIFICPP-2007 Add compression options for flowfile and content repo

Posted by "lordgamez (via GitHub)" <gi...@apache.org>.
lordgamez commented on code in PR #1480:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1480#discussion_r1100228433


##########
extensions/rocksdb-repos/FlowFileRepository.h:
##########
@@ -131,11 +131,35 @@ class FlowFileRepository : public ThreadedRepository, public SwapManager {
     // To avoid DB write issues during heavy load it's recommended to have high number of buffer.
     // Rocksdb's stall feature can also trigger in case the number of buffers is >= 3.
     // The more buffers we have the more memory rocksdb can utilize without significant memory consumption under low load.
-    auto cf_options = [] (rocksdb::ColumnFamilyOptions& cf_opts) {
+    auto cf_options = [&configure] (rocksdb::ColumnFamilyOptions& cf_opts) {
       cf_opts.OptimizeForPointLookup(4);
       cf_opts.write_buffer_size = 8ULL << 20U;
       cf_opts.max_write_buffer_number = 20;
       cf_opts.min_write_buffer_number_to_merge = 1;
+      std::string value;
+      if (configure->get(Configure::nifi_flow_repository_rocksdb_compression, value) && !value.empty()) {
+#ifdef WIN32
+        if (value == "xpress") {
+          cf_opts.compression = rocksdb::CompressionType::kXpressCompression;
+        } else {
+          throw Exception(REPOSITORY_EXCEPTION, "RocksDB compression type not supported: " + value);
+        }
+#else
+        if (value == "zlib") {
+          cf_opts.compression = rocksdb::CompressionType::kZlibCompression;
+        } else if (value == "bzip2") {
+          cf_opts.compression = rocksdb::CompressionType::kBZip2Compression;

Review Comment:
   Great idea! Updated in 1d51fe6f83e37c7f9ae81306f55d920abc1f69e1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] szaszm commented on a diff in pull request #1480: MINIFICPP-2007 Add compression options for flowfile and content repo

Posted by "szaszm (via GitHub)" <gi...@apache.org>.
szaszm commented on code in PR #1480:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1480#discussion_r1108397217


##########
conf/minifi.properties:
##########
@@ -29,9 +29,11 @@ nifi.provenance.repository.max.storage.time=1 MIN
 nifi.provenance.repository.max.storage.size=1 MB
 nifi.flowfile.repository.directory.default=${MINIFI_HOME}/flowfile_repository
 nifi.flowfile.checkpoint.directory.default=${MINIFI_HOME}/flowfile_checkpoint
+# nifi.flowfile.repository.rocksdb.compression=auto

Review Comment:
   Is it possible to change these settings without losing existing repository data? So basically does it detect existing data in a different format and do conversion at startup? If not, is this something we could make work, possibly in a followup?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] lordgamez commented on a diff in pull request #1480: MINIFICPP-2007 Add compression options for flowfile and content repo

Posted by "lordgamez (via GitHub)" <gi...@apache.org>.
lordgamez commented on code in PR #1480:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1480#discussion_r1108469672


##########
conf/minifi.properties:
##########
@@ -29,9 +29,11 @@ nifi.provenance.repository.max.storage.time=1 MIN
 nifi.provenance.repository.max.storage.size=1 MB
 nifi.flowfile.repository.directory.default=${MINIFI_HOME}/flowfile_repository
 nifi.flowfile.checkpoint.directory.default=${MINIFI_HOME}/flowfile_checkpoint
+# nifi.flowfile.repository.rocksdb.compression=auto

Review Comment:
   I tried switching compression types between restarts and the flow continued where it left off previously. I could see in the rocksdb log files the used compression was changed to the new one in the minifi property file so it seems to me that rocksdb could handle the config change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org