Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2020/01/27 15:31:55 UTC

[GitHub] [nifi-minifi-cpp] arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …

arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r371309009
 
 

 ##########
 File path: extensions/rocksdb-repos/FlowFileRepository.h
 ##########
 @@ -103,6 +105,9 @@ class FlowFileRepository : public core::Repository, public std::enable_shared_fr
     options.create_if_missing = true;
     options.use_direct_io_for_flush_and_compaction = true;
     options.use_direct_reads = true;
+    options.write_buffer_size = 8 << 20;
 
 Review comment:
   This is the important part of this PR.
   
   RocksDB first records every operation as a separate event in an in-memory write buffer (the memtable). When the buffer fills up, these events are merged and serialized, and the resulting records are written to disk in RocksDB's regular on-disk structure.
   
   In our case this means two things:
   - During our regular use case the content of the buffer grows continuously, because creating and later deleting the same element adds two events to the buffer (log).
   - When the buffer is full and the events are merged, there is a CPU spike, and the result (nothing or nearly nothing in our case) is written to the underlying storage.
   
   As this buffer only contains flowfile metadata (attributes and content location), it fills quite slowly: at 10 flowfiles generated per second, it takes hours.
   
   To keep memory usage from climbing that high (the flowfile repository can consume more than the rest of MiNiFi combined when the buffer is close to full) and to get smaller CPU peaks, the buffer is now emptied more frequently. Under a decent load, filling it still takes minutes.
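   
   As a rough back-of-the-envelope check of the "hours" versus "minutes" figures, the sketch below estimates how long a write buffer takes to fill. The helper name `SecondsToFill`, the constant names, and the 300 bytes per event used in the usage note are assumptions made purely for illustration; they are not measured values and not part of this PR. The 64 MiB figure is RocksDB's default `write_buffer_size`, and 8 MiB is the value set in the diff above.

```cpp
#include <cassert>
#include <cstdint>

// RocksDB's default write_buffer_size (64 MiB) and the value set in this PR.
constexpr std::uint64_t kDefaultWriteBufferBytes = 64ULL << 20;
constexpr std::uint64_t kReducedWriteBufferBytes = 8ULL << 20;

// Hypothetical helper: estimated seconds until a buffer of `buffer_bytes`
// is full, assuming each flowfile adds two events (creation and deletion)
// of `bytes_per_event` bytes each. Per-entry overhead inside the memtable
// is ignored, so this is only an order-of-magnitude estimate.
inline double SecondsToFill(std::uint64_t buffer_bytes,
                            double flowfiles_per_sec,
                            double bytes_per_event) {
  assert(flowfiles_per_sec > 0.0 && bytes_per_event > 0.0);
  const double bytes_per_sec = 2.0 * flowfiles_per_sec * bytes_per_event;
  return static_cast<double>(buffer_bytes) / bytes_per_sec;
}
```

   With the assumed 300 bytes per event at 10 flowfiles / sec, `SecondsToFill(kDefaultWriteBufferBytes, 10.0, 300.0)` comes out at roughly 3.1 hours, while `SecondsToFill(kReducedWriteBufferBytes, 10.0, 300.0)` is roughly 23 minutes. The real numbers depend on attribute sizes and memtable overhead.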
   
   For more information check this: https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#memtable
   (Note: write amplification doesn't matter in our case, as we usually persist a negligible amount of data because flowfiles leave the system within a very short time.)
   
   The logging added in this PR also reports the current write buffer usage, which helps with monitoring the peaks.
   
