You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2019/06/28 18:23:12 UTC

[GitHub] [nifi-minifi-cpp] achristianson opened a new pull request #603: MINIFICPP-929 mmap

achristianson opened a new pull request #603: MINIFICPP-929 mmap
URL: https://github.com/apache/nifi-minifi-cpp/pull/603
 
 
   ****--DRAFT PR please review... lots of code changes, so many eyes are welcome--****
   
   This PR adds a mmap() interface to allow processors to map FlowFile payloads to a memory address. This increases efficiency and performance significantly for some use cases. The change does not negatively impact performance in almost all cases, as shown in benchmarks.
   
   Original/full reason/justification:
   
   "Currently, MiNiFi - C++ only support stream-oriented i/o to FlowFile payloads. This can limit performance in cases where in-place access to the payload is desirable. In cases where data can be accessed randomly and in-place, a significant speedup can be realized by mapping the payload into system memory address space. This is natively supported at the kernel level in Linux, MacOS, and Windows via the mmap() interface on files. Other repositories, such as the VolatileRepository, already store the entire payload in memory, so it is natural to pass through this memory block as if it were a memory-mapped file. While the DatabaseContentRepostory does not appear to natively support a memory map interface, accesses via an emulated memory-map interface should be possible with no performance degradation with respect to a full read via the streaming interface.
   
   Cases where in-place, random access is beneficial include, but are not limited to:
   
       in-place parsing of JSON (e.g. RapidJSON supports parsing in-place, at least for strings).
       access of payload via protocol buffers
       random access of large files on disk, where it would otherwise require many seek() and read() syscalls
   
   The interface should be accessible by processors via a mmap() call on ProcessSession (adjacent to read() and write()). A MemoryMapCallback should be provided, which is called back via a process() call where the argument is an instance of BaseMemoryMap. The BaseMemoryMap is extended for each type of repository that MiNiFi - C++ supports, including: FileSystemRepository, VolatileRepository, and DatabaseContentRepository.
   
   As part of the change, in addition to extensive unit test coverage, benchmarks should be written such that the performance impact can be empirically measured and evaluated."
   
   Here is the full benchmark suite:
   
   ```
   -------------------------------------------------------------------------------------------------------------------
   Benchmark                                                                         Time             CPU   Iterations
   -------------------------------------------------------------------------------------------------------------------
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_Read_Tiny                  2956 ns         2923 ns       240558
   [2019-06-debugl 14:10:44.663] [org::apache::nifi::minifi::core::logging::LoggerConfiguration] [debug] org::apache::nifi::minifi::io::FileStream logger got sink
   s from namespace root and level error from namespace root
   FSMemoryMapBMFixture/Callback_FileSystemRepository_Read_Tiny                   4258 ns         4227 ns       164835
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_WriteRead_Tiny             7764 ns         7665 ns        91078
   FSMemoryMapBMFixture/Callback_FileSystemRepository_WriteRead_Tiny             14152 ns        14022 ns        49870
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_Read_Small                15671 ns        15631 ns        44870
   FSMemoryMapBMFixture/Callback_FileSystemRepository_Read_Small                 21020 ns        20977 ns        33246
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_WriteRead_Small           59944 ns        59772 ns        11701
   FSMemoryMapBMFixture/Callback_FileSystemRepository_WriteRead_Small            57354 ns        57152 ns        12237
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_Read_Large              3592536 ns      3587026 ns          194
   FSMemoryMapBMFixture/Callback_FileSystemRepository_Read_Large              17014790 ns     16979026 ns           41
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_WriteRead_Large        16578370 ns     16530633 ns           42
   FSMemoryMapBMFixture/Callback_FileSystemRepository_WriteRead_Large         26228637 ns     26159193 ns           27
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_RandomRead_Large           53.7 ns         53.7 ns     13026678
   FSMemoryMapBMFixture/Callback_FileSystemRepository_RandomRead_Large          170905 ns       170829 ns         4074
   [2019-06-debugl 14:10:56.874] [org::apache::nifi::minifi::core::logging::LoggerConfiguration] [debug] org::apache::nifi::minifi::core::Repository logger got si
   nks from namespace root and level error from namespace root
   [2019-06-debugl 14:10:56.874] [org::apache::nifi::minifi::core::logging::LoggerConfiguration] [debug] org::apache::nifi::minifi::core::repository::VolatileRepo
   sitory<std::shared_ptr<org::apache::nifi::minifi::ResourceClaim> > logger got sinks from namespace root and level error from namespace root
   [2019-06-debugl 14:10:56.874] [org::apache::nifi::minifi::core::logging::LoggerConfiguration] [debug] org::apache::nifi::minifi::core::repository::VolatileCont
   entRepository logger got sinks from namespace root and level error from namespace root
   [2019-06-debugl 14:10:56.874] [org::apache::nifi::minifi::core::logging::LoggerConfiguration] [debug] org::apache::nifi::minifi::io::AtomicEntryMemoryMap<std::
   shared_ptr<org::apache::nifi::minifi::ResourceClaim> > () logger got sinks from namespace root and level error from namespace root
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_Read_Tiny               267 ns          267 ns      2627306
   [2019-06-debugl 14:10:57.877] [org::apache::nifi::minifi::core::logging::LoggerConfiguration] [debug] org::apache::nifi::minifi::io::AtomicEntryStream<std::sha
   red_ptr<org::apache::nifi::minifi::ResourceClaim> > () logger got sinks from namespace root and level error from namespace root
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_Read_Tiny                360 ns          360 ns      1957163
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_WriteRead_Tiny          558 ns          558 ns      1255342
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_WriteRead_Tiny          1024 ns         1024 ns       682374
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_Read_Small             2654 ns         2653 ns       254700
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_Read_Small              7920 ns         7916 ns        96029
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_WriteRead_Small        7581 ns         7578 ns       105741
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_WriteRead_Small        11594 ns        11590 ns        60342
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_Read_Large          2438303 ns      2434904 ns          286
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_Read_Large          14859059 ns     14838872 ns           47
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_WriteRead_Large     5889984 ns      5879759 ns          119
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_WriteRead_Large     17126978 ns     17105183 ns           41
   [2019-06-debugl 14:11:07.870] [org::apache::nifi::minifi::core::logging::LoggerConfiguration] [debug] org::apache::nifi::minifi::core::repository::DatabaseCont
   entRepository logger got sinks from namespace root and level error from namespace root
   [2019-06-debugl 14:11:07.872] [org::apache::nifi::minifi::core::logging::LoggerConfiguration] [debug] org::apache::nifi::minifi::io::RocksDbStream logger got s
   inks from namespace root and level error from namespace root
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_Read_Tiny               285 ns          285 ns      2469053
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_Read_Tiny                254 ns          254 ns      2757993
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_WriteRead_Tiny         4573 ns         4571 ns       150453
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_WriteRead_Tiny          3553 ns         3551 ns       197460
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_Read_Small            12882 ns        12876 ns        54293
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_Read_Small             11930 ns        11925 ns        58679
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_WriteRead_Small       88615 ns        88436 ns         7935
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_WriteRead_Small        90748 ns        90548 ns         7717
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_Read_Large         26695310 ns     26666793 ns           26
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_Read_Large          26571426 ns     26544032 ns           26
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_WriteRead_Large    49532071 ns     49459516 ns           14
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_WriteRead_Large     62205023 ns     62085915 ns           12
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_RandomRead_Large       55.5 ns         55.4 ns     12612349
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_RandomRead_Large         514 ns          514 ns      1094727
   ```
   
   The benchmarks show a significant performance increase in almost all cases. Both the FS repository and volatile can natively support memory mapping, but the DB repo has to simulate it by reading the full object. This has almost no performance impact in most cases, but is somewhat slower for the "small" (131KB payload) benchmark cases. The random access benchmarks show the most significant increase, even with the DB repo.
   
   Caveats:
   
   - No Windows build yet, although it should be possible (https://docs.microsoft.com/en-us/windows/desktop/memory/file-mapping). I mainly need a set of Windows build instructions to test and validate against, as there's several possible ways to do a windows build.
   - There are some formatting changes due to clang-format (I included a .clang-format to hopefully reduce the issue going forward)
   - It's a fairly big code change so there could be some other things I missed
   - RocksDB is updated and needed RTTI to build on my machine. We can talk about this and/or extract out the rocks update.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services