Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2019/06/28 18:23:12 UTC
[GitHub] [nifi-minifi-cpp] achristianson opened a new pull request #603: MINIFICPP-929 mmap
URL: https://github.com/apache/nifi-minifi-cpp/pull/603
**DRAFT PR, please review. There are lots of code changes, so many eyes are welcome.**
This PR adds an mmap() interface that lets processors map FlowFile payloads to a memory address. This significantly improves performance for some use cases, particularly large reads and random access, and the benchmarks below show no meaningful regression in almost all other cases.
Original/full reason/justification:
"Currently, MiNiFi - C++ only supports stream-oriented I/O to FlowFile payloads. This can limit performance in cases where in-place access to the payload is desirable. In cases where data can be accessed randomly and in-place, a significant speedup can be realized by mapping the payload into the system memory address space. This is natively supported at the kernel level on Linux and macOS via the mmap() interface on files, and on Windows via its file-mapping API. Other repositories, such as the VolatileRepository, already store the entire payload in memory, so it is natural to pass through this memory block as if it were a memory-mapped file. While the DatabaseContentRepository does not appear to natively support a memory map interface, access via an emulated memory-map interface should be possible with no performance degradation relative to a full read via the streaming interface.
Cases where in-place, random access is beneficial include, but are not limited to:
- in-place parsing of JSON (e.g. RapidJSON supports parsing in-place, at least for strings)
- access of payload via protocol buffers
- random access of large files on disk, where it would otherwise require many seek() and read() syscalls
The interface should be accessible by processors via a mmap() call on ProcessSession (adjacent to read() and write()). A MemoryMapCallback should be provided, which is called back via a process() call where the argument is an instance of BaseMemoryMap. The BaseMemoryMap is extended for each type of repository that MiNiFi - C++ supports, including: FileSystemRepository, VolatileRepository, and DatabaseContentRepository.
As part of the change, in addition to extensive unit test coverage, benchmarks should be written such that the performance impact can be empirically measured and evaluated."
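To make the callback shape described above concrete, here is a minimal, self-contained sketch. Only the names BaseMemoryMap, MemoryMapCallback, and process() come from the description; the accessor names (getData, getSize), the BufferMemoryMap class, the UppercaseCallback example, and the free function mmapPayload() (standing in for a mmap() call on ProcessSession) are assumptions for illustration and may not match the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed shape of the mapped-payload abstraction; accessor names are hypothetical.
class BaseMemoryMap {
 public:
  virtual ~BaseMemoryMap() = default;
  virtual void* getData() = 0;        // start of the mapped payload
  virtual std::size_t getSize() = 0;  // payload length in bytes
};

// Hypothetical stand-in for a volatile (in-memory) repository:
// the payload buffer itself is handed out as the "map".
class BufferMemoryMap : public BaseMemoryMap {
 public:
  explicit BufferMemoryMap(std::vector<std::uint8_t>& buf) : buf_(buf) {}
  void* getData() override { return buf_.data(); }
  std::size_t getSize() override { return buf_.size(); }

 private:
  std::vector<std::uint8_t>& buf_;
};

// Callback invoked with the mapped payload, as in the description above.
class MemoryMapCallback {
 public:
  virtual ~MemoryMapCallback() = default;
  virtual std::int64_t process(BaseMemoryMap& map) = 0;
};

// Example callback: uppercases ASCII letters in place, with no copy of the payload.
class UppercaseCallback : public MemoryMapCallback {
 public:
  std::int64_t process(BaseMemoryMap& map) override {
    auto* bytes = static_cast<unsigned char*>(map.getData());
    assert(bytes != nullptr || map.getSize() == 0);
    for (std::size_t i = 0; i < map.getSize(); ++i) {
      if (bytes[i] >= 'a' && bytes[i] <= 'z') {
        bytes[i] = static_cast<unsigned char>(bytes[i] - ('a' - 'A'));
      }
    }
    return static_cast<std::int64_t>(map.getSize());
  }
};

// Hypothetical stand-in for the session-level mmap() call:
// builds a map over the payload and passes it to the callback.
std::int64_t mmapPayload(std::vector<std::uint8_t>& payload, MemoryMapCallback& cb) {
  BufferMemoryMap map(payload);
  return cb.process(map);
}
```

A caller would construct a callback and pass it together with the payload, for example `UppercaseCallback cb; mmapPayload(payload, cb);`. The callback works directly on the repository's memory, which is the property the stream-oriented read() and write() callbacks cannot offer.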
Here is the full benchmark suite:
```
-------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------------------
FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_Read_Tiny 2956 ns 2923 ns 240558
FSMemoryMapBMFixture/Callback_FileSystemRepository_Read_Tiny 4258 ns 4227 ns 164835
FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_WriteRead_Tiny 7764 ns 7665 ns 91078
FSMemoryMapBMFixture/Callback_FileSystemRepository_WriteRead_Tiny 14152 ns 14022 ns 49870
FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_Read_Small 15671 ns 15631 ns 44870
FSMemoryMapBMFixture/Callback_FileSystemRepository_Read_Small 21020 ns 20977 ns 33246
FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_WriteRead_Small 59944 ns 59772 ns 11701
FSMemoryMapBMFixture/Callback_FileSystemRepository_WriteRead_Small 57354 ns 57152 ns 12237
FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_Read_Large 3592536 ns 3587026 ns 194
FSMemoryMapBMFixture/Callback_FileSystemRepository_Read_Large 17014790 ns 16979026 ns 41
FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_WriteRead_Large 16578370 ns 16530633 ns 42
FSMemoryMapBMFixture/Callback_FileSystemRepository_WriteRead_Large 26228637 ns 26159193 ns 27
FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_RandomRead_Large 53.7 ns 53.7 ns 13026678
FSMemoryMapBMFixture/Callback_FileSystemRepository_RandomRead_Large 170905 ns 170829 ns 4074
VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_Read_Tiny 267 ns 267 ns 2627306
VolatileMemoryMapBMFixture/Callback_VolatileRepository_Read_Tiny 360 ns 360 ns 1957163
VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_WriteRead_Tiny 558 ns 558 ns 1255342
VolatileMemoryMapBMFixture/Callback_VolatileRepository_WriteRead_Tiny 1024 ns 1024 ns 682374
VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_Read_Small 2654 ns 2653 ns 254700
VolatileMemoryMapBMFixture/Callback_VolatileRepository_Read_Small 7920 ns 7916 ns 96029
VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_WriteRead_Small 7581 ns 7578 ns 105741
VolatileMemoryMapBMFixture/Callback_VolatileRepository_WriteRead_Small 11594 ns 11590 ns 60342
VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_Read_Large 2438303 ns 2434904 ns 286
VolatileMemoryMapBMFixture/Callback_VolatileRepository_Read_Large 14859059 ns 14838872 ns 47
VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_WriteRead_Large 5889984 ns 5879759 ns 119
VolatileMemoryMapBMFixture/Callback_VolatileRepository_WriteRead_Large 17126978 ns 17105183 ns 41
DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_Read_Tiny 285 ns 285 ns 2469053
DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_Read_Tiny 254 ns 254 ns 2757993
DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_WriteRead_Tiny 4573 ns 4571 ns 150453
DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_WriteRead_Tiny 3553 ns 3551 ns 197460
DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_Read_Small 12882 ns 12876 ns 54293
DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_Read_Small 11930 ns 11925 ns 58679
DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_WriteRead_Small 88615 ns 88436 ns 7935
DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_WriteRead_Small 90748 ns 90548 ns 7717
DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_Read_Large 26695310 ns 26666793 ns 26
DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_Read_Large 26571426 ns 26544032 ns 26
DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_WriteRead_Large 49532071 ns 49459516 ns 14
DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_WriteRead_Large 62205023 ns 62085915 ns 12
DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_RandomRead_Large 55.5 ns 55.4 ns 12612349
DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_RandomRead_Large 514 ns 514 ns 1094727
```
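The benchmarks show a significant speedup for the file system and volatile repositories in almost all cases. For example, a large read on the file system repository takes about 3.6 ms memory-mapped versus about 17.0 ms through the callback interface. Both of these repositories can support memory mapping natively, but the DB repo has to simulate it by reading the full object. This makes little difference in most DB cases, although the memory-mapped path is somewhat slower in the "tiny" cases and in the "small" (131 KB payload) read case. The random access benchmarks show the largest gains, even with the DB repo (55.5 ns versus 514 ns).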
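As a rough illustration of why random access gains the most, the sketch below contrasts the two access styles using plain POSIX calls on Linux or macOS. It is not the benchmark code from this PR: the helper names (makePatternFile, sumViaPread, sumViaMmap) and the test pattern are made up for the example, and pread() is used as a compact stand-in for a seek() followed by a read().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Creates an anonymous temp file of `size` bytes where byte i holds i % 251.
// Returns an open file descriptor, or -1 on failure.
int makePatternFile(std::size_t size) {
  char path[] = "/tmp/mmap_sketch_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return -1;
  unlink(path);  // the file is removed once the descriptor is closed
  std::vector<std::uint8_t> data(size);
  for (std::size_t i = 0; i < size; ++i) data[i] = static_cast<std::uint8_t>(i % 251);
  std::size_t written = 0;
  while (written < size) {
    ssize_t n = write(fd, data.data() + written, size - written);
    if (n <= 0) {
      close(fd);
      return -1;
    }
    written += static_cast<std::size_t>(n);
  }
  return fd;
}

// Stream-style random access: one syscall per byte fetched.
std::uint64_t sumViaPread(int fd, const std::vector<std::size_t>& offsets) {
  std::uint64_t sum = 0;
  for (std::size_t off : offsets) {
    std::uint8_t b = 0;
    if (pread(fd, &b, 1, static_cast<off_t>(off)) == 1) sum += b;
  }
  return sum;
}

// Mapped random access: one mmap() call, then plain memory loads.
std::uint64_t sumViaMmap(int fd, std::size_t size, const std::vector<std::size_t>& offsets) {
  void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  if (addr == MAP_FAILED) return 0;
  const auto* bytes = static_cast<const std::uint8_t*>(addr);
  std::uint64_t sum = 0;
  for (std::size_t off : offsets) {
    assert(off < size);  // reads past the mapping would be undefined behavior
    sum += bytes[off];
  }
  munmap(addr, size);
  return sum;
}
```

Both functions return the same sum for the same offsets. The difference is that the pread() version crosses into the kernel once per access, while the mapped version pays for the mapping once and then reads memory directly.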
Caveats:
- No Windows build yet, although it should be possible (https://docs.microsoft.com/en-us/windows/desktop/memory/file-mapping). I mainly need a set of Windows build instructions to test and validate against, as there are several possible ways to do a Windows build.
- There are some formatting changes due to clang-format (I included a .clang-format to hopefully reduce the issue going forward)
- It's a fairly big code change so there could be some other things I missed
- RocksDB is updated and needed RTTI to build on my machine. We can discuss this and/or split the RocksDB update out into a separate change.