Posted to issues@nifi.apache.org by "Andrew Christianson (JIRA)" <ji...@apache.org> on 2019/06/21 18:01:00 UTC
[jira] [Created] (MINIFICPP-929) Create memory map interface to flow files in ProcessSession/ContentRepository
Andrew Christianson created MINIFICPP-929:
---------------------------------------------
Summary: Create memory map interface to flow files in ProcessSession/ContentRepository
Key: MINIFICPP-929
URL: https://issues.apache.org/jira/browse/MINIFICPP-929
Project: Apache NiFi MiNiFi C++
Issue Type: Improvement
Reporter: Andrew Christianson
Assignee: Andrew Christianson
Currently, MiNiFi - C++ supports only stream-oriented I/O to FlowFile payloads. This can limit performance in cases where in-place access to the payload is desirable. Where data can be accessed randomly and in place, a significant speedup can be realized by mapping the payload into the process's virtual address space. File mapping is natively supported at the kernel level on Linux, macOS, and Windows (via mmap() on POSIX systems and CreateFileMapping()/MapViewOfFile() on Windows). Other repositories, such as the VolatileRepository, already store the entire payload in memory, so it is natural to pass this memory block through as if it were a memory-mapped file. While the DatabaseContentRepository does not appear to natively support a memory-map interface, access via an emulated memory-map interface should be possible with no performance degradation relative to a full read via the streaming interface.
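As a rough illustration of the file-backed case on POSIX systems only, the sketch below maps a file read-only and exposes it as a contiguous byte range. The class name ReadOnlyFileMap and its methods are hypothetical and are not part of MiNiFi - C++; a Windows variant would need CreateFileMapping()/MapViewOfFile() instead.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper around a read-only POSIX file mapping.
// Not an existing MiNiFi - C++ class; for illustration only.
class ReadOnlyFileMap {
 public:
  explicit ReadOnlyFileMap(const std::string& path) {
    const int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) {
      throw std::runtime_error("open failed: " + path);
    }
    struct stat st {};
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed: " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    if (size_ > 0) {  // mmap() rejects zero-length mappings
      void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("mmap failed: " + path);
      }
      data_ = static_cast<const unsigned char*>(p);
    }
    ::close(fd);  // the mapping remains valid after the descriptor is closed
  }

  ~ReadOnlyFileMap() {
    if (data_ != nullptr) {
      ::munmap(const_cast<unsigned char*>(data_), size_);
    }
  }

  ReadOnlyFileMap(const ReadOnlyFileMap&) = delete;
  ReadOnlyFileMap& operator=(const ReadOnlyFileMap&) = delete;

  const unsigned char* data() const { return data_; }
  std::size_t size() const { return size_; }

  // Random access to a single byte without any seek()/read() call.
  unsigned char at(std::size_t offset) const {
    assert(offset < size_);
    return data_[offset];
  }

 private:
  const unsigned char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

A caller would construct ReadOnlyFileMap with a path and then read through data() or at() for as long as the object is alive; the mapping is released in the destructor.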
Cases where in-place, random access is beneficial include, but are not limited to:
* in-place parsing of JSON (e.g. RapidJSON supports in-situ parsing of mutable string buffers)
* access to the payload via protocol buffers
* random access to large files on disk, which would otherwise require many seek() and read() syscalls
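To make the random-access case concrete, here is a minimal sketch that reads the n-th fixed-size record directly from a mapped byte range, with no seek() or read() call per record. The Record layout and the recordAt() helper are assumptions made for illustration and do not come from MiNiFi - C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-size record layout, assumed for this example only.
struct Record {
  std::uint32_t id;
  std::uint32_t value;
};

// Copies the record at `index` out of a mapped region starting at `base`
// and spanning `size` bytes. Returns false if the index is out of range.
inline bool recordAt(const std::uint8_t* base, std::size_t size,
                     std::size_t index, Record& out) {
  assert(base != nullptr);
  if (index >= size / sizeof(Record)) {
    return false;
  }
  // memcpy avoids alignment and strict-aliasing problems on the raw bytes.
  std::memcpy(&out, base + index * sizeof(Record), sizeof(Record));
  return true;
}
```

With a streaming interface, the same lookup would need a seek to the record offset followed by a read into a buffer, repeated for each record visited.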
The interface should be accessible to processors via an mmap() call on ProcessSession (adjacent to read() and write()). A MemoryMapCallback interface should be provided; the session invokes its process() method with an instance of BaseMemoryMap as the argument. BaseMemoryMap is extended for each type of repository that MiNiFi - C++ supports, including FileSystemRepository, VolatileRepository, and DatabaseContentRepository.
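One possible shape for this interface is sketched below. Only the names BaseMemoryMap, MemoryMapCallback, and process() come from the description above; the method names getData() and getSize(), the VolatileMemoryMap class, the free function mmapPayload() standing in for ProcessSession::mmap(), and the NewlineCounter callback are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Repository-independent view of a payload as contiguous memory.
// Method names are assumed, not taken from MiNiFi - C++.
class BaseMemoryMap {
 public:
  virtual ~BaseMemoryMap() = default;
  virtual std::uint8_t* getData() = 0;
  virtual std::size_t getSize() const = 0;
};

// Implemented by processors; invoked with the mapped payload.
class MemoryMapCallback {
 public:
  virtual ~MemoryMapCallback() = default;
  virtual std::int64_t process(const std::shared_ptr<BaseMemoryMap>& map) = 0;
};

// Hypothetical variant for an in-memory repository: it passes the
// existing buffer through instead of copying it.
class VolatileMemoryMap : public BaseMemoryMap {
 public:
  explicit VolatileMemoryMap(std::vector<std::uint8_t>& buffer)
      : buffer_(buffer) {}
  std::uint8_t* getData() override { return buffer_.data(); }
  std::size_t getSize() const override { return buffer_.size(); }

 private:
  std::vector<std::uint8_t>& buffer_;
};

// Stand-in for a ProcessSession::mmap() call: builds the map for the
// payload and hands it to the callback.
inline std::int64_t mmapPayload(std::vector<std::uint8_t>& payload,
                                MemoryMapCallback& callback) {
  auto map = std::make_shared<VolatileMemoryMap>(payload);
  return callback.process(map);
}

// Example callback: counts newline bytes in place.
class NewlineCounter : public MemoryMapCallback {
 public:
  std::int64_t count = 0;

  std::int64_t process(const std::shared_ptr<BaseMemoryMap>& map) override {
    assert(map != nullptr);
    const std::uint8_t* data = map->getData();
    for (std::size_t i = 0; i < map->getSize(); ++i) {
      if (data[i] == '\n') {
        ++count;
      }
    }
    return count;
  }
};
```

In this sketch a file-backed or database-backed repository would supply its own BaseMemoryMap subclass, and the callback code would stay unchanged.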
As part of the change, in addition to extensive unit test coverage, benchmarks should be written such that the performance impact can be empirically measured and evaluated.