Posted to issues@nifi.apache.org by "Andrew Christianson (JIRA)" <ji...@apache.org> on 2019/06/21 18:01:00 UTC

[jira] [Created] (MINIFICPP-929) Create memory map interface to flow files in ProcessSession/ContentRepository

Andrew Christianson created MINIFICPP-929:
---------------------------------------------

             Summary: Create memory map interface to flow files in ProcessSession/ContentRepository
                 Key: MINIFICPP-929
                 URL: https://issues.apache.org/jira/browse/MINIFICPP-929
             Project: Apache NiFi MiNiFi C++
          Issue Type: Improvement
            Reporter: Andrew Christianson
            Assignee: Andrew Christianson


Currently, MiNiFi - C++ only supports stream-oriented I/O to FlowFile payloads. This can limit performance in cases where in-place access to the payload is desirable. Where data can be accessed randomly and in place, a significant speedup can be realized by mapping the payload into the process's virtual address space. This is natively supported at the kernel level on Linux, macOS, and Windows via memory-mapped files (mmap() on POSIX systems, CreateFileMapping()/MapViewOfFile() on Windows). Other repositories, such as the VolatileRepository, already store the entire payload in memory, so it is natural to pass through this memory block as if it were a memory-mapped file. While the DatabaseContentRepository does not appear to natively support a memory map interface, access via an emulated memory-map interface should be possible with no performance degradation relative to a full read via the streaming interface.
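
As a rough illustration of the file-backed case, the following is a minimal POSIX-only sketch of mapping a file read-only. ReadOnlyFileMap is a hypothetical name used for illustration and is not an existing MiNiFi class. A Windows build would need CreateFileMapping()/MapViewOfFile() instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper (not a MiNiFi class): maps a whole file read-only.
class ReadOnlyFileMap {
 public:
  explicit ReadOnlyFileMap(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) {
      throw std::runtime_error("open failed: " + path);
    }
    struct stat st {};
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed: " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    if (size_ > 0) {  // mmap() rejects zero-length mappings
      void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("mmap failed: " + path);
      }
      data_ = static_cast<const unsigned char*>(p);
    }
    ::close(fd);  // the mapping remains valid after the descriptor is closed
  }

  ~ReadOnlyFileMap() {
    if (data_ != nullptr) {
      ::munmap(const_cast<unsigned char*>(data_), size_);
    }
  }

  ReadOnlyFileMap(const ReadOnlyFileMap&) = delete;
  ReadOnlyFileMap& operator=(const ReadOnlyFileMap&) = delete;

  const unsigned char* data() const { return data_; }
  std::size_t size() const { return size_; }

  // Random access is a plain memory read; no seek()/read() call is issued.
  unsigned char at(std::size_t offset) const {
    assert(offset < size_);
    return data_[offset];
  }

 private:
  const unsigned char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

The kernel pages the file in on demand, so only the regions actually touched are read from disk.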

Cases where in-place, random access is beneficial include, but are not limited to:
 * in-place parsing of JSON (e.g. RapidJSON supports parsing in-place, at least for strings)
 * access to the payload via protocol buffers
 * random access to large files on disk, which would otherwise require many seek() and read() syscalls
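
To make the random-access case concrete, here is a small sketch that assumes the payload is already available as one contiguous block (for example, a mapped region). The Record layout and the record_at() helper are hypothetical and chosen only for illustration. Fields are read in host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical fixed-size record layout, for illustration only.
struct Record {
  std::uint32_t id;
  std::uint32_t value;
};

// Fetch the record at `index` directly from a contiguous payload.
// With a mapped payload this is offset arithmetic plus a copy of
// sizeof(Record) bytes; a streaming interface would need a seek()
// and a read() for each record instead.
Record record_at(const unsigned char* payload, std::size_t payload_size,
                 std::size_t index) {
  if (index >= payload_size / sizeof(Record)) {
    throw std::out_of_range("record index past end of payload");
  }
  Record out{};
  // memcpy avoids unaligned access if the payload is not suitably aligned.
  std::memcpy(&out, payload + index * sizeof(Record), sizeof(Record));
  return out;
}
```

The same pattern applies to any format with computable offsets, such as an index table followed by variable-length entries.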

The interface should be accessible to processors via a mmap() call on ProcessSession (adjacent to read() and write()). A MemoryMapCallback type should be provided; its process() method is called back with an instance of BaseMemoryMap as the argument. BaseMemoryMap is extended for each type of content repository that MiNiFi - C++ supports, including FileSystemRepository, VolatileRepository, and DatabaseContentRepository.
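
One possible shape for this interface is sketched below. Only the names BaseMemoryMap, MemoryMapCallback, and process() come from this proposal. The method signatures, the InMemoryMap class, and the free function mmap_payload() (standing in for ProcessSession::mmap()) are assumptions made for illustration and may differ from whatever is eventually implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Assumed shape: a view over a payload, independent of where it is stored.
class BaseMemoryMap {
 public:
  virtual ~BaseMemoryMap() = default;
  virtual void* getData() = 0;
  virtual std::size_t getSize() const = 0;
};

// Assumed shape: processors implement process() and receive the map.
class MemoryMapCallback {
 public:
  virtual ~MemoryMapCallback() = default;
  virtual std::int64_t process(const std::shared_ptr<BaseMemoryMap>& map) = 0;
};

// Hypothetical stand-in for a volatile repository's map: the payload is
// already in memory, so the existing buffer is passed through as-is.
class InMemoryMap : public BaseMemoryMap {
 public:
  explicit InMemoryMap(std::vector<std::uint8_t>& buffer) : buffer_(buffer) {}
  void* getData() override { return buffer_.data(); }
  std::size_t getSize() const override { return buffer_.size(); }

 private:
  std::vector<std::uint8_t>& buffer_;
};

// Hypothetical stand-in for ProcessSession::mmap(): builds the map for the
// payload and hands it to the callback.
std::int64_t mmap_payload(std::vector<std::uint8_t>& payload,
                          MemoryMapCallback& callback) {
  return callback.process(std::make_shared<InMemoryMap>(payload));
}

// Example callback: uppercases ASCII letters in place, without copying.
class UppercaseCallback : public MemoryMapCallback {
 public:
  std::int64_t process(const std::shared_ptr<BaseMemoryMap>& map) override {
    auto* bytes = static_cast<std::uint8_t*>(map->getData());
    for (std::size_t i = 0; i < map->getSize(); ++i) {
      if (bytes[i] >= 'a' && bytes[i] <= 'z') {
        bytes[i] = static_cast<std::uint8_t>(bytes[i] - 'a' + 'A');
      }
    }
    return static_cast<std::int64_t>(map->getSize());
  }
};
```

A file-backed subclass would return the address of an OS-level mapping from getData(), while a database-backed subclass could emulate the map by reading the value into a buffer and writing it back when the map is released.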

As part of the change, in addition to extensive unit test coverage, benchmarks should be written such that the performance impact can be empirically measured and evaluated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)