Posted to issues@nifi.apache.org by "Malthe Borch (JIRA)" <ji...@apache.org> on 2019/07/30 14:26:00 UTC

[jira] [Comment Edited] (NIFI-6496) Add compression support to record reader processor

    [ https://issues.apache.org/jira/browse/NIFI-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896179#comment-16896179 ] 

Malthe Borch edited comment on NIFI-6496 at 7/30/19 2:25 PM:
-------------------------------------------------------------

Adding compression support to {{FlowFile}} itself seems ideal, but it should preferably be exposed only through interfaces such as {{CompressContent}}.


In this model, {{CompressContent}} would have an optional streaming (or "lazy") mode in which the unpacked file contents are never written to disk. Running the processor in this mode would simply set an internal flag that enables transparent decompression in a subsequent step. The {{fileSize}} should not need to be updated because the stored content, and therefore its size, has not changed (this should mostly be of interest in the context of provenance).
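
A minimal sketch of that lazy mode, using only the JDK. {{LazyContent}} and all of its methods are hypothetical stand-ins for illustration, not NiFi classes, and gzip is assumed as the compression format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for flow file content; not part of the NiFi API.
final class LazyContent {
    private final byte[] stored;     // bytes exactly as stored, possibly compressed
    private final boolean gzipFlag;  // the "internal flag" set by the lazy mode

    LazyContent(byte[] stored, boolean gzipFlag) {
        this.stored = stored;
        this.gzipFlag = gzipFlag;
    }

    // What a lazy decompress step would do: keep the bytes, set the flag.
    LazyContent markGzip() {
        return new LazyContent(stored, true);
    }

    // Size of the stored bytes; marking the content does not change it.
    long size() {
        return stored.length;
    }

    // Later steps read through this method and get decompressed bytes on the fly.
    InputStream open() throws IOException {
        InputStream raw = new ByteArrayInputStream(stored);
        return gzipFlag ? new GZIPInputStream(raw) : raw;
    }

    // Helper for trying the sketch out: gzip a string into a byte array.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }
}
```

Calling {{markGzip()}} copies nothing and leaves {{size()}} unchanged; decompression only happens when a later step calls {{open()}} and reads from the returned stream.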



If in some cases content is not streamed (but loaded entirely into memory), then I would think that is an issue that can be fixed separately?



> Add compression support to record reader processor
> --------------------------------------------------
>
>                 Key: NIFI-6496
>                 URL: https://issues.apache.org/jira/browse/NIFI-6496
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Malthe Borch
>            Priority: Minor
>              Labels: easyfix, usability
>
> Text-based record formats such as CSV, JSON and XML compress well and will often be transmitted in a compressed format. If compression support is added to the relevant processors, users will not need to explicitly unpack files before processing (which may not be feasible or practical due to space requirements).
> There are at least two ways of implementing this: a generic approach, where a {{CompressedRecordReaderFactory}} is the basis for a new controller service that wraps the underlying record reader controller service (e.g. {{CSVReader}}); or adding the functionality directly to the relevant record reader implementations.
> The latter option may provide a better UX because no additional {{ControllerService}} has to be configured.
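
The generic (wrapping) approach from the description can be sketched as a decorator. The interfaces below are simplified, hypothetical stand-ins rather than NiFi's actual record reader API, and gzip is assumed as the compression format:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical, simplified reader: returns one record per call, null at end of input.
interface SimpleRecordReader {
    String nextRecord() throws IOException;
}

// Hypothetical, simplified factory that builds a reader over an input stream.
interface SimpleReaderFactory {
    SimpleRecordReader create(InputStream in) throws IOException;
}

// Stand-in for an existing reader such as a CSV reader: one line is one record.
final class LineReaderFactory implements SimpleReaderFactory {
    @Override
    public SimpleRecordReader create(InputStream in) {
        BufferedReader lines =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return lines::readLine;
    }
}

// The wrapper: decompresses the stream, then delegates to the wrapped factory.
final class GzipReaderFactory implements SimpleReaderFactory {
    private final SimpleReaderFactory delegate;

    GzipReaderFactory(SimpleReaderFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public SimpleRecordReader create(InputStream in) throws IOException {
        return delegate.create(new GZIPInputStream(in));
    }
}
```

The wrapped reader never learns that its input was compressed, which is what makes this option generic. The cost is the extra service that has to be configured, which is the UX concern raised above.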



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)