Posted to issues@nifi.apache.org by "David Handermann (Jira)" <ji...@apache.org> on 2023/04/01 12:58:00 UTC

[jira] [Commented] (NIFI-11167) Add Excel Record Reader

    [ https://issues.apache.org/jira/browse/NIFI-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707552#comment-17707552 ] 

David Handermann commented on NIFI-11167:
-----------------------------------------

[~dstiegli1] Regarding the styles, thanks for researching the details. Although converting numbers to dates should not be a problem, it sounds like the styles are necessary to determine whether a column contains dates. With that background, it seems like reading styles should always be enabled, or perhaps enabled only when using the infer schema strategy.
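
To illustrate why the styles matter: Excel stores a date as a plain serial number, so only the number format attached to the cell indicates that a value such as 45017 is meant as 2023-04-01. The following is a minimal sketch under stated assumptions, not the reader's implementation. The class name, the method names, and the format-string heuristic are all hypothetical, and a real reader would more likely rely on the date detection provided by the spreadsheet library.

```java
import java.time.LocalDate;

// Hypothetical illustration only; not part of NiFi or any spreadsheet library.
class ExcelDateSketch {
    // Base date for the 1900 date system. Using 1899-12-30 absorbs the
    // fictitious 1900-02-29, so the result is only correct for serials >= 61.
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);

    // Rough heuristic (an assumption): after removing quoted literals and
    // bracketed sections such as [Red], a format containing 'y' or 'd'
    // is treated as a date format. Time-only formats are not detected.
    static boolean looksLikeDateFormat(String formatString) {
        if (formatString == null || formatString.equalsIgnoreCase("General")) {
            return false;
        }
        String stripped = formatString
                .replaceAll("\"[^\"]*\"", "")
                .replaceAll("\\[[^\\]]*\\]", "")
                .toLowerCase();
        return stripped.contains("y") || stripped.contains("d");
    }

    // Converts the whole-day part of a serial number to a calendar date.
    static LocalDate toLocalDate(double serial) {
        return EPOCH.plusDays((long) Math.floor(serial));
    }

    public static void main(String[] args) {
        double cellValue = 45017.0;
        // The same number is a date or a plain number depending on the style.
        for (String format : new String[] {"yyyy-mm-dd", "0.00"}) {
            if (looksLikeDateFormat(format)) {
                System.out.println(format + " -> " + toLocalDate(cellValue));
            } else {
                System.out.println(format + " -> " + cellValue);
            }
        }
    }
}
```

Without the style information, both cells above are indistinguishable numeric values, which is the reason schema inference cannot identify a date column from values alone.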

Regarding the infer schema strategy, the [CSVSchemaInference|https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVSchemaInference.java] and [CSVHeaderSchemaStrategy|https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVHeaderSchemaStrategy.java] have some examples, but I am not aware of any other documentation aside from related Jira issues.
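
As a rough idea of what an infer schema strategy involves, independent of the linked classes: each column can start at the narrowest type that fits the first value and be widened as more values are seen. This sketch is a simplified assumption and does not reproduce the logic in CSVSchemaInference; the enum, class, and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the NiFi schema inference API.
class ColumnTypeInferenceSketch {
    // Ordered from narrowest to widest so that ordinal() can drive widening.
    enum FieldType { LONG, DOUBLE, STRING }

    // Returns the narrowest type that can represent a single value.
    static FieldType typeOf(String value) {
        try {
            Long.parseLong(value);
            return FieldType.LONG;
        } catch (NumberFormatException e) {
            // not a whole number, try the next wider type
        }
        try {
            Double.parseDouble(value);
            return FieldType.DOUBLE;
        } catch (NumberFormatException e) {
            return FieldType.STRING;
        }
    }

    // Picks the wider of two types.
    static FieldType widen(FieldType a, FieldType b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    // Infers one column type from sample values, skipping empty cells.
    // A column with no usable values falls back to STRING.
    static FieldType inferColumn(List<String> values) {
        FieldType result = null;
        for (String value : values) {
            if (value == null || value.isEmpty()) {
                continue;
            }
            FieldType current = typeOf(value);
            result = (result == null) ? current : widen(result, current);
        }
        return (result == null) ? FieldType.STRING : result;
    }

    public static void main(String[] args) {
        System.out.println(inferColumn(Arrays.asList("1", "2", "3")));
        System.out.println(inferColumn(Arrays.asList("1", "2.5", "")));
        System.out.println(inferColumn(Arrays.asList("1", "abc")));
    }
}
```

An Excel reader has typed cell values to start from rather than strings, so the per-value step would differ, but the widening across rows is the same idea.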

> Add Excel Record Reader
> -----------------------
>
>                 Key: NIFI-11167
>                 URL: https://issues.apache.org/jira/browse/NIFI-11167
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: David Handermann
>            Assignee: Daniel Stieglitz
>            Priority: Minor
>
> A new Excel Record Reader should be implemented to support reading XLSX spreadsheet rows as NiFi Records. This Reader will enable integration with various record-oriented components, obviating the need for the narrowly focused ConvertExcelToCSVProcessor. The initial version of the Excel Reader should not support the legacy binary XLS format.
> The ExcelReader should use a library that supports reading from a stream of rows to avoid consuming large amounts of heap memory during processing.
> The ExcelReader should support configurable properties to read selected sheets. With Excel supporting typed field values, some amount of field type mapping will be required. Additional input filtering properties should not be implemented as existing Processors like QueryRecord support a wide variety of filtering and projection use cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)