Posted to issues@nifi.apache.org by "Daniel Stieglitz (Jira)" <ji...@apache.org> on 2023/03/06 15:45:00 UTC

[jira] [Comment Edited] (NIFI-10792) ConvertExcelToCSVProcessor : Failed to convert file over 10MB

    [ https://issues.apache.org/jira/browse/NIFI-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697005#comment-17697005 ] 

Daniel Stieglitz edited comment on NIFI-10792 at 3/6/23 3:44 PM:
-----------------------------------------------------------------

[~exceptionfactory] I believe I have a fix for this bug using [excel-streaming-reader|https://github.com/pjfanning/excel-streaming-reader]. I just had a couple of questions before submitting a PR for this.

# Besides adding the dependency for excel-streaming-reader to the pom.xml, do I need to update any license file given that the library already uses the Apache License 2.0? It is not clear to me from the [ContributorGuide|https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide#ContributorGuide-UpdateLicensingDocumentation].
# I have a unit test which demonstrates the fix works, but it uses a 20MB Excel spreadsheet. Is that something you want checked in? Does this type of test belong in a unit test, or perhaps an integration test? (An alternative that avoids the large binary, generating the file at test time, is sketched after this list.)
# In order to configure excel-streaming-reader, I have code like the following:
{code:java}
try (Workbook workbook = StreamingReader.builder()
        .rowCacheSize(500)      // number of rows kept in memory at a time
        .bufferSize(1000000)    // bytes used to buffer the input while reading
        .setReadStyles(formatValues)
        .open(inputStream)) {
    // iterate over the workbook's sheets and rows here
}
{code}
I do not think rowCacheSize and bufferSize should be hard coded; rather, there should be a PropertyDescriptor for each, allowing users to find the right settings for their application. Would it be okay to add PropertyDescriptors for these? A sketch of what they might look like follows below.
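To be concrete, here is a rough sketch of what I have in mind, using the standard {{PropertyDescriptor.Builder}} API (the property names, descriptions, and defaults below are placeholders, not final):

{code:java}
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// Placeholder names and defaults; final values would follow the processor's conventions.
public static final PropertyDescriptor ROW_CACHE_SIZE = new PropertyDescriptor.Builder()
        .name("row-cache-size")
        .displayName("Row Cache Size")
        .description("Number of rows the streaming reader keeps in memory while reading the workbook.")
        .required(true)
        .defaultValue("500")
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .build();

public static final PropertyDescriptor BUFFER_SIZE = new PropertyDescriptor.Builder()
        .name("input-buffer-size")
        .displayName("Input Buffer Size")
        .description("Number of bytes used to buffer the incoming stream while reading the workbook.")
        .required(true)
        .defaultValue("1000000")
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .build();
{code}
The configured values would then feed the builder, e.g. {{context.getProperty(ROW_CACHE_SIZE).asInteger()}}.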
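On question 2, one alternative to checking in a 20MB binary would be to generate the spreadsheet at test time with POI's streaming writer ({{SXSSFWorkbook}}). A rough sketch, with illustrative row/column counts (not tuned to reproduce the exact file size):

{code:java}
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class LargeSpreadsheetGenerator {
    public static void main(String[] args) throws Exception {
        // Keep only 100 rows in memory; earlier rows are flushed to temp files.
        try (SXSSFWorkbook workbook = new SXSSFWorkbook(100)) {
            Sheet sheet = workbook.createSheet("large");
            for (int r = 0; r < 500_000; r++) {        // illustrative row count
                Row row = sheet.createRow(r);
                for (int c = 0; c < 10; c++) {          // illustrative column count
                    row.createCell(c).setCellValue("cell-" + r + "-" + c);
                }
            }
            try (FileOutputStream out = new FileOutputStream("large-test.xlsx")) {
                workbook.write(out);
            }
            workbook.dispose(); // remove the temp files backing the flushed rows
        }
    }
}
{code}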




> ConvertExcelToCSVProcessor : Failed to convert file over 10MB 
> --------------------------------------------------------------
>
>                 Key: NIFI-10792
>                 URL: https://issues.apache.org/jira/browse/NIFI-10792
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core UI
>    Affects Versions: 1.17.0, 1.16.3, 1.18.0
>            Reporter: mayki
>            Priority: Critical
>              Labels: Excel, csv, processor
>             Fix For: 1.15.3
>
>         Attachments: ConvertExcelToCSVProcessor_1_18_0_with_POI_OLD.PNG, ConvertExcelToCSVProcessor_1_19_1.PNG
>
>
> Hello all,
> It seems all versions greater than 1.15.3 introduce a failure in the processor
> *ConvertExcelToCSVProcessor* with this error:
> {code:java}
> Tried to allocate an array of length 101,695,141, but the maximum length for this record type is 100,000,000. If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride() {code}
> I have tested with 2 different NiFi instances: version 1.15.3 ==> works OK
> Since upgrading to 1.16, 1.17, and 1.18 ==> the same processor *fails* with files greater than 10MB.
> Could you help us correct this bug?


