Posted to commits@daffodil.apache.org by "Dave Thompson (Jira)" <ji...@apache.org> on 2021/02/03 13:53:00 UTC

[jira] [Created] (DAFFODIL-2468) Unparsing an infoset for an 800MB CSV file runs out of memory

Dave Thompson created DAFFODIL-2468:
---------------------------------------

             Summary: Unparsing an infoset for an 800MB CSV file runs out of memory
                 Key: DAFFODIL-2468
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2468
             Project: Daffodil
          Issue Type: Bug
    Affects Versions: 3.1.0
            Reporter: Dave Thompson
         Attachments: csv_data800m.csv.gz

While verifying DAFFODIL-2455 (Large CSV file causes "Attempting to backtrack too far" exception), found that unparsing the infoset of the successfully parsed 800MB CSV file ran out of memory.

Increased the DAFFODIL_JAVA_OPTS memory setting several times, up to 32GB, and tried unparsing the infoset, running out of memory each time. Ran on a test platform which has 90+GB of memory.
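For reference, a sketch of how the heap limit would typically be raised for the Daffodil CLI (the exact schema and file names below are illustrative, not taken from this report):

```shell
# DAFFODIL_JAVA_OPTS passes JVM options to the daffodil CLI wrapper script.
# Raise the max heap to 32GB before retrying the unparse:
export DAFFODIL_JAVA_OPTS="-Xmx32g"

# Hypothetical invocation; actual schema/infoset paths will differ:
daffodil unparse -s csv.dfdl.xsd -o csv_data800m.csv csv_data800m.infoset.xml
```

Even with the heap raised this far, the unparse still failed, which suggests the memory growth scales with infoset size rather than being a fixed overhead.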

Parsed and unparsed using the schema from the dfdl-schemas/dfdl-csv repo.

The 800MB CSV file (csv_data800m.csv), gzipped, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)