Posted to dev@daffodil.apache.org by GitBox <gi...@apache.org> on 2018/07/16 12:56:36 UTC

[GitHub] stevedlawrence opened a new pull request #81: Reduce memory usage regressions in commit 07ee2434bb

URL: https://github.com/apache/incubator-daffodil/pull/81
 
 
   The modifications to the IO layer to support streaming made changes
   that substantially increased memory usage. This makes the following
   changes to minimize that:
   
   - No longer save the char iterator state. Saving this state required
     duplicating a LongBuffer and a CharBuffer, which are two non-trivial
     allocations/copies for every point of uncertainty. This really adds up
     for some file types. Instead, when the bit position changes due to
     resetting a mark, we just clear the char iterator state and decode
     the data again. This does mean some data might be decoded twice if we
     backtrack, but that should be relatively quick, and it means we only
     take a hit when we backtrack instead of at every point of
     uncertainty.
   - The regexMatch buffers are intentionally large to match long patterns.
     Unfortunately, the PState was changed so that every PState allocated
     its own regex buffers, which resulted in a lot of large allocations.
     Instead, modify the DataProcessor to store ThreadLocal state for the
     regex buffers, and have the PState access that state when necessary.
     So we will now only have large regex buffers for each Thread rather
     than for each call to parse.
   - For every file parsed in the CLI performance command, we allocated a
     new InputSourceDataInputStream before doing any performance testing.
     So if you wanted to do a performance test of 500,000 files, we would
     allocate 500,000 InputSourceDataInputStreams immediately. This class
     isn't huge, but it can add up pretty quickly and use a lot of memory.
     Instead, just allocate the InputSourceDataInputStream right before the
     call to parse so that it can be garbage collected when the parse ends.
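
The second change above (per-thread rather than per-PState regex buffers)
can be illustrated with a minimal sketch. This is not the code in this pull
request: the class names RegexBuffersSketch and ParseStateSketch and the
buffer size are hypothetical, chosen only to show the ThreadLocal pattern
using standard JDK calls.

```java
import java.nio.CharBuffer;

// Hypothetical holder for the large regex buffer; not Daffodil's actual class.
final class RegexBuffersSketch {
    // Assumed size for illustration only; the real buffer size is not given here.
    static final int BUFFER_CHARS = 1 << 16;

    // One buffer per thread, allocated lazily the first time that thread asks.
    private static final ThreadLocal<CharBuffer> PER_THREAD =
        ThreadLocal.withInitial(() -> CharBuffer.allocate(BUFFER_CHARS));

    static CharBuffer get() {
        CharBuffer cb = PER_THREAD.get();
        // Reset position and limit so each borrower starts with an empty buffer.
        cb.clear();
        return cb;
    }
}

// Hypothetical stand-in for a parse state object.
final class ParseStateSketch {
    // Borrows the calling thread's buffer instead of allocating its own.
    CharBuffer regexBuffer() {
        return RegexBuffersSketch.get();
    }
}
```

With this shape, creating many parse states on one thread costs one large
allocation in total, at the price that a buffer must not be held across
threads or used by two parse states on the same thread at the same time.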
   
   DAFFODIL-1966
