
Posted to dev@daffodil.apache.org by GitBox <gi...@apache.org> on 2018/07/16 12:56:36 UTC

[GitHub] stevedlawrence opened a new pull request #81: Reduce memory usage regressions in commit 07ee2434bb

URL: https://github.com/apache/incubator-daffodil/pull/81

The IO-layer changes made to support streaming (commit 07ee2434bb)
substantially increased memory usage. This pull request makes the
following changes to reduce it:

- No longer save the char iterator state. Saving this state required
duplicating a LongBuffer and a CharBuffer: two non-trivial
allocations/copies at every point of uncertainty, which adds up quickly
for some file types. Instead, when the bit position changes because a
mark is reset, we simply clear the char iterator state and decode the
data again. Some data may therefore be decoded twice when we backtrack,
but that should be relatively quick, and it means we only pay the cost
when we actually backtrack rather than at every point of uncertainty.
- The regexMatch buffers are intentionally large so that long patterns
can be matched. Unfortunately, the PState was changed so that every
PState allocated its own regex buffers, which resulted in many large
allocations. Instead, the DataProcessor now stores ThreadLocal state for
the regex buffers, and the PState accesses that state when necessary.
As a result, we now have large regex buffers only once per thread rather
than once per call to parse.
- The CLI performance command allocated a new InputSourceDataInputStream
for every file up front, before any performance testing began. So a
performance test of 500,000 files would immediately allocate 500,000
InputSourceDataInputStreams. This class isn't huge, but the instances
add up quickly and can use a lot of memory. Instead, each
InputSourceDataInputStream is now allocated right before its call to
parse, so that it can be garbage collected when that parse ends.
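
The first change trades per-mark copying for occasional re-decoding. The following is a minimal sketch of that idea, not Daffodil's implementation. `CharCacheSketch` and its methods are hypothetical names, and US-ASCII is assumed so that one byte maps to one char. A mark records only a position, and a reset discards the cached decode so that it is rebuilt on demand.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: decoded chars are cached, but the cache is never
 * copied into marks. Resetting to a mark throws the cache away, and the
 * bytes are decoded again the next time a char is requested.
 */
final class CharCacheSketch {
    private final byte[] data;
    private int bytePos = 0;           // current position in the input
    private CharBuffer decoded = null; // chars decoded from bytePos, or null
    int decodeCount = 0;               // number of decodes, for illustration

    CharCacheSketch(byte[] data) { this.data = data; }

    /** A mark is just the current position; no buffers are duplicated. */
    int mark() { return bytePos; }

    /** Move back to a mark and discard the cached chars. */
    void reset(int mark) {
        bytePos = mark;
        decoded = null;
    }

    /** Next char without consuming it; decodes lazily if the cache is empty. */
    char peek() {
        if (decoded == null) {
            decoded = StandardCharsets.US_ASCII.decode(
                ByteBuffer.wrap(data, bytePos, data.length - bytePos));
            decodeCount++;
        }
        // Assumes the input is not exhausted.
        return decoded.get(decoded.position());
    }

    /** Consume one char, keeping the cache and byte position in sync. */
    char next() {
        char c = peek();
        decoded.position(decoded.position() + 1);
        bytePos++; // US-ASCII: exactly one byte per char
        return c;
    }
}
```

Reading "ab", resetting, and reading again performs two decodes in total. The point is that the extra cost appears only on the reset, not on every `mark()`.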
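
The regex buffer change relies on `java.lang.ThreadLocal`. The sketch below assumes a long-lived processor object that owns the ThreadLocal, in the role the DataProcessor plays here. `ProcessorSketch`, its method, and the buffer size are hypothetical and are not Daffodil's actual API or values. Each thread allocates the large buffer once and reuses it for every parse it runs.

```java
import java.nio.CharBuffer;

/**
 * Hypothetical sketch: a large regex match buffer held in a ThreadLocal on
 * a long-lived processor, so it is allocated once per thread instead of
 * once per parse.
 */
final class ProcessorSketch {
    // Illustrative size only; not Daffodil's actual buffer size.
    static final int REGEX_BUFFER_CHARS = 1 << 16;

    private final ThreadLocal<CharBuffer> regexBuffer =
        ThreadLocal.withInitial(() -> CharBuffer.allocate(REGEX_BUFFER_CHARS));

    /** A parse borrows the calling thread's buffer instead of allocating one. */
    CharBuffer regexBufferForCurrentThread() {
        CharBuffer buf = regexBuffer.get(); // created on first use per thread
        buf.clear(); // reset position/limit; old contents will be overwritten
        return buf;
    }
}
```

One design consequence is that a buffer lives as long as both its thread and the processor. This is cheap for a fixed thread pool but worth keeping in mind if many short-lived threads are used.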
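
The CLI change is about object lifetime rather than object size. This sketch shows the shape of the loop under stated assumptions: a plain `ByteArrayInputStream` stands in for InputSourceDataInputStream, and the hypothetical `parse` callback stands in for the real parse call. Each stream is created inside the loop, so at most one is reachable at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.List;
import java.util.function.Consumer;

final class PerfLoopSketch {
    /**
     * Creates each input stream immediately before it is parsed, instead of
     * building a list of all streams up front. Once `parse` returns, the
     * stream is unreachable and can be garbage collected.
     */
    static int runAll(List<byte[]> files, Consumer<InputStream> parse) {
        int parsed = 0;
        for (byte[] bytes : files) {
            InputStream in = new ByteArrayInputStream(bytes); // one per iteration
            parse.accept(in);
            parsed++;
        }
        return parsed;
    }
}
```

The eager alternative maps every file to a stream first and then iterates. That version keeps all N streams alive for the whole run, which is the behavior described above for 500,000 files.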

DAFFODIL-1966
