Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/02/17 07:33:41 UTC
[jira] [Comment Edited] (DRILL-5273) CompliantTextReader exhausts 4 GB memory when reading 5000 small files
[ https://issues.apache.org/jira/browse/DRILL-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871311#comment-15871311 ]
Paul Rogers edited comment on DRILL-5273 at 2/17/17 7:33 AM:
-------------------------------------------------------------
Tracking the {{close()}} call in a test with a single file:
{code}
After close: 1114112
{code}
However, the close operation does not appear to raise an error about the leaked memory. (But see the note below about why the operator *should not* close its allocator.)
The reason appears to be that the {{ScanBatch}} operator never closes its allocator:
{code}
@Override
public void close() throws Exception {
  container.clear();
  for (final ValueVector v : implicitVectors.values()) {
    v.clear();
  }
  fieldVectorMap.clear();
  currentReader.close();
}
{code}
By contrast, the external sort explicitly closes its allocator.
Once we add the allocator close, we get many exceptions, even for simple ALTER SESSION commands:
{code}
java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (256)
Allocator(op:0:0:0:DirectSubScan) 1000000/256/4530432/10000000000 (res/actual/peak/limit)
{code}
Then, the 256 bytes is released *after* the call to close:
{code}
java.lang.IllegalStateException: Attempting operation on allocator when allocator is closed.
Allocator(op:0:0:0:DirectSubScan) 1000000/256/4530432/10000000000 (res/actual/peak/limit)
{code}
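Both failure modes can be reproduced with a minimal stand-in. {{MockAllocator}} and {{MockBuffer}} below are illustrative sketches, not Drill's {{BufferAllocator}} API: closing with outstanding allocations reports a leak, and touching the allocator after close fails, so any fix must release child buffers *before* the allocator's close runs.

```java
// Illustrative sketch only: MockAllocator/MockBuffer mimic the two
// exceptions quoted above; they are not Drill's BufferAllocator API.
class MockAllocator {
    private long allocated;
    private boolean closed;

    MockBuffer buffer(long bytes) {
        checkOpen();
        allocated += bytes;
        return new MockBuffer(this, bytes);
    }

    void release(long bytes) {
        checkOpen();             // releasing after close() is an error
        allocated -= bytes;
    }

    void close() {
        closed = true;
        if (allocated != 0) {    // outstanding child allocations: report a leak
            throw new IllegalStateException(
                "Memory was leaked by query. Memory leaked: (" + allocated + ")");
        }
    }

    private void checkOpen() {
        if (closed) {
            throw new IllegalStateException(
                "Attempting operation on allocator when allocator is closed.");
        }
    }
}

class MockBuffer {
    private final MockAllocator allocator;
    private final long bytes;

    MockBuffer(MockAllocator allocator, long bytes) {
        this.allocator = allocator;
        this.bytes = bytes;
    }

    void release() { allocator.release(bytes); }
}

public class CloseOrderingDemo {
    public static void main(String[] args) {
        MockAllocator allocator = new MockAllocator();
        MockBuffer buf = allocator.buffer(256);

        try {
            allocator.close();          // 256 bytes still outstanding: leak report
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        try {
            buf.release();              // too late: allocator already closed
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Correct ordering: release children first, then close the allocator.
        MockAllocator clean = new MockAllocator();
        MockBuffer b = clean.buffer(256);
        b.release();
        clean.close();
        System.out.println("clean close");
    }
}
```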
This is very likely why the call to close the allocator was never added. Checking file history, I could not find a version of the file that closed the allocator.
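For scale, the per-reader buffer sizes quoted in the issue description make the leak estimate easy to check; this is plain arithmetic, not Drill code:

```java
// Back-of-envelope check: 1 MiB read buffer + 64 KiB whitespace buffer
// leaked per file, times 5000 files, exceeds a 4 GB direct-memory limit.
public class LeakEstimate {
    public static void main(String[] args) {
        long perFile = 1024 * 1024 + 64 * 1024;          // 1,114,112 bytes
        long files = 5000;
        long total = perFile * files;                    // 5,570,560,000 bytes
        System.out.println(perFile);
        System.out.println(total);
        System.out.printf("%.2f GiB%n", total / (1024.0 * 1024 * 1024)); // ~5.19 GiB
    }
}
```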
> CompliantTextReader exhausts 4 GB memory when reading 5000 small files
> ----------------------------------------------------------------------
>
> Key: DRILL-5273
> URL: https://issues.apache.org/jira/browse/DRILL-5273
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.10
>
>
> A test case was created that consists of 5000 text files, each with a single line with the file number: 1 to 5001. Each file has a single record, and at most 4 characters per record.
> Run the following query:
> {code}
> SELECT * FROM `dfs.data`.`5000files/text`
> {code}
> The query will fail with an OOM in the scan batch at around record 3700 on a Mac with 4 GB of direct memory.
> The code to read records in {{ScanBatch}} is complex. The following appears to occur:
> * Iterate over the record readers, one per file.
> * For each, call {{setup()}}.
> The setup code is:
> {code}
> public void setup(OperatorContext context, OutputMutator outputMutator) throws ExecutionSetupException {
>   oContext = context;
>   readBuffer = context.getManagedBuffer(READ_BUFFER);
>   whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
>   ...
> }
> {code}
> The two buffers are in direct memory. There is no code that releases the buffers.
> The sizes are:
> {code}
> private static final int READ_BUFFER = 1024*1024;
> private static final int WHITE_SPACE_BUFFER = 64*1024;
> // 1,048,576 + 65,536 = 1,114,112 bytes per reader
> {code}
> This is exactly the amount of memory that accumulates per call to {{ScanBatch.next()}}:
> {code}
> Ctor: 0 -- Initial memory in constructor
> Init setup: 1114112 -- After call to first record reader setup
> Entry Memory: 1114112 -- first next() call, returns one record
> Entry Memory: 1114112 -- second next(), eof and start second reader
> Entry Memory: 2228224 -- third next(), second reader returns EOF
> ...
> {code}
> If we leak 1,114,112 bytes (~1 MB) per file, then 5000 files leak roughly 5.2 GB, which explains the OOM when given only 4 GB of direct memory.
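The missing cleanup described above can be sketched as a setup()/close() pair; the field and constant names follow the snippets quoted in the description, but {{Buffer}} and the counter here are illustrative stand-ins, not Drill's {{DrillBuf}} machinery:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the missing release step: each reader's setup() allocates
// two buffers, so its close() must release both. Buffer is a stand-in.
public class ReaderBufferLifecycle {
    static final AtomicLong DIRECT = new AtomicLong();   // tracks "direct memory"

    static class Buffer {
        final int size;
        Buffer(int size) { this.size = size; DIRECT.addAndGet(size); }
        void release() { DIRECT.addAndGet(-size); }
    }

    static final int READ_BUFFER = 1024 * 1024;
    static final int WHITE_SPACE_BUFFER = 64 * 1024;

    private Buffer readBuffer, whitespaceBuffer;

    void setup() {
        readBuffer = new Buffer(READ_BUFFER);
        whitespaceBuffer = new Buffer(WHITE_SPACE_BUFFER);
    }

    // The release step missing from the reader described above:
    void close() {
        readBuffer.release();
        whitespaceBuffer.release();
    }

    public static void main(String[] args) {
        for (int file = 0; file < 5000; file++) {
            ReaderBufferLifecycle reader = new ReaderBufferLifecycle();
            reader.setup();
            reader.close();          // without this, 1,114,112 bytes leak per file
        }
        System.out.println(DIRECT.get());   // 0: all buffers released
    }
}
```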
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)