Posted to user@drill.apache.org by Nate Butler <na...@chartbeat.com> on 2017/05/01 14:44:49 UTC

External Sort - Unable to Allocate Buffer error

We keep running into this issue when trying to issue a query with hashagg
disabled. When I look at system memory usage though, drill doesn't seem to
be using much of it but still hits this error.

Our environment:

- 1 r3.8xl
- 1 drillbit version 1.10.0 configured with 4GB of Heap and 230G of Direct
- Data stored on S3 is compressed CSV

I've tried increasing planner.memory.max_query_memory_per_node to 230G and
lowered planner.width.max_per_query to 1 and it still fails.

We've applied the patch from this bug in the hopes that it would resolve
the issue but it hasn't:

https://issues.apache.org/jira/browse/DRILL-5226

Stack Trace:

  (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate buffer of size 16777216 due to memory limit. Current allocation: 8445952
    org.apache.drill.exec.memory.BaseAllocator.buffer():220
    org.apache.drill.exec.memory.BaseAllocator.buffer():195
    org.apache.drill.exec.vector.VarCharVector.reAlloc():425
    org.apache.drill.exec.vector.VarCharVector.copyFromSafe():278
    org.apache.drill.exec.vector.NullableVarCharVector.copyFromSafe():379
    org.apache.drill.exec.test.generated.PriorityQueueCopierGen328.doCopy():22
    org.apache.drill.exec.test.generated.PriorityQueueCopierGen328.next():75
    org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill():602
    org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext():428
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():137
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():144
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)

Is there something I'm missing here? Any help/direction would be
appreciated.

Thanks,
Nate

Re: External Sort - Unable to Allocate Buffer error

Posted by Paul Rogers <pr...@mapr.com>.

Hi Nate,

I’ll give you three separate suggestions. The first two build on the discussion with Zelaine. The third gets at a separate problem that could be the root cause.

First, let’s discuss logging. When we hit a bug such as this, the logs are incredibly useful to learn what is going on. Turn on debug logging. If you are familiar with Java logging, then you only need to enable the debug level for the org.apache.drill.exec.physical.impl.xsort.managed package. Then, look for lines that say “ExternalSortBatch”.

You will see a number of entries early on that identify the amount of memory available to the sort, the size of the incoming batches, and how we will slice up memory. Please post those lines to your JIRA entry.

Then, later, you’ll see an entry for the OOM error. Review the preceding entries to get a sense of where the sort was: was it still reading and spilling data from upstream (the sort phase)? Or had it reached the merge phase, in which we reread spilled data?

The log entries, while cryptic at first glance, make a bit more sense after you scan through the full set. Post those lines with summary info.
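
One way to pull out the relevant entries is a small filter script. The following is a minimal Python sketch, not a Drill tool: the function name sort_log_lines is made up for illustration, and the path in the usage comment is a placeholder for wherever your drillbit writes its log.

```python
def sort_log_lines(lines, marker="ExternalSortBatch"):
    """Return the lines that mention the sort operator, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if marker in line]


# Usage (the path is a placeholder; adjust it to your installation):
#
#   with open("/path/to/drillbit.log", errors="replace") as log:
#       for line in sort_log_lines(log):
#           print(line)
```

Because the filter keeps the original order, the budget entries logged early on and the entries just before the OOM stay in sequence.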

Also, the query profile will tell you how much memory was actually used at the time of the OOM. You can compare that with the “budget” explained in the log file entry mentioned above.

Second, we can better define how Drill works with sort memory to help you properly configure your setup.

Here is some background.

* Your system has some amount of memory. In your case, 230 GB.
* To allocate memory to the sort, Drill does not use the actual memory. Instead, we use planner.memory.max_query_memory_per_node. (The idea is that you set this value as, roughly, system memory / number of concurrent queries.)
* Drill divides up memory to compute per-sort memory as: query memory per node / no. of slices / no. of sorts in the query.
* In your system, the number of slices is 23, so each fragment gets 10 GB of memory.
* If your query has a single sort, then each sort gets 10 GB of memory.
* However, memory per query is capped by the boot-time drill.memory.top.max option (see below), which defaults to 20 GB. That is not an issue here, but it becomes one if the numbers above come out differently.
* Changing planner.width.max_per_query has no effect on memory.
* You’d ideally change planner.width.max_per_node to 1 to run the query single-threaded. But, because of the 20 GB cap, no sort will get more than 20 GB anyway.

For the actual code, see [1].
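
To make the arithmetic in the list above concrete, here is a rough Python sketch. It is not Drill's code (see [1] for that). The function name, the use of decimal gigabytes, and the choice to apply the top.max cap to each sort's share are assumptions based on my reading of the description above.

```python
GB = 10**9  # decimal gigabytes, assumed for illustration


def per_sort_memory(query_memory_per_node, slices, sorts, top_max=20 * GB):
    """Hypothetical helper: memory one sort gets under the rules described above."""
    # Split the per-node query memory across slices, then across sorts.
    share = query_memory_per_node // (slices * sorts)
    # Assumption: no single sort can exceed the boot-time cap.
    return min(share, top_max)


# 230 GB per node, 23 slices, one sort: the cap does not apply.
print(per_sort_memory(230 * GB, 23, 1) // GB)                    # 10
# Single-threaded: the default 20 GB cap now limits the sort.
print(per_sort_memory(230 * GB, 1, 1) // GB)                     # 20
# Single-threaded with the cap raised to 100 GB.
print(per_sort_memory(230 * GB, 1, 1, top_max=100 * GB) // GB)   # 100
```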

Despite all this, the likely original 10 GB allocation should be plenty; the sort is supposed to spill. How much it spills depends on your input data size. When sorting, performance is affected by memory:

* If your data is smaller than sort memory, sorting happens in memory, and performance is optimal.
* If your data is larger than memory, but smaller than 8x memory, you’ll get a “single generation” spill/merge and performance should be no worse than 3x an in-memory sort. (1x for the original data read, another 1x for the spill, and a third 1x for the read/merge.)
* If your data is larger than 8x memory, sorting will need multiple generations of spill/merge/re-spill, and run-time will increase accordingly.
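
The three cases above can be sketched as a small Python function. This is only an illustration of the rule of thumb, not how Drill decides anything internally: the function name and the handling of the exact boundaries are assumptions, and the 8x factor is simply the one quoted above.

```python
def sort_regime(data_bytes, sort_memory_bytes, merge_factor=8):
    """Hypothetical helper: classify a sort by data size relative to sort memory."""
    if data_bytes <= sort_memory_bytes:
        return "in-memory"
    if data_bytes <= merge_factor * sort_memory_bytes:
        # Read once, spill once, read/merge once: roughly 3x an in-memory sort.
        return "single-generation spill"
    return "multi-generation spill"


GB = 10**9
print(sort_regime(5 * GB, 10 * GB))    # in-memory
print(sort_regime(60 * GB, 10 * GB))   # single-generation spill
print(sort_regime(200 * GB, 10 * GB))  # multi-generation spill
```

Remember to pass the uncompressed data size, not the size of the compressed files on disk.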

Some options:

* Set planner.width.max_per_node to 1 to run the query single-threaded. This will use all memory for the single sort.
* But, we’ve got that pesky 20 GB global cap. So, change your drill-override.conf file as follows:

drill.memory.top.max: 100000000000

(Sorry for all the zeros. It is supposed to be 100 GB. We really should switch to a better format to specify memory…) 100 GB seems plenty without going larger.

You can verify that these changes take effect by looking for the log line that explains the managed sort’s memory calculations (when debug logging is enabled).

Third, all that said, I wonder if the problem is elsewhere. Yes, you are getting an Out of Memory (OOM) error. But, not in the usual place that indicates a sort issue. Instead, you are getting it in the allocation of a “value vector.” This raises some questions:

* How big is your input data (size on disk)?
* How many columns?
* How wide are your VarChar columns, on average?

You mentioned data is compressed CSV. With typical 8x compression, actual data sorted will be ~8x your on-disk size.

The column width question is critical. I see that the vector is trying to allocate 16 MB of data, which suggests that your column widths are 250 or larger. If so, we are probably looking at a different error that happens to be showing up while sorting.
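
As a back-of-the-envelope check on that estimate, here is a short Python sketch. It assumes the buffer holds one column for a full batch of 65,536 rows; that row count is an assumption for illustration and is not stated anywhere in this thread, and the function name is made up.

```python
def average_value_width(buffer_bytes, rows_per_batch=65536):
    """Hypothetical helper: average bytes per value if the buffer holds one full batch."""
    return buffer_bytes / rows_per_batch


# Buffer size from the original error message.
print(average_value_width(16777216))    # 256.0
# Buffer size from the later error with the managed sort.
print(average_value_width(134217728))   # 2048.0
```

If the batches hold fewer rows than assumed, the implied average width is larger still.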

Once we see the details of your data size, we can determine if we should focus more closely in that area.

Thanks,

- Paul

[1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/util/MemoryAllocationUtilities.java

> On May 2, 2017, at 10:47 AM, rahul challapalli <ch...@gmail.com> wrote:
> 
> This is clearly a bug and, as Zelaine suggested, the new sort is still a work
> in progress. We have a few similar bugs open for the new sort. I would have
> pointed you to the JIRAs, but unfortunately JIRA is not working for me due to
> firewall issues.
> 
> Another suggestion is to build Drill from the latest master and try it out, if
> you are willing to spend some time. But again, there is no guarantee yet.
> 
> Please go ahead and raise a new JIRA. If it is a duplicate, I will mark it
> as such later. Thank you.
> 
> - Rahul
> 
> On Tue, May 2, 2017 at 8:24 AM, Nate Butler <na...@chartbeat.com> wrote:
> 
>> Zelaine, thanks for the suggestion. I added this option both to the
>> drill-override and in the session, and this time the query did stay running
>> for much longer, but it still eventually failed with the same error,
>> although with very different memory values.
>> 
>>  (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate buffer of size 134217728 due to memory limit. Current allocation: 10653214316
>>    org.apache.drill.exec.memory.BaseAllocator.buffer():220
>>    org.apache.drill.exec.memory.BaseAllocator.buffer():195
>>    org.apache.drill.exec.vector.VarCharVector.reAlloc():425
>>    org.apache.drill.exec.vector.VarCharVector.copyFromSafe():278
>>    org.apache.drill.exec.vector.NullableVarCharVector.copyFromSafe():379
>>    org.apache.drill.exec.test.generated.PriorityQueueCopierGen8.doCopy():22
>>    org.apache.drill.exec.test.generated.PriorityQueueCopierGen8.next():76
>>    org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234
>>    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill():1408
>>    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill():1376
>>    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.spillFromMemory():1339
>>    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.processBatch():831
>>    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():618
>>    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():660
>>    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559
>>    org.apache.drill.exec.record.AbstractRecordBatch.next():162
>>    org.apache.drill.exec.record.AbstractRecordBatch.next():119
>>    org.apache.drill.exec.record.AbstractRecordBatch.next():109
>>    org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():137
>>    org.apache.drill.exec.record.AbstractRecordBatch.next():162
>>    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>>    org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():144
>>    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
>>    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
>>    java.security.AccessController.doPrivileged():-2
>>    javax.security.auth.Subject.doAs():422
>>    org.apache.hadoop.security.UserGroupInformation.doAs():1657
>>    org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
>>    org.apache.drill.common.SelfCleaningRunnable.run():38
>>    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>>    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>>    java.lang.Thread.run():745 (state=,code=0)
>> 
>> At first I didn't change planner.width.max_per_query, and the default on a
>> 32 core machine makes it 23. This query failed after 34 minutes. I then
>> tried setting planner.width.max_per_query=1, and this query also failed but
>> of course took longer, about 2 hours. In both cases,
>> planner.memory.max_query_memory_per_node was set to 230G.
>> 
>> 
>> On Mon, May 1, 2017 at 11:09 AM, Zelaine Fong <zf...@mapr.com> wrote:
>> 
>>> Nate,
>>> 
>>> The Jira you’ve referenced relates to the new external sort, which is not
>>> enabled by default, as it is still going through some additional testing.
>>> If you’d like to try it to see if it resolves your problem, you’ll need to
>>> set “sort.external.disable_managed” as follows in your
>>> drill-override.conf file:
>>> 
>>> drill.exec: {
>>>  cluster-id: "drillbits1",
>>>  zk.connect: "localhost:2181",
>>>  sort.external.disable_managed: false
>>> }
>>> 
>>> and run the following query:
>>> 
>>> ALTER SESSION SET `exec.sort.disable_managed` = false;
>>> 
>>> -- Zelaine

Re: External Sort - Unable to Allocate Buffer error

Posted by Nate Butler <na...@chartbeat.com>.

Ok, thanks Rahul, I will do that.


Re: External Sort - Unable to Allocate Buffer error

Posted by rahul challapalli <ch...@gmail.com>.

This is clearly a bug and, as Zelaine suggested, the new sort is still a work
in progress. We have a few similar bugs open for the new sort. I would have
pointed you to the JIRAs, but unfortunately JIRA is not working for me due to
firewall issues.

Another suggestion is to build Drill from the latest master and try it out, if
you are willing to spend some time. But again, there is no guarantee yet.

Please go ahead and raise a new JIRA. If it is a duplicate, I will mark it
as such later. Thank you.

- Rahul


Re: External Sort - Unable to Allocate Buffer error

Posted by Nate Butler <na...@chartbeat.com>.

Zelaine, thanks for the suggestion. I added this option both to the
drill-override and in the session, and this time the query did stay running
for much longer, but it still eventually failed with the same error,
although with very different memory values.

  (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate buffer of size 134217728 due to memory limit. Current allocation: 10653214316
    org.apache.drill.exec.memory.BaseAllocator.buffer():220
    org.apache.drill.exec.memory.BaseAllocator.buffer():195
    org.apache.drill.exec.vector.VarCharVector.reAlloc():425
    org.apache.drill.exec.vector.VarCharVector.copyFromSafe():278
    org.apache.drill.exec.vector.NullableVarCharVector.copyFromSafe():379
    org.apache.drill.exec.test.generated.PriorityQueueCopierGen8.doCopy():22
    org.apache.drill.exec.test.generated.PriorityQueueCopierGen8.next():76
    org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234
    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill():1408
    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill():1376
    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.spillFromMemory():1339
    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.processBatch():831
    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():618
    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():660
    org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():137
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():144
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
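
For scale, the byte counts in the two error messages can be converted to
binary units. This is only a small arithmetic sketch; the helper names are
made up for illustration and are not part of Drill. It suggests the failing
request grew from 16 MiB to 128 MiB and the allocation at failure from
about 8 MiB to roughly 9.92 GiB, which is still far below the 230G
configured for the query.

```python
# Convert the raw byte counts from the two OutOfMemoryException messages
# into MiB / GiB. Helper names are illustrative only.

MIB = 1024 ** 2
GIB = 1024 ** 3


def to_mib(n_bytes: int) -> float:
    """Return n_bytes expressed in mebibytes."""
    return n_bytes / MIB


def to_gib(n_bytes: int) -> float:
    """Return n_bytes expressed in gibibytes."""
    return n_bytes / GIB


# Values copied from the error messages in this thread.
old_sort = {"requested": 16_777_216, "allocated": 8_445_952}
managed_sort = {"requested": 134_217_728, "allocated": 10_653_214_316}

if __name__ == "__main__":
    print(f"old sort:     requested {to_mib(old_sort['requested']):.0f} MiB, "
          f"allocated {to_mib(old_sort['allocated']):.2f} MiB")
    print(f"managed sort: requested {to_mib(managed_sort['requested']):.0f} MiB, "
          f"allocated {to_gib(managed_sort['allocated']):.2f} GiB")
```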

At first I didn't change planner.width.max_per_query, and the default on a
32-core machine makes it 23. This query failed after 34 minutes. I then
tried setting planner.width.max_per_query=1, and this query also failed
but, of course, took longer: about 2 hours. In both cases,
planner.memory.max_query_memory_per_node was set to 230G.
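
The width of 23 on a 32-core machine is consistent with a default derived
as roughly 70% of the available cores, rounded up. A minimal sketch under
that assumption follows; the function name and the 0.70 factor are
illustrative assumptions, not something stated in this thread.

```python
import math


def default_width(cores: int, load_factor: float = 0.70) -> int:
    """Assumed rule: parallel width = ceil(cores * load_factor)."""
    return math.ceil(cores * load_factor)


if __name__ == "__main__":
    # r3.8xl exposes 32 cores, as mentioned above.
    print(default_width(32))
```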


On Mon, May 1, 2017 at 11:09 AM, Zelaine Fong <zf...@mapr.com> wrote:

> Nate,
>
> The Jira you’ve referenced relates to the new external sort, which is not
> enabled by default, as it is still going through some additional testing.
> If you’d like to try it to see if it resolves your problem, you’ll need to
> set “sort.external.disable_managed” as follows in your
> drill-override.conf file:
>
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "localhost:2181",
>   sort.external.disable_managed: false
> }
>
> and run the following query:
>
> ALTER SESSION SET `exec.sort.disable_managed` = false;
>
> -- Zelaine
>

Re: External Sort - Unable to Allocate Buffer error

Posted by Zelaine Fong <zf...@mapr.com>.

Nate,

The Jira you’ve referenced relates to the new external sort, which is not enabled by default, as it is still going through some additional testing. If you’d like to try it to see if it resolves your problem, you’ll need to set “sort.external.disable_managed” as follows in your drill-override.conf file:

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  sort.external.disable_managed: false
}

and run the following query:

ALTER SESSION SET `exec.sort.disable_managed` = false;

-- Zelaine

On 5/1/17, 7:44 AM, "Nate Butler" <na...@chartbeat.com> wrote:

    We keep running into this issue when trying to issue a query with hashagg
    disabled. When I look at system memory usage though, drill doesn't seem to
    be using much of it but still hits this error.
    
    Our environment:
    
    - 1 r3.8xl
    - 1 drillbit version 1.10.0 configured with 4GB of Heap and 230G of Direct
    - Data stored on S3 is compressed CSV
    
    I've tried increasing planner.memory.max_query_memory_per_node to 230G and
    lowered planner.width.max_per_query to 1 and it still fails.
    
    We've applied the patch from this bug in the hopes that it would resolve
    the issue but it hasn't:
    
    https://issues.apache.org/jira/browse/DRILL-5226
    
    Stack Trace:
    
      (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate buffer of size 16777216 due to memory limit. Current allocation: 8445952
        org.apache.drill.exec.memory.BaseAllocator.buffer():220
        org.apache.drill.exec.memory.BaseAllocator.buffer():195
        org.apache.drill.exec.vector.VarCharVector.reAlloc():425
        org.apache.drill.exec.vector.VarCharVector.copyFromSafe():278
        org.apache.drill.exec.vector.NullableVarCharVector.copyFromSafe():379
        org.apache.drill.exec.test.generated.PriorityQueueCopierGen328.doCopy():22
        org.apache.drill.exec.test.generated.PriorityQueueCopierGen328.next():75
        org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill():602
        org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext():428
        org.apache.drill.exec.record.AbstractRecordBatch.next():162
        org.apache.drill.exec.record.AbstractRecordBatch.next():119
        org.apache.drill.exec.record.AbstractRecordBatch.next():109
        org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():137
        org.apache.drill.exec.record.AbstractRecordBatch.next():162
        org.apache.drill.exec.physical.impl.BaseRootExec.next():104
        org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():144
        org.apache.drill.exec.physical.impl.BaseRootExec.next():94
        org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
        org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
        java.security.AccessController.doPrivileged():-2
        javax.security.auth.Subject.doAs():422
        org.apache.hadoop.security.UserGroupInformation.doAs():1657
        org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
        org.apache.drill.common.SelfCleaningRunnable.run():38
        java.util.concurrent.ThreadPoolExecutor.runWorker():1142
        java.util.concurrent.ThreadPoolExecutor$Worker.run():617
        java.lang.Thread.run():745 (state=,code=0)
    
    Is there something I'm missing here? Any help/direction would be
    appreciated.
    
    Thanks,
    Nate