Posted to user@drill.apache.org by rahul challapalli <ch...@gmail.com> on 2016/09/01 18:21:56 UTC

Re: Query hangs on planning

While planning, Drill uses heap memory, and 2GB of heap should be sufficient for
what you mentioned. This looks like a bug to me. Can you raise a JIRA for
it? It would also be super helpful if you could attach the data set
used.

Rahul
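
As a point of reference, the two memory pools being discussed are configured
separately in `conf/drill-env.sh`. A minimal sketch with illustrative values
(they match the ones Oscar reports later in the thread, not recommendations):

```shell
# conf/drill-env.sh -- the two Drill memory pools discussed in this thread.
# Query *planning* (Calcite) runs on the JVM heap; query *execution*
# buffers are allocated from off-heap direct memory.
export DRILL_HEAP="4G"                # JVM heap: planning, metadata, bookkeeping
export DRILL_MAX_DIRECT_MEMORY="8G"   # off-heap: record batches during execution
```

A planning-time `OutOfMemoryError: Java heap space` is governed by
`DRILL_HEAP`, not `DRILL_MAX_DIRECT_MEMORY`.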

On Wed, Aug 31, 2016 at 9:14 AM, Oscar Morante <sp...@gmail.com> wrote:

> Sure,
> This is what I remember:
>
> * Failure
>    - embedded mode on my laptop
>    - drill memory: 2Gb/4Gb (heap/direct)
>    - cpu: 4cores (+hyperthreading)
>    - `planner.width.max_per_node=6`
>
> * Success
>    - AWS Cluster 2x c3.8xlarge
>    - drill memory: 16Gb/32Gb
>    - cpu: limited by kubernetes to 24cores
>    - `planner.width.max_per_node=23`
>
> I'm too busy right now to test again, but I'll try to provide better info
> as soon as I can.
>
>
>
> On Wed, Aug 31, 2016 at 05:38:53PM +0530, Khurram Faraaz wrote:
>
>> Can you please share the number of cores on the setup where the query hung,
>> as compared to the number of cores on the setup where the query went
>> through successfully? And the memory details for the two scenarios?
>>
>> Thanks,
>> Khurram
>>
>> On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante <sp...@gmail.com>
>> wrote:
>>
>>> For the record, I think this was just bad memory configuration after all.
>>> I retested on bigger machines and everything seems to be working fine.
>>>
>>>
>>> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
>>>
>>>> Oscar, can you please report a JIRA with the required steps to reproduce
>>>> the OOM error. That way someone from the Drill team will take a look and
>>>> investigate.
>>>>
>>>> For others interested here is the stack trace.
>>>>
>>>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman]
>>>> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred,
>>>> exiting. Information message: Unable to handle out of memory condition in Foreman.
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>        at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
>>>>        at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
>>>>        at java.lang.StringBuilder.toString(StringBuilder.java:405) ~[na:1.7.0_111]
>>>>        at org.apache.calcite.util.Util.newInternal(Util.java:785) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>        at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>        at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>        at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>        at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>        at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>        at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>        at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
>>>>        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
>>>>
>>>> Thanks,
>>>> Khurram
>>>>
>>>> On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <sp...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yeah, when I uncomment only the `upload_date` lines (a dir0 alias),
>>>>> explain succeeds within ~30s.  Enabling any of the other lines triggers
>>>>> the failure.
>>>>>
>>>>> This is a log with the `upload_date` lines and `usage <> 'Test'` enabled:
>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
>>>>>
>>>>> The client times out around here (~1.5 hours):
>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>
>>>>> And it still keeps running for a while until it dies (~2.5 hours):
>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>
>>>>> The memory settings for this test were:
>>>>>
>>>>>    DRILL_HEAP="4G"
>>>>>    DRILL_MAX_DIRECT_MEMORY="8G"
>>>>>
>>>>> This is on a laptop with 16G and I should probably lower it, but it seems
>>>>> a bit excessive for such a small query.  And I think I got the same results
>>>>> on a 2-node cluster with 8/16.  I'm gonna try again on the cluster to make
>>>>> sure.
>>>>>
>>>>> Thanks,
>>>>> Oscar
>>>>>
>>>>>
>>>>> On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:
>>>>>
>>>>>> You mentioned "*But if I uncomment the where clause then it runs for a
>>>>>> couple of hours until it runs out of memory.*"
>>>>>>
>>>>>> Can you please share the OutOfMemory details from drillbit.log and the
>>>>>> value of DRILL_MAX_DIRECT_MEMORY?
>>>>>>
>>>>>> Can you also try retaining just the line `upload_date = '2016-08-01'`
>>>>>> in your where clause, and check whether the explain succeeds then?
>>>>>>
>>>>>> Thanks,
>>>>>> Khurram
>>>>>>
>>>>>> On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante <sp...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> I've been stuck with this for a while and I'm not sure if I'm running into
>>>>>>> a bug or I'm just doing something very wrong.
>>>>>>>
>>>>>>> I have this stripped-down version of my query:
>>>>>>> https://gist.github.com/spacepluk/9ab1e1a0cfec6f0efb298f023f4c805b
>>>>>>>
>>>>>>> The data is just a single file with one record (1.5K).
>>>>>>>
>>>>>>> Without changing anything, explain takes ~1sec on my machine.  But if I
>>>>>>> uncomment the where clause then it runs for a couple of hours until it
>>>>>>> runs out of memory.
>>>>>>>
>>>>>>> Also if I uncomment the where clause *and* take out the join, then it
>>>>>>> takes around 30s to plan.
>>>>>>>
>>>>>>> Any ideas?
>>>>>>> Thanks!
>>>>>>>
>>>>>>>

Re: Query hangs on planning

Posted by Oscar Morante <sp...@gmail.com>.
Hi Rahul,
I'm still very busy :(  But I haven't forgotten about this.  I'll open a 
JIRA with a proper test-case as soon as I get the chance.



-- 
Oscar Morante
"Self-education is, I firmly believe, the only kind of education there is."
                                                          -- Isaac Asimov.

Re: Query hangs on planning

Posted by Zelaine Fong <zf...@maprtech.com>.
Ah ... yes, you're right.  I forgot that was off heap.

-- Zelaine

On Thu, Sep 1, 2016 at 11:41 AM, Sudheesh Katkam <sk...@maprtech.com>
wrote:

> That setting is for off-heap memory. The earlier case hit heap memory
> limit.
>
>

Re: Query hangs on planning

Posted by Sudheesh Katkam <sk...@maprtech.com>.
That setting is for off-heap memory. The earlier case hit heap memory limit.

> On Sep 1, 2016, at 11:36 AM, Zelaine Fong <zf...@maprtech.com> wrote:
> 
> One other thing ... have you tried tuning the planner.memory_limit
> parameter?  Based on the earlier stack trace, you're hitting a memory limit
> during query planning.  So, tuning this parameter should help that.  The
> default is 256 MB.
> 
> -- Zelaine
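
For completeness, the option Zelaine mentions can be inspected and changed from
SQLLine. A hedged sketch only (values are in bytes, so 268435456 is the 256 MB
default; and per Sudheesh's point above, this limit governs off-heap planning
memory, so it would not prevent the JVM-heap OOM in the earlier stack trace):

```shell
# Illustrative only: inspect, then raise, planner.memory_limit from
# embedded-mode SQLLine. The new value below is 536870912 bytes = 512 MB.
bin/sqlline -u jdbc:drill:zk=local \
  -e "SELECT * FROM sys.options WHERE name = 'planner.memory_limit';"
bin/sqlline -u jdbc:drill:zk=local \
  -e "ALTER SYSTEM SET \`planner.memory_limit\` = 536870912;"
```

An `ALTER SESSION` form also exists if the change should only apply to the
current connection.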
>>>>>>>>> it
>>>>>>>>> runs
>>>>>>>>> out of memory.
>>>>>>>>> 
>>>>>>>>> Also if I uncomment the where clause *and* take out the join, then
>> it
>>>>>>>>> takes around 30s to plan.
>>>>>>>>> 
>>>>>>>>> Any ideas?
>>>>>>>>> Thanks!
>>>>>>>>> 
>>>>>>>>> 
>> 


Re: Query hangs on planning

Posted by Zelaine Fong <zf...@maprtech.com>.
One other thing ... have you tried tuning the planner.memory_limit
parameter?  Based on the earlier stack trace, you're hitting a memory limit
during query planning, so increasing this parameter should help.  The
default is 256 MB.
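
As a sketch, the option can be raised per-session or cluster-wide from
sqlline (the option name and the sys.options table are standard Drill; the
512 MB value below is just an illustrative starting point, not a
recommendation):

```sql
-- planner.memory_limit is specified in bytes; 268435456 (256 MB) is the default.
-- Raise it for the current session only:
ALTER SESSION SET `planner.memory_limit` = 536870912;  -- 512 MB

-- Or persist it for all drillbits:
ALTER SYSTEM SET `planner.memory_limit` = 536870912;

-- Check the effective value:
SELECT name, num_val FROM sys.options WHERE name = 'planner.memory_limit';
```

Note this only bounds memory used during planning; it won't help if the
heap itself (DRILL_HEAP) is undersized.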

-- Zelaine

On Thu, Sep 1, 2016 at 11:21 AM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> While planning we use heap memory. 2GB of heap should be sufficient for
> what you mentioned. This looks like a bug to me. Can you raise a jira for
> the same? And it would be super helpful if you can also attach the data set
> used.
>
> Rahul