Posted to dev@tez.apache.org by Kostas Tzoumas <kt...@apache.org> on 2014/11/06 20:09:00 UTC

Re: OOM with Hive on Tez

I am running into the same error [1] with plain Tez (not Hive).

Any advice on what configuration parameters I should start looking at?

Kostas

[1] java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.tez.runtime.library.common.shuffle.MemoryFetchedInput.<init>(MemoryFetchedInput.java:38)
    at org.apache.tez.runtime.library.common.shuffle.impl.SimpleFetchedInputAllocator.allocate(SimpleFetchedInputAllocator.java:139)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:713)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:485)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:394)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.call(Fetcher.java:189)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.call(Fetcher.java:71)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

On Tue, Aug 26, 2014 at 4:26 PM, Suma Shivaprasad <
sumasai.shivaprasad@gmail.com> wrote:

> I am using Tez 0.4.0; the counters for the query run are below.
>
> 2014-08-26 14:06:41,203 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(171)) - org.apache.tez.common.counters.DAGCounter:
> 2014-08-26 14:06:41,205 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    NUM_FAILED_TASKS: 67
> 2014-08-26 14:06:41,205 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    NUM_KILLED_TASKS: 312
> 2014-08-26 14:06:41,205 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    TOTAL_LAUNCHED_TASKS: 259
> 2014-08-26 14:06:41,205 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    DATA_LOCAL_TASKS: 59
> 2014-08-26 14:06:41,205 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    RACK_LOCAL_TASKS: 27
> 2014-08-26 14:06:41,207 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(171)) - File System Counters:
> 2014-08-26 14:06:41,208 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    FILE: BYTES_READ: 0
> 2014-08-26 14:06:41,208 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    FILE: BYTES_WRITTEN: 3201156949
> 2014-08-26 14:06:41,208 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    FILE: READ_OPS: 0
> 2014-08-26 14:06:41,209 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    FILE: LARGE_READ_OPS: 0
> 2014-08-26 14:06:41,209 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    FILE: WRITE_OPS: 0
> 2014-08-26 14:06:41,209 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    HDFS: BYTES_READ: 30052072845
> 2014-08-26 14:06:41,209 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    HDFS: BYTES_WRITTEN: 0
> 2014-08-26 14:06:41,209 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    HDFS: READ_OPS: 768
> 2014-08-26 14:06:41,209 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    HDFS: LARGE_READ_OPS: 0
> 2014-08-26 14:06:41,209 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    HDFS: WRITE_OPS: 0
> 2014-08-26 14:06:41,211 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(171)) - org.apache.tez.common.counters.TaskCounter:
> 2014-08-26 14:06:41,211 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    GC_TIME_MILLIS: 148639
> 2014-08-26 14:06:41,211 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    CPU_MILLISECONDS: 1420020
> 2014-08-26 14:06:41,211 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    PHYSICAL_MEMORY_BYTES: 304725393408
> 2014-08-26 14:06:41,211 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    VIRTUAL_MEMORY_BYTES: 440084279296
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    COMMITTED_HEAP_BYTES: 337806557184
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    INPUT_RECORDS_PROCESSED: 722420718
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    OUTPUT_RECORDS: 144488481
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    OUTPUT_BYTES: 6876509984
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    OUTPUT_BYTES_WITH_OVERHEAD: 7165487118
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    OUTPUT_BYTES_PHYSICAL: 3201154197
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(171)) -
> org.apache.hadoop.hive.ql.exec.FilterOperator$Counter:
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    FILTERED: 863123081
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    PASSED: 215782564
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(171)) -
> org.apache.hadoop.hive.ql.exec.MapOperator$Counter:
> 2014-08-26 14:06:41,212 INFO  [Thread-13]: exec.Task
> (TezTask.java:execute(173)) -    DESERIALIZE_ERRORS: 0
>
> Thanks
> Suma
>
>
> On Tue, Aug 26, 2014 at 7:47 PM, Suma Shivaprasad <
> sumasai.shivaprasad@gmail.com> wrote:
>
> > Trying to run a query on Tez with the following configurations
> >
> >
> > *set hive.tez.container.size=5120*
> > *set mapreduce.map.child.java.opts=-Xmx5120M*
> > *set hive.tez.java.opts=-Xmx4096M*
> > *set hive.auto.convert.join.noconditionaltask.size=805306000*
> > *set tez.am.resource.memory.mb=5120*
> > *set tez.am.java.opts=-Xmx4096M*
> >
> > The above config settings were set after running
> > https://github.com/hortonworks/hdp-configuration-utils/blob/master/2.1/hdp-configuration-utils.py
> > to get the right memory configs
> >
> > Tried with both
> >
> > set tez.runtime.io.sort.mb=512
> > set mapreduce.task.io.sort.mb=512
> >
> > and
> >
> > set tez.runtime.io.sort.mb=2048
> > set mapreduce.task.io.sort.mb=2048
> >
> >
> > The query I am trying to run is
> >
> > *select sum(tab1.m1),sum(tab1.m2)*
> > * from tab1 join tab2 dm on tab1.col1=tab2.col1*
> > * where tab1.dt = '2014-06-01' *
> > * and tab2.col2 = '..'*
> > * and tab2.col3 IN ('..')*
> > * group by TAB1.col1*
> >
> > *where TAB1.col1 has high cardinality (around 700-800 million)*
> >
> > And it's going OOM during the shuffle phase.
> >
> > errorMessage=Fetch failed
> > Container released by application,
> > AttemptID:attempt_1407396011310_1577_1_01_000000_4 Info:Error:
> > exceptionThrown=java.lang.OutOfMemoryError: Java heap space
> >     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
> >     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
> >     at org.apache.tez.runtime.library.shuffle.common.MemoryFetchedInput.<init>(MemoryFetchedInput.java:38)
> >     at org.apache.tez.runtime.library.shuffle.common.impl.SimpleFetchedInputAllocator.allocate(SimpleFetchedInputAllocator.java:137)
> >     at org.apache.tez.runtime.library.shuffle.common.Fetcher.fetchInputs(Fetcher.java:252)
> >     at org.apache.tez.runtime.library.shuffle.common.Fetcher.call(Fetcher.java:184)
> >     at org.apache.tez.runtime.library.shuffle.common.Fetcher.call(Fetcher.java:59)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> >
> >
> > Please advise whether the configurations look OK. Do I need to change
> > anything?
> >
> >
> >
> > Thanks
> > Suma
> >
> >
> >
>
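
Editorial sketch (not part of the original messages): the settings quoted
above can be turned into a quick heap budget. The numbers are copied from the
thread (5120 MB container, -Xmx4096M heap, sort buffers of 512 MB and 2048 MB);
the helper name heap_budget and the idea of treating the sort buffer as a
simple slice of the heap are assumptions made for illustration. This is
bookkeeping on the posted values, not a model of how Tez allocates shuffle
memory.

```python
# Values copied from the settings quoted in the thread (all in MB).
CONTAINER_MB = 5120            # hive.tez.container.size
HEAP_MB = 4096                 # -Xmx value in hive.tez.java.opts
SORT_MB_TRIED = [512, 2048]    # tez.runtime.io.sort.mb values that were tried


def heap_budget(container_mb, heap_mb, sort_mb):
    """Return simple ratios for one configuration (hypothetical helper)."""
    return {
        # Share of the container handed to the JVM heap.
        "heap_fraction_of_container": heap_mb / container_mb,
        # Share of the heap reserved for the sort buffer.
        "sort_fraction_of_heap": sort_mb / heap_mb,
        # Heap left for everything else, including fetched shuffle inputs.
        "heap_left_after_sort_mb": heap_mb - sort_mb,
    }


budgets = [heap_budget(CONTAINER_MB, HEAP_MB, s) for s in SORT_MB_TRIED]

for sort_mb, budget in zip(SORT_MB_TRIED, budgets):
    print(f"io.sort.mb={sort_mb}: {budget}")
```

With the larger setting, half of the heap is set aside for sorting. Whether
that reservation competes with the in-memory fetch buffers seen in the stack
trace depends on which vertex the failing task belongs to, which the thread
does not show.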

Re: OOM with Hive on Tez

Posted by Rajesh Balamohan <rb...@apache.org>.
Hi Kostas,

Can you provide the container size you were running with?

Were you storing any data to ObjectRegistry in this job?

Please feel free to open a JIRA; it would be helpful if you could upload
the app logs along with tez-site.xml.

~Rajesh.B
