Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2015/01/26 04:12:34 UTC

[jira] [Updated] (TEZ-1997) Remove synchronization DefaultSorter::isRLENeeded() (Causes sorter to hang indefinitely in large jobs)

     [ https://issues.apache.org/jira/browse/TEZ-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-1997:
----------------------------------
    Description: 
{code}

Thread 21822: (state = BLOCKED)
 - org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.isRLENeeded() @bci=0, line=724 (Interpreted frame)
 - org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(int, int) @bci=99, line=754 (Compiled frame)
 - org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill() @bci=29, line=732 (Interpreted frame)
 - org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run() @bci=69, line=660 (Interpreted frame)


Thread 19983: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2039 (Compiled frame)
 - org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$Buffer.write(byte[], int, int) @bci=451, line=562 (Compiled frame)
 - java.io.DataOutputStream.write(byte[], int, int) @bci=7, line=107 (Compiled frame)
 - org.apache.tez.runtime.library.common.serializer.TezBytesWritableSerialization$TezBytesWritableSerializer.serialize(org.apache.hadoop.io.Writable) @bci=18, line=123 (Compiled frame)
 - org.apache.tez.runtime.library.common.serializer.TezBytesWritableSerialization$TezBytesWritableSerializer.serialize(java.lang.Object) @bci=5, line=110 (Compiled frame)
 - org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.collect(java.lang.Object, java.lang.Object, int) @bci=544, line=283 (Compiled frame)
 - org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.write(java.lang.Object, java.lang.Object) @bci=18, line=185 (Compiled frame)
 - org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(java.lang.Object, java.lang.Object) @bci=9, line=126 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(java.lang.Object, java.lang.Object) @bci=6, line=211 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(org.apache.hadoop.io.BytesWritable, org.apache.hadoop.io.Writable) @bci=94, line=534 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(java.lang.Object, int) @bci=662, line=380 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.processOp(java.lang.Object, int) @bci=30, line=77 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.Operator.forward(java.lang.Object, org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector) @bci=63, line=815 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(java.lang.Object, int) @bci=165, line=138 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.Operator.forward(java.lang.Object, org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector) @bci=63, line=815 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(java.lang.Object, int) @bci=64, line=95 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(java.lang.Object) @bci=18, line=157 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(org.apache.hadoop.io.Writable) @bci=53, line=45 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(java.lang.Object) @bci=20, line=83 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord() @bci=40, line=68 (Compiled frame)
 - org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run() @bci=9, line=294 (Interpreted frame)
 - org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(java.util.Map, java.util.Map) @bci=224, line=163 (Interpreted frame)
 - org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(java.util.Map, java.util.Map) @bci=86, line=138 (Interpreted frame)
 - org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run() @bci=65, line=328 (Interpreted frame)
 - org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run() @bci=119, line=179 (Interpreted frame)
 - org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run() @bci=1, line=171 (Interpreted frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
 - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=422 (Interpreted frame)
 - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1656 (Interpreted frame)
 - org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call() @bci=15, line=171 (Interpreted frame)
 - org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call() @bci=1, line=166 (Interpreted frame)
 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
{code}
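
"Thread 19983" calls "synchronized void collect(Object key, Object value, final int partition)", so it holds the DefaultSorter monitor. While still holding that monitor, DefaultSorter$Buffer.write() waits on a Condition (AbstractQueuedSynchronizer$ConditionObject.await()) for the in-progress spill to finish. Condition.await() releases only the lock the Condition belongs to, not the object monitor acquired by "collect".

"Thread 21822", the spill thread, calls "private synchronized boolean isRLENeeded()" from spill() and blocks waiting for that same monitor. The spill therefore never completes, the condition the writer is waiting on is never signalled, and both threads wait forever.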

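The interaction described above can be modelled outside Tez with a small, self-contained Java program. This is a sketch under assumptions: the class SorterDeadlockDemo and the members spillLock, spillDone, spillInProgress and startSpillThread are hypothetical stand-ins chosen to mirror the stack traces, not DefaultSorter's actual code, and a timeout is added so the demo terminates instead of hanging.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical model of the hang: the writer holds the object monitor while
 * waiting on a Condition of a separate ReentrantLock, and the spill thread
 * needs that monitor before it can signal the Condition.
 */
public class SorterDeadlockDemo {
    private final ReentrantLock spillLock = new ReentrantLock();
    private final Condition spillDone = spillLock.newCondition();
    private boolean spillInProgress; // guarded by spillLock

    // Stand-in for the synchronized isRLENeeded(): requires the object monitor.
    private synchronized boolean isRLENeeded() {
        return false;
    }

    // Stand-in for collect() -> Buffer.write(). Returns false if the spill did
    // not finish within the timeout (the real code has no timeout and hangs).
    public synchronized boolean collect(long timeoutMs) throws InterruptedException {
        spillLock.lock();
        try {
            spillInProgress = true;
            startSpillThread();
            long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (spillInProgress) {
                if (remainingNanos <= 0) {
                    return false; // timed out: the spill thread is still blocked
                }
                // Releases spillLock while waiting, but NOT this object's monitor.
                remainingNanos = spillDone.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            spillLock.unlock();
        }
    }

    private void startSpillThread() {
        Thread spill = new Thread(() -> {
            isRLENeeded(); // BLOCKED while collect() still owns the monitor
            spillLock.lock();
            try {
                spillInProgress = false;
                spillDone.signal(); // never reached while collect() is waiting
            } finally {
                spillLock.unlock();
            }
        }, "SpillThread");
        spill.setDaemon(true);
        spill.start();
    }

    public static void main(String[] args) throws InterruptedException {
        boolean finished = new SorterDeadlockDemo().collect(500);
        System.out.println("spill finished: " + finished); // prints "spill finished: false"
    }
}
{code}

Running main prints "spill finished: false": the writer gives up only because of the added timeout, since the spill thread cannot get past isRLENeeded() while collect() owns the monitor.
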
Note: This can easily be reproduced with Hive on Tez (TPC-H dataset): "set tez.runtime.sort.threads=1; create table testData as select * from lineitem distribute by l_shipdate;".
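The issue title points at removing synchronization from isRLENeeded(). One way to illustrate that direction is a check that reads its state without taking the object monitor, for example through a volatile field. This is a hypothetical sketch, not the committed patch: the class UnsynchronizedRleCheck, the rleNeeded field, setRleNeeded() and checkWhileMonitorHeld() are assumptions for illustration. It shows the check completing on another thread while the calling thread holds the monitor, which is the situation the spill thread was stuck in.

{code:java}
/**
 * Hypothetical sketch: isRLENeeded() reads a volatile field instead of
 * taking the object monitor, so it cannot block behind a thread that holds
 * the monitor (as collect() does while waiting for a spill).
 */
public class UnsynchronizedRleCheck {
    // Hypothetical flag; volatile makes writes visible to the spill thread without locking.
    private volatile boolean rleNeeded;

    public void setRleNeeded(boolean needed) {
        rleNeeded = needed;
    }

    // No 'synchronized' modifier: never waits for this object's monitor.
    public boolean isRLENeeded() {
        return rleNeeded;
    }

    /**
     * Holds this object's monitor (as collect() would) and reports whether
     * another thread can still run isRLENeeded() within the timeout.
     */
    public boolean checkWhileMonitorHeld(long timeoutMs) throws InterruptedException {
        synchronized (this) {
            Thread spill = new Thread(this::isRLENeeded, "SpillThread");
            spill.start();
            spill.join(timeoutMs);
            return !spill.isAlive();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        UnsynchronizedRleCheck sorter = new UnsynchronizedRleCheck();
        sorter.setRleNeeded(true);
        System.out.println("RLE needed: " + sorter.isRLENeeded());
        System.out.println("completed while monitor held: " + sorter.checkWhileMonitorHeld(500));
    }
}
{code}

If the synchronized modifier is put back on isRLENeeded(), checkWhileMonitorHeld() returns false instead, because the helper thread then waits for the monitor that the caller is holding.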



> Remove synchronization DefaultSorter::isRLENeeded() (Causes sorter to hang indefinitely in large jobs) 
> -------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1997
>                 URL: https://issues.apache.org/jira/browse/TEZ-1997
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Rajesh Balamohan
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)