Posted to issues@tez.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2016/04/04 21:54:25 UTC

[jira] [Commented] (TEZ-3196) java.lang.InternalError from decompression codec is fatal to a task during shuffle

    [ https://issues.apache.org/jira/browse/TEZ-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224931#comment-15224931 ] 

Jason Lowe commented on TEZ-3196:
---------------------------------

Sample stacktrace:
{noformat}
2016-04-02 08:44:03,058 [INFO] [TezChild] |task.TezTaskRunner|: Encounted an error while executing task: attempt_1458300907858_475320_1_01_000934_3
org.apache.pig.backend.executionengine.ExecException: ERROR 0: org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in fetcher {scope_168} #27
	at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POShuffleTezLoad.attachInputs(POShuffleTezLoad.java:121)
	at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.initializeInputs(PigProcessor.java:332)
	at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:210)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in fetcher {scope_168} #27
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:360)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:336)
	... 5 more
Caused by: java.lang.InternalError: lzo1x_decompress returned: -8
	at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
	at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:292)
	at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
	at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
	at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:626)
	at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:113)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyMapOutput(FetcherOrderedGrouped.java:510)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:286)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)
{noformat}

MapReduce addressed this in MAPREDUCE-5053, and it looks like Tez needs a similar fix.
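
A minimal sketch of the general shape such a fix could take, assuming the fetcher already handles IOException as a retryable fetch failure: catch java.lang.InternalError around the read from the decompressor stream and rethrow it as an IOException. All names below (CodecErrorSketch, FailingStream, readToMemory, fetchWithRetries) are hypothetical and exist only for illustration; this is not the actual Tez or MapReduce code.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names are Tez APIs.
class CodecErrorSketch {

    /** Stand-in for a decompressor stream whose native code fails. */
    static class FailingStream extends InputStream {
        @Override
        public int read() {
            throw new InternalError("lzo1x_decompress returned: -8");
        }
    }

    /**
     * Reads exactly len bytes into buf. An InternalError raised by the
     * underlying stream is converted to an IOException, so callers that
     * already treat IOException as a fetch failure handle it the same way.
     */
    static void readToMemory(InputStream in, byte[] buf, int len) throws IOException {
        try {
            int off = 0;
            while (off < len) {
                int n = in.read(buf, off, len - off);
                if (n < 0) {
                    throw new EOFException("Stream ended after " + off + " bytes");
                }
                off += n;
            }
        } catch (InternalError e) {
            throw new IOException("Codec failure while reading shuffle data", e);
        }
    }

    /**
     * Tries each attempt in order and returns the index of the first one
     * that succeeds, or -1 if every attempt fails with an IOException.
     */
    static int fetchWithRetries(List<InputStream> attempts, byte[] buf) {
        for (int i = 0; i < attempts.size(); i++) {
            try {
                readToMemory(attempts.get(i), buf, buf.length);
                return i;
            } catch (IOException e) {
                // Treated as an ordinary fetch failure: move on to the next attempt.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[3];
        List<InputStream> attempts = Arrays.asList(
                new FailingStream(),
                new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println("Succeeded on attempt index: " + fetchWithRetries(attempts, buf));
    }
}
```

Without the catch block in readToMemory, the InternalError would propagate past any IOException handling in the caller, which matches the fatal behaviour in the stack trace above. In a real fetcher, the retry would also report the failed source so that the upstream task can be re-run when necessary.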

> java.lang.InternalError from decompression codec is fatal to a task during shuffle
> ----------------------------------------------------------------------------------
>
>                 Key: TEZ-3196
>                 URL: https://issues.apache.org/jira/browse/TEZ-3196
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>             Fix For: 0.7.1
>
>
> Many codecs throw java.lang.InternalError when their native implementations encounter an error.  This is not treated as a fetch failure and is instead fatal to the task.  The task should treat codec errors during fetch like other fetch failures and retry, hopefully triggering a re-run of the upstream task if necessary.


