Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2017/04/26 07:05:04 UTC

[jira] [Assigned] (TEZ-3699) For large dataset, pipelined shuffle throws exceptions in consumer side for UnorderedPartitioned edge

     [ https://issues.apache.org/jira/browse/TEZ-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan reassigned TEZ-3699:
-------------------------------------

    Assignee: Rajesh Balamohan

> For large dataset, pipelined shuffle throws exceptions in consumer side for UnorderedPartitioned edge
> -----------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3699
>                 URL: https://issues.apache.org/jira/browse/TEZ-3699
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-3699.1.patch
>
>
> {noformat}
> 2017-04-25 21:16:18,179 [INFO] [Fetcher_B {Map_1} #1] |ShuffleManager.fetch|: Completed fetch for attempt: {110, 0, attempt_1490656001509_1098_1_04_000110_0_10013_4, 2, 4} to MEMORY, csize=10505, dsize=32638, EndTime=1493180178179, TimeTaken=1, Rate=10.01 MB/s
> 2017-04-25 21:16:18,183 [INFO] [Fetcher_B {Map_1} #2] |HttpConnection.url|: for url=http://node120:13562/mapOutput?job=job_1490656001509_1098&dag=1&reduce=43&map=attempt_1490656001509_1098_1_04_000075_0_10008_15,attempt_1490656001509_1098_1_04_000035_0_10008_16,attempt_1490656001509_1098_1_04_000055_0_10007_16,attempt_1490656001509_1098_1_04_000055_0_10007_16,attempt_1490656001509_1098_1_04_000075_0_10008_16,attempt_1490656001509_1098_1_04_000195_0_10013_1,attempt_1490656001509_1098_1_04_000075_0_10008_17,attempt_1490656001509_1098_1_04_000180_0_10013_1,attempt_1490656001509_1098_1_04_000195_0_10013_2,attempt_1490656001509_1098_1_04_000075_0_10008_18,attempt_1490656001509_1098_1_04_000180_0_10013_2,attempt_1490656001509_1098_1_04_000195_0_10013_3&keepAlive=true sent hash and receievd reply 0 ms
> 2017-04-25 21:16:18,183 [INFO] [Fetcher_B {Map_1} #2] |ShuffleManager.fetch|: Completed fetch for attempt: {75, 0, attempt_1490656001509_1098_1_04_000075_0_10008_15, 1, 15} to MEMORY, csize=23437, dsize=70987, EndTime=1493180178183, TimeTaken=0, Rate=0.00 MB/s
> 2017-04-25 21:16:18,183 [INFO] [Fetcher_B {Map_1} #2] |shuffle.Fetcher|: Failed to read data to memory for InputAttemptIdentifier [inputIdentifier=35, attemptNumber=0, pathComponent=attempt_1490656001509_1098_1_04_000035_0_10008_16, spillType=1, spillId=16]. len=0, decomp=0. ExceptionMessage=Not a valid ifile header
> 2017-04-25 21:16:18,185 [WARN] [Fetcher_B {Map_1} #2] |shuffle.Fetcher|: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=35, attemptNumber=0, pathComponent=attempt_1490656001509_1098_1_04_000035_0_10008_16, spillType=1, spillId=16] from cn120-10.l42scl.hortonworks.com
> java.io.IOException: Not a valid ifile header
>         at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.verifyHeaderMagic(IFile.java:833)
>         at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.isCompressedFlagEnabled(IFile.java:840)
>         at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:608)
>         at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:134)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:814)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:539)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:428)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:226)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:73)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)        
> {noformat}
> There are a couple of issues. The first is related to a 0-length partition (the failed fetch above reports len=0, decomp=0).
> The second is a duplicate entry for attempt_1490656001509_1098_1_04_000055_0_10007_16 in the fetch URL (need to check whether the two entries have different spill types).
> This works fine for smaller datasets; the failure shows up only when there is a large number of spills.
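
For triage, the duplicate entry noted above can be spotted mechanically in the HttpConnection.url log lines. The following is only a rough sketch, not part of Tez or of the attached patch: the helper name duplicate_attempts is hypothetical, and the sample URL is an abbreviated copy of the one in the log (first four map entries only). It reads the map= query parameter and reports any path component listed more than once.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def duplicate_attempts(fetch_url):
    """Return {path_component: count} for entries listed more than once in map=.

    Hypothetical triage helper; it only inspects the URL text and knows
    nothing about spill types or Tez internals.
    """
    query = urlsplit(fetch_url).query
    # parse_qs returns a list of values per key; split each map= value on commas.
    map_values = parse_qs(query).get("map", [])
    attempts = [a for value in map_values for a in value.split(",") if a]
    counts = Counter(attempts)
    return {a: n for a, n in counts.items() if n > 1}


# Abbreviated copy of the fetch URL from the log above (first four map entries).
url = (
    "http://node120:13562/mapOutput?job=job_1490656001509_1098&dag=1&reduce=43"
    "&map=attempt_1490656001509_1098_1_04_000075_0_10008_15,"
    "attempt_1490656001509_1098_1_04_000035_0_10008_16,"
    "attempt_1490656001509_1098_1_04_000055_0_10007_16,"
    "attempt_1490656001509_1098_1_04_000055_0_10007_16"
    "&keepAlive=true"
)

print(duplicate_attempts(url))
# prints {'attempt_1490656001509_1098_1_04_000055_0_10007_16': 2}
```

A duplicate path component in the URL does not by itself show which side produced it; whether the two entries carry different spill types still has to be checked in the fetcher's InputAttemptIdentifier logging.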



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)