You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2015/05/05 06:27:05 UTC

[jira] [Comment Edited] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

    [ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527881#comment-14527881 ] 

Hitesh Shah edited comment on TEZ-2366 at 5/5/15 4:26 AM:
----------------------------------------------------------

[~pramachandran] Can you confirm that this path is not invoked in local mode? The shuffle meta data will not be present in local mode. 

In any case, maybe to be safe, it might be better to write more defensive code for retrieving the shuffle port and if shuffle port is not available, then disable local fetch. 


was (Author: hitesh):
[~pramachandran] Can you confirm that this path is not invoked in local mode? 

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> --------------------------------------------------------------------
>
>                 Key: TEZ-2366
>                 URL: https://issues.apache.org/jira/browse/TEZ-2366
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Prakash Ramachandran
>            Priority: Critical
>         Attachments: TEZ-2366.1.patch, TEZ-2366.test.txt, TEZ-2366.wip.1.patch
>
>
> There are around 20 unit tests (out of around 2000) fail intermittently after TEZ-2333. Here is a stack:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_000000_1_10003/file.out.index in any of the configured local directories
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce that in Pig test, using the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test
> Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "true" (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup). I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and does not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)