Posted to dev@pig.apache.org by Ilya Katsov <ik...@griddynamics.com> on 2012/09/21 14:01:43 UTC
How to specify temporary dirs in Pig local mode?
Hello All,
I'm trying to run Pig e2e tests in parallel and there are many
failures like this in local mode:
WARN org.apache.hadoop.mapred.Task - Could not find output size
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
output/file.out in any of the configured local directories
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:56)
at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:944)
at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:924)
at org.apache.hadoop.mapred.Task.done(Task.java:875)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:374)
It seems that the problem is concurrent access to the JobTracker's
temporary directory: file.out is a temporary JobTracker file. The lsof
output shows that different tests open files in the same directory:
$ lsof | grep output
java 20719 ikatsov 13r REG 8,1 3486 17039996 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_0.out
java 20719 ikatsov 16r REG 8,1 349196 17039986 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_1.out
$ lsof | grep output
java 25410 ikatsov 13w REG 8,1 8145 17039997 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out
$ lsof | grep output
java 2223 ikatsov 13r REG 8,1 289196 16384629 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0003/attempt_local_0003_r_000000_0/output/map_0.out
$ lsof | grep output
java 12187 ikatsov 14r REG 8,1 349196 17039996 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_0.out
java 12187 ikatsov 17r REG 8,1 349196 17039999 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_1.out
I wonder, is there a way to specify the temporary Hadoop directory
(mapreduce.cluster.local.dir) when launching Pig in local mode?
Thank you in advance,
Ilya
Re: How to specify temporary dirs in Pig local mode?
Posted by Rohini Palaniswamy <ro...@gmail.com>.
Ilya,
Could you try passing -Dmapred.local.dir=<randomly generated tmp location>
(-Dyarn.nodemanager.local-dirs=<randomly generated tmp location> in the case of
Hadoop 0.23) when launching the Pig local mode tests and see if that works?
TestDriver.pm already has a block that passes additional java_params in
local mode:
if ($testCmd->{'exectype'} eq "local") {
    push(@{$testCmd->{'java_params'}}, "-Xmx1024m");
    push(@pigCmd, ("-x", "local"));
}
Regards,
Rohini
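The suggestion above comes down to giving every concurrently running local-mode process its own scratch directory. A minimal shell sketch of that idea follows. The helper name unique_local_dir_opt and the pig-local prefix are made up for illustration, and the sketch only builds and prints the option. It assumes the launcher forwards -D options to the JVM, as the java_params hook is described as doing, and it has not been run against the e2e harness.

```shell
# Sketch only: build a per-process -Dmapred.local.dir option.
# unique_local_dir_opt and the "pig-local" prefix are hypothetical names.
# The thread suggests yarn.nodemanager.local-dirs instead on Hadoop 0.23.

# Create a fresh, uniquely named directory and print the matching -D option.
unique_local_dir_opt() {
    dir=$(mktemp -d "${TMPDIR:-/tmp}/pig-local.XXXXXX") || return 1
    printf '%s\n' "-Dmapred.local.dir=$dir"
}

# Each parallel test process would compute its own option once at start-up.
opt=$(unique_local_dir_opt) || exit 1
echo "would pass to the local-mode JVM: $opt"
```

Because mktemp creates the directory itself, two tests started at the same moment still get different paths, so their output/ files should no longer land in the same directory.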