Posted to dev@pig.apache.org by Ilya Katsov <ik...@griddynamics.com> on 2012/09/21 14:01:43 UTC

How to specify temporary dirs in Pig local mode?

Hello All,

I'm trying to run Pig e2e tests in parallel and there are many
failures like this in local mode:

WARN  org.apache.hadoop.mapred.Task - Could not find output size
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
output/file.out in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:56)
        at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:944)
        at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:924)
        at org.apache.hadoop.mapred.Task.done(Task.java:875)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:374)

The problem seems to be concurrent access to the JobTracker's
temporary directory: file.out is a temporary JobTracker file. The lsof
output below, taken from different test processes, shows them opening
files in the same directory:

$ lsof | grep output
java      20719    ikatsov   13r      REG                8,1      3486   17039996 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_0.out
java      20719    ikatsov   16r      REG                8,1    349196   17039986 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_1.out

$ lsof | grep output
java      25410    ikatsov   13w      REG                8,1      8145   17039997 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out

$ lsof | grep output
java       2223    ikatsov   13r      REG                8,1    289196   16384629 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0003/attempt_local_0003_r_000000_0/output/map_0.out

$ lsof | grep output
java      12187    ikatsov   14r      REG                8,1    349196   17039996 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_0.out
java      12187    ikatsov   17r      REG                8,1    349196   17039999 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_1.out


Is there a way to specify the temporary Hadoop directory
(mapreduce.cluster.local.dir) when launching Pig in local mode?

Thank you in advance,
Ilya

Re: How to specify temporary dirs in Pig local mode?

Posted by Rohini Palaniswamy <ro...@gmail.com>.
Ilya,
  Could you try passing -Dmapred.local.dir=<randomly generated tmp location>
(-Dyarn.nodemanager.local-dirs=<randomly generated tmp location> in case of
Hadoop 23) when launching the Pig local mode tests and see if that works?

TestDriver.pm already has a block that passes additional java_params to
local mode:

    if ($testCmd->{'exectype'} eq "local") {
        push(@{$testCmd->{'java_params'}}, "-Xmx1024m");
        push(@pigCmd, ("-x", "local"));
    }
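
To make the suggestion concrete, here is a minimal shell sketch of generating a unique local directory per test process and building the corresponding Java option. It is only an illustration: the variable names (local_dir, java_opt) and the "pig-local" prefix are made up for this example, and whether Pig local mode actually honors the property is exactly what needs to be tried.

```shell
# Create a private scratch directory for this test process.
# mktemp -d picks a name that no concurrent process is using.
local_dir=$(mktemp -d "${TMPDIR:-/tmp}/pig-local.XXXXXX") || exit 1

# Property name as suggested above for Hadoop 1.x; for Hadoop 23 the
# suggested name is yarn.nodemanager.local-dirs instead.
java_opt="-Dmapred.local.dir=${local_dir}"

# Print the option so it can be added to the test's java_params.
echo "$java_opt"
```

Each parallel test run would then get its own directory instead of sharing /tmp/hadoop-<user>/mapred/local, and the directory can be deleted once the run finishes.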

Regards,
Rohini
