Posted to mapreduce-issues@hadoop.apache.org by "Liu Xiao (JIRA)" <ji...@apache.org> on 2015/02/10 05:00:41 UTC

[jira] [Created] (MAPREDUCE-6249) Streaming task will not untar tgz uploaded with -archives

Liu Xiao created MAPREDUCE-6249:
-----------------------------------

             Summary: Streaming task will not untar tgz uploaded with -archives
                 Key: MAPREDUCE-6249
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 2.5.2
         Environment: hadoop-2.5.2
hadoop-streaming-2.5.2.jar
            Reporter: Liu Xiao


When writing a Hadoop Streaming job, I used -archives to ship a .tgz from my local machine (via HDFS) to the task working directory, but it was not untarred as the documentation says it should be. I've searched a lot without any luck.

Here is the command that starts the streaming job on hadoop-2.5.2:

hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
    -files mapper.sh \
    -archives /home/hadoop/tmp/test.tgz#test \
    -D mapreduce.job.maps=1 \
    -D mapreduce.job.reduces=1 \
    -input "/test/test.txt" \
    -output "/res/" \
    -mapper "sh mapper.sh" \
    -reducer "cat"

And here is "mapper.sh":

cat > /dev/null   # drain the mapper's input
ls -l test        # list the "test" entry created by -archives ...#test
exit 0
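
On its own, `ls -l test` prints the symlink entry rather than what it points to. To tell whether the link resolves to the raw archive or to an unpacked directory, mapper.sh could call a helper like the one below in place of `ls -l test` (after draining stdin). This is a minimal sketch, assuming a POSIX `sh` on the task node; the name `inspect_link` is hypothetical and not part of the original report.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of the original report).
# Reports what kind of object a symlink such as "test" resolves to.
inspect_link() {
    link="$1"
    # Show the entry itself, as the original mapper does
    ls -l "$link"
    if [ -d "$link" ]; then
        # test -d follows symlinks: true when the target is a directory
        echo "$link -> directory, contents:"
        # The trailing slash makes ls list the target directory's entries
        ls -l "$link"/
    elif [ -f "$link" ]; then
        echo "$link -> regular file (archive was not unpacked)"
    else
        echo "$link is missing or a dangling symlink"
    fi
}
```

In mapper.sh this would be used as `inspect_link test`, with the function defined above that line.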

"test.tgz" contains two files, "test.1.txt" and "test.2.txt", created like this:

echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcvf test.tgz test.1.txt test.2.txt
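
To confirm locally that the two files sit at the top level of the archive (no leading directory), the members can be listed without extracting. A minimal sketch, assuming GNU or BSD tar and a POSIX shell; `list_archive` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): list the members
# of a .tgz without extracting it, to check the archive layout.
list_archive() {
    archive="$1"
    if [ ! -f "$archive" ]; then
        echo "no such archive: $archive" >&2
        return 1
    fi
    # -t lists members, -z handles gzip, -f names the archive file
    tar -tzf "$archive"
}
```

For the archive built above, `list_archive /home/hadoop/tmp/test.tgz` should print `test.1.txt` and `test.2.txt` on separate lines.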

The task produces this output:

lrwxrwxrwx 1 hadoop hadoop     71 Feb  8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz

but I expected something like this:

-rw-r--r-- 1 hadoop hadoop 5 Feb  8 23:25 test.1.txt
-rw-r--r-- 1 hadoop hadoop 5 Feb  8 23:25 test.2.txt

So why is test.tgz not untarred automatically, as the documentation says? Or is there another way to get the .tgz untarred?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)