Posted to mapreduce-issues@hadoop.apache.org by "Liu Xiao (JIRA)" <ji...@apache.org> on 2015/02/10 05:00:41 UTC
[jira] [Created] (MAPREDUCE-6249) Streaming task will not untar tgz uploaded with -archives
Liu Xiao created MAPREDUCE-6249:
-----------------------------------
Summary: Streaming task will not untar tgz uploaded with -archives
Key: MAPREDUCE-6249
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 2.5.2
Environment: hadoop-2.5.2
hadoop-streaming-2.5.2.jar
Reporter: Liu Xiao
When writing a Hadoop streaming task, I used -archives to upload a tgz from the local machine to the HDFS task working directory, but it was not untarred as the documentation says it should be. I've searched a lot without any luck.
Here is the command used to start the Hadoop streaming task with hadoop-2.5.2:
hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
-files mapper.sh \
-archives /home/hadoop/tmp/test.tgz#test \
-D mapreduce.job.maps=1 \
-D mapreduce.job.reduces=1 \
-input "/test/test.txt" \
-output "/res/" \
-mapper "sh mapper.sh" \
-reducer "cat"
And here is "mapper.sh":
cat > /dev/null
ls -l test
exit 0
"test.tgz" contains two files, "test.1.txt" and "test.2.txt", created as follows:
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcvf test.tgz test.1.txt test.2.txt
The output from the above task:
lrwxrwxrwx 1 hadoop hadoop 71 Feb 8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz
But the desired output would be something like this:
-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.1.txt
-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.2.txt
So, why has test.tgz not been untarred automatically as the documentation says? Or is there another way to get the "tgz" untarred?
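One check that might narrow this down (a sketch, not a confirmed diagnosis): "ls -l" on a symlink prints the link itself rather than what it points to, so the listing above does not by itself show whether the target named test.tgz is a regular file or a directory. The helper below, describe_link, is a hypothetical name introduced here for illustration; it uses "[ -d ]", which follows symlinks, and lists the target's contents with a trailing slash when the target is a directory.

```shell
# Hypothetical helper (not part of Hadoop): report what a link such as
# "test" in the task working directory actually resolves to.
describe_link() {
    link="$1"
    if [ -d "$link" ]; then
        # [ -d ] follows symlinks, so the target is a directory.
        echo "directory"
        # The trailing slash makes ls list the directory's contents
        # instead of printing the symlink entry itself.
        ls -l "$link"/
    elif [ -f "$link" ]; then
        # The target is a regular file, e.g. an archive left packed.
        echo "file"
    else
        echo "missing"
    fi
}
```

In mapper.sh, the line "ls -l test" could be replaced by "describe_link test" (with the function defined above it) to see which of the three cases applies.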
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)