Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2015/02/10 15:46:12 UTC

[jira] [Resolved] (MAPREDUCE-6249) Streaming task will not untar tgz uploaded with -archives

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe resolved MAPREDUCE-6249.
-----------------------------------
    Resolution: Not a Problem

This is something better sent to the [Hadoop User mailing list|http://hadoop.apache.org/mailing_lists.html#User] rather than JIRA.

The archive was untarred as requested, but it was untarred into a directory (named "test" per the '#test' URI fragment in the archive argument). An archive is always unpacked into a directory specific to that archive, and the distributed cache does not support unpacking directly into the task's working directory. If you need files placed in the task working directory, you will need to specify them separately (e.g., via the "-files" option).
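
The behaviour described above can be reproduced locally without a cluster. The following is only a sketch: the "cache" and "taskdir" directory names are assumptions made for illustration and are not the paths the NodeManager actually uses. It shows why "ls -l test" prints a single symlink line (with -l, ls does not follow a symlink named on the command line) and how the unpacked files can still be reached through it.

```shell
#!/bin/sh
# Local simulation only; "cache" and "taskdir" are hypothetical names.
set -e
work=$(mktemp -d)
cd "$work"

# Build the same archive as in the report.
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcf test.tgz test.1.txt test.2.txt
rm test.1.txt test.2.txt

# Mimic the distributed cache: unpack into a directory dedicated to the
# archive, then expose it in the task directory through a symlink named
# after the '#test' fragment.
mkdir -p cache/test.tgz taskdir
tar zxf test.tgz -C cache/test.tgz
ln -s "$work/cache/test.tgz" taskdir/test

cd taskdir
ls -l test           # prints the symlink itself, as in the report
ls -l test/          # trailing slash: lists the unpacked files
cat test/test.1.txt  # the files are read through the link
```

In a real streaming job the mapper would read the files the same way, for example as test/test.1.txt, rather than expecting them directly in the working directory.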

> Streaming task will not untar tgz uploaded with -archives
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6249
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 2.5.2
>         Environment: hadoop-2.5.2
> hadoop-streaming-2.5.2.jar
>            Reporter: Liu Xiao
>
> When writing a Hadoop streaming task, I used -archives to upload a tgz from the local machine to the HDFS task working directory, but it was not untarred as the documentation says. I've searched a lot without any luck.
> Here is the command that starts the streaming task with hadoop-2.5.2:
> hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
>     -files mapper.sh \
>     -archives /home/hadoop/tmp/test.tgz#test \
>     -D mapreduce.job.maps=1 \
>     -D mapreduce.job.reduces=1 \
>     -input "/test/test.txt" \
>     -output "/res/" \
>     -mapper "sh mapper.sh" \
>     -reducer "cat"
> and "mapper.sh":
> cat > /dev/null
> ls -l test
> exit 0
> "test.tgz" contains two files, "test.1.txt" and "test.2.txt":
> echo "abcd" > test.1.txt
> echo "efgh" > test.2.txt
> tar zcvf test.tgz test.1.txt test.2.txt
> The output from the above task:
> lrwxrwxrwx 1 hadoop hadoop     71 Feb  8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz
> but the desired output would be something like this:
> -rw-r--r-- 1 hadoop hadoop 5 Feb  8 23:25 test.1.txt
> -rw-r--r-- 1 hadoop hadoop 5 Feb  8 23:25 test.2.txt
> So why has test.tgz not been untarred automatically as the documentation says, or is there another way to get the tgz untarred?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)