Posted to common-dev@hadoop.apache.org by "Prachi Gupta (JIRA)" <ji...@apache.org> on 2007/09/07 01:16:28 UTC
[jira] Updated: (HADOOP-1853) multiple -cacheFile option in hadoop
streaming does not seem to work
[ https://issues.apache.org/jira/browse/HADOOP-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prachi Gupta updated HADOOP-1853:
---------------------------------
Status: Patch Available (was: Open)
> multiple -cacheFile option in hadoop streaming does not seem to work
> ---------------------------------------------------------------------
>
> Key: HADOOP-1853
> URL: https://issues.apache.org/jira/browse/HADOOP-1853
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/streaming
> Reporter: Prachi Gupta
>
> Specifying a single -cacheFile option in Hadoop streaming works. Specifying more than one gives a parse error. A patch that fixes this, along with a unit test for the fix, is attached to this issue. An example of the bug follows:
> This works:
> -----------------------
> [hod] (parthas) >> stream -input "/user/parthas/test/tmp.data" -mapper
> "testcache.py abc" -output "/user/parthas/qc/exp2/filterData/subLab/0"
> -file "/home/parthas/proj/qc/bin/testcache.py" -cacheFile
> 'hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.qlab.head#abc'
> -jobconf mapred.map.tasks=1 -jobconf
> mapred.job.name="SubByLabel-101-0.ulab.aa" -jobconf numReduceTasks=0
> additionalConfSpec_:null
> null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
> packageJobJar: [/home/parthas/proj/qc/bin/testcache.py,
> /export/crawlspace/kryptonite/hod/tmp/hod-1467-tmp/hadoop-unjar56313/]
> [] /tmp/streamjob56314.jar tmpDir=null
> 07/07/25 16:51:31 INFO mapred.FileInputFormat: Total input paths to
> process : 1
> 07/07/25 16:51:32 INFO streaming.StreamJob: getLocalDirs():
> [/export/crawlspace/kryptonite/hod/tmp/hod-1467-tmp/mapred/local]
> 07/07/25 16:51:32 INFO streaming.StreamJob: Running job: job_0006
> 07/07/25 16:51:32 INFO streaming.StreamJob: To kill this job, run:
> 07/07/25 16:51:32 INFO streaming.StreamJob:
> /export/crawlspace/kryptonite/hadoop/mapred/current/bin/../bin/hadoop
> job -Dmapred.job.tracker=kry1590:50264 -kill job_0006
> 07/07/25 16:51:32 INFO streaming.StreamJob: Tracking URL:
> http://kry1590.inktomisearch.com:56285/jobdetails.jsp?jobid=job_0006
> 07/07/25 16:51:33 INFO streaming.StreamJob: map 0% reduce 0%
> 07/07/25 16:51:34 INFO streaming.StreamJob: map 100% reduce 0%
> 07/07/25 16:51:40 INFO streaming.StreamJob: map 100% reduce 100%
> 07/07/25 16:51:40 INFO streaming.StreamJob: Job complete: job_0006
> 07/07/25 16:51:40 INFO streaming.StreamJob: Output:
> /user/parthas/qc/exp2/filterData/subLab/0
> ---------------
> This does not.
> ----------------------
> [hod] (parthas) >> stream -input "/user/parthas/test/tmp.data" -mapper
> "testcache.py abc def" -output
> "/user/parthas/qc/exp2/filterData/subLab/0" -file
> "/home/parthas/proj/qc/bin/testcache.py" -cacheFile
> 'hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.qlab.head#abc'
> -cacheFile
> 'hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.ulab.aa.head#def'
> -jobconf mapred.map.tasks=1 -jobconf
> mapred.job.name="SubByLabel-101-0.ulab.aa" -jobconf numReduceTasks=0
> 07/07/25 16:52:17 ERROR streaming.StreamJob: Unexpected
> hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.ulab.aa.head#def
> while processing
> -input|-output|-mapper|-combiner|-reducer|-file|-dfs|-jt|-additionalconfspec|-inputformat|-outputformat|-partitioner|-numReduceTasks|-inputreader|||-cacheFile|-cacheArchive|-verbose|-info|-debug|-inputtagged|-help
> Usage: $HADOOP_HOME/bin/hadoop [--config dir] jar \
> $HADOOP_HOME/hadoop-streaming.jar [options]
> Options:
> -input <path> DFS input file(s) for the Map step
> -output <path> DFS output directory for the Reduce step
> -mapper <cmd|JavaClassName> The streaming command to run
> -combiner <JavaClassName> Combiner has to be a Java class
> -reducer <cmd|JavaClassName> The streaming command to run
> -file <file> File/dir to be shipped in the Job jar file
> -dfs <h:p>|local Optional. Override DFS configuration
> -jt <h:p>|local Optional. Override JobTracker configuration
> -additionalconfspec specfile Optional.
> -inputformat
> TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName
> Optional.
> -outputformat TextOutputFormat(default)|JavaClassName Optional.
> -partitioner JavaClassName Optional.
> -numReduceTasks <num> Optional.
> -inputreader <spec> Optional.
> -jobconf <n>=<v> Optional. Add or override a JobConf property
> -cmdenv <n>=<v> Optional. Pass env.var to streaming commands
> -cacheFile fileNameURI
> -cacheArchive fileNameURI
> -verbose
> For more details about these options:
> Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info
>
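The behaviour the report asks for can be illustrated with a minimal sketch: an option such as -cacheFile accumulates its values when it is given more than once, and each URI's fragment (the part after '#') is taken as the symlink name. This is not the StreamJob parser and not the attached patch. The names parse_args, cache_links and REPEATABLE are hypothetical, and the sketch assumes every option takes exactly one value, which is not true of flags such as -verbose.

```python
# Hypothetical sketch only: not the StreamJob parser, not the attached patch.
from urllib.parse import urlparse

# Assumption: options allowed to repeat. -jobconf is repeated in the report's
# working command; -cacheFile is the option the report wants to repeat.
REPEATABLE = {"-cacheFile", "-jobconf"}


def parse_args(argv):
    """Collect option values; repeatable options accumulate into a list."""
    opts = {}
    i = 0
    while i < len(argv):
        flag = argv[i]
        # Simplification: every option is assumed to take exactly one value.
        if not flag.startswith("-") or i + 1 >= len(argv):
            raise ValueError("Unexpected %s" % flag)
        value = argv[i + 1]
        if flag in REPEATABLE:
            opts.setdefault(flag, []).append(value)
        elif flag in opts:
            raise ValueError("Option %s given more than once" % flag)
        else:
            opts[flag] = value
        i += 2
    return opts


def cache_links(uris):
    """Map each symlink name (the URI fragment) to the path part of its URI."""
    links = {}
    for uri in uris:
        parsed = urlparse(uri)
        if not parsed.fragment:
            raise ValueError("cache URI needs a #linkname: %s" % uri)
        links[parsed.fragment] = parsed.path
    return links
```

Fed the two -cacheFile URIs from the failing command, parse_args would keep both values, and cache_links would map the names abc and def to their respective paths, which matches the two arguments passed to testcache.py.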
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.