Posted to mapreduce-dev@hadoop.apache.org by Venky Shankar <yk...@gmail.com> on 2011/05/23 09:08:11 UTC

mapred.job.split.file not present in job conf

Hey folks,

I am writing a Hadoop plugin (somewhat like FTPFileSystem) so as to run
Map/Reduce jobs on data stored on my backing store. The jar (containing the
FileSystem implementation) is copied into Hadoop's lib/ directory, and the
necessary changes to conf/core-site.xml and conf/mapred-site.xml are made so
that the jar is loaded when a Map/Reduce job is run. I am using hadoop-0.20.2.
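
For reference, the filesystem registration in conf/core-site.xml looks
roughly like this ('dummyfs' matches the scheme in the job.xml entry further
below; the implementation class name is shortened/illustrative here):

    <!-- conf/core-site.xml: map the dummyfs:// scheme to the custom
         FileSystem implementation (class name is illustrative) -->
    <property>
      <name>fs.dummyfs.impl</name>
      <value>com.example.DummyFileSystem</value>
    </property>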

After starting the start-mapred.sh script I run a sample Map/Reduce
application (the 'Grep' example that ships with the Hadoop distribution),
during which I get the following error in the JobTracker logs:

2011-05-23 11:25:17,464 ERROR org.apache.hadoop.mapred.JobTracker: Job
initialization failed:
java.lang.IllegalArgumentException: Can not create a Path from a null string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:78)
        at org.apache.hadoop.fs.Path.<init>(Path.java:90)
        at
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:417)
        at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3150)
        at
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
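
For completeness, the job is launched with the stock Grep invocation from
the 0.20.2 docs (input/output paths are mine):

    $ bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'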

I can see the split file (job.split) and the XML conf file (job.xml) getting
created in the backing store (inside
$SYSTEMDIR/tmp/hadoop-<user>/mapred/system/job_XXXXXXX_XXXX/). It looks like
initTasks (inside JobInProgress) gets a null string from the job conf (for
'mapred.job.split.file'), as seen from the backtrace above. But the entry is
present in the XML file:

$ grep mapred.job.split job.xml
<property><name>mapred.job.split.file</name><value>dummyfs://host:<port>/tmp/hadoop-<user>/mapred/system/job_201105231124_0001/job.split</value></property>
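
The exception itself is easy to reproduce in isolation, which at least
confirms where the null comes from: Configuration.get() returns null for an
absent key, and Path's constructor rejects a null string. A minimal sketch of
what I believe initTasks runs into (class name is mine; needs
hadoop-0.20.2-core.jar on the classpath):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class NullSplitPath {
        public static void main(String[] args) {
            // a fresh Configuration, i.e. one where the submitted job.xml
            // was never loaded, mimicking a JobTracker-side conf that is
            // missing the key
            Configuration conf = new Configuration();
            String split = conf.get("mapred.job.split.file"); // -> null
            // throws java.lang.IllegalArgumentException:
            // "Can not create a Path from a null string"
            new Path(split);
        }
    }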

Any pointers/tips on how to debug this further? Am I missing something that
could cause this kind of behavior? Also, does the JobTracker get the
configuration from the XML file (job.xml?) or from somewhere else (in which
case the above entry in the XML file does not matter)?

Thanks,
-Venky