Posted to user@hadoop.apache.org by Frank Lanitz <fr...@sql-ag.de> on 2015/02/01 21:45:00 UTC

Re: How to debug why example not finishing (or even starting)

Am 28.01.2015 um 14:39 schrieb Frank Lanitz:
> Hi,
> 
> I've got a simple 3-node setup where I wanted to test the grep
> function from the bundled examples. So
> 
> $ hadoop fs -put /home/hadoop/hadoop/etc/hadoop hadoop-config
> $ hadoop jar \
>     hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep \
>     /user/hadoop/hadoop-config /user/hadoop/output 'dfs[a-z.]+'
> 
> When running this I get
> 
>> 15/01/28 14:32:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 15/01/28 14:32:15 INFO client.RMProxy: Connecting to ResourceManager at hadoopm/<ip>:8032
>> 15/01/28 14:32:15 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
>> 15/01/28 14:32:15 INFO input.FileInputFormat: Total input paths to process : 30
>> 15/01/28 14:32:16 INFO mapreduce.JobSubmitter: number of splits:30
>> 15/01/28 14:32:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1422451252723_0002
>> 15/01/28 14:32:16 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
>> 15/01/28 14:32:16 INFO impl.YarnClientImpl: Submitted application application_1422451252723_0002
>> 15/01/28 14:32:16 INFO mapreduce.Job: The url to track the job: http://hadoopm:8088/proxy/application_1422451252723_0002/
>> 15/01/28 14:32:16 INFO mapreduce.Job: Running job: job_1422451252723_0002
> 
> and nothing more seems to happen. When checking
> http://hadoopm:8088/cluster/apps I can see that the job is accepted,
> but in an undefined state. However, when killing the job and starting
> a new one, it sometimes starts to work. Obviously something is not
> working well here -- so I'm wondering how to debug what's going wrong.

At least I was able to figure out part of the reason:
the bigger nodes in my test cluster were not properly configured
in DNS, so name resolution failed when a job was assigned to
one of those boxes. I found this by digging through the
ResourceManager log file, checking for exceptions after enabling
debug level for log4j.
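
In case someone else runs into this, here is roughly what that looked
like (paths may differ per installation, and the logger name below is
my assumption based on the stock log4j.properties):

# in etc/hadoop/log4j.properties on the ResourceManager host:
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=DEBUG

# after restarting the ResourceManager, grep its log for exceptions,
# e.g. the usual DNS suspect:
$ grep -i UnknownHostException $HADOOP_HOME/logs/yarn-*-resourcemanager-*.log

It should also be possible to raise the level at runtime, without a
restart, via the daemon's web port (8088 for the ResourceManager here):

$ hadoop daemonlog -setlevel hadoopm:8088 \
    org.apache.hadoop.yarn.server.resourcemanager DEBUG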

However, this does not solve my complete issue: it appears that the
other nodes in the cluster are still not being recognized as being
able to process part of the data.
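
What I plan to check next (assuming all NodeManagers should show up as
RUNNING) is whether the other nodes have actually registered with the
ResourceManager, e.g.:

$ yarn node -list -all

and, on the HDFS side, whether all datanodes report in:

$ hdfs dfsadmin -report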

Cheers,
Frank