Posted to mapreduce-user@hadoop.apache.org by "Henjarappa, Savitha" <sa...@hp.com> on 2013/02/17 03:18:25 UTC

Hadoop problems

All,

What are the most common problems that a Hadoop administrator should be on top of?

What are the possible reasons for a job failure? I understand disk failure is one of the reasons.

Thanks,
Savitha


Re: Hadoop problems

Posted by Marcos Ortiz <ml...@uci.cu>.
At the next ApacheCon, Kathleen Ting, one of Cloudera's Customer
Operations Engineers, will give a talk related to this topic. I don't
have the exact link right now, but you can easily find it by looking in
the Big Data track of the conference. She gave a similar talk at
Hadoop World 2011. You can see it here [1].

You should also read the book "Hadoop Operations", written by Eric Sammer,
Engineering Manager at Cloudera and an expert in all this stuff.

Both of them consistently say that cluster misconfiguration is the
primary cause of cluster failures. As you said, disk failure is a
possible cause too, but there are more:
- Disk full
- Too many open files for a particular user
- JVM and GC related issues
- Use of the OpenJDK VM instead of the Oracle Java VM
- NTP synchronization issues
- SSH related issues
- and many more
[1] http://bit.ly/cloudera_talk
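
As a rough illustration of the first two items in the list above, here is a minimal sketch of a per-node check for a nearly full disk and a low open-file limit. It is not taken from the talk or the book: the function name check_node and both thresholds (90% disk usage, 4096 descriptors) are assumptions chosen for the example, and the resource module is only available on Unix-like systems. It inspects the limit of the user running the script, so it would have to be run as the account that runs the Hadoop daemons.

```python
import resource
import shutil


def check_node(path="/", max_disk_used=0.90, min_open_files=4096):
    """Return a list of warning strings for the given path and limits.

    The thresholds are illustrative defaults, not recommended values.
    """
    warnings = []

    # Disk full: compare the used fraction of the filesystem to a threshold.
    usage = shutil.disk_usage(path)
    if usage.total > 0:
        used_fraction = usage.used / usage.total
        if used_fraction > max_disk_used:
            warnings.append(f"disk at {path} is {used_fraction:.0%} full")

    # Too many open files: look at the soft limit on file descriptors
    # for the current process (inherited from the current user's limits).
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < min_open_files:
        warnings.append(f"open-file limit is {soft}, below {min_open_files}")

    return warnings


if __name__ == "__main__":
    for line in check_node() or ["no warnings"]:
        print(line)
```

Pointing path at each data directory in turn would cover nodes where the data disks are mounted separately from the root filesystem.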

  Best wishes
On 16/02/2013 23:18, Henjarappa, Savitha wrote:
> All,
> What are the most common problems that a Hadoop administrator should
> be on top of?
> What are the possible reasons for a job failure? I understand
> disk failure is one of the reasons.
> Thanks,
> Savitha

-- Marcos Ortíz Valmaseda
Product Manager && Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
LinkedIn: http://www.linkedin.com/in/marcosluis2186
Twitter: @marcosluis2186 <https://twitter.com/marcosluis2186>
