Posted to mapreduce-user@hadoop.apache.org by "Henjarappa, Savitha" <sa...@hp.com> on 2013/02/17 03:18:25 UTC
Hadoop problems
All,
What are the most common problems that a Hadoop administrator should stay on top of?
What would be the possible reasons for a job failure? I understand disk failure is one of them.
Thanks,
Savitha
Re: Hadoop problems
Posted by Marcos Ortiz <ml...@uci.cu>.
At the next ApacheCon, Kathleen Ting, one of Cloudera's Customer
Operations Engineers, will give a talk on this topic. I don't have the
exact link right now, but you can easily find it in the Big Data track
of the conference. She gave a similar talk at Hadoop World 2011; you
can see it here [1].
You should also read the "Hadoop Operations" book by Eric Sammer,
Engineering Manager at Cloudera and an expert in all of this.
Both of them consistently point to cluster misconfiguration as the
primary cause of cluster failures. Like you said, disk failure is a
possible cause too, but there are more:
- Disk full
- Too many open files for a particular user
- JVM and GC related issues
- Use of the OpenJDK VM instead of the Oracle Java VM
- NTP synchronization issues
- SSH related issues
- and many more
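As a rough illustration of the "disk full" item above, an admin might script a quick check like the sketch below. This is not from the thread; the 90% threshold and the mount point are assumptions, and real deployments would check every DataNode data directory rather than just /.

```shell
# Minimal sketch (assumptions: POSIX df/awk available; 90% threshold
# and the / mount point are illustrative, not recommendations).
check_disk() {
  # df -P gives a stable, portable output format; field 5 is "Use%".
  used=$(df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }')
  if [ "$used" -ge 90 ]; then
    echo "WARN: $1 is ${used}% full"
  else
    echo "OK: $1 is ${used}% full"
  fi
}
check_disk /
```

A cron job running a check like this on every node, alerting before disks actually fill, catches the problem earlier than a failed task attempt would.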
[1] http://bit.ly/cloudera_talk
Best wishes
On 16/02/2013 23:18, Henjarappa, Savitha wrote:
> All,
> What are the most common problems that a Hadoop administrator should
> stay on top of?
> What would be the possible reasons for a job failure? I understand
> disk failure is one of them.
> Thanks,
> Savitha
-- Marcos Ortíz Valmaseda
Product Manager && Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
LinkedIn: http://www.linkedin.com/in/marcosluis2186
Twitter: @marcosluis2186 <https://twitter.com/marcosluis2186>