Posted to mapreduce-issues@hadoop.apache.org by "Hari A V (JIRA)" <ji...@apache.org> on 2011/06/08 11:12:04 UTC

[jira] [Commented] (MAPREDUCE-225) Fault tolerant Hadoop Job Tracker

    [ https://issues.apache.org/jira/browse/MAPREDUCE-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045849#comment-13045849 ] 

Hari A V commented on MAPREDUCE-225:
------------------------------------

Hi,

Sorry for the very late response.
@Arun: Yes, MAPREDUCE-225 is a completely new architecture, and it may still take a long time to get done. For those who use the 0.20 version and need a simple "availability solution", a much simpler approach would be helpful.
@Leitao: Yes, it is similar to HMaster HA, and it works. I have finished developing the ZK-based framework and have integrated it with the JT. I am in the process of contributing it back. As a first step, I have opened a JIRA in ZooKeeper for a generic LeaderElectionService (ZOOKEEPER-1080). I will upload the patch soon.

ZK+JT may not be a full-fledged HA solution, but it tries to address the following:
1. Avoid manual intervention during a JobTracker failure.
2. Recover and continue the jobs (even by re-submitting them) without notifying the clients who submitted them.

The solution remains very simple because there is no need to synchronize the "state of the jobs".
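
As an illustration only, the failover idea above can be modelled in a few lines of Python. This is a toy in-memory sketch, not the ZOOKEEPER-1080 patch and not the JobTracker code: the names Election, JobLog and fail_over are hypothetical, the "lowest sequence number leads" rule is the usual ZooKeeper election recipe rather than something stated in this comment, and the durable list of unfinished jobs is an assumption about how a standby would know what to re-submit.

```python
# Toy model of JobTracker failover; makes no real ZooKeeper or Hadoop calls.
from dataclasses import dataclass, field


@dataclass
class Election:
    """Stands in for an election znode: the lowest sequence number leads."""
    next_seq: int = 0
    members: dict = field(default_factory=dict)  # sequence number -> tracker name

    def join(self, name):
        seq = self.next_seq
        self.next_seq += 1
        self.members[seq] = name
        return seq

    def leave(self, seq):
        # Models an ephemeral node disappearing when its owner's session dies.
        self.members.pop(seq, None)

    def leader(self):
        return self.members[min(self.members)] if self.members else None


@dataclass
class JobLog:
    """Hypothetical durable record of submitted but unfinished jobs."""
    pending: list = field(default_factory=list)

    def submit(self, job_id):
        self.pending.append(job_id)

    def finish(self, job_id):
        self.pending.remove(job_id)


def fail_over(election, failed_seq, job_log):
    """Drop the failed tracker; the new leader re-submits every pending job."""
    election.leave(failed_seq)
    new_leader = election.leader()
    resubmitted = list(job_log.pending) if new_leader else []
    return new_leader, resubmitted


election = Election()
jt1 = election.join("jt-1")   # first to join, so it is the active JobTracker
jt2 = election.join("jt-2")   # standby
log = JobLog()
log.submit("job_0001")
log.submit("job_0002")
log.finish("job_0001")

new_leader, resubmitted = fail_over(election, jt1, log)
print(new_leader, resubmitted)
```

Only the job identifiers survive the failover in this model; no task-level progress is copied to the standby, which is why the unfinished job starts again from the beginning.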

Cons
-------
A job may take longer to finish during a failover because it has to be re-submitted.

Please provide suggestions.

-Hari



> Fault tolerant Hadoop Job Tracker
> ---------------------------------
>
>                 Key: MAPREDUCE-225
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-225
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>         Environment: High availability enterprise system
>            Reporter: Francesco Salbaroli
>            Assignee: Francesco Salbaroli
>         Attachments: Enhancing the Hadoop MapReduce framework by adding fault.ppt, FaultTolerantHadoop.pdf, HADOOP-4586-0.1.patch, HADOOP-4586v0.3.patch, jgroups-all.jar
>
>
> The Hadoop framework has been designed, in an effort to enhance performance,
> with a single JobTracker (master node). Its responsibilities range from
> managing the job submission process and computing the input splits to
> scheduling tasks on the slave nodes (TaskTrackers) and monitoring their health.
> In some environments, like the IBM and Google Internet-scale computing
> initiative, there is a need for high availability, and performance
> becomes a secondary issue. In these environments, having a system with
> a Single Point of Failure (such as Hadoop's single JobTracker) is a major
> concern.
> My proposal is to provide a redundant version of Hadoop by adding
> support for multiple replicated JobTrackers. This design can be approached
> in many different ways.
> In the document at: http://sites.google.com/site/hadoopthesis/Home/FaultTolerantHadoop.pdf?attredirects=0
> I wrote an overview of the problem and some approaches to solve it.
> I post this to the community to gather feedback on the best way to proceed in my work.
> Thank you!

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira