Posted to common-dev@hadoop.apache.org by "Bo Shi (JIRA)" <ji...@apache.org> on 2009/01/08 18:33:00 UTC

[jira] Commented: (HADOOP-4586) Fault tolerant Hadoop Job Tracker

    [ https://issues.apache.org/jira/browse/HADOOP-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662048#action_12662048 ] 

Bo Shi commented on HADOOP-4586:
--------------------------------

> In my opinion, it is better to avoid using active copies, due to the high
> complexity of the coordination protocol, and instead to use a master-slave
> model with soft state shared between copies through a distributed cache
> mechanism or saved on HDFS.

Please forgive me if I'm being naive here (I see that I'm a bit late to the show), but wouldn't using ZooKeeper to persist JobTracker state effectively mask this complexity?

Has anyone explored refactoring the JobTracker to use ZooKeeper instead of engineering a new master/slave replication system?
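To make the question concrete, here is a minimal sketch of the pattern, assuming a ZooKeeper-style service that offers persistent nodes and ephemeral sequential nodes that vanish when their owner's session ends. The class `InMemoryCoordinator`, the class `JobTrackerCandidate`, and the paths `/election` and `/jobtracker/state` are hypothetical. This is an in-memory simulation, not the ZooKeeper client API and not the design proposed in HADOOP-4586. Candidates elect the holder of the lowest sequence number as the leader. The leader checkpoints job state to a shared node, and a standby that becomes leader after the primary's session expires reads that state back.

```python
import json

ELECTION = "/election"           # hypothetical election parent path
STATE = "/jobtracker/state"      # hypothetical persistent state path


class InMemoryCoordinator:
    """Hypothetical stand-in for a ZooKeeper-like coordination service.

    Models persistent nodes, ephemeral sequential nodes owned by a
    session, and session expiry. It is NOT the ZooKeeper client API.
    """

    def __init__(self):
        self.nodes = {}    # path -> bytes
        self.owners = {}   # ephemeral path -> owning session id
        self.seq = 0

    def create_ephemeral_sequential(self, prefix, session, data=b""):
        # Zero-padded counter so lexicographic order equals creation order.
        path = f"{prefix}{self.seq:010d}"
        self.seq += 1
        self.nodes[path] = data
        self.owners[path] = session
        return path

    def children(self, parent):
        return sorted(p for p in self.nodes if p.startswith(parent + "/"))

    def set(self, path, data):
        self.nodes[path] = data

    def get(self, path):
        return self.nodes.get(path)

    def expire_session(self, session):
        # Ephemeral nodes disappear with their owner's session; this is
        # how a standby learns that the leader has failed.
        for path in [p for p, s in self.owners.items() if s == session]:
            del self.nodes[path]
            del self.owners[path]


class JobTrackerCandidate:
    """Hypothetical JobTracker replica taking part in leader election."""

    def __init__(self, name, coord):
        self.name = name
        self.coord = coord
        self.node = coord.create_ephemeral_sequential(ELECTION + "/n_", session=name)

    def is_leader(self):
        # The candidate holding the lowest sequence number is the leader.
        kids = self.coord.children(ELECTION)
        return bool(kids) and kids[0] == self.node

    def checkpoint(self, jobs):
        # Only the leader may write the shared job state.
        if not self.is_leader():
            raise RuntimeError(f"{self.name} is not the leader")
        self.coord.set(STATE, json.dumps(jobs).encode())

    def recover(self):
        # A newly elected leader reloads whatever the old leader persisted.
        raw = self.coord.get(STATE)
        return json.loads(raw) if raw else {}


if __name__ == "__main__":
    coord = InMemoryCoordinator()
    primary = JobTrackerCandidate("jt-1", coord)
    standby = JobTrackerCandidate("jt-2", coord)

    primary.checkpoint({"job_0001": {"state": "RUNNING", "completed_maps": 12}})
    coord.expire_session("jt-1")   # simulate the primary JobTracker crashing
    print(standby.is_leader())     # True
    print(standby.recover())       # {'job_0001': {'state': 'RUNNING', 'completed_maps': 12}}
```

The sketch omits what a real deployment would need: watches, so that each candidate is notified when its predecessor's node disappears; quorum replication inside the coordination service; and rebuilding in-flight TaskTracker state that was never checkpointed. Those gaps are where the complexity discussed above would resurface.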

> Fault tolerant Hadoop Job Tracker
> ---------------------------------
>
>                 Key: HADOOP-4586
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4586
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.18.0, 0.18.1, 0.18.2
>         Environment: High availability enterprise system
>            Reporter: Francesco Salbaroli
>            Assignee: Francesco Salbaroli
>             Fix For: 0.21.0
>
>         Attachments: Enhancing the Hadoop MapReduce framework by adding fault.ppt, FaultTolerantHadoop.pdf, HADOOP-4586-0.1.patch, jgroups-all.jar
>
>
> The Hadoop framework has been designed, in an effort to enhance
> performance, with a single JobTracker (master node). Its responsibilities
> range from managing the job submission process to computing the input
> splits, scheduling tasks on the slave nodes (TaskTrackers), and
> monitoring their health.
> In some environments, such as the IBM and Google Internet-scale computing
> initiative, high availability is required and performance becomes a
> secondary issue. In these environments, having a system with a Single
> Point of Failure (such as Hadoop's single JobTracker) is a major concern.
> My proposal is to provide a redundant version of Hadoop by adding
> support for multiple replicated JobTrackers. This design can be approached
> in many different ways.
> In the document at http://sites.google.com/site/hadoopthesis/Home/FaultTolerantHadoop.pdf?attredirects=0
> I give an overview of the problem and some approaches to solving it.
> I am posting this to the community to gather feedback on the best way to proceed with my work.
> Thank you!
