Posted to general@hadoop.apache.org by Andrew Purtell <ap...@apache.org> on 2012/09/09 19:57:10 UTC

MRv1 JT Availability (was [DISCUSS] Spin out MR, HDFS and YARN ...)

Hi Arun,

On Mon, Sep 3, 2012 at 4:02 AM Arun C Murthy wrote:
> > On Sep 1, 2012, at 6:32 AM, Andrew Purtell wrote:
> > I'd imagine such a MR(v1) in Hadoop, if this happened, would concentrate on
> > performance improvements, maybe such things as alternate shuffle plugins.
> > Perhaps a HA JobTracker for parity with HDFS.
>
> Lots of this has already happened in branch-1, please look at:
> # JT Availability: MAPREDUCE-3837, MAPREDUCE-4328, MAPREDUCE-4603 (WIP)

Thanks for the pointers!

I just want to be clearer about what I meant by "HA JobTracker for
parity with HDFS". There should be no need to quiesce the JT with a
highly available NameNode, and restarting jobs from the beginning if
the JT crashes isn't good enough to meet the user expectations implied
by "high availability", at least not for our internal customers.
I meant hot JT failover: a primary and a backup JT that share enough
state for the backup to take over immediately if the primary fails,
and TTs and JobClients that both switch seamlessly to the backup
should their communications with the primary fail. I'd expect state
sharing to limit scalability to the small- and medium-cluster range,
and that's fine; YARN is already the answer for scalability in the
large and largest clusters.
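
To make the client side of that concrete, here's a rough sketch of the
switch-on-failure behavior I have in mind for TTs and JobClients. The
class and method names (FailoverInvoker, RpcCall) are made up for
illustration; this isn't actual branch-1 code, just the shape of the idea:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

public class FailoverInvoker {
  private final List<InetSocketAddress> trackers; // primary first, then backup
  private volatile int active = 0;                // index of the JT we believe is live

  public FailoverInvoker(InetSocketAddress primary, InetSocketAddress backup) {
    this.trackers = Arrays.asList(primary, backup);
  }

  // Run one RPC against the JT we think is active; on a communication
  // failure, flip to the other address and retry once. That flip-and-retry
  // is the "seamless switch" the TTs and JobClients would need.
  public <T> T invoke(RpcCall<T> call) throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt < trackers.size(); attempt++) {
      InetSocketAddress target = trackers.get(active);
      try {
        return call.run(target);
      } catch (IOException e) {
        last = e;
        active = (active + 1) % trackers.size();
      }
    }
    throw last;
  }

  // Stand-in for whatever RPC the TT or JobClient actually makes.
  public interface RpcCall<T> {
    T run(InetSocketAddress jt) throws IOException;
  }
}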

> # Performance - backports of PureJavaCrc32 in spills (MAPREDUCE-782), fadvise backports (MAPREDUCE-3289), and several other misc. fixes.

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)

Re: MRv1 JT Availability (was [DISCUSS] Spin out MR, HDFS and YARN ...)

Posted by Eric Baldeschwieler <er...@hortonworks.com>.
> I just want to be clearer about what I meant by "HA JobTracker for
> parity with HDFS". There should be no need to quiesce the JT with a
> highly available NameNode, and restarting jobs from the beginning if
> the JT crashes isn't good enough to meet the user expectations implied
> by "high availability", at least not for our internal customers.

Hi Andrew.

A couple of points...

1) Quiescing the JT can be refined somewhat, but the focus there is to have reasonable behavior if the storage layer becomes unavailable or has not been started during a boot sequence.  This is useful functionality that simply addresses a different set of failure cases.

2) I agree that restarting jobs is not desirable.  This is an independent issue we've been working on in YARN.  The key here is simply sorting out how you manage state efficiently on ZK or HDFS.  The good news is that HBase demonstrates how this can be done (its region server and master designs).
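
For illustration, the HBase-master-style active/standby handoff boils
down to an ephemeral znode in ZK: whichever process creates it is active,
the other watches it and takes over when the owning session dies. A
minimal sketch, assuming a made-up znode path and class name (this is not
existing Hadoop or HBase code):

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class JTLeaderElection implements Watcher {
  private static final String ACTIVE_ZNODE = "/mrv1-active-jt"; // illustrative path
  private final ZooKeeper zk;
  private final String myId;

  public JTLeaderElection(String zkQuorum, String myId) throws Exception {
    this.zk = new ZooKeeper(zkQuorum, 30000, this);
    this.myId = myId;
  }

  // Try to become active; if another JT already holds the znode, watch it.
  public boolean tryBecomeActive() throws Exception {
    try {
      zk.create(ACTIVE_ZNODE, myId.getBytes(StandardCharsets.UTF_8),
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true;                     // we are the active JT
    } catch (KeeperException.NodeExistsException e) {
      zk.exists(ACTIVE_ZNODE, this);   // standby: watch for the node to go away
      return false;
    }
  }

  @Override
  public void process(WatchedEvent event) {
    // The ephemeral znode disappears when the active JT's ZK session expires,
    // so NodeDeleted is the standby's failure-detection signal.
    if (event.getType() == Event.EventType.NodeDeleted) {
      try {
        tryBecomeActive();
      } catch (Exception e) {
        // a real implementation would retry with backoff
      }
    }
  }
}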

> I meant hot JT failover: a primary and a backup JT that share enough
> state for the backup to take over immediately if the primary fails,
> and TTs and JobClients that both switch seamlessly to the backup
> should their communications with the primary fail.


I think state sharing is very expensive and error prone.  These kinds of hot-hot solutions are almost an anti-pattern, IMO.  In the case of HDFS we are halfway through implementing this, so we don't need to reopen that.  One can argue that HBase and HDFS might need them, given the desire for MANY very low-latency requests.  But HBase hasn't opted for this complexity yet, I'd observe, and I'm more tempted to emulate its designs than HDFS's for MR.

For MR, a good, simple cold-failover design should be MUCH easier to implement, debug, and maintain.  Running jobs need not be lost (their state can be stored in durable storage or recovered from the cluster), and the time to detect failure should end up dominating the time to recover, much like what we are seeing in HDFS testing.  So for small clusters there should be zero reason to do hot-hot.
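
Roughly, the durable-state side of that cold failover could look like the
sketch below: the active JT checkpoints per-job state to HDFS, and a JT
coming up after a failover replays it instead of re-running jobs. The
JobStateStore class and the checkpoint path are hypothetical; the actual
branch-1 recovery code differs from this:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class JobStateStore {
  private final FileSystem fs;
  private final Path root;   // e.g. /mapred/system/jt-recovery (illustrative)

  public JobStateStore(Configuration conf, Path root) throws IOException {
    this.fs = FileSystem.get(conf);
    this.root = root;
  }

  // Called by the active JT whenever a job's bookkeeping changes.
  public void checkpoint(String jobId, byte[] serializedState) throws IOException {
    Path tmp = new Path(root, jobId + ".tmp");
    try (FSDataOutputStream out = fs.create(tmp, true)) {
      out.write(serializedState);
    }
    // Single-file rename is atomic in HDFS, so a recovering JT never
    // reads a half-written checkpoint.
    fs.rename(tmp, new Path(root, jobId));
  }

  // Called by a JT coming up after failover to rebuild its job table.
  public byte[] recover(String jobId) throws IOException {
    Path p = new Path(root, jobId);
    if (!fs.exists(p)) {
      return null;                     // job was never checkpointed
    }
    try (FSDataInputStream in = fs.open(p)) {
      byte[] buf = new byte[(int) fs.getFileStatus(p).getLen()];
      in.readFully(buf);
      return buf;
    }
  }
}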

I think we are much better off focusing on simple design patterns that use the storage systems we have (ZK and HDFS) to restore state quickly on failover.  The HBase region servers and master are good examples of good design in this area that we should emulate here, IMO.  MR has much simpler problems, and any investment we make in improving WALs and state management on HDFS is going to make HBase and every new compute model ported to YARN better.

