Posted to dev@samza.apache.org by Ethan Setnik <et...@mobileaware.com> on 2014/02/21 23:27:17 UTC

Samza Highly Available YARN Configuration

I'm looking to deploy Samza on AWS infrastructure in an HA configuration.  I
have a clear picture of how to configure all the components such that they
do not contain any single point of failure.

I'm stuck, however, when it comes to the YARN architecture.  It seems that
YARN relies on the single-master / multi-slave pattern as described in the
YARN documentation.  This introduces a single point of failure at the
ResourceManager level such that a failed ResourceManager will fail the
entire YARN cluster.  How does LinkedIn architect an HA configuration for
Samza on YARN such that a complete instance failure of ResourceManager
provides failover for the YARN cluster?

Thanks for your help.

Best,
Ethan


-- 
Ethan Setnik
MobileAware

m: +1 617 513 2052
e: ethan.setnik@mobileaware.com

Re: Samza Highly Available YARN Configuration

Posted by Zhijie Shen <zs...@hortonworks.com>.
Hi Chris,

Since 2.3, the basic features of HA should already be available, such as
the ZK-backed store. 2.4 is not adding major new features for HA, but
stabilizing it. I'm not closely watching the progress of HA, but AFAIK a
lot of effort has been put into making user-facing APIs work seamlessly
while the RM fails over, improving the configuration, RM web UI and web
services redirection, RM failover issues on secured clusters, and so on.
I haven't followed the CDH modifications.

- Zhijie


On Thu, Mar 20, 2014 at 12:50 PM, Chris Riccomini
<cr...@linkedin.com>wrote:

> Hey Zhijie,
>
> Do you know what exactly is coming out in Apache Hadoop 2.4 for HA? Will
> it have ZK-backed (both state and leadership election) HA RMs? I've had a
> lot of trouble figuring out exactly what the state of HA is in YARN in all
> the JIRAs and CDH modifications.
>
> Cheers,
> Chris
>
> On 3/20/14 11:14 AM, "Zhijie Shen" <zs...@hortonworks.com> wrote:
>
> >If I remember correctly, qjournal is used by HDFS, but not YARN. The two
> >components have separate HA stacks. BTW, after Hadoop 2.4, HA should be
> >in better shape.
> >
> >- Zhijie
> >
> >
> >On Thu, Mar 20, 2014 at 10:59 AM, Dan Di Spaltro
> ><da...@gmail.com>wrote:
> >
> >> Is there a different type of YARN HA?  It seems the method of HA for
> >> CDH5 uses the qjournal on top of the zkfc.
> >>
> >> -Dan
> >>
> >>
> >> On Wed, Mar 19, 2014 at 10:53 AM, Yan Fang <ya...@gmail.com>
> wrote:
> >>
> >> > Hi Chris,
> >> >
> >> > I have made Samza run in HA YARN, leveraging the high-availability
> >> > configuration. I'll just put my coarse approach here in case someone
> >> > faces a similar problem.
> >> >
> >> > The HA YARN is from the CDH5-beta 2 version, which is ZK-based HA
> >> > YARN. It did not seem to work by just replacing the jar files. So the
> >> > way I made it work is a little hacky: I changed samza-yarn a little,
> >> > having the client check the current active RM in ZooKeeper every time
> >> > it submits the AM (because HA YARN keeps the active RM name in ZK).
> >> > With that, Samza works well. It will automatically get restarted when
> >> > the RM changes (that is, the standby RM becomes active when the
> >> > active RM fails).
> >> >
> >> > Hope someone has a better idea for doing this. Thank you.
> >> >
> >> > Cheers,
> >> >
> >> > Fang, Yan
> >> > yanfang724@gmail.com
> >> > +1 (206) 849-4108
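Yan's client-side trick can be sketched abstractly. The class and predicate below are hypothetical, not taken from the samza-yarn patch; in the real setup the probe would be a ZooKeeper read (since, as Yan notes, HA YARN keeps the active RM's name in ZK) or an RM RPC:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the client-side failover idea: given the
// per-RM-id addresses from yarn-site.xml, pick the first RM that
// reports itself active. The "is it active?" probe is abstracted as a
// predicate; a real client would ask ZooKeeper or the RM itself.
public class ActiveRmResolver {
  public static String resolve(Map<String, String> rmAddresses,
                               Predicate<String> isActive) {
    for (Map.Entry<String, String> e : rmAddresses.entrySet()) {
      if (isActive.test(e.getValue())) {
        return e.getValue();  // first address that answers as active
      }
    }
    throw new IllegalStateException("no active ResourceManager found");
  }

  public static void main(String[] args) {
    Map<String, String> rms = new LinkedHashMap<>();
    rms.put("rm1", "rm1.example.com:8032");
    rms.put("rm2", "rm2.example.com:8032");
    // Pretend rm2 is currently the active RM.
    String active = resolve(rms, addr -> addr.startsWith("rm2"));
    System.out.println(active);
  }
}
```

The control flow matches what Yan describes: resolve the currently active RM address immediately before each AM submission, rather than relying on a single static yarn.resourcemanager.address.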
> >> >
> >> >
> >> > On Mon, Mar 10, 2014 at 4:35 PM, Yan Fang <ya...@gmail.com>
> >>wrote:
> >> >
> >> > > Hi Chris,
> >> > >
> >> > > Thank you! You are correct, I am actually working with a CDH5-beta
> >> > > version. I will definitely try what you recommended and do some
> >> > > experiments to see how Samza performs.
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Fang, Yan
> >> > > yanfang724@gmail.com
> >> > > +1 (206) 849-4108
> >> > >
> >> > >
> >> > > On Mon, Mar 10, 2014 at 3:54 PM, Chris Riccomini <
> >> > criccomini@linkedin.com>wrote:
> >> > >
> >> > >> Hey Yan,
> >> > >>
> >> > >> I'm not aware of anyone successfully running Samza with CDH5's HA
> >> > >> YARN. As far as I understand, those patches are not fully merged
> >> > >> into Apache yet (I could be wrong, though).
> >> > >>
> >> > >> At a minimum, you'll probably need to replace Samza's 2.2 YARN
> >> > >> jars with the CDH5 jars, so that Samza properly interprets the
> >> > >> different configs (e.g. the new RM style of config, which you've
> >> > >> mentioned).
> >> > >>
> >> > >> I'm not sure how Samza's YARN AM will behave when the RM is failed
> >> > >> over. You'll have to experiment with this and see. If you find
> >> > >> anything out, it'd be very useful if you could share it with the
> >> > >> rest of us. Samza and HA RMs is something that we're investigating
> >> > >> as well.
> >> > >>
> >> > >> Cheers,
> >> > >> Chris
> >> > >>
> >> > >> On 3/10/14 12:11 PM, "Yan Fang" <ya...@gmail.com> wrote:
> >> > >>
> >> > >> >Hi All,
> >> > >> >
> >> > >> >Happy daylight saving! I am wondering if anyone on this mailing
> >> > >> >list has successfully run Samza in an HA YARN cluster.
> >> > >> >
> >> > >> >We are trying to run Samza in CDH5, which has HA YARN
> >> > >> >configuration. I am able to run Samza only by updating
> >> > >> >yarn-default.xml (changing yarn.resourcemanager.address), the
> >> > >> >same approach Nirmal Kumar mentioned in "Running Samza on multi
> >> > >> >node". Otherwise, it will always connect to the 0.0.0.0 address
> >> > >> >in yarn-default.xml. (I am sure I set the conf file and YARN_HOME
> >> > >> >correctly.)
> >> > >> >
> >> > >> >So my questions are:
> >> > >> >1. Can't Samza interpret the HA YARN configuration file
> >> > >> >correctly? (Is that because the HA YARN configuration uses, say,
> >> > >> >yarn.resourcemanager.address.*rm15* instead of
> >> > >> >yarn.resourcemanager.address?)
> >> > >> >
> >> > >> >2. Is it possible to switch to a new RM automatically when one is
> >> > >> >down? We have two RMs, one Active and one Standby, but I can only
> >> > >> >put one RM address in yarn-default.xml. I am wondering if it is
> >> > >> >possible to detect the active RM automatically in Samza (or by
> >> > >> >some other method)?
> >> > >> >
> >> > >> >3. Has anyone had luck leveraging HA YARN?
> >> > >> >
> >> > >> >Thank you.
> >> > >> >
> >> > >> >Cheers,
> >> > >> >
> >> > >> >Fang, Yan
> >> > >> >yanfang724@gmail.com
> >> > >> >+1 (206) 849-4108
> >> > >> >
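For readers hitting the same wall as question 1 above: the per-RM-id key shape looks roughly like this in yarn-site.xml when ZK-based RM HA is enabled. Property names follow the Hadoop 2.4-line documentation (CDH5-beta naming may differ slightly), and the host names and ZK quorum are hypothetical:

```xml
<!-- Sketch of RM HA keys in yarn-site.xml; hosts are hypothetical.
     Each logical RM id gets its own address keys, which is why a
     client reading only yarn.resourcemanager.address falls back to
     the 0.0.0.0 default. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```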
> >> > >> >
> >> > >> >On Fri, Feb 21, 2014 at 3:23 PM, Chris Riccomini
> >> > >> ><cr...@linkedin.com>wrote:
> >> > >> >
> >> > >> >> Hey Ethan,
> >> > >> >>
> >> > >> >> YARN's HA support is marginal right now, and we're still
> >> > >> >> investigating this stuff. Some useful things to read are:
> >> > >> >>
> >> > >> >> * https://issues.apache.org/jira/browse/YARN-128
> >> > >> >> * https://issues.apache.org/jira/browse/YARN-149
> >> > >> >> * https://issues.apache.org/jira/browse/YARN-353
> >> > >> >> * https://issues.apache.org/jira/browse/YARN-556
> >> > >> >>
> >> > >> >>
> >> > >> >> Also, CDH seems to be packaging some of the ZK-based HA stuff
> >> > >> >> already:
> >> > >> >>
> >> > >> >> https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/cdh5hag_cfg_RM_HA.html
> >> > >> >>
> >> > >> >>
> >> > >> >> At LI, we're still experimenting with the best setup, so my
> >> > >> >> guidance might not be state of the art. We currently configure
> >> > >> >> the YARN RM's store (yarn.resourcemanager.store.class) to use
> >> > >> >> the file system store
> >> > >> >> (org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore).
> >> > >> >> The failover is a manual operation where we copy the RM state
> >> > >> >> to a new machine, and then start the RM on that machine. You
> >> > >> >> then need to front the RM with a VIP or DNS entry, which you
> >> > >> >> can update to point to the new RM machine when a failover
> >> > >> >> occurs. The NMs need to be configured to point to this VIP/DNS
> >> > >> >> entry, so that when a failover occurs, the NMs don't need to
> >> > >> >> update their yarn-site.xml files.
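The store setup Chris describes corresponds to a yarn-site.xml fragment along these lines. The HDFS URI is hypothetical, and yarn.resourcemanager.recovery.enabled is the companion switch in the Hadoop 2.x line:

```xml
<!-- Sketch of the file-system RM state store; the state-store URI
     below is an illustrative example, not LinkedIn's actual path. -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://namenode.example.com:8020/yarn/rm-state</value>
</property>
```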
> >> > >> >>
> >> > >> >>
> >> > >> >> It sounds like in the future you won't need to use VIPs/DNS
> >> > >> >> entries. You should probably also email the YARN mailing list,
> >> > >> >> just in case we're misinformed or unaware of some new updates.
> >> > >> >>
> >> > >> >> Cheers,
> >> > >> >> Chris
> >> > >> >>
> >> > >> >> On 2/21/14 2:27 PM, "Ethan Setnik" <et...@mobileaware.com>
> >> > >> wrote:
> >> > >> >>
> >> > >> >> >[Ethan's original message, quoted in full at the top of this thread]
> >> > >> >>
> >> > >> >>
> >> > >>
> >> > >>
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Dan Di Spaltro
> >>
> >
> >
> >
> >--
> >Zhijie Shen
> >Hortonworks Inc.
> >http://hortonworks.com/
> >
>
>


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

> >> new
> >> >> RM machine when a failover occurs. The NMs need to be configured to
> >> >>point
> >> >> to this VIP/DNS entry, so that when a failover occurs, the NMs don't
> >> >>need
> >> >> to update their yarn-site.xml files.
> >> >>
> >> >>
> >> >> It sounds like in the future you won't need to use VIPs/DNS entries.
> >> You
> >> >> should probably also email the YARN mailing list, just in case we're
> >> >> misinformed or unaware of some new updates.
> >> >>
> >> >> Cheers,
> >> >> Chris
> >> >>
> >> >> On 2/21/14 2:27 PM, "Ethan Setnik" <et...@mobileaware.com>
> >> wrote:
> >> >>
> >> >> >I'm looking to deploy Samza on AWS infrastructure in a HA
> >> >>configuration.
> >> >> >I
> >> >> >have a clear picture of how to configure all the components such
> that
> >> >>they
> >> >> >do not contain any single point of failure.
> >> >> >
> >> >> >I'm stuck, however, when it comes to the YARN architecture.  It
> seems
> >> >>that
> >> >> >YARN relies on the single-master / multi-slave pattern as described
> in
> >> >>the
> >> >> >YARN documentation.  This introduces a single point of failure at
> the
> >> >> >ResourceManager level such that a failed ResourceManager will fail
> the
> >> >> >entire YARN cluster.  How does LinkedIn architect a HA configuration
> >> >>for
> >> >> >Samza on YARN such that a complete instance failure of
> ResourceManager
> >> >> >provides failover for the YARN cluster?
> >> >> >
> >> >> >Thanks for your help.
> >> >> >
> >> >> >Best,
> >> >> >Ethan
> >> >> >
> >> >> >
> >> >> >--
> >> >> >Ethan Setnik
> >> >> >MobileAware
> >> >> >
> >> >> >m: +1 617 513 2052
> >> >> >e: ethan.setnik@mobileaware.com
> >> >>
> >> >>
> >>
> >>
> >
>



-- 
Dan Di Spaltro

Re: Samza Highly Available YARN Configuration

Posted by Yan Fang <ya...@gmail.com>.
Hi Chris,

I have made Samza run on HA YARN, leveraging the high-availability
configuration. I'll put my rough approach here in case someone faces a
similar problem.

The HA YARN is from the CDH5-beta 2 release, which is ZK-based. Simply
replacing the jar files did not seem to work, so the way I made it work is
a little hacky: I changed samza-yarn slightly, having the client check
ZooKeeper for the current active RM every time it submits the AM (HA YARN
keeps the active RM's name in ZK). With that change, Samza works well: it
automatically gets restarted when the RM changes (that is, when the
standby RM becomes active after the active RM fails).

Hope someone has a better idea for doing this. Thank you.
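In the meantime, here is a minimal sketch of the selection logic I described. The helper name and the idea of probing each RM's state (e.g. via each RM's /ws/v1/cluster/info REST endpoint, which reports an "haState" field on HA clusters) are my own illustration, not a Samza or YARN API:

```python
# Sketch: given each candidate RM's reported HA state, pick the active one.
# How the states are fetched (ZooKeeper, REST, RPC) is deliberately left
# out so the probing strategy can be swapped; this only shows the selection.

def pick_active_rm(ha_states):
    """Return the rm-id whose reported state is ACTIVE, or None if none is."""
    for rm_id, state in ha_states.items():
        if state.upper() == "ACTIVE":
            return rm_id
    return None

# Example: rm15 is standing by, rm20 currently holds the active role.
states = {"rm15": "STANDBY", "rm20": "ACTIVE"}
print(pick_active_rm(states))  # -> rm20
```

The client runs something like this before each AM submission and points the submission at whichever RM comes back active.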

Cheers,

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108



Re: Samza Highly Available YARN Configuration

Posted by Yan Fang <ya...@gmail.com>.
Hi Chris,

Thank you! You are correct, I am actually working with a CDH5-beta version.
I will definitely try what you recommended and do some experiments to see
how Samza performs.

Cheers,

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108



Re: Samza Highly Available YARN Configuration

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey Yan,

I'm not aware of anyone successfully running Samza with CDH5's HA YARN. As
far as I understand, those patches are not fully merged into Apache yet
(I could be wrong, though).

At a minimum, you'll probably need to replace Samza's 2.2 YARN jars with
the CDH5 jars, so that Samza properly interprets the different configs
(e.g. The new RM style of config, which you've mentioned).
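If your job is built with Maven, the jar swap might look something like the fragment below (the version string is a placeholder; check Cloudera's repository for the exact coordinates):

```xml
<!-- Sketch: depend on the CDH5 builds of the YARN client jars instead of
     the Apache 2.2 ones. The version string is a placeholder. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-client</artifactId>
  <version>2.2.0-cdh5.0.0-beta-2</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-common</artifactId>
  <version>2.2.0-cdh5.0.0-beta-2</version>
</dependency>
```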

I'm not sure how Samza's YARN AM will behave when the RM is failed over.
You'll have to experiment with this and see. If you find anything out,
it'd be very very useful if you could share it with the rest of us. Samza
and HA RMs is something that we're investigating as well.

Cheers,
Chris



Re: Samza Highly Available YARN Configuration

Posted by Yan Fang <ya...@gmail.com>.
Hi All,

Happy daylight saving! I am wondering if anyone on this mailing list has
successfully run Samza in an HA YARN cluster?

We are trying to run Samza on CDH5, which has HA YARN configurations. I am
able to run Samza only by updating yarn-default.xml (changing
yarn.resourcemanager.address), the same approach Nirmal Kumar mentioned in
"Running Samza on multi node". Otherwise, it always connects to the 0.0.0.0
address from yarn-default.xml. (I am sure I set the conf file and YARN_HOME
correctly.)

So my questions are:
1. Can't Samza interpret the HA YARN configuration file correctly? (Is that
because the HA YARN configuration uses, say,
yarn.resourcemanager.address.rm15 instead of yarn.resourcemanager.address?)

2. Is it possible to switch to a new RM automatically when one is down? We
have two RMs, one Active and one Standby, but I can only put one RM address
in yarn-default.xml. I am wondering if it is possible to detect the active
RM automatically in Samza (or by some other method)?

3. Has anyone had luck leveraging HA YARN?
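For reference, the rm-id-suffixed configuration style I mean looks roughly like this (the rm-ids and hostnames below are placeholders, not our real values):

```xml
<!-- Sketch of an HA yarn-site.xml; rm-ids and hosts are placeholders. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>rm1.example.com:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>rm2.example.com:8032</value>
</property>
```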

Thank you.

Cheers,

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108



Re: Samza Highly Available YARN Configuration

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey Ethan,

YARN's HA support is marginal right now, and we're still investigating
this stuff. Some useful things to read are:

* https://issues.apache.org/jira/browse/YARN-128
* https://issues.apache.org/jira/browse/YARN-149
* https://issues.apache.org/jira/browse/YARN-353
* https://issues.apache.org/jira/browse/YARN-556


Also, CDH seems to be packaging some of the ZK-based HA stuff already:

  
https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest
/CDH5-High-Availability-Guide/cdh5hag_cfg_RM_HA.html


At LI, we're still experimenting with the best setup, so my guidance might
not be state of the art. We currently configure the YARN RM's store
(yarn.resourcemanager.store.class) to use the file system store
(org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateSt
ore). The failover is a manual operation where we copy the RM state to a
new machine, and then start the RM on that machine. You then need to front
the RM with a VIP or DNS entry, which you can update to point to the new
RM machine when a failover occurs. The NMs need to be configured to point
to this VIP/DNS entry, so that when a failover occurs, the NMs don't need
to update their yarn-site.xml files.
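In yarn-site.xml terms, that setup looks roughly like the following (the HDFS URI is a placeholder; double-check the property names against your Hadoop version):

```xml
<!-- Sketch: RM recovery via the file system state store. -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://namenode.example.com:8020/yarn/rm-state</value>
</property>
```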


It sounds like in the future you won't need to use VIPs/DNS entries. You
should probably also email the YARN mailing list, just in case we're
misinformed or unaware of some new updates.

Cheers,
Chris
