Posted to dev@samza.apache.org by Sriram Ramachandrasekaran <sr...@gmail.com> on 2016/08/31 03:35:36 UTC

Samza Mesos

Folks,

We've been using Samza in production since the beginning of this year. It's been
quite stable for our needs, although we don't use it heavily yet. One of
the things we would like to know is where Samza-Mesos integration sits on
the roadmap. I know SAMZA-375
<https://issues.apache.org/jira/browse/SAMZA-375> is specifically aimed at
that, but is there something stopping the community from integrating it into
mainline?

I ask because we run our Samza jobs on YARN right now and use
Mesos infra for other workloads. I really don't want to manage two infra
components that are supposed to do exactly the same thing. We've built
enough tooling around Mesos infra that we wouldn't want to move away from it
either.

The options we're evaluating are:
1. Move to KStreams and get away from YARN
2. Explore Samza-Mesos integration so that we can reduce our "explicit"
dependency on Kafka.


Some clarity on this would really help us.
Sriram

-- 
It's just about how deep your longing is!

Re: Samza Mesos

Posted by Yi Pan <ni...@gmail.com>.
Hi, Sriram,

Yes, that's the correct direction to go.

Cheers!

-Yi

On Wed, Aug 31, 2016 at 12:39 PM, Sriram Ramachandrasekaran <
sri.rams85@gmail.com> wrote:

> Thanks Jagadish.
> So, in essence, I should be looking at the samza-11 branch for the final API
> against which I would have to write the Mesos integration pieces?

Re: Samza Mesos

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
Thanks Jagadish.
So, in essence, I should be looking at the samza-11 branch for the final API
against which I would have to write the Mesos integration pieces?


-- 
It's just about how deep your longing is!

Re: Samza Mesos

Posted by Jagadish Venkatraman <ja...@gmail.com>.
Hi Sriram,

I had started prototyping it (purely to ensure that the Samza API makes
sense with Mesos). The exact API on the Samza-11 trunk is slightly
different, but hopefully there are similarities:
https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/clustermanager/ClusterResourceManager.java

Find a stub implementation here (it encapsulates a fair bit of boilerplate,
such as Mesos driver creation):
https://github.com/vjagadish/samza-clone/commit/9e5ed9f1774dadf079ad33913ff7f20ed58bc8dc

A version of the prototype with the old API is here:
<https://github.com/bringhurst/samza/tree/SAMZA-375/samza-mesos/src/main/scala/org/apache/samza/job/mesos>

Some interesting implementation notes:
- Mesos did not (yet) support a preferred host request. However, that could
be implemented via dynamic reservations
<http://mesos.apache.org/documentation/latest/reservation/>.
- My discussions with the Mesos community are here:
https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAMd3yjgxMhg4RTw4GBXGf9MSMBV6ABzBgpqL6eJQ5gwMOT0tYA@mail.gmail.com%3E
- MESOS-4616 has more context.
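
To make the preferred-host note above concrete, here is a minimal sketch of how a framework could approximate host preference on its own side by filtering the resource offers it receives. It is an illustration only, written in Python for brevity: `Offer`, `ContainerRequest`, `fits`, and `match_offers` are hypothetical names, not Mesos or Samza APIs, and this is plain offer matching, not the dynamic-reservation mechanism mentioned above. It also assumes one container per offer, which a real scheduler would likely relax.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Offer:
    """A resource offer from one host (hypothetical shape)."""
    offer_id: str
    host: str
    cpus: float
    mem_mb: int


@dataclass(frozen=True)
class ContainerRequest:
    """A pending container request, optionally pinned to a preferred host."""
    container_id: str
    cpus: float
    mem_mb: int
    preferred_host: Optional[str] = None


def fits(offer: Offer, request: ContainerRequest) -> bool:
    """True if the offer has enough CPU and memory for the request."""
    return offer.cpus >= request.cpus and offer.mem_mb >= request.mem_mb


def match_offers(
    requests: List[ContainerRequest], offers: List[Offer]
) -> Tuple[Dict[str, str], List[str]]:
    """Return ({container_id: offer_id}, [offer_ids left unused])."""
    unused = list(offers)
    assignments: Dict[str, str] = {}
    # Pass 1 only accepts offers on the preferred host;
    # pass 2 places whatever is still pending on any host that fits.
    for require_preferred in (True, False):
        for request in requests:
            if request.container_id in assignments:
                continue
            for offer in unused:
                if not fits(offer, request):
                    continue
                if require_preferred and offer.host != request.preferred_host:
                    continue
                assignments[request.container_id] = offer.offer_id
                unused.remove(offer)
                break  # stop iterating: `unused` was just modified
    return assignments, [o.offer_id for o in unused]
```

Requests that cannot be placed stay out of the returned mapping, so the caller can retry them on the next batch of offers; the unused offer IDs are the ones a framework would typically decline.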

It'd be awesome if you can take a stab at Mesos integration. I'm happy to
help out in whatever way I can.

Thank you,
Jagadish


-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University

Re: Samza Mesos

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
Yi,
That's a good amount of history to know. I will take a look at 680 and then
see if I can implement something as well. If there's some stuff that's
already done, would be glad to re-use it too.
Thanks again


-- 
It's just about how deep your longing is!

Re: Samza Mesos

Posted by Yi Pan <ni...@gmail.com>.
Hi, Sriram,

The story behind delaying the integration of SAMZA-375 is that there is
a ton of repeated SamzaAppMaster code that exists in both samza-yarn and
Mesos. Without the change we recently made in SAMZA-680, we would have had to
copy the SamzaAppMaster code for every distributed execution system that we
added support for in Samza. Now, with the change in SAMZA-680, we have inverted
the JobCoordinator and the AppMaster logic, which makes it much easier to
have a pluggable distributed cluster management system in Samza. As stated in
the JIRA, all we need now is a Mesos-specific implementation of
ClusterResourceManager that can talk to Mesos for container
request/allocation.
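
The split described above can be sketched roughly as follows. This is a simplified illustration in Python, not the actual Samza interface (which is the Java ClusterResourceManager): `ResourceManager`, `InMemoryResourceManager`, `Coordinator`, and the two data classes are hypothetical names chosen for the example. The point it tries to show is that the shared coordination logic is written once and each cluster system only supplies the piece that requests containers and reports allocations back through a callback.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ResourceRequest:
    """A request for one container (hypothetical shape, not Samza's class)."""
    request_id: str
    cpus: float
    mem_mb: int


@dataclass(frozen=True)
class AllocatedResource:
    """A granted container slot on some host (hypothetical shape)."""
    request_id: str
    host: str


class ResourceManager(ABC):
    """Cluster-specific half: knows how to ask one cluster for containers."""

    def __init__(self, on_allocated: Callable[[AllocatedResource], None]):
        # The coordinator registers a callback instead of polling the cluster.
        self._on_allocated = on_allocated

    @abstractmethod
    def request_resource(self, request: ResourceRequest) -> None:
        ...


class InMemoryResourceManager(ResourceManager):
    """Stand-in for a YARN- or Mesos-backed manager; grants immediately."""

    def __init__(self, on_allocated, hosts: List[str]):
        super().__init__(on_allocated)
        self._hosts = hosts
        self._next = 0

    def request_resource(self, request: ResourceRequest) -> None:
        host = self._hosts[self._next % len(self._hosts)]  # round-robin
        self._next += 1
        self._on_allocated(AllocatedResource(request.request_id, host))


class Coordinator:
    """Cluster-agnostic half: the logic that would otherwise be copied."""

    def __init__(self, make_manager):
        self.running: Dict[str, str] = {}
        self._manager = make_manager(self._handle_allocation)

    def _handle_allocation(self, resource: AllocatedResource) -> None:
        self.running[resource.request_id] = resource.host

    def start(self, container_count: int) -> None:
        for i in range(container_count):
            self._manager.request_resource(
                ResourceRequest(f"container-{i}", 1.0, 1024)
            )


coordinator = Coordinator(
    lambda callback: InMemoryResourceManager(callback, ["host-a", "host-b"])
)
coordinator.start(3)
```

Under this sketch, supporting another cluster system would mean writing one more `ResourceManager` subclass and passing a different factory to `Coordinator`, with no change to the coordinator itself.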

@Jagadish, I remember that you did some prototype integration with Mesos
based on SAMZA-680. Would you mind sharing some example code for that?

Thanks!

-Yi
