Posted to dev@reef.apache.org by Saikat Kanjilal <sx...@gmail.com> on 2017/06/13 23:17:16 UTC

Design Considerations on reef-1791

@Markus/Sergiy,
I've spent the past few days studying the implementation of
reef-runtime-mesos and have a few things I'd like to discuss. As I mentioned
before, I created reef-runtime-spark as a clone of the mesos runtime as a
first step. However, the more I look at the code and try to figure out how
to merge
https://github.com/apache/reef/tree/master/lang/scala/reef-examples-scala/src/main/scala/org/apache/reef/examples/hellospark
into reef-runtime-spark, the more questions come to mind that need
further discussion:

1) The mesos runtime currently uses Google Protocol Buffers and the mesos
task API. I am assuming we don't need any of this for the spark runtime, or
any of the interfaces with Avro. Is that assumption correct?
2) I see a lot of classes in the org.apache.reef.runtime.mesos.driver
package associated with launching, releasing, and requesting resources. In
the interim I renamed all of these to Spark versions and am assuming we can
still reuse them. Do you see any issues with this? If we can reuse them, they
will be available through the SparkDriverConfiguration, which extends
ConfigurationModuleBuilder (again similar to the Mesos implementation).
3) I also renamed all of the mesos evaluator packages to their spark
counterparts. Do you see any issues with reusing the evaluator parameter
classes?
4) Finally, I am looking at the mesos util directory and wondering whether
we can do without any of the remote management functionality (i.e.
MesosRemoteManager etc.).


Would love some input on this as I piece through the first implementation
of the reef-runtime-spark.
Regards

Re: Design Considerations on reef-1791

Posted by Markus Weimer <ma...@weimo.de>.
On Tue, Jun 13, 2017 at 4:17 PM, Saikat Kanjilal <sx...@gmail.com> wrote:
> 1) the mesos runtime is currently using google protocol buffer and the mesos
> task API, am assuming we don't need any of this for the spark runtime or
> any of the interfaces with avro, is that assumption correct

Yes, I don't think we need any interfacing via protobuf for this.

> 2) I see a lot of classes in the org.apache.reef.runtime.mesos.driver
> package associated with Launching, Releasing, Requesting Resources, in the
> interim I renamed all these to Spark versions and am assuming we can still
> reuse these, do you see any issues with this, if we can reuse these they
> will be available through the SparkDriverConfiguration which extends
> ConfigurationModuleBuilder (again similar to Mesos implementation)

You will need similar classes, but with different content :)
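A minimal sketch of what "similar classes, but with different content" could look like: the handler contract stays the same and only the body differs per runtime. This is a toy model in Python, not REEF code; the class names and the returned strings are illustrative assumptions, and the Spark body in particular is a placeholder for whatever the spark runtime ends up doing.

```python
from abc import ABC, abstractmethod


class ResourceLaunchHandler(ABC):
    """Hypothetical shared contract: every runtime answers a launch event."""

    @abstractmethod
    def on_launch(self, evaluator_id: str) -> str:
        ...


class MesosLaunchHandler(ResourceLaunchHandler):
    def on_launch(self, evaluator_id: str) -> str:
        # Placeholder body: the mesos runtime would talk to the Mesos task API.
        return f"mesos: launch task for {evaluator_id}"


class SparkLaunchHandler(ResourceLaunchHandler):
    def on_launch(self, evaluator_id: str) -> str:
        # Placeholder body (assumption): same contract, Spark-specific work.
        return f"spark: start evaluator {evaluator_id}"


def bind_handler(runtime: str) -> ResourceLaunchHandler:
    """Pick the handler for a runtime, as a driver configuration might."""
    handlers = {"mesos": MesosLaunchHandler, "spark": SparkLaunchHandler}
    return handlers[runtime]()
```

Keeping the contract fixed is what lets a renamed class keep its place in the driver configuration while its content is rewritten.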

> 3) I also renamed all of the mesos evaluator packages to their spark
> counterparts, do you see any issues with reusing the evaluator parameters
> classes

No, as long as they are in a new package.

> 4) Finally I am looking at the mesos util directory and I am wondering if
> we can do without any of the Remote management functionality (i.e.
> MesosRemoteManager etc)

I don't think we need the mesos remoting support for Spark.


Markus

Re: Design Considerations on reef-1791

Posted by John Yang <jo...@gmail.com>.
Hi Saikat,

Thanks for the clarification. I'm not familiar with the reef-runtime-spark
project, so I'm not sure I can answer your questions in that regard.
You're welcome to ask questions about developing the mesos runtime itself,
if that becomes your interest. :)

Thanks!
John


On Wed, Jun 14, 2017 at 10:42 AM, Saikat Kanjilal <sx...@gmail.com> wrote:

> :)))))), Hi John,
>  I am not working on the mesos runtime but rather using it as a template
> for building the reef runtime on spark, please read my email carefully
> below :) and let me know your thoughts on extending parts of this runtime
> to the reef runtime spark architecture.
> Regards

Re: Design Considerations on reef-1791

Posted by Saikat Kanjilal <sx...@gmail.com>.
:)))))), Hi John,
I am not working on the mesos runtime but rather using it as a template for building the reef runtime on spark. Please read my email carefully below :) and let me know your thoughts on extending parts of this runtime to the reef-runtime-spark architecture.
Regards

Sent from my iPad

> On Jun 13, 2017, at 6:19 PM, John Yang <jo...@gmail.com> wrote:
> 
> Hi Saikat,
> 
> 
> Many thanks for working on the mesos runtime!
> I can answer 4): Yes, we can do without the extra remote managers, but with
> some caveats.
> 
> By default, Mesos employs pessimistic concurrency control
> <https://research.google.com/pubs/pub41684.html> in giving out resource
> offers.
> So from our (REEF) perspective, once we get a resource offer from Mesos, I
> believe the offer is pretty much for us to keep without any other job
> taking it away from us.
> With this in mind, the mesos runtime can do the following, which doesn't
> really require any extra RemoteManagers.
> 
>   - Upon start: Be a good citizen and reject any incoming offers, since we
>   don't need any resources yet
>   - Upon resource request: Keep an appropriate offer
>   - Upon resource launch: Simply launch a REEF evaluator with the offer
> 
> Let's call this Design A
> 
> However, the current mesos runtime implementation (let's call it Design B)
> does not work like Design A.
> The main reason is that custom allocators
> <http://mesos.apache.org/documentation/latest/allocation-module/#writing-a-custom-allocator>
> that make offers to multiple jobs simultaneously can be used in Mesos.
> So, to be safe, Design B launches a Mesos task upon resource request, and
> the task sets up a RemoteManager channel through which the REEF evaluator
> is launched.
> 
> I must admit that had I known more about the pessimistic locking 3 years
> ago when I wrote the mesos runtime, I would've thought about going with
> Design A, which covers the common case much more nicely.
> And then, I would've handled the behaviors of custom allocators as
> exceptional cases through implementing the Scheduler#offerRescinded
> callback, although I'm still not sure if it's straightforward to do so with
> REEF.
> 
> All in all, I believe the mesos runtime hasn't really been maintained since
> it was first written, and has bits that need to be refactored.
> For example, I see that we're still using Mesos 0.25.0, when 1.2.0
> <http://mesos.apache.org/> has been released.
> 
> Hope this helps.
> 
> 
> Thanks,
> John
> 

Re: Design Considerations on reef-1791

Posted by John Yang <jo...@gmail.com>.
Hi Saikat,


Many thanks for working on the mesos runtime!
I can answer 4): Yes, we can do without the extra remote managers, but with
some caveats.

By default, Mesos employs pessimistic concurrency control
<https://research.google.com/pubs/pub41684.html> in giving out resource
offers.
So from our (REEF) perspective, once we get a resource offer from Mesos, I
believe the offer is pretty much for us to keep without any other job
taking it away from us.
With this in mind, the mesos runtime can do the following, which doesn't
really require any extra RemoteManagers.

   - Upon start: Be a good citizen and reject any incoming offers, since we
   don't need any resources yet
   - Upon resource request: Keep an appropriate offer
   - Upon resource launch: Simply launch a REEF evaluator with the offer

Let's call this Design A
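The three steps of Design A can be sketched roughly as follows. This is a toy model in Python under stated assumptions, not the Mesos or REEF API: `Offer`, `Request`, `DesignAScheduler`, and the memory/cores fields are hypothetical names chosen for illustration, and "declining" or "launching" is only recorded in lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Offer:
    """Toy stand-in for a resource offer (hypothetical shape)."""
    offer_id: str
    memory_mb: int
    cores: int


@dataclass(frozen=True)
class Request:
    """Toy stand-in for a REEF resource request (hypothetical shape)."""
    request_id: str
    memory_mb: int
    cores: int


@dataclass
class DesignAScheduler:
    pending: List[Request] = field(default_factory=list)
    held: Dict[str, Offer] = field(default_factory=dict)  # request id -> offer
    declined: List[str] = field(default_factory=list)
    launched: List[str] = field(default_factory=list)

    def on_resource_request(self, request: Request) -> None:
        # Remember the request; the next fitting offer will be kept for it.
        self.pending.append(request)

    def on_offers(self, offers: List[Offer]) -> None:
        for offer in offers:
            match = next(
                (r for r in self.pending
                 if offer.memory_mb >= r.memory_mb and offer.cores >= r.cores),
                None,
            )
            if match is None:
                # Upon start (or when nothing fits): decline the offer.
                self.declined.append(offer.offer_id)
            else:
                # Upon resource request: keep an appropriate offer.
                self.pending.remove(match)
                self.held[match.request_id] = offer

    def on_resource_launch(self, request_id: str) -> str:
        # Upon resource launch: use the held offer to start the evaluator.
        offer = self.held.pop(request_id)
        self.launched.append(offer.offer_id)
        return offer.offer_id
```

Note that no remote channel appears anywhere in the model: the scheduler simply holds on to the offer between the request and the launch.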

However, the current mesos runtime implementation (let's call it Design B)
does not work like Design A.
The main reason is that custom allocators
<http://mesos.apache.org/documentation/latest/allocation-module/#writing-a-custom-allocator>
that make offers to multiple jobs simultaneously can be used in Mesos.
So, to be safe, Design B launches a Mesos task upon resource request, and
the task sets up a RemoteManager channel through which the REEF evaluator
is launched.
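For contrast, Design B can be modelled like this. Again a toy sketch in Python, not the actual runtime: `PlaceholderTask` and `DesignBScheduler` are hypothetical names, and a `queue.Queue` stands in for the RemoteManager channel.

```python
import queue
from typing import Dict, Optional


class PlaceholderTask:
    """Task started as soon as a resource is requested; it occupies the offer."""

    def __init__(self, offer_id: str) -> None:
        self.offer_id = offer_id
        # Stand-in for the RemoteManager channel the task sets up.
        self.channel: "queue.Queue[str]" = queue.Queue()
        self.evaluator_command: Optional[str] = None

    def poll(self) -> bool:
        """Pick up a launch command if one has arrived on the channel."""
        try:
            self.evaluator_command = self.channel.get_nowait()
        except queue.Empty:
            return False
        return True


class DesignBScheduler:
    def __init__(self) -> None:
        self.tasks: Dict[str, PlaceholderTask] = {}  # request id -> task

    def on_resource_request(self, request_id: str, offer_id: str) -> None:
        # The offer is consumed immediately, so nobody else can be given it.
        self.tasks[request_id] = PlaceholderTask(offer_id)

    def on_resource_launch(self, request_id: str, command: str) -> None:
        # The evaluator launch travels over the channel to the running task.
        self.tasks[request_id].channel.put(command)
```

The extra task and channel are the price paid for not having to hold an unconsumed offer between request and launch.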

I must admit that had I known more about the pessimistic locking 3 years
ago when I wrote the mesos runtime, I would've thought about going with
Design A, which covers the common case much more nicely.
And then, I would've handled the behaviors of custom allocators as
exceptional cases through implementing the Scheduler#offerRescinded
callback, although I'm still not sure if it's straightforward to do so with
REEF.
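One way the rescinded-offer case might be handled under Design A is to drop the held offer and put the request back in line. The sketch below is a toy model in Python; `HeldOffers` and its methods are hypothetical, and only the idea of reacting to a rescinded offer is taken from the Scheduler#offerRescinded callback mentioned above.

```python
from typing import Dict, List, Optional


class HeldOffers:
    """Tracks which offer is being held for which resource request."""

    def __init__(self) -> None:
        self.by_request: Dict[str, str] = {}  # request id -> offer id
        self.requeued: List[str] = []         # requests waiting for a new offer

    def hold(self, request_id: str, offer_id: str) -> None:
        self.by_request[request_id] = offer_id

    def offer_rescinded(self, offer_id: str) -> Optional[str]:
        """Forget a rescinded offer and requeue the request that relied on it.

        Returns the affected request id, or None if the offer was not held.
        """
        for request_id, held in list(self.by_request.items()):
            if held == offer_id:
                del self.by_request[request_id]
                self.requeued.append(request_id)
                return request_id
        return None
```

Whether REEF can absorb a requeued request this easily is exactly the open question raised above; the sketch only shows the bookkeeping.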

All in all, I believe the mesos runtime hasn't really been maintained since
it was first written, and has bits that need to be refactored.
For example, I see that we're still using Mesos 0.25.0, when 1.2.0
<http://mesos.apache.org/> has been released.

Hope this helps.


Thanks,
John


On Wed, Jun 14, 2017 at 8:17 AM, Saikat Kanjilal <sx...@gmail.com> wrote:

> @Markus/Sergiy,
> I've spent the past few days or so studying the implementation of the
> reef-runtime-mesos and had some things I wanted to discuss, as I mentioned
> before I created reef-runtime-spark as a clone of the mesos runtime as a
> first step.  However the more I look at the code and try to figure out how
> to merge
> https://github.com/apache/reef/tree/master/lang/scala/reef-examples-scala/src/main/scala/org/apache/reef/examples/hellospark
> into reef-runtime-spark there are several things that come to mind needing
> further discussion:
>
> 1) the mesos runtime is currently using google protocol buffer and the mesos
> task API, am assuming we don't need any of this for the spark runtime or
> any of the interfaces with avro, is that assumption correct
> 2) I see a lot of classes in the org.apache.reef.runtime.mesos.driver
> package associated with Launching, Releasing, Requesting Resources, in the
> interim I renamed all these to Spark versions and am assuming we can still
> reuse these, do you see any issues with this, if we can reuse these they
> will be available through the SparkDriverConfiguration which extends
> ConfigurationModuleBuilder (again similar to Mesos implementation)
> 3) I also renamed all of the mesos evaluator packages to their spark
> counterparts, do you see any issues with reusing the evaluator parameters
> classes
> 4) Finally I am looking at the mesos util directory and I am wondering if
> we can do without any of the Remote management functionality (i.e.
> MesosRemoteManager etc)
>
>
> Would love some input on this as I piece through the first implementation
> of the reef-runtime-spark.
> Regards
>