You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nemo.apache.org by JangHo Seo <ja...@apache.org> on 2018/06/01 06:59:14 UTC

[DISCUSS] Merge nemo-runtime-common into nemo-common?

Hi Nemo devs,

Currently Nemo codebase is splitted into multiple Maven modules.
While it's good for ease of maintenance and modularity, a module cannot
reference classes and/or interfaces in other modules unless it declares
them as dependencies in pom.xml.
Sometimes it makes difficult to expose runtime-level behaviors as
IR-level abstractions, when nemo-compiler-* modules cannot access the
related definitions in nemo-runtime-* modules.

A good example isĀ  [NEMO-73] 'SchedulingPolicy as Vertex-levelExecution
Property'. The issue is to expose SchedulingPolicy to compiler passes so
that users can configure scheduling behavior using Nemo IR.
Unfortunately 'SchedulingPolicy' is defined in 'nemo-runtime-common', so
compilers cannot access it and I think the proper fix is to move it to
the 'nemo-common' module.
Also, to move 'SchedulingPolicy' to 'nemo-common' module, lots of
definitions in 'nemo-runtime-common' should also be moved to 'nemo-common'.
(Because 'SchedulingPolicy' is a function of 'ExecutorRepresenter's and
'ExecutableTask', and the definition of 'ExecutableTask' depends on
'PhysicalStageEdge', 'PhysicalStage', and 'KeyRange', and so on...)

The module 'nemo-runtime-common' has definitions that define 'policies',
as well as definitions that implement 'mechanisms'.
'KeyRange' is an example for the former (defines how to partition the
data in a block), and 'MessageListener' is an example for the latter
(implements RPC).
In my opinion, policy-related definitions should be moved to
nemo-common, so that we can add more ExecutionProperties about
scheduilng, dynamic optimization, state management, and so on.

What do you think about this issue?

Regards,
Jangho


Re: [DISCUSS] Merge nemo-runtime-common into nemo-common?

Posted by Byung-Gon Chun <bg...@gmail.com>.
Thanks, Jangho and John.

This seems to be a non-trivial change.
Let's spend some time to come up with the best design.

Thanks.
-Gon


On Fri, Jun 1, 2018 at 8:01 PM, John Yang <jo...@gmail.com> wrote:

> Thanks Jangho for the explanation. I think I have a better understanding of
> the problem now.
>
> The SchedulingPolicy we have directly references the 'mechanisms' for
> executors and tasks. Nonetheless, I'd define SchedulingPolicy as 'given
> some information about a group of IRVertices (Stage) and executors, return
> executors that we want to use'.
>
> As you suggested, introducing a Stage interface would be nice, as we can do
> something like
>
> filter(StageInterface, list(ExecutorInfo))
>
> which can be implemented and configured in the optimizer. The runtime can
> supply appropriate arguments when calling the method, without exposing its
> 'mechanisms'.
>
> However, I'd be careful about introducing a concrete Stage 'class' in the
> optimizer, as that may require modifications of the stage partitioning
> logic, and optimization passes that only knows about IRVertices.
>
> Alternatively, and as the first implementation goal, something like this
> may be nice.
>
> filter(ExecutionPropertyMap, list(ExecutorInfo))
>
> which gives us the minimal information we need about the group of vertices
> and the executors. We can then tag an implementation of this method to each
> IRVertice, and perform checks when assigning stage numbers, similar to how
> we currently check if vertices of the same stage have the same degree of
> parallelism.
>
> After that I guess we can think about using a Stage interface, or providing
> additional information for expressing more sophisticated policies such as
> straggler handling.
>
> On a minor note, SchedulingPolicy appears to be in the master package, and
> not in the runtime-common package.
>
> https://github.com/apache/incubator-nemo/blob/master/
> runtime/master/src/main/java/edu/snu/nemo/runtime/master/
> scheduler/SchedulingPolicy.java
>
> John
>
> On Fri, Jun 1, 2018, 5:59 PM JangHo Seo <ja...@jangho.kr> wrote:
>
> > Thanks for the opinion.
> >
> > Currently some of policy-related code(e.g. SchedulingPolicy) depend on
> > mechanism-related code(e.g. PhysicalStage).
> > That said, SchedulingPolicy depends on 'declaration' of PhysicalStage,
> > not the actual 'implementation' of it.
> > And I agree mechanism-related code should stay within nemo-runtime-*
> > module.
> >
> > Maybe we can add PhysicalStage as an interface in nemo-common and rename
> > the current PhysicalStage to PhysicalStageImpl, implementing the
> > PhyscalStage interface.
> >
> > Regards,
> > Jangho
> >
> >
> > On 06/01/2018 05:49 PM, John Yang wrote:
> > > Sounds good to me.
> > >
> > >
> > > I agree that policy-related code should go into 'nemo-common' for the
> > > compiler-optimizer to optimize, and the runtime to execute.
> > > I guess 'KeyExtractor' and the ir package belong to this category, and
> > > that's why they're in 'nemo-common'
> > >
> > > On the other hand, mechanism-related code should stay in
> > > 'nemo-runtime-common' and should be exposed only to the
> > > compiler-backend/nemo.
> > > Things like 'PhysicalStage' seems to belong to this category, as it'd
> be
> > > unlikely that compiler-optimizer would want to do something with it.
> > >
> > >
> > > Thanks,
> > > John
> > >
> > > On Fri, Jun 1, 2018 at 3:59 PM, JangHo Seo <ja...@apache.org> wrote:
> > >
> > >> Hi Nemo devs,
> > >>
> > >> Currently Nemo codebase is splitted into multiple Maven modules.
> > >> While it's good for ease of maintenance and modularity, a module
> cannot
> > >> reference classes and/or interfaces in other modules unless it
> declares
> > >> them as dependencies in pom.xml.
> > >> Sometimes it makes difficult to expose runtime-level behaviors as
> > >> IR-level abstractions, when nemo-compiler-* modules cannot access the
> > >> related definitions in nemo-runtime-* modules.
> > >>
> > >> A good example is  [NEMO-73] 'SchedulingPolicy as
> Vertex-levelExecution
> > >> Property'. The issue is to expose SchedulingPolicy to compiler passes
> so
> > >> that users can configure scheduling behavior using Nemo IR.
> > >> Unfortunately 'SchedulingPolicy' is defined in 'nemo-runtime-common',
> so
> > >> compilers cannot access it and I think the proper fix is to move it to
> > >> the 'nemo-common' module.
> > >> Also, to move 'SchedulingPolicy' to 'nemo-common' module, lots of
> > >> definitions in 'nemo-runtime-common' should also be moved to
> > 'nemo-common'.
> > >> (Because 'SchedulingPolicy' is a function of 'ExecutorRepresenter's
> and
> > >> 'ExecutableTask', and the definition of 'ExecutableTask' depends on
> > >> 'PhysicalStageEdge', 'PhysicalStage', and 'KeyRange', and so on...)
> > >>
> > >> The module 'nemo-runtime-common' has definitions that define
> 'policies',
> > >> as well as definitions that implement 'mechanisms'.
> > >> 'KeyRange' is an example for the former (defines how to partition the
> > >> data in a block), and 'MessageListener' is an example for the latter
> > >> (implements RPC).
> > >> In my opinion, policy-related definitions should be moved to
> > >> nemo-common, so that we can add more ExecutionProperties about
> > >> scheduilng, dynamic optimization, state management, and so on.
> > >>
> > >> What do you think about this issue?
> > >>
> > >> Regards,
> > >> Jangho
> > >>
> > >>
> >
> >
> >
> >
>



-- 
Byung-Gon Chun

Re: [DISCUSS] Merge nemo-runtime-common into nemo-common?

Posted by John Yang <jo...@gmail.com>.
Thanks Jangho for the explanation. I think I have a better understanding of
the problem now.

The SchedulingPolicy we have directly references the 'mechanisms' for
executors and tasks. Nonetheless, I'd define SchedulingPolicy as 'given
some information about a group of IRVertices (Stage) and executors, return
executors that we want to use'.

As you suggested, introducing a Stage interface would be nice, as we can do
something like

filter(StageInterface, list(ExecutorInfo))

which can be implemented and configured in the optimizer. The runtime can
supply appropriate arguments when calling the method, without exposing its
'mechanisms'.

However, I'd be careful about introducing a concrete Stage 'class' in the
optimizer, as that may require modifications of the stage partitioning
logic, and optimization passes that only knows about IRVertices.

Alternatively, and as the first implementation goal, something like this
may be nice.

filter(ExecutionPropertyMap, list(ExecutorInfo))

which gives us the minimal information we need about the group of vertices
and the executors. We can then tag an implementation of this method to each
IRVertice, and perform checks when assigning stage numbers, similar to how
we currently check if vertices of the same stage have the same degree of
parallelism.

After that I guess we can think about using a Stage interface, or providing
additional information for expressing more sophisticated policies such as
straggler handling.

On a minor note, SchedulingPolicy appears to be in the master package, and
not in the runtime-common package.

https://github.com/apache/incubator-nemo/blob/master/runtime/master/src/main/java/edu/snu/nemo/runtime/master/scheduler/SchedulingPolicy.java

John

On Fri, Jun 1, 2018, 5:59 PM JangHo Seo <ja...@jangho.kr> wrote:

> Thanks for the opinion.
>
> Currently some of policy-related code(e.g. SchedulingPolicy) depend on
> mechanism-related code(e.g. PhysicalStage).
> That said, SchedulingPolicy depends on 'declaration' of PhysicalStage,
> not the actual 'implementation' of it.
> And I agree mechanism-related code should stay within nemo-runtime-*
> module.
>
> Maybe we can add PhysicalStage as an interface in nemo-common and rename
> the current PhysicalStage to PhysicalStageImpl, implementing the
> PhyscalStage interface.
>
> Regards,
> Jangho
>
>
> On 06/01/2018 05:49 PM, John Yang wrote:
> > Sounds good to me.
> >
> >
> > I agree that policy-related code should go into 'nemo-common' for the
> > compiler-optimizer to optimize, and the runtime to execute.
> > I guess 'KeyExtractor' and the ir package belong to this category, and
> > that's why they're in 'nemo-common'
> >
> > On the other hand, mechanism-related code should stay in
> > 'nemo-runtime-common' and should be exposed only to the
> > compiler-backend/nemo.
> > Things like 'PhysicalStage' seems to belong to this category, as it'd be
> > unlikely that compiler-optimizer would want to do something with it.
> >
> >
> > Thanks,
> > John
> >
> > On Fri, Jun 1, 2018 at 3:59 PM, JangHo Seo <ja...@apache.org> wrote:
> >
> >> Hi Nemo devs,
> >>
> >> Currently Nemo codebase is splitted into multiple Maven modules.
> >> While it's good for ease of maintenance and modularity, a module cannot
> >> reference classes and/or interfaces in other modules unless it declares
> >> them as dependencies in pom.xml.
> >> Sometimes it makes difficult to expose runtime-level behaviors as
> >> IR-level abstractions, when nemo-compiler-* modules cannot access the
> >> related definitions in nemo-runtime-* modules.
> >>
> >> A good example is  [NEMO-73] 'SchedulingPolicy as Vertex-levelExecution
> >> Property'. The issue is to expose SchedulingPolicy to compiler passes so
> >> that users can configure scheduling behavior using Nemo IR.
> >> Unfortunately 'SchedulingPolicy' is defined in 'nemo-runtime-common', so
> >> compilers cannot access it and I think the proper fix is to move it to
> >> the 'nemo-common' module.
> >> Also, to move 'SchedulingPolicy' to 'nemo-common' module, lots of
> >> definitions in 'nemo-runtime-common' should also be moved to
> 'nemo-common'.
> >> (Because 'SchedulingPolicy' is a function of 'ExecutorRepresenter's and
> >> 'ExecutableTask', and the definition of 'ExecutableTask' depends on
> >> 'PhysicalStageEdge', 'PhysicalStage', and 'KeyRange', and so on...)
> >>
> >> The module 'nemo-runtime-common' has definitions that define 'policies',
> >> as well as definitions that implement 'mechanisms'.
> >> 'KeyRange' is an example for the former (defines how to partition the
> >> data in a block), and 'MessageListener' is an example for the latter
> >> (implements RPC).
> >> In my opinion, policy-related definitions should be moved to
> >> nemo-common, so that we can add more ExecutionProperties about
> >> scheduilng, dynamic optimization, state management, and so on.
> >>
> >> What do you think about this issue?
> >>
> >> Regards,
> >> Jangho
> >>
> >>
>
>
>
>

Re: [DISCUSS] Merge nemo-runtime-common into nemo-common?

Posted by JangHo Seo <ja...@jangho.kr>.
Thanks for the opinion.

Currently some of policy-related code(e.g. SchedulingPolicy) depend on
mechanism-related code(e.g. PhysicalStage).
That said, SchedulingPolicy depends on 'declaration' of PhysicalStage,
not the actual 'implementation' of it.
And I agree mechanism-related code should stay within nemo-runtime-* module.

Maybe we can add PhysicalStage as an interface in nemo-common and rename
the current PhysicalStage to PhysicalStageImpl, implementing the
PhyscalStage interface.

Regards,
Jangho


On 06/01/2018 05:49 PM, John Yang wrote:
> Sounds good to me.
>
>
> I agree that policy-related code should go into 'nemo-common' for the
> compiler-optimizer to optimize, and the runtime to execute.
> I guess 'KeyExtractor' and the ir package belong to this category, and
> that's why they're in 'nemo-common'
>
> On the other hand, mechanism-related code should stay in
> 'nemo-runtime-common' and should be exposed only to the
> compiler-backend/nemo.
> Things like 'PhysicalStage' seems to belong to this category, as it'd be
> unlikely that compiler-optimizer would want to do something with it.
>
>
> Thanks,
> John
>
> On Fri, Jun 1, 2018 at 3:59 PM, JangHo Seo <ja...@apache.org> wrote:
>
>> Hi Nemo devs,
>>
>> Currently Nemo codebase is splitted into multiple Maven modules.
>> While it's good for ease of maintenance and modularity, a module cannot
>> reference classes and/or interfaces in other modules unless it declares
>> them as dependencies in pom.xml.
>> Sometimes it makes difficult to expose runtime-level behaviors as
>> IR-level abstractions, when nemo-compiler-* modules cannot access the
>> related definitions in nemo-runtime-* modules.
>>
>> A good example is  [NEMO-73] 'SchedulingPolicy as Vertex-levelExecution
>> Property'. The issue is to expose SchedulingPolicy to compiler passes so
>> that users can configure scheduling behavior using Nemo IR.
>> Unfortunately 'SchedulingPolicy' is defined in 'nemo-runtime-common', so
>> compilers cannot access it and I think the proper fix is to move it to
>> the 'nemo-common' module.
>> Also, to move 'SchedulingPolicy' to 'nemo-common' module, lots of
>> definitions in 'nemo-runtime-common' should also be moved to 'nemo-common'.
>> (Because 'SchedulingPolicy' is a function of 'ExecutorRepresenter's and
>> 'ExecutableTask', and the definition of 'ExecutableTask' depends on
>> 'PhysicalStageEdge', 'PhysicalStage', and 'KeyRange', and so on...)
>>
>> The module 'nemo-runtime-common' has definitions that define 'policies',
>> as well as definitions that implement 'mechanisms'.
>> 'KeyRange' is an example for the former (defines how to partition the
>> data in a block), and 'MessageListener' is an example for the latter
>> (implements RPC).
>> In my opinion, policy-related definitions should be moved to
>> nemo-common, so that we can add more ExecutionProperties about
>> scheduilng, dynamic optimization, state management, and so on.
>>
>> What do you think about this issue?
>>
>> Regards,
>> Jangho
>>
>>




Re: [DISCUSS] Merge nemo-runtime-common into nemo-common?

Posted by John Yang <jo...@gmail.com>.
Sounds good to me.


I agree that policy-related code should go into 'nemo-common' for the
compiler-optimizer to optimize, and the runtime to execute.
I guess 'KeyExtractor' and the ir package belong to this category, and
that's why they're in 'nemo-common'

On the other hand, mechanism-related code should stay in
'nemo-runtime-common' and should be exposed only to the
compiler-backend/nemo.
Things like 'PhysicalStage' seems to belong to this category, as it'd be
unlikely that compiler-optimizer would want to do something with it.


Thanks,
John

On Fri, Jun 1, 2018 at 3:59 PM, JangHo Seo <ja...@apache.org> wrote:

> Hi Nemo devs,
>
> Currently Nemo codebase is splitted into multiple Maven modules.
> While it's good for ease of maintenance and modularity, a module cannot
> reference classes and/or interfaces in other modules unless it declares
> them as dependencies in pom.xml.
> Sometimes it makes difficult to expose runtime-level behaviors as
> IR-level abstractions, when nemo-compiler-* modules cannot access the
> related definitions in nemo-runtime-* modules.
>
> A good example is  [NEMO-73] 'SchedulingPolicy as Vertex-levelExecution
> Property'. The issue is to expose SchedulingPolicy to compiler passes so
> that users can configure scheduling behavior using Nemo IR.
> Unfortunately 'SchedulingPolicy' is defined in 'nemo-runtime-common', so
> compilers cannot access it and I think the proper fix is to move it to
> the 'nemo-common' module.
> Also, to move 'SchedulingPolicy' to 'nemo-common' module, lots of
> definitions in 'nemo-runtime-common' should also be moved to 'nemo-common'.
> (Because 'SchedulingPolicy' is a function of 'ExecutorRepresenter's and
> 'ExecutableTask', and the definition of 'ExecutableTask' depends on
> 'PhysicalStageEdge', 'PhysicalStage', and 'KeyRange', and so on...)
>
> The module 'nemo-runtime-common' has definitions that define 'policies',
> as well as definitions that implement 'mechanisms'.
> 'KeyRange' is an example for the former (defines how to partition the
> data in a block), and 'MessageListener' is an example for the latter
> (implements RPC).
> In my opinion, policy-related definitions should be moved to
> nemo-common, so that we can add more ExecutionProperties about
> scheduilng, dynamic optimization, state management, and so on.
>
> What do you think about this issue?
>
> Regards,
> Jangho
>
>