Posted to dev@amaterasu.apache.org by Arun Manivannan <ar...@arunma.com> on 2018/05/26 07:16:50 UTC

AMATERASU-24

Gentlemen,

I am looking into Amaterasu-24 and would like to run the intended changes
by you before I make them.

Refactor Spark out of Amaterasu executor to its own project
<https://issues.apache.org/jira/projects/AMATERASU/issues/AMATERASU-24?filter=allopenissues>

I understand Spark is just the first of many frameworks that have been
lined up for support by Amaterasu.

These are the intended changes:

1. Create a new module called "runners" and pull the Spark runners
currently under the executor
(org.apache.executor.execution.actions.runners.spark) into it. We could
call it "frameworks" if "runners" is not a great name.
2. Also pull the Spark dependencies out of the executor and into the
respective sub-sub-projects (at the moment, just Spark).
3. Since each framework module would produce its own bundle, the naming
pattern I am considering for the bundle is "runner-spark". So it would be
"runners:runner-spark" in gradle (see the sketch below).
4. The shell scripts ("miniconda" and "load-spark-env") and the "-cp"
argument passed in the commands for the ActionsExecutorLauncher could be
pulled out as separate Spark properties (inside the runner), so that the
application master can use them.
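
To make (3) concrete, here is a minimal settings.gradle sketch of the
layout I have in mind (the module names, and the assumption that the
existing leader/common/executor modules stay as they are, are mine and
not final):

    // settings.gradle - hypothetical layout for the proposed module
    include 'leader'
    include 'common'
    include 'executor'

    // each framework runner becomes its own sub-project under runners/
    include 'runners:runner-spark'
    project(':runners:runner-spark').projectDir = file('runners/spark')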

Is it okay if I rename the Miniconda install file to miniconda-install
using "wget -O"? This change is proposed to avoid hardcoding the Conda
version inside the code, and to possibly pull it out into the
amaterasu.properties file. (The changes are in the ama-start shell scripts
and a couple of places inside the code.)
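
Roughly, the download step in ama-start would change along these lines
(a sketch only; the property name and the exact download URL are my
assumptions):

    # amaterasu.properties (assumed property name):
    #   miniconda.version=4.5.1

    MINICONDA_VERSION=$(grep '^miniconda.version' amaterasu.properties | cut -d'=' -f2)

    # download under a fixed, version-independent file name
    wget -O miniconda-install.sh \
      "https://repo.continuum.io/miniconda/Miniconda2-${MINICONDA_VERSION}-Linux-x86_64.sh"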

Please let me know if this would work.

Cheers,
Arun

Re: AMATERASU-24

Posted by Arun Manivannan <ar...@arunma.com>.
Hi Yaniv, Nadav and all,

Not sure if a PR is the best way to initiate a discussion on the code, but
I have just managed to raise one based on my forked remote branch.

https://github.com/apache/incubator-amaterasu/pull/22
https://github.com/arunma/incubator-amaterasu/tree/AMATERASU-24-FrameworkRefactor

I am still working on my gradle skills, but I've made all the sub-modules
compile and the test cases run.  I am pretty sure proper runs wouldn't work
yet because, at the moment, only the executor is set on the classpath.
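
For the record, the eventual fix I have in mind is roughly the following
change to the container launch command (the jar names and the unqualified
class name here are illustrative, not the real ones):

    # today (simplified): only the executor jar is on the classpath
    java -cp executor.jar ActionsExecutorLauncher <args>

    # after the refactor: the framework runner jar is appended as well
    java -cp executor.jar:runner-spark.jar ActionsExecutorLauncher <args>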

Let the comment games begin :-)

Regards,
Arun

Re: AMATERASU-24

Posted by Arun Manivannan <ar...@arunma.com>.
Hi Yaniv,

Makes perfect sense. Non-JVM frameworks are something I hadn't considered.
I haven't done anything like this in the past and would appreciate your
guidance.  Would it be okay if we have a discussion over a meeting?

Two modules: yes. I have made changes in that direction, but have kept both
the runner and the runtime in the same module under different packages.
I'll make some final touches and get you the branch for review and
discussion.  For now, these are the primary focus areas:

1. The executor still has the YARN and Mesos dependencies, but no longer
the Spark ones (I believe that needs some work as well, considering the
non-JVM frameworks).
2. The runner and runtime modules have the Spark dependencies pulled into
them (at the moment, both are unified under the same module, in different
packages).

I still see Scala (library/reflect/compiler) bundled into both modules.
This is a concern and needs some work on the gradle side; a possible
direction is sketched below.
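
One option, assuming the executor (or the container classpath) already
provides Scala at runtime, is to mark the Scala artifacts compileOnly in
the runner so they stay out of its jar. A sketch, with an illustrative
Scala version:

    // frameworks/spark/runner/build.gradle (hypothetical)
    dependencies {
        // provided by the executor's classpath at runtime - don't bundle
        compileOnly "org.scala-lang:scala-library:2.11.8"
        compileOnly "org.scala-lang:scala-reflect:2.11.8"
        compileOnly "org.scala-lang:scala-compiler:2.11.8"
    }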

On the shell changes, I wasn't very sure whether I was on the right track.
Thanks for clarifying.  I'll probably close the shell script PR after
discussing with Nadav.

Cheers,
Arun

Re: AMATERASU-24

Posted by Arun Manivannan <ar...@arunma.com>.
Hi Nadav,

Absolutely, this sounds fantastic.  I am really keen to understand more
about this and would be happy to help where I can to move it forward.

Yes, it would be great if we could have a separate discussion thread on
this (or preferably a meeting).  Let me quickly push a branch for your
review and progressively make the necessary changes.

Cheers,
Arun

Re: AMATERASU-24

Posted by Nadav Har Tzvi <na...@gmail.com>.
I agree with Yaniv that frameworks should be plugins.
Think about it like this: in the future, hopefully, you will be able to do
something like "sudo yum install amaterasu". After installing the "core"
Amaterasu via yum, you would use the new CLI to add a framework, like this:
"ama frameworks add <your favorite framework>".
Alternatively, we could do something like "sudo yum install amaterasu-spark".
I mean, this is what I think anyhow.
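
Put together, the workflow could look like this (entirely hypothetical:
neither the package names, the subcommand, nor the install path exist yet):

    # option 1: core package plus CLI-managed framework plugins
    sudo yum install amaterasu
    ama frameworks add spark    # e.g. drops runner-spark.jar under
                                # /usr/lib/amaterasu/frameworks/spark/

    # option 2: the framework ships as its own OS package
    sudo yum install amaterasu-spark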

As I write this, I've just realized that we should open a thread to discuss
packaging options that we'd like to see implemented.

Re: AMATERASU-24

Posted by Yaniv Rodenski <ya...@shinto.io>.
Hi Arun,

You are correct, Spark is the first framework, and in my mind frameworks
should be treated as plugins. Also, we need to consider that not all
frameworks will run under the JVM.
Last, each framework has two modules: a runner (used by both the executor
and the leader) and a runtime, used by the actions themselves.
I would suggest the following structure to start with:
frameworks
  |-> spark
      |-> runner
      |-> runtime
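
In gradle terms, that layout might map to something like the following
(a sketch only; the final project names may differ):

    // settings.gradle (hypothetical)
    include 'frameworks:spark:runner'
    include 'frameworks:spark:runtime'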

As for the shell scripts, I will leave that for @Nadav, but please have a
look at PR #17 containing the CLI that will replace the scripts as of
0.2.1-incubating.

Cheers,
Yaniv

-- 
Yaniv Rodenski

+61 477 778 405
yaniv@shinto.io