
Posted to yarn-dev@hadoop.apache.org by Wangda Tan <wh...@gmail.com> on 2019/01/31 18:53:03 UTC

[DISCUSS] Making submarine to different release model like Ozone

Hi devs,

Since we started the Submarine effort last year, we have received a lot of
feedback. Several companies (such as Netease, China Mobile, etc.) are
trying to deploy Submarine to their Hadoop clusters alongside big data
workloads. LinkedIn is also very interested in contributing a Submarine TonY
(https://github.com/linkedin/TonY) runtime to allow users to use the same
interface.

From what I can see, there are several issues with keeping Submarine under
the yarn-applications directory and on the same release cycle as Hadoop:

1) We started the 3.2.0 release in Sep 2018, but the release was only done
in Jan 2019. It was delayed a lot by unpredictable blockers and security
issues. We need to iterate on Submarine quickly at this point.

2) We also see a lot of demand for using Submarine on older Hadoop
releases such as 2.x. Many companies may not upgrade Hadoop to 3.x any time
soon, but their need to run deep learning is urgent. We should decouple
Submarine from the Hadoop version.

And why do we want to keep it within Hadoop? First, Submarine includes some
innovative parts, such as enhancements to the user experience for YARN
services/containerization support, which we can add back to Hadoop later
to address common requirements. In addition, the community developing and
using it overlaps heavily with the Hadoop community.

We went through several proposals during the Ozone merge-to-trunk
discussion:
https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E

I propose to adopt the Ozone model: the same master branch, a different
release cycle, and a different release branch. It is a great example of the
agile releases we can do (2 Ozone releases after Oct 2018) with less
overhead to set up CI, projects, etc.

*Links:*
- JIRA: https://issues.apache.org/jira/browse/YARN-8135
- Design doc:
  <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
- User doc (3.2.0 release):
  <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
- Blog post, "{Submarine} : Running deep learning workloads on Apache Hadoop":
  <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>
  (Chinese translation: <https://www.jishuwen.com/d/2Vpu>)
- Talk, Strata Data Conf NY:
  <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>

Thoughts?

Thanks,
Wangda Tan

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Rohith Sharma K S <ro...@apache.org>.

+1. A few interested ML/DL folks from Bangalore asked about a Submarine
release for trying out TensorFlow on YARN. We told them to wait for a
release, since they were not ready to use trunk. I see an agile release
cycle for Submarine bringing a lot of added value.

-Rohith Sharma K S


Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Sunil G <su...@apache.org>.

+1 from me on this.
ML/DL is one of the fastest-growing areas, and a runtime on YARN helps
customers run ML/DL workloads on the same cluster where ETL or other
traditional big data workloads ingest or mine data.
A faster release cadence can speed up Submarine development and make it
easier to run on older Hadoop versions without any upgrade effort.

- Sunil




Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by "Elek, Marton" <el...@apache.org>.

+1.

I like the idea.

For me, submarine/ML-job-execution seems to be a natural extension of
the existing Hadoop/Yarn capabilities.

I also like the proposed project structure / release lifecycle. I
think it's better to be more modularized but keep the development in the
same project. IMHO it worked well with the Ozone releases. We can do
more frequent releases and support multiple versions of core Hadoop, and
the tested new improvements can be moved back to hadoop-common.

Marton


Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Yang Jiandan <su...@gmail.com>.

+1. I'm from DiDi, and we plan to deploy Submarine. Giving Submarine a
separate release cycle is more agile, and we'd like to join the development
with the community.



Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by runlin zhang <ru...@gmail.com>.

+1. It is very necessary to be able to use Submarine on older Hadoop
versions. What's more, deep learning is developing very fast, and Submarine
must keep up with faster release iterations.



Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Oliver Hu <ol...@gmail.com>.

+1.

I'm one of the authors of LinkedIn TonY. We'd love to join the upstream
effort to build a deep learning ecosystem around Hadoop. As mentioned by
Wangda, coupling Submarine with the YARN code base introduced a lot of
unnecessary complexity for us when integrating our use case with upstream.
For example, circular dependencies and a hard dependency on Hadoop 3.x block
us from adopting upstream features in the near term. Architecture-wise, it
also makes more sense to keep the YARN component pure and focused on
resource scheduling.

The ML ecosystem is moving extremely fast; having a different release cycle
and keeping up the momentum would benefit us a lot.

On Thu, Jan 31, 2019 at 4:05 PM Arun Suresh <as...@apache.org> wrote:

> Thanks for bringing this up Wangda. +1
> Makes a lot of sense to have Submarine follow its own release cadence - for
> all the reasons you outlined.
> I would one up this proposal to ask why shouldn't we allow YARN to have its
> own releases as well - but that is for a separate thread :)
>
> Cheers
> -Arun
>
> On Thu, Jan 31, 2019 at 11:04 AM Wangda Tan <wh...@gmail.com> wrote:
>
> > Hi devs,
> >
> > Since we started submarine-related effort last year, we received a lot of
> > feedbacks, several companies (such as Netease, China Mobile, etc.)  are
> > trying to deploy Submarine to their Hadoop cluster along with big data
> > workloads. Linkedin also has big interests to contribute a Submarine
> TonY (
> > https://github.com/linkedin/TonY) runtime to allow users to use the same
> > interface.
> >
> > From what I can see, there're several issues of putting Submarine under
> > yarn-applications directory and have same release cycle with Hadoop:
> >
> > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> > 2019. Because of non-predictable blockers and security issues, it got
> > delayed a lot. We need to iterate submarine fast at this point.
> >
> > 2) We also see a lot of requirements to use Submarine on older Hadoop
> > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> > short time, but the requirement to run deep learning is urgent to them.
> We
> > should decouple Submarine from Hadoop version.
> >
> > And why we wanna to keep it within Hadoop? First, Submarine included some
> > innovation parts such as enhancements of user experiences for YARN
> > services/containerization support which we can add it back to Hadoop
> later
> > to address common requirements. In addition to that, we have a big
> overlap
> > in the community developing and using it.
> >
> > There're several proposals we have went through during Ozone merge to
> trunk
> > discussion:
> >
> >
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
> >
> > I propose to adopt Ozone model: which is the same master branch,
> different
> > release cycle, and different release branch. It is a great example to
> show
> > agile release we can do (2 Ozone releases after Oct 2018) with less
> > overhead to setup CI, projects, etc.
> >
> > *Links:*
> > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> > - Design doc
> > <
> >
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> > >
> > - User doc
> > <
> >
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> > >
> > (3.2.0
> > release)
> > - Blogposts, {Submarine} : Running deep learning workloads on Apache
> Hadoop
> > <
> >
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> > >,
> > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> > - Talks: Strata Data Conf NY
> > <
> >
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> > >
> >
> > Thoughts?
> >
> > Thanks,
> > Wangda Tan
> >
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Arun Suresh <as...@apache.org>.

Thanks for bringing this up, Wangda. +1
It makes a lot of sense to have Submarine follow its own release cadence, for
all the reasons you outlined.
I would one-up this proposal and ask why we shouldn't allow YARN to have its
own releases as well, but that is for a separate thread :)

Cheers
-Arun

On Thu, Jan 31, 2019 at 11:04 AM Wangda Tan <wh...@gmail.com> wrote:

> Hi devs,
>
> Since we started submarine-related effort last year, we received a lot of
> feedbacks, several companies (such as Netease, China Mobile, etc.)  are
> trying to deploy Submarine to their Hadoop cluster along with big data
> workloads. Linkedin also has big interests to contribute a Submarine TonY (
> https://github.com/linkedin/TonY) runtime to allow users to use the same
> interface.
>
> From what I can see, there're several issues of putting Submarine under
> yarn-applications directory and have same release cycle with Hadoop:
>
> 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> 2019. Because of non-predictable blockers and security issues, it got
> delayed a lot. We need to iterate submarine fast at this point.
>
> 2) We also see a lot of requirements to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> short time, but the requirement to run deep learning is urgent to them. We
> should decouple Submarine from Hadoop version.
>
> And why we wanna to keep it within Hadoop? First, Submarine included some
> innovation parts such as enhancements of user experiences for YARN
> services/containerization support which we can add it back to Hadoop later
> to address common requirements. In addition to that, we have a big overlap
> in the community developing and using it.
>
> There're several proposals we have went through during Ozone merge to trunk
> discussion:
>
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>
> I propose to adopt Ozone model: which is the same master branch, different
> release cycle, and different release branch. It is a great example to show
> agile release we can do (2 Ozone releases after Oct 2018) with less
> overhead to setup CI, projects, etc.
>
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc
> <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
> - User doc
> <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
> (3.2.0
> release)
> - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
> (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> - Talks: Strata Data Conf NY
> <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>
> Thoughts?
>
> Thanks,
> Wangda Tan
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by "Elek, Marton" <el...@apache.org>.

+1.

I like the idea.

For me, submarine/ML-job-execution seems to be a natural extension of
the existing Hadoop/Yarn capabilities.

I also like the proposed project structure / release lifecycle. I
think it's better to be more modularized but keep the development in the
same project. IMHO it worked well with the Ozone releases. We can do
more frequent releases and support multiple versions of core Hadoop, while
tested improvements can still be moved back to hadoop-common.

Marton

On 1/31/19 7:53 PM, Wangda Tan wrote:
> Hi devs,
> 
> Since we started submarine-related effort last year, we received a lot of
> feedbacks, several companies (such as Netease, China Mobile, etc.)  are
> trying to deploy Submarine to their Hadoop cluster along with big data
> workloads. Linkedin also has big interests to contribute a Submarine TonY (
> https://github.com/linkedin/TonY) runtime to allow users to use the same
> interface.
> 
> From what I can see, there're several issues of putting Submarine under
> yarn-applications directory and have same release cycle with Hadoop:
> 
> 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> 2019. Because of non-predictable blockers and security issues, it got
> delayed a lot. We need to iterate submarine fast at this point.
> 
> 2) We also see a lot of requirements to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> short time, but the requirement to run deep learning is urgent to them. We
> should decouple Submarine from Hadoop version.
> 
> And why we wanna to keep it within Hadoop? First, Submarine included some
> innovation parts such as enhancements of user experiences for YARN
> services/containerization support which we can add it back to Hadoop later
> to address common requirements. In addition to that, we have a big overlap
> in the community developing and using it.
> 
> There're several proposals we have went through during Ozone merge to trunk
> discussion:
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
> 
> I propose to adopt Ozone model: which is the same master branch, different
> release cycle, and different release branch. It is a great example to show
> agile release we can do (2 Ozone releases after Oct 2018) with less
> overhead to setup CI, projects, etc.
> 
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc
> <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
> - User doc
> <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
> (3.2.0
> release)
> - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
> (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> - Talks: Strata Data Conf NY
> <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
> 
> Thoughts?
> 
> Thanks,
> Wangda Tan
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Anu Engineer <ae...@hortonworks.com>.

>> I propose to adopt Ozone model: which is the same master branch, different
>> release cycle, and different release branch. It is a great example to show
>> agile release we can do (2 Ozone releases after Oct 2018) with less
>> overhead to setup CI, projects, etc.


I second this, especially since this allows Submarine to be used by Hadoop users without having
to upgrade to new versions. The new changes in Submarine can be used and tested by
end users much faster with this model.

A resounding +1 from me, based on experiences from ozone.

Thanks
Anu




On 1/31/19, 11:52 AM, "Jonathan Hung" <jy...@gmail.com> wrote:

    +1. This is important for improving the deep learning on hadoop story.
    There's recently a lot of momentum for this, and decoupling
    submarine/hadoop will help it continue.
    
    Jonathan Hung
    
    
    On Thu, Jan 31, 2019 at 11:04 AM Wangda Tan <wh...@gmail.com> wrote:
    
    > Hi devs,
    >
    > Since we started submarine-related effort last year, we received a lot of
    > feedbacks, several companies (such as Netease, China Mobile, etc.)  are
    > trying to deploy Submarine to their Hadoop cluster along with big data
    > workloads. Linkedin also has big interests to contribute a Submarine TonY (
    > https://github.com/linkedin/TonY) runtime to allow users to use the same
    > interface.
    >
    > From what I can see, there're several issues of putting Submarine under
    > yarn-applications directory and have same release cycle with Hadoop:
    >
    > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
    > 2019. Because of non-predictable blockers and security issues, it got
    > delayed a lot. We need to iterate submarine fast at this point.
    >
    > 2) We also see a lot of requirements to use Submarine on older Hadoop
    > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
    > short time, but the requirement to run deep learning is urgent to them. We
    > should decouple Submarine from Hadoop version.
    >
    > And why we wanna to keep it within Hadoop? First, Submarine included some
    > innovation parts such as enhancements of user experiences for YARN
    > services/containerization support which we can add it back to Hadoop later
    > to address common requirements. In addition to that, we have a big overlap
    > in the community developing and using it.
    >
    > There're several proposals we have went through during Ozone merge to trunk
    > discussion:
    >
    > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
    >
    > I propose to adopt Ozone model: which is the same master branch, different
    > release cycle, and different release branch. It is a great example to show
    > agile release we can do (2 Ozone releases after Oct 2018) with less
    > overhead to setup CI, projects, etc.
    >
    > *Links:*
    > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
    > - Design doc
    > <
    > https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
    > >
    > - User doc
    > <
    > https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
    > >
    > (3.2.0
    > release)
    > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
    > <
    > https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
    > >,
    > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
    > - Talks: Strata Data Conf NY
    > <
    > https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
    > >
    >
    > Thoughts?
    >
    > Thanks,
    > Wangda Tan
    >

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Jonathan Hung <jy...@gmail.com>.

+1. This is important for improving the deep-learning-on-Hadoop story.
There has been a lot of momentum for this recently, and decoupling
Submarine from Hadoop will help it continue.

Jonathan Hung


On Thu, Jan 31, 2019 at 11:04 AM Wangda Tan <wh...@gmail.com> wrote:

> Hi devs,
>
> Since we started submarine-related effort last year, we received a lot of
> feedbacks, several companies (such as Netease, China Mobile, etc.)  are
> trying to deploy Submarine to their Hadoop cluster along with big data
> workloads. Linkedin also has big interests to contribute a Submarine TonY (
> https://github.com/linkedin/TonY) runtime to allow users to use the same
> interface.
>
> From what I can see, there're several issues of putting Submarine under
> yarn-applications directory and have same release cycle with Hadoop:
>
> 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> 2019. Because of non-predictable blockers and security issues, it got
> delayed a lot. We need to iterate submarine fast at this point.
>
> 2) We also see a lot of requirements to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> short time, but the requirement to run deep learning is urgent to them. We
> should decouple Submarine from Hadoop version.
>
> And why we wanna to keep it within Hadoop? First, Submarine included some
> innovation parts such as enhancements of user experiences for YARN
> services/containerization support which we can add it back to Hadoop later
> to address common requirements. In addition to that, we have a big overlap
> in the community developing and using it.
>
> There're several proposals we have went through during Ozone merge to trunk
> discussion:
>
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>
> I propose to adopt Ozone model: which is the same master branch, different
> release cycle, and different release branch. It is a great example to show
> agile release we can do (2 Ozone releases after Oct 2018) with less
> overhead to setup CI, projects, etc.
>
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc
> <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
> - User doc
> <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
> (3.2.0
> release)
> - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
> (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> - Talks: Strata Data Conf NY
> <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>
> Thoughts?
>
> Thanks,
> Wangda Tan
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by John Zhuge <jo...@gmail.com>.

+1

Does Submarine support Jupyter?

On Fri, Feb 1, 2019 at 8:54 AM Zhe Zhang <zh...@apache.org> wrote:

> +1 on the proposal and looking forward to the progress of the project!
>
> On Thu, Jan 31, 2019 at 10:51 PM Weiwei Yang <ab...@gmail.com> wrote:
>
> > Thanks for proposing this, Wangda; my +1 as well.
> > It is amazing to see the progress made in Submarine last year; the
> > community is growing fast and is quite collaborative. I can see the
> > reasons to get it released faster in its own cycle. And at the same time,
> > the Ozone way works very well.
> >
> > —
> > Weiwei
> > On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
> > > +1
> > >
> > > Hello everyone,
> > >
> > > I am Xun Liu, the head of the machine learning team at Netease Research
> > > Institute. I quite agree with Wangda.
> > >
> > > Our team is very grateful to the community for the Submarine machine
> > > learning engine. We are heavy users of Submarine.
> > > Because Submarine fits the direction of our big data team's Hadoop
> > > technology stack, it avoids the need to invest more manpower in learning
> > > other container scheduling systems.
> > > Most importantly, we can use a common YARN cluster to run machine
> > > learning, which makes the utilization of server resources more efficient
> > > and preserves much of the human and material resources we invested in
> > > previous years.
> > >
> > > Our team has finished testing and deploying Submarine and will provide
> > > the service to our e-commerce department (http://www.kaola.com/) shortly.
> > >
> > > We also plan to provide the Submarine engine in our existing YARN cluster
> > > in the next six months, because many of our product departments need
> > > machine learning services, for example:
> > > 1) Game department (http://game.163.com/) needs AI battle training,
> > > 2) News department (http://www.163.com) needs news recommendation,
> > > 3) Mailbox department (http://www.163.com) requires anti-spam and
> > > illegal-content detection,
> > > 4) Music department (https://music.163.com/) requires music
> > > recommendation,
> > > 5) Education department (http://www.youdao.com) requires voice
> > > recognition,
> > > 6) Massive Open Online Courses (https://open.163.com/) requires
> > > multilingual translation, and so on.
> > >
> > > If Submarine can be released independently like Ozone, it will help us
> > > quickly get the latest features and improvements, and it will be of great
> > > help to our team and users.
> > > Thanks hadoop Community!
> > >
> > >
> > > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
> > > >
> > > > Hi devs,
> > > >
> > > > Since we started the Submarine-related effort last year, we have
> > > > received a lot of feedback. Several companies (such as Netease, China
> > > > Mobile, etc.) are trying to deploy Submarine to their Hadoop clusters
> > > > along with big data workloads. LinkedIn also has a strong interest in
> > > > contributing a Submarine TonY (https://github.com/linkedin/TonY)
> > > > runtime to allow users to use the same interface.
> > > >
> > > > From what I can see, there are several issues with putting Submarine
> > > > under the yarn-applications directory and sharing the same release
> > > > cycle with Hadoop:
> > > >
> > > > 1) We started the 3.2.0 release in Sep 2018, but the release was done
> > > > in Jan 2019. Because of unpredictable blockers and security issues, it
> > > > got delayed a lot. We need to iterate on Submarine fast at this point.
> > > >
> > > > 2) We also see a lot of requirements to use Submarine on older Hadoop
> > > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in
> > > > a short time, but the requirement to run deep learning is urgent for
> > > > them. We should decouple Submarine from the Hadoop version.
> > > >
> > > > And why do we want to keep it within Hadoop? First, Submarine includes
> > > > some innovative parts, such as enhancements to the user experience for
> > > > YARN services/containerization support, which we can add back to
> > > > Hadoop later to address common requirements. In addition, we have a
> > > > big overlap in the community developing and using it.
> > > >
> > > > There are several proposals we went through during the Ozone
> > > > merge-to-trunk discussion:
> > > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
> > > >
> > > > I propose to adopt the Ozone model, which is the same master branch,
> > > > a different release cycle, and a different release branch. It is a
> > > > great example of the agile releases we can do (2 Ozone releases after
> > > > Oct 2018) with less overhead to set up CI, projects, etc.
> > > >
> > > > *Links:*
> > > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> > > > - Design doc
> > > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
> > > > - User doc
> > > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
> > > > (3.2.0 release)
> > > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> > > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
> > > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> > > > - Talks: Strata Data Conf NY
> > > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
> > > >
> > > > Thoughts?
> > > >
> > > > Thanks,
> > > > Wangda Tan
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
> > > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
> > >
> >
>


-- 
John

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by John Zhuge <jo...@gmail.com>.

+1

Does Submarine support Jupyter?

On Fri, Feb 1, 2019 at 8:54 AM Zhe Zhang <zh...@apache.org> wrote:

> +1 on the proposal and looking forward to the progress of the project!
>
> On Thu, Jan 31, 2019 at 10:51 PM Weiwei Yang <ab...@gmail.com> wrote:
>
> > Thanks for proposing this, Wangda, my +1 as well.
> > It is amazing to see the progress made in Submarine last year; the
> > community is growing fast and is quite collaborative. I can see the
> > reasons to get it released faster on its own cycle. And at the same time,
> > the Ozone way works very well.
> >
> > —
> > Weiwei
> > On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
> > > +1
> > >
> > > Hello everyone,
> > >
> > > > I am Xun Liu, the head of the machine learning team at Netease Research
> > > > Institute. I quite agree with Wangda.
> > > >
> > > > Our team is very grateful to the community for the Submarine machine
> > > > learning engine. We are heavy users of Submarine.
> > > > Because Submarine fits the direction of our big data team's Hadoop
> > > > technology stack, it avoids the need to invest more manpower in
> > > > learning other container scheduling systems.
> > > > The important thing is that we can use a common YARN cluster to run
> > > > machine learning, which makes the utilization of server resources more
> > > > efficient and preserves much of the human and material investment of
> > > > our previous years.
> > > >
> > > > Our team has finished the testing and deployment of Submarine and will
> > > > provide the service to our e-commerce department
> > > > (http://www.kaola.com/) shortly.
> > > >
> > > > We also plan to provide the Submarine engine in our existing YARN
> > > > cluster in the next six months, because many of our product
> > > > departments need machine learning services, for example:
> > > > 1) Game department (http://game.163.com/) needs AI battle training,
> > > > 2) News department (http://www.163.com) needs news recommendation,
> > > > 3) Mailbox department (http://www.163.com) requires anti-spam and
> > > > illegal content detection,
> > > > 4) Music department (https://music.163.com/) requires music
> > > > recommendation,
> > > > 5) Education department (http://www.youdao.com) requires voice
> > > > recognition,
> > > > 6) Massive Open Online Courses (https://open.163.com/) requires
> > > > multilingual translation, and so on.
> > > >
> > > > If Submarine can be released independently like Ozone, it will help us
> > > > quickly get the latest features and improvements, and it will be of
> > > > great help to our team and users.
> > > >
> > > > Thanks, Hadoop community!
> > >
> > >
> > > > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
> > > >
> > > > Hi devs,
> > > >
> > > > > Since we started the Submarine-related effort last year, we have
> > > > > received a lot of feedback. Several companies (such as Netease, China
> > > > > Mobile, etc.) are trying to deploy Submarine to their Hadoop clusters
> > > > > along with big data workloads. LinkedIn also has a strong interest in
> > > > > contributing a Submarine TonY (https://github.com/linkedin/TonY)
> > > > > runtime to allow users to use the same interface.
> > > > >
> > > > > From what I can see, there are several issues with putting Submarine
> > > > > under the yarn-applications directory and sharing the same release
> > > > > cycle with Hadoop:
> > > > >
> > > > > 1) We started the 3.2.0 release in Sep 2018, but the release was done
> > > > > in Jan 2019. Because of unpredictable blockers and security issues, it
> > > > > got delayed a lot. We need to iterate on Submarine fast at this point.
> > > > >
> > > > > 2) We also see a lot of requirements to use Submarine on older Hadoop
> > > > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in
> > > > > a short time, but the requirement to run deep learning is urgent for
> > > > > them. We should decouple Submarine from the Hadoop version.
> > > > >
> > > > > And why do we want to keep it within Hadoop? First, Submarine includes
> > > > > some innovative parts, such as enhancements to the user experience for
> > > > > YARN services/containerization support, which we can add back to
> > > > > Hadoop later to address common requirements. In addition, we have a
> > > > > big overlap in the community developing and using it.
> > > > >
> > > > > There are several proposals we went through during the Ozone
> > > > > merge-to-trunk discussion:
> > > > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
> > > > >
> > > > > I propose to adopt the Ozone model, which is the same master branch,
> > > > > a different release cycle, and a different release branch. It is a
> > > > > great example of the agile releases we can do (2 Ozone releases after
> > > > > Oct 2018) with less overhead to set up CI, projects, etc.
> > > > >
> > > > > *Links:*
> > > > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> > > > > - Design doc
> > > > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
> > > > > - User doc
> > > > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
> > > > > (3.2.0 release)
> > > > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> > > > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
> > > > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> > > > > - Talks: Strata Data Conf NY
> > > > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
> > > >
> > > > Thoughts?
> > > >
> > > > Thanks,
> > > > Wangda Tan
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
> > > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
> > >
> >
>


-- 
John

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Zhe Zhang <zh...@apache.org>.

+1 on the proposal and looking forward to the progress of the project!

On Thu, Jan 31, 2019 at 10:51 PM Weiwei Yang <ab...@gmail.com> wrote:

> Thanks for proposing this, Wangda, my +1 as well.
> It is amazing to see the progress made in Submarine last year; the
> community is growing fast and is quite collaborative. I can see the reasons
> to get it released faster on its own cycle. And at the same time, the Ozone
> way works very well.
>
> —
> Weiwei
> On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
> > +1
> >
> > Hello everyone,
> >
> > I am Xun Liu, the head of the machine learning team at Netease Research
> > Institute. I quite agree with Wangda.
> >
> > Our team is very grateful to the community for the Submarine machine
> > learning engine. We are heavy users of Submarine.
> > Because Submarine fits the direction of our big data team's Hadoop
> > technology stack, it avoids the need to invest more manpower in learning
> > other container scheduling systems.
> > The important thing is that we can use a common YARN cluster to run
> > machine learning, which makes the utilization of server resources more
> > efficient and preserves much of the human and material investment of our
> > previous years.
> >
> > Our team has finished the testing and deployment of Submarine and will
> > provide the service to our e-commerce department (http://www.kaola.com/)
> > shortly.
> >
> > We also plan to provide the Submarine engine in our existing YARN
> > cluster in the next six months, because many of our product departments
> > need machine learning services, for example:
> > 1) Game department (http://game.163.com/) needs AI battle training,
> > 2) News department (http://www.163.com) needs news recommendation,
> > 3) Mailbox department (http://www.163.com) requires anti-spam and
> > illegal content detection,
> > 4) Music department (https://music.163.com/) requires music
> > recommendation,
> > 5) Education department (http://www.youdao.com) requires voice
> > recognition,
> > 6) Massive Open Online Courses (https://open.163.com/) requires
> > multilingual translation, and so on.
> >
> > If Submarine can be released independently like Ozone, it will help us
> > quickly get the latest features and improvements, and it will be of great
> > help to our team and users.
> >
> > Thanks, Hadoop community!
> >
> >
> > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
> > >
> > > Hi devs,
> > >
> > > Since we started the Submarine-related effort last year, we have
> > > received a lot of feedback. Several companies (such as Netease, China
> > > Mobile, etc.) are trying to deploy Submarine to their Hadoop clusters
> > > along with big data workloads. LinkedIn also has a strong interest in
> > > contributing a Submarine TonY (https://github.com/linkedin/TonY)
> > > runtime to allow users to use the same interface.
> > >
> > > From what I can see, there are several issues with putting Submarine
> > > under the yarn-applications directory and sharing the same release
> > > cycle with Hadoop:
> > >
> > > 1) We started the 3.2.0 release in Sep 2018, but the release was done
> > > in Jan 2019. Because of unpredictable blockers and security issues, it
> > > got delayed a lot. We need to iterate on Submarine fast at this point.
> > >
> > > 2) We also see a lot of requirements to use Submarine on older Hadoop
> > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in
> > > a short time, but the requirement to run deep learning is urgent for
> > > them. We should decouple Submarine from the Hadoop version.
> > >
> > > And why do we want to keep it within Hadoop? First, Submarine includes
> > > some innovative parts, such as enhancements to the user experience for
> > > YARN services/containerization support, which we can add back to
> > > Hadoop later to address common requirements. In addition, we have a
> > > big overlap in the community developing and using it.
> > >
> > > There are several proposals we went through during the Ozone
> > > merge-to-trunk discussion:
> > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
> > >
> > > I propose to adopt the Ozone model, which is the same master branch,
> > > a different release cycle, and a different release branch. It is a
> > > great example of the agile releases we can do (2 Ozone releases after
> > > Oct 2018) with less overhead to set up CI, projects, etc.
> > >
> > > *Links:*
> > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> > > - Design doc
> > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
> > > - User doc
> > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
> > > (3.2.0 release)
> > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
> > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> > > - Talks: Strata Data Conf NY
> > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Wangda Tan
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
> > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
> >
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Hanisha Koneru <hk...@hortonworks.com>.

This is a great proposal. +1.

Thanks,
Hanisha

On 2/1/19, 11:04 AM, "Bharat Viswanadham" <bv...@hortonworks.com> wrote:

>Thank you, Wangda, for driving this discussion.
>+1 for a separate release for Submarine.
>Having its own release cadence will help the project iterate and grow at a faster pace, get new features into users' hands, and get their feedback quickly.
>
>
>Thanks,
>Bharat
>
>
>
>
>On 2/1/19, 10:54 AM, "Ajay Kumar" <aj...@hortonworks.com> wrote:
>
>    +1, thanks for driving this. With the rise of use cases running ML alongside traditional applications, this will be of great help.
>    
>    Thanks,
>    Ajay   
>    
>    On 2/1/19, 10:49 AM, "Suma Shivaprasad" <su...@gmail.com> wrote:
>    
>        +1. Thanks for bringing this up Wangda.
>        
>        Makes sense to have Submarine follow its own release cadence given the good
>        momentum/adoption so far. Also, making it run with older versions of Hadoop
>        would drive higher adoption.
>        
>        Suma
>        
>        On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:
>        
>        > Submarine is an application built for the YARN framework, but it does not have
>        > a strong dependency on YARN development.  For this kind of project, it would
>        > be best to enter the Apache Incubator to create a new community.  Apache
>        > Commons is the only project other than Incubator that has independent
>        > release cycles.  The collection is large, and the project goal is
>        > ambitious.  No one really knows which components work with each other in
>        > Apache Commons.  Hadoop is a much more focused project: a distributed
>        > computing framework, not an incubation sandbox.  To stay aligned with Hadoop
>        > goals, we want to prevent the Hadoop project from being overloaded while
>        > allowing good ideas to be carried forward in the Apache Incubator.  Putting on
>        > my Apache Member hat, my vote is -1 on allowing more independent subproject
>        > release cycles in the Hadoop project that do not align with Hadoop project
>        > goals.
>        >
>        > The Apache Incubator process is highly recommended for Submarine:
>        > https://incubator.apache.org/policy/process.html
>        > This would allow Submarine to be developed for older versions of Hadoop, much
>        > as Spark works with multiple versions of Hadoop.
>        >
>        > Regards,
>        > Eric
>        >
>        > On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
>        >
>        >     Thanks for proposing this, Wangda; my +1 as well.
>        >     It is amazing to see the progress made in Submarine last year; the
>        >     community is growing fast and is quite collaborative. I can see the
>        >     reasons to release it faster on its own cycle. And at the same time,
>        >     the Ozone way works very well.
>        >
>        >     —
>        >     Weiwei
>        >     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
>        >     > +1
>        >     >
>        >     > Hello everyone,
>        >     >
>        >     > I am Xun Liu, the head of the machine learning team at Netease Research
>        >     > Institute. I quite agree with Wangda.
>        >     >
>        >     > Our team is very grateful to have received the Submarine machine learning
>        >     > engine from the community. We are heavy users of Submarine. Because it
>        >     > fits the direction of our big data team's Hadoop technology stack, it
>        >     > avoids the need to invest more manpower in learning other container
>        >     > scheduling systems. Most importantly, we can use a common YARN cluster to
>        >     > run machine learning, which makes the utilization of server resources
>        >     > more efficient and preserves the human and material resources we invested
>        >     > in previous years.
>        >     >
>        >     > Our team has finished testing and deploying Submarine and will provide
>        >     > the service to our e-commerce department (http://www.kaola.com/) shortly.
>        >     >
>        >     > We also plan to provide the Submarine engine in our existing YARN cluster
>        >     > in the next six months, because many of our product departments need
>        >     > machine learning services, for example:
>        >     > 1) Game department (http://game.163.com/) needs AI battle training,
>        >     > 2) News department (http://www.163.com) needs news recommendation,
>        >     > 3) Mailbox department (http://www.163.com) requires anti-spam and illegal detection,
>        >     > 4) Music department (https://music.163.com/) requires music recommendation,
>        >     > 5) Education department (http://www.youdao.com) requires voice recognition,
>        >     > 6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation and so on.
>        >     >
>        >     > If Submarine can be released independently like Ozone, it will help us
>        >     > quickly get the latest features and improvements, which will be of great
>        >     > help to our team and users.
>        >     >
>        >     > Thanks, Hadoop community!
>        >     >
>        >     >
>        >     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
>        >     > >
>        >     > > Hi devs,
>        >     > >
>        >     > > Since we started submarine-related effort last year, we received a lot of
>        >     > > feedbacks, several companies (such as Netease, China Mobile, etc.) are
>        >     > > trying to deploy Submarine to their Hadoop cluster along with big data
>        >     > > workloads. Linkedin also has big interests to contribute a Submarine TonY (
>        >     > > https://github.com/linkedin/TonY) runtime to allow users to use the same
>        >     > > interface.
>        >     > >
>        >     > > From what I can see, there're several issues of putting Submarine under
>        >     > > yarn-applications directory and have same release cycle with Hadoop:
>        >     > >
>        >     > > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
>        >     > > 2019. Because of non-predictable blockers and security issues, it got
>        >     > > delayed a lot. We need to iterate submarine fast at this point.
>        >     > >
>        >     > > 2) We also see a lot of requirements to use Submarine on older Hadoop
>        >     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
>        >     > > short time, but the requirement to run deep learning is urgent to them. We
>        >     > > should decouple Submarine from Hadoop version.
>        >     > >
>        >     > > And why we wanna to keep it within Hadoop? First, Submarine included some
>        >     > > innovation parts such as enhancements of user experiences for YARN
>        >     > > services/containerization support which we can add it back to Hadoop later
>        >     > > to address common requirements. In addition to that, we have a big overlap
>        >     > > in the community developing and using it.
>        >     > >
>        >     > > There're several proposals we have went through during Ozone merge to trunk
>        >     > > discussion:
>        >     > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>        >     > >
>        >     > > I propose to adopt Ozone model: which is the same master branch, different
>        >     > > release cycle, and different release branch. It is a great example to show
>        >     > > agile release we can do (2 Ozone releases after Oct 2018) with less
>        >     > > overhead to setup CI, projects, etc.
>        >     > >
>        >     > > *Links:*
>        >     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>        >     > > - Design doc
>        >     > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
>        >     > > - User doc
>        >     > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
>        >     > > (3.2.0 release)
>        >     > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
>        >     > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
>        >     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>        >     > > - Talks: Strata Data Conf NY
>        >     > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
>        >     > >
>        >     > > Thoughts?
>        >     > >
>        >     > > Thanks,
>        >     > > Wangda Tan

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Hanisha Koneru <hk...@hortonworks.com>.

This is a great proposal. +1.

Thanks,
Hanisha

On 2/1/19, 11:04 AM, "Bharat Viswanadham" <bv...@hortonworks.com> wrote:

>Thank you, Wangda, for driving this discussion.
>+1 for a separate release for Submarine.
>Having its own release cadence will help the project iterate and grow at a faster pace, get new features into users' hands, and gather their feedback quickly.
>
>
>Thanks,
>Bharat
>
>
>
>
>On 2/1/19, 10:54 AM, "Ajay Kumar" <aj...@hortonworks.com> wrote:
>
>    +1, thanks for driving this. With the rise of use cases running ML alongside traditional applications, this will be of great help.
>    
>    Thanks,
>    Ajay   
>    
>    On 2/1/19, 10:49 AM, "Suma Shivaprasad" <su...@gmail.com> wrote:
>    
>        +1. Thanks for bringing this up Wangda.
>        
>        Makes sense to have Submarine follow its own release cadence given the good
>        momentum/adoption so far. Also, making it run with older versions of Hadoop
>        would drive higher adoption.
>        
>        Suma
>        
>        On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:
>        
>        > Submarine is an application built for the YARN framework, but it does not have
>        > a strong dependency on YARN development.  For this kind of project, it would
>        > be best to enter the Apache Incubator to create a new community.  Apache
>        > Commons is the only project other than Incubator that has independent
>        > release cycles.  The collection is large, and the project goal is
>        > ambitious.  No one really knows which components work with each other in
>        > Apache Commons.  Hadoop is a much more focused project: a distributed
>        > computing framework, not an incubation sandbox.  To stay aligned with Hadoop
>        > goals, we want to prevent the Hadoop project from being overloaded while
>        > allowing good ideas to be carried forward in the Apache Incubator.  Putting on
>        > my Apache Member hat, my vote is -1 on allowing more independent subproject
>        > release cycles in the Hadoop project that do not align with Hadoop project
>        > goals.
>        >
>        > The Apache Incubator process is highly recommended for Submarine:
>        > https://incubator.apache.org/policy/process.html
>        > This would allow Submarine to be developed for older versions of Hadoop, much
>        > as Spark works with multiple versions of Hadoop.
>        >
>        > Regards,
>        > Eric
>        >
>        > On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
>        >
>        >     Thanks for proposing this Wangda, my +1 as well.
>        >     It is amazing to see the progress made in Submarine last year; the
>        > community is growing fast and is quite collaborative. I can see the reasons to get
>        > it released faster in its own cycle. And at the same time, the Ozone way
>        > works very well.
>        >
>        >     —
>        >     Weiwei
>        >     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
>        >     > +1
>        >     >
>        >     > Hello everyone,
>        >     >
>        >     > I am Xun Liu, the head of the machine learning team at Netease
>        > Research Institute. I quite agree with Wangda.
>        >     >
>        >     > Our team is very grateful to the community for the Submarine machine learning
>        > engine.
>        >     > We are heavy users of Submarine.
>        >     > Because Submarine fits the direction of our big data team's
>        > Hadoop technology stack,
>        >     > it avoids the need to invest more manpower in learning
>        > other container scheduling systems.
>        >     > The important thing is that we can use a common YARN cluster to run
>        > machine learning,
>        >     > which makes the utilization of server resources more efficient and
>        > preserves much of the human and material investment we made in previous years.
>        >     >
>        >     > Our team has finished testing and deploying Submarine and
>        > will provide the service to our e-commerce department (
>        > http://www.kaola.com/) shortly.
>        >     >
>        >     > We also plan to provide the Submarine engine in our existing YARN
>        > cluster in the next six months,
>        >     > because many of our product departments need machine
>        > learning services,
>        >     > for example:
>        >     > 1) Game department (http://game.163.com/) needs AI battle training,
>        >     > 2) News department (http://www.163.com) needs news recommendation,
>        >     > 3) Mailbox department (http://www.163.com) requires anti-spam and
>        > illegal detection,
>        >     > 4) Music department (https://music.163.com/) requires music
>        > recommendation,
>        >     > 5) Education department (http://www.youdao.com) requires voice
>        > recognition,
>        >     > 6) Massive Open Online Courses (https://open.163.com/) requires
>        > multilingual translation and so on.
>        >     >
>        >     > If Submarine can be released independently like Ozone, it will help
>        > us quickly get the latest features and improvements, and it will be a great
>        > help to our team and users.
>        >     >
>        >     > Thanks, Hadoop community!
>        >     >
>        >     >
>        >     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
>        >     > >
>        >     > > Hi devs,
>        >     > >
>        >     > > Since we started the Submarine-related effort last year, we have received a
>        > lot of
>        >     > > feedback. Several companies (such as Netease, China Mobile, etc.)
>        > are
>        >     > > trying to deploy Submarine to their Hadoop clusters along with big
>        > data
>        >     > > workloads. LinkedIn is also very interested in contributing a
>        > Submarine TonY (
>        >     > > https://github.com/linkedin/TonY) runtime to allow users to use
>        > the same
>        >     > > interface.
>        >     > >
>        >     > > From what I can see, there're several issues of putting Submarine
>        > under
>        >     > > yarn-applications directory and have same release cycle with
>        > Hadoop:
>        >     > >
>        >     > > 1) We started the 3.2.0 release in Sep 2018, but the release was done
>        > in Jan
>        >     > > 2019. Because of unpredictable blockers and security issues, it
>        > got
>        >     > > delayed a lot. We need to iterate on Submarine fast at this point.
>        >     > >
>        >     > > 2) We also see a lot of requirements to use Submarine on older
>        > Hadoop
>        >     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
>        > in a
>        >     > > short time, but the requirement to run deep learning is urgent to
>        > them. We
>        >     > > should decouple Submarine from Hadoop version.
>        >     > >
>        >     > > And why do we want to keep it within Hadoop? First, Submarine
>        > includes some
>        >     > > innovative parts, such as user-experience enhancements for YARN
>        >     > > services/containerization support, which we can add back to
>        > Hadoop later
>        >     > > to address common requirements. In addition, there is a big
>        > overlap
>        >     > > in the community developing and using it.
>        >     > >
>        >     > > There are several proposals we went through during the Ozone merge
>        > to trunk
>        >     > > discussion:
>        >     > >
>        > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>        >     > >
>        >     > > I propose adopting the Ozone model, which is the same master branch,
>        > different
>        >     > > release cycle, and different release branch. It is a great example
>        > of the
>        >     > > agile releases we can do (2 Ozone releases after Oct 2018) with less
>        >     > > overhead to set up CI, projects, etc.
>        >     > >
>        >     > > *Links:*
>        >     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>        >     > > - Design doc
>        >     > > <
>        > https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
>        > >
>        >     > > - User doc
>        >     > > <
>        > https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
>        > >
>        >     > > (3.2.0
>        >     > > release)
>        >     > > - Blogposts, {Submarine} : Running deep learning workloads on
>        > Apache Hadoop
>        >     > > <
>        > https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
>        > >,
>        >     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>        >     > > - Talks: Strata Data Conf NY
>        >     > > <
>        > https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
>        > >
>        >     > >
>        >     > > Thoughts?
>        >     > >
>        >     > > Thanks,
>        >     > > Wangda Tan
>        >     >
>        >     >
>        >     >
>        >     >
>        >
>        >
>        >
>        
>    
>    
>    
>
>
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Bharat Viswanadham <bv...@hortonworks.com>.

Thank you, Wangda, for driving this discussion.
+1 for a separate release for Submarine.
Having its own release cadence will help the project iterate and grow at a faster pace, get new features into users' hands, and gather their feedback quickly.


Thanks,
Bharat




On 2/1/19, 10:54 AM, "Ajay Kumar" <aj...@hortonworks.com> wrote:

    +1, thanks for driving this. With the rise of use cases running ML alongside traditional applications, this will be of great help.
    
    Thanks,
    Ajay   
    
    On 2/1/19, 10:49 AM, "Suma Shivaprasad" <su...@gmail.com> wrote:
    
        +1. Thanks for bringing this up Wangda.
        
        Makes sense to have Submarine follow its own release cadence given the good
        momentum/adoption so far. Also, making it run with older versions of Hadoop
        would drive higher adoption.
        
        Suma
        
        On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:
        
        > Submarine is an application built for the YARN framework, but it does not have
        > a strong dependency on YARN development.  For this kind of project, it would
        > be best to go through the Apache Incubator to create a new community.  Apache
        > Commons is the only project other than the Incubator that has independent
        > release cycles.  The collection is large, and the project goal is
        > ambitious.  No one really knows which components work with each other in
        > Apache Commons.  Hadoop is a much more focused project: a distributed
        > computing framework, not an incubation sandbox.  To stay aligned with Hadoop
        > goals, we want to prevent the Hadoop project from being overloaded while
        > allowing good ideas to be carried forward in the Apache Incubator.  Putting on my
        > Apache Member hat, my vote is -1 to allowing more independent subproject
        > release cycles in the Hadoop project that do not align with Hadoop project
        > goals.
        >
        > The Apache Incubator process is highly recommended for Submarine:
        > https://incubator.apache.org/policy/process.html This would allow Submarine to
        > be developed for older versions of Hadoop, just as Spark works with multiple versions
        > of Hadoop.
        >
        > Regards,
        > Eric
        >
        > On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
        >
        >     Thanks for proposing this Wangda, my +1 as well.
        >     It is amazing to see the progress made in Submarine last year; the
        > community is growing fast and is quite collaborative. I can see the reasons to get
        > it released faster in its own cycle. And at the same time, the Ozone way
        > works very well.
        >
        >     —
        >     Weiwei
        >     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
        >     > +1
        >     >
        >     > Hello everyone,
        >     >
        >     > I am Xun Liu, the head of the machine learning team at Netease
        > Research Institute. I quite agree with Wangda.
        >     >
        >     > Our team is very grateful to the community for the Submarine machine learning
        > engine.
        >     > We are heavy users of Submarine.
        >     > Because Submarine fits the direction of our big data team's
        > Hadoop technology stack,
        >     > it avoids the need to invest more manpower in learning
        > other container scheduling systems.
        >     > The important thing is that we can use a common YARN cluster to run
        > machine learning,
        >     > which makes the utilization of server resources more efficient and
        > preserves much of the human and material investment we made in previous years.
        >     >
        >     > Our team has finished testing and deploying Submarine and
        > will provide the service to our e-commerce department (
        > http://www.kaola.com/) shortly.
        >     >
        >     > We also plan to provide the Submarine engine in our existing YARN
        > cluster in the next six months,
        >     > because many of our product departments need machine
        > learning services,
        >     > for example:
        >     > 1) Game department (http://game.163.com/) needs AI battle training,
        >     > 2) News department (http://www.163.com) needs news recommendation,
        >     > 3) Mailbox department (http://www.163.com) requires anti-spam and
        > illegal detection,
        >     > 4) Music department (https://music.163.com/) requires music
        > recommendation,
        >     > 5) Education department (http://www.youdao.com) requires voice
        > recognition,
        >     > 6) Massive Open Online Courses (https://open.163.com/) requires
        > multilingual translation and so on.
        >     >
        >     > If Submarine can be released independently like Ozone, it will help
        > us quickly get the latest features and improvements, and it will be a great
        > help to our team and users.
        >     >
        >     > Thanks, Hadoop community!
        >     >
        >     >
        >     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
        >     > >
        >     > > Hi devs,
        >     > >
        >     > > Since we started the Submarine-related effort last year, we have received a
        > lot of
        >     > > feedback. Several companies (such as Netease, China Mobile, etc.)
        > are
        >     > > trying to deploy Submarine to their Hadoop clusters along with big
        > data
        >     > > workloads. LinkedIn is also very interested in contributing a
        > Submarine TonY (
        >     > > https://github.com/linkedin/TonY) runtime to allow users to use
        > the same
        >     > > interface.
        >     > >
        >     > > From what I can see, there're several issues of putting Submarine
        > under
        >     > > yarn-applications directory and have same release cycle with
        > Hadoop:
        >     > >
        >     > > 1) We started the 3.2.0 release in Sep 2018, but the release was done
        > in Jan
        >     > > 2019. Because of unpredictable blockers and security issues, it
        > got
        >     > > delayed a lot. We need to iterate on Submarine fast at this point.
        >     > >
        >     > > 2) We also see a lot of requirements to use Submarine on older
        > Hadoop
        >     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
        > in a
        >     > > short time, but the requirement to run deep learning is urgent to
        > them. We
        >     > > should decouple Submarine from Hadoop version.
        >     > >
        >     > > And why do we want to keep it within Hadoop? First, Submarine
        > includes some
        >     > > innovative parts, such as user-experience enhancements for YARN
        >     > > services/containerization support, which we can add back to
        > Hadoop later
        >     > > to address common requirements. In addition, there is a big
        > overlap
        >     > > in the community developing and using it.
        >     > >
        >     > > There are several proposals we went through during the Ozone merge
        > to trunk
        >     > > discussion:
        >     > >
        > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
        >     > >
        >     > > I propose adopting the Ozone model, which is the same master branch,
        > different
        >     > > release cycle, and different release branch. It is a great example
        > of the
        >     > > agile releases we can do (2 Ozone releases after Oct 2018) with less
        >     > > overhead to set up CI, projects, etc.
        >     > >
        >     > > *Links:*
        >     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
        >     > > - Design doc
        >     > > <
        > https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
        > >
        >     > > - User doc
        >     > > <
        > https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
        > >
        >     > > (3.2.0
        >     > > release)
        >     > > - Blogposts, {Submarine} : Running deep learning workloads on
        > Apache Hadoop
        >     > > <
        > https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
        > >,
        >     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
        >     > > - Talks: Strata Data Conf NY
        >     > > <
        > https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
        > >
        >     > >
        >     > > Thoughts?
        >     > >
        >     > > Thanks,
        >     > > Wangda Tan
        >     >
        >     >
        >     >
        >     >
        >
        >
        >
        
    
    

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Xiaoyu Yao <xy...@hortonworks.com>.

+1, thanks for bringing this up, Wangda. This will help expand the Hadoop ecosystem by supporting new AI/ML workloads.

Thanks,
Xiaoyu

On 2/1/19, 10:58 AM, "Dinesh Chitlangia" <dc...@hortonworks.com> wrote:

    +1 This is a fantastic recommendation given the increasing interest in ML across the globe.
    
    Thanks,
    Dinesh
    
    
    
    On 2/1/19, 1:54 PM, "Ajay Kumar" <aj...@hortonworks.com> wrote:
    
        +1, thanks for driving this. With the rise of use cases running ML alongside traditional applications, this will be of great help.
        
        Thanks,
        Ajay   
        
        On 2/1/19, 10:49 AM, "Suma Shivaprasad" <su...@gmail.com> wrote:
        
            +1. Thanks for bringing this up Wangda.
            
            Makes sense to have Submarine follow its own release cadence given the good
            momentum/adoption so far. Also, making it run with older versions of Hadoop
            would drive higher adoption.
            
            Suma
            
            On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:
            
            > Submarine is an application built for the YARN framework, but it does not have
            > a strong dependency on YARN development.  For this kind of project, it would
            > be best to go through the Apache Incubator to create a new community.  Apache
            > Commons is the only project other than the Incubator that has independent
            > release cycles.  The collection is large, and the project goal is
            > ambitious.  No one really knows which components work with each other in
            > Apache Commons.  Hadoop is a much more focused project: a distributed
            > computing framework, not an incubation sandbox.  To stay aligned with Hadoop
            > goals, we want to prevent the Hadoop project from being overloaded while
            > allowing good ideas to be carried forward in the Apache Incubator.  Putting on my
            > Apache Member hat, my vote is -1 to allowing more independent subproject
            > release cycles in the Hadoop project that do not align with Hadoop project
            > goals.
            >
            > The Apache Incubator process is highly recommended for Submarine:
            > https://incubator.apache.org/policy/process.html This would allow Submarine to
            > be developed for older versions of Hadoop, just as Spark works with multiple versions
            > of Hadoop.
            >
            > Regards,
            > Eric
            >
            > On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
            >
            >     Thanks for proposing this Wangda, my +1 as well.
            >     It is amazing to see the progress made in Submarine last year; the
            > community is growing fast and is quite collaborative. I can see the reasons to get
            > it released faster in its own cycle. And at the same time, the Ozone way
            > works very well.
            >
            >     —
            >     Weiwei
            >     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
            >     > +1
            >     >
            >     > Hello everyone,
            >     >
            >     > I am Xun Liu, the head of the machine learning team at Netease
            > Research Institute. I quite agree with Wangda.
            >     >
            >     > Our team is very grateful to the community for the Submarine machine learning
            > engine.
            >     > We are heavy users of Submarine.
            >     > Because Submarine fits the direction of our big data team's
            > Hadoop technology stack,
            >     > it avoids the need to invest more manpower in learning
            > other container scheduling systems.
            >     > The important thing is that we can use a common YARN cluster to run
            > machine learning,
            >     > which makes the utilization of server resources more efficient and
            > preserves much of the human and material investment we made in previous years.
            >     >
            >     > Our team has finished testing and deploying Submarine and
            > will provide the service to our e-commerce department (
            > http://www.kaola.com/) shortly.
            >     >
            >     > We also plan to provide the Submarine engine in our existing YARN
            > cluster in the next six months,
            >     > because many of our product departments need machine
            > learning services,
            >     > for example:
            >     > 1) Game department (http://game.163.com/) needs AI battle training,
            >     > 2) News department (http://www.163.com) needs news recommendation,
            >     > 3) Mailbox department (http://www.163.com) requires anti-spam and
            > illegal detection,
            >     > 4) Music department (https://music.163.com/) requires music
            > recommendation,
            >     > 5) Education department (http://www.youdao.com) requires voice
            > recognition,
            >     > 6) Massive Open Online Courses (https://open.163.com/) requires
            > multilingual translation and so on.
            >     >
            >     > If Submarine can be released independently like Ozone, it will help
            > us quickly get the latest features and improvements, and it will be a great
            > help to our team and users.
            >     >
            >     > Thanks, Hadoop community!
            >     >
            >     >
            >     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
            >     > >
            >     > > Hi devs,
            >     > >
            >     > > Since we started the Submarine-related effort last year, we have received a
            > lot of
            >     > > feedback. Several companies (such as Netease, China Mobile, etc.)
            > are
            >     > > trying to deploy Submarine to their Hadoop clusters along with big
            > data
            >     > > workloads. LinkedIn is also very interested in contributing a
            > Submarine TonY (
            >     > > https://github.com/linkedin/TonY) runtime to allow users to use
            > the same
            >     > > interface.
            >     > >
            >     > > From what I can see, there're several issues of putting Submarine
            > under
            >     > > yarn-applications directory and have same release cycle with
            > Hadoop:
            >     > >
            >     > > 1) We started the 3.2.0 release in Sep 2018, but the release was done
            > in Jan
            >     > > 2019. Because of unpredictable blockers and security issues, it
            > got
            >     > > delayed a lot. We need to iterate on Submarine fast at this point.
            >     > >
            >     > > 2) We also see a lot of requirements to use Submarine on older
            > Hadoop
            >     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
            > in a
            >     > > short time, but the requirement to run deep learning is urgent to
            > them. We
            >     > > should decouple Submarine from Hadoop version.
            >     > >
            >     > > And why do we want to keep it within Hadoop? First, Submarine
            > includes some
            >     > > innovative parts, such as user-experience enhancements for YARN
            >     > > services/containerization support, which we can add back to
            > Hadoop later
            >     > > to address common requirements. In addition, there is a big
            > overlap
            >     > > in the community developing and using it.
            >     > >
            >     > > There are several proposals we went through during the Ozone merge
            > to trunk
            >     > > discussion:
            >     > >
            > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
            >     > >
            >     > > I propose adopting the Ozone model, which is the same master branch,
            > different
            >     > > release cycle, and different release branch. It is a great example
            > of the
            >     > > agile releases we can do (2 Ozone releases after Oct 2018) with less
            >     > > overhead to set up CI, projects, etc.
            >     > >
            >     > > *Links:*
            >     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
            >     > > - Design doc
            >     > > <
            > https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
            > >
            >     > > - User doc
            >     > > <
            > https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
            > >
            >     > > (3.2.0
            >     > > release)
            >     > > - Blogposts, {Submarine} : Running deep learning workloads on
            > Apache Hadoop
            >     > > <
            > https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
            > >,
            >     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
            >     > > - Talks: Strata Data Conf NY
            >     > > <
            > https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
            > >
            >     > >
            >     > > Thoughts?
            >     > >
            >     > > Thanks,
            >     > > Wangda Tan
            >     >
            >     >
            >     >
            >     >
            >
            >
            >
            
        
        
        
        
    
    
    
    



Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Xiaoyu Yao <xy...@hortonworks.com>.

+1, thanks for bringing this up, Wangda. This will help expand the Hadoop ecosystem by supporting new AI/ML workloads.

Thanks,
Xiaoyu

On 2/1/19, 10:58 AM, "Dinesh Chitlangia" <dc...@hortonworks.com> wrote:

    +1 This is a fantastic recommendation given the increasing interest in ML across the globe.
    
    Thanks,
    Dinesh
    
    
    
    On 2/1/19, 1:54 PM, "Ajay Kumar" <aj...@hortonworks.com> wrote:
    
        +1, thanks for driving this. With the rise of use cases running ML alongside traditional applications, this will be of great help.
        
        Thanks,
        Ajay   
        
        On 2/1/19, 10:49 AM, "Suma Shivaprasad" <su...@gmail.com> wrote:
        
            +1. Thanks for bringing this up Wangda.
            
            Makes sense to have Submarine follow its own release cadence given the good
            momentum/adoption so far. Also, making it run with older versions of Hadoop
            would drive higher adoption.
            
            Suma
            
            On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:
            
            > Submarine is an application built for the YARN framework, but it does not have
            > a strong dependency on YARN development.  For this kind of project, it would
            > be best to go through the Apache Incubator to create a new community.  Apache
            > Commons is the only project other than the Incubator that has independent
            > release cycles.  The collection is large, and the project goal is
            > ambitious.  No one really knows which components work with each other in
            > Apache Commons.  Hadoop is a much more focused project: a distributed
            > computing framework, not an incubation sandbox.  To stay aligned with Hadoop
            > goals, we want to prevent the Hadoop project from being overloaded while
            > allowing good ideas to be carried forward in the Apache Incubator.  Putting on my
            > Apache Member hat, my vote is -1 to allowing more independent subproject
            > release cycles in the Hadoop project that do not align with Hadoop project
            > goals.
            >
            > The Apache Incubator process is highly recommended for Submarine:
            > https://incubator.apache.org/policy/process.html This would allow Submarine to
            > be developed for older versions of Hadoop, just as Spark works with multiple versions
            > of Hadoop.
            >
            > Regards,
            > Eric
            >
            > On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
            >
            >     Thanks for proposing this Wangda, my +1 as well.
            >     It is amazing to see the progress made in Submarine last year; the
            > community is growing fast and is quite collaborative. I can see the reasons to get
            > it released faster in its own cycle. And at the same time, the Ozone way
            > works very well.
            >
            >     —
            >     Weiwei
            >     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
            >     > +1
            >     >
            >     > Hello everyone,
            >     >
            >     > I am Xun Liu, the head of the machine learning team at Netease
            >     > Research Institute. I quite agree with Wangda.
            >     >
            >     > Our team is very grateful to the community for the Submarine machine
            >     > learning engine.
            >     > We are heavy users of Submarine.
            >     > Because Submarine fits the direction of our big data team's Hadoop
            >     > technology stack, it avoids the need to increase our manpower investment
            >     > in learning other container scheduling systems.
            >     > Most importantly, we can use a common YARN cluster to run machine
            >     > learning, which makes the utilization of server resources more efficient
            >     > and preserves much of the human and material resources we invested in
            >     > previous years.
            >     >
            >     > Our team has finished testing and deploying Submarine and will provide
            >     > the service to our e-commerce department (http://www.kaola.com/) shortly.
            >     >
            >     > We also plan to provide the Submarine engine in our existing YARN
            >     > cluster within the next six months, because many of our product
            >     > departments need machine learning services, for example:
            >     > 1) Game department (http://game.163.com/) needs AI battle training,
            >     > 2) News department (http://www.163.com) needs news recommendation,
            >     > 3) Mailbox department (http://www.163.com) requires anti-spam and
            >     > illegal detection,
            >     > 4) Music department (https://music.163.com/) requires music
            >     > recommendation,
            >     > 5) Education department (http://www.youdao.com) requires voice
            >     > recognition,
            >     > 6) Massive Open Online Courses (https://open.163.com/) requires
            >     > multilingual translation, and so on.
            >     >
            >     > If Submarine can be released independently like Ozone, it will help
            >     > us quickly get the latest features and improvements, and it will be a
            >     > great help to our team and users.
            >     >
            >     > Thanks, Hadoop community!
            >     >
            >     >
            >     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
            >     > >
            >     > > Hi devs,
            >     > >
            >     > > Since we started the Submarine-related effort last year, we have received
            >     > > a lot of feedback, and several companies (such as Netease, China Mobile,
            >     > > etc.) are trying to deploy Submarine to their Hadoop clusters along with
            >     > > big data workloads. LinkedIn also has strong interest in contributing a
            >     > > Submarine TonY (https://github.com/linkedin/TonY) runtime to allow users
            >     > > to use the same interface.
            >     > >
            >     > > From what I can see, there are several issues with putting Submarine
            >     > > under the yarn-applications directory and having the same release cycle
            >     > > as Hadoop:
            >     > >
            >     > > 1) We started the 3.2.0 release in Sep 2018, but the release was done in
            >     > > Jan 2019. Because of unpredictable blockers and security issues, it got
            >     > > delayed a lot. We need to iterate on Submarine fast at this point.
            >     > >
            >     > > 2) We also see a lot of demand to use Submarine on older Hadoop releases
            >     > > such as 2.x. Many companies may not upgrade Hadoop to 3.x in a short
            >     > > time, but the requirement to run deep learning is urgent for them. We
            >     > > should decouple Submarine from the Hadoop version.
            >     > >
            >     > > And why do we want to keep it within Hadoop? First, Submarine includes
            >     > > some innovative parts, such as user experience enhancements for YARN
            >     > > services/containerization support, which we can add back to Hadoop later
            >     > > to address common requirements. In addition, we have a big overlap in
            >     > > the community developing and using it.
            >     > >
            >     > > There are several proposals we went through during the Ozone
            >     > > merge-to-trunk discussion:
            >     > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
            >     > >
            >     > > I propose to adopt the Ozone model: the same master branch, a different
            >     > > release cycle, and a different release branch. It is a great example of
            >     > > the agile releases we can do (2 Ozone releases after Oct 2018) with less
            >     > > overhead to set up CI, projects, etc.
            >     > >
            >     > > *Links:*
            >     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
            >     > > - Design doc
            >     > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
            >     > > - User doc
            >     > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
            >     > > (3.2.0 release)
            >     > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
            >     > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
            >     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
            >     > > - Talks: Strata Data Conf NY
            >     > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
            >     > >
            >     > > Thoughts?
            >     > >
            >     > > Thanks,
            >     > > Wangda Tan
            >     >
            >     >
            >     >
            >     > ---------------------------------------------------------------------
            >     > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
            >     > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
            >     >
            >
            >
            >
            
        
        
    
    


---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Xiaoyu Yao <xy...@hortonworks.com>.

+1, thanks for bringing this up, Wangda. This will help expand the Hadoop ecosystem by supporting new AI/ML workloads.

Thanks,
Xiaoyu
On 2/1/19, 10:58 AM, "Dinesh Chitlangia" <dc...@hortonworks.com> wrote:

    +1 This is a fantastic recommendation given the increasing interest in ML across the globe.
    
    Thanks,
    Dinesh
    
    
    
    On 2/1/19, 1:54 PM, "Ajay Kumar" <aj...@hortonworks.com> wrote:
    
        +1, thanks for driving this. With the rise of use cases running ML alongside traditional applications, this will be of great help.
        
        Thanks,
        Ajay   
        
        On 2/1/19, 10:49 AM, "Suma Shivaprasad" <su...@gmail.com> wrote:
        
            +1. Thanks for bringing this up Wangda.
            
            Makes sense to have Submarine follow its own release cadence given the good
            momentum/adoption so far. Also, making it run with older versions of Hadoop
            would drive higher adoption.
            
            Suma
            
            On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:
            
            > Submarine is an application built for the YARN framework, but it does not
            > have a strong dependency on YARN development.  For this kind of project, it
            > would be best to enter the Apache Incubator to create a new community.  Apache
            > Commons is the only project other than Incubator that has independent
            > release cycles.  The collection is large, and the project goal is
            > ambitious.  No one really knows which components work with each other in
            > Apache Commons.  Hadoop is a much more focused project, a distributed
            > computing framework and not an incubation sandbox.  To stay aligned with
            > Hadoop goals, we want to prevent the Hadoop project from being overloaded while
            > allowing good ideas to be carried forward in the Apache Incubator.  Putting on
            > my Apache Member hat, my vote is -1 on allowing more independent subproject
            > release cycles in the Hadoop project that do not align with Hadoop project
            > goals.
            >
            > The Apache Incubator process is highly recommended for Submarine:
            > https://incubator.apache.org/policy/process.html This would allow Submarine
            > to develop for older versions of Hadoop, just as Spark works with multiple
            > versions of Hadoop.
            >
            > Regards,
            > Eric
            >
            > On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
            >
            >     Thanks for proposing this, Wangda; my +1 as well.
            >     It is amazing to see the progress made in Submarine last year; the
            >     community is growing fast and is quite collaborative. I can see the
            >     reasons to release it faster on its own cycle. And at the same time, the
            >     Ozone way works very well.
            >
            >     —
            >     Weiwei
            >     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
            >     > +1
            >     >
            >     > Hello everyone,
            >     >
            >     > I am Xun Liu, the head of the machine learning team at Netease
            >     > Research Institute. I quite agree with Wangda.
            >     >
            >     > Our team is very grateful to the community for the Submarine machine
            >     > learning engine.
            >     > We are heavy users of Submarine.
            >     > Because Submarine fits the direction of our big data team's Hadoop
            >     > technology stack, it avoids the need to increase our manpower investment
            >     > in learning other container scheduling systems.
            >     > Most importantly, we can use a common YARN cluster to run machine
            >     > learning, which makes the utilization of server resources more efficient
            >     > and preserves much of the human and material resources we invested in
            >     > previous years.
            >     >
            >     > Our team has finished testing and deploying Submarine and will provide
            >     > the service to our e-commerce department (http://www.kaola.com/) shortly.
            >     >
            >     > We also plan to provide the Submarine engine in our existing YARN
            >     > cluster within the next six months, because many of our product
            >     > departments need machine learning services, for example:
            >     > 1) Game department (http://game.163.com/) needs AI battle training,
            >     > 2) News department (http://www.163.com) needs news recommendation,
            >     > 3) Mailbox department (http://www.163.com) requires anti-spam and
            >     > illegal detection,
            >     > 4) Music department (https://music.163.com/) requires music
            >     > recommendation,
            >     > 5) Education department (http://www.youdao.com) requires voice
            >     > recognition,
            >     > 6) Massive Open Online Courses (https://open.163.com/) requires
            >     > multilingual translation, and so on.
            >     >
            >     > If Submarine can be released independently like Ozone, it will help
            >     > us quickly get the latest features and improvements, and it will be a
            >     > great help to our team and users.
            >     >
            >     > Thanks, Hadoop community!
            >     >
            >     >
            >     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
            >     > >
            >     > > Hi devs,
            >     > >
            >     > > Since we started the Submarine-related effort last year, we have received
            >     > > a lot of feedback, and several companies (such as Netease, China Mobile,
            >     > > etc.) are trying to deploy Submarine to their Hadoop clusters along with
            >     > > big data workloads. LinkedIn also has strong interest in contributing a
            >     > > Submarine TonY (https://github.com/linkedin/TonY) runtime to allow users
            >     > > to use the same interface.
            >     > >
            >     > > From what I can see, there are several issues with putting Submarine
            >     > > under the yarn-applications directory and having the same release cycle
            >     > > as Hadoop:
            >     > >
            >     > > 1) We started the 3.2.0 release in Sep 2018, but the release was done in
            >     > > Jan 2019. Because of unpredictable blockers and security issues, it got
            >     > > delayed a lot. We need to iterate on Submarine fast at this point.
            >     > >
            >     > > 2) We also see a lot of demand to use Submarine on older Hadoop releases
            >     > > such as 2.x. Many companies may not upgrade Hadoop to 3.x in a short
            >     > > time, but the requirement to run deep learning is urgent for them. We
            >     > > should decouple Submarine from the Hadoop version.
            >     > >
            >     > > And why do we want to keep it within Hadoop? First, Submarine includes
            >     > > some innovative parts, such as user experience enhancements for YARN
            >     > > services/containerization support, which we can add back to Hadoop later
            >     > > to address common requirements. In addition, we have a big overlap in
            >     > > the community developing and using it.
            >     > >
            >     > > There are several proposals we went through during the Ozone
            >     > > merge-to-trunk discussion:
            >     > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
            >     > >
            >     > > I propose to adopt the Ozone model: the same master branch, a different
            >     > > release cycle, and a different release branch. It is a great example of
            >     > > the agile releases we can do (2 Ozone releases after Oct 2018) with less
            >     > > overhead to set up CI, projects, etc.
            >     > >
            >     > > *Links:*
            >     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
            >     > > - Design doc
            >     > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
            >     > > - User doc
            >     > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
            >     > > (3.2.0 release)
            >     > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
            >     > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
            >     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
            >     > > - Talks: Strata Data Conf NY
            >     > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
            >     > >
            >     > > Thoughts?
            >     > >
            >     > > Thanks,
            >     > > Wangda Tan
            >     >
            >     >
            >     >
            >     > ---------------------------------------------------------------------
            >     > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
            >     > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
            >     >
            >
            >
            >
            
        
        
    
    


---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Dinesh Chitlangia <dc...@hortonworks.com>.

+1 This is a fantastic recommendation given the increasing interest in ML across the globe.

Thanks,
Dinesh



On 2/1/19, 1:54 PM, "Ajay Kumar" <aj...@hortonworks.com> wrote:

    +1, thanks for driving this. With the rise of use cases running ML alongside traditional applications, this will be of great help.
    
    Thanks,
    Ajay   
    
    On 2/1/19, 10:49 AM, "Suma Shivaprasad" <su...@gmail.com> wrote:
    
        +1. Thanks for bringing this up Wangda.
        
        Makes sense to have Submarine follow its own release cadence given the good
        momentum/adoption so far. Also, making it run with older versions of Hadoop
        would drive higher adoption.
        
        Suma
        
        On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:
        
        > Submarine is an application built for the YARN framework, but it does not
        > have a strong dependency on YARN development.  For this kind of project, it
        > would be best to enter the Apache Incubator to create a new community.  Apache
        > Commons is the only project other than Incubator that has independent
        > release cycles.  The collection is large, and the project goal is
        > ambitious.  No one really knows which components work with each other in
        > Apache Commons.  Hadoop is a much more focused project, a distributed
        > computing framework and not an incubation sandbox.  To stay aligned with
        > Hadoop goals, we want to prevent the Hadoop project from being overloaded while
        > allowing good ideas to be carried forward in the Apache Incubator.  Putting on
        > my Apache Member hat, my vote is -1 on allowing more independent subproject
        > release cycles in the Hadoop project that do not align with Hadoop project
        > goals.
        >
        > The Apache Incubator process is highly recommended for Submarine:
        > https://incubator.apache.org/policy/process.html This would allow Submarine
        > to develop for older versions of Hadoop, just as Spark works with multiple
        > versions of Hadoop.
        >
        > Regards,
        > Eric
        >
        > On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
        >
        >     Thanks for proposing this Wangda, my +1 as well.
        >     It is amazing to see the progress made in Submarine last year, the
        > community grows fast and is quite collaborative. I can see the reasons to get
        > it released faster in its own cycle. And at the same time, the Ozone way
        > works very well.
        >
        >     —
        >     Weiwei
        >     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
        >     > +1
        >     >
        >     > Hello everyone,
        >     >
        >     > I am Xun Liu, the head of the machine learning team at Netease
        > Research Institute. I quite agree with Wangda.
        >     >
        >     > Our team is very grateful for getting Submarine machine learning
        > engine from the community.
        >     > We are heavy users of Submarine.
        >     > Because Submarine fits into the direction of our big data team's
        > hadoop technology stack,
        >     > It avoids the needs to increase the manpower investment in learning
        > other container scheduling systems.
        >     > The important thing is that we can use a common YARN cluster to run
        > machine learning,
        >     > which makes the utilization of server resources more efficient, and
        > reserves a lot of human and material resources in our previous years.
        >     >
        >     > Our team have finished the test and deployment of the Submarine and
        > will provide the service to our e-commerce department (
        > http://www.kaola.com/) shortly.
        >     >
        >     > We also plan to provides the Submarine engine in our existing YARN
        > cluster in the next six months.
        >     > Because we have a lot of product departments need to use machine
        > learning services,
        >     > for example:
        >     > 1) Game department (http://game.163.com/) needs AI battle training,
        >     > 2) News department (http://www.163.com) needs news recommendation,
        >     > 3) Mailbox department (http://www.163.com) requires anti-spam and
        > illegal detection,
        >     > 4) Music department (https://music.163.com/) requires music
        > recommendation,
        >     > 5) Education department (http://www.youdao.com) requires voice
        > recognition,
        >     > 6) Massive Open Online Courses (https://open.163.com/) requires
        > multilingual translation and so on.
        >     >
        >     > If Submarine can be released independently like Ozone, it will help
        > us quickly get the latest features and improvements, and it will be great
        > helpful to our team and users.
        >     >
        >     > Thanks hadoop Community!
        >     >
        >     >
        >     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
        >     > >
        >     > > Hi devs,
        >     > >
        >     > > Since we started submarine-related effort last year, we received a
        > lot of
        >     > > feedbacks, several companies (such as Netease, China Mobile, etc.)
        > are
        >     > > trying to deploy Submarine to their Hadoop cluster along with big
        > data
        >     > > workloads. Linkedin also has big interests to contribute a
        > Submarine TonY (
        >     > > https://github.com/linkedin/TonY) runtime to allow users to use
        > the same
        >     > > interface.
        >     > >
        >     > > From what I can see, there're several issues of putting Submarine
        > under
        >     > > yarn-applications directory and have same release cycle with
        > Hadoop:
        >     > >
        >     > > 1) We started 3.2.0 release at Sep 2018, but the release is done
        > at Jan
        >     > > 2019. Because of non-predictable blockers and security issues, it
        > got
        >     > > delayed a lot. We need to iterate submarine fast at this point.
        >     > >
        >     > > 2) We also see a lot of requirements to use Submarine on older
        > Hadoop
        >     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
        > in a
        >     > > short time, but the requirement to run deep learning is urgent to
        > them. We
        >     > > should decouple Submarine from Hadoop version.
        >     > >
        >     > > And why we wanna to keep it within Hadoop? First, Submarine
        > included some
        >     > > innovation parts such as enhancements of user experiences for YARN
        >     > > services/containerization support which we can add it back to
        > Hadoop later
        >     > > to address common requirements. In addition to that, we have a big
        > overlap
        >     > > in the community developing and using it.
        >     > >
        >     > > There're several proposals we have went through during Ozone merge
        > to trunk
        >     > > discussion:
        >     > >
        > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
        >     > >
        >     > > I propose to adopt Ozone model: which is the same master branch,
        > different
        >     > > release cycle, and different release branch. It is a great example
        > to show
        >     > > agile release we can do (2 Ozone releases after Oct 2018) with less
        >     > > overhead to setup CI, projects, etc.
        >     > >
        >     > > *Links:*
        >     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
        >     > > - Design doc
        >     > > <
        > https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
        > >
        >     > > - User doc
        >     > > <
        > https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
        > >
        >     > > (3.2.0
        >     > > release)
        >     > > - Blogposts, {Submarine} : Running deep learning workloads on
        > Apache Hadoop
        >     > > <
        > https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
        > >,
        >     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
        >     > > - Talks: Strata Data Conf NY
        >     > > <
        > https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
        > >
        >     > >
        >     > > Thoughts?
        >     > >
        >     > > Thanks,
        >     > > Wangda Tan
        >     >
        >     >
        >     >
        >     > ---------------------------------------------------------------------
        >     > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
        >     > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
        >     >
        >
        >
        >
        
    
    

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Bharat Viswanadham <bv...@hortonworks.com>.

Thank You Wangda for driving this discussion.
+1 for a separate release for submarine.
Having its own release cadence will help the project iterate and grow at a faster pace, get new features into users' hands, and gather their feedback quickly.


Thanks,
Bharat




On 2/1/19, 10:54 AM, "Ajay Kumar" <aj...@hortonworks.com> wrote:

    +1, Thanks for driving this. With the rise of use cases running ML alongside traditional applications, this will be of great help.
    
    Thanks,
    Ajay   
    
    On 2/1/19, 10:49 AM, "Suma Shivaprasad" <su...@gmail.com> wrote:
    
        +1. Thanks for bringing this up Wangda.
        
        Makes sense to have Submarine follow its own release cadence given the good
        momentum/adoption so far. Also, making it run with older versions of Hadoop
        would drive higher adoption.
        
        Suma
        
        On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:
        
        > Submarine is an application built for YARN framework, but it does not have
        > strong dependency on YARN development.  For this kind of projects, it would
        > be best to enter Apache Incubator cycles to create a new community.  Apache
        > commons is the only project other than Incubator that has independent
        > release cycles.  The collection is large, and the project goal is
        > ambitious.  No one really knows which component works with each other in
        > Apache commons.  Hadoop is a much more focused project on distributed
        > computing framework and not an incubation sandbox.  To stay aligned with Hadoop
        > goals, we want to prevent the Hadoop project from being overloaded while
        > allowing good ideas to be carried forward in Apache Incubator.  Putting on my
        > Apache Member hat, my vote is -1 to allow more independent subproject
        > release cycle in Hadoop project that does not align with Hadoop project
        > goals.
        >
        > Apache incubator process is highly recommended for Submarine:
        > https://incubator.apache.org/policy/process.html This allows Submarine to
        > develop for older version of Hadoop like Spark works with multiple versions
        > of Hadoop.
        >
        > Regards,
        > Eric
        >
        > On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
        >
        >     Thanks for proposing this Wangda, my +1 as well.
        >     It is amazing to see the progress made in Submarine last year, the
        > community grows fast and is quite collaborative. I can see the reasons to get
        > it released faster in its own cycle. And at the same time, the Ozone way
        > works very well.
        >
        >     —
        >     Weiwei
        >     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
        >     > +1
        >     >
        >     > Hello everyone,
        >     >
        >     > I am Xun Liu, the head of the machine learning team at Netease
        > Research Institute. I quite agree with Wangda.
        >     >
        >     > Our team is very grateful for getting Submarine machine learning
        > engine from the community.
        >     > We are heavy users of Submarine.
        >     > Because Submarine fits into the direction of our big data team's
        > hadoop technology stack,
        >     > It avoids the needs to increase the manpower investment in learning
        > other container scheduling systems.
        >     > The important thing is that we can use a common YARN cluster to run
        > machine learning,
        >     > which makes the utilization of server resources more efficient, and
        > reserves a lot of human and material resources in our previous years.
        >     >
        >     > Our team have finished the test and deployment of the Submarine and
        > will provide the service to our e-commerce department (
        > http://www.kaola.com/) shortly.
        >     >
        >     > We also plan to provides the Submarine engine in our existing YARN
        > cluster in the next six months.
        >     > Because we have a lot of product departments need to use machine
        > learning services,
        >     > for example:
        >     > 1) Game department (http://game.163.com/) needs AI battle training,
        >     > 2) News department (http://www.163.com) needs news recommendation,
        >     > 3) Mailbox department (http://www.163.com) requires anti-spam and
        > illegal detection,
        >     > 4) Music department (https://music.163.com/) requires music
        > recommendation,
        >     > 5) Education department (http://www.youdao.com) requires voice
        > recognition,
        >     > 6) Massive Open Online Courses (https://open.163.com/) requires
        > multilingual translation and so on.
        >     >
        >     > If Submarine can be released independently like Ozone, it will help
        > us quickly get the latest features and improvements, and it will be great
        > helpful to our team and users.
        >     >
        >     > Thanks hadoop Community!
        >     >
        >     >
        >     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
        >     > >
        >     > > Hi devs,
        >     > >
        >     > > Since we started submarine-related effort last year, we received a
        > lot of
        >     > > feedbacks, several companies (such as Netease, China Mobile, etc.)
        > are
        >     > > trying to deploy Submarine to their Hadoop cluster along with big
        > data
        >     > > workloads. Linkedin also has big interests to contribute a
        > Submarine TonY (
        >     > > https://github.com/linkedin/TonY) runtime to allow users to use
        > the same
        >     > > interface.
        >     > >
        >     > > From what I can see, there're several issues of putting Submarine
        > under
        >     > > yarn-applications directory and have same release cycle with
        > Hadoop:
        >     > >
        >     > > 1) We started 3.2.0 release at Sep 2018, but the release is done
        > at Jan
        >     > > 2019. Because of non-predictable blockers and security issues, it
        > got
        >     > > delayed a lot. We need to iterate submarine fast at this point.
        >     > >
        >     > > 2) We also see a lot of requirements to use Submarine on older
        > Hadoop
        >     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
        > in a
        >     > > short time, but the requirement to run deep learning is urgent to
        > them. We
        >     > > should decouple Submarine from Hadoop version.
        >     > >
        >     > > And why we wanna to keep it within Hadoop? First, Submarine
        > included some
        >     > > innovation parts such as enhancements of user experiences for YARN
        >     > > services/containerization support which we can add it back to
        > Hadoop later
        >     > > to address common requirements. In addition to that, we have a big
        > overlap
        >     > > in the community developing and using it.
        >     > >
        >     > > There're several proposals we have went through during Ozone merge
        > to trunk
        >     > > discussion:
        >     > >
        > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
        >     > >
        >     > > I propose to adopt Ozone model: which is the same master branch,
        > different
        >     > > release cycle, and different release branch. It is a great example
        > to show
        >     > > agile release we can do (2 Ozone releases after Oct 2018) with less
        >     > > overhead to setup CI, projects, etc.
        >     > >
        >     > > *Links:*
        >     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
        >     > > - Design doc
        >     > > <
        > https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
        > >
        >     > > - User doc
        >     > > <
        > https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
        > >
        >     > > (3.2.0
        >     > > release)
        >     > > - Blogposts, {Submarine} : Running deep learning workloads on
        > Apache Hadoop
        >     > > <
        > https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
        > >,
        >     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
        >     > > - Talks: Strata Data Conf NY
        >     > > <
        > https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
        > >
        >     > >
        >     > > Thoughts?
        >     > >
        >     > > Thanks,
        >     > > Wangda Tan
        >     >
        >     >
        >     >
        >     >
        >
        >
        >
        
    
    

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Ajay Kumar <aj...@hortonworks.com>.

+1, Thanks for driving this. With the rise of use cases running ML alongside traditional applications, this will be of great help.

Thanks,
Ajay   

On 2/1/19, 10:49 AM, "Suma Shivaprasad" <su...@gmail.com> wrote:

    +1. Thanks for bringing this up Wangda.
    
    Makes sense to have Submarine follow its own release cadence given the good
    momentum/adoption so far. Also, making it run with older versions of Hadoop
    would drive higher adoption.
    
    Suma
    
    On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:
    
    > Submarine is an application built for YARN framework, but it does not have
    > strong dependency on YARN development.  For this kind of projects, it would
    > be best to enter Apache Incubator cycles to create a new community.  Apache
    > commons is the only project other than Incubator that has independent
    > release cycles.  The collection is large, and the project goal is
    > ambitious.  No one really knows which component works with each other in
    > Apache commons.  Hadoop is a much more focused project on distributed
    > computing framework and not an incubation sandbox.  To stay aligned with Hadoop
    > goals, we want to prevent the Hadoop project from being overloaded while
    > allowing good ideas to be carried forward in Apache Incubator.  Putting on my
    > Apache Member hat, my vote is -1 to allow more independent subproject
    > release cycle in Hadoop project that does not align with Hadoop project
    > goals.
    >
    > Apache incubator process is highly recommended for Submarine:
    > https://incubator.apache.org/policy/process.html This allows Submarine to
    > develop for older version of Hadoop like Spark works with multiple versions
    > of Hadoop.
    >
    > Regards,
    > Eric
    >
    > On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
    >
    >     Thanks for proposing this Wangda, my +1 as well.
    >     It is amazing to see the progress made in Submarine last year, the
    > community grows fast and is quite collaborative. I can see the reasons to get
    > it released faster in its own cycle. And at the same time, the Ozone way
    > works very well.
    >
    >     —
    >     Weiwei
    >     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
    >     > +1
    >     >
    >     > Hello everyone,
    >     >
    >     > I am Xun Liu, the head of the machine learning team at Netease
    > Research Institute. I quite agree with Wangda.
    >     >
    >     > Our team is very grateful for getting Submarine machine learning
    > engine from the community.
    >     > We are heavy users of Submarine.
    >     > Because Submarine fits into the direction of our big data team's
    > hadoop technology stack,
    >     > It avoids the needs to increase the manpower investment in learning
    > other container scheduling systems.
    >     > The important thing is that we can use a common YARN cluster to run
    > machine learning,
    >     > which makes the utilization of server resources more efficient, and
    > reserves a lot of human and material resources in our previous years.
    >     >
    >     > Our team have finished the test and deployment of the Submarine and
    > will provide the service to our e-commerce department (
    > http://www.kaola.com/) shortly.
    >     >
    >     > We also plan to provides the Submarine engine in our existing YARN
    > cluster in the next six months.
    >     > Because we have a lot of product departments need to use machine
    > learning services,
    >     > for example:
    >     > 1) Game department (http://game.163.com/) needs AI battle training,
    >     > 2) News department (http://www.163.com) needs news recommendation,
    >     > 3) Mailbox department (http://www.163.com) requires anti-spam and
    > illegal detection,
    >     > 4) Music department (https://music.163.com/) requires music
    > recommendation,
    >     > 5) Education department (http://www.youdao.com) requires voice
    > recognition,
    >     > 6) Massive Open Online Courses (https://open.163.com/) requires
    > multilingual translation and so on.
    >     >
    >     > If Submarine can be released independently like Ozone, it will help
    > us quickly get the latest features and improvements, and it will be great
    > helpful to our team and users.
    >     >
    >     > Thanks hadoop Community!
    >     >
    >     >
    >     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
    >     > >
    >     > > Hi devs,
    >     > >
    >     > > Since we started submarine-related effort last year, we received a
    > lot of
    >     > > feedbacks, several companies (such as Netease, China Mobile, etc.)
    > are
    >     > > trying to deploy Submarine to their Hadoop cluster along with big
    > data
    >     > > workloads. Linkedin also has big interests to contribute a
    > Submarine TonY (
    >     > > https://github.com/linkedin/TonY) runtime to allow users to use
    > the same
    >     > > interface.
    >     > >
    >     > > From what I can see, there're several issues of putting Submarine
    > under
    >     > > yarn-applications directory and have same release cycle with
    > Hadoop:
    >     > >
    >     > > 1) We started 3.2.0 release at Sep 2018, but the release is done
    > at Jan
    >     > > 2019. Because of non-predictable blockers and security issues, it
    > got
    >     > > delayed a lot. We need to iterate submarine fast at this point.
    >     > >
    >     > > 2) We also see a lot of requirements to use Submarine on older
    > Hadoop
    >     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
    > in a
    >     > > short time, but the requirement to run deep learning is urgent to
    > them. We
    >     > > should decouple Submarine from Hadoop version.
    >     > >
    >     > > And why we wanna to keep it within Hadoop? First, Submarine
    > included some
    >     > > innovation parts such as enhancements of user experiences for YARN
    >     > > services/containerization support which we can add it back to
    > Hadoop later
    >     > > to address common requirements. In addition to that, we have a big
    > overlap
    >     > > in the community developing and using it.
    >     > >
    >     > > There're several proposals we have went through during Ozone merge
    > to trunk
    >     > > discussion:
    >     > >
    > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
    >     > >
    >     > > I propose to adopt Ozone model: which is the same master branch,
    > different
    >     > > release cycle, and different release branch. It is a great example
    > to show
    >     > > agile release we can do (2 Ozone releases after Oct 2018) with less
    >     > > overhead to setup CI, projects, etc.
    >     > >
    >     > > *Links:*
    >     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
    >     > > - Design doc
    >     > > <
    > https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
    > >
    >     > > - User doc
    >     > > <
    > https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
    > >
    >     > > (3.2.0
    >     > > release)
    >     > > - Blogposts, {Submarine} : Running deep learning workloads on
    > Apache Hadoop
    >     > > <
    > https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
    > >,
    >     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
    >     > > - Talks: Strata Data Conf NY
    >     > > <
    > https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
    > >
    >     > >
    >     > > Thoughts?
    >     > >
    >     > > Thanks,
    >     > > Wangda Tan
    >     >
    >     >
    >     >
    >     >
    >
    >
    >

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Ajay Kumar <aj...@hortonworks.com>.

+1, thanks for driving this. With the rise of use cases that run ML alongside traditional applications, this will be of great help.

Thanks,
Ajay

On 2/1/19, 10:49 AM, "Suma Shivaprasad" <su...@gmail.com> wrote:

    +1. Thanks for bringing this up Wangda.
    
    Makes sense to have Submarine follow its own release cadence given the good
    momentum/adoption so far. Also, making it run with older versions of Hadoop
    would drive higher adoption.
    
    Suma
    
    On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:
    
    > Submarine is an application built for the YARN framework, but it does not
    > have a strong dependency on YARN development.  For this kind of project, it
    > would be best to enter the Apache Incubator to create a new community.  Apache
    > Commons is the only project other than Incubator that has independent
    > release cycles.  The collection is large, and the project goal is
    > ambitious.  No one really knows which components work with each other in
    > Apache Commons.  Hadoop is a much more focused project: a distributed
    > computing framework, not an incubation sandbox.  To stay aligned with Hadoop
    > goals, we want to prevent the Hadoop project from being overloaded while
    > allowing good ideas to be carried forward in the Apache Incubator.  Putting on
    > my Apache Member hat, my vote is -1 on allowing more independent subproject
    > release cycles in the Hadoop project that do not align with Hadoop project
    > goals.
    >
    > The Apache Incubator process is highly recommended for Submarine:
    > https://incubator.apache.org/policy/process.html This allows Submarine to
    > develop for older versions of Hadoop, the way Spark works with multiple
    > versions of Hadoop.
    >
    > Regards,
    > Eric
    >
    > On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
    >
    >     Thanks for proposing this Wangda, my +1 as well.
    >     It is amazing to see the progress made in Submarine last year; the
    > community has grown fast and is quite collaborative. I can see the reasons
    > to release it faster on its own cycle. And at the same time, the Ozone way
    > works very well.
    >
    >     --
    >     Weiwei
    >     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
    >     > +1
    >     >
    >     > Hello everyone,
    >     >
    >     > I am Xun Liu, the head of the machine learning team at Netease
    > Research Institute. I quite agree with Wangda.
    >     >
    >     > Our team is very grateful for getting Submarine machine learning
    > engine from the community.
    >     > We are heavy users of Submarine.
    >     > Because Submarine fits into the direction of our big data team's
    > hadoop technology stack,
    >     > It avoids the needs to increase the manpower investment in learning
    > other container scheduling systems.
    >     > The important thing is that we can use a common YARN cluster to run
    > machine learning,
    >     > which makes the utilization of server resources more efficient, and
    > reserves a lot of human and material resources in our previous years.
    >     >
    >     > Our team have finished the test and deployment of the Submarine and
    > will provide the service to our e-commerce department (
    > http://www.kaola.com/) shortly.
    >     >
    >     > We also plan to provide the Submarine engine in our existing YARN
    > cluster in the next six months.
    >     > Because we have a lot of product departments that need to use machine
    > learning services,
    >     > for example:
    >     > 1) Game department (http://game.163.com/) needs AI battle training,
    >     > 2) News department (http://www.163.com) needs news recommendation,
    >     > 3) Mailbox department (http://www.163.com) requires anti-spam and
    > illegal detection,
    >     > 4) Music department (https://music.163.com/) requires music
    > recommendation,
    >     > 5) Education department (http://www.youdao.com) requires voice
    > recognition,
    >     > 6) Massive Open Online Courses (https://open.163.com/) requires
    > multilingual translation and so on.
    >     >
    >     > If Submarine can be released independently like Ozone, it will help
    > us quickly get the latest features and improvements, and it will be of great
    > help to our team and users.
    >     >
    >     > Thanks hadoop Community!
    >     >
    >     >
    >     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
    >     > >
    >     > > Hi devs,
    >     > >
    >     > > Since we started submarine-related effort last year, we received a
    > lot of
    >     > > feedbacks, several companies (such as Netease, China Mobile, etc.)
    > are
    >     > > trying to deploy Submarine to their Hadoop cluster along with big
    > data
    >     > > workloads. Linkedin also has big interests to contribute a
    > Submarine TonY (
    >     > > https://github.com/linkedin/TonY) runtime to allow users to use
    > the same
    >     > > interface.
    >     > >
    >     > > From what I can see, there're several issues of putting Submarine
    > under
    >     > > yarn-applications directory and have same release cycle with
    > Hadoop:
    >     > >
    >     > > 1) We started 3.2.0 release at Sep 2018, but the release is done
    > at Jan
    >     > > 2019. Because of non-predictable blockers and security issues, it
    > got
    >     > > delayed a lot. We need to iterate submarine fast at this point.
    >     > >
    >     > > 2) We also see a lot of requirements to use Submarine on older
    > Hadoop
    >     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
    > in a
    >     > > short time, but the requirement to run deep learning is urgent to
    > them. We
    >     > > should decouple Submarine from Hadoop version.
    >     > >
    >     > > And why we wanna to keep it within Hadoop? First, Submarine
    > included some
    >     > > innovation parts such as enhancements of user experiences for YARN
    >     > > services/containerization support which we can add it back to
    > Hadoop later
    >     > > to address common requirements. In addition to that, we have a big
    > overlap
    >     > > in the community developing and using it.
    >     > >
    >     > > There're several proposals we have went through during Ozone merge
    > to trunk
    >     > > discussion:
    >     > >
    > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
    >     > >
    >     > > I propose to adopt Ozone model: which is the same master branch,
    > different
    >     > > release cycle, and different release branch. It is a great example
    > to show
    >     > > agile release we can do (2 Ozone releases after Oct 2018) with less
    >     > > overhead to setup CI, projects, etc.
    >     > >
    >     > > *Links:*
    >     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
    >     > > - Design doc
    >     > > <
    > https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
    > >
    >     > > - User doc
    >     > > <
    > https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
    > >
    >     > > (3.2.0
    >     > > release)
    >     > > - Blogposts, {Submarine} : Running deep learning workloads on
    > Apache Hadoop
    >     > > <
    > https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
    > >,
    >     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
    >     > > - Talks: Strata Data Conf NY
    >     > > <
    > https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
    > >
    >     > >
    >     > > Thoughts?
    >     > >
    >     > > Thanks,
    >     > > Wangda Tan
    >     >
    >     >
    >     >
    >     >
    >
    >
    >

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Suma Shivaprasad <su...@gmail.com>.

+1. Thanks for bringing this up, Wangda.

It makes sense to have Submarine follow its own release cadence given the good
momentum/adoption so far. Also, making it run with older versions of Hadoop
would drive higher adoption.

Suma

On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:

> Submarine is an application built for the YARN framework, but it does not
> have a strong dependency on YARN development.  For this kind of project, it
> would be best to enter the Apache Incubator to create a new community.  Apache
> Commons is the only project other than Incubator that has independent
> release cycles.  The collection is large, and the project goal is
> ambitious.  No one really knows which components work with each other in
> Apache Commons.  Hadoop is a much more focused project: a distributed
> computing framework, not an incubation sandbox.  To stay aligned with Hadoop
> goals, we want to prevent the Hadoop project from being overloaded while
> allowing good ideas to be carried forward in the Apache Incubator.  Putting on
> my Apache Member hat, my vote is -1 on allowing more independent subproject
> release cycles in the Hadoop project that do not align with Hadoop project
> goals.
>
> The Apache Incubator process is highly recommended for Submarine:
> https://incubator.apache.org/policy/process.html This allows Submarine to
> develop for older versions of Hadoop, the way Spark works with multiple
> versions of Hadoop.
>
> Regards,
> Eric
>
> On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
>
>     Thanks for proposing this Wangda, my +1 as well.
>     It is amazing to see the progress made in Submarine last year; the
> community has grown fast and is quite collaborative. I can see the reasons
> to release it faster on its own cycle. And at the same time, the Ozone way
> works very well.
>
>     --
>     Weiwei
>     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
>     > +1
>     >
>     > Hello everyone,
>     >
>     > I am Xun Liu, the head of the machine learning team at Netease
> Research Institute. I quite agree with Wangda.
>     >
>     > Our team is very grateful for getting Submarine machine learning
> engine from the community.
>     > We are heavy users of Submarine.
>     > Because Submarine fits into the direction of our big data team's
> hadoop technology stack,
>     > It avoids the needs to increase the manpower investment in learning
> other container scheduling systems.
>     > The important thing is that we can use a common YARN cluster to run
> machine learning,
>     > which makes the utilization of server resources more efficient, and
> reserves a lot of human and material resources in our previous years.
>     >
>     > Our team have finished the test and deployment of the Submarine and
> will provide the service to our e-commerce department (
> http://www.kaola.com/) shortly.
>     >
>     > We also plan to provide the Submarine engine in our existing YARN
> cluster in the next six months.
>     > Because we have a lot of product departments that need to use machine
> learning services,
>     > for example:
>     > 1) Game department (http://game.163.com/) needs AI battle training,
>     > 2) News department (http://www.163.com) needs news recommendation,
>     > 3) Mailbox department (http://www.163.com) requires anti-spam and
> illegal detection,
>     > 4) Music department (https://music.163.com/) requires music
> recommendation,
>     > 5) Education department (http://www.youdao.com) requires voice
> recognition,
>     > 6) Massive Open Online Courses (https://open.163.com/) requires
> multilingual translation and so on.
>     >
>     > If Submarine can be released independently like Ozone, it will help
> us quickly get the latest features and improvements, and it will be of great
> help to our team and users.
>     >
>     > Thanks hadoop Community!
>     >
>     >
>     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
>     > >
>     > > Hi devs,
>     > >
>     > > Since we started submarine-related effort last year, we received a
> lot of
>     > > feedbacks, several companies (such as Netease, China Mobile, etc.)
> are
>     > > trying to deploy Submarine to their Hadoop cluster along with big
> data
>     > > workloads. Linkedin also has big interests to contribute a
> Submarine TonY (
>     > > https://github.com/linkedin/TonY) runtime to allow users to use
> the same
>     > > interface.
>     > >
>     > > From what I can see, there're several issues of putting Submarine
> under
>     > > yarn-applications directory and have same release cycle with
> Hadoop:
>     > >
>     > > 1) We started 3.2.0 release at Sep 2018, but the release is done
> at Jan
>     > > 2019. Because of non-predictable blockers and security issues, it
> got
>     > > delayed a lot. We need to iterate submarine fast at this point.
>     > >
>     > > 2) We also see a lot of requirements to use Submarine on older
> Hadoop
>     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
> in a
>     > > short time, but the requirement to run deep learning is urgent to
> them. We
>     > > should decouple Submarine from Hadoop version.
>     > >
>     > > And why we wanna to keep it within Hadoop? First, Submarine
> included some
>     > > innovation parts such as enhancements of user experiences for YARN
>     > > services/containerization support which we can add it back to
> Hadoop later
>     > > to address common requirements. In addition to that, we have a big
> overlap
>     > > in the community developing and using it.
>     > >
>     > > There're several proposals we have went through during Ozone merge
> to trunk
>     > > discussion:
>     > >
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>     > >
>     > > I propose to adopt Ozone model: which is the same master branch,
> different
>     > > release cycle, and different release branch. It is a great example
> to show
>     > > agile release we can do (2 Ozone releases after Oct 2018) with less
>     > > overhead to setup CI, projects, etc.
>     > >
>     > > *Links:*
>     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>     > > - Design doc
>     > > <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
>     > > - User doc
>     > > <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
>     > > (3.2.0
>     > > release)
>     > > - Blogposts, {Submarine} : Running deep learning workloads on
> Apache Hadoop
>     > > <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
>     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>     > > - Talks: Strata Data Conf NY
>     > > <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>     > >
>     > > Thoughts?
>     > >
>     > > Thanks,
>     > > Wangda Tan
>     >
>     >
>     >
>     >
>
>
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Wangda Tan <wh...@gmail.com>.

Eric,
Thanks for reconsidering. We will definitely do our best not to break
compatibility, as we have done for other components!

I really appreciate everybody's support, thoughts, and suggestions shared on
this thread. Given that the discussion has been very positive, I will go ahead
and start a voting thread.

Best,
Wangda

On Fri, Feb 1, 2019 at 2:06 PM Eric Yang <ey...@hortonworks.com> wrote:

> If HDFS or YARN breaks compatibility with Submarine, Submarine will need to
> make a release to catch up with the latest Hadoop changes.  On the
> hadoop.apache.org website, the latest news may always have Submarine on
> top, repairing compatibility with the latest Hadoop.  This may overwhelm any
> interesting news that may happen in the Hadoop space.  I don't like to see that
> happen, but it is unavoidable with an independent release cycle.  Maybe there
> is a good way to avoid this with the help of release managers, to ensure that
> Hadoop/Submarine don't break compatibility frequently.
>
>
>
> For me to lift my veto, release managers of independent release cycles
> need to take responsibility for ensuring that X version of Hadoop is tested
> with Y version of Submarine.  Release managers will have to do more work to
> ensure the defined combination works.  The greater responsibility of release
> management comes with its own reward.  Seasoned PMC members may be nominated
> to become Apache Members, which will help Submarine enter the Apache
> Incubator when the time is right.  Hence, I will withdraw my veto and let
> Submarine set its own course.
>
>
>
> Good luck Wangda.
>
>
>
> Regards,
>
> Eric
>
>
>
> *From: *Wangda Tan <wh...@gmail.com>
> *Date: *Friday, February 1, 2019 at 10:52 AM
> *To: *Eric Yang <ey...@hortonworks.com>
> *Cc: *Weiwei Yang <ab...@gmail.com>, Xun Liu <ne...@163.com>,
> Hadoop Common <co...@hadoop.apache.org>, "yarn-dev@hadoop.apache.org"
> <ya...@hadoop.apache.org>, Hdfs-dev <hd...@hadoop.apache.org>, "
> mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>
> *Subject: *Re: [DISCUSS] Making submarine to different release model like
> Ozone
>
>
>
> Thanks everyone for sharing thoughts!
>
>
>
> Eric, I appreciate your suggestions. But there are many examples of
> separate releases, like Hive's storage API, Ozone, etc. For loosely coupled
> sub-projects, it is going to be great (at least for most of the users) to have
> separate releases so new features can be consumed and iterated on faster. From
> the feedback above from developers and users, I think it is also what people
> want.
>
>
>
> Another concern you mentioned is whether Submarine is aligned with Hadoop
> project goals. From the feedback we can see that it attracts companies to
> continue using Hadoop to solve their ML/DL requirements, and it has also
> created a good feedback loop: many issues faced, and some new functionality
> added, by Submarine went back to Hadoop, such as localization of files and
> directories, GPU topology related enhancements, etc.
>
>
>
> We will definitely use this sub-project opportunity to quickly grow both
> Submarine and Hadoop, and try to get fast release cycles for both
> projects. As for your suggestion about the Apache Incubator, we can reconsider
> it once Submarine becomes a more independent project. For now it is still too
> small, and the process carries too much overhead; I don't want to stop
> the fast-growing community for months to go through the incubator
> process.
>
>
>
> I really hope my comment can help you reconsider the veto. :)
>
>
>
> Thanks,
>
> Wangda
>
>
>
> On Fri, Feb 1, 2019 at 9:39 AM Eric Yang <ey...@hortonworks.com> wrote:
>
> Submarine is an application built for the YARN framework, but it does not
> have a strong dependency on YARN development.  For this kind of project, it
> would be best to enter the Apache Incubator to create a new community.  Apache
> Commons is the only project other than Incubator that has independent
> release cycles.  The collection is large, and the project goal is
> ambitious.  No one really knows which components work with each other in
> Apache Commons.  Hadoop is a much more focused project: a distributed
> computing framework, not an incubation sandbox.  To stay aligned with Hadoop
> goals, we want to prevent the Hadoop project from being overloaded while
> allowing good ideas to be carried forward in the Apache Incubator.  Putting on
> my Apache Member hat, my vote is -1 on allowing more independent subproject
> release cycles in the Hadoop project that do not align with Hadoop project
> goals.
>
> The Apache Incubator process is highly recommended for Submarine:
> https://incubator.apache.org/policy/process.html This allows Submarine to
> develop for older versions of Hadoop, the way Spark works with multiple
> versions of Hadoop.
>
> Regards,
> Eric
>
> On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
>
>     Thanks for proposing this Wangda, my +1 as well.
>     It is amazing to see the progress made in Submarine last year; the
> community has grown fast and is quite collaborative. I can see the reasons
> to release it faster on its own cycle. And at the same time, the Ozone way
> works very well.
>
>     --
>     Weiwei
>     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
>     > +1
>     >
>     > Hello everyone,
>     >
>     > I am Xun Liu, the head of the machine learning team at Netease
> Research Institute. I quite agree with Wangda.
>     >
>     > Our team is very grateful to the community for the Submarine machine
> learning engine.
>     > We are heavy users of Submarine.
>     > Because Submarine fits the direction of our big data team's Hadoop
> technology stack,
>     > it avoids the need to invest more manpower in learning other
> container scheduling systems.
>     > The important thing is that we can use a common YARN cluster to run
> machine learning,
>     > which makes the utilization of server resources more efficient and
> preserves much of the human and material investment of our previous years.
>     >
>     > Our team has finished testing and deploying Submarine and will
> provide the service to our e-commerce department (
> http://www.kaola.com/) shortly.
>     >
>     > We also plan to provide the Submarine engine in our existing YARN
> cluster in the next six months,
>     > because we have many product departments that need machine learning
> services,
>     > for example:
>     > 1) Game department (http://game.163.com/) needs AI battle training,
>     > 2) News department (http://www.163.com) needs news recommendation,
>     > 3) Mailbox department (http://www.163.com) requires anti-spam and
> illegal detection,
>     > 4) Music department (https://music.163.com/) requires music
> recommendation,
>     > 5) Education department (http://www.youdao.com) requires voice
> recognition,
>     > 6) Massive Open Online Courses (https://open.163.com/) requires
> multilingual translation and so on.
>     >
>     > If Submarine can be released independently like Ozone, it will help
> us quickly get the latest features and improvements, and it will be a great
> help to our team and users.
>     >
>     > Thanks, Hadoop community!
>     >
>     >
>     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
>     > >
>     > > Hi devs,
>     > >
>     > > Since we started the Submarine-related effort last year, we have
>     > > received a lot of feedback. Several companies (such as Netease,
>     > > China Mobile, etc.) are trying to deploy Submarine to their Hadoop
>     > > clusters alongside big data workloads. LinkedIn is also very
>     > > interested in contributing a Submarine TonY (
>     > > https://github.com/linkedin/TonY) runtime to allow users to use the
>     > > same interface.
>     > >
>     > > From what I can see, there are several issues with putting Submarine
>     > > under the yarn-applications directory and on the same release cycle
>     > > as Hadoop:
>     > >
>     > > 1) We started the 3.2.0 release in Sep 2018, but the release was
>     > > not done until Jan 2019. Because of unpredictable blockers and
>     > > security issues, it was delayed a lot. We need to iterate on
>     > > Submarine fast at this point.
>     > >
>     > > 2) We also see a lot of demand to use Submarine on older Hadoop
>     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
>     > > any time soon, but their need to run deep learning is urgent. We
>     > > should decouple Submarine from the Hadoop version.
>     > >
>     > > And why do we want to keep it within Hadoop? First, Submarine
>     > > includes some innovative parts, such as user-experience enhancements
>     > > for YARN services/containerization support, which we can add back to
>     > > Hadoop later to address common requirements. In addition, there is a
>     > > big overlap in the community developing and using it.
>     > >
>     > > There are several proposals we went through during the Ozone
>     > > merge-to-trunk discussion:
>     > >
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>     > >
>     > > I propose to adopt the Ozone model: the same master branch, a
>     > > different release cycle, and a different release branch. It is a
>     > > great example of the agile releases we can do (2 Ozone releases
>     > > after Oct 2018) with less overhead to set up CI, projects, etc.
>     > >
>     > > *Links:*
>     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>     > > - Design doc
>     > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
>     > > - User doc
>     > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
>     > > (3.2.0 release)
>     > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
>     > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
>     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>     > > - Talks: Strata Data Conf NY
>     > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
>     > >
>     > > Thoughts?
>     > >
>     > > Thanks,
>     > > Wangda Tan
>     >
>     >
>     >
>     > ---------------------------------------------------------------------
>     > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>     > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
>     >
>
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Wangda Tan <wh...@gmail.com>.

Eric,
Thanks for reconsidering. We will definitely do our best not to break
compatibility, etc., just as we have with other components!

I really appreciate everybody's support, thoughts, and suggestions shared on
this thread. Given that the discussion went very positively, I will go ahead
and start a voting thread.

Best,
Wangda

On Fri, Feb 1, 2019 at 2:06 PM Eric Yang <ey...@hortonworks.com> wrote:

> If HDFS or YARN breaks compatibility with Submarine, Submarine will need to
> make a release to catch up with the latest Hadoop changes.  On the
> hadoop.apache.org website, the latest news may always have Submarine on
> top, repairing compatibility with the latest Hadoop.  This may overwhelm any
> interesting news that happens in the Hadoop space.  I would not like to see
> that happen, but it is unavoidable with an independent release cycle.  Maybe
> there is a good way to avoid this, with the help of release managers, to
> ensure that Hadoop and Submarine don’t break compatibility frequently.
>
>
>
> For me to lift my veto, release managers of independent release cycles
> need to take responsibility for ensuring that version X of Hadoop is tested
> with version Y of Submarine.  Release managers will have to do more work to
> ensure the defined combinations work.  The greater responsibility of
> release management comes with its own reward.  Seasoned PMC members may be
> nominated to become Apache Members, which will help Submarine enter the
> Apache Incubator when the time is right.  Hence, I will withdraw my veto
> and let Submarine set its own course.
>
>
>
> Good luck Wangda.
>
>
>
> Regards,
>
> Eric
>
>
>
> *From: *Wangda Tan <wh...@gmail.com>
> *Date: *Friday, February 1, 2019 at 10:52 AM
> *To: *Eric Yang <ey...@hortonworks.com>
> *Cc: *Weiwei Yang <ab...@gmail.com>, Xun Liu <ne...@163.com>,
> Hadoop Common <co...@hadoop.apache.org>, "yarn-dev@hadoop.apache.org"
> <ya...@hadoop.apache.org>, Hdfs-dev <hd...@hadoop.apache.org>, "
> mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>
> *Subject: *Re: [DISCUSS] Making submarine to different release model like
> Ozone
>
>
>
> Thanks everyone for sharing thoughts!
>
>
>
> Eric, I appreciate your suggestions. But there are many examples of
> separate releases, like Hive's storage API, Ozone, etc. For loosely coupled
> sub-projects, it is going to be great (at least for most of the users) to
> have separate releases so new features can be consumed and iterated on
> faster. From the above feedback from developers and users, I think it is
> also what people want.
>
>
>
> Another concern you mentioned is whether Submarine is aligned with Hadoop
> project goals. From the feedback, we can see that it encourages companies
> to continue using Hadoop to solve their ML/DL requirements. It has also
> created a good feedback loop: fixes for many issues Submarine faced, and
> some new functionality added for it, went back to Hadoop, such as
> localization of files and directories, GPU topology related enhancements,
> etc.
>
>
>
> We will definitely use this sub-project opportunity to grow both Submarine
> and Hadoop quickly, and try to get fast release cycles for both projects.
> As for your suggestion about the Apache Incubator, we can reconsider it
> once Submarine becomes a more independent project. Right now it is still
> too small, and the process carries too much overhead; I don't want to stall
> the fast-growing community for months to go through the incubator process
> for now.
>
>
>
> I really hope my comment can help you reconsider the veto. :)
>
>
>
> Thanks,
>
> Wangda
>
>
>
> On Fri, Feb 1, 2019 at 9:39 AM Eric Yang <ey...@hortonworks.com> wrote:
>
> Submarine is an application built for the YARN framework, but it does not
> have a strong dependency on YARN development.  For this kind of project, it
> would be best to enter the Apache Incubator to create a new community.
> Apache Commons is the only project other than the Incubator that has
> independent release cycles.  The collection is large, and the project goal
> is ambitious.  No one really knows which components work with each other in
> Apache Commons.  Hadoop is a much more focused project, a distributed
> computing framework and not an incubation sandbox.  To stay aligned with
> Hadoop goals, we want to prevent the Hadoop project from being overloaded
> while allowing good ideas to be carried forward in the Apache Incubator.
> Putting on my Apache Member hat, my vote is -1 on allowing more independent
> subproject release cycles in the Hadoop project that do not align with
> Hadoop project goals.
>
> The Apache Incubator process is highly recommended for Submarine:
> https://incubator.apache.org/policy/process.html This would allow Submarine
> to develop against older versions of Hadoop, just as Spark works with
> multiple versions of Hadoop.
>
> Regards,
> Eric
>
> On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
>
>     Thanks for proposing this, Wangda; my +1 as well.
>     It is amazing to see the progress made in Submarine last year; the
> community is growing fast and is quite collaborative. I can see the reasons
> to get it released faster on its own cycle. And at the same time, the Ozone
> way works very well.
>
>     —
>     Weiwei
>     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
>     > +1
>     >
>     > Hello everyone,
>     >
>     > I am Xun Liu, the head of the machine learning team at Netease
> Research Institute. I quite agree with Wangda.
>     >
>     > Our team is very grateful to the community for the Submarine machine
> learning engine.
>     > We are heavy users of Submarine.
>     > Because Submarine fits the direction of our big data team's Hadoop
> technology stack,
>     > it avoids the need to invest more manpower in learning other
> container scheduling systems.
>     > The important thing is that we can use a common YARN cluster to run
> machine learning,
>     > which makes the utilization of server resources more efficient and
> preserves much of the human and material investment of our previous years.
>     >
>     > Our team has finished testing and deploying Submarine and will
> provide the service to our e-commerce department (
> http://www.kaola.com/) shortly.
>     >
>     > We also plan to provide the Submarine engine in our existing YARN
> cluster in the next six months,
>     > because we have many product departments that need machine learning
> services,
>     > for example:
>     > 1) Game department (http://game.163.com/) needs AI battle training,
>     > 2) News department (http://www.163.com) needs news recommendation,
>     > 3) Mailbox department (http://www.163.com) requires anti-spam and
> illegal detection,
>     > 4) Music department (https://music.163.com/) requires music
> recommendation,
>     > 5) Education department (http://www.youdao.com) requires voice
> recognition,
>     > 6) Massive Open Online Courses (https://open.163.com/) requires
> multilingual translation and so on.
>     >
>     > If Submarine can be released independently like Ozone, it will help
> us quickly get the latest features and improvements, and it will be a great
> help to our team and users.
>     >
>     > Thanks, Hadoop community!
>     >
>     >
>     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
>     > >
>     > > Hi devs,
>     > >
>     > > Since we started the Submarine-related effort last year, we have
>     > > received a lot of feedback. Several companies (such as Netease,
>     > > China Mobile, etc.) are trying to deploy Submarine to their Hadoop
>     > > clusters alongside big data workloads. LinkedIn is also very
>     > > interested in contributing a Submarine TonY (
>     > > https://github.com/linkedin/TonY) runtime to allow users to use the
>     > > same interface.
>     > >
>     > > From what I can see, there are several issues with putting Submarine
>     > > under the yarn-applications directory and on the same release cycle
>     > > as Hadoop:
>     > >
>     > > 1) We started the 3.2.0 release in Sep 2018, but the release was
>     > > not done until Jan 2019. Because of unpredictable blockers and
>     > > security issues, it was delayed a lot. We need to iterate on
>     > > Submarine fast at this point.
>     > >
>     > > 2) We also see a lot of demand to use Submarine on older Hadoop
>     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
>     > > any time soon, but their need to run deep learning is urgent. We
>     > > should decouple Submarine from the Hadoop version.
>     > >
>     > > And why do we want to keep it within Hadoop? First, Submarine
>     > > includes some innovative parts, such as user-experience enhancements
>     > > for YARN services/containerization support, which we can add back to
>     > > Hadoop later to address common requirements. In addition, there is a
>     > > big overlap in the community developing and using it.
>     > >
>     > > There are several proposals we went through during the Ozone
>     > > merge-to-trunk discussion:
>     > >
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>     > >
>     > > I propose to adopt the Ozone model: the same master branch, a
>     > > different release cycle, and a different release branch. It is a
>     > > great example of the agile releases we can do (2 Ozone releases
>     > > after Oct 2018) with less overhead to set up CI, projects, etc.
>     > >
>     > > *Links:*
>     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>     > > - Design doc
>     > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
>     > > - User doc
>     > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
>     > > (3.2.0 release)
>     > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
>     > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
>     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>     > > - Talks: Strata Data Conf NY
>     > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
>     > >
>     > > Thoughts?
>     > >
>     > > Thanks,
>     > > Wangda Tan
>     >
>     >
>     >
>     > ---------------------------------------------------------------------
>     > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>     > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
>     >
>
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Wangda Tan <wh...@gmail.com>.

Eric,
Thanks for reconsidering. We will definitely do our best not to break
compatibility, etc., just as we have with other components!

I really appreciate everybody's support, thoughts, and suggestions shared on
this thread. Given that the discussion went very positively, I will go ahead
and start a voting thread.

Best,
Wangda

On Fri, Feb 1, 2019 at 2:06 PM Eric Yang <ey...@hortonworks.com> wrote:

> If HDFS or YARN breaks compatibility with Submarine, Submarine will need to
> make a release to catch up with the latest Hadoop changes.  On the
> hadoop.apache.org website, the latest news may always have Submarine on
> top, repairing compatibility with the latest Hadoop.  This may overwhelm any
> interesting news that happens in the Hadoop space.  I would not like to see
> that happen, but it is unavoidable with an independent release cycle.  Maybe
> there is a good way to avoid this, with the help of release managers, to
> ensure that Hadoop and Submarine don’t break compatibility frequently.
>
>
>
> For me to lift my veto, release managers of independent release cycles
> need to take responsibility for ensuring that version X of Hadoop is tested
> with version Y of Submarine.  Release managers will have to do more work to
> ensure the defined combinations work.  The greater responsibility of
> release management comes with its own reward.  Seasoned PMC members may be
> nominated to become Apache Members, which will help Submarine enter the
> Apache Incubator when the time is right.  Hence, I will withdraw my veto
> and let Submarine set its own course.
>
>
>
> Good luck Wangda.
>
>
>
> Regards,
>
> Eric
>
>
>
> *From: *Wangda Tan <wh...@gmail.com>
> *Date: *Friday, February 1, 2019 at 10:52 AM
> *To: *Eric Yang <ey...@hortonworks.com>
> *Cc: *Weiwei Yang <ab...@gmail.com>, Xun Liu <ne...@163.com>,
> Hadoop Common <co...@hadoop.apache.org>, "yarn-dev@hadoop.apache.org"
> <ya...@hadoop.apache.org>, Hdfs-dev <hd...@hadoop.apache.org>, "
> mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>
> *Subject: *Re: [DISCUSS] Making submarine to different release model like
> Ozone
>
>
>
> Thanks everyone for sharing thoughts!
>
>
>
> Eric, I appreciate your suggestions. But there are many examples of
> separate releases, like Hive's storage API, Ozone, etc. For loosely coupled
> sub-projects, it is going to be great (at least for most of the users) to
> have separate releases so new features can be consumed and iterated on
> faster. From the above feedback from developers and users, I think it is
> also what people want.
>
>
>
> Another concern you mentioned is whether Submarine is aligned with Hadoop
> project goals. From the feedback, we can see that it encourages companies
> to continue using Hadoop to solve their ML/DL requirements. It has also
> created a good feedback loop: fixes for many issues Submarine faced, and
> some new functionality added for it, went back to Hadoop, such as
> localization of files and directories, GPU topology related enhancements,
> etc.
>
>
>
> We will definitely use this sub-project opportunity to grow both Submarine
> and Hadoop quickly, and try to get fast release cycles for both projects.
> As for your suggestion about the Apache Incubator, we can reconsider it
> once Submarine becomes a more independent project. Right now it is still
> too small, and the process carries too much overhead; I don't want to stall
> the fast-growing community for months to go through the incubator process
> for now.
>
>
>
> I really hope my comment can help you reconsider the veto. :)
>
>
>
> Thanks,
>
> Wangda
>
>
>
> On Fri, Feb 1, 2019 at 9:39 AM Eric Yang <ey...@hortonworks.com> wrote:
>
> Submarine is an application built for the YARN framework, but it does not
> have a strong dependency on YARN development.  For this kind of project, it
> would be best to enter the Apache Incubator to create a new community.
> Apache Commons is the only project other than the Incubator that has
> independent release cycles.  The collection is large, and the project goal
> is ambitious.  No one really knows which components work with each other in
> Apache Commons.  Hadoop is a much more focused project, a distributed
> computing framework and not an incubation sandbox.  To stay aligned with
> Hadoop goals, we want to prevent the Hadoop project from being overloaded
> while allowing good ideas to be carried forward in the Apache Incubator.
> Putting on my Apache Member hat, my vote is -1 on allowing more independent
> subproject release cycles in the Hadoop project that do not align with
> Hadoop project goals.
>
> The Apache Incubator process is highly recommended for Submarine:
> https://incubator.apache.org/policy/process.html This would allow Submarine
> to develop against older versions of Hadoop, just as Spark works with
> multiple versions of Hadoop.
>
> Regards,
> Eric
>
> On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
>
>     Thanks for proposing this, Wangda; my +1 as well.
>     It is amazing to see the progress made in Submarine last year; the
> community is growing fast and is quite collaborative. I can see the reasons
> to get it released faster on its own cycle. And at the same time, the Ozone
> way works very well.
>
>     —
>     Weiwei
>     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
>     > +1
>     >
>     > Hello everyone,
>     >
>     > I am Xun Liu, the head of the machine learning team at Netease
> Research Institute. I quite agree with Wangda.
>     >
>     > Our team is very grateful to the community for the Submarine machine
> learning engine.
>     > We are heavy users of Submarine.
>     > Because Submarine fits the direction of our big data team's Hadoop
> technology stack,
>     > it avoids the need to invest more manpower in learning other
> container scheduling systems.
>     > The important thing is that we can use a common YARN cluster to run
> machine learning,
>     > which makes the utilization of server resources more efficient and
> preserves much of the human and material investment of our previous years.
>     >
>     > Our team has finished testing and deploying Submarine and will
> provide the service to our e-commerce department (
> http://www.kaola.com/) shortly.
>     >
>     > We also plan to provide the Submarine engine in our existing YARN
> cluster in the next six months,
>     > because we have many product departments that need machine learning
> services,
>     > for example:
>     > 1) Game department (http://game.163.com/) needs AI battle training,
>     > 2) News department (http://www.163.com) needs news recommendation,
>     > 3) Mailbox department (http://www.163.com) requires anti-spam and
> illegal detection,
>     > 4) Music department (https://music.163.com/) requires music
> recommendation,
>     > 5) Education department (http://www.youdao.com) requires voice
> recognition,
>     > 6) Massive Open Online Courses (https://open.163.com/) requires
> multilingual translation and so on.
>     >
>     > If Submarine can be released independently like Ozone, it will help
> us quickly get the latest features and improvements, and it will be very
> helpful to our team and users.
>     >
>     > Thanks, Hadoop community!
>     >
>     >
>     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
>     > >
>     > > Hi devs,
>     > >
>     > > Since we started the Submarine-related effort last year, we have received
>     > > a lot of feedback. Several companies (such as Netease, China Mobile, etc.)
>     > > are trying to deploy Submarine to their Hadoop clusters alongside big data
>     > > workloads. LinkedIn also has a strong interest in contributing a Submarine
>     > > TonY (https://github.com/linkedin/TonY) runtime to allow users to use the
>     > > same interface.
>     > >
>     > > From what I can see, there are several issues with putting Submarine under
>     > > the yarn-applications directory and having the same release cycle as Hadoop:
>     > >
>     > > 1) We started the 3.2.0 release in Sep 2018, but the release was done in Jan
>     > > 2019. Because of unpredictable blockers and security issues, it was delayed
>     > > a lot. We need to iterate on Submarine fast at this point.
>     > >
>     > > 2) We also see a lot of demand to use Submarine on older Hadoop releases
>     > > such as 2.x. Many companies may not upgrade Hadoop to 3.x in a short time,
>     > > but the requirement to run deep learning is urgent for them. We should
>     > > decouple Submarine from the Hadoop version.
>     > >
>     > > And why do we want to keep it within Hadoop? First, Submarine includes some
>     > > innovative parts, such as user experience enhancements for YARN
>     > > services/containerization support, which we can add back to Hadoop later
>     > > to address common requirements. In addition, we have a big overlap in the
>     > > community developing and using it.
>     > >
>     > > There are several proposals we went through during the Ozone merge-to-trunk
>     > > discussion:
>     > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>     > >
>     > > I propose to adopt the Ozone model: the same master branch, a different
>     > > release cycle, and a different release branch. It is a great example of the
>     > > agile releases we can do (2 Ozone releases after Oct 2018) with less
>     > > overhead to set up CI, projects, etc.
>     > >
>     > > *Links:*
>     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>     > > - Design doc
>     > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
>     > > - User doc
>     > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
>     > > (3.2.0 release)
>     > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
>     > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
>     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>     > > - Talks: Strata Data Conf NY
>     > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
>     > >
>     > > Thoughts?
>     > >
>     > > Thanks,
>     > > Wangda Tan
>     >
>     >
>     >
>     > ---------------------------------------------------------------------
>     > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>     > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
>     >
>
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Eric Yang <ey...@hortonworks.com>.

If HDFS or YARN breaks compatibility with Submarine, Submarine will need to make a release to catch up with the latest Hadoop changes.  On the hadoop.apache.org website, the latest news may then always have Submarine on top, with releases that repair compatibility with the latest Hadoop.  This may overwhelm any interesting news in the Hadoop space.  I would not like to see that happen, but it is unavoidable with an independent release cycle.  Maybe there is a good way to avoid this, with the help of the release managers, by ensuring that Hadoop and Submarine don’t break compatibility frequently.

For me to lift my veto, the release managers of independent release cycles need to take responsibility for ensuring that version X of Hadoop is tested with version Y of Submarine.  Release managers will have to do more work to ensure the defined combinations work.  The greater responsibility of release management comes with its own reward: seasoned PMC members may be nominated to become Apache Members, which will help Submarine enter the Apache Incubator when the time is right.  Hence, I will withdraw my veto and let Submarine set its own course.

Good luck Wangda.

Regards,
Eric

From: Wangda Tan <wh...@gmail.com>
Date: Friday, February 1, 2019 at 10:52 AM
To: Eric Yang <ey...@hortonworks.com>
Cc: Weiwei Yang <ab...@gmail.com>, Xun Liu <ne...@163.com>, Hadoop Common <co...@hadoop.apache.org>, "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>, Hdfs-dev <hd...@hadoop.apache.org>, "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>
Subject: Re: [DISCUSS] Making submarine to different release model like Ozone

Thanks everyone for sharing thoughts!

Eric, I appreciate your suggestions. But there are many examples of separate releases, such as Hive's storage API, Ozone, etc. For loosely coupled sub-projects, it is going to be great (at least for most users) to have separate releases so that new features can be consumed and iterated on faster. From the feedback above from developers and users, I think it is also what people want.

Another concern you mentioned is whether Submarine is aligned with Hadoop project goals. From the feedback we can see that it attracts companies to continue using Hadoop to solve their ML/DL requirements. It has also created a good feedback loop: many issues found and some new functionality added by Submarine went back to Hadoop, such as localization of files and directories, GPU topology related enhancements, etc.

We will definitely use this sub-project opportunity to grow both Submarine and Hadoop quickly, and try to get fast release cycles for both projects. As for your suggestion about the Apache Incubator, we can reconsider it once Submarine becomes a more independent project. For now it is still too small, and going through the process is too much overhead; I don't want to stall the fast-growing community for months to go through the incubator process.

I really hope my comment can help you reconsider the veto. :)

Thanks,
Wangda

On Fri, Feb 1, 2019 at 9:39 AM Eric Yang <ey...@hortonworks.com> wrote:
Submarine is an application built for the YARN framework, but it does not have a strong dependency on YARN development.  For this kind of project, it would be best to enter the Apache Incubator to create a new community.  Apache Commons is the only project other than the Incubator that has independent release cycles.  The collection is large, and the project goal is ambitious.  No one really knows which components work with each other in Apache Commons.  Hadoop is a much more focused project, a distributed computing framework and not an incubation sandbox.  For alignment with Hadoop goals, we want to prevent the Hadoop project from being overloaded while allowing good ideas to be carried forward in the Apache Incubator.  Putting on my Apache Member hat, my vote is -1 on allowing more independent subproject release cycles in the Hadoop project that do not align with Hadoop project goals.

The Apache Incubator process is highly recommended for Submarine: https://incubator.apache.org/policy/process.html  This would allow Submarine to develop for older versions of Hadoop, just as Spark works with multiple versions of Hadoop.

Regards,
Eric

On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:

    Thanks for proposing this Wangda, my +1 as well.
    It is amazing to see the progress made in Submarine last year; the community is growing fast and is quite collaborative. I can see the reasons to release it faster on its own cycle. And at the same time, the Ozone way works very well.

    —
    Weiwei
    On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
    > +1
    >
    > Hello everyone,
    >
    > I am Xun Liu, the head of the machine learning team at Netease Research Institute. I quite agree with Wangda.
    >
    > Our team is very grateful to the community for the Submarine machine learning engine.
    > We are heavy users of Submarine.
    > Because Submarine fits the direction of our big data team's Hadoop technology stack,
    > it avoids the need to increase our manpower investment in learning other container scheduling systems.
    > The important thing is that we can use a common YARN cluster to run machine learning,
    > which makes the utilization of server resources more efficient and preserves the large human and material investment we made in previous years.
    >
    > Our team has finished testing and deploying Submarine and will provide the service to our e-commerce department (http://www.kaola.com/) shortly.
    >
    > We also plan to provide the Submarine engine in our existing YARN cluster in the next six months,
    > because many of our product departments need machine learning services,
    > for example:
    > 1) Game department (http://game.163.com/) needs AI battle training,
    > 2) News department (http://www.163.com) needs news recommendation,
    > 3) Mailbox department (http://www.163.com) requires anti-spam and illegal detection,
    > 4) Music department (https://music.163.com/) requires music recommendation,
    > 5) Education department (http://www.youdao.com) requires voice recognition,
    > 6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation and so on.
    >
    > If Submarine can be released independently like Ozone, it will help us quickly get the latest features and improvements, and it will be very helpful to our team and users.
    >
    > Thanks, Hadoop community!
    >
    >
    > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
    > >
    > > Hi devs,
    > >
    > > Since we started the Submarine-related effort last year, we have received
    > > a lot of feedback. Several companies (such as Netease, China Mobile, etc.)
    > > are trying to deploy Submarine to their Hadoop clusters alongside big data
    > > workloads. LinkedIn also has a strong interest in contributing a Submarine
    > > TonY (https://github.com/linkedin/TonY) runtime to allow users to use the
    > > same interface.
    > >
    > > From what I can see, there are several issues with putting Submarine under
    > > the yarn-applications directory and having the same release cycle as Hadoop:
    > >
    > > 1) We started the 3.2.0 release in Sep 2018, but the release was done in Jan
    > > 2019. Because of unpredictable blockers and security issues, it was delayed
    > > a lot. We need to iterate on Submarine fast at this point.
    > >
    > > 2) We also see a lot of demand to use Submarine on older Hadoop releases
    > > such as 2.x. Many companies may not upgrade Hadoop to 3.x in a short time,
    > > but the requirement to run deep learning is urgent for them. We should
    > > decouple Submarine from the Hadoop version.
    > >
    > > And why do we want to keep it within Hadoop? First, Submarine includes some
    > > innovative parts, such as user experience enhancements for YARN
    > > services/containerization support, which we can add back to Hadoop later
    > > to address common requirements. In addition, we have a big overlap in the
    > > community developing and using it.
    > >
    > > There are several proposals we went through during the Ozone merge-to-trunk
    > > discussion:
    > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
    > >
    > > I propose to adopt the Ozone model: the same master branch, a different
    > > release cycle, and a different release branch. It is a great example of the
    > > agile releases we can do (2 Ozone releases after Oct 2018) with less
    > > overhead to set up CI, projects, etc.
    > >
    > > *Links:*
    > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
    > > - Design doc
    > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
    > > - User doc
    > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
    > > (3.2.0 release)
    > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
    > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
    > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
    > > - Talks: Strata Data Conf NY
    > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
    > >
    > > Thoughts?
    > >
    > > Thanks,
    > > Wangda Tan
    >
    >
    >
    > ---------------------------------------------------------------------
    > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
    > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
    >

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Eric Yang <ey...@hortonworks.com>.

If HDFS or YARN breaks compatibility with Submarine, it will require to make release to catch up with the latest Hadoop changes.  On hadoop.apache.org website, the latest news may always have Submarine on top to repair compatibility with latest of Hadoop.  This may overwhelm any interesting news that may happen in Hadoop space.  I don’t like to see that happen, but unavoidable with independent release cycle.  Maybe there is a good way to avoid this with help of release manager to ensure that Hadoop/Submarine don’t break compatibility frequently.

For me to lift my veto, release managers of independent release cycles need to take responsibility to ensure X version of Hadoop is tested with Y version of Submarine.  Release managers will have to do more work to ensure the defined combination works.  With the greater responsibility of release management comes with its own reward.  Seasoned PMC may be nominated to become Apache Member, which will help with Submarine to enter Apache Incubator when time is right.  Hence, I will withdraw my veto and let Submarine set its own course.

Good luck Wangda.

Regards,
Eric

From: Wangda Tan <wh...@gmail.com>
Date: Friday, February 1, 2019 at 10:52 AM
To: Eric Yang <ey...@hortonworks.com>
Cc: Weiwei Yang <ab...@gmail.com>, Xun Liu <ne...@163.com>, Hadoop Common <co...@hadoop.apache.org>, "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>, Hdfs-dev <hd...@hadoop.apache.org>, "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>
Subject: Re: [DISCUSS] Making submarine to different release model like Ozone

Thanks everyone for sharing thoughts!

Eric, appreciate your suggestions. But there are many examples to have separate releases, like Hive's storage API, OZone, etc. For loosely coupled sub-projects, it gonna be great (at least for most of the users) to have separate releases so new features can be faster consumed and iterated. From above feedbacks from developers and users, I think it is also what people want.

Another concern you mentioned is Submarine is aligned with Hadoop project goals. From feedbacks we can see, it attracts companies continue using Hadoop to solve their ML/DL requirements, it also created a good feedback loop, many issues faced, and some new functionalities added by Submarine went back to Hadoop. Such as localization files, directories. GPU topology related enhancement, etc.

We will definitely use this sub-project opportunity to fast grow both Submarine and Hadoop, try to get fast release cycles for both of the projects. And for your suggestion about Apache incubator, we can reconsider it once Submarine becomes a more independent project, now it is still too small and too much overhead to go through the process, I don't want to stop the fast-growing community for months to go through incubator process for now.

I really hope my comment can help you reconsider the veto. :)

Thanks,
Wangda

On Fri, Feb 1, 2019 at 9:39 AM Eric Yang <ey...@hortonworks.com>> wrote:
Submarine is an application built for YARN framework, but it does not have strong dependency on YARN development.  For this kind of projects, it would be best to enter Apache Incubator cycles to create a new community.  Apache commons is the only project other than Incubator that has independent release cycles.  The collection is large, and the project goal is ambitious.  No one really knows which component works with each other in Apache commons.  Hadoop is a much more focused project on distributed computing framework and not incubation sandbox.  For alignment with Hadoop goals, and we want to prevent Hadoop project to be overloaded while allowing good ideas to be carried forwarded in Apache incubator.  Put on my Apache Member hat, my vote is -1 to allow more independent subproject release cycle in Hadoop project that does not align with Hadoop project goals.

Apache incubator process is highly recommended for Submarine: https://incubator.apache.org/policy/process.html This allows Submarine to develop for older version of Hadoop like Spark works with multiple versions of Hadoop.

Regards,
Eric

On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com>> wrote:

    Thanks for proposing this Wangda, my +1 as well.
    It is amazing to see the progress made in Submarine last year, the community grows fast and quiet collaborative. I can see the reasons to get it release faster in its own cycle. And at the same time, the Ozone way works very well.

    —
    Weiwei
    On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>>, wrote:
    > +1
    >
    > Hello everyone,
    >
    > I am Xun Liu, the head of the machine learning team at Netease Research Institute. I quite agree with Wangda.
    >
    > Our team is very grateful for getting Submarine machine learning engine from the community.
    > We are heavy users of Submarine.
    > Because Submarine fits into the direction of our big data team's hadoop technology stack,
    > It avoids the needs to increase the manpower investment in learning other container scheduling systems.
    > The important thing is that we can use a common YARN cluster to run machine learning,
    > which makes the utilization of server resources more efficient, and reserves a lot of human and material resources in our previous years.
    >
    > Our team have finished the test and deployment of the Submarine and will provide the service to our e-commerce department (http://www.kaola.com/) shortly.
    >
    > We also plan to provides the Submarine engine in our existing YARN cluster in the next six months.
    > Because we have a lot of product departments need to use machine learning services,
    > for example:
    > 1) Game department (http://game.163.com/) needs AI battle training,
    > 2) News department (http://www.163.com) needs news recommendation,
    > 3) Mailbox department (http://www.163.com) requires anti-spam and illegal detection,
    > 4) Music department (https://music.163.com/) requires music recommendation,
    > 5) Education department (http://www.youdao.com) requires voice recognition,
    > 6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation and so on.
    >
    > If Submarine can be released independently like Ozone, it will help us quickly get the latest features and improvements, and it will be great helpful to our team and users.
    >
    > Thanks hadoop Community!
    >
    >
    > > 在 2019年2月1日，上午2:53，Wangda Tan <wh...@gmail.com>> 写道：
    > >
    > > Hi devs,
    > >
    > > Since we started submarine-related effort last year, we received a lot of
    > > feedbacks, several companies (such as Netease, China Mobile, etc.) are
    > > trying to deploy Submarine to their Hadoop cluster along with big data
    > > workloads. Linkedin also has big interests to contribute a Submarine TonY (
    > > https://github.com/linkedin/TonY) runtime to allow users to use the same
    > > interface.
    > >
    > > From what I can see, there're several issues of putting Submarine under
    > > yarn-applications directory and have same release cycle with Hadoop:
    > >
    > > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
    > > 2019. Because of non-predictable blockers and security issues, it got
    > > delayed a lot. We need to iterate submarine fast at this point.
    > >
    > > 2) We also see a lot of requirements to use Submarine on older Hadoop
    > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
    > > short time, but the requirement to run deep learning is urgent to them. We
    > > should decouple Submarine from Hadoop version.
    > >
    > > And why we wanna to keep it within Hadoop? First, Submarine included some
    > > innovation parts such as enhancements of user experiences for YARN
    > > services/containerization support which we can add it back to Hadoop later
    > > to address common requirements. In addition to that, we have a big overlap
    > > in the community developing and using it.
    > >
    > > There're several proposals we have went through during Ozone merge to trunk
    > > discussion:
    > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
    > >
    > > I propose to adopt Ozone model: which is the same master branch, different
    > > release cycle, and different release branch. It is a great example to show
    > > agile release we can do (2 Ozone releases after Oct 2018) with less
    > > overhead to setup CI, projects, etc.
    > >
    > > *Links:*
    > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
    > > - Design doc
    > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
    > > - User doc
    > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
    > > (3.2.0
    > > release)
    > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
    > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
    > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
    > > - Talks: Strata Data Conf NY
    > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
    > >
    > > Thoughts?
    > >
    > > Thanks,
    > > Wangda Tan
    >
    >
    >
    > ---------------------------------------------------------------------
    > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
    > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
    >

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Wangda Tan <wh...@gmail.com>.

Thanks everyone for sharing thoughts!

Eric, I appreciate your suggestions. But there are many examples of
separate releases, like Hive's storage API, Ozone, etc. For loosely coupled
sub-projects, it is going to be great (at least for most users) to have
separate releases so new features can be consumed and iterated on faster.
From the above feedback from developers and users, I think it is also what
people want.

Another concern you mentioned is whether Submarine is aligned with Hadoop
project goals. From the feedback we can see that it attracts companies to
continue using Hadoop to solve their ML/DL requirements. It also created a
good feedback loop: many issues faced and some new functionality added by
Submarine went back to Hadoop, such as localization of files and
directories, GPU topology related enhancements, etc.

We will definitely use this sub-project opportunity to grow both Submarine
and Hadoop fast, and try to get fast release cycles for both projects. As
for your suggestion about the Apache Incubator, we can reconsider it once
Submarine becomes a more independent project. For now it is still too
small, and the process is too much overhead; I don't want to stall the
fast-growing community for months to go through the incubator process.

I really hope my comment can help you reconsider the veto. :)

Thanks,
Wangda

On Fri, Feb 1, 2019 at 9:39 AM Eric Yang <ey...@hortonworks.com> wrote:

> Submarine is an application built for the YARN framework, but it does not
> have a strong dependency on YARN development.  For this kind of project, it
> would be best to enter the Apache Incubator to create a new community.
> Apache Commons is the only project other than Incubator that has independent
> release cycles.  The collection is large, and the project goal is
> ambitious.  No one really knows which components work with each other in
> Apache Commons.  Hadoop is a much more focused project, a distributed
> computing framework and not an incubation sandbox.  To stay aligned with
> Hadoop goals, we want to prevent the Hadoop project from being overloaded
> while allowing good ideas to be carried forward in the Apache Incubator.
> Putting on my Apache Member hat, my vote is -1 on allowing more independent
> subproject release cycles in the Hadoop project that do not align with
> Hadoop project goals.
>
> The Apache Incubator process is highly recommended for Submarine:
> https://incubator.apache.org/policy/process.html This would allow Submarine
> to develop against older versions of Hadoop, the way Spark works with
> multiple versions of Hadoop.
>
> Regards,
> Eric
>
> On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
>
>     Thanks for proposing this Wangda, my +1 as well.
>     It is amazing to see the progress made in Submarine last year; the
>     community grows fast and is quite collaborative. I can see the reasons
>     to get it released faster in its own cycle. And at the same time, the
>     Ozone way works very well.
>
>     —
>     Weiwei
>     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
>     > +1
>     >
>     > Hello everyone,
>     >
>     > I am Xun Liu, the head of the machine learning team at Netease Research Institute. I quite agree with Wangda.
>     >
>     > Our team is very grateful to get the Submarine machine learning engine from the community.
>     > We are heavy users of Submarine.
>     > Because Submarine fits into the direction of our big data team's Hadoop technology stack,
>     > it avoids the need to increase the manpower investment in learning other container scheduling systems.
>     > The important thing is that we can use a common YARN cluster to run machine learning,
>     > which makes the utilization of server resources more efficient and preserves much of the human and material investment we made in previous years.
>     >
>     > Our team has finished the testing and deployment of Submarine and will provide the service to our e-commerce department (http://www.kaola.com/) shortly.
>     >
>     > We also plan to provide the Submarine engine in our existing YARN cluster in the next six months,
>     > because we have many product departments that need to use machine learning services,
>     > for example:
>     > 1) Game department (http://game.163.com/) needs AI battle training,
>     > 2) News department (http://www.163.com) needs news recommendation,
>     > 3) Mailbox department (http://www.163.com) requires anti-spam and illegal detection,
>     > 4) Music department (https://music.163.com/) requires music recommendation,
>     > 5) Education department (http://www.youdao.com) requires voice recognition,
>     > 6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation and so on.
>     >
>     > If Submarine can be released independently like Ozone, it will help us quickly get the latest features and improvements, and it will be very helpful to our team and users.
>     >
>     > Thanks, Hadoop community!
>     >
>     >
>     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
>     > >
>     > > Hi devs,
>     > >
>     > > Since we started submarine-related effort last year, we received a lot of
>     > > feedbacks, several companies (such as Netease, China Mobile, etc.) are
>     > > trying to deploy Submarine to their Hadoop cluster along with big data
>     > > workloads. Linkedin also has big interests to contribute a Submarine TonY (
>     > > https://github.com/linkedin/TonY) runtime to allow users to use the same
>     > > interface.
>     > >
>     > > From what I can see, there're several issues of putting Submarine under
>     > > yarn-applications directory and have same release cycle with Hadoop:
>     > >
>     > > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
>     > > 2019. Because of non-predictable blockers and security issues, it got
>     > > delayed a lot. We need to iterate submarine fast at this point.
>     > >
>     > > 2) We also see a lot of requirements to use Submarine on older Hadoop
>     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
>     > > short time, but the requirement to run deep learning is urgent to them. We
>     > > should decouple Submarine from Hadoop version.
>     > >
>     > > And why we wanna to keep it within Hadoop? First, Submarine included some
>     > > innovation parts such as enhancements of user experiences for YARN
>     > > services/containerization support which we can add it back to Hadoop later
>     > > to address common requirements. In addition to that, we have a big overlap
>     > > in the community developing and using it.
>     > >
>     > > There're several proposals we have went through during Ozone merge to trunk
>     > > discussion:
>     > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>     > >
>     > > I propose to adopt Ozone model: which is the same master branch, different
>     > > release cycle, and different release branch. It is a great example to show
>     > > agile release we can do (2 Ozone releases after Oct 2018) with less
>     > > overhead to setup CI, projects, etc.
>     > >
>     > > *Links:*
>     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>     > > - Design doc
>     > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
>     > > - User doc
>     > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
>     > > (3.2.0
>     > > release)
>     > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
>     > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
>     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>     > > - Talks: Strata Data Conf NY
>     > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
>     > >
>     > > Thoughts?
>     > >
>     > > Thanks,
>     > > Wangda Tan
>     >
>     >
>     >
>     > ---------------------------------------------------------------------
>     > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>     > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
>     >
>
>
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Wangda Tan <wh...@gmail.com>.

Thanks everyone for sharing thoughts!

Eric, appreciate your suggestions. But there are many examples to have
separate releases, like Hive's storage API, OZone, etc. For loosely coupled
sub-projects, it gonna be great (at least for most of the users) to have
separate releases so new features can be faster consumed and iterated. From
above feedbacks from developers and users, I think it is also what people
want.

Another concern you mentioned is Submarine is aligned with Hadoop project
goals. From feedbacks we can see, it attracts companies continue using
Hadoop to solve their ML/DL requirements, it also created a good feedback
loop, many issues faced, and some new functionalities added by Submarine
went back to Hadoop. Such as localization files, directories. GPU topology
related enhancement, etc.

We will definitely use this sub-project opportunity to fast grow both
Submarine and Hadoop, try to get fast release cycles for both of the
projects. And for your suggestion about Apache incubator, we can reconsider
it once Submarine becomes a more independent project, now it is still too
small and too much overhead to go through the process, I don't want to stop
the fast-growing community for months to go through incubator process for
now.

I really hope my comment can help you reconsider the veto. :)

Thanks,
Wangda

On Fri, Feb 1, 2019 at 9:39 AM Eric Yang <ey...@hortonworks.com> wrote:

> Submarine is an application built for YARN framework, but it does not have
> strong dependency on YARN development.  For this kind of projects, it would
> be best to enter Apache Incubator cycles to create a new community.  Apache
> commons is the only project other than Incubator that has independent
> release cycles.  The collection is large, and the project goal is
> ambitious.  No one really knows which component works with each other in
> Apache commons.  Hadoop is a much more focused project on distributed
> computing framework and not incubation sandbox.  For alignment with Hadoop
> goals, and we want to prevent Hadoop project to be overloaded while
> allowing good ideas to be carried forwarded in Apache incubator.  Put on my
> Apache Member hat, my vote is -1 to allow more independent subproject
> release cycle in Hadoop project that does not align with Hadoop project
> goals.
>
> Apache incubator process is highly recommended for Submarine:
> https://incubator.apache.org/policy/process.html This allows Submarine to
> develop for older version of Hadoop like Spark works with multiple versions
> of Hadoop.
>
> Regards,
> Eric
>
> On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
>
>     Thanks for proposing this Wangda, my +1 as well.
>     It is amazing to see the progress made in Submarine last year, the
> community grows fast and quiet collaborative. I can see the reasons to get
> it release faster in its own cycle. And at the same time, the Ozone way
> works very well.
>
>     —
>     Weiwei
>     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
>     > +1
>     >
>     > Hello everyone,
>     >
>     > I am Xun Liu, the head of the machine learning team at Netease
> Research Institute. I quite agree with Wangda.
>     >
>     > Our team is very grateful for getting Submarine machine learning
> engine from the community.
>     > We are heavy users of Submarine.
>     > Because Submarine fits into the direction of our big data team's
> hadoop technology stack,
>     > It avoids the needs to increase the manpower investment in learning
> other container scheduling systems.
>     > The important thing is that we can use a common YARN cluster to run
> machine learning,
>     > which makes the utilization of server resources more efficient, and
> reserves a lot of human and material resources in our previous years.
>     >
>     > Our team have finished the test and deployment of the Submarine and
> will provide the service to our e-commerce department (
> http://www.kaola.com/) shortly.
>     >
>     > We also plan to provides the Submarine engine in our existing YARN
> cluster in the next six months.
>     > Because we have a lot of product departments need to use machine
> learning services,
>     > for example:
>     > 1) Game department (http://game.163.com/) needs AI battle training,
>     > 2) News department (http://www.163.com) needs news recommendation,
>     > 3) Mailbox department (http://www.163.com) requires anti-spam and
> illegal detection,
>     > 4) Music department (https://music.163.com/) requires music
> recommendation,
>     > 5) Education department (http://www.youdao.com) requires voice
> recognition,
>     > 6) Massive Open Online Courses (https://open.163.com/) requires
> multilingual translation and so on.
>     >
>     > If Submarine can be released independently like Ozone, it will help
> us quickly get the latest features and improvements, and it will be great
> helpful to our team and users.
>     >
>     > Thanks hadoop Community!
>     >
>     >
>     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
>     > >
>     > > Hi devs,
>     > >
>     > > Since we started submarine-related effort last year, we received a
> lot of
>     > > feedbacks, several companies (such as Netease, China Mobile, etc.)
> are
>     > > trying to deploy Submarine to their Hadoop cluster along with big
> data
>     > > workloads. Linkedin also has big interests to contribute a
> Submarine TonY (
>     > > https://github.com/linkedin/TonY) runtime to allow users to use
> the same
>     > > interface.
>     > >
>     > > From what I can see, there're several issues of putting Submarine
> under
>     > > yarn-applications directory and have same release cycle with
> Hadoop:
>     > >
>     > > 1) We started 3.2.0 release at Sep 2018, but the release is done
> at Jan
>     > > 2019. Because of non-predictable blockers and security issues, it
> got
>     > > delayed a lot. We need to iterate submarine fast at this point.
>     > >
>     > > 2) We also see a lot of requirements to use Submarine on older
> Hadoop
>     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
> in a
>     > > short time, but the requirement to run deep learning is urgent to
> them. We
>     > > should decouple Submarine from Hadoop version.
>     > >
>     > > And why we wanna to keep it within Hadoop? First, Submarine
> included some
>     > > innovation parts such as enhancements of user experiences for YARN
>     > > services/containerization support which we can add it back to
> Hadoop later
>     > > to address common requirements. In addition to that, we have a big
> overlap
>     > > in the community developing and using it.
>     > >
>     > > There're several proposals we have went through during Ozone merge
> to trunk
>     > > discussion:
>     > >
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>     > >
>     > > I propose to adopt Ozone model: which is the same master branch,
> different
>     > > release cycle, and different release branch. It is a great example
> to show
>     > > agile release we can do (2 Ozone releases after Oct 2018) with less
>     > > overhead to setup CI, projects, etc.
>     > >
>     > > *Links:*
>     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>     > > - Design doc
>     > > <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
>     > > - User doc
>     > > <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
>     > > (3.2.0
>     > > release)
>     > > - Blogposts, {Submarine} : Running deep learning workloads on
> Apache Hadoop
>     > > <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
>     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>     > > - Talks: Strata Data Conf NY
>     > > <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>     > >
>     > > Thoughts?
>     > >
>     > > Thanks,
>     > > Wangda Tan
>     >
>     >
>     >
>     > ---------------------------------------------------------------------
>     > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>     > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
>     >
>
>
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Suma Shivaprasad <su...@gmail.com>.

+1. Thanks for bringing this up Wangda.

Makes sense to have Submarine follow its own release cadence given the good
momentum/adoption so far. Also, making it run with older versions of Hadoop
would drive higher adoption.

Suma

On Fri, Feb 1, 2019 at 9:40 AM Eric Yang <ey...@hortonworks.com> wrote:

> Submarine is an application built for YARN framework, but it does not have
> strong dependency on YARN development.  For this kind of projects, it would
> be best to enter Apache Incubator cycles to create a new community.  Apache
> commons is the only project other than Incubator that has independent
> release cycles.  The collection is large, and the project goal is
> ambitious.  No one really knows which component works with each other in
> Apache commons.  Hadoop is a much more focused project on distributed
> computing framework and not incubation sandbox.  For alignment with Hadoop
> goals, and we want to prevent Hadoop project to be overloaded while
> allowing good ideas to be carried forwarded in Apache incubator.  Put on my
> Apache Member hat, my vote is -1 to allow more independent subproject
> release cycle in Hadoop project that does not align with Hadoop project
> goals.
>
> Apache incubator process is highly recommended for Submarine:
> https://incubator.apache.org/policy/process.html This allows Submarine to
> develop for older version of Hadoop like Spark works with multiple versions
> of Hadoop.
>
> Regards,
> Eric
>
> On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:
>
>     Thanks for proposing this Wangda, my +1 as well.
>     It is amazing to see the progress made in Submarine last year, the
> community grows fast and is quite collaborative. I can see the reasons to get
> it released faster in its own cycle. And at the same time, the Ozone way
> works very well.
>
>     —
>     Weiwei
>     On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
>     > +1
>     >
>     > Hello everyone,
>     >
>     > I am Xun Liu, the head of the machine learning team at Netease
> Research Institute. I quite agree with Wangda.
>     >
>     > Our team is very grateful for getting Submarine machine learning
> engine from the community.
>     > We are heavy users of Submarine.
>     > Because Submarine fits into the direction of our big data team's
> hadoop technology stack,
>     > It avoids the needs to increase the manpower investment in learning
> other container scheduling systems.
>     > The important thing is that we can use a common YARN cluster to run
> machine learning,
>     > which makes the utilization of server resources more efficient, and
> reserves a lot of human and material resources in our previous years.
>     >
>     > Our team have finished the test and deployment of the Submarine and
> will provide the service to our e-commerce department (
> http://www.kaola.com/) shortly.
>     >
>     > We also plan to provides the Submarine engine in our existing YARN
> cluster in the next six months.
>     > Because we have a lot of product departments need to use machine
> learning services,
>     > for example:
>     > 1) Game department (http://game.163.com/) needs AI battle training,
>     > 2) News department (http://www.163.com) needs news recommendation,
>     > 3) Mailbox department (http://www.163.com) requires anti-spam and
> illegal detection,
>     > 4) Music department (https://music.163.com/) requires music
> recommendation,
>     > 5) Education department (http://www.youdao.com) requires voice
> recognition,
>     > 6) Massive Open Online Courses (https://open.163.com/) requires
> multilingual translation and so on.
>     >
>     > If Submarine can be released independently like Ozone, it will help
> us quickly get the latest features and improvements, and it will be great
> helpful to our team and users.
>     >
>     > Thanks hadoop Community!
>     >
>     >
>     > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
>     > >
>     > > Hi devs,
>     > >
>     > > Since we started submarine-related effort last year, we received a
> lot of
>     > > feedbacks, several companies (such as Netease, China Mobile, etc.)
> are
>     > > trying to deploy Submarine to their Hadoop cluster along with big
> data
>     > > workloads. Linkedin also has big interests to contribute a
> Submarine TonY (
>     > > https://github.com/linkedin/TonY) runtime to allow users to use
> the same
>     > > interface.
>     > >
>     > > From what I can see, there're several issues of putting Submarine
> under
>     > > yarn-applications directory and have same release cycle with
> Hadoop:
>     > >
>     > > 1) We started 3.2.0 release at Sep 2018, but the release is done
> at Jan
>     > > 2019. Because of non-predictable blockers and security issues, it
> got
>     > > delayed a lot. We need to iterate submarine fast at this point.
>     > >
>     > > 2) We also see a lot of requirements to use Submarine on older
> Hadoop
>     > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x
> in a
>     > > short time, but the requirement to run deep learning is urgent to
> them. We
>     > > should decouple Submarine from Hadoop version.
>     > >
>     > > And why we wanna to keep it within Hadoop? First, Submarine
> included some
>     > > innovation parts such as enhancements of user experiences for YARN
>     > > services/containerization support which we can add it back to
> Hadoop later
>     > > to address common requirements. In addition to that, we have a big
> overlap
>     > > in the community developing and using it.
>     > >
>     > > There're several proposals we have went through during Ozone merge
> to trunk
>     > > discussion:
>     > >
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>     > >
>     > > I propose to adopt Ozone model: which is the same master branch,
> different
>     > > release cycle, and different release branch. It is a great example
> to show
>     > > agile release we can do (2 Ozone releases after Oct 2018) with less
>     > > overhead to setup CI, projects, etc.
>     > >
>     > > *Links:*
>     > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
>     > > - Design doc
>     > > <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
>     > > - User doc
>     > > <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
>     > > (3.2.0
>     > > release)
>     > > - Blogposts, {Submarine} : Running deep learning workloads on
> Apache Hadoop
>     > > <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
>     > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
>     > > - Talks: Strata Data Conf NY
>     > > <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>     > >
>     > > Thoughts?
>     > >
>     > > Thanks,
>     > > Wangda Tan
>     >
>     >
>     >
>     > ---------------------------------------------------------------------
>     > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>     > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
>     >
>
>
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Eric Yang <ey...@hortonworks.com>.

Submarine is an application built for the YARN framework, but it does not have a strong dependency on YARN development.  For this kind of project, it would be best to enter the Apache Incubator to create a new community.  Apache Commons is the only project other than Incubator that has independent release cycles.  The collection is large, and the project goal is ambitious.  No one really knows which components work with each other in Apache Commons.  Hadoop is a much more focused project, a distributed computing framework and not an incubation sandbox.  To stay aligned with Hadoop goals, we want to prevent the Hadoop project from being overloaded while allowing good ideas to be carried forward in the Apache Incubator.  Putting on my Apache Member hat, my vote is -1 to allowing more independent subproject release cycles in the Hadoop project that do not align with Hadoop project goals.

The Apache Incubator process is highly recommended for Submarine: https://incubator.apache.org/policy/process.html This would allow Submarine to develop against older versions of Hadoop, just as Spark works with multiple versions of Hadoop.

Regards,
Eric

On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:

    Thanks for proposing this Wangda, my +1 as well.
    It is amazing to see the progress made in Submarine last year, the community grows fast and is quite collaborative. I can see the reasons to get it released faster in its own cycle. And at the same time, the Ozone way works very well.
    
    —
    Weiwei
    On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
    > +1
    >
    > Hello everyone,
    >
    > I am Xun Liu, the head of the machine learning team at Netease Research Institute. I quite agree with Wangda.
    >
    > Our team is very grateful for getting the Submarine machine learning engine from the community.
    > We are heavy users of Submarine.
    > Because Submarine fits into the direction of our big data team's Hadoop technology stack,
    > it avoids the need to increase the manpower investment in learning other container scheduling systems.
    > The important thing is that we can use a common YARN cluster to run machine learning,
    > which makes the utilization of server resources more efficient, and preserves much of the human and material investment from our previous years.
    >
    > Our team has finished testing and deploying Submarine and will provide the service to our e-commerce department (http://www.kaola.com/) shortly.
    >
    > We also plan to provide the Submarine engine in our existing YARN cluster in the next six months,
    > because we have a lot of product departments that need to use machine learning services,
    > for example:
    > 1) Game department (http://game.163.com/) needs AI battle training,
    > 2) News department (http://www.163.com) needs news recommendation,
    > 3) Mailbox department (http://www.163.com) requires anti-spam and illegal detection,
    > 4) Music department (https://music.163.com/) requires music recommendation,
    > 5) Education department (http://www.youdao.com) requires voice recognition,
    > 6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation and so on.
    >
    > If Submarine can be released independently like Ozone, it will help us quickly get the latest features and improvements, and it will be greatly helpful to our team and users.
    >
    > Thanks hadoop Community!
    >
    >
    > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
    > >
    > > Hi devs,
    > >
    > > Since we started submarine-related effort last year, we received a lot of
    > > feedbacks, several companies (such as Netease, China Mobile, etc.) are
    > > trying to deploy Submarine to their Hadoop cluster along with big data
    > > workloads. Linkedin also has big interests to contribute a Submarine TonY (
    > > https://github.com/linkedin/TonY) runtime to allow users to use the same
    > > interface.
    > >
    > > From what I can see, there're several issues of putting Submarine under
    > > yarn-applications directory and have same release cycle with Hadoop:
    > >
    > > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
    > > 2019. Because of non-predictable blockers and security issues, it got
    > > delayed a lot. We need to iterate submarine fast at this point.
    > >
    > > 2) We also see a lot of requirements to use Submarine on older Hadoop
    > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
    > > short time, but the requirement to run deep learning is urgent to them. We
    > > should decouple Submarine from Hadoop version.
    > >
    > > And why we wanna to keep it within Hadoop? First, Submarine included some
    > > innovation parts such as enhancements of user experiences for YARN
    > > services/containerization support which we can add it back to Hadoop later
    > > to address common requirements. In addition to that, we have a big overlap
    > > in the community developing and using it.
    > >
    > > There're several proposals we have went through during Ozone merge to trunk
    > > discussion:
    > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
    > >
    > > I propose to adopt Ozone model: which is the same master branch, different
    > > release cycle, and different release branch. It is a great example to show
    > > agile release we can do (2 Ozone releases after Oct 2018) with less
    > > overhead to setup CI, projects, etc.
    > >
    > > *Links:*
    > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
    > > - Design doc
    > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
    > > - User doc
    > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
    > > (3.2.0
    > > release)
    > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
    > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
    > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
    > > - Talks: Strata Data Conf NY
    > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
    > >
    > > Thoughts?
    > >
    > > Thanks,
    > > Wangda Tan
    >
    >
    >
    > ---------------------------------------------------------------------
    > To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
    > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
    >

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Eric Yang <ey...@hortonworks.com>.

Submarine is an application built for the YARN framework, but it does not have a strong dependency on YARN development.  For this kind of project, it would be best to enter the Apache Incubator to create a new community.  Apache Commons is the only project other than Incubator that has independent release cycles.  Its collection is large, and its project goal is ambitious.  No one really knows which components work with each other in Apache Commons.  Hadoop is a much more focused project: a distributed computing framework, not an incubation sandbox.  To stay aligned with Hadoop's goals, we want to prevent the Hadoop project from being overloaded while allowing good ideas to be carried forward in the Apache Incubator.  Putting on my Apache Member hat, my vote is -1 to allowing more independent subproject release cycles in the Hadoop project that do not align with Hadoop project goals.

The Apache Incubator process is highly recommended for Submarine: https://incubator.apache.org/policy/process.html This would allow Submarine to be developed for older versions of Hadoop, just as Spark works with multiple versions of Hadoop.

Regards,
Eric

On 1/31/19, 10:51 PM, "Weiwei Yang" <ab...@gmail.com> wrote:

    Thanks for proposing this, Wangda; my +1 as well.
    It is amazing to see the progress made in Submarine last year; the community is growing fast and is quite collaborative. I can see the reasons to release it faster on its own cycle. And at the same time, the Ozone way works very well.
    
    —
    Weiwei
    On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
    > +1
    >
    > Hello everyone,
    >
    > I am Xun Liu, the head of the machine learning team at Netease Research Institute. I quite agree with Wangda.
    >
    > Our team is very grateful for getting Submarine machine learning engine from the community.
    > We are heavy users of Submarine.
    > Because Submarine fits into the direction of our big data team's hadoop technology stack,
    > It avoids the needs to increase the manpower investment in learning other container scheduling systems.
    > The important thing is that we can use a common YARN cluster to run machine learning,
    > which makes the utilization of server resources more efficient, and reserves a lot of human and material resources in our previous years.
    >
    > Our team have finished the test and deployment of the Submarine and will provide the service to our e-commerce department (http://www.kaola.com/) shortly.
    >
    > We also plan to provides the Submarine engine in our existing YARN cluster in the next six months.
    > Because we have a lot of product departments need to use machine learning services,
    > for example:
    > 1) Game department (http://game.163.com/) needs AI battle training,
    > 2) News department (http://www.163.com) needs news recommendation,
    > 3) Mailbox department (http://www.163.com) requires anti-spam and illegal detection,
    > 4) Music department (https://music.163.com/) requires music recommendation,
    > 5) Education department (http://www.youdao.com) requires voice recognition,
    > 6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation and so on.
    >
    > If Submarine can be released independently like Ozone, it will help us quickly get the latest features and improvements, and it will be great helpful to our team and users.
    >
    > Thanks hadoop Community!
    >
    >
    > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
    > >
    > > Hi devs,
    > >
    > > Since we started submarine-related effort last year, we received a lot of
    > > feedbacks, several companies (such as Netease, China Mobile, etc.) are
    > > trying to deploy Submarine to their Hadoop cluster along with big data
    > > workloads. Linkedin also has big interests to contribute a Submarine TonY (
    > > https://github.com/linkedin/TonY) runtime to allow users to use the same
    > > interface.
    > >
    > > From what I can see, there're several issues of putting Submarine under
    > > yarn-applications directory and have same release cycle with Hadoop:
    > >
    > > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
    > > 2019. Because of non-predictable blockers and security issues, it got
    > > delayed a lot. We need to iterate submarine fast at this point.
    > >
    > > 2) We also see a lot of requirements to use Submarine on older Hadoop
    > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
    > > short time, but the requirement to run deep learning is urgent to them. We
    > > should decouple Submarine from Hadoop version.
    > >
    > > And why we wanna to keep it within Hadoop? First, Submarine included some
    > > innovation parts such as enhancements of user experiences for YARN
    > > services/containerization support which we can add it back to Hadoop later
    > > to address common requirements. In addition to that, we have a big overlap
    > > in the community developing and using it.
    > >
    > > There're several proposals we have went through during Ozone merge to trunk
    > > discussion:
    > > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
    > >
    > > I propose to adopt Ozone model: which is the same master branch, different
    > > release cycle, and different release branch. It is a great example to show
    > > agile release we can do (2 Ozone releases after Oct 2018) with less
    > > overhead to setup CI, projects, etc.
    > >
    > > *Links:*
    > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
    > > - Design doc
    > > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
    > > - User doc
    > > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
    > > (3.2.0
    > > release)
    > > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
    > > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
    > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
    > > - Talks: Strata Data Conf NY
    > > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
    > >
    > > Thoughts?
    > >
    > > Thanks,
    > > Wangda Tan

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Zhe Zhang <zh...@apache.org>.

+1 on the proposal and looking forward to the progress of the project!

On Thu, Jan 31, 2019 at 10:51 PM Weiwei Yang <ab...@gmail.com> wrote:

> Thanks for proposing this, Wangda; my +1 as well.
> It is amazing to see the progress made in Submarine last year; the
> community is growing fast and is quite collaborative. I can see the reasons
> to release it faster on its own cycle. And at the same time, the Ozone way
> works very well.
>
> —
> Weiwei
> On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
> > +1
> >
> > Hello everyone,
> >
> > I am Xun Liu, the head of the machine learning team at Netease Research
> Institute. I quite agree with Wangda.
> >
> > Our team is very grateful for getting Submarine machine learning engine
> from the community.
> > We are heavy users of Submarine.
> > Because Submarine fits into the direction of our big data team's hadoop
> technology stack,
> > It avoids the needs to increase the manpower investment in learning
> other container scheduling systems.
> > The important thing is that we can use a common YARN cluster to run
> machine learning,
> > which makes the utilization of server resources more efficient, and
> reserves a lot of human and material resources in our previous years.
> >
> > Our team have finished the test and deployment of the Submarine and will
> provide the service to our e-commerce department (http://www.kaola.com/)
> shortly.
> >
> > We also plan to provides the Submarine engine in our existing YARN
> cluster in the next six months.
> > Because we have a lot of product departments need to use machine
> learning services,
> > for example:
> > 1) Game department (http://game.163.com/) needs AI battle training,
> > 2) News department (http://www.163.com) needs news recommendation,
> > 3) Mailbox department (http://www.163.com) requires anti-spam and
> illegal detection,
> > 4) Music department (https://music.163.com/) requires music
> recommendation,
> > 5) Education department (http://www.youdao.com) requires voice
> recognition,
> > 6) Massive Open Online Courses (https://open.163.com/) requires
> multilingual translation and so on.
> >
> > If Submarine can be released independently like Ozone, it will help us
> quickly get the latest features and improvements, and it will be great
> helpful to our team and users.
> >
> > Thanks hadoop Community!
> >
> >
> > > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
> > >
> > > Hi devs,
> > >
> > > Since we started submarine-related effort last year, we received a lot
> of
> > > feedbacks, several companies (such as Netease, China Mobile, etc.) are
> > > trying to deploy Submarine to their Hadoop cluster along with big data
> > > workloads. Linkedin also has big interests to contribute a Submarine
> TonY (
> > > https://github.com/linkedin/TonY) runtime to allow users to use the
> same
> > > interface.
> > >
> > > From what I can see, there're several issues of putting Submarine under
> > > yarn-applications directory and have same release cycle with Hadoop:
> > >
> > > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> > > 2019. Because of non-predictable blockers and security issues, it got
> > > delayed a lot. We need to iterate submarine fast at this point.
> > >
> > > 2) We also see a lot of requirements to use Submarine on older Hadoop
> > > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> > > short time, but the requirement to run deep learning is urgent to
> them. We
> > > should decouple Submarine from Hadoop version.
> > >
> > > And why we wanna to keep it within Hadoop? First, Submarine included
> some
> > > innovation parts such as enhancements of user experiences for YARN
> > > services/containerization support which we can add it back to Hadoop
> later
> > > to address common requirements. In addition to that, we have a big
> overlap
> > > in the community developing and using it.
> > >
> > > There're several proposals we have went through during Ozone merge to
> trunk
> > > discussion:
> > >
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
> > >
> > > I propose to adopt Ozone model: which is the same master branch,
> different
> > > release cycle, and different release branch. It is a great example to
> show
> > > agile release we can do (2 Ozone releases after Oct 2018) with less
> > > overhead to setup CI, projects, etc.
> > >
> > > *Links:*
> > > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> > > - Design doc
> > > <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
> > > - User doc
> > > <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
> > > (3.2.0
> > > release)
> > > - Blogposts, {Submarine} : Running deep learning workloads on Apache
> Hadoop
> > > <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
> > > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> > > - Talks: Strata Data Conf NY
> > > <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Wangda Tan

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Weiwei Yang <ab...@gmail.com>.

Thanks for proposing this, Wangda; my +1 as well.
It is amazing to see the progress made in Submarine last year; the community is growing fast and is quite collaborative. I can see the reasons to release it faster on its own cycle. And at the same time, the Ozone way works very well.

—
Weiwei
On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
> +1
>
> Hello everyone,
>
> I am Xun Liu, the head of the machine learning team at Netease Research Institute. I quite agree with Wangda.
>
> Our team is very grateful for getting Submarine machine learning engine from the community.
> We are heavy users of Submarine.
> Because Submarine fits into the direction of our big data team's hadoop technology stack,
> It avoids the needs to increase the manpower investment in learning other container scheduling systems.
> The important thing is that we can use a common YARN cluster to run machine learning,
> which makes the utilization of server resources more efficient, and reserves a lot of human and material resources in our previous years.
>
> Our team have finished the test and deployment of the Submarine and will provide the service to our e-commerce department (http://www.kaola.com/) shortly.
>
> We also plan to provides the Submarine engine in our existing YARN cluster in the next six months.
> Because we have a lot of product departments need to use machine learning services,
> for example:
> 1) Game department (http://game.163.com/) needs AI battle training,
> 2) News department (http://www.163.com) needs news recommendation,
> 3) Mailbox department (http://www.163.com) requires anti-spam and illegal detection,
> 4) Music department (https://music.163.com/) requires music recommendation,
> 5) Education department (http://www.youdao.com) requires voice recognition,
> 6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation and so on.
>
> If Submarine can be released independently like Ozone, it will help us quickly get the latest features and improvements, and it will be great helpful to our team and users.
>
> Thanks hadoop Community!
>
>
> > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
> >
> > Hi devs,
> >
> > Since we started submarine-related effort last year, we received a lot of
> > feedbacks, several companies (such as Netease, China Mobile, etc.) are
> > trying to deploy Submarine to their Hadoop cluster along with big data
> > workloads. Linkedin also has big interests to contribute a Submarine TonY (
> > https://github.com/linkedin/TonY) runtime to allow users to use the same
> > interface.
> >
> > From what I can see, there're several issues of putting Submarine under
> > yarn-applications directory and have same release cycle with Hadoop:
> >
> > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> > 2019. Because of non-predictable blockers and security issues, it got
> > delayed a lot. We need to iterate submarine fast at this point.
> >
> > 2) We also see a lot of requirements to use Submarine on older Hadoop
> > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> > short time, but the requirement to run deep learning is urgent to them. We
> > should decouple Submarine from Hadoop version.
> >
> > And why we wanna to keep it within Hadoop? First, Submarine included some
> > innovation parts such as enhancements of user experiences for YARN
> > services/containerization support which we can add it back to Hadoop later
> > to address common requirements. In addition to that, we have a big overlap
> > in the community developing and using it.
> >
> > There're several proposals we have went through during Ozone merge to trunk
> > discussion:
> > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
> >
> > I propose to adopt Ozone model: which is the same master branch, different
> > release cycle, and different release branch. It is a great example to show
> > agile release we can do (2 Ozone releases after Oct 2018) with less
> > overhead to setup CI, projects, etc.
> >
> > *Links:*
> > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> > - Design doc
> > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
> > - User doc
> > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
> > (3.2.0
> > release)
> > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
> > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> > - Talks: Strata Data Conf NY
> > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
> >
> > Thoughts?
> >
> > Thanks,
> > Wangda Tan

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Weiwei Yang <ab...@gmail.com>.

Thanks for proposing this Wangda, my +1 as well.
It is amazing to see the progress made in Submarine last year, the community grows fast and quiet collaborative. I can see the reasons to get it release faster in its own cycle. And at the same time, the Ozone way works very well.

—
Weiwei
On Feb 1, 2019, 10:49 AM +0800, Xun Liu <ne...@163.com>, wrote:
> +1
>
> Hello everyone,
>
> I am Xun Liu, the head of the machine learning team at Netease Research Institute. I quite agree with Wangda.
>
> Our team is very grateful to the community for the Submarine machine learning engine.
> We are heavy users of Submarine.
> Because Submarine fits the direction of our big data team's Hadoop technology stack,
> it avoids the need to invest more manpower in learning other container scheduling systems.
> The important thing is that we can use a common YARN cluster to run machine learning,
> which makes the utilization of server resources more efficient and preserves much of the human and material investment we made in previous years.
>
> Our team has finished testing and deploying Submarine and will provide the service to our e-commerce department (http://www.kaola.com/) shortly.
>
> We also plan to provide the Submarine engine in our existing YARN cluster within the next six months,
> because many of our product departments need machine learning services,
> for example:
> 1) Game department (http://game.163.com/) needs AI battle training,
> 2) News department (http://www.163.com) needs news recommendation,
> 3) Mailbox department (http://www.163.com) requires anti-spam and illegal-content detection,
> 4) Music department (https://music.163.com/) requires music recommendation,
> 5) Education department (http://www.youdao.com) requires voice recognition,
> 6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation, and so on.
>
> If Submarine can be released independently like Ozone, it will help us quickly get the latest features and improvements, and it will be a great help to our team and users.
>
> Thanks, Hadoop community!
>
>
> > On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
> >
> > Hi devs,
> >
> > Since we started submarine-related effort last year, we received a lot of
> > feedbacks, several companies (such as Netease, China Mobile, etc.) are
> > trying to deploy Submarine to their Hadoop cluster along with big data
> > workloads. Linkedin also has big interests to contribute a Submarine TonY (
> > https://github.com/linkedin/TonY) runtime to allow users to use the same
> > interface.
> >
> > From what I can see, there're several issues of putting Submarine under
> > yarn-applications directory and have same release cycle with Hadoop:
> >
> > 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> > 2019. Because of non-predictable blockers and security issues, it got
> > delayed a lot. We need to iterate submarine fast at this point.
> >
> > 2) We also see a lot of requirements to use Submarine on older Hadoop
> > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> > short time, but the requirement to run deep learning is urgent to them. We
> > should decouple Submarine from Hadoop version.
> >
> > And why we wanna to keep it within Hadoop? First, Submarine included some
> > innovation parts such as enhancements of user experiences for YARN
> > services/containerization support which we can add it back to Hadoop later
> > to address common requirements. In addition to that, we have a big overlap
> > in the community developing and using it.
> >
> > There're several proposals we have went through during Ozone merge to trunk
> > discussion:
> > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
> >
> > I propose to adopt Ozone model: which is the same master branch, different
> > release cycle, and different release branch. It is a great example to show
> > agile release we can do (2 Ozone releases after Oct 2018) with less
> > overhead to setup CI, projects, etc.
> >
> > *Links:*
> > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> > - Design doc
> > <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
> > - User doc
> > <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
> > (3.2.0
> > release)
> > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> > <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
> > (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> > - Talks: Strata Data Conf NY
> > <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
> >
> > Thoughts?
> >
> > Thanks,
> > Wangda Tan

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Xun Liu <ne...@163.com>.

+1

Hello everyone, 

I am Xun Liu, the head of the machine learning team at Netease Research Institute. I quite agree with Wangda.

Our team is very grateful to the community for the Submarine machine learning engine.
We are heavy users of Submarine.
Because Submarine fits the direction of our big data team's Hadoop technology stack,
it avoids the need to invest more manpower in learning other container scheduling systems.
The important thing is that we can use a common YARN cluster to run machine learning,
which makes the utilization of server resources more efficient and preserves much of the human and material investment we made in previous years.

Our team has finished testing and deploying Submarine and will provide the service to our e-commerce department (http://www.kaola.com/) shortly.

We also plan to provide the Submarine engine in our existing YARN cluster within the next six months,
because many of our product departments need machine learning services,
for example:
1) Game department (http://game.163.com/) needs AI battle training,
2) News department (http://www.163.com) needs news recommendation,
3) Mailbox department (http://www.163.com) requires anti-spam and illegal-content detection,
4) Music department (https://music.163.com/) requires music recommendation,
5) Education department (http://www.youdao.com) requires voice recognition,
6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation, and so on.

If Submarine can be released independently like Ozone, it will help us quickly get the latest features and improvements, and it will be a great help to our team and users.

Thanks, Hadoop community!


> On Feb 1, 2019, at 2:53 AM, Wangda Tan <wh...@gmail.com> wrote:
> 
> Hi devs,
> 
> Since we started submarine-related effort last year, we received a lot of
> feedbacks, several companies (such as Netease, China Mobile, etc.)  are
> trying to deploy Submarine to their Hadoop cluster along with big data
> workloads. Linkedin also has big interests to contribute a Submarine TonY (
> https://github.com/linkedin/TonY) runtime to allow users to use the same
> interface.
> 
> From what I can see, there're several issues of putting Submarine under
> yarn-applications directory and have same release cycle with Hadoop:
> 
> 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> 2019. Because of non-predictable blockers and security issues, it got
> delayed a lot. We need to iterate submarine fast at this point.
> 
> 2) We also see a lot of requirements to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> short time, but the requirement to run deep learning is urgent to them. We
> should decouple Submarine from Hadoop version.
> 
> And why we wanna to keep it within Hadoop? First, Submarine included some
> innovation parts such as enhancements of user experiences for YARN
> services/containerization support which we can add it back to Hadoop later
> to address common requirements. In addition to that, we have a big overlap
> in the community developing and using it.
> 
> There're several proposals we have went through during Ozone merge to trunk
> discussion:
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
> 
> I propose to adopt Ozone model: which is the same master branch, different
> release cycle, and different release branch. It is a great example to show
> agile release we can do (2 Ozone releases after Oct 2018) with less
> overhead to setup CI, projects, etc.
> 
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc
> <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
> - User doc
> <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
> (3.2.0
> release)
> - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>,
> (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> - Talks: Strata Data Conf NY
> <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
> 
> Thoughts?
> 
> Thanks,
> Wangda Tan




Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Jonathan Hung <jy...@gmail.com>.

+1. This is important for improving the deep-learning-on-Hadoop story.
There has recently been a lot of momentum for this, and decoupling
Submarine from Hadoop will help it continue.

Jonathan Hung


On Thu, Jan 31, 2019 at 11:04 AM Wangda Tan <wh...@gmail.com> wrote:

> Hi devs,
>
> Since we started submarine-related effort last year, we received a lot of
> feedbacks, several companies (such as Netease, China Mobile, etc.)  are
> trying to deploy Submarine to their Hadoop cluster along with big data
> workloads. Linkedin also has big interests to contribute a Submarine TonY (
> https://github.com/linkedin/TonY) runtime to allow users to use the same
> interface.
>
> From what I can see, there're several issues of putting Submarine under
> yarn-applications directory and have same release cycle with Hadoop:
>
> 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> 2019. Because of non-predictable blockers and security issues, it got
> delayed a lot. We need to iterate submarine fast at this point.
>
> 2) We also see a lot of requirements to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> short time, but the requirement to run deep learning is urgent to them. We
> should decouple Submarine from Hadoop version.
>
> And why we wanna to keep it within Hadoop? First, Submarine included some
> innovation parts such as enhancements of user experiences for YARN
> services/containerization support which we can add it back to Hadoop later
> to address common requirements. In addition to that, we have a big overlap
> in the community developing and using it.
>
> There're several proposals we have went through during Ozone merge to trunk
> discussion:
>
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>
> I propose to adopt Ozone model: which is the same master branch, different
> release cycle, and different release branch. It is a great example to show
> agile release we can do (2 Ozone releases after Oct 2018) with less
> overhead to setup CI, projects, etc.
>
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc
> <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
> - User doc
> <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
> (3.2.0
> release)
> - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
> (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> - Talks: Strata Data Conf NY
> <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>
> Thoughts?
>
> Thanks,
> Wangda Tan
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Sunil G <su...@apache.org>.

+1 from me on this.
ML/DL is one of the fastest-growing areas, and a runtime on YARN helps customers
run ML/DL workloads on the same cluster where ETL or other traditional
big data workloads ingest or mine data.
A faster release cadence can speed up Submarine development and make it more
agile to run on older Hadoop versions without any upgrade effort.

- Sunil



On Fri, Feb 1, 2019 at 12:34 AM Wangda Tan <wh...@gmail.com> wrote:

> Hi devs,
>
> Since we started submarine-related effort last year, we received a lot of
> feedbacks, several companies (such as Netease, China Mobile, etc.)  are
> trying to deploy Submarine to their Hadoop cluster along with big data
> workloads. Linkedin also has big interests to contribute a Submarine TonY (
> https://github.com/linkedin/TonY) runtime to allow users to use the same
> interface.
>
> From what I can see, there're several issues of putting Submarine under
> yarn-applications directory and have same release cycle with Hadoop:
>
> 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> 2019. Because of non-predictable blockers and security issues, it got
> delayed a lot. We need to iterate submarine fast at this point.
>
> 2) We also see a lot of requirements to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> short time, but the requirement to run deep learning is urgent to them. We
> should decouple Submarine from Hadoop version.
>
> And why we wanna to keep it within Hadoop? First, Submarine included some
> innovation parts such as enhancements of user experiences for YARN
> services/containerization support which we can add it back to Hadoop later
> to address common requirements. In addition to that, we have a big overlap
> in the community developing and using it.
>
> There're several proposals we have went through during Ozone merge to trunk
> discussion:
>
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>
> I propose to adopt Ozone model: which is the same master branch, different
> release cycle, and different release branch. It is a great example to show
> agile release we can do (2 Ozone releases after Oct 2018) with less
> overhead to setup CI, projects, etc.
>
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc
> <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
> - User doc
> <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
> (3.2.0
> release)
> - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
> (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> - Talks: Strata Data Conf NY
> <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>
> Thoughts?
>
> Thanks,
> Wangda Tan
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Arun Suresh <as...@apache.org>.

Thanks for bringing this up, Wangda. +1
It makes a lot of sense to have Submarine follow its own release cadence, for
all the reasons you outlined.
I would one-up this proposal and ask why we shouldn't allow YARN to have its
own releases as well, but that is for a separate thread :)

Cheers
-Arun

On Thu, Jan 31, 2019 at 11:04 AM Wangda Tan <wh...@gmail.com> wrote:

> Hi devs,
>
> Since we started submarine-related effort last year, we received a lot of
> feedbacks, several companies (such as Netease, China Mobile, etc.)  are
> trying to deploy Submarine to their Hadoop cluster along with big data
> workloads. Linkedin also has big interests to contribute a Submarine TonY (
> https://github.com/linkedin/TonY) runtime to allow users to use the same
> interface.
>
> From what I can see, there're several issues of putting Submarine under
> yarn-applications directory and have same release cycle with Hadoop:
>
> 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> 2019. Because of non-predictable blockers and security issues, it got
> delayed a lot. We need to iterate submarine fast at this point.
>
> 2) We also see a lot of requirements to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> short time, but the requirement to run deep learning is urgent to them. We
> should decouple Submarine from Hadoop version.
>
> And why we wanna to keep it within Hadoop? First, Submarine included some
> innovation parts such as enhancements of user experiences for YARN
> services/containerization support which we can add it back to Hadoop later
> to address common requirements. In addition to that, we have a big overlap
> in the community developing and using it.
>
> There're several proposals we have went through during Ozone merge to trunk
> discussion:
>
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>
> I propose to adopt Ozone model: which is the same master branch, different
> release cycle, and different release branch. It is a great example to show
> agile release we can do (2 Ozone releases after Oct 2018) with less
> overhead to setup CI, projects, etc.
>
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc
> <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
> - User doc
> <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
> (3.2.0
> release)
> - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
> (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> - Talks: Strata Data Conf NY
> <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>
> Thoughts?
>
> Thanks,
> Wangda Tan
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Rohith Sharma K S <ro...@apache.org>.

+1. A few interested ML/DL folks from Bangalore asked about a Submarine release
for trying out TensorFlow on YARN. We told them to wait for a release since they
were not ready to use trunk. I see that an agile release cycle for Submarine brings
a lot of added value.

-Rohith Sharma K S

On Fri, 1 Feb 2019 at 00:34, Wangda Tan <wh...@gmail.com> wrote:

> Hi devs,
>
> Since we started submarine-related effort last year, we received a lot of
> feedbacks, several companies (such as Netease, China Mobile, etc.)  are
> trying to deploy Submarine to their Hadoop cluster along with big data
> workloads. Linkedin also has big interests to contribute a Submarine TonY (
> https://github.com/linkedin/TonY) runtime to allow users to use the same
> interface.
>
> From what I can see, there're several issues of putting Submarine under
> yarn-applications directory and have same release cycle with Hadoop:
>
> 1) We started 3.2.0 release at Sep 2018, but the release is done at Jan
> 2019. Because of non-predictable blockers and security issues, it got
> delayed a lot. We need to iterate submarine fast at this point.
>
> 2) We also see a lot of requirements to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> short time, but the requirement to run deep learning is urgent to them. We
> should decouple Submarine from Hadoop version.
>
> And why we wanna to keep it within Hadoop? First, Submarine included some
> innovation parts such as enhancements of user experiences for YARN
> services/containerization support which we can add it back to Hadoop later
> to address common requirements. In addition to that, we have a big overlap
> in the community developing and using it.
>
> There're several proposals we have went through during Ozone merge to trunk
> discussion:
>
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>
> I propose to adopt Ozone model: which is the same master branch, different
> release cycle, and different release branch. It is a great example to show
> agile release we can do (2 Ozone releases after Oct 2018) with less
> overhead to setup CI, projects, etc.
>
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc
> <
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> >
> - User doc
> <
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> >
> (3.2.0
> release)
> - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> <
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> >,
> (Chinese Translation: Link <https://www.jishuwen.com/d/2Vpu>)
> - Talks: Strata Data Conf NY
> <
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
> >
>
> Thoughts?
>
> Thanks,
> Wangda Tan
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Arun Suresh <as...@apache.org>.

Thanks for bringing this up, Wangda. +1
It makes a lot of sense to have Submarine follow its own release cadence, for
all the reasons you outlined.
I would one-up this proposal and ask why we shouldn't allow YARN to have its
own releases as well, but that is for a separate thread :)

Cheers
-Arun


Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by "Elek, Marton" <el...@apache.org>.

+1.

I like the idea.

For me, submarine/ML-job-execution seems to be a natural extension of
the existing Hadoop/Yarn capabilities.

I like the proposed project structure / release lifecycle, too. I
think it's better to be more modularized but keep the development in the
same project. IMHO it worked well with the Ozone releases. We can do
more frequent releases and support multiple versions of core Hadoop, while
tested new improvements can be moved back to hadoop-common.

Marton



Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Jonathan Hung <jy...@gmail.com>.

+1. This is important for improving the deep learning on Hadoop story.
There has been a lot of momentum behind this recently, and decoupling
Submarine from Hadoop will help it continue.

Jonathan Hung



Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Sunil G <su...@apache.org>.

+1 from me on this.
ML/DL is one of the fastest-growing areas, and a runtime on YARN helps
customers run ML/DL workloads on the same cluster where the ETL or other
traditional big data workloads ingest or mine data.
A faster release cadence can speed up Submarine development and make it
easier to run on older Hadoop versions without any upgrade effort.

- Sunil


