Posted to common-dev@hadoop.apache.org by Arun Suresh <as...@apache.org> on 2019/02/01 00:05:30 UTC

Re: [DISCUSS] Making submarine to different release model like Ozone

Thanks for bringing this up, Wangda. +1
It makes a lot of sense to have Submarine follow its own release cadence, for
all the reasons you outlined.
I would one-up this proposal by asking why we shouldn't allow YARN to have its
own releases as well, but that is for a separate thread :)

Cheers
-Arun

On Thu, Jan 31, 2019 at 11:04 AM Wangda Tan <wh...@gmail.com> wrote:

> Hi devs,
>
> Since we started the Submarine-related effort last year, we have received a
> lot of feedback. Several companies (such as Netease, China Mobile, etc.) are
> trying to deploy Submarine to their Hadoop clusters alongside big data
> workloads. LinkedIn is also very interested in contributing a Submarine TonY
> (https://github.com/linkedin/TonY) runtime so that users can use the same
> interface.
>
> From what I can see, there are several issues with putting Submarine under
> the yarn-applications directory and having the same release cycle as Hadoop:
>
> 1) We started the 3.2.0 release in Sep 2018, but it was not done until Jan
> 2019. Because of unpredictable blockers and security issues, it was
> delayed a lot. We need to iterate on Submarine quickly at this point.
>
> 2) We also see a lot of demand to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in the
> short term, but their need to run deep learning is urgent. We
> should decouple Submarine from the Hadoop version.
>
> So why do we want to keep it within Hadoop? First, Submarine includes some
> innovations, such as user-experience enhancements for YARN
> services/containerization support, which we can add back to Hadoop later
> to address common requirements. In addition, there is a big overlap
> in the community developing and using it.
>
> There were several proposals we went through during the Ozone
> merge-to-trunk discussion:
>
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
>
> I propose to adopt the Ozone model: the same master branch, but a different
> release cycle and a different release branch. Ozone is a great example of
> the agile releases we can do (2 Ozone releases since Oct 2018) with less
> overhead to set up CI, projects, etc.
>
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc:
> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit
> - User doc (3.2.0 release):
> https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html
> - Blog posts: {Submarine}: Running deep learning workloads on Apache Hadoop
> https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/
> (Chinese translation: https://www.jishuwen.com/d/2Vpu)
> - Talks: Strata Data Conf NY
> https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289
>
> Thoughts?
>
> Thanks,
> Wangda Tan
>

Re: [DISCUSS] Making submarine to different release model like Ozone

Posted by Oliver Hu <ol...@gmail.com>.

+1.

I'm one of the authors of LinkedIn TonY. We'd love to join forces with
upstream to build a deep learning ecosystem around Hadoop. As Wangda
mentioned, coupling Submarine with the YARN code base introduced a lot of
unnecessary complexity for us when integrating our use case with upstream,
for example circular dependencies, and a hard dependency on Hadoop 3.x that
blocks us from adopting upstream features in the near term. Architecture-wise,
it also makes more sense to keep the YARN component pure and focused on
resource scheduling.

The ML ecosystem is moving extremely fast; having a separate release cycle
and keeping up the momentum would benefit us a lot.