Posted to dev@zeppelin.apache.org by Dragos Dublea <du...@gmail.com> on 2019/04/09 12:55:32 UTC

Re: [DISCUSS] Notebook serving

Hello,

This is a very interesting topic. I went through the design doc. Could you
please mentor me in implementing this? I am very interested in taking it
up.

Thanks

On 2019/03/26 21:31:16, moon soo Lee <m....@apache.org> wrote:
> Hi,
>
> There are some challenges in bringing a model inside a notebook to a
> production environment. In many organizations, the most common practice
> I see today is something like:
>
> 1. A data scientist develops a model in a data science notebook.
> 2. A SW engineer rewrites the model to meet the production requirements.
>
> In other words, data scientists do not have self-service capability, and
> the organization spends a lot of time reimplementing the model for
> production.
>
> I tried to identify the gaps between the data science notebook and the
> production environment, and what could address them, so that models
> created by data scientists in the notebook can go to production with
> minimal effort.
>
> I made a proposal to solve this problem. Please review and comment. Any
> ideas and feedback are welcome. You can modify the document if needed.
>
> https://docs.google.com/document/d/1YA6q8W9yO8a88xzLDYs9zv_fKu2_cnB58rmQbakxi1I/edit?usp=sharing
>
> This document is linked from
> https://issues.apache.org/jira/browse/ZEPPELIN-3994
>
> Thanks,
> moon

Re: [DISCUSS] Notebook serving

Posted by moon soo Lee <mo...@apache.org>.
Hi,

Although https://github.com/apache/zeppelin/pull/3356/files implements the
basic functionality of the design, a few key features are not yet
implemented and many things can be improved. We could also expand the
scope, for example:

 - Scaling (or autoscaling) of serving is not yet implemented. We need to
design how we want to scale: manually by changing the number of replicas,
automatically with a horizontal autoscaler, or maybe both?
 - Routing table generation is based on periodic polling of services. It
can be improved by using the watch API in Kubernetes.
 - The notebook serving design takes care of testing notebooks, because we
cannot think of serving a notebook in production without tests. But what
about code review? Do we need it? If yes, how do we want to handle it for
notebooks?
 - We have a Test task and a Serving task. Do we also need a Training task
for machine learning use cases?
 - A Serving task runs at least one ZeppelinServer and an Interpreter JVM
process. Will there be a way to reduce the memory footprint, such as using
GraalVM in the container, so that hundreds or thousands of small models
can be deployed without much overhead?
 - Every component (TestTask, ServingTask, ContextStorage, MetricStorage)
is pluggable. Good! But do we need additional implementations for them?
Currently the Kubernetes environment is the default implementation for all
of the components, but how about integrating with other popular software
frameworks, like Kubeflow, TensorFlow Serving, etc.?
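
To make the routing table item above more concrete, here is a minimal
sketch of how a table could be kept current from watch-style events rather
than rebuilt on every poll. It is only an illustration under stated
assumptions: RoutingTable, apply_event, build_table and the event
dictionary shape are hypothetical names that do not come from the pull
request, and the event stream is simulated in memory instead of read from
a Kubernetes cluster. The only thing borrowed from Kubernetes is the set
of watch event types (ADDED, MODIFIED, DELETED).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class RoutingTable:
    """Maps a served notebook name to a backend address (hypothetical shape)."""

    routes: Dict[str, str] = field(default_factory=dict)

    def apply_event(self, event: dict) -> None:
        """Apply one watch-style event instead of rebuilding from a full poll."""
        kind = event["type"]
        name = event["name"]
        if kind in ("ADDED", "MODIFIED"):
            # A new or changed service replaces any previous address.
            self.routes[name] = event["address"]
        elif kind == "DELETED":
            # pop() with a default keeps a duplicate delete harmless.
            self.routes.pop(name, None)
        # Any other event type leaves the table unchanged.


def build_table(events: Iterable[dict]) -> RoutingTable:
    """Fold a stream of events into a routing table."""
    table = RoutingTable()
    for event in events:
        table.apply_event(event)
    return table


# Simulated event stream (assumption): in a real deployment these events
# would come from a Kubernetes watch on the serving services.
EVENTS = [
    {"type": "ADDED", "name": "note-a", "address": "10.0.0.1:8080"},
    {"type": "ADDED", "name": "note-b", "address": "10.0.0.2:8080"},
    {"type": "MODIFIED", "name": "note-a", "address": "10.0.0.3:8080"},
    {"type": "DELETED", "name": "note-b"},
]

if __name__ == "__main__":
    print(build_table(EVENTS).routes)
```

With this shape, each change costs one dictionary update, whereas periodic
polling has to list every service again even when nothing has changed. A
real implementation would also need to handle watch reconnects and resync,
which this sketch leaves out.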

I think there are a lot of interesting topics beyond the pull request I
made, so I hope this can be part of GSoC.

Thanks,
moon

On Fri, Apr 26, 2019 at 1:50 AM Dragos Dublea <du...@gmail.com>
wrote:

> Hello,
>
> Happy Day!
>
> It is great to follow the improvements on this topic in this Pull Request
> <https://github.com/apache/zeppelin/pull/3356/files>.
> Is this project not going to be the part of GSoC any further? Is there a
> scope for FE as a part of GSoC?
>
> Thanks,
>

Re: [DISCUSS] Notebook serving

Posted by Dragos Dublea <du...@gmail.com>.
Hello,

Happy Day!

It is great to follow the improvements on this topic in this pull request
<https://github.com/apache/zeppelin/pull/3356/files>.
Is this project no longer going to be part of GSoC? Is there scope for FE
work as part of GSoC?

Thanks,



On Tue, 16 Apr 2019 at 23:06, moon soo Lee <mo...@apache.org> wrote:

> Hi,
>
> You're right. I joined the program as a mentor.
> Thanks again for the interest to the project and to this topic.
>
> Thanks,
> moon

Re: [DISCUSS] Notebook serving

Posted by moon soo Lee <mo...@apache.org>.
Hi,

You're right. I joined the program as a mentor.
Thanks again for your interest in the project and in this topic.

Thanks,
moon

On Sat, Apr 13, 2019 at 8:50 AM Dragos Dublea <du...@gmail.com>
wrote:

> Hello,
>
> I will be very glad to take up any subtasks or participate in a discussion
> about this project if you get time for the discussion.
>
> As I will begin with my vacation soon, I am excited to work on this
> project with the Zeppelin community.
>
> Thank you

Re: [DISCUSS] Notebook serving

Posted by Dragos Dublea <du...@gmail.com>.
Hello,

I will be very glad to take up any subtasks or participate in a discussion
about this project if you get time for the discussion.

As my vacation will begin soon, I am excited to work on this project
with the Zeppelin community.

Thank you

On Wed, Apr 10, 2019, 4:03 PM Dragos Dublea <du...@gmail.com>
wrote:

> Hello,
>
> Thank you so much for your reply. AFAIK, Only the student signup period is
> over. Mentors can still join the program. They will have to receive an
> invite from the organization admins. Here, in this case, Apache Software
> Foundation org admins will have to send the invite link to enable your
> signup.
>
> Thanks again

Re: [DISCUSS] Notebook serving

Posted by Dragos Dublea <du...@gmail.com>.
Hello,

Thank you so much for your reply. AFAIK, only the student signup period is
over. Mentors can still join the program, but they have to receive an
invite from the organization admins. In this case, the Apache Software
Foundation org admins will have to send the invite link to enable your
signup.

Thanks again



On Wed, 10 Apr 2019 at 13:23, moon soo Lee <mo...@apache.org> wrote:

> Hi,
>
> Thanks for the interest in this topic. I realized mentor sign up period is
> finished.
> Let me see if there's a way to add myself as a GsoC mentor or any other
> alternative.
>
> Regards,
> moon

Re: [DISCUSS] Notebook serving

Posted by moon soo Lee <mo...@apache.org>.
Hi,

Thanks for your interest in this topic. I realized the mentor sign-up
period is finished.
Let me see if there's a way to add myself as a GSoC mentor, or whether
there is any other alternative.

Regards,
moon

On Tue, Apr 9, 2019 at 9:55 PM Dragos Dublea <du...@gmail.com>
wrote:

> Hello,
>
> This is a very interesting topic. I did go through the design doc. Can you
> please mentor me to implement this? I am very much interested in taking it
> up.
>
> Thanks