Posted to dev@zeppelin.apache.org by Luciano Resende <lu...@gmail.com> on 2017/01/04 18:25:24 UTC

Re: Exporting Spark paragraphs as Spark Applications

I have made some progress with a tool to handle the points discussed in
this thread. It's currently a command-line tool: given a Zeppelin
notebook (note.json), it generates a Spark Scala application, compiles it
using the compiler embedded in the Scala SDK, and then packages all these
resources into a jar that works with the spark-submit command.
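To make the idea concrete, here is a minimal sketch (not the actual tool) of how
the Spark paragraph bodies could be pulled out of note.json before being wrapped
into a generated application. It assumes a note.json layout with a top-level
"paragraphs" array whose entries carry a "text" field, and uses Jackson for JSON
parsing since it is already on the classpath of a typical Spark build:

  import java.io.File
  import com.fasterxml.jackson.databind.ObjectMapper
  import scala.collection.JavaConverters._

  // Hypothetical helper, not part of the tool described above.
  object NoteExtractor {
    // Returns the source of every %spark paragraph (plus paragraphs without an
    // explicit interpreter binding, which usually default to spark).
    def sparkParagraphs(noteJson: File): Seq[String] = {
      val root = new ObjectMapper().readTree(noteJson)
      root.get("paragraphs").elements().asScala
        .flatMap(p => Option(p.get("text")).map(_.asText()))
        .filter(t => t.trim.startsWith("%spark") || !t.trim.startsWith("%"))
        .toSeq
    }
  }

The concatenated paragraph sources would then be compiled and packaged so that
the resulting jar can be launched with spark-submit.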

I would like to start prototyping the integration into the Zeppelin UI and
I was wondering if it would be OK to use the above jar as a dependency
(e.g. from a Maven release) and integrate it into Zeppelin...

Thoughts ?


On Mon, Sep 19, 2016 at 7:47 AM, Sourav Mazumder <
sourav.mazumder00@gmail.com> wrote:

> To Moon's point, this is what my vision is around this feature:
>
> 1. Users should be able to package one, more than one, or all of the
> paragraphs in a Notebook to create a jar file which can be used with
> spark-submit.
>
> 2. The tool should automatically remove all the interactive statements
> like print, show, etc.
>
> 3. The tool should automatically create a Main class in addition to the jar
> file(s), which will internally call the respective jar. Users can then change
> this main class if needed for parameterization through args (see the sketch
> below).
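As an illustration of points 2 and 3 (and only that: the names, the paragraph
code, and the argument handling below are invented for the example, not the
output of any existing tool), the generated main class could look roughly like
this:

  import org.apache.spark.sql.SparkSession

  // Hypothetical generated entry point: paragraph bodies are inlined into
  // main(), interactive calls such as .show() are dropped, and args allow
  // parameterization of paths that were hard-coded in the notebook.
  object GeneratedNoteMain {
    def main(args: Array[String]): Unit = {
      val inputPath  = if (args.length > 0) args(0) else "input.json"
      val outputPath = if (args.length > 1) args(1) else "output.parquet"

      val spark = SparkSession.builder().appName("GeneratedNoteMain").getOrCreate()

      // --- paragraph 1 ---
      val events = spark.read.json(inputPath)
      // --- paragraph 2 (a .show() from the notebook would be removed here) ---
      val summary = events.groupBy("status").count()
      summary.write.mode("overwrite").parquet(outputPath)

      spark.stop()
    }
  }

The packaged jar would then be launched with something like
spark-submit --class GeneratedNoteMain exported-note.jar <input> <output>.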
>
> Regards,
> Sourav
>
> On Mon, Sep 19, 2016 at 7:33 AM, Sourav Mazumder <
> sourav.mazumder00@gmail.com> wrote:
>
> > I am also pretty much for this.
> >
> > I have got a similar request from every person and group to whom I
> > showcased Zeppelin.
> >
> > Regards,
> > Sourav
> >
> > On Fri, Sep 16, 2016 at 8:06 PM, moon soo Lee <mo...@apache.org> wrote:
> >
> >> Hi Luciano,
> >>
> >> I've also got a lot of questions about "productizing the notebook" every
> >> time I meet users who use Zeppelin in their work.
> >>
> >> I think it's actually about two different problems that Zeppelin needs to
> >> address.
> >>
> >> *1) Provide a way for the interactive notebook to become part of a
> >> production data pipeline.*
> >>
> >> Although Zeppelin does have a quite convenient cron-like scheduler for
> >> each Note, the built-in cron scheduler is not ready for serious use in
> >> production, because it lacks features like actions after success/failure,
> >> fault tolerance, history, and so on. I think the community is working on
> >> improving it, and it's going to take some time.
> >> Meanwhile, any external enterprise-level job scheduler can run a Note or
> >> Paragraph via the REST API. But we don't have any guide or examples for
> >> it: which REST APIs users can use for this purpose, and how to use them
> >> in various cases (e.g. with authentication on, dynamic form parameters,
> >> etc.). I think a lot of things need to be improved to make Zeppelin
> >> easier to fit into a production pipeline.
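For reference, a minimal sketch of what such an external trigger could look
like. The endpoint shape follows Zeppelin's notebook REST API, but the host,
port, note id, and the (absent) authentication handling are placeholders; check
the REST API docs of your Zeppelin version before relying on this:

  import java.net.{HttpURLConnection, URL}

  // Hypothetical scheduler hook: POST /api/notebook/job/{noteId} asks Zeppelin
  // to run all paragraphs of the note.
  object RunNoteJob {
    def main(args: Array[String]): Unit = {
      val noteId = if (args.nonEmpty) args(0) else "2A94M5J1Z"  // placeholder id
      val url  = new URL(s"http://localhost:8080/api/notebook/job/$noteId")
      val conn = url.openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("POST")
      println(s"Zeppelin responded with HTTP ${conn.getResponseCode}")
      conn.disconnect()
    }
  }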
> >>
> >> *2) Provide a stable way to run Spark paragraphs.*
> >>
> >> Another barrier to using the notebook in a production pipeline is the
> >> Scala REPL in SparkInterpreter. SparkInterpreter uses the Scala REPL to
> >> provide an interactive Scala session, and the Scala REPL will eventually
> >> hit an OOME as it compiles and runs statements. The current workaround in
> >> Zeppelin is that the cron scheduler inside the notebook has a checkbox
> >> that restarts the Note after the scheduler runs it. Of course that option
> >> does not apply when an external scheduler runs the job through the REST
> >> API.
> >>
> >> I think what Luciano is suggesting, "Export Spark Paragraph as Spark
> >> application", is interesting. If Spark paragraphs can be easily packaged
> >> into a jar (a Spark application), that can be one way to address 1) and
> >> 2), in case users already have a stable way to schedule a Spark
> >> application jar.
> >>
> >> Actually, the Flink interactive shell works in a similar way internally
> >> as far as I know, i.e. it packages the compiled classes into a jar and
> >> submits it.
> >>
> >> One idea for prototyping:
> >> how about making an interpreter inside the Spark interpreter group, say
> >> %spark.build or some better name?
> >>
> >> And if a user runs a command like
> >>
> >> %spark.build
> >> package
> >>
> >> then it builds a Spark application jar based on the Spark paragraphs in
> >> the Note. I think it can be the simplest user interface for the prototype.
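A very rough sketch of what such a prototype could look like follows, assuming
the Interpreter SPI of that era; the class name, the "package" handling, and the
build step it alludes to are made up for illustration and would need to be
adapted to the real codebase:

  import java.util.Properties
  import org.apache.zeppelin.interpreter.{Interpreter, InterpreterContext, InterpreterResult}

  // Hypothetical %spark.build interpreter: on "package", collect the note's
  // Spark paragraphs, compile them, and emit a jar for spark-submit.
  class SparkBuildInterpreter(props: Properties) extends Interpreter(props) {
    override def open(): Unit = {}
    override def close(): Unit = {}

    override def interpret(cmd: String, ctx: InterpreterContext): InterpreterResult = {
      if (cmd.trim == "package") {
        // buildJar(ctx) would do the extract/compile/package work sketched earlier.
        new InterpreterResult(InterpreterResult.Code.SUCCESS, "built note-application.jar")
      } else {
        new InterpreterResult(InterpreterResult.Code.ERROR, s"unknown command: $cmd")
      }
    }

    override def cancel(ctx: InterpreterContext): Unit = {}
    override def getFormType(): Interpreter.FormType = Interpreter.FormType.SIMPLE
    override def getProgress(ctx: InterpreterContext): Int = 0
  }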
> >>
> >> Thanks,
> >> moon
> >>
> >> On Fri, Sep 16, 2016 at 1:11 PM Jeremy Anderson <
> >> jeremy@objectadjective.com>
> >> wrote:
> >>
> >> > Luciano, I think this would be a terrific feature. I've heard the exact
> >> > same workflow you've described in all of the research we've done.
> >> >
> >> > ...........................
> >> >
> >> > Jeremy Anderson
> >> > Founder, Object Adjective
> >> > 415.493.8489
> >> > jeremy@objectadjective.com
> >> > objectadjective.com <http://about.me/jeremyanderson>
> >> >
> >> >
> >> >
> >> > This email and any files transmitted with it are confidential and
> >> > intended solely for the use of the individual or entity to whom they
> >> > are addressed.
> >> >
> >> > On 16 September 2016 at 12:19, Luciano Resende <lu...@gmail.com>
> >> > wrote:
> >> >
> >> > > While talking with a few different users, I have been seeing the use
> >> > > case of doing iterative development in Notebooks or the Spark Shell
> >> > > and then copying and pasting the final solution into a formal
> >> > > application repeat itself very often.
> >> > >
> >> > > I was wondering if an "Export Spark Paragraphs as a Spark Application
> >> > > (jar)" would be a feature that the Zeppelin community would find
> >> > > useful. Keep in mind there are some limitations here: we would be
> >> > > constrained to Spark-related paragraphs, etc., but even so, I think
> >> > > there are multiple scenarios where the ability to have an application
> >> > > that directly runs on Spark would be very useful.
> >> > >
> >> > > If the community is interested, let's use this thread to discuss any
> >> > > specific requirements or suggestions that others might have, and
> >> > > after a few days I would like to start prototyping this
> >> > > functionality.
> >> > >
> >> > > Thoughts ?
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Luciano Resende
> >> > > http://twitter.com/lresende1975
> >> > > http://lresende.blogspot.com/
> >> > >
> >> >
> >>
> >
> >
>



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Exporting Spark paragraphs as Spark Applications

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Luciano, maybe I am wrong, just my two cents for your consideration.




Re: Exporting Spark paragraphs as Spark Applications

Posted by Jeff Zhang <zj...@gmail.com>.
Thanks Luciano. I am not saying the community doesn't feel this is a good
idea. It's just my personal opinion (maybe with some bias, since I haven't
talked with as many customers as you have). I just feel maybe you can spend
your time on improving Zeppelin so that Zeppelin itself does the job, rather
than exporting the jar and leveraging other tools to deploy it. I don't want
you to waste time only to find out in the end that customers are happy to do
all of that in one central place: Zeppelin. Anyway, this is just my personal
thinking; you can talk with your customers to hear their feedback.



Re: Exporting Spark paragraphs as Spark Applications

Posted by Luciano Resende <lu...@gmail.com>.
Hi Jeff,

I agree that what you mentioned is completely acceptable for some users,
particularly for the data science personas. Having said that, I have been
talking with multiple enterprise companies that have their own scheduler
infrastructure with a different quality of service, or that just want to
deploy this as an app into their production environment, which will have
much more resources for running these apps against complete data sets.
Currently they finish the experimentation/development of the application in
an interactive environment and then move their final code into a native
Spark application.

Zeppelin is evolving quickly in this area, and I think that export as an
application might be a good option for users who want to actually deploy
their notebooks as native applications on their own Spark cluster.

Having said that, if the community feels that this is not a required
function in Zeppelin anymore, then I can continue the development of the
tool as a standalone command-line tool. I was even thinking about expanding
the functionality and implementing what is described in ZEPPELIN-1793.

Thoughts ?

-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Exporting Spark paragraphs as Spark Applications

Posted by Jeff Zhang <zj...@gmail.com>.
Thanks Luciano. IIRC, what users want is to run the whole Spark app, and they
don't care whether it runs in Zeppelin or through a standard Spark app jar. I
know Zeppelin currently doesn't do well at converting a note into a production
Spark app, as Lee mentioned. But exporting a note as a jar seems like a
short-term solution, not a long-term one. I just feel that when Zeppelin
improves in this area, users might abandon this solution and transition back
to Zeppelin. Here are some disadvantages I can see with this approach.

1.  If users want to change the code during iterative development, they have
to repeat the whole pipeline (change code in Zeppelin -> export it to a Spark
jar -> redeploy the jar). This process is painful and wastes time.
2.  It is hard to debug and diagnose, since the code is changed/restructured
when exporting to a jar.
3.  Users have to juggle several distinct tools for the whole development
cycle (Zeppelin, a Spark job server, and maybe a cron job).

Besides, the OOM issue of the Spark REPL that Lee mentioned might not be a
problem, because we can shut down the app (close the interpreter) after the
app is done.






Re: Exporting Spark paragraphs as Spark Applications

Posted by Luciano Resende <lu...@gmail.com>.
Some use cases discussed earlier on this thread:

https://www.mail-archive.com/dev@zeppelin.apache.org/msg06323.html

https://www.mail-archive.com/dev@zeppelin.apache.org/msg06332.html

On Wed, Jan 4, 2017 at 4:51 PM, Jianfeng (Jeff) Zhang <
jzhang@hortonworks.com> wrote:

>
> I don't understand why users want to export a Zeppelin note as a Spark
> application.
>
> If they want to trigger the running of a Spark app, why not use Zeppelin's
> REST API for that? Even if users export it as a Spark application, most of
> the time in reality they need to submit it through a Spark job server, so
> why not use Zeppelin as the Spark job server?
> And if the Spark app fails, it is pretty hard to debug, because the
> exporting tool has changed/restructured the source code.
>
> If this is a pretty large and complicated Spark application, I don't think
> Zeppelin is the proper tool for it; they'd be better off using an IDE for
> that project.
>
> BTW, after https://github.com/apache/zeppelin/pull/1799, users can define
> the dependencies between paragraphs, and they can run one whole note which
> contains different interpreters.
>
> Best Regards,
> Jeff Zhang
> >> >> > > directly runs on Spark to be very useful.
> >> >> > >
> >> >> > > If the community is interested, let's use this thread to discuss
> >>any
> >> >> > > specific requirements or suggestions that others might have, and
> >> >> after a
> >> >> > > few days I would like to start prototyping this functionality.
> >> >> > >
> >> >> > > Thoughts ?
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > --
> >> >> > > Luciano Resende
> >> >> > > http://twitter.com/lresende1975
> >> >> > > http://lresende.blogspot.com/
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
> >
> >--
> >Luciano Resende
> >http://twitter.com/lresende1975
> >http://lresende.blogspot.com/
>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Exporting Spark paragraphs as Spark Applications

Posted by "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>.
I don't understand why users would want to export a Zeppelin note as a
Spark application.

If they want to trigger the run of a Spark app, why not use Zeppelin's
REST API for that? Even if users export a note as a Spark application,
in practice they usually need to submit it through a Spark job server,
so why not use Zeppelin as the Spark job server?
And if the Spark app fails, it is pretty hard to debug, because the
exporting tool has changed/restructured the source code.
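
For reference, a minimal sketch of what such a REST trigger could look
like from an external scheduler, assuming the POST
/api/notebook/job/{noteId} endpoint described in the Zeppelin REST API
docs for running all paragraphs of a note; the host, port, and note id
below are placeholders:

import java.net.{HttpURLConnection, URL}

object TriggerNoteRun {
  def main(args: Array[String]): Unit = {
    // Placeholder note id; use the id shown in the note's URL in the Zeppelin UI.
    val noteId = "2C35YU814"
    val url = new URL(s"http://localhost:8080/api/notebook/job/$noteId")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    // Reading the response code sends the request; a 200-level response
    // means Zeppelin accepted the run request for the whole note.
    println(s"Zeppelin responded with HTTP ${conn.getResponseCode}")
    conn.disconnect()
  }
}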
 

If this is a fairly large and complicated Spark application, I don't
think Zeppelin is the proper tool for it; an IDE is a better fit for
such a project.

BTW, after https://github.com/apache/zeppelin/pull/1799, users can
define dependencies between paragraphs and run one whole note that
contains different interpreters.
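
And, as a companion to the sketch above, a small example of how an
external scheduler could check the outcome of such a note-level run,
assuming the GET /api/notebook/job/{noteId} status endpoint from the
Zeppelin REST API docs; again the host, port, and note id are
placeholders:

import java.net.{HttpURLConnection, URL}
import scala.io.Source

object CheckNoteRunStatus {
  def main(args: Array[String]): Unit = {
    // Placeholder note id; use the id shown in the note's URL in the Zeppelin UI.
    val noteId = "2C35YU814"
    val conn = new URL(s"http://localhost:8080/api/notebook/job/$noteId")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("GET")
    // The JSON body lists each paragraph with its status (e.g. FINISHED or
    // ERROR), which a scheduler can parse to decide on retries or alerts.
    val body = Source.fromInputStream(conn.getInputStream).mkString
    println(body)
    conn.disconnect()
  }
}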
 


Best Regard,
Jeff Zhang





On 1/5/17, 2:25 AM, "Luciano Resende" <lu...@gmail.com> wrote:

>I have made some progress with a tool to handle the points discussed in
>this thread. It's currently a command line tool and given a Zeppelin
>notebook (note.json) it generates a Spark scala application, compiles it
>using the compiler embedded in the scala sdk and then package all these
>resources into a jar that works with spark-submit command.
>
>I would like to start prototyping the integration into the Zeppelin UI and
>I was wondering if it would be ok to use the above jar as a dependency
>(e.g. from a maven release) and integrate into zeppelin...
>
>Thoughts ?
>
>
>On Mon, Sep 19, 2016 at 7:47 AM, Sourav Mazumder <
>sourav.mazumder00@gmail.com> wrote:
>
>> To Moon's point, This is what my vision is around this feature -
>>
>> 1. Use should be able to package 1, more than one, all of the
>>paragraphs in
>> a Notebook to create a Jar file which can be used with Spark-Submit.
>>
>> 2. The tool should automatically remove the all the interactive
>>statements
>> like print, show etc.
>>
>> 3. The tool should automatically create a Main class in addition to the
>>jar
>> file(s) which will internally call the respective jar. User can then
>>change
>> this main class if needed for parameterization through Args.
>>
>> Regards,
>> Sourav
>>
>> On Mon, Sep 19, 2016 at 7:33 AM, Sourav Mazumder <
>> sourav.mazumder00@gmail.com> wrote:
>>
>> > I am also pretty much for this.
>> >
>> > I have got the similar request from each and every people/group who I
>> > showcased Zeppelin.Regards,
>> > Sourav
>> >
>> > On Fri, Sep 16, 2016 at 8:06 PM, moon soo Lee <mo...@apache.org> wrote:
>> >
>> >> Hi Luciano,
>> >>
>> >> I've also got a lot of questions about "Productize the notebook"
>>every
>> >> time
>> >> i meet users use Zeppelin in their work.
>> >>
>> >> I think it's actually about two different problems that Zeppelin
>>need to
>> >> address.
>> >>
>> >> *1) Provide way that interactive notebook becomes part of production
>> data
>> >> pipeline.*
>> >>
>> >> Although Zeppelin does have quite convenient cron-like scheduler for
>> each
>> >> Note, built-in cron scheduler is not ready for serious use in the
>> >> production. Because it lacks some features like actions after
>> >> success/fail,
>> >> fault-tolerance, history, and so on. I think community is working on
>> >> improving it, and it's going to take some time.
>> >>  Meanwhile, any external enterprise level job scheduler can run Note
>>or
>> >> Paragraph via REST api. But we don't have any guide and examples for
>>it,
>> >> what are the REST APIs user can use for this purpose, and how to use
>> them
>> >> in various cases (e.g. with authentication on, dynamic form
>>parameters,
>> >> etc). I think a lot of things need to be improved to make zeppelin
>> easier
>> >> to be part of production pipeline.
>> >>
>> >> *2) Provide stable way of run spark paragraphs.*
>> >>
>> >> Another barrier of using notebook in production pipeline is Scala
>>REPL
>> in
>> >> SparkInterpreter. SparkInterpreter uses Scala REPL to provide
>> interactive
>> >> scala session and Scala REPL will eventually hit OOME as it compiles
>>and
>> >> runs statements. Current workaround in zeppelin is cron-scheduler
>>inside
>> >> of
>> >> notebook has checkbox that can restart the Note after scheduler runs
>>it.
>> >> Of course that option does not apply when external scheduler runs job
>> >> through REST api.
>> >>
>> >> I think what Luciano suggesting, "Export Spark Paragraph as Spark
>> >> application" is interesting. If Spark Paragraphs can be easily
>>packaged
>> >> into jar (spark application) that can be one of way to address 1) and
>> 2).
>> >> In case of user already have stable way to schedule spark application
>> jar.
>> >>
>> >> Actually, Flink interactive shell works in similar way internally as
>>far
>> >> as
>> >> i know. i.e. package compiled class into jar and submit.
>> >>
>> >> One idea for prototyping is,
>> >> How about make a interpreter inside of spark interpreter group, say
>>it's
>> >> %spark.build or some better name.
>> >>
>> >> And if user runs some command like
>> >>
>> >> %spark.build
>> >> package
>> >>
>> >> then it builds spark application jar based on spark paragraph in the
>> Note.
>> >> I think it can be the simplest user interface for the prototype.
>> >>
>> >> Thanks,
>> >> moon
>> >>
>> >> On Fri, Sep 16, 2016 at 1:11 PM Jeremy Anderson <
>> >> jeremy@objectadjective.com>
>> >> wrote:
>> >>
>> >> > Luciano, I think this would be a terrific feature. I've heard the
>> exact
>> >> > same workflow you've describe in all of the research we've done.
>> >> >
>> >> > ...........................
>> >> >
>> >> > Jeremy Anderson
>> >> > Founder, Object Adjective
>> >> > 415.493.8489
>> >> > jeremy@objectadjective.com
>> >> > objectadjective.com <http://about.me/jeremyanderson>
>> >> >
>> >> >
>> >> >
>> >> > This email and any files transmitted with it are confidential and
>> >> > intended solely for the use of the individual or entity to whom
>>they
>> are
>> >> > addressed.
>> >> >
>> >> > On 16 September 2016 at 12:19, Luciano Resende
>><lu...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > While talking with a few different users, I have been seeing the
>>use
>> >> case
>> >> > > of using iterative development in Notebooks or Spark Shell and
>>then
>> >> > copying
>> >> > > and pasting the final solution to a formal application repeating
>> >> itself
>> >> > > very often.
>> >> > >
>> >> > > I was wondering if an "Export Spark Paragraphs as a Spark
>> Application
>> >> > > (jar)" would be a feature that Zeppelin community would think
>>it's
>> >> > useful.
>> >> > > But keep in mind there are some limitation here : we would be
>> >> constrained
>> >> > > to Spark related paragraphs, etc...  but even so, I think there
>>are
>> >> > > multiple scenarios where I see that the ability to have an
>> application
>> >> > that
>> >> > > directly runs on Spark to be very useful.
>> >> > >
>> >> > > If the community is interested, let's use this thread to discuss
>>any
>> >> > > specific requirements or suggestions that others might have, and
>> >> after a
>> >> > > few days I would like to start prototyping this functionality.
>> >> > >
>> >> > > Thoughts ?
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > Luciano Resende
>> >> > > http://twitter.com/lresende1975
>> >> > > http://lresende.blogspot.com/
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>
>
>-- 
>Luciano Resende
>http://twitter.com/lresende1975
>http://lresende.blogspot.com/