Posted to users@zeppelin.apache.org by Zhong Wang <wa...@gmail.com> on 2016/02/05 03:54:00 UTC

Multiple spark interpreters in the same Zeppelin instance

Hi zeppelin pilots,

I am trying to run multiple Spark interpreters in the same Zeppelin
instance. This is very helpful if the data comes from multiple Spark
clusters.

Another useful use case is running one interpreter in cluster mode and
another in local mode, which significantly speeds up analysis of small
data.

Is there any way to run multiple Spark interpreters? I tried to create
another Spark interpreter with a different identifier, which the UI allows,
but it doesn't work (shall I file a ticket?).

I am now trying to run multiple SparkContexts in the same Spark interpreter.

Zhong

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by vijikarthi <vi...@yahoo.com>.
Portability of notebook export/import will be an issue regardless of
multi-interpreter support, since the interpreter configurations are not
part of the note. One could also argue that they can never be part of the
note, since the environment can differ between the source (from where the
note is exported) and the target (where the note is imported).

Regarding multiple interpreter support, it would be better if we handled
the interpreter definition purely by treating the "alias" as a first-class
citizen. For example, if we have two Spark interpreters, "spark-dev" and
"spark-prod", we should be able to refer to either interpreter by its
alias name, like below:

%<ALIAS_NAME>.<INTERPRETER>

%spark-dev.spark
%spark-prod.pyspark
%spark-prod.spark





--
View this message in context: http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Multiple-spark-interpreters-in-the-same-Zeppelin-instance-tp2171p2997.html
Sent from the Apache Zeppelin Users (incubating) mailing list mailing list archive at Nabble.com.

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by Jongyoul Lee <jo...@gmail.com>.
Hi,

Concerning importing/exporting notebooks with aliases, my simple solution
is to store a reference to the interpreter setting in note.json. However,
that makes note.json larger, and the current and new interpreter setting
ids may not match. Second, when a user imports a note, we could provide a
menu to choose an interpreter mapping if there is no matching alias, or
request the interpreter-setting information from the previous server. But
that looks complicated for the user and cannot work when the user uploads
the json only. And finally, we could alert the user that an interpreter is
not defined and must be defined first, which is very unfriendly.
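
For illustration, the first option might store something like this in
note.json (the field names and the setting id here are only a sketch, not
an implemented format):

paragraphs : [
   {
         text : "%spark-dev ...",
         interpreterSettingRef : {
             id : "2BJB693M1",
             name : "spark-dev",
             group : "spark"
         },
         ...
   }
]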

There's no simple way now. I'll think of it deeply and find a smart
solution.

Regards,
JL


On Sat, Jul 2, 2016 at 1:22 AM, moon soo Lee <mo...@apache.org> wrote:

> Thanks Jongyoul for taking care of ZEPPELIN-2012 and share the plan.
>
> Could you share little bit more detail about how export/import notebook
> will work after ZEPPELIN-2012? Because we assume export/import notebook
> works between different Zeppelin installations and one installation might
> '%myinterpreter-setting' but the other installation does not.
>
> In this case, user will need to guess type of interpreter from
> '%interpreter-setting' name or code text, and change each paragraph's
> interpreter selection one by one, to run imported notebook in the other
> zeppelin instance.
>
> Will there anyway to simplify the importing and using notebook once user
> able to select interpreter using alias?
>
> Best,
> moon
>
> On Thu, Jun 30, 2016 at 10:27 PM Jongyoul Lee <jo...@gmail.com> wrote:
>
>> Hi,
>>
>> This is a little bit later response, but, I think it's useful to share
>> the current status and the future plan for dealing with this feature.
>>
>> For now, JdbcInterpreter supports parameter like '%jdbc(drill)',
>> '%jdbc(hive) and so on. This is a JdbcInterpreter features from
>> 0.6.0-SNAPSHOT and will be included 0.6.0. Furthermore, Zeppelin
>> interpreter supports the parameter mechanism of JdbcInterpreter as an
>> alias. Thus you can use %drill, %hive in your paragraph, when you set
>> proper properties to JDBC on an interpreter tab. You can find more
>> information on the web[1]. However, it is only for JdbcInterpreter now. In
>> the next release, Zeppelin will support aliases for all interpreters. Then,
>> you can make multiple interpreters like '%spark-dev`, '%spark-prod' and so
>> on. this means different spark interpreters on a single Zeppelin server and
>> it will allow you to run multiple spark interpreters in a same note
>> simutaneously. This will be handled ZEPPELIN-1012[2]. Please watch it.
>>
>> Regards,
>> Jongyoul Lee
>>
>> [1]: http://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
>> [2]: https://issues.apache.org/jira/browse/ZEPPELIN-1012
>>
>> On Tue, May 3, 2016 at 3:44 AM, John Omernik <jo...@omernik.com> wrote:
>>
>>> I see two components.
>>>
>>> 1. To ability to have multiple interpreters of the same time, but use
>>> different configuration options. a jdbc1, jdbc2, spark1, spark2, spark3,
>>> etc.  What ever you want to name them is great, but spark1 would use the
>>> SPARK_HOME that is configured, and spark2 would use a different SPARK_HOME
>>> or spark submit options.  That's the top level.
>>>
>>> 2. Ability to alias %interpreter to what ever interpreters are defined.
>>> I.e. I could do %jdbc1 for Drill, %jdbc2 for MySQL. And then have a file
>>> let as a user, I can say "I want %mysql to point to %jdbc2, and %drill to
>>> point to %jdbc1.
>>>
>>> For #1, the idea here is we will have multiple instances of any given
>>> interpreter type. for #2, it really should be easy for a user to make their
>>> environment easy to use and intuitive. Not to pick on your example Rick,
>>> but as a user typing %spark:dev.sql is a pain... I need two shift
>>> characters, and another non alpha character.  whereas if I could just type
>>> %dev.sql and had an alias in my notebook that said %dev pointed to
>>> %spark_dev that would be handy It may seem like not a big deal, but having
>>> to type something like that over and over again gets old :)
>>>
>>>
>>>
>>> On Mon, May 2, 2016 at 11:31 AM, Rick Moritz <ra...@gmail.com> wrote:
>>>
>>>> I think the solution would be to distinguish between interpreter type
>>>> and interpreter instance.
>>>> The type should be relatively static, while the instance could be any
>>>> alias/name and only generate a warning when unable to match with entries in
>>>> interpreter.json. Finally the specific type would be added to distinguish
>>>> the frontend-language (scala, python, R or sql/hive for spark, for example).
>>>>
>>>> Since implementing this would also clear up some of the rather buggy
>>>> and hard to maintain interpreter-group code, it would be a worthwhile thing
>>>> to do, in any case.
>>>> A final call could then look like this: %spark:dev.sql or
>>>> %spark:prod.pyspark. (or jdbc:drill, jdbc:oracledw)
>>>> Adding another separator (could be a period also - but the colon is
>>>> semantically nice, since it's essentially a service and address that we're
>>>> calling) makes for easy parsing of the string and keeps notes (somewhat)
>>>> portable.
>>>>
>>>> What do you think?
>>>>
>>>
>>>
>>
>>
>> --
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net
>>
>


-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by moon soo Lee <mo...@apache.org>.
Thanks Jongyoul for taking care of ZEPPELIN-1012 and sharing the plan.

Could you share a little more detail about how notebook export/import
will work after ZEPPELIN-1012? We assume export/import works between
different Zeppelin installations, and one installation might have
'%myinterpreter-setting' while the other installation does not.

In this case, the user will need to guess the type of interpreter from the
'%interpreter-setting' name or the code text, and change each paragraph's
interpreter selection one by one, to run the imported notebook in the other
Zeppelin instance.

Will there be any way to simplify importing and using a notebook once users
are able to select interpreters using an alias?

Best,
moon

On Thu, Jun 30, 2016 at 10:27 PM Jongyoul Lee <jo...@gmail.com> wrote:

> Hi,
>
> This is a little bit later response, but, I think it's useful to share the
> current status and the future plan for dealing with this feature.
>
> For now, JdbcInterpreter supports parameter like '%jdbc(drill)',
> '%jdbc(hive) and so on. This is a JdbcInterpreter features from
> 0.6.0-SNAPSHOT and will be included 0.6.0. Furthermore, Zeppelin
> interpreter supports the parameter mechanism of JdbcInterpreter as an
> alias. Thus you can use %drill, %hive in your paragraph, when you set
> proper properties to JDBC on an interpreter tab. You can find more
> information on the web[1]. However, it is only for JdbcInterpreter now. In
> the next release, Zeppelin will support aliases for all interpreters. Then,
> you can make multiple interpreters like '%spark-dev`, '%spark-prod' and so
> on. this means different spark interpreters on a single Zeppelin server and
> it will allow you to run multiple spark interpreters in a same note
> simutaneously. This will be handled ZEPPELIN-1012[2]. Please watch it.
>
> Regards,
> Jongyoul Lee
>
> [1]: http://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
> [2]: https://issues.apache.org/jira/browse/ZEPPELIN-1012
>
> On Tue, May 3, 2016 at 3:44 AM, John Omernik <jo...@omernik.com> wrote:
>
>> I see two components.
>>
>> 1. To ability to have multiple interpreters of the same time, but use
>> different configuration options. a jdbc1, jdbc2, spark1, spark2, spark3,
>> etc.  What ever you want to name them is great, but spark1 would use the
>> SPARK_HOME that is configured, and spark2 would use a different SPARK_HOME
>> or spark submit options.  That's the top level.
>>
>> 2. Ability to alias %interpreter to what ever interpreters are defined.
>> I.e. I could do %jdbc1 for Drill, %jdbc2 for MySQL. And then have a file
>> let as a user, I can say "I want %mysql to point to %jdbc2, and %drill to
>> point to %jdbc1.
>>
>> For #1, the idea here is we will have multiple instances of any given
>> interpreter type. for #2, it really should be easy for a user to make their
>> environment easy to use and intuitive. Not to pick on your example Rick,
>> but as a user typing %spark:dev.sql is a pain... I need two shift
>> characters, and another non alpha character.  whereas if I could just type
>> %dev.sql and had an alias in my notebook that said %dev pointed to
>> %spark_dev that would be handy It may seem like not a big deal, but having
>> to type something like that over and over again gets old :)
>>
>>
>>
>> On Mon, May 2, 2016 at 11:31 AM, Rick Moritz <ra...@gmail.com> wrote:
>>
>>> I think the solution would be to distinguish between interpreter type
>>> and interpreter instance.
>>> The type should be relatively static, while the instance could be any
>>> alias/name and only generate a warning when unable to match with entries in
>>> interpreter.json. Finally the specific type would be added to distinguish
>>> the frontend-language (scala, python, R or sql/hive for spark, for example).
>>>
>>> Since implementing this would also clear up some of the rather buggy and
>>> hard to maintain interpreter-group code, it would be a worthwhile thing to
>>> do, in any case.
>>> A final call could then look like this: %spark:dev.sql or
>>> %spark:prod.pyspark. (or jdbc:drill, jdbc:oracledw)
>>> Adding another separator (could be a period also - but the colon is
>>> semantically nice, since it's essentially a service and address that we're
>>> calling) makes for easy parsing of the string and keeps notes (somewhat)
>>> portable.
>>>
>>> What do you think?
>>>
>>
>>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by Jongyoul Lee <jo...@gmail.com>.
Hi,

This is a somewhat late response, but I think it's useful to share the
current status and the future plan for this feature.

For now, JdbcInterpreter supports parameters like '%jdbc(drill)',
'%jdbc(hive)' and so on. This is a JdbcInterpreter feature from
0.6.0-SNAPSHOT and will be included in 0.6.0. Furthermore, the Zeppelin
interpreter supports the parameter mechanism of JdbcInterpreter as an
alias, so you can use %drill or %hive in your paragraph when you set the
proper properties on the JDBC entry in the interpreter tab. You can find
more information on the web[1]. However, this is only for JdbcInterpreter
now. In the next release, Zeppelin will support aliases for all
interpreters. Then you can create multiple interpreters like '%spark-dev',
'%spark-prod' and so on. This means different Spark interpreters on a
single Zeppelin server, and it will allow you to run multiple Spark
interpreters in the same note simultaneously. This will be handled in
ZEPPELIN-1012[2]. Please watch it.
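
For example, with per-connection properties along these lines on the JDBC
setting (the property names and values here are only illustrative; see [1]
for the exact ones):

    "properties" : {
        "drill.driver" : "org.apache.drill.jdbc.Driver",
        "drill.url"    : "jdbc:drill:zk=localhost:2181",
        "drill.user"   : "zeppelin"
    }

a paragraph can then simply start with the alias:

%drill
SELECT * FROM cp.`employee.json` LIMIT 5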

Regards,
Jongyoul Lee

[1]: http://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
[2]: https://issues.apache.org/jira/browse/ZEPPELIN-1012

On Tue, May 3, 2016 at 3:44 AM, John Omernik <jo...@omernik.com> wrote:

> I see two components.
>
> 1. To ability to have multiple interpreters of the same time, but use
> different configuration options. a jdbc1, jdbc2, spark1, spark2, spark3,
> etc.  What ever you want to name them is great, but spark1 would use the
> SPARK_HOME that is configured, and spark2 would use a different SPARK_HOME
> or spark submit options.  That's the top level.
>
> 2. Ability to alias %interpreter to what ever interpreters are defined.
> I.e. I could do %jdbc1 for Drill, %jdbc2 for MySQL. And then have a file
> let as a user, I can say "I want %mysql to point to %jdbc2, and %drill to
> point to %jdbc1.
>
> For #1, the idea here is we will have multiple instances of any given
> interpreter type. for #2, it really should be easy for a user to make their
> environment easy to use and intuitive. Not to pick on your example Rick,
> but as a user typing %spark:dev.sql is a pain... I need two shift
> characters, and another non alpha character.  whereas if I could just type
> %dev.sql and had an alias in my notebook that said %dev pointed to
> %spark_dev that would be handy It may seem like not a big deal, but having
> to type something like that over and over again gets old :)
>
>
>
> On Mon, May 2, 2016 at 11:31 AM, Rick Moritz <ra...@gmail.com> wrote:
>
>> I think the solution would be to distinguish between interpreter type and
>> interpreter instance.
>> The type should be relatively static, while the instance could be any
>> alias/name and only generate a warning when unable to match with entries in
>> interpreter.json. Finally the specific type would be added to distinguish
>> the frontend-language (scala, python, R or sql/hive for spark, for example).
>>
>> Since implementing this would also clear up some of the rather buggy and
>> hard to maintain interpreter-group code, it would be a worthwhile thing to
>> do, in any case.
>> A final call could then look like this: %spark:dev.sql or
>> %spark:prod.pyspark. (or jdbc:drill, jdbc:oracledw)
>> Adding another separator (could be a period also - but the colon is
>> semantically nice, since it's essentially a service and address that we're
>> calling) makes for easy parsing of the string and keeps notes (somewhat)
>> portable.
>>
>> What do you think?
>>
>
>


-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by John Omernik <jo...@omernik.com>.
I see two components.

1. The ability to have multiple interpreters of the same type, but with
different configuration options: jdbc1, jdbc2, spark1, spark2, spark3,
etc. Whatever you want to name them is fine, but spark1 would use the
SPARK_HOME that is configured, and spark2 would use a different SPARK_HOME
or different spark-submit options. That's the top level (see the sketch
below).

2. The ability to alias %interpreter to whatever interpreters are defined.
I.e. I could have %jdbc1 for Drill and %jdbc2 for MySQL, and then, as a
user, I could say "I want %mysql to point to %jdbc2, and %drill to point
to %jdbc1."

For #1, the idea is that we will have multiple instances of any given
interpreter type. For #2, it really should be easy for a user to make their
environment easy to use and intuitive. Not to pick on your example, Rick,
but as a user, typing %spark:dev.sql is a pain... I need two shift
characters and another non-alpha character, whereas if I could just type
%dev.sql and had an alias in my notebook that said %dev points to
%spark_dev, that would be handy. It may seem like not a big deal, but having
to type something like that over and over again gets old :)
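
For #1, a rough sketch of what two Spark settings could look like in
interpreter.json, assuming SPARK_HOME could be set per setting (this layout
is hypothetical, not the current file format):

"interpreterSettings" : {
    "spark1" : {
        "group" : "spark",
        "properties" : { "SPARK_HOME" : "/opt/spark-prod",  "master" : "yarn-client" }
    },
    "spark2" : {
        "group" : "spark",
        "properties" : { "SPARK_HOME" : "/opt/spark-local", "master" : "local[*]" }
    }
}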



On Mon, May 2, 2016 at 11:31 AM, Rick Moritz <ra...@gmail.com> wrote:

> I think the solution would be to distinguish between interpreter type and
> interpreter instance.
> The type should be relatively static, while the instance could be any
> alias/name and only generate a warning when unable to match with entries in
> interpreter.json. Finally the specific type would be added to distinguish
> the frontend-language (scala, python, R or sql/hive for spark, for example).
>
> Since implementing this would also clear up some of the rather buggy and
> hard to maintain interpreter-group code, it would be a worthwhile thing to
> do, in any case.
> A final call could then look like this: %spark:dev.sql or
> %spark:prod.pyspark. (or jdbc:drill, jdbc:oracledw)
> Adding another separator (could be a period also - but the colon is
> semantically nice, since it's essentially a service and address that we're
> calling) makes for easy parsing of the string and keeps notes (somewhat)
> portable.
>
> What do you think?
>

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by Rick Moritz <ra...@gmail.com>.
I think the solution would be to distinguish between interpreter type and
interpreter instance.
The type should be relatively static, while the instance could be any
alias/name that only generates a warning when it can't be matched to an
entry in interpreter.json. Finally, the specific sub-type would be added to
distinguish the frontend language (scala, python, R, or sql/hive for spark,
for example).

Since implementing this would also clear up some of the rather buggy and
hard-to-maintain interpreter-group code, it would be a worthwhile thing to
do in any case.
A final call could then look like this: %spark:dev.sql or
%spark:prod.pyspark (or jdbc:drill, jdbc:oracledw).
Adding another separator (it could be a period also, but the colon is
semantically nice, since it's essentially a service and an address that
we're calling) makes for easy parsing of the string and keeps notes
(somewhat) portable.

What do you think?

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by John Omernik <jo...@omernik.com>.
Would this allow us to have multiple instances of spark or jdbc defined
in the interpreters?  I think one of the big things I am looking for is to
have

%drill -> %jdbc with settings for my drill cluster
%oracledw -> %jdbc with settings for my oracle data warehouse
%sparkprod -> %spark with settings for my spark cluster in production
   %sparkprod.sql
   %sparkprod.pyspark
%sparkdev -> %spark with settings for my spark cluster in dev
   %sparkdev.sql
   %sparkdev.pyspark

Would this be possible? It's intuitive for the user, and provides a level
of granularity for the administrator.

John






On Fri, Apr 29, 2016 at 5:23 PM, moon soo Lee <mo...@apache.org> wrote:

> Hi,
>
> Thanks John and DuyHai for sharing the idea.
> I can clearly see demands for using alias instead of static interpreter
> name.
>
> How about save static interpreter name (e.g. spark.sql) in each pargraph
> of note.json and allow alias?
>
> For example, if i have 2 interpreter settings in interpreter.json,
> 'spark-dev', 'spark-cluster',  i can select interpreter in my paragraph
> such as
>
> '%spark-dev'
> '%spark-cluster'
> '%spark-dev.sql'
> '%spark-cluster.sql'
>
> Once user run paragraph, zeppelin insert information of static interpreter
> name into the paragraph. for example
>
> paragraphs : [
>    {
>          text : "%spark-dev  ....",
>          interpreter : "spark.spark",
>          ...
>    },
>    {
>        text : "%spark-cluster.sql ...",
>        interpreter: "spark.sql",
>        ...
>    }
> ]
>
> When Zeppelin imports notebook, Zeppelin can use information of static
> interpreter name from paragraphs.interpreter to suggest possible
> interpreter setting from interpreter.json.
>
> I expect every zeppelin will have different property for any interpreter
> while their system and cluster configuration will be different. So instead
> of embedding all interpreter setting in note.json, embedding only static
> interpreter name in note.json would be more simple and practical.
>
> What do you think?
>
> Thanks,
> moon
>
>
> On Fri, Apr 29, 2016 at 11:15 AM DuyHai Doan <do...@gmail.com> wrote:
>
>> I would agree with John Omernik point about portability of JSON notes
>> because of the strong dependency with configured interpreters.
>>
>> Which gives me an idea: what's about "exporting" interpreters config into
>> note.json file ?
>>
>> Let's say your note has 10 paragraphs but they are just using 3 different
>> interpreters. Upon export, we will just fetch the current config for those
>> 3 interpreters and save them in the note.json
>>
>> On import of the note.json, there is more work to do:
>>
>> - if there are already 3 interpreters as the one saved in the note.json,
>> check the current config
>>    - if the config match, import the note
>>    - else, ask the user with a dialog if they want to 1) use current
>> interpreter conf, 2) override current interpreter conf with the ones in the
>> note.json or 3) try to merge configuration
>>
>> - if for each 3 interpreters in the note.json, there is no matching
>> interpreter instance, propose the user to create one for him by using the
>> config saved in the note
>>
>> And for backward compatibility with old note.json format, on import if we
>> don't find any info related to interpreter we just skip the whole config
>> checking step above
>>
>> What do you think ? It's a little bit complex but I guess it will help
>> greatly portability. I'm not saying it's necessarily easy and indeed is
>> require a lot of code change but I'm just throwing some ideas to feed the
>> discussion
>>
>>
>>
>> On Fri, Apr 29, 2016 at 5:41 PM, John Omernik <jo...@omernik.com> wrote:
>>
>>> Moon -
>>>
>>> I would be curious on your thoughts on my email from April 12th.
>>>
>>> John
>>>
>>>
>>>
>>> On Tue, Apr 12, 2016 at 7:11 AM, John Omernik <jo...@omernik.com> wrote:
>>>
>>>> I would actually argue that if the user doesn't have access to the same
>>>> or a similar interpreter.json file, than notebook file portability is a
>>>> moot point. For example, if I setup %spark or %jdbc in my environment,
>>>> create a notebook, that notebook is not any more or less portable than if I
>>>> had %myspark or %drill (a jdbc  interpreter).  Mainly, because if someone
>>>> tries to open that notebook, and they don't have my setup of %spark or of
>>>> %jdbc, they can't run the notebook.  If we could allow the user to create
>>>> an alias for an instance of an interpreter, and that alias information was
>>>> stored in interpreter.json, then the portability of the notebook would
>>>> essentially that the same.
>>>>
>>>> Said another way:
>>>>
>>>> Static interpreter invocation: (%jdbc, %pysaprk, %psql):
>>>> - This notebook is 100% dependent on the interpreter.json in order to
>>>> run. %jdbc may point to Drill, %pyspark may point to an authenticated YARN
>>>> instance (specific to the user/org), %psql may point to an authenticated
>>>> Postgres instance unique to the org/user.  Without interpreter.json, this
>>>> notebook is not portable.
>>>>
>>>> Aliases for interpreter invocation stored in interpreter.json (%drill
>>>> -> jdbc with settings, %datesciencespark -> pyspark for the data science
>>>> group, %entdw -> postgres server, enterprise datawarehouse)
>>>>
>>>> - Thus notebook is still 100% dependent on the interpreter.json file in
>>>> order to run. There is no more or less dependance on the interpreter.json
>>>> (if these aliases are stored there) then there is if Zeppelin is using
>>>> static interpreter invocation, thus portability is not a benefit of the
>>>> static method, and the aliased method can provide a good deal of analyst
>>>> agility/definition in a multi data set/source environment.
>>>>
>>>>
>>>> My thought is we should people to create new interpreters of known
>>>> types, and on creation of these interpreters allow the invocation to be
>>>> stored in the interpreter.json. Also, if a new interpreter is registered,
>>>> it would follow the same interpreter group methodology. Thus if I setup a
>>>> new %spark to be %entspark, then the sub interpreters (pyspark, sparksql
>>>> etc) can be there and have access to the master entspark, and also can be
>>>> renamed.  so that sub interpreter can be renamed, and the access it has to
>>>> interpreter group is based on the parent child relationship, not just by
>>>> name...
>>>>
>>>> Thoughts?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks moon - it is good to know the ideas behind the design. It makes
>>>>> a lot more sense to use system-defined identifiers in order to make the
>>>>> notebook portable.
>>>>>
>>>>> Currently, I can name the interpreter in the WebUI, but actually the
>>>>> name doesn't help to distinguish between my spark interpreters, which is
>>>>> quite confusing to me. I am not sure whether this is a better way:
>>>>> --
>>>>> 1. the UI generates the default identifier for the first spark
>>>>> interpreter, which is %spark
>>>>> 2. when the user creates another spark interpreter, the UI asks the
>>>>> users to provide a user-defined identifier
>>>>>
>>>>> Zhong
>>>>>
>>>>> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <mo...@apache.org> wrote:
>>>>>
>>>>>> In the initial stage of development, there were discussion about
>>>>>> %xxx, where xxx should be user defined interpreter identifier, or
>>>>>> should be a static interpreter identifier.
>>>>>>
>>>>>> And decided to go later one. Because we wanted keep notebook file
>>>>>> portable. i.e. Let run imported note.json file from other Zeppelin instance
>>>>>> without (or minimum) modification.
>>>>>>
>>>>>> if we use user defined identifier, running imported notebook will not
>>>>>> be very simple. This is why %xxx is not using user defined interpreter
>>>>>> identifier at the moment.
>>>>>>
>>>>>> If you have any other thoughts, ideas, please feel free to share.
>>>>>>
>>>>>> Thanks,
>>>>>> moon
>>>>>>
>>>>>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wa...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks, Moon! I got it worked. The reason why it didn't work is that
>>>>>>> I tried to use both of the spark interpreters inside one notebook. I think
>>>>>>> I can create different notebooks for each interpreters, but it would be
>>>>>>> great if we could use "%xxx", where xxx is the user defined interpreter
>>>>>>> identifier, to identify different interpreters for different paragraphs.
>>>>>>>
>>>>>>> Besides, because currently both of the interpreters are using
>>>>>>> "spark" as the identifier, they share the same log file. I am not sure
>>>>>>> whether there are other cases they interfere with each other.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Zhong
>>>>>>>
>>>>>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <mo...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Once you create another spark interpreter in Interpreter menu of
>>>>>>>> GUI,
>>>>>>>> then each notebook should able to select and use it (setting icon
>>>>>>>> on top right corner of each notebook).
>>>>>>>>
>>>>>>>> If it does not work, could you find error message on the log file?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> moon
>>>>>>>>
>>>>>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi zeppelin pilots,
>>>>>>>>>
>>>>>>>>> I am trying to run multiple spark interpreters in the same
>>>>>>>>> Zeppelin instance. This is very helpful if the data comes from multiple
>>>>>>>>> spark clusters.
>>>>>>>>>
>>>>>>>>> Another useful use case is that, run one instance in cluster mode,
>>>>>>>>> and another in local mode. This will significantly boost the performance of
>>>>>>>>> small data analysis.
>>>>>>>>>
>>>>>>>>> Is there anyway to run multiple spark interpreters? I tried to
>>>>>>>>> create another spark interpreter with a different identifier, which is
>>>>>>>>> allowed in UI. But it doesn't work (shall I file a ticket?)
>>>>>>>>>
>>>>>>>>> I am now trying running multiple sparkContext in the same spark
>>>>>>>>> interpreter.
>>>>>>>>>
>>>>>>>>> Zhong
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by moon soo Lee <mo...@apache.org>.
Hi,

Thanks John and DuyHai for sharing the ideas.
I can clearly see the demand for using an alias instead of a static
interpreter name.

How about saving the static interpreter name (e.g. spark.sql) in each
paragraph of note.json and allowing aliases?

For example, if I have 2 interpreter settings in interpreter.json,
'spark-dev' and 'spark-cluster', I can select the interpreter in my
paragraph such as

'%spark-dev'
'%spark-cluster'
'%spark-dev.sql'
'%spark-cluster.sql'

Once the user runs a paragraph, Zeppelin inserts the static interpreter
name into the paragraph, for example:

paragraphs : [
   {
         text : "%spark-dev  ....",
         interpreter : "spark.spark",
         ...
   },
   {
       text : "%spark-cluster.sql ...",
       interpreter: "spark.sql",
       ...
   }
]

When Zeppelin imports a notebook, it can use the static interpreter name
from paragraphs.interpreter to suggest a possible interpreter setting from
interpreter.json.

I expect every Zeppelin installation will have different properties for any
given interpreter, since their system and cluster configurations will
differ. So instead of embedding all interpreter settings in note.json,
embedding only the static interpreter name would be simpler and more
practical.

What do you think?

Thanks,
moon


On Fri, Apr 29, 2016 at 11:15 AM DuyHai Doan <do...@gmail.com> wrote:

> I would agree with John Omernik point about portability of JSON notes
> because of the strong dependency with configured interpreters.
>
> Which gives me an idea: what's about "exporting" interpreters config into
> note.json file ?
>
> Let's say your note has 10 paragraphs but they are just using 3 different
> interpreters. Upon export, we will just fetch the current config for those
> 3 interpreters and save them in the note.json
>
> On import of the note.json, there is more work to do:
>
> - if there are already 3 interpreters as the one saved in the note.json,
> check the current config
>    - if the config match, import the note
>    - else, ask the user with a dialog if they want to 1) use current
> interpreter conf, 2) override current interpreter conf with the ones in the
> note.json or 3) try to merge configuration
>
> - if for each 3 interpreters in the note.json, there is no matching
> interpreter instance, propose the user to create one for him by using the
> config saved in the note
>
> And for backward compatibility with old note.json format, on import if we
> don't find any info related to interpreter we just skip the whole config
> checking step above
>
> What do you think ? It's a little bit complex but I guess it will help
> greatly portability. I'm not saying it's necessarily easy and indeed is
> require a lot of code change but I'm just throwing some ideas to feed the
> discussion
>
>
>
> On Fri, Apr 29, 2016 at 5:41 PM, John Omernik <jo...@omernik.com> wrote:
>
>> Moon -
>>
>> I would be curious on your thoughts on my email from April 12th.
>>
>> John
>>
>>
>>
>> On Tue, Apr 12, 2016 at 7:11 AM, John Omernik <jo...@omernik.com> wrote:
>>
>>> I would actually argue that if the user doesn't have access to the same
>>> or a similar interpreter.json file, than notebook file portability is a
>>> moot point. For example, if I setup %spark or %jdbc in my environment,
>>> create a notebook, that notebook is not any more or less portable than if I
>>> had %myspark or %drill (a jdbc  interpreter).  Mainly, because if someone
>>> tries to open that notebook, and they don't have my setup of %spark or of
>>> %jdbc, they can't run the notebook.  If we could allow the user to create
>>> an alias for an instance of an interpreter, and that alias information was
>>> stored in interpreter.json, then the portability of the notebook would
>>> essentially that the same.
>>>
>>> Said another way:
>>>
>>> Static interpreter invocation: (%jdbc, %pysaprk, %psql):
>>> - This notebook is 100% dependent on the interpreter.json in order to
>>> run. %jdbc may point to Drill, %pyspark may point to an authenticated YARN
>>> instance (specific to the user/org), %psql may point to an authenticated
>>> Postgres instance unique to the org/user.  Without interpreter.json, this
>>> notebook is not portable.
>>>
>>> Aliases for interpreter invocation stored in interpreter.json (%drill ->
>>> jdbc with settings, %datesciencespark -> pyspark for the data science
>>> group, %entdw -> postgres server, enterprise datawarehouse)
>>>
>>> - Thus notebook is still 100% dependent on the interpreter.json file in
>>> order to run. There is no more or less dependance on the interpreter.json
>>> (if these aliases are stored there) then there is if Zeppelin is using
>>> static interpreter invocation, thus portability is not a benefit of the
>>> static method, and the aliased method can provide a good deal of analyst
>>> agility/definition in a multi data set/source environment.
>>>
>>>
>>> My thought is we should people to create new interpreters of known
>>> types, and on creation of these interpreters allow the invocation to be
>>> stored in the interpreter.json. Also, if a new interpreter is registered,
>>> it would follow the same interpreter group methodology. Thus if I setup a
>>> new %spark to be %entspark, then the sub interpreters (pyspark, sparksql
>>> etc) can be there and have access to the master entspark, and also can be
>>> renamed.  so that sub interpreter can be renamed, and the access it has to
>>> interpreter group is based on the parent child relationship, not just by
>>> name...
>>>
>>> Thoughts?
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wa...@gmail.com>
>>> wrote:
>>>
>>>> Thanks moon - it is good to know the ideas behind the design. It makes
>>>> a lot more sense to use system-defined identifiers in order to make the
>>>> notebook portable.
>>>>
>>>> Currently, I can name the interpreter in the WebUI, but actually the
>>>> name doesn't help to distinguish between my spark interpreters, which is
>>>> quite confusing to me. I am not sure whether this is a better way:
>>>> --
>>>> 1. the UI generates the default identifier for the first spark
>>>> interpreter, which is %spark
>>>> 2. when the user creates another spark interpreter, the UI asks the
>>>> users to provide a user-defined identifier
>>>>
>>>> Zhong
>>>>
>>>> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>>> In the initial stage of development, there were discussion about
>>>>> %xxx, where xxx should be user defined interpreter identifier, or
>>>>> should be a static interpreter identifier.
>>>>>
>>>>> And decided to go later one. Because we wanted keep notebook file
>>>>> portable. i.e. Let run imported note.json file from other Zeppelin instance
>>>>> without (or minimum) modification.
>>>>>
>>>>> if we use user defined identifier, running imported notebook will not
>>>>> be very simple. This is why %xxx is not using user defined interpreter
>>>>> identifier at the moment.
>>>>>
>>>>> If you have any other thoughts, ideas, please feel free to share.
>>>>>
>>>>> Thanks,
>>>>> moon
>>>>>
>>>>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks, Moon! I got it worked. The reason why it didn't work is that
>>>>>> I tried to use both of the spark interpreters inside one notebook. I think
>>>>>> I can create different notebooks for each interpreters, but it would be
>>>>>> great if we could use "%xxx", where xxx is the user defined interpreter
>>>>>> identifier, to identify different interpreters for different paragraphs.
>>>>>>
>>>>>> Besides, because currently both of the interpreters are using "spark"
>>>>>> as the identifier, they share the same log file. I am not sure whether
>>>>>> there are other cases they interfere with each other.
>>>>>>
>>>>>> Thanks,
>>>>>> Zhong
>>>>>>
>>>>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Once you create another spark interpreter in Interpreter menu of GUI,
>>>>>>> then each notebook should able to select and use it (setting icon on
>>>>>>> top right corner of each notebook).
>>>>>>>
>>>>>>> If it does not work, could you find error message on the log file?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> moon
>>>>>>>
>>>>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wa...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi zeppelin pilots,
>>>>>>>>
>>>>>>>> I am trying to run multiple spark interpreters in the same Zeppelin
>>>>>>>> instance. This is very helpful if the data comes from multiple spark
>>>>>>>> clusters.
>>>>>>>>
>>>>>>>> Another useful use case is that, run one instance in cluster mode,
>>>>>>>> and another in local mode. This will significantly boost the performance of
>>>>>>>> small data analysis.
>>>>>>>>
>>>>>>>> Is there anyway to run multiple spark interpreters? I tried to
>>>>>>>> create another spark interpreter with a different identifier, which is
>>>>>>>> allowed in UI. But it doesn't work (shall I file a ticket?)
>>>>>>>>
>>>>>>>> I am now trying running multiple sparkContext in the same spark
>>>>>>>> interpreter.
>>>>>>>>
>>>>>>>> Zhong
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>
>

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by DuyHai Doan <do...@gmail.com>.
I would agree with John Omernik's point about the portability of JSON
notes, because of the strong dependency on the configured interpreters.

Which gives me an idea: what about "exporting" the interpreters' config
into the note.json file?

Let's say your note has 10 paragraphs but they use only 3 different
interpreters. Upon export, we would just fetch the current config for those
3 interpreters and save it in the note.json.
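
For illustration only (the field names are hypothetical, not an existing
format), such an exported note.json could carry something like:

{
  "paragraphs" : [
     { "text" : "%spark-prod.sql ...", ... }
  ],
  "exportedInterpreterSettings" : [
     {
       "name" : "spark-prod",
       "group" : "spark",
       "properties" : { "master" : "yarn-client", "spark.executor.memory" : "4g" }
     }
  ]
}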

On import of the note.json, there is more work to do:

- if there are already 3 interpreters matching the ones saved in the
note.json, check the current config
   - if the configs match, import the note
   - else, ask the user with a dialog whether they want to 1) use the
current interpreter conf, 2) override the current interpreter conf with the
ones in the note.json, or 3) try to merge the configurations

- if, for any of the 3 interpreters in the note.json, there is no matching
interpreter instance, offer to create one for the user from the config
saved in the note

And for backward compatibility with the old note.json format, if on import
we don't find any interpreter-related info, we just skip the whole
config-checking step above.

What do you think? It's a little bit complex, but I guess it would greatly
help portability. I'm not saying it's necessarily easy, and indeed it would
require a lot of code changes, but I'm just throwing out some ideas to feed
the discussion.



On Fri, Apr 29, 2016 at 5:41 PM, John Omernik <jo...@omernik.com> wrote:

> Moon -
>
> I would be curious on your thoughts on my email from April 12th.
>
> John
>
>
>
> On Tue, Apr 12, 2016 at 7:11 AM, John Omernik <jo...@omernik.com> wrote:
>
>> I would actually argue that if the user doesn't have access to the same
>> or a similar interpreter.json file, than notebook file portability is a
>> moot point. For example, if I setup %spark or %jdbc in my environment,
>> create a notebook, that notebook is not any more or less portable than if I
>> had %myspark or %drill (a jdbc  interpreter).  Mainly, because if someone
>> tries to open that notebook, and they don't have my setup of %spark or of
>> %jdbc, they can't run the notebook.  If we could allow the user to create
>> an alias for an instance of an interpreter, and that alias information was
>> stored in interpreter.json, then the portability of the notebook would
>> essentially that the same.
>>
>> Said another way:
>>
>> Static interpreter invocation: (%jdbc, %pysaprk, %psql):
>> - This notebook is 100% dependent on the interpreter.json in order to
>> run. %jdbc may point to Drill, %pyspark may point to an authenticated YARN
>> instance (specific to the user/org), %psql may point to an authenticated
>> Postgres instance unique to the org/user.  Without interpreter.json, this
>> notebook is not portable.
>>
>> Aliases for interpreter invocation stored in interpreter.json (%drill ->
>> jdbc with settings, %datesciencespark -> pyspark for the data science
>> group, %entdw -> postgres server, enterprise datawarehouse)
>>
>> - Thus notebook is still 100% dependent on the interpreter.json file in
>> order to run. There is no more or less dependance on the interpreter.json
>> (if these aliases are stored there) then there is if Zeppelin is using
>> static interpreter invocation, thus portability is not a benefit of the
>> static method, and the aliased method can provide a good deal of analyst
>> agility/definition in a multi data set/source environment.
>>
>>
>> My thought is we should people to create new interpreters of known types,
>> and on creation of these interpreters allow the invocation to be stored in
>> the interpreter.json. Also, if a new interpreter is registered, it would
>> follow the same interpreter group methodology. Thus if I setup a new %spark
>> to be %entspark, then the sub interpreters (pyspark, sparksql etc) can be
>> there and have access to the master entspark, and also can be renamed.  so
>> that sub interpreter can be renamed, and the access it has to interpreter
>> group is based on the parent child relationship, not just by name...
>>
>> Thoughts?
>>
>>
>>
>>
>>
>>
>> On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wa...@gmail.com>
>> wrote:
>>
>>> Thanks moon - it is good to know the ideas behind the design. It makes a
>>> lot more sense to use system-defined identifiers in order to make the
>>> notebook portable.
>>>
>>> Currently, I can name the interpreter in the WebUI, but actually the
>>> name doesn't help to distinguish between my spark interpreters, which is
>>> quite confusing to me. I am not sure whether this is a better way:
>>> --
>>> 1. the UI generates the default identifier for the first spark
>>> interpreter, which is %spark
>>> 2. when the user creates another spark interpreter, the UI asks the
>>> users to provide a user-defined identifier
>>>
>>> Zhong
>>>
>>> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <mo...@apache.org> wrote:
>>>
>>>> In the initial stage of development, there were discussion about
>>>> %xxx, where xxx should be user defined interpreter identifier, or
>>>> should be a static interpreter identifier.
>>>>
>>>> And decided to go later one. Because we wanted keep notebook file
>>>> portable. i.e. Let run imported note.json file from other Zeppelin instance
>>>> without (or minimum) modification.
>>>>
>>>> if we use user defined identifier, running imported notebook will not
>>>> be very simple. This is why %xxx is not using user defined interpreter
>>>> identifier at the moment.
>>>>
>>>> If you have any other thoughts, ideas, please feel free to share.
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks, Moon! I got it worked. The reason why it didn't work is that I
>>>>> tried to use both of the spark interpreters inside one notebook. I think I
>>>>> can create different notebooks for each interpreters, but it would be great
>>>>> if we could use "%xxx", where xxx is the user defined interpreter
>>>>> identifier, to identify different interpreters for different paragraphs.
>>>>>
>>>>> Besides, because currently both of the interpreters are using "spark"
>>>>> as the identifier, they share the same log file. I am not sure whether
>>>>> there are other cases they interfere with each other.
>>>>>
>>>>> Thanks,
>>>>> Zhong
>>>>>
>>>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Once you create another spark interpreter in Interpreter menu of GUI,
>>>>>> then each notebook should able to select and use it (setting icon on
>>>>>> top right corner of each notebook).
>>>>>>
>>>>>> If it does not work, could you find error message on the log file?
>>>>>>
>>>>>> Thanks,
>>>>>> moon
>>>>>>
>>>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wa...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi zeppelin pilots,
>>>>>>>
>>>>>>> I am trying to run multiple spark interpreters in the same Zeppelin
>>>>>>> instance. This is very helpful if the data comes from multiple spark
>>>>>>> clusters.
>>>>>>>
>>>>>>> Another useful use case is that, run one instance in cluster mode,
>>>>>>> and another in local mode. This will significantly boost the performance of
>>>>>>> small data analysis.
>>>>>>>
>>>>>>> Is there anyway to run multiple spark interpreters? I tried to
>>>>>>> create another spark interpreter with a different identifier, which is
>>>>>>> allowed in UI. But it doesn't work (shall I file a ticket?)
>>>>>>>
>>>>>>> I am now trying running multiple sparkContext in the same spark
>>>>>>> interpreter.
>>>>>>>
>>>>>>> Zhong
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by John Omernik <jo...@omernik.com>.
Moon -

I would be curious on your thoughts on my email from April 12th.

John



On Tue, Apr 12, 2016 at 7:11 AM, John Omernik <jo...@omernik.com> wrote:

> I would actually argue that if the user doesn't have access to the same or
> a similar interpreter.json file, than notebook file portability is a moot
> point. For example, if I setup %spark or %jdbc in my environment, create a
> notebook, that notebook is not any more or less portable than if I had
> %myspark or %drill (a jdbc  interpreter).  Mainly, because if someone tries
> to open that notebook, and they don't have my setup of %spark or of %jdbc,
> they can't run the notebook.  If we could allow the user to create an alias
> for an instance of an interpreter, and that alias information was stored in
> interpreter.json, then the portability of the notebook would essentially
> that the same.
>
> Said another way:
>
> Static interpreter invocation: (%jdbc, %pysaprk, %psql):
> - This notebook is 100% dependent on the interpreter.json in order to run.
> %jdbc may point to Drill, %pyspark may point to an authenticated YARN
> instance (specific to the user/org), %psql may point to an authenticated
> Postgres instance unique to the org/user.  Without interpreter.json, this
> notebook is not portable.
>
> Aliases for interpreter invocation stored in interpreter.json (%drill ->
> jdbc with settings, %datesciencespark -> pyspark for the data science
> group, %entdw -> postgres server, enterprise datawarehouse)
>
> - Thus notebook is still 100% dependent on the interpreter.json file in
> order to run. There is no more or less dependance on the interpreter.json
> (if these aliases are stored there) then there is if Zeppelin is using
> static interpreter invocation, thus portability is not a benefit of the
> static method, and the aliased method can provide a good deal of analyst
> agility/definition in a multi data set/source environment.
>
>
> My thought is we should people to create new interpreters of known types,
> and on creation of these interpreters allow the invocation to be stored in
> the interpreter.json. Also, if a new interpreter is registered, it would
> follow the same interpreter group methodology. Thus if I setup a new %spark
> to be %entspark, then the sub interpreters (pyspark, sparksql etc) can be
> there and have access to the master entspark, and also can be renamed.  so
> that sub interpreter can be renamed, and the access it has to interpreter
> group is based on the parent child relationship, not just by name...
>
> Thoughts?
>
>
>
>
>
>
> On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wa...@gmail.com>
> wrote:
>
>> Thanks moon - it is good to know the ideas behind the design. It makes a
>> lot more sense to use system-defined identifiers in order to make the
>> notebook portable.
>>
>> Currently, I can name the interpreter in the WebUI, but actually the name
>> doesn't help to distinguish between my spark interpreters, which is quite
>> confusing to me. I am not sure whether this is a better way:
>> --
>> 1. the UI generates the default identifier for the first spark
>> interpreter, which is %spark
>> 2. when the user creates another spark interpreter, the UI asks the users
>> to provide a user-defined identifier
>>
>> Zhong
>>
>> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> In the initial stage of development, there were discussion about
>>> %xxx, where xxx should be user defined interpreter identifier, or should
>>> be a static interpreter identifier.
>>>
>>> And decided to go later one. Because we wanted keep notebook file
>>> portable. i.e. Let run imported note.json file from other Zeppelin instance
>>> without (or minimum) modification.
>>>
>>> if we use user defined identifier, running imported notebook will not be
>>> very simple. This is why %xxx is not using user defined interpreter
>>> identifier at the moment.
>>>
>>> If you have any other thoughts, ideas, please feel free to share.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wa...@gmail.com>
>>> wrote:
>>>
>>>> Thanks, Moon! I got it worked. The reason why it didn't work is that I
>>>> tried to use both of the spark interpreters inside one notebook. I think I
>>>> can create different notebooks for each interpreters, but it would be great
>>>> if we could use "%xxx", where xxx is the user defined interpreter
>>>> identifier, to identify different interpreters for different paragraphs.
>>>>
>>>> Besides, because currently both of the interpreters are using "spark"
>>>> as the identifier, they share the same log file. I am not sure whether
>>>> there are other cases they interfere with each other.
>>>>
>>>> Thanks,
>>>> Zhong
>>>>
>>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Once you create another spark interpreter in Interpreter menu of GUI,
>>>>> then each notebook should able to select and use it (setting icon on
>>>>> top right corner of each notebook).
>>>>>
>>>>> If it does not work, could you find error message on the log file?
>>>>>
>>>>> Thanks,
>>>>> moon
>>>>>
>>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi zeppelin pilots,
>>>>>>
>>>>>> I am trying to run multiple spark interpreters in the same Zeppelin
>>>>>> instance. This is very helpful if the data comes from multiple spark
>>>>>> clusters.
>>>>>>
>>>>>> Another useful use case is that, run one instance in cluster mode,
>>>>>> and another in local mode. This will significantly boost the performance of
>>>>>> small data analysis.
>>>>>>
>>>>>> Is there anyway to run multiple spark interpreters? I tried to create
>>>>>> another spark interpreter with a different identifier, which is allowed in
>>>>>> UI. But it doesn't work (shall I file a ticket?)
>>>>>>
>>>>>> I am now trying running multiple sparkContext in the same spark
>>>>>> interpreter.
>>>>>>
>>>>>> Zhong
>>>>>>
>>>>>>
>>>>>>
>>>>
>>
>

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by John Omernik <jo...@omernik.com>.
I would actually argue that if the user doesn't have access to the same or
a similar interpreter.json file, then notebook file portability is a moot
point. For example, if I set up %spark or %jdbc in my environment and
create a notebook, that notebook is not any more or less portable than if I
had %myspark or %drill (a jdbc interpreter), mainly because if someone
tries to open that notebook and they don't have my setup of %spark or of
%jdbc, they can't run the notebook.  If we could allow the user to create
an alias for an instance of an interpreter, and that alias information were
stored in interpreter.json, then the portability of the notebook would
essentially be the same.

Said another way:

Static interpreter invocation (%jdbc, %pyspark, %psql):
- This notebook is 100% dependent on the interpreter.json in order to run.
%jdbc may point to Drill, %pyspark may point to an authenticated YARN
instance (specific to the user/org), %psql may point to an authenticated
Postgres instance unique to the org/user.  Without interpreter.json, this
notebook is not portable.

Aliases for interpreter invocation stored in interpreter.json (%drill ->
jdbc with settings, %datesciencespark -> pyspark for the data science
group, %entdw -> postgres server, enterprise datawarehouse)

- Thus the notebook is still 100% dependent on the interpreter.json file in
order to run. There is no more or less dependence on the interpreter.json
(if these aliases are stored there) than there is if Zeppelin is using
static interpreter invocation, thus portability is not a benefit of the
static method, and the aliased method can provide a good deal of analyst
agility/definition in a multi data set/source environment.
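
As a purely hypothetical sketch (not an existing format), such alias
entries stored in interpreter.json could look like:

"aliases" : {
    "drill"            : { "setting" : "jdbc-drill" },
    "datesciencespark" : { "setting" : "spark-datascience", "default" : "pyspark" },
    "entdw"            : { "setting" : "jdbc-postgres-entdw" }
}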


My thought is we should allow people to create new interpreters of known
types, and on creation of these interpreters allow the invocation to be
stored in the interpreter.json. Also, if a new interpreter is registered,
it would follow the same interpreter-group methodology. Thus if I set up a
new %spark as %entspark, then the sub-interpreters (pyspark, sparksql, etc.)
would be there, have access to the parent entspark, and could also be
renamed, so that a sub-interpreter can be renamed and its access to the
interpreter group is based on the parent-child relationship, not just the
name...

Thoughts?






On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wa...@gmail.com> wrote:

> Thanks moon - it is good to know the ideas behind the design. It makes a
> lot more sense to use system-defined identifiers in order to make the
> notebook portable.
>
> Currently, I can name the interpreter in the WebUI, but actually the name
> doesn't help to distinguish between my spark interpreters, which is quite
> confusing to me. I am not sure whether this is a better way:
> --
> 1. the UI generates the default identifier for the first spark
> interpreter, which is %spark
> 2. when the user creates another spark interpreter, the UI asks the users
> to provide a user-defined identifier
>
> Zhong
>
> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <mo...@apache.org> wrote:
>
>> In the initial stage of development, there were discussion about
>> %xxx, where xxx should be user defined interpreter identifier, or should
>> be a static interpreter identifier.
>>
>> And decided to go later one. Because we wanted keep notebook file
>> portable. i.e. Let run imported note.json file from other Zeppelin instance
>> without (or minimum) modification.
>>
>> if we use user defined identifier, running imported notebook will not be
>> very simple. This is why %xxx is not using user defined interpreter
>> identifier at the moment.
>>
>> If you have any other thoughts, ideas, please feel free to share.
>>
>> Thanks,
>> moon
>>
>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wa...@gmail.com>
>> wrote:
>>
>>> Thanks, Moon! I got it worked. The reason why it didn't work is that I
>>> tried to use both of the spark interpreters inside one notebook. I think I
>>> can create different notebooks for each interpreters, but it would be great
>>> if we could use "%xxx", where xxx is the user defined interpreter
>>> identifier, to identify different interpreters for different paragraphs.
>>>
>>> Besides, because currently both of the interpreters are using "spark" as
>>> the identifier, they share the same log file. I am not sure whether there
>>> are other cases they interfere with each other.
>>>
>>> Thanks,
>>> Zhong
>>>
>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <mo...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> Once you create another spark interpreter in Interpreter menu of GUI,
>>>> then each notebook should able to select and use it (setting icon on
>>>> top right corner of each notebook).
>>>>
>>>> If it does not work, could you find error message on the log file?
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi zeppelin pilots,
>>>>>
>>>>> I am trying to run multiple spark interpreters in the same Zeppelin
>>>>> instance. This is very helpful if the data comes from multiple spark
>>>>> clusters.
>>>>>
>>>>> Another useful use case is that, run one instance in cluster mode, and
>>>>> another in local mode. This will significantly boost the performance of
>>>>> small data analysis.
>>>>>
>>>>> Is there anyway to run multiple spark interpreters? I tried to create
>>>>> another spark interpreter with a different identifier, which is allowed in
>>>>> UI. But it doesn't work (shall I file a ticket?)
>>>>>
>>>>> I am now trying running multiple sparkContext in the same spark
>>>>> interpreter.
>>>>>
>>>>> Zhong
>>>>>
>>>>>
>>>>>
>>>
>

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by Zhong Wang <wa...@gmail.com>.
Thanks moon - it is good to know the ideas behind the design. It makes a
lot more sense to use system-defined identifiers in order to make the
notebook portable.

Currently, I can name the interpreter in the WebUI, but the name alone
doesn't help to distinguish between my spark interpreters, which is quite
confusing to me. I am not sure whether this would be a better way (rough
sketch below):
--
1. the UI generates the default identifier for the first spark interpreter,
which is %spark
2. when the user creates another spark interpreter, the UI asks the user
to provide a user-defined identifier
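
For illustration only (this is not existing Zeppelin code, just a sketch of
the assignment rule I have in mind), the identifier logic could look roughly
like this:

    def assign_identifier(existing, requested=None, default="spark"):
        # The first spark interpreter gets the default identifier; any
        # further one must come with a unique, user-defined identifier.
        if default not in existing:
            return default
        if not requested:
            raise ValueError("a user-defined identifier is required "
                             "for additional spark interpreters")
        if requested in existing:
            raise ValueError("identifier %r is already taken" % requested)
        return requested

    settings = set()
    settings.add(assign_identifier(settings))                 # 'spark'
    settings.add(assign_identifier(settings, "spark-local"))  # 'spark-local'
    print(sorted(settings))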

Zhong

On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <mo...@apache.org> wrote:

> In the initial stage of development, there were discussion about
> %xxx, where xxx should be user defined interpreter identifier, or should
> be a static interpreter identifier.
>
> And decided to go later one. Because we wanted keep notebook file
> portable. i.e. Let run imported note.json file from other Zeppelin instance
> without (or minimum) modification.
>
> if we use user defined identifier, running imported notebook will not be
> very simple. This is why %xxx is not using user defined interpreter
> identifier at the moment.
>
> If you have any other thoughts, ideas, please feel free to share.
>
> Thanks,
> moon
>
> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wa...@gmail.com> wrote:
>
>> Thanks, Moon! I got it worked. The reason why it didn't work is that I
>> tried to use both of the spark interpreters inside one notebook. I think I
>> can create different notebooks for each interpreters, but it would be great
>> if we could use "%xxx", where xxx is the user defined interpreter
>> identifier, to identify different interpreters for different paragraphs.
>>
>> Besides, because currently both of the interpreters are using "spark" as
>> the identifier, they share the same log file. I am not sure whether there
>> are other cases they interfere with each other.
>>
>> Thanks,
>> Zhong
>>
>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> Once you create another spark interpreter in Interpreter menu of GUI,
>>> then each notebook should able to select and use it (setting icon on top
>>> right corner of each notebook).
>>>
>>> If it does not work, could you find error message on the log file?
>>>
>>> Thanks,
>>> moon
>>>
>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wa...@gmail.com>
>>> wrote:
>>>
>>>> Hi zeppelin pilots,
>>>>
>>>> I am trying to run multiple spark interpreters in the same Zeppelin
>>>> instance. This is very helpful if the data comes from multiple spark
>>>> clusters.
>>>>
>>>> Another useful use case is that, run one instance in cluster mode, and
>>>> another in local mode. This will significantly boost the performance of
>>>> small data analysis.
>>>>
>>>> Is there anyway to run multiple spark interpreters? I tried to create
>>>> another spark interpreter with a different identifier, which is allowed in
>>>> UI. But it doesn't work (shall I file a ticket?)
>>>>
>>>> I am now trying running multiple sparkContext in the same spark
>>>> interpreter.
>>>>
>>>> Zhong
>>>>
>>>>
>>>>
>>

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by moon soo Lee <mo...@apache.org>.
In the initial stage of development, there was a discussion about
%xxx: whether xxx should be a user-defined interpreter identifier or a
static interpreter identifier.

We decided to go with the latter, because we wanted to keep the notebook
file portable, i.e. to let a note.json file imported from another Zeppelin
instance run without (or with minimal) modification.

If we used user-defined identifiers, running an imported notebook would not
be very simple. This is why %xxx does not use a user-defined interpreter
identifier at the moment.

If you have any other thoughts or ideas, please feel free to share.

Thanks,
moon

On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wa...@gmail.com> wrote:

> Thanks, Moon! I got it worked. The reason why it didn't work is that I
> tried to use both of the spark interpreters inside one notebook. I think I
> can create different notebooks for each interpreters, but it would be great
> if we could use "%xxx", where xxx is the user defined interpreter
> identifier, to identify different interpreters for different paragraphs.
>
> Besides, because currently both of the interpreters are using "spark" as
> the identifier, they share the same log file. I am not sure whether there
> are other cases they interfere with each other.
>
> Thanks,
> Zhong
>
> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <mo...@apache.org> wrote:
>
>> Hi,
>>
>> Once you create another spark interpreter in Interpreter menu of GUI,
>> then each notebook should able to select and use it (setting icon on top
>> right corner of each notebook).
>>
>> If it does not work, could you find error message on the log file?
>>
>> Thanks,
>> moon
>>
>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wa...@gmail.com>
>> wrote:
>>
>>> Hi zeppelin pilots,
>>>
>>> I am trying to run multiple spark interpreters in the same Zeppelin
>>> instance. This is very helpful if the data comes from multiple spark
>>> clusters.
>>>
>>> Another useful use case is that, run one instance in cluster mode, and
>>> another in local mode. This will significantly boost the performance of
>>> small data analysis.
>>>
>>> Is there anyway to run multiple spark interpreters? I tried to create
>>> another spark interpreter with a different identifier, which is allowed in
>>> UI. But it doesn't work (shall I file a ticket?)
>>>
>>> I am now trying running multiple sparkContext in the same spark
>>> interpreter.
>>>
>>> Zhong
>>>
>>>
>>>
>

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by Zhong Wang <wa...@gmail.com>.
Thanks, Moon! I got it working. The reason it didn't work before is that I
tried to use both of the spark interpreters inside one notebook. I think I
can create a separate notebook for each interpreter, but it would be great
if we could use "%xxx", where xxx is a user-defined interpreter identifier,
to select different interpreters for different paragraphs.

Besides, because both of the interpreters currently use "spark" as the
identifier, they share the same log file. I am not sure whether there are
other cases where they interfere with each other.
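
Just to illustrate the log point (a sketch of what I would expect, not how
Zeppelin actually names its log files today): if each interpreter setting
carried its own identifier, the log file name could include it, so the two
spark processes would no longer write to the same file:

    import logging, os

    def logger_for_setting(setting_id, log_dir="logs"):
        # One log file per interpreter setting, e.g.
        # logs/zeppelin-interpreter-spark-local.log, instead of both spark
        # settings sharing logs/zeppelin-interpreter-spark.log.
        os.makedirs(log_dir, exist_ok=True)
        logger = logging.getLogger(setting_id)
        if not logger.handlers:
            handler = logging.FileHandler(
                os.path.join(log_dir, "zeppelin-interpreter-%s.log" % setting_id))
            logger.addHandler(handler)
        return logger

    logger_for_setting("spark-local").warning("started local spark interpreter")
    logger_for_setting("spark-cluster").warning("started cluster spark interpreter")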

Thanks,
Zhong

On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <mo...@apache.org> wrote:

> Hi,
>
> Once you create another spark interpreter in Interpreter menu of GUI,
> then each notebook should able to select and use it (setting icon on top
> right corner of each notebook).
>
> If it does not work, could you find error message on the log file?
>
> Thanks,
> moon
>
> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wa...@gmail.com>
> wrote:
>
>> Hi zeppelin pilots,
>>
>> I am trying to run multiple spark interpreters in the same Zeppelin
>> instance. This is very helpful if the data comes from multiple spark
>> clusters.
>>
>> Another useful use case is that, run one instance in cluster mode, and
>> another in local mode. This will significantly boost the performance of
>> small data analysis.
>>
>> Is there anyway to run multiple spark interpreters? I tried to create
>> another spark interpreter with a different identifier, which is allowed in
>> UI. But it doesn't work (shall I file a ticket?)
>>
>> I am now trying running multiple sparkContext in the same spark
>> interpreter.
>>
>> Zhong
>>
>>
>>

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by moon soo Lee <mo...@apache.org>.
Hi,

Once you create another spark interpreter in the Interpreter menu of the
GUI, each notebook should be able to select and use it (via the setting icon
in the top right corner of each notebook).

If it does not work, could you check for an error message in the log file?

Thanks,
moon

On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang <wa...@gmail.com> wrote:

> Hi zeppelin pilots,
>
> I am trying to run multiple spark interpreters in the same Zeppelin
> instance. This is very helpful if the data comes from multiple spark
> clusters.
>
> Another useful use case is that, run one instance in cluster mode, and
> another in local mode. This will significantly boost the performance of
> small data analysis.
>
> Is there anyway to run multiple spark interpreters? I tried to create
> another spark interpreter with a different identifier, which is allowed in
> UI. But it doesn't work (shall I file a ticket?)
>
> I am now trying running multiple sparkContext in the same spark
> interpreter.
>
> Zhong
>
>
>