Posted to users@zeppelin.apache.org by Jongyoul Lee <jo...@gmail.com> on 2016/07/01 05:27:42 UTC

Re: Multiple spark interpreters in the same Zeppelin instance

Hi,

This is a somewhat late response, but I think it's useful to share the
current status and the future plan for this feature.

For now, JdbcInterpreter supports parameters like '%jdbc(drill)',
'%jdbc(hive)', and so on. This is a JdbcInterpreter feature introduced in
0.6.0-SNAPSHOT, and it will be included in 0.6.0. Furthermore, the Zeppelin
interpreter supports the parameter mechanism of JdbcInterpreter as an
alias. Thus you can use %drill or %hive in your paragraph, once you set the
proper properties on JDBC in the interpreter tab. You can find more
information on the web[1]. However, this only works for JdbcInterpreter for
now. In the next release, Zeppelin will support aliases for all
interpreters. Then you can create multiple interpreters like '%spark-dev',
'%spark-prod', and so on. This means different Spark interpreters on a
single Zeppelin server, and it will allow you to run multiple Spark
interpreters in the same note simultaneously. This will be handled in
ZEPPELIN-1012[2]. Please watch it.
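
To make the aliasing idea concrete, here is a rough Python sketch of how a
paragraph's %magic could be resolved to an interpreter setting. The function
name, the settings table, and the lookup order are illustrative assumptions,
not Zeppelin's actual internals:

```python
# Illustrative sketch only: resolve a paragraph's "%magic" either to a
# directly registered alias (e.g. %drill) or to the parameterized
# JdbcInterpreter form (e.g. %jdbc(drill)). All names are hypothetical.

def resolve_interpreter(magic, settings):
    """Return the interpreter setting that '%magic' refers to."""
    name = magic.lstrip("%")
    if name in settings:  # a registered alias such as 'drill'
        return settings[name]
    if "(" in name and name.endswith(")"):  # the 'jdbc(drill)' form
        base, _, param = name.partition("(")
        if base in settings:
            return {**settings[base], "prefix": param[:-1]}
    raise KeyError(f"no interpreter setting for {magic}")

settings = {
    "jdbc": {"class": "JdbcInterpreter"},
    "drill": {"class": "JdbcInterpreter", "prefix": "drill"},
    "spark-dev": {"class": "SparkInterpreter", "SPARK_HOME": "/opt/spark-dev"},
}

print(resolve_interpreter("%drill", settings))
print(resolve_interpreter("%jdbc(hive)", settings))
```

Under this sketch, %drill and %jdbc(drill) reach the same JDBC setting,
which is the behavior described above for 0.6.0.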

Regards,
Jongyoul Lee

[1]: http://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
[2]: https://issues.apache.org/jira/browse/ZEPPELIN-1012

On Tue, May 3, 2016 at 3:44 AM, John Omernik <jo...@omernik.com> wrote:

> I see two components.
>
> 1. The ability to have multiple interpreters of the same type, but with
> different configuration options: a jdbc1, jdbc2, spark1, spark2, spark3,
> etc. Whatever you want to name them is fine, but spark1 would use the
> SPARK_HOME that is configured, and spark2 would use a different SPARK_HOME
> or different spark-submit options. That's the top level.
>
> 2. The ability to alias %interpreter to whatever interpreters are defined.
> I.e., I could do %jdbc1 for Drill and %jdbc2 for MySQL. And then, as a
> user, I can say "I want %mysql to point to %jdbc2, and %drill to point to
> %jdbc1."
>
> For #1, the idea here is we will have multiple instances of any given
> interpreter type. For #2, it really should be easy for a user to make
> their environment easy to use and intuitive. Not to pick on your example,
> Rick, but as a user, typing %spark:dev.sql is a pain... I need two shift
> characters and another non-alpha character, whereas if I could just type
> %dev.sql and had an alias in my notebook that said %dev pointed to
> %spark_dev, that would be handy. It may seem like not a big deal, but
> having to type something like that over and over again gets old :)
>
>
>
> On Mon, May 2, 2016 at 11:31 AM, Rick Moritz <ra...@gmail.com> wrote:
>
>> I think the solution would be to distinguish between interpreter type and
>> interpreter instance.
>> The type should be relatively static, while the instance could be any
>> alias/name, generating only a warning when it can't be matched with
>> entries in interpreter.json. Finally, the specific type would be added to
>> distinguish the frontend language (scala, python, R, or sql/hive for
>> spark, for example).
>>
>> Since implementing this would also clear up some of the rather buggy and
>> hard-to-maintain interpreter-group code, it would be a worthwhile thing
>> to do in any case.
>> A final call could then look like this: %spark:dev.sql or
>> %spark:prod.pyspark (or jdbc:drill, jdbc:oracledw).
>> Adding another separator (it could be a period too, but the colon is
>> semantically nice, since it's essentially a service and address that
>> we're calling) makes for easy parsing of the string and keeps notes
>> (somewhat) portable.
>>
>> What do you think?
>>
>
>


-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by Jongyoul Lee <jo...@gmail.com>.
Hi,

Concerning importing/exporting notebooks with aliases, my simple solution is
to store the referenced interpreter settings in note.json. However, this
enlarges note.json, and the current and new interpreter setting IDs might
not match. Second, when a user imports a note, we could provide a menu for
choosing an interpreter mapping when no matching aliases exist, or request
the interpreter-setting information from the previous server. But that looks
complicated for the user, and it cannot work when the user uploads the JSON
only. And finally, we can alert the user that an interpreter is not defined
and must be defined first. But that's very unfriendly.
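
The second option, the interpreter-mapping menu, might look roughly like
this; plan_import and the setting IDs below are hypothetical names used only
for illustration:

```python
# Illustrative sketch of an import-time mapping step: aliases used in an
# imported note are matched against the local interpreter settings, and
# the unmatched ones are collected so a menu could ask the user to map
# them by hand. This is not Zeppelin's actual import code.

def plan_import(note_aliases, local_settings):
    """Split a note's aliases into (resolved, needs_user_mapping)."""
    resolved, missing = {}, []
    for alias in note_aliases:
        if alias in local_settings:
            resolved[alias] = local_settings[alias]
        else:
            missing.append(alias)  # the menu would prompt for these
    return resolved, missing

local = {"spark-prod": "setting-id-1", "jdbc": "setting-id-2"}
resolved, missing = plan_import(["spark-prod", "spark-dev"], local)
print(resolved)  # maps directly: {'spark-prod': 'setting-id-1'}
print(missing)   # needs the user's help: ['spark-dev']
```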

There's no simple way for now. I'll think about it more deeply and try to
find a smart solution.

Regards,
JL


On Sat, Jul 2, 2016 at 1:22 AM, moon soo Lee <mo...@apache.org> wrote:

> Thanks, Jongyoul, for taking care of ZEPPELIN-1012 and sharing the plan.
>
> Could you share a little bit more detail about how notebook export/import
> will work after ZEPPELIN-1012? We assume export/import works between
> different Zeppelin installations, and one installation might have
> '%myinterpreter-setting' while the other does not.
>
> In this case, the user will need to guess the type of interpreter from
> the '%interpreter-setting' name or the code text, and change each
> paragraph's interpreter selection one by one, to run the imported
> notebook on the other Zeppelin instance.
>
> Will there be any way to simplify importing and using a notebook once
> users are able to select interpreters using aliases?
>
> Best,
> moon
>
> On Thu, Jun 30, 2016 at 10:27 PM Jongyoul Lee <jo...@gmail.com> wrote:
>
>> Hi,
>>
>> This is a somewhat late response, but I think it's useful to share the
>> current status and the future plan for this feature.
>>
>> For now, JdbcInterpreter supports parameters like '%jdbc(drill)',
>> '%jdbc(hive)', and so on. This is a JdbcInterpreter feature introduced in
>> 0.6.0-SNAPSHOT, and it will be included in 0.6.0. Furthermore, the
>> Zeppelin interpreter supports the parameter mechanism of JdbcInterpreter
>> as an alias. Thus you can use %drill or %hive in your paragraph, once you
>> set the proper properties on JDBC in the interpreter tab. You can find
>> more information on the web[1]. However, this only works for
>> JdbcInterpreter for now. In the next release, Zeppelin will support
>> aliases for all interpreters. Then you can create multiple interpreters
>> like '%spark-dev', '%spark-prod', and so on. This means different Spark
>> interpreters on a single Zeppelin server, and it will allow you to run
>> multiple Spark interpreters in the same note simultaneously. This will be
>> handled in ZEPPELIN-1012[2]. Please watch it.
>>
>> Regards,
>> Jongyoul Lee
>>
>> [1]: http://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
>> [2]: https://issues.apache.org/jira/browse/ZEPPELIN-1012
>>
>> On Tue, May 3, 2016 at 3:44 AM, John Omernik <jo...@omernik.com> wrote:
>>
>>> I see two components.
>>>
>>> 1. The ability to have multiple interpreters of the same type, but with
>>> different configuration options: a jdbc1, jdbc2, spark1, spark2, spark3,
>>> etc. Whatever you want to name them is fine, but spark1 would use the
>>> SPARK_HOME that is configured, and spark2 would use a different
>>> SPARK_HOME or different spark-submit options. That's the top level.
>>>
>>> 2. The ability to alias %interpreter to whatever interpreters are
>>> defined. I.e., I could do %jdbc1 for Drill and %jdbc2 for MySQL. And
>>> then, as a user, I can say "I want %mysql to point to %jdbc2, and %drill
>>> to point to %jdbc1."
>>>
>>> For #1, the idea here is we will have multiple instances of any given
>>> interpreter type. For #2, it really should be easy for a user to make
>>> their environment easy to use and intuitive. Not to pick on your
>>> example, Rick, but as a user, typing %spark:dev.sql is a pain... I need
>>> two shift characters and another non-alpha character, whereas if I could
>>> just type %dev.sql and had an alias in my notebook that said %dev
>>> pointed to %spark_dev, that would be handy. It may seem like not a big
>>> deal, but having to type something like that over and over again gets
>>> old :)
>>>
>>>
>>>
>>> On Mon, May 2, 2016 at 11:31 AM, Rick Moritz <ra...@gmail.com> wrote:
>>>
>>>> I think the solution would be to distinguish between interpreter type
>>>> and interpreter instance.
>>>> The type should be relatively static, while the instance could be any
>>>> alias/name, generating only a warning when it can't be matched with
>>>> entries in interpreter.json. Finally, the specific type would be added
>>>> to distinguish the frontend language (scala, python, R, or sql/hive for
>>>> spark, for example).
>>>>
>>>> Since implementing this would also clear up some of the rather buggy
>>>> and hard-to-maintain interpreter-group code, it would be a worthwhile
>>>> thing to do in any case.
>>>> A final call could then look like this: %spark:dev.sql or
>>>> %spark:prod.pyspark (or jdbc:drill, jdbc:oracledw).
>>>> Adding another separator (it could be a period too, but the colon is
>>>> semantically nice, since it's essentially a service and address that
>>>> we're calling) makes for easy parsing of the string and keeps notes
>>>> (somewhat) portable.
>>>>
>>>> What do you think?
>>>>
>>>
>>>
>>
>>
>> --
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net
>>
>


-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: Multiple spark interpreters in the same Zeppelin instance

Posted by moon soo Lee <mo...@apache.org>.
Thanks, Jongyoul, for taking care of ZEPPELIN-1012 and sharing the plan.

Could you share a little bit more detail about how notebook export/import
will work after ZEPPELIN-1012? We assume export/import works between
different Zeppelin installations, and one installation might have
'%myinterpreter-setting' while the other does not.

In this case, the user will need to guess the type of interpreter from the
'%interpreter-setting' name or the code text, and change each paragraph's
interpreter selection one by one, to run the imported notebook on the other
Zeppelin instance.
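
Automating that guess would at best be a heuristic over each paragraph's
text. A purely illustrative Python sketch follows; the keyword rules are
assumptions, not anything Zeppelin implements:

```python
# Illustrative heuristic: guess an interpreter type from a paragraph's
# first line when the imported note's alias is unknown locally.

def guess_interpreter(paragraph_text):
    """Crude guess at an interpreter type for an imported paragraph."""
    first = paragraph_text.lstrip().splitlines()[0].lower()
    if first.startswith("%sql") or first.startswith("select"):
        return "sql"
    if first.startswith("%pyspark") or "import " in first:
        return "pyspark"
    return "spark"  # default guess: scala

print(guess_interpreter("SELECT * FROM logs"))   # sql
print(guess_interpreter("import pandas as pd"))  # pyspark
```

A heuristic like this is obviously fragile, which is part of why an explicit
alias mapping at import time seems necessary.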

Will there be any way to simplify importing and using a notebook once users
are able to select interpreters using aliases?

Best,
moon

On Thu, Jun 30, 2016 at 10:27 PM Jongyoul Lee <jo...@gmail.com> wrote:

> Hi,
>
> This is a somewhat late response, but I think it's useful to share the
> current status and the future plan for this feature.
>
> For now, JdbcInterpreter supports parameters like '%jdbc(drill)',
> '%jdbc(hive)', and so on. This is a JdbcInterpreter feature introduced in
> 0.6.0-SNAPSHOT, and it will be included in 0.6.0. Furthermore, the
> Zeppelin interpreter supports the parameter mechanism of JdbcInterpreter
> as an alias. Thus you can use %drill or %hive in your paragraph, once you
> set the proper properties on JDBC in the interpreter tab. You can find
> more information on the web[1]. However, this only works for
> JdbcInterpreter for now. In the next release, Zeppelin will support
> aliases for all interpreters. Then you can create multiple interpreters
> like '%spark-dev', '%spark-prod', and so on. This means different Spark
> interpreters on a single Zeppelin server, and it will allow you to run
> multiple Spark interpreters in the same note simultaneously. This will be
> handled in ZEPPELIN-1012[2]. Please watch it.
>
> Regards,
> Jongyoul Lee
>
> [1]: http://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
> [2]: https://issues.apache.org/jira/browse/ZEPPELIN-1012
>
> On Tue, May 3, 2016 at 3:44 AM, John Omernik <jo...@omernik.com> wrote:
>
>> I see two components.
>>
>> 1. The ability to have multiple interpreters of the same type, but with
>> different configuration options: a jdbc1, jdbc2, spark1, spark2, spark3,
>> etc. Whatever you want to name them is fine, but spark1 would use the
>> SPARK_HOME that is configured, and spark2 would use a different
>> SPARK_HOME or different spark-submit options. That's the top level.
>>
>> 2. The ability to alias %interpreter to whatever interpreters are
>> defined. I.e., I could do %jdbc1 for Drill and %jdbc2 for MySQL. And
>> then, as a user, I can say "I want %mysql to point to %jdbc2, and %drill
>> to point to %jdbc1."
>>
>> For #1, the idea here is we will have multiple instances of any given
>> interpreter type. For #2, it really should be easy for a user to make
>> their environment easy to use and intuitive. Not to pick on your example,
>> Rick, but as a user, typing %spark:dev.sql is a pain... I need two shift
>> characters and another non-alpha character, whereas if I could just type
>> %dev.sql and had an alias in my notebook that said %dev pointed to
>> %spark_dev, that would be handy. It may seem like not a big deal, but
>> having to type something like that over and over again gets old :)
>>
>>
>>
>> On Mon, May 2, 2016 at 11:31 AM, Rick Moritz <ra...@gmail.com> wrote:
>>
>>> I think the solution would be to distinguish between interpreter type
>>> and interpreter instance.
>>> The type should be relatively static, while the instance could be any
>>> alias/name, generating only a warning when it can't be matched with
>>> entries in interpreter.json. Finally, the specific type would be added
>>> to distinguish the frontend language (scala, python, R, or sql/hive for
>>> spark, for example).
>>>
>>> Since implementing this would also clear up some of the rather buggy and
>>> hard-to-maintain interpreter-group code, it would be a worthwhile thing
>>> to do in any case.
>>> A final call could then look like this: %spark:dev.sql or
>>> %spark:prod.pyspark (or jdbc:drill, jdbc:oracledw).
>>> Adding another separator (it could be a period too, but the colon is
>>> semantically nice, since it's essentially a service and address that
>>> we're calling) makes for easy parsing of the string and keeps notes
>>> (somewhat) portable.
>>>
>>> What do you think?
>>>
>>
>>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>