Posted to users@zeppelin.apache.org by Hammad <ha...@flexilogix.com> on 2016/01/07 14:11:47 UTC

Re: Zeppelin Anatomy

Hi Alex,
Do we have some update on this?

Thanks,

On Sun, Sep 27, 2015 at 3:31 AM, Alex <ke...@gmail.com> wrote:

> Thank you guys for your interest and support!
>
> We will talk about it on ApacheCon on Monday and right after that I will
> post more information here, so stay tuned!
>
> --
> Alex.
>
> On Sat, Sep 26, 2015 at 8:05 AM, Hammad <ha...@flexilogix.com> wrote:
>
>> +1 for multi-tenancy
>>
>> @Alex: do you still need more up votes? :)
>>
>> On Fri, Sep 25, 2015 at 19:24 Sourav Mazumder <
>> sourav.mazumder00@gmail.com> wrote:
>>
>>> I have one more question/requirement in this context.
>>>
>>> Say I have 2 interpreters (spark1 and spark2) created out of the same
>>> base interpreter (spark). spark1 connects to a local Spark environment,
>>> whereas spark2 connects to a remote standalone Spark cluster. Both of
>>> them use the same Hive on a Hadoop cluster.
>>>
>>> What I want to do is: using spark1, I would read a local file and save
>>> it to Hive as a table. Then, using spark2, I want to process that data
>>> together with other data in Hive. I want to use spark2 because of its
>>> bigger infrastructure for computation-intensive processing.
>>>
>>> Now I can do this using 2 different notebooks by specifying spark1 and
>>> spark2 respectively as the interpreter in each of them.
>>>
>>> But I cannot do the same within a single notebook, because both spark1
>>> and spark2 use the same set of tags like %sql, %dep, etc.
>>>
>>> Any idea whether this is doable with some different
>>> configuration/workaround? A rough sketch of what I mean follows.
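>>>
>>> For concreteness (this is a sketch only: paths and table names are made
>>> up, and whether a paragraph can select a named interpreter setting
>>> directly depends on the Zeppelin version and interpreter binding), the
>>> flow I have in mind would look roughly like:
>>>
>>> %spark1
>>> // local interpreter: load a local file and publish it to the shared Hive
>>> val raw = sc.textFile("file:///tmp/input.csv")
>>> val rows = raw.map(_.split(",")).map(a => (a(0), a(1).toInt))
>>> import sqlContext.implicits._
>>> rows.toDF("key", "value").write.saveAsTable("staged_input")
>>>
>>> %spark2
>>> // remote-cluster interpreter: heavy processing against the same Hive
>>> val joined = sqlContext.sql(
>>>   "SELECT s.key, s.value, b.extra " +
>>>   "FROM staged_input s JOIN big_table b ON s.key = b.key")
>>> joined.show()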
>>>
>>> Regards,
>>> Sourav
>>>
>>>
>>>
>>> On Thu, Sep 24, 2015 at 10:53 PM, tog <gu...@gmail.com>
>>> wrote:
>>>
>>>> Hi Alex
>>>>
>>>> Yep, I think the multi-tenancy setup has raised numerous questions
>>>> recently. It might be interesting to dedicate a page in the docs to your
>>>> container approach.
>>>>
>>>> Thanks
>>>> Guillaume
>>>>
>>>> On Friday, 25 September 2015, Alex <ab...@nflabs.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> A Spark context is bound to a Spark interpreter instance, each running
>>>>> in a separate process.
>>>>>
>>>>> All notes that share the same interpreter also share the context
>>>>> (among other things).
>>>>>
>>>>> You can achieve the desired behaviour in a multi-user environment right
>>>>> now, i.e. by creating a separate Spark interpreter for each user in case
>>>>> all users share access to the same Zeppelin instance.
>>>>>
>>>>> Another approach, which we use for our customers, is to host a separate
>>>>> Zeppelin instance in a container, one per user, and put a balancing
>>>>> reverse proxy in front of them.
>>>>>
>>>>> I can share more details on this multi-tenancy setup if enough people
>>>>> from the community are interested in it.
>>>>>
>>>>> Hope this helps!
>>>>>
>>>>> --
>>>>> Kind regards,
>>>>> Alexander
>>>>>
>>>>> On 25 Sep 2015, at 00:54, Yian Shang <yi...@gmail.com> wrote:
>>>>>
>>>>> Are there any plans to change this so that there will be a separate
>>>>> Spark context per Notebook? In a multi-user environment, it is hard to deal
>>>>> with the accidental overwriting of user variables.
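>>>>>
>>>>> To make the hazard concrete (a contrived sketch; the table names are
>>>>> made up): two notes bound to the same interpreter share one REPL, so a
>>>>> binding in one note silently rebinds the same name in the other:
>>>>>
>>>>> // Note A, user 1
>>>>> val df = sqlContext.sql("SELECT * FROM sales")
>>>>>
>>>>> // Note B, user 2, same interpreter -> same REPL, same namespace
>>>>> val df = sqlContext.sql("SELECT * FROM clicks")
>>>>>
>>>>> // Back in Note A: df now refers to clicks, not sales
>>>>> df.count()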
>>>>>
>>>>> On Thu, Sep 24, 2015 at 7:19 AM, Rick Moritz <ra...@gmail.com> wrote:
>>>>>
>>>>>> Different instances of Zeppelin (even under the same user) are indeed
>>>>>> separate, which is (currently) the only way to get any kind of independence
>>>>>> into notebooks. In comparison, spark-notebook spawns one Spark context per
>>>>>> notebook, which is a somewhat better design, since concurrent users of the
>>>>>> same application aren't accidentally overwriting each other's variables,
>>>>>> and each notebook is indeed "repeatable" and "stand-alone", which is a
>>>>>> current deficit of Zeppelin, especially in a multi-user environment.
>>>>>> So yes, closing one context in one instance of Zeppelin will not
>>>>>> interfere with the other Spark context in the other instance of Zeppelin.
>>>>>>
>>>>>> On Thu, Sep 24, 2015 at 4:02 PM, Hammad <ha...@flexilogix.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Very useful indeed, Rick!
>>>>>>>
>>>>>>> If I have two Zeppelin instances running as two different users against
>>>>>>> the same Spark master, I see them as two different applications in the
>>>>>>> Spark web UI.
>>>>>>>
>>>>>>> 1. Will they have their own 'context' of execution in this case? If I
>>>>>>> understand correctly, this would mean that closing a Spark context in one
>>>>>>> user's Zeppelin has no impact on another user's Zeppelin environment, or
>>>>>>> is that not true?
>>>>>>>
>>>>>>> On Thu, Sep 24, 2015 at 4:47 PM, Rick Moritz <ra...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> 1)
>>>>>>>> Zeppelin uses the spark-shell REPL API, so it behaves similarly to the
>>>>>>>> Scala shell.
>>>>>>>> You do not write applications in the shell, in the technical sense,
>>>>>>>> but instead evaluate individual expressions with the goal of interacting
>>>>>>>> with a dataset.
>>>>>>>> You can (manually) export some of the code that you find useful in
>>>>>>>> Zeppelin to applications, for example to provide batch pre-processing.
>>>>>>>> I recommend you look at demos/descriptions of the interactive shell
>>>>>>>> functionality to get an idea of what Zeppelin offers over an application.
>>>>>>>> Also: you still have to manage most of your imports ;)
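>>>>>>>>
>>>>>>>> As a rough sketch of that export step (paths and names are purely
>>>>>>>> illustrative), the bare expressions you evaluate in a paragraph get
>>>>>>>> wrapped in the usual application scaffolding, and you create the
>>>>>>>> context yourself instead of receiving the provided sc:
>>>>>>>>
>>>>>>>> // In a Zeppelin paragraph: bare expressions against the provided sc
>>>>>>>> val counts = sc.textFile("hdfs:///logs/day1")
>>>>>>>>   .flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
>>>>>>>>
>>>>>>>> // The same logic exported as a standalone batch application
>>>>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>>>>>
>>>>>>>> object WordCount {
>>>>>>>>   def main(args: Array[String]): Unit = {
>>>>>>>>     val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
>>>>>>>>     val counts = sc.textFile("hdfs:///logs/day1")
>>>>>>>>       .flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
>>>>>>>>     counts.saveAsTextFile("hdfs:///out/wordcount")
>>>>>>>>     sc.stop()
>>>>>>>>   }
>>>>>>>> }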
>>>>>>>>
>>>>>>>> 2)
>>>>>>>> There are two benefits:
>>>>>>>> - You can import and export/share notebooks. This means it makes
>>>>>>>> sense to split content.
>>>>>>>> - You also reduce the load on the browser by splitting heavy
>>>>>>>> visualizations across multiple notebooks. Once you start rendering tens
>>>>>>>> of thousands of points, you reach the limits of a browser's
>>>>>>>> capabilities.
>>>>>>>>
>>>>>>>> Hopefully this helps you get started.
>>>>>>>>
>>>>>>>> On Thu, Sep 24, 2015 at 1:04 PM, Hammad <ha...@flexilogix.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi mates,
>>>>>>>>>
>>>>>>>>> I was struggling with the anatomy of Zeppelin in the context of Spark
>>>>>>>>> and could not find anything that answers the questions I have in mind,
>>>>>>>>> as below:
>>>>>>>>>
>>>>>>>>> 1. Usually a scala application structure is;
>>>>>>>>>
>>>>>>>>> import org.apache.<whatever>
>>>>>>>>>
>>>>>>>>> object MyApp {
>>>>>>>>>   def main(args: Array[String]) {
>>>>>>>>>     //something
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> whereas on Zeppelin we only write //something. Does it mean that
>>>>>>>>> one Zeppelin daemon is one application? What if I want to write multiple
>>>>>>>>> applications on one Zeppelin daemon instance?
>>>>>>>>>
>>>>>>>>> 2. Related to (1), if the same Spark context is shared across all
>>>>>>>>> notebooks, what's the benefit of having multiple notebooks?
>>>>>>>>>
>>>>>>>>> I would really appreciate it if someone could help me understand the
>>>>>>>>> above two points.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Hmad
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> PGP KeyID: 2048R/EA31CFC9  subkeys.pgp.net
>>>>
>>>
>>>
>


-- 
Flexilogix
Ph: +92 618090374
Fax: +92 612011810
http://www.flexilogix.com
info@flexilogix.com
