Posted to users@zeppelin.apache.org by York Huang <yo...@gmail.com> on 2016/09/03 08:45:11 UTC

Zeppelin architecture

Hi,

I am new to Zeppelin and have a few questions.
1. Should I install Zeppelin on a Hadoop edge node and have every user access it from a browser? Or should every user install their own Zeppelin?

2. How do I run standard Python without using Spark?

3. Can I install Zeppelin on a Windows server?

4. Is it possible to share data between interpreters?

Thanks

York


Re: Zeppelin architecture

Posted by York Huang <yo...@gmail.com>.
Hi Moon,

Sorry, a few more questions.

My cluster is a MapR cluster.

If I install Zeppelin on one edge node and have multiple users access
that instance, how do I set it up so that each user runs jobs and accesses
data in the MapR cluster under their own account?

If I instead install Zeppelin on every user's desktop and let them
access MapR from there, how do I install Zeppelin on their
Windows desktops?

Is there any guide somewhere?

Thanks,

York

On 7 September 2016 at 10:06, York Huang <yo...@gmail.com> wrote:

> Hi Moon,
>
> More questions.
>
> If I set up the MapR cluster in secure mode, how do I set up Zeppelin?
>
> Thanks,
>
> York

Re: Zeppelin architecture

Posted by York Huang <yo...@gmail.com>.
Hi Moon,

More questions.

If I set up the MapR cluster in secure mode, how do I set up Zeppelin?

Thanks,

York

On 6 September 2016 at 17:16, York Huang <yo...@gmail.com> wrote:

> Hi Moon,
>
> Thanks for your response.
>
> I have a MapR 4.1 cluster and would like to use Zeppelin on it. If I
> install Zeppelin on an edge node, what security should I set up? The online
> documentation is a bit confusing. Basically, I want every user to have
> their own account (either an AD account or a newly created Zeppelin account).
>
> Is there any guide?
>
> Thanks,
>
> York

Re: Running R on Zeppelin EMR Cluster

Posted by Flayranalytics <ma...@flayranalytics.co.uk>.
Thanks for sharing.

It is disappointing that R is not available on EMR. I will look out for updates.

Regards,
Mark



> On 6 Sep 2016, at 17:42, Jonathan Kelly <jo...@gmail.com> wrote:
> 
> Mark,
> 
> I see in the couchbase-spark-connector GitHub project that they have already upgraded to Spark 2.0 (https://github.com/couchbase/couchbase-spark-connector/pull/9) but that this change has not yet been released in a new version. According to the discussion on that pull request, it sounds like they are hoping for a new version this month.
> 
> As for using the R interpreter on emr-5.0.0, unfortunately EMR does not yet (officially) support the R interpreter. I expect that we (I'm from EMR, btw) would be able to support it eventually, but I'm unable to give any ETA on that.
> 
> ~ Jonathan

Re: Running R on Zeppelin EMR Cluster

Posted by Jonathan Kelly <jo...@gmail.com>.
Mark,

I see in the couchbase-spark-connector GitHub project that they have
already upgraded to Spark 2.0
(https://github.com/couchbase/couchbase-spark-connector/pull/9) but that
this change has not yet been released in a new version. According to the
discussion on that pull request, it sounds like they are hoping for a new
version this month.

As for using the R interpreter on emr-5.0.0, unfortunately EMR does not yet
(officially) support the R interpreter. I expect that we (I'm from EMR,
btw) would be able to support it eventually, but I'm unable to give any ETA
on that.

~ Jonathan

On Tue, Sep 6, 2016 at 8:34 AM Mark Mikolajczak - 07855 306 064 <
mark@flayranalytics.co.uk> wrote:

> Thanks, I was afraid that was the solution.
>
> I am connecting to a Couchbase database and the connector only supports
> Spark 1.6, and by upgrading I will be using EMR-5.0.0, which seems to only
> run Spark 2.0…

Re: Running R on Zeppelin EMR Cluster

Posted by Mark Mikolajczak - 07855 306 064 <ma...@flayranalytics.co.uk>.
Thanks, I was afraid that was the solution.

I am connecting to a Couchbase database and the connector only supports Spark 1.6, and by upgrading I will be using EMR-5.0.0, which seems to only run Spark 2.0…
 

> On 6 Sep 2016, at 16:27, Hyung Sung Shim <hs...@nflabs.com> wrote:
> 
> and EMR-5.0.0 supports Zeppelin 0.6.1.


Re: Running R on Zeppelin EMR Cluster

Posted by Hyung Sung Shim <hs...@nflabs.com>.
and EMR-5.0.0 supports Zeppelin 0.6.1.


2016-09-07 0:24 GMT+09:00 Hyung Sung Shim <hs...@nflabs.com>:

> Hi.
> Unfortunately, Zeppelin 0.5.6 does not support the R interpreter.
> Could you upgrade Zeppelin to a higher version?

Re: Running R on Zeppelin EMR Cluster

Posted by Hyung Sung Shim <hs...@nflabs.com>.
Hi.
Unfortunately, Zeppelin 0.5.6 does not support the R interpreter.
Could you upgrade Zeppelin to a higher version?
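
If it helps, the version requirement can be written down as a small check. This is only a sketch: the 0.6.0 cutoff is an assumption on my part (taken as the first release after 0.5.6), and the names parse_version and has_r_interpreter are made up for illustration.

```python
def parse_version(text):
    # Turn "0.6.1" into (0, 6, 1) so versions compare numerically, not as strings.
    return tuple(int(part) for part in text.split("."))


# Assumption: 0.6.0 is the first Zeppelin release that includes an R interpreter.
ASSUMED_FIRST_R_RELEASE = (0, 6, 0)


def has_r_interpreter(zeppelin_version):
    # Hypothetical helper: True when the version is at or above the assumed cutoff.
    return parse_version(zeppelin_version) >= ASSUMED_FIRST_R_RELEASE


for version in ("0.5.6", "0.6.1"):
    print(version, has_r_interpreter(version))
```

A version at or above the cutoff only says the release includes the interpreter; a particular packaged build may still leave it out.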

2016-09-06 23:53 GMT+09:00 Mark Mikolajczak - 07855 306 064 <
mark@flayranalytics.co.uk>:

> Hi All,
>
> I am trying to set up the R interpreter to run in Zeppelin, which is
> currently running on EMR. Zeppelin is working perfectly and I am able to
> write scripts in Scala and Python. When I use %r, %sparkR or %knitr I
> receive an error: "r interpreter not found"
>
> The applications running in my emr-4.7.2 cluster are: Hive
> 1.0.0, Zeppelin-Sandbox 0.5.6, Spark 1.6.2, Pig 0.14.0
>
> Within the interpreter there is no mention of R, so I figure I am missing
> something but do not know what.
>
> Any pointers greatly appreciated.
>

Running R on Zeppelin EMR Cluster

Posted by Mark Mikolajczak - 07855 306 064 <ma...@flayranalytics.co.uk>.
Hi All,

I am trying to set up the R interpreter to run in Zeppelin, which is currently running on EMR. Zeppelin is working perfectly and I am able to write scripts in Scala and Python. When I use %r, %sparkR or %knitr I receive an error: "r interpreter not found"

The applications running in my emr-4.7.2 cluster are: Hive 1.0.0, Zeppelin-Sandbox 0.5.6, Spark 1.6.2, Pig 0.14.0

Within the interpreter there is no mention of R, so I figure I am missing something but do not know what.

Any pointers greatly appreciated.

Re: Zeppelin architecture

Posted by York Huang <yo...@gmail.com>.
Hi Moon,

Thanks for your response.

I have a MapR 4.1 cluster and would like to use Zeppelin on it. If I
install Zeppelin on an edge node, what security should I set up? The online
documentation is a bit confusing. Basically, I want every user to have
their own account (either an AD account or a newly created Zeppelin account).

Is there any guide?

Thanks,

York

On 5 September 2016 at 07:31, moon soo Lee <mo...@apache.org> wrote:

> Hi York,
>
> Thanks for the question.
>
> 1. How you install Zeppelin is up to you and your use case. You can either
> run a single instance of Zeppelin, configure authentication, and let many
> users log in, or let each user run their own Zeppelin instance.
> I see both setups among users, and it really depends on your environment.
>
> 2. Since the 0.6.0 release, Zeppelin ships a Python interpreter. You can try
> %python.
>
> 3. You can run Zeppelin on Windows by running bin/zeppelin.cmd.
>
> 4. Interpreters can share data through the resource pool. You can think of
> the resource pool as a distributed map across all interpreters. Although every
> interpreter can access the resource pool, only a few interpreters expose an
> API that lets the user access it directly.
>
> SparkInterpreter, PysparkInterpreter, and SparkRInterpreter are the
> interpreters that expose the resource pool API to the user. You can access
> the resource pool via the z.get() and z.put() APIs. Check [1].
>
>
> Thanks,
> moon
>
> [1] http://zeppelin.apache.org/docs/latest/interpreter/spark.html#object-exchange

Re: Zeppelin architecture

Posted by moon soo Lee <mo...@apache.org>.
Hi York,

Thanks for the question.

1. How you install Zeppelin is up to you and your use case. You can either
run a single instance of Zeppelin, configure authentication, and let many
users log in, or let each user run their own Zeppelin instance.
I see both setups among users, and it really depends on your environment.

2. Since the 0.6.0 release, Zeppelin ships a Python interpreter. You can try
%python.

3. You can run Zeppelin on Windows by running bin/zeppelin.cmd.

4. Interpreters can share data through the resource pool. You can think of
the resource pool as a distributed map across all interpreters. Although every
interpreter can access the resource pool, only a few interpreters expose an
API that lets the user access it directly.

SparkInterpreter, PysparkInterpreter, and SparkRInterpreter are the
interpreters that expose the resource pool API to the user. You can access
the resource pool via the z.get() and z.put() APIs. Check [1].
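
To make point 4 a little more concrete, here is a minimal sketch of the put/get pattern. It assumes only what is said above: z.put() stores a value under a name and z.get() reads it back. Outside a notebook there is no z object, so the hypothetical ResourcePoolStandIn class below fakes one with a plain dict. It is not Zeppelin code, and the name "shared_total" is made up for the example.

```python
class ResourcePoolStandIn:
    """Hypothetical stand-in for Zeppelin's `z` object (NOT the real implementation).

    It models the resource pool as a simple name -> value map so the
    example can run outside a notebook.
    """

    def __init__(self):
        self._pool = {}

    def put(self, name, value):
        # Store a value under a name, mirroring the shape of z.put(name, value).
        self._pool[name] = value

    def get(self, name):
        # Read the value back; this stand-in returns None for unknown names.
        return self._pool.get(name)


# In a real notebook `z` is provided by the Spark interpreters; here we fake it.
z = ResourcePoolStandIn()

# First paragraph (for example %pyspark): publish a result under a name.
z.put("shared_total", sum([1, 2, 3]))

# Second paragraph (for example %spark or %r): read the value back by name.
shared_total = z.get("shared_total")
print(shared_total)
```

In an actual notebook you would drop the class and use the z that the interpreter already provides; [1] describes which kinds of values can be exchanged between languages.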


Thanks,
moon

[1] http://zeppelin.apache.org/docs/latest/interpreter/spark.html#object-exchange

On Sat, Sep 3, 2016 at 6:45 PM York Huang <yo...@gmail.com> wrote:

> Hi,
>
> I am new to Zeppelin and have a few questions.
> 1. Should I install Zeppelin on a Hadoop edge node and have every user access
> it from a browser? Or should every user install their own Zeppelin?
>
> 2. How do I run standard Python without using Spark?
>
> 3. Can I install Zeppelin on a Windows server?
>
> 4. Is it possible to share data between interpreters?
>
> Thanks
>
> York