Posted to users@zeppelin.apache.org by Jhon Anderson Cardenas Diaz <jh...@gmail.com> on 2018/03/13 23:24:09 UTC

Zeppelin - Spark Driver location

Hi zeppelin users !

I am working with Zeppelin pointing to a Spark standalone cluster. I am
trying to figure out a way to make Zeppelin run the Spark driver outside of
the client process that submits the application.

According to the documentation (
http://spark.apache.org/docs/2.1.1/spark-standalone.html):

*For standalone clusters, Spark currently supports two deploy modes.
In client mode, the driver is launched in the same process as the client
that submits the application. In cluster mode, however, the driver is
launched from one of the Worker processes inside the cluster, and the
client process exits as soon as it fulfills its responsibility of
submitting the application without waiting for the application to finish.*

The problem is that, even when I set the properties for the Spark standalone
cluster and set the deploy mode to cluster, the driver still runs on the
Zeppelin machine (according to the Spark UI executors page). These are the
properties I am setting for the Spark interpreter:

master: spark://<master-name>:7077
spark.submit.deployMode: cluster
spark.executor.memory: 16g
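
In case it helps, this is roughly the spark-submit invocation I would expect
these interpreter settings to translate to (a sketch only; the master host
name and application jar are placeholders, not verified Zeppelin internals):

```shell
# Rough sketch of the spark-submit call the interpreter settings above
# should correspond to (placeholders, not an actual Zeppelin command):
MASTER_URL="spark://<master-name>:7077"

build_submit_cmd() {
  # Mirror the three interpreter properties as spark-submit arguments.
  echo "spark-submit --master $MASTER_URL" \
       "--deploy-mode cluster" \
       "--conf spark.executor.memory=16g" \
       "<app.jar>"
}

build_submit_cmd
```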

Any ideas would be appreciated.

Thank you

Details:
Spark version: 2.1.1
Zeppelin version: 0.8.0 (built from a snapshot merged in September 2017)

Re: Zeppelin - Spark Driver location

Posted by Jeff Zhang <zj...@gmail.com>.
spark-submit only runs when you run the first paragraph using the Spark
interpreter. After that, each paragraph sends its code to the running Spark
app for execution.

>>> Also spark standalone cluster moder should work even before this new
release, right?
I didn't verify that; I'm not sure whether other people have.


ankit jain <an...@gmail.com>于2018年3月15日周四 上午4:32写道:

> Also spark standalone cluster moder should work even before this new
> release, right?
>
> On Wed, Mar 14, 2018 at 8:43 AM, ankit jain <an...@gmail.com>
> wrote:
>
>> Hi Jhang,
>> Not clear on that - I thought spark-submit was done when we run a
>> paragraph, how does the .sh file come into play?
>>
>> Thanks
>> Ankit
>>
>> On Tue, Mar 13, 2018 at 5:43 PM, Jeff Zhang <zj...@gmail.com> wrote:
>>
>>>
>>> spark-submit is called in bin/interpreter.sh,  I didn't try standalone
>>> cluster mode. It is expected to run driver in separate host, but didn't
>>> guaranteed zeppelin support this.
>>>
>>> Ankit Jain <an...@gmail.com>于2018年3月14日周三 上午8:34写道:
>>>
>>>> Hi Jhang,
>>>> What is the expected behavior with standalone cluster mode? Should we
>>>> see separate driver processes in the cluster(one per user) or multiple
>>>> SparkSubmit processes?
>>>>
>>>> I was trying to dig in Zeppelin code & didn’t see where Zeppelin does
>>>> the Spark-submit to the cluster? Can you please point to it?
>>>>
>>>> Thanks
>>>> Ankit
>>>>
>>>> On Mar 13, 2018, at 5:25 PM, Jeff Zhang <zj...@gmail.com> wrote:
>>>>
>>>>
>>>> ZEPPELIN-2898 <https://issues.apache.org/jira/browse/ZEPPELIN-2898> is
>>>> for yarn cluster model.  And Zeppelin have integration test for yarn mode,
>>>> so guaranteed it would work. But don't' have test for standalone, so not
>>>> sure the behavior of standalone mode.
>>>>
>>>>
>>>> Ruslan Dautkhanov <da...@gmail.com>于2018年3月14日周三 上午8:06写道:
>>>>
>>>>> https://github.com/apache/zeppelin/pull/2577 pronounces yarn-cluster
>>>>> in it's title so I assume it's only yarn-cluster.
>>>>> Never used standalone-cluster myself.
>>>>>
>>>>> Which distro of Hadoop do you use?
>>>>> Cloudera desupported standalone in CDH 5.5 and will remove in CDH 6.
>>>>>
>>>>> https://www.cloudera.com/documentation/enterprise/release-notes/topics/rg_deprecated.html
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ruslan Dautkhanov
>>>>>
>>>>> On Tue, Mar 13, 2018 at 5:45 PM, Jhon Anderson Cardenas Diaz <
>>>>> jhonderson2007@gmail.com> wrote:
>>>>>
>>>>>> Does this new feature work only for yarn-cluster ?. Or for spark
>>>>>> standalone too ?
>>>>>>
>>>>> El mar., 13 de mar. de 2018 18:34, Ruslan Dautkhanov <
>>>>>> dautkhanov@gmail.com> escribió:
>>>>>>
>>>>> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged end
>>>>>>> of September so not sure if you have that.
>>>>>>>
>>>>>>> Check out
>>>>>>> https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-ea53e8810235
>>>>>>> how to set this up.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ruslan Dautkhanov
>>>>>>>
>>>>>>> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
>>>>>>> jhonderson2007@gmail.com> wrote:
>>>>>>>
>>>>>> Hi zeppelin users !
>>>>>>>>
>>>>>>>> I am working with zeppelin pointing to a spark in standalone. I am
>>>>>>>> trying to figure out a way to make zeppelin runs the spark driver outside
>>>>>>>> of client process that submits the application.
>>>>>>>>
>>>>>>>> According with the documentation (
>>>>>>>> http://spark.apache.org/docs/2.1.1/spark-standalone.html):
>>>>>>>>
>>>>>>>> *For standalone clusters, Spark currently supports two deploy
>>>>>>>> modes. In client mode, the driver is launched in the same process as the
>>>>>>>> client that submits the application. In cluster mode, however, the driver
>>>>>>>> is launched from one of the Worker processes inside the cluster, and the
>>>>>>>> client process exits as soon as it fulfills its responsibility of
>>>>>>>> submitting the application without waiting for the application to finish.*
>>>>>>>>
>>>>>>>> The problem is that, even when I set the properties for
>>>>>>>> spark-standalone cluster and deploy mode in cluster, the driver still run
>>>>>>>> inside zeppelin machine (according with spark UI/executors page). These are
>>>>>>>> properties that I am setting for the spark interpreter:
>>>>>>>>
>>>>>>>> master: spark://<master-name>:7077
>>>>>>>> spark.submit.deployMode: cluster
>>>>>>>> spark.executor.memory: 16g
>>>>>>>>
>>>>>>>> Any ideas would be appreciated.
>>>>>>>>
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> Details:
>>>>>>>> Spark version: 2.1.1
>>>>>>>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>>>>
>>>>>>>
>>
>>
>> --
>> Thanks & Regards,
>> Ankit.
>>
>
>
>
> --
> Thanks & Regards,
> Ankit.
>

Re: Zeppelin - Spark Driver location

Posted by ankit jain <an...@gmail.com>.
Also, Spark standalone cluster mode should work even before this new
release, right?

On Wed, Mar 14, 2018 at 8:43 AM, ankit jain <an...@gmail.com> wrote:

> Hi Jhang,
> Not clear on that - I thought spark-submit was done when we run a
> paragraph, how does the .sh file come into play?
>
> Thanks
> Ankit
>
> On Tue, Mar 13, 2018 at 5:43 PM, Jeff Zhang <zj...@gmail.com> wrote:
>
>>
>> spark-submit is called in bin/interpreter.sh,  I didn't try standalone
>> cluster mode. It is expected to run driver in separate host, but didn't
>> guaranteed zeppelin support this.
>>
>> Ankit Jain <an...@gmail.com>于2018年3月14日周三 上午8:34写道:
>>
>>> Hi Jhang,
>>> What is the expected behavior with standalone cluster mode? Should we
>>> see separate driver processes in the cluster(one per user) or multiple
>>> SparkSubmit processes?
>>>
>>> I was trying to dig in Zeppelin code & didn’t see where Zeppelin does
>>> the Spark-submit to the cluster? Can you please point to it?
>>>
>>> Thanks
>>> Ankit
>>>
>>> On Mar 13, 2018, at 5:25 PM, Jeff Zhang <zj...@gmail.com> wrote:
>>>
>>>
>>> ZEPPELIN-2898 <https://issues.apache.org/jira/browse/ZEPPELIN-2898> is
>>> for yarn cluster model.  And Zeppelin have integration test for yarn mode,
>>> so guaranteed it would work. But don't' have test for standalone, so not
>>> sure the behavior of standalone mode.
>>>
>>>
>>> Ruslan Dautkhanov <da...@gmail.com>于2018年3月14日周三 上午8:06写道:
>>>
>>>> https://github.com/apache/zeppelin/pull/2577 pronounces yarn-cluster
>>>> in it's title so I assume it's only yarn-cluster.
>>>> Never used standalone-cluster myself.
>>>>
>>>> Which distro of Hadoop do you use?
>>>> Cloudera desupported standalone in CDH 5.5 and will remove in CDH 6.
>>>> https://www.cloudera.com/documentation/enterprise/release-
>>>> notes/topics/rg_deprecated.html
>>>>
>>>>
>>>>
>>>> --
>>>> Ruslan Dautkhanov
>>>>
>>>> On Tue, Mar 13, 2018 at 5:45 PM, Jhon Anderson Cardenas Diaz <
>>>> jhonderson2007@gmail.com> wrote:
>>>>
>>>>> Does this new feature work only for yarn-cluster ?. Or for spark
>>>>> standalone too ?
>>>>>
>>>> El mar., 13 de mar. de 2018 18:34, Ruslan Dautkhanov <
>>>>> dautkhanov@gmail.com> escribió:
>>>>>
>>>> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged end
>>>>>> of September so not sure if you have that.
>>>>>>
>>>>>> Check out https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-
>>>>>> ea53e8810235 how to set this up.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ruslan Dautkhanov
>>>>>>
>>>>>> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
>>>>>> jhonderson2007@gmail.com> wrote:
>>>>>>
>>>>> Hi zeppelin users !
>>>>>>>
>>>>>>> I am working with zeppelin pointing to a spark in standalone. I am
>>>>>>> trying to figure out a way to make zeppelin runs the spark driver outside
>>>>>>> of client process that submits the application.
>>>>>>>
>>>>>>> According with the documentation (http://spark.apache.org/docs/
>>>>>>> 2.1.1/spark-standalone.html):
>>>>>>>
>>>>>>> *For standalone clusters, Spark currently supports two deploy modes.
>>>>>>> In client mode, the driver is launched in the same process as the client
>>>>>>> that submits the application. In cluster mode, however, the driver is
>>>>>>> launched from one of the Worker processes inside the cluster, and the
>>>>>>> client process exits as soon as it fulfills its responsibility of
>>>>>>> submitting the application without waiting for the application to finish.*
>>>>>>>
>>>>>>> The problem is that, even when I set the properties for
>>>>>>> spark-standalone cluster and deploy mode in cluster, the driver still run
>>>>>>> inside zeppelin machine (according with spark UI/executors page). These are
>>>>>>> properties that I am setting for the spark interpreter:
>>>>>>>
>>>>>>> master: spark://<master-name>:7077
>>>>>>> spark.submit.deployMode: cluster
>>>>>>> spark.executor.memory: 16g
>>>>>>>
>>>>>>> Any ideas would be appreciated.
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>> Details:
>>>>>>> Spark version: 2.1.1
>>>>>>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>>>
>>>>>>
>
>
> --
> Thanks & Regards,
> Ankit.
>



-- 
Thanks & Regards,
Ankit.

Re: Zeppelin - Spark Driver location

Posted by ankit jain <an...@gmail.com>.
Hi Jhang,
I'm not clear on that - I thought spark-submit happened when we ran a
paragraph; how does the .sh file come into play?

Thanks
Ankit

On Tue, Mar 13, 2018 at 5:43 PM, Jeff Zhang <zj...@gmail.com> wrote:

>
> spark-submit is called in bin/interpreter.sh,  I didn't try standalone
> cluster mode. It is expected to run driver in separate host, but didn't
> guaranteed zeppelin support this.
>
> Ankit Jain <an...@gmail.com>于2018年3月14日周三 上午8:34写道:
>
>> Hi Jhang,
>> What is the expected behavior with standalone cluster mode? Should we see
>> separate driver processes in the cluster(one per user) or multiple
>> SparkSubmit processes?
>>
>> I was trying to dig in Zeppelin code & didn’t see where Zeppelin does the
>> Spark-submit to the cluster? Can you please point to it?
>>
>> Thanks
>> Ankit
>>
>> On Mar 13, 2018, at 5:25 PM, Jeff Zhang <zj...@gmail.com> wrote:
>>
>>
>> ZEPPELIN-2898 <https://issues.apache.org/jira/browse/ZEPPELIN-2898> is
>> for yarn cluster model.  And Zeppelin have integration test for yarn mode,
>> so guaranteed it would work. But don't' have test for standalone, so not
>> sure the behavior of standalone mode.
>>
>>
>> Ruslan Dautkhanov <da...@gmail.com>于2018年3月14日周三 上午8:06写道:
>>
>>> https://github.com/apache/zeppelin/pull/2577 pronounces yarn-cluster in
>>> it's title so I assume it's only yarn-cluster.
>>> Never used standalone-cluster myself.
>>>
>>> Which distro of Hadoop do you use?
>>> Cloudera desupported standalone in CDH 5.5 and will remove in CDH 6.
>>> https://www.cloudera.com/documentation/enterprise/
>>> release-notes/topics/rg_deprecated.html
>>>
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Tue, Mar 13, 2018 at 5:45 PM, Jhon Anderson Cardenas Diaz <
>>> jhonderson2007@gmail.com> wrote:
>>>
>>>> Does this new feature work only for yarn-cluster ?. Or for spark
>>>> standalone too ?
>>>>
>>> El mar., 13 de mar. de 2018 18:34, Ruslan Dautkhanov <
>>>> dautkhanov@gmail.com> escribió:
>>>>
>>> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>
>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged end of
>>>>> September so not sure if you have that.
>>>>>
>>>>> Check out https://medium.com/@zjffdu/zeppelin-0-8-0-new-
>>>>> features-ea53e8810235 how to set this up.
>>>>>
>>>>>
>>>>> --
>>>>> Ruslan Dautkhanov
>>>>>
>>>>> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
>>>>> jhonderson2007@gmail.com> wrote:
>>>>>
>>>> Hi zeppelin users !
>>>>>>
>>>>>> I am working with zeppelin pointing to a spark in standalone. I am
>>>>>> trying to figure out a way to make zeppelin runs the spark driver outside
>>>>>> of client process that submits the application.
>>>>>>
>>>>>> According with the documentation (http://spark.apache.org/docs/
>>>>>> 2.1.1/spark-standalone.html):
>>>>>>
>>>>>> *For standalone clusters, Spark currently supports two deploy modes.
>>>>>> In client mode, the driver is launched in the same process as the client
>>>>>> that submits the application. In cluster mode, however, the driver is
>>>>>> launched from one of the Worker processes inside the cluster, and the
>>>>>> client process exits as soon as it fulfills its responsibility of
>>>>>> submitting the application without waiting for the application to finish.*
>>>>>>
>>>>>> The problem is that, even when I set the properties for
>>>>>> spark-standalone cluster and deploy mode in cluster, the driver still run
>>>>>> inside zeppelin machine (according with spark UI/executors page). These are
>>>>>> properties that I am setting for the spark interpreter:
>>>>>>
>>>>>> master: spark://<master-name>:7077
>>>>>> spark.submit.deployMode: cluster
>>>>>> spark.executor.memory: 16g
>>>>>>
>>>>>> Any ideas would be appreciated.
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> Details:
>>>>>> Spark version: 2.1.1
>>>>>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>>
>>>>>


-- 
Thanks & Regards,
Ankit.

Re: Zeppelin - Spark Driver location

Posted by Jeff Zhang <zj...@gmail.com>.
spark-submit is called in bin/interpreter.sh. I didn't try standalone
cluster mode. It is expected to run the driver on a separate host, but I
can't guarantee that Zeppelin supports this.
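
As a rough sketch of what happens there (simplified and illustrative only -
the real script assembles the command from many environment variables, and
the paths shown here are placeholders):

```shell
# Simplified sketch of how bin/interpreter.sh launches the Spark
# interpreter process via spark-submit (illustrative; not the actual
# Zeppelin script, which builds this from environment variables):
launch_interpreter() {
  local spark_submit="$1"   # path to the spark-submit binary
  local master="$2"         # e.g. spark://<master-name>:7077
  # The interpreter process itself is what spark-submit starts, so in
  # client mode the driver lives wherever this command runs.
  echo "$spark_submit --master $master" \
       "--class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer"
}

launch_interpreter "/usr/local/spark/bin/spark-submit" "spark://master:7077"
```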

Ankit Jain <an...@gmail.com>于2018年3月14日周三 上午8:34写道:

> Hi Jhang,
> What is the expected behavior with standalone cluster mode? Should we see
> separate driver processes in the cluster(one per user) or multiple
> SparkSubmit processes?
>
> I was trying to dig in Zeppelin code & didn’t see where Zeppelin does the
> Spark-submit to the cluster? Can you please point to it?
>
> Thanks
> Ankit
>
> On Mar 13, 2018, at 5:25 PM, Jeff Zhang <zj...@gmail.com> wrote:
>
>
> ZEPPELIN-2898 <https://issues.apache.org/jira/browse/ZEPPELIN-2898> is
> for yarn cluster model.  And Zeppelin have integration test for yarn mode,
> so guaranteed it would work. But don't' have test for standalone, so not
> sure the behavior of standalone mode.
>
>
> Ruslan Dautkhanov <da...@gmail.com>于2018年3月14日周三 上午8:06写道:
>
>> https://github.com/apache/zeppelin/pull/2577 pronounces yarn-cluster in
>> it's title so I assume it's only yarn-cluster.
>> Never used standalone-cluster myself.
>>
>> Which distro of Hadoop do you use?
>> Cloudera desupported standalone in CDH 5.5 and will remove in CDH 6.
>>
>> https://www.cloudera.com/documentation/enterprise/release-notes/topics/rg_deprecated.html
>>
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Tue, Mar 13, 2018 at 5:45 PM, Jhon Anderson Cardenas Diaz <
>> jhonderson2007@gmail.com> wrote:
>>
>>> Does this new feature work only for yarn-cluster ?. Or for spark
>>> standalone too ?
>>>
>> El mar., 13 de mar. de 2018 18:34, Ruslan Dautkhanov <
>>> dautkhanov@gmail.com> escribió:
>>>
>> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged end of
>>>> September so not sure if you have that.
>>>>
>>>> Check out
>>>> https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-ea53e8810235
>>>> how to set this up.
>>>>
>>>>
>>>> --
>>>> Ruslan Dautkhanov
>>>>
>>>> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
>>>> jhonderson2007@gmail.com> wrote:
>>>>
>>> Hi zeppelin users !
>>>>>
>>>>> I am working with zeppelin pointing to a spark in standalone. I am
>>>>> trying to figure out a way to make zeppelin runs the spark driver outside
>>>>> of client process that submits the application.
>>>>>
>>>>> According with the documentation (
>>>>> http://spark.apache.org/docs/2.1.1/spark-standalone.html):
>>>>>
>>>>> *For standalone clusters, Spark currently supports two deploy modes.
>>>>> In client mode, the driver is launched in the same process as the client
>>>>> that submits the application. In cluster mode, however, the driver is
>>>>> launched from one of the Worker processes inside the cluster, and the
>>>>> client process exits as soon as it fulfills its responsibility of
>>>>> submitting the application without waiting for the application to finish.*
>>>>>
>>>>> The problem is that, even when I set the properties for
>>>>> spark-standalone cluster and deploy mode in cluster, the driver still run
>>>>> inside zeppelin machine (according with spark UI/executors page). These are
>>>>> properties that I am setting for the spark interpreter:
>>>>>
>>>>> master: spark://<master-name>:7077
>>>>> spark.submit.deployMode: cluster
>>>>> spark.executor.memory: 16g
>>>>>
>>>>> Any ideas would be appreciated.
>>>>>
>>>>> Thank you
>>>>>
>>>>> Details:
>>>>> Spark version: 2.1.1
>>>>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>
>>>>

>>>>> launched from one of the Worker processes inside the cluster, and the
>>>>> client process exits as soon as it fulfills its responsibility of
>>>>> submitting the application without waiting for the application to finish.*
>>>>>
>>>>> The problem is that, even when I set the properties for
>>>>> spark-standalone cluster and deploy mode in cluster, the driver still run
>>>>> inside zeppelin machine (according with spark UI/executors page). These are
>>>>> properties that I am setting for the spark interpreter:
>>>>>
>>>>> master: spark://<master-name>:7077
>>>>> spark.submit.deployMode: cluster
>>>>> spark.executor.memory: 16g
>>>>>
>>>>> Any ideas would be appreciated.
>>>>>
>>>>> Thank you
>>>>>
>>>>> Details:
>>>>> Spark version: 2.1.1
>>>>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>
>>>>

Re: Zeppelin - Spark Driver location

Posted by Ankit Jain <an...@gmail.com>.
Hi Jhang,
What is the expected behavior with standalone cluster mode? Should we see separate driver processes in the cluster (one per user), or multiple SparkSubmit processes?

I was trying to dig into the Zeppelin code and didn't see where Zeppelin does the spark-submit to the cluster. Can you please point to it?

Thanks
Ankit

> On Mar 13, 2018, at 5:25 PM, Jeff Zhang <zj...@gmail.com> wrote:
> 
> 
> ZEPPELIN-2898 is for yarn cluster model.  And Zeppelin have integration test for yarn mode, so guaranteed it would work. But don't' have test for standalone, so not sure the behavior of standalone mode. 
> 
> 
> Ruslan Dautkhanov <da...@gmail.com>于2018年3月14日周三 上午8:06写道:
>> https://github.com/apache/zeppelin/pull/2577 pronounces yarn-cluster in it's title so I assume it's only yarn-cluster.
>> Never used standalone-cluster myself. 
>> 
>> Which distro of Hadoop do you use?
>> Cloudera desupported standalone in CDH 5.5 and will remove in CDH 6.
>> https://www.cloudera.com/documentation/enterprise/release-notes/topics/rg_deprecated.html
>> 
>> 
>> 
>> -- 
>> Ruslan Dautkhanov
>> 
>>> On Tue, Mar 13, 2018 at 5:45 PM, Jhon Anderson Cardenas Diaz <jh...@gmail.com> wrote:
>> 
>>> Does this new feature work only for yarn-cluster ?. Or for spark standalone too ?
>> 
>>> El mar., 13 de mar. de 2018 18:34, Ruslan Dautkhanov <da...@gmail.com> escribió:
>> 
>>>> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>> 
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged end of September so not sure if you have that.
>>>> 
>>>> Check out https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-ea53e8810235 how to set this up.
>>>> 
>> 
>>>> 
>>>> -- 
>>>> Ruslan Dautkhanov
>>>> 
>> 
>>>>> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <jh...@gmail.com> wrote:
>> 
>>>>> Hi zeppelin users !
>>>>> 
>>>>> I am working with zeppelin pointing to a spark in standalone. I am trying to figure out a way to make zeppelin runs the spark driver outside of client process that submits the application.
>>>>> 
>>>>> According with the documentation (http://spark.apache.org/docs/2.1.1/spark-standalone.html):
>>>>> 
>>>>> For standalone clusters, Spark currently supports two deploy modes. In client mode, the driver is launched in the same process as the client that submits the application. In cluster mode, however, the driver is launched from one of the Worker processes inside the cluster, and the client process exits as soon as it fulfills its responsibility of submitting the application without waiting for the application to finish.
>>>>> 
>>>>> The problem is that, even when I set the properties for spark-standalone cluster and deploy mode in cluster, the driver still run inside zeppelin machine (according with spark UI/executors page). These are properties that I am setting for the spark interpreter:
>>>>> 
>>>>> master: spark://<master-name>:7077
>>>>> spark.submit.deployMode: cluster
>>>>> spark.executor.memory: 16g
>>>>> 
>>>>> Any ideas would be appreciated.
>>>>> 
>>>>> Thank you
>>>>> 
>>>>> Details:
>>>>> Spark version: 2.1.1
>>>>> Zeppelin version: 0.8.0 (merged at September 2017 version)

Re: Zeppelin - Spark Driver location

Posted by Jeff Zhang <zj...@gmail.com>.
ZEPPELIN-2898 <https://issues.apache.org/jira/browse/ZEPPELIN-2898> is for
yarn-cluster mode. Zeppelin has an integration test for yarn mode, so that is
guaranteed to work. But there is no test for standalone, so the behavior of
standalone mode is uncertain.


Ruslan Dautkhanov <da...@gmail.com>于2018年3月14日周三 上午8:06写道:

> https://github.com/apache/zeppelin/pull/2577 pronounces yarn-cluster in
> it's title so I assume it's only yarn-cluster.
> Never used standalone-cluster myself.
>
> Which distro of Hadoop do you use?
> Cloudera desupported standalone in CDH 5.5 and will remove in CDH 6.
>
> https://www.cloudera.com/documentation/enterprise/release-notes/topics/rg_deprecated.html
>
>
>
> --
> Ruslan Dautkhanov
>
> On Tue, Mar 13, 2018 at 5:45 PM, Jhon Anderson Cardenas Diaz <
> jhonderson2007@gmail.com> wrote:
>
>> Does this new feature work only for yarn-cluster ?. Or for spark
>> standalone too ?
>>
> El mar., 13 de mar. de 2018 18:34, Ruslan Dautkhanov <da...@gmail.com>
>> escribió:
>>
> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>
>>> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged end of
>>> September so not sure if you have that.
>>>
>>> Check out
>>> https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-ea53e8810235 how
>>> to set this up.
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
>>> jhonderson2007@gmail.com> wrote:
>>>
>> Hi zeppelin users !
>>>>
>>>> I am working with zeppelin pointing to a spark in standalone. I am
>>>> trying to figure out a way to make zeppelin runs the spark driver outside
>>>> of client process that submits the application.
>>>>
>>>> According with the documentation (
>>>> http://spark.apache.org/docs/2.1.1/spark-standalone.html):
>>>>
>>>> *For standalone clusters, Spark currently supports two deploy modes.
>>>> In client mode, the driver is launched in the same process as the client
>>>> that submits the application. In cluster mode, however, the driver is
>>>> launched from one of the Worker processes inside the cluster, and the
>>>> client process exits as soon as it fulfills its responsibility of
>>>> submitting the application without waiting for the application to finish.*
>>>>
>>>> The problem is that, even when I set the properties for
>>>> spark-standalone cluster and deploy mode in cluster, the driver still run
>>>> inside zeppelin machine (according with spark UI/executors page). These are
>>>> properties that I am setting for the spark interpreter:
>>>>
>>>> master: spark://<master-name>:7077
>>>> spark.submit.deployMode: cluster
>>>> spark.executor.memory: 16g
>>>>
>>>> Any ideas would be appreciated.
>>>>
>>>> Thank you
>>>>
>>>> Details:
>>>> Spark version: 2.1.1
>>>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>
>>>

Re: Zeppelin - Spark Driver location

Posted by Ruslan Dautkhanov <da...@gmail.com>.
https://github.com/apache/zeppelin/pull/2577 mentions yarn-cluster in
its title, so I assume it is yarn-cluster only.
I have never used standalone-cluster myself.

Which distro of Hadoop do you use?
Cloudera deprecated Spark Standalone in CDH 5.5 and will remove it in CDH 6.
https://www.cloudera.com/documentation/enterprise/release-notes/topics/rg_deprecated.html



-- 
Ruslan Dautkhanov

On Tue, Mar 13, 2018 at 5:45 PM, Jhon Anderson Cardenas Diaz <
jhonderson2007@gmail.com> wrote:

> Does this new feature work only for yarn-cluster ?. Or for spark
> standalone too ?
>
> El mar., 13 de mar. de 2018 18:34, Ruslan Dautkhanov <da...@gmail.com>
> escribió:
>
>> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>>
>> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged end of
>> September so not sure if you have that.
>>
>> Check out https://medium.com/@zjffdu/zeppelin-0-8-0-new-
>> features-ea53e8810235 how to set this up.
>>
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
>> jhonderson2007@gmail.com> wrote:
>>
>>> Hi zeppelin users !
>>>
>>> I am working with zeppelin pointing to a spark in standalone. I am
>>> trying to figure out a way to make zeppelin runs the spark driver outside
>>> of client process that submits the application.
>>>
>>> According with the documentation (http://spark.apache.org/docs/
>>> 2.1.1/spark-standalone.html):
>>>
>>> *For standalone clusters, Spark currently supports two deploy modes.
>>> In client mode, the driver is launched in the same process as the client
>>> that submits the application. In cluster mode, however, the driver is
>>> launched from one of the Worker processes inside the cluster, and the
>>> client process exits as soon as it fulfills its responsibility of
>>> submitting the application without waiting for the application to finish.*
>>>
>>> The problem is that, even when I set the properties for spark-standalone
>>> cluster and deploy mode in cluster, the driver still run inside zeppelin
>>> machine (according with spark UI/executors page). These are properties that
>>> I am setting for the spark interpreter:
>>>
>>> master: spark://<master-name>:7077
>>> spark.submit.deployMode: cluster
>>> spark.executor.memory: 16g
>>>
>>> Any ideas would be appreciated.
>>>
>>> Thank you
>>>
>>> Details:
>>> Spark version: 2.1.1
>>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>
>>
>>

Re: Zeppelin - Spark Driver location

Posted by Jhon Anderson Cardenas Diaz <jh...@gmail.com>.
Does this new feature work only for yarn-cluster, or for spark standalone too?

El mar., 13 de mar. de 2018 18:34, Ruslan Dautkhanov <da...@gmail.com>
escribió:

> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>
> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged end of
> September so not sure if you have that.
>
> Check out
> https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-ea53e8810235 how
> to set this up.
>
>
>
> --
> Ruslan Dautkhanov
>
> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
> jhonderson2007@gmail.com> wrote:
>
>> Hi zeppelin users !
>>
>> I am working with zeppelin pointing to a spark in standalone. I am trying
>> to figure out a way to make zeppelin runs the spark driver outside of
>> client process that submits the application.
>>
>> According with the documentation (
>> http://spark.apache.org/docs/2.1.1/spark-standalone.html):
>>
>> *For standalone clusters, Spark currently supports two deploy modes.
>> In client mode, the driver is launched in the same process as the client
>> that submits the application. In cluster mode, however, the driver is
>> launched from one of the Worker processes inside the cluster, and the
>> client process exits as soon as it fulfills its responsibility of
>> submitting the application without waiting for the application to finish.*
>>
>> The problem is that, even when I set the properties for spark-standalone
>> cluster and deploy mode in cluster, the driver still run inside zeppelin
>> machine (according with spark UI/executors page). These are properties that
>> I am setting for the spark interpreter:
>>
>> master: spark://<master-name>:7077
>> spark.submit.deployMode: cluster
>> spark.executor.memory: 16g
>>
>> Any ideas would be appreciated.
>>
>> Thank you
>>
>> Details:
>> Spark version: 2.1.1
>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>
>
>

Re: Zeppelin - Spark Driver location

Posted by Ruslan Dautkhanov <da...@gmail.com>.
 > Zeppelin version: 0.8.0 (merged at September 2017 version)

https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged at the end of
September, so I'm not sure if you have it.

Check out
https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-ea53e8810235 how to
set this up.
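For anyone landing on this thread later: as I understand it, with ZEPPELIN-2898 in place the yarn-cluster setup described in that blog post boils down to settings along these lines. This is a sketch of my understanding of 0.8.0, not a verified recipe; the paths and memory values are examples only:

```
# zeppelin-env.sh: Zeppelin still shells out to spark-submit,
# so SPARK_HOME must point at a local Spark installation.
export SPARK_HOME=/opt/spark

# Spark interpreter properties (Interpreter settings page):
master                  yarn-cluster
spark.driver.memory     4g
spark.executor.memory   4g
```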



-- 
Ruslan Dautkhanov

On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
jhonderson2007@gmail.com> wrote:

> Hi zeppelin users !
>
> I am working with zeppelin pointing to a spark in standalone. I am trying
> to figure out a way to make zeppelin runs the spark driver outside of
> client process that submits the application.
>
> According with the documentation (http://spark.apache.org/docs/
> 2.1.1/spark-standalone.html):
>
> *For standalone clusters, Spark currently supports two deploy modes.
> In client mode, the driver is launched in the same process as the client
> that submits the application. In cluster mode, however, the driver is
> launched from one of the Worker processes inside the cluster, and the
> client process exits as soon as it fulfills its responsibility of
> submitting the application without waiting for the application to finish.*
>
> The problem is that, even when I set the properties for spark-standalone
> cluster and deploy mode in cluster, the driver still run inside zeppelin
> machine (according with spark UI/executors page). These are properties that
> I am setting for the spark interpreter:
>
> master: spark://<master-name>:7077
> spark.submit.deployMode: cluster
> spark.executor.memory: 16g
>
> Any ideas would be appreciated.
>
> Thank you
>
> Details:
> Spark version: 2.1.1
> Zeppelin version: 0.8.0 (merged at September 2017 version)
>

Re: Zeppelin - Spark Driver location

Posted by "Vannson, Raphael" <Ra...@Teradata.com>.
Hello Jhon,

Conceptually this makes sense, since Zeppelin creates a Spark application as the execution runtime underneath its frontend process.

Having said this, depending on how Zeppelin is implemented, the driver may need to be collocated with the Zeppelin process on the same host: the Zeppelin notebook process needs to “talk” to the Spark driver process, and this might be done via a child process.
I can certainly see how a collocated design would be simpler to implement for the Zeppelin contributors, who may have considered the functionality you describe but deferred it to a later release.

So this is not a definitive answer (I don’t know the actual answer), but I would not expect this kind of setup to be supported yet.
(I tried to make it work and could not get the Spark kernel to start, so I just reverted to client deploy mode instead of cluster, since that option was acceptable to me.)
I would be curious to see whether it is possible, though, and how it would be configured.

I hope this helps (a bit).

Best,
Raphael
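If it helps anyone trying the same client-mode fallback, the interpreter properties would look roughly like this (the host name and memory value are placeholders taken from the original question):

```
master                    spark://<master-name>:7077
spark.submit.deployMode   client
spark.executor.memory     16g
```

With deployMode client the driver stays on the Zeppelin host, which matches what the Zeppelin-to-driver communication appears to require.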

From: Jhon Anderson Cardenas Diaz <jh...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
Date: Tuesday, March 13, 2018 at 4:24 PM
To: "dev@zeppelin.apache.org" <de...@zeppelin.apache.org>, "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
Subject: Zeppelin - Spark Driver location

Hi zeppelin users !

I am working with zeppelin pointing to a spark in standalone. I am trying to figure out a way to make zeppelin runs the spark driver outside of client process that submits the application.

According with the documentation (http://spark.apache.org/docs/2.1.1/spark-standalone.html):

For standalone clusters, Spark currently supports two deploy modes. In client mode, the driver is launched in the same process as the client that submits the application. In cluster mode, however, the driver is launched from one of the Worker processes inside the cluster, and the client process exits as soon as it fulfills its responsibility of submitting the application without waiting for the application to finish.

The problem is that, even when I set the properties for spark-standalone cluster and deploy mode in cluster, the driver still run inside zeppelin machine (according with spark UI/executors page). These are properties that I am setting for the spark interpreter:

master: spark://<master-name>:7077
spark.submit.deployMode: cluster
spark.executor.memory: 16g

Any ideas would be appreciated.

Thank you

Details:
Spark version: 2.1.1
Zeppelin version: 0.8.0 (merged at September 2017 version)
