Posted to users@zeppelin.apache.org by Patrik Iselind <pa...@gmail.com> on 2020/05/09 12:34:15 UTC

Apache Spark master value question

Hi,

First comes some background, then I have some questions.

*Background*
I'm trying out Zeppelin 0.8.2 based on the Docker image. My Dockerfile
looks like this:

```Dockerfile
FROM apache/zeppelin:0.8.2


# Install some tools (vim, pip for Python 3)
RUN apt-get -y update &&\
    DEBIAN_FRONTEND=noninteractive \
        apt -y install vim python3-pip

RUN python3 -m pip install -U pyspark

ENV PYSPARK_PYTHON python3
ENV PYSPARK_DRIVER_PYTHON python3
```

When I run a paragraph like this

```Zeppelin paragraph
%pyspark

print(sc)
print()
print(dir(sc))
print()
print(sc.master)
print()
print(sc.defaultParallelism)
```

I get the following output

```output
<SparkContext master=local appName=Zeppelin> ['PACKAGE_EXTENSIONS',
'__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
'__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
'__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
'_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
'_ensure_initialized', '_gateway', '_getJavaStorageLevel',
'_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
'_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
'_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
'version', 'wholeTextFiles'] local 1
```

This is despite the "master" property in the interpreter being set to
"local[*]". I'd like to use all cores on my machine. To do that, I have to
explicitly create a "spark.master" property in the Spark
interpreter with the value "local[*]"; then I get

```new output
<SparkContext master=local[*] appName=Zeppelin> ['PACKAGE_EXTENSIONS',
'__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
'__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
'__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
'_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
'_ensure_initialized', '_gateway', '_getJavaStorageLevel',
'_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
'_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
'_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
'version', 'wholeTextFiles'] local[*] 8
```
This is what I want.
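
For reference, below is a minimal sketch of how that workaround could be baked into the image: a small Python script that injects an explicit spark.master property into conf/interpreter.json. The script name, the /zeppelin install path, and the top-level "interpreterSettings" key are assumptions about the 0.8.2 image, and the file only exists after Zeppelin has generated it once; the property layout follows the conf/interpreter.json snippet quoted later in this thread.

```python
#!/usr/bin/env python3
"""set_spark_master.py (hypothetical helper, not an official recipe).

Adds an explicit spark.master entry to conf/interpreter.json.
Assumptions: Zeppelin home is /zeppelin inside the image, the file uses a
top-level "interpreterSettings" map, and it already exists (Zeppelin
generates it on first start).
"""
import json
import sys

CONF_PATH = "/zeppelin/conf/interpreter.json"  # assumed location

def set_spark_master(master="local[*]", conf_path=CONF_PATH):
    with open(conf_path) as f:
        conf = json.load(f)
    for setting in conf["interpreterSettings"].values():
        if setting.get("group") != "spark":
            continue
        # Same shape as the existing entries ("SPARK_HOME", "master", ...)
        setting["properties"]["spark.master"] = {
            "name": "spark.master",
            "value": master,
            "type": "string",
            "description": "Spark master uri",
        }
    with open(conf_path, "w") as f:
        json.dump(conf, f, indent=2)

if __name__ == "__main__":
    set_spark_master(sys.argv[1] if len(sys.argv) > 1 else "local[*]")
```

In a Dockerfile it could then be invoked with something like `RUN python3 /tmp/set_spark_master.py local[*]`, after copying in both the script and a pre-generated interpreter.json.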

*The Questions*

   - Why is the "master" property not used in the created SparkContext?
   - How do I add the spark.master property to the docker image?


Any hint or support you can provide would be greatly appreciated.

Yours Sincerely,
Patrik Iselind

Re: Apache Spark master value question

Posted by Alex Ott <al...@gmail.com>.
Hi Jeff

OK, I'll update the PR this evening to remove the environment variable... Although
in some cases (for example, for Docker) an environment variable could be
handier - I need to look; maybe I'll rework that part so an environment
variable can also be used.

SparkLauncher resolves the Spark master from master, then spark.master, and
otherwise just uses local[*]. But in other places in the Spark interpreter I have
seen only spark.master used directly - I think that is what causes the
problem.
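
To make that precedence concrete, here is a tiny illustration of the lookup order described above (an illustration only, not Zeppelin's actual code):

```python
def resolve_spark_master(properties):
    # Illustration of the described precedence: "master" first,
    # then "spark.master", otherwise fall back to local[*].
    return properties.get("master") or properties.get("spark.master") or "local[*]"

# resolve_spark_master({"spark.master": "local[*]"}) -> "local[*]"
# resolve_spark_master({"master": "local"})          -> "local"
# resolve_spark_master({})                           -> "local[*]"
```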


On Mon, May 18, 2020 at 8:56 AM Jeff Zhang <zj...@gmail.com> wrote:

> The env name in interpreter.json and interpreter-setting.json is not used.
> We should remove them.
>
> I still don't understand how master & spark.master would affect the
> behavior. `master` is a legacy thing that we introduced a very long time ago;
> we definitely should use spark.master instead. But internally we
> do translate master to spark.master, so I'm not sure why it would cause this
> issue; maybe it is a bug.
>
>
>
> Alex Ott <al...@gmail.com> 于2020年5月17日周日 下午9:36写道:
>
>> I've seen somewhere in CDH documentation that they use MASTER, that's why
>> I'm asking...
>>
>> On Sun, May 17, 2020 at 3:13 PM Patrik Iselind <pa...@gmail.com>
>> wrote:
>>
>>> Thanks a lot for creating the issue. It seems I am not allowed to.
>>>
>>> As I understand it, the environment variable is supposed to be
>>> SPARK_MASTER.
>>>
>>> On Sun, May 17, 2020 at 11:56 AM Alex Ott <al...@gmail.com> wrote:
>>>
>>>> Ok, I've created a JIRA for it:
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-4821 and working on
>>>> patch
>>>>
>>>> I'm not sure about environment variable name - it's simply MASTER,
>>>> should it be `SPARK_MASTER`, or it's a requirement of CDH and other Hadoop
>>>> distributions to have it as MASTER?
>>>>
>>>> On Sat, May 16, 2020 at 3:45 PM Patrik Iselind <pa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> Thanks a lot for helping out with this.
>>>>>
>>>>> You're correct, but it doesn't seem that it's the
>>>>> interpreter-settings.json for Spark interpreter that is being used. It's
>>>>> conf/interpreter.json. In this file both 0.8.2 and 0.9.0 have
>>>>> ```partial-json
>>>>>     "spark": {
>>>>>       "id": "spark",
>>>>>       "name": "spark",
>>>>>       "group": "spark",
>>>>>       "properties": {
>>>>>         "SPARK_HOME": {
>>>>>           "name": "SPARK_HOME",
>>>>>           "value": "",
>>>>>           "type": "string",
>>>>>           "description": "Location of spark distribution"
>>>>>         },
>>>>>         "master": {
>>>>>           "name": "master",
>>>>>           "value": "local[*]",
>>>>>           "type": "string",
>>>>>           "description": "Spark master uri. local | yarn-client |
>>>>> yarn-cluster | spark master address of standalone mode, ex)
>>>>> spark://master_host:7077"
>>>>>         },
>>>>> ```
>>>>> That "master" should be "spark.master".
>>>>>
>>>>> By adding an explicit spark.master with the value "local[*]" I can use
>>>>> all cores as expected. Without this and printing sc.master I get "local".
>>>>> With the addition of the spark.master property set to "local[*]" and
>>>>> printing sc.master I get "local[*]". My conclusion is
>>>>> that conf/interpreter.json isn't in sync with the interpreter-settings.json
>>>>> for Spark interpreter.
>>>>>
>>>>> Best regards,
>>>>> Patrik Iselind
>>>>>
>>>>>
>>>>> On Sat, May 16, 2020 at 11:22 AM Alex Ott <al...@gmail.com> wrote:
>>>>>
>>>>>> Spark master is set to `local[*]` by default. Here is the corresponding
>>>>>> piece from interpreter-settings.json for the Spark interpreter:
>>>>>>
>>>>>>       "master": {
>>>>>>         "envName": "MASTER",
>>>>>>         "propertyName": "spark.master",
>>>>>>         "defaultValue": "local[*]",
>>>>>>         "description": "Spark master uri. local | yarn-client |
>>>>>> yarn-cluster | spark master address of standalone mode, ex)
>>>>>> spark://master_host:7077",
>>>>>>         "type": "string"
>>>>>>       },
>>>>>>
>>>>>>
>>>>>> Patrik Iselind  at "Sun, 10 May 2020 20:31:08 +0200" wrote:
>>>>>>  PI> Hi Jeff,
>>>>>>
>>>>>>  PI> I've tried the release from
>>>>>> http://zeppelin.apache.org/download.html, both in a docker and
>>>>>> without a docker. They both have the same issue as
>>>>>>  PI> previously described.
>>>>>>
>>>>>>  PI> Can I somehow set spark.master to "local[*]" in zeppelin,
>>>>>> perhaps using some environment variable?
>>>>>>
>>>>>>  PI> When is the next Zeppelin 0.9.0 docker image planned to be
>>>>>> released?
>>>>>>
>>>>>>  PI> Best Regards,
>>>>>>  PI> Patrik Iselind
>>>>>>
>>>>>>  PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <zj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>  PI>     Hi Patric,
>>>>>>  PI>
>>>>>>  PI>     Do you mind trying the 0.9.0-preview? It might be an issue
>>>>>> with the Docker container.
>>>>>>  PI>
>>>>>>  PI>     http://zeppelin.apache.org/download.html
>>>>>>
>>>>>>  PI>     Patrik Iselind <pa...@gmail.com> 于2020年5月10日周日上午2:30写道:
>>>>>>  PI>
>>>>>>  PI>         Hello Jeff,
>>>>>>  PI>
>>>>>>  PI>         Thank you for looking into this for me.
>>>>>>  PI>
>>>>>>  PI>         Using the latest pushed docker image for 0.9.0 (image
>>>>>> ID 92890adfadfb, built 6 weeks ago), I still see the same issue. My image
>>>>>> has
>>>>>>  PI>         the digest
>>>>>> "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
>>>>>>  PI>
>>>>>>  PI>         If it's not on the tip of master, could you guys please
>>>>>> release a newer 0.9.0 image?
>>>>>>  PI>
>>>>>>  PI>         Best Regards,
>>>>>>  PI>         Patrik Iselind
>>>>>>
>>>>>>  PI>         On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <
>>>>>> zjffdu@gmail.com> wrote:
>>>>>>  PI>
>>>>>>  PI>             This might be a bug in 0.8; I tried it in 0.9
>>>>>> (master branch) and it works for me.
>>>>>>  PI>
>>>>>>  PI>             print(sc.master)
>>>>>>  PI>             print(sc.defaultParallelism)
>>>>>>  PI>
>>>>>>  PI>             ---
>>>>>>  PI>             local[*] 8
>>>>>>


-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

Re: Apache Spark master value question

Posted by Jeff Zhang <zj...@gmail.com>.
The env name in interpreter.json and interpreter-setting.json is not used.
We should remove them.

I still don't understand how master & spark.master would affect the
behavior. `master` is a legacy thing that we introduced a very long time ago;
we definitely should use spark.master instead. But internally we
do translate master to spark.master, so I'm not sure why it would cause this
issue; maybe it is a bug.
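
As a rough illustration of that translation (a sketch, not the actual Zeppelin code), the idea is roughly:

```python
def normalize_master_property(props):
    # Sketch of the legacy-name translation described above:
    # a lone "master" entry is mapped onto "spark.master".
    normalized = dict(props)
    if "master" in normalized and "spark.master" not in normalized:
        normalized["spark.master"] = normalized.pop("master")
    return normalized

# normalize_master_property({"master": "local[*]"}) -> {"spark.master": "local[*]"}
```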




-- 
Best Regards

Jeff Zhang

Re: Apache Spark master value question

Posted by Alex Ott <al...@gmail.com>.
I've seen somewhere in the CDH documentation that they use MASTER; that's why
I'm asking...


-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

Re: Apache Spark master value question

Posted by Patrik Iselind <pa...@gmail.com>.
Thanks a lot for creating the issue. It seems I am not allowed to.

As I understand it, the environment variable is supposed to be SPARK_MASTER.


Re: Apache Spark master value question

Posted by Alex Ott <al...@gmail.com>.
OK, I've created a JIRA for it:
https://issues.apache.org/jira/browse/ZEPPELIN-4821 and am working on a patch.

I'm not sure about the environment variable name - it's simply MASTER. Should
it be `SPARK_MASTER`, or is it a requirement of CDH and other Hadoop
distributions to have it as MASTER?
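
For illustration only (an assumption about how a patch might read it, not the actual implementation), honoring either name could look like:

```python
import os

# Check both candidate names; the precedence here is arbitrary and only illustrative.
spark_master = (
    os.environ.get("SPARK_MASTER")
    or os.environ.get("MASTER")
    or "local[*]"
)
```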


-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

Re: Apache Spark master value question

Posted by Alex Ott <al...@gmail.com>.
Thank you for the clarification, Patrik.

Can you create a JIRA for tracking and fixing this?

Thanks

-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

Re: Apache Spark master value question

Posted by Patrik Iselind <pa...@gmail.com>.
Hi Alex,

Thanks a lot for helping out with this.

You're correct, but it doesn't seem to be the interpreter-settings.json for
the Spark interpreter that is actually being used; it's conf/interpreter.json.
In this file both 0.8.2 and 0.9.0 have
```partial-json
    "spark": {
      "id": "spark",
      "name": "spark",
      "group": "spark",
      "properties": {
        "SPARK_HOME": {
          "name": "SPARK_HOME",
          "value": "",
          "type": "string",
          "description": "Location of spark distribution"
        },
        "master": {
          "name": "master",
          "value": "local[*]",
          "type": "string",
          "description": "Spark master uri. local | yarn-client |
yarn-cluster | spark master address of standalone mode, ex)
spark://master_host:7077"
        },
```
That "master" should be "spark.master".

By adding an explicit spark.master property with the value "local[*]" I can
use all cores as expected. Without it, printing sc.master gives "local"; with
the spark.master property set to "local[*]", printing sc.master gives
"local[*]". My conclusion is that conf/interpreter.json isn't in sync with
the interpreter-settings.json for the Spark interpreter.

Best regards,
Patrik Iselind



Re: Apache Spark master value question

Posted by Alex Ott <al...@gmail.com>.
Spark master is set to `local[*]` by default. Here is the corresponding
piece from interpreter-settings.json for the Spark interpreter:

      "master": {
        "envName": "MASTER",
        "propertyName": "spark.master",
        "defaultValue": "local[*]",
        "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077",
        "type": "string"
      },
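
To double-check which value actually reaches Spark, a paragraph along these
lines can be run from the notebook (a minimal sketch; it only relies on the
sc that Zeppelin injects and the standard PySpark SparkConf API):

```Zeppelin paragraph
%pyspark

# What the running SparkContext reports as its master
print(sc.master)
# What the effective Spark configuration holds for spark.master
print(sc.getConf().get("spark.master"))
```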


-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

Re: Apache Spark master value question

Posted by Patrik Iselind <pa...@gmail.com>.
Hi Jeff,

I've tried the release from http://zeppelin.apache.org/download.html, both
inside Docker and without Docker. Both have the same issue as previously
described.

Can I somehow set spark.master to "local[*]" in Zeppelin, perhaps using
some environment variable?
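
One sketch worth trying - purely as an experiment, since whether the 0.8.2
image honours it is exactly what looks broken here - is to set the MASTER
environment variable (the envName that interpreter-settings.json maps to
spark.master) in the Dockerfile from the first message:

```Dockerfile
FROM apache/zeppelin:0.8.2

# Assumption: the Spark interpreter maps the MASTER env var to spark.master
# (envName "MASTER" in interpreter-settings.json); it may be ignored on 0.8.2.
ENV MASTER local[*]
```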

When is the next Zeppelin 0.9.0 docker image planned to be released?

Best Regards,
Patrik Iselind



Re: Apache Spark master value question

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Patrik,

Do you mind trying the 0.9.0-preview? It might be an issue with the Docker
container.

http://zeppelin.apache.org/download.html




-- 
Best Regards

Jeff Zhang

Re: Apache Spark master value question

Posted by Patrik Iselind <pa...@gmail.com>.
Hello Jeff,

Thank you for looking into this for me.

Using the latest pushed Docker image for 0.9.0 (image ID 92890adfadfb,
built 6 weeks ago), I still see the same issue. My image has the digest
"apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
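
(For the record, the build date behind that digest can be checked with the
stock Docker CLI - a quick sketch, run on the host:)

```shell
# Pull the image by digest and print its creation timestamp
docker pull apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092
docker inspect --format '{{.Created}}' \
    apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092
```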

If it's not on the tip of master, could you guys please release a newer
0.9.0 image?

Best Regards,
Patrik Iselind



Re: Apache Spark master value question

Posted by Jeff Zhang <zj...@gmail.com>.
This might be a bug in 0.8. I tried it in 0.9 (master branch), and it works
for me.

print(sc.master)
print(sc.defaultParallelism)

---
local[*] 8
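
As a sanity check, defaultParallelism can be compared against the machine's
core count from the same notebook (a minimal sketch; with master set to
local[*] the two values are normally expected to match):

```Zeppelin paragraph
%pyspark

import multiprocessing

# With local[*], defaultParallelism should equal the number of available cores
print(sc.master, sc.defaultParallelism, multiprocessing.cpu_count())
```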


On Sat, May 9, 2020 at 8:34 PM Patrik Iselind <pa...@gmail.com> wrote:

> Hi,
>
> First comes some background, then I have some questions.
>
> *Background*
> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Docker file
> looks like this:
>
> ```Dockerfile
> FROM apache/zeppelin:0.8.2
>
>
> # Install Java and some tools
> RUN apt-get -y update &&\
>     DEBIAN_FRONTEND=noninteractive \
>         apt -y install vim python3-pip
>
> RUN python3 -m pip install -U pyspark
>
> ENV PYSPARK_PYTHON python3
> ENV PYSPARK_DRIVER_PYTHON python3
> ```
>
> When I start a section like so
>
> ```Zeppelin paragraph
> %pyspark
>
> print(sc)
> print()
> print(dir(sc))
> print()
> print(sc.master)
> print()
> print(sc.defaultParallelism)
> ```
>
> I get the following output
>
> ```output
> <SparkContext master=local appName=Zeppelin> ['PACKAGE_EXTENSIONS',
> '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
> '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
> '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
> '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
> '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
> '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
> '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
> '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
> '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
> '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
> 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
> 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
> 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
> 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
> 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
> 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
> 'version', 'wholeTextFiles'] local 1
> ```
>
> This even though the "master" property in the interpretter is set to
> "local[*]". I'd like to use all cores on my machine. To do that I have to
> explicitly create the "spark.master" property in the spark
> interpretter with the value "local[*]", then I get
>
> ```new output
> <SparkContext master=local[*] appName=Zeppelin> ['PACKAGE_EXTENSIONS',
> '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
> '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
> '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
> '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
> '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
> '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
> '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
> '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
> '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
> '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
> 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
> 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
> 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
> 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
> 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
> 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
> 'version', 'wholeTextFiles'] local[*] 8
> ```
> This is what I want.
>
> *The Questions*
>
>    - Why is the "master" property not used in the created SparkContext?
>    - How do I add the spark.master property to the docker image?
>
>
> Any hint or support you can provide would be greatly appreciated.
>
> Yours Sincerely,
> Patrik Iselind
>


-- 
Best Regards

Jeff Zhang