Posted to dev@systemml.apache.org by Gustavo Frederico <gu...@thinkwrap.com> on 2017/07/02 04:46:03 UTC

Install - Configure Jupyter Notebook

A basic question: step 3 in https://systemml.apache.org/install-systemml.html for “Configure Jupyter Notebook” has
# Start Jupyter Notebook Server
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*] --conf "spark.driver.memory=12g" --conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128 --conf spark.default.parallelism=100
Where does that go? There are no details in this step…

Thanks

Gustavo

Re: Install - Configure Jupyter Notebook

Posted by du...@gmail.com.
For a bit more context, this is the general way of starting Jupyter with PySpark support.  In contrast, the usual `jupyter notebook` command will only launch Jupyter with a standard Python kernel.

Additionally, all of the extra "conf" settings in that command refer to settings that could be placed in the standard `conf/spark-defaults.conf` file of your Spark installation (with spaces instead of the equals signs), in case you're already familiar with that.
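
For instance, the four settings from that command map to the following entries in `conf/spark-defaults.conf`:

# conf/spark-defaults.conf -- equivalent to the --conf flags above
spark.driver.memory         12g
spark.driver.maxResultSize  0
spark.akka.frameSize        128
spark.default.parallelism   100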

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


Re: Install - Configure Jupyter Notebook

Posted by Niketan Pansare <np...@us.ibm.com>.
Hi Gustavo,

You can paste that command into the command line:
$ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" \
    pyspark --master local[*] \
    --conf "spark.driver.memory=12g" \
    --conf spark.driver.maxResultSize=0 \
    --conf spark.akka.frameSize=128 \
    --conf spark.default.parallelism=100

The above command tells `pyspark` to use Jupyter as its Python driver. For more details, please see
https://github.com/apache/spark/blob/master/bin/pyspark#L27
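
Once the notebook server is up, PySpark pre-creates a SparkContext named `sc` in each notebook, so you can sanity-check the SystemML wiring from a first cell. A minimal sketch, assuming the `systemml` Python package is installed:

# Run in a notebook cell; `sc` is the SparkContext provided by the PySpark driver.
from systemml import MLContext, dml

ml = MLContext(sc)                           # bind SystemML to the running SparkContext
ml.execute(dml('print("SystemML is up")'))   # execute a one-line DML script as a smoke test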

Alternatively, you can follow Arijit's suggestion.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



Re: Install - Configure Jupyter Notebook

Posted by arijit chakraborty <ak...@hotmail.com>.
Hi Gustavo,


You can put those PySpark details in the Jupyter notebook itself.


import os
import sys

# Point Python at the local Spark installation. Use a raw string so the
# backslash in the Windows path is not treated as an escape character.
spark_path = r"C:\spark"
os.environ['SPARK_HOME'] = spark_path
os.environ['HADOOP_HOME'] = spark_path

# Make PySpark and its bundled py4j importable. The py4j version in the
# file name depends on your Spark release; check the python/lib folder.
sys.path.append(spark_path + "/python")
sys.path.append(spark_path + "/python/lib/pyspark.zip")
sys.path.append(spark_path + "/python/lib/py4j-0.10.4-src.zip")

from pyspark import SparkContext

# Run Spark locally, using all available cores.
sc = SparkContext("local[*]", "test")


# SystemML setup: create an MLContext on top of the SparkContext.
from pyspark.sql import SQLContext
import systemml as sml

sqlCtx = SQLContext(sc)
ml = sml.MLContext(sc)


But this is not a very good way of doing it; I did it this way because I'm on Windows, where it's easier.


Regards,

Arijit
