Posted to user@spark.apache.org by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/07/22 17:07:32 UTC

running jupyter notebook server Re: spark and plot data

Hi Pseudo

I do not know much about Zeppelin. What languages are you using?

I have been doing my data exploration and graphing using Python, mostly
because early on Spark had good support for Python. It is easy to collect()
data as a local pandas object. I think at this point R should work well: you
should be able to easily collect() your data as an R data frame. I have not
tried RStudio.
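To make the collect()-to-pandas step concrete, here is a minimal driver-side sketch. Since no cluster is assumed here, the rows that collect() (or, in newer PySpark, df.toPandas()) would return are faked as plain dicts; the point is just that once the data is on the driver it is an ordinary local pandas object:

```python
import pandas as pd

# In PySpark the driver-side hop is typically one line:
#   pdf = spark_df.toPandas()      # newer PySpark API
#   rows = spark_df.collect()      # or collect Row objects and convert
# The collected rows are faked here as plain dicts so the example
# runs without a cluster.
rows = [
    {"label": 0, "count": 120},
    {"label": 1, "count": 45},
]

pdf = pd.DataFrame(rows)  # a local pandas object, ready for plotting
print(pdf.shape)
```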

I typically run the Jupyter notebook server in my data center. I find the
notebooks really nice. I typically use matplotlib to generate my graphs.
There are a lot of graphing packages.
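A minimal sketch of that matplotlib workflow, with the summary stats hard-coded in place of a Spark aggregation (e.g. the result of a groupBy().count() brought back to the driver); the Agg backend writes the figure to a file so no display is needed on a server:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: no display needed on a server
import matplotlib.pyplot as plt

# Pretend these summary stats came back from Spark, e.g.
#   df.groupBy("day").count().collect()
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
counts = [120, 98, 143, 110, 87]

fig, ax = plt.subplots()
ax.bar(days, counts)
ax.set_xlabel("day")
ax.set_ylabel("event count")
fig.savefig("counts.png")  # view it in the notebook or scp it down
```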

Attached is the script I use to start the notebook server. This script and
process works but is a little hacky. You call it as follows:


#
# on a machine in your cluster
#
$ cd dirWithNotebooks

# all the logs will be in startIPythonNotebook.sh.out
# nohup allows you to log in, start your notebook server, and log out.
$ nohup startIPythonNotebook.sh > startIPythonNotebook.sh.out &

#
# on your local machine
#

# because of firewalls I need to open an ssh tunnel
$ ssh -o ServerAliveInterval=120 -N -f -L localhost:8889:localhost:7000 myCluster

# connect to the notebook server using the browser of your choice
http://localhost:8889




#
# If you need to stop your notebook server you may need to kill the process
# there is probably a cleaner way to do this
# $ ps -el | head -1; ps -efl | grep python
#
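The attached script itself does not survive in this plain-text archive. For the Spark 1.x era this thread dates from, a startIPythonNotebook.sh might have looked like the sketch below. This is a hypothetical reconstruction, not the author's actual script: the paths, the YARN master, and the port (7000, chosen to match the ssh tunnel above) are all assumptions. The snippet writes the script out and only syntax-checks it, since actually starting it needs a Spark install:

```shell
# Hypothetical reconstruction of startIPythonNotebook.sh; the real
# attachment is not in the archive. Paths and ports are assumptions.
cat > startIPythonNotebook.sh <<'EOF'
#!/bin/bash
# Launch pyspark with a Jupyter notebook as the driver shell.
# Port 7000 is the remote end of the ssh tunnel used above.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7000"
exec "$SPARK_HOME/bin/pyspark" --master yarn
EOF
chmod +x startIPythonNotebook.sh

# Only check the syntax here; running it for real needs Spark installed.
bash -n startIPythonNotebook.sh && echo "syntax ok"
```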

http://jupyter.org/


P.S. A new Jupyter release is in the works. The new JupyterLab alpha was
just announced and it looks really sweet.



From:  pseudo oduesp <ps...@gmail.com>
Date:  Friday, July 22, 2016 at 2:08 AM
To:  Andrew Davidson <An...@SantaCruzIntegration.com>
Subject:  Re: spark and plot data

> Hi Andy,
> thanks for the reply.
> I find it hard to switch each time between the local concept and the
> distributed concept. For example, Zeppelin gives an easy way to interact
> with data, but it is hard to configure on a huge cluster with a lot of
> nodes. In my case I have a cluster with 69 nodes and I process a huge
> volume of data with PySpark, which is cool, but when I want to plot some
> chart it is a hard job.
> 
> I sample or aggregate my results. For example, if I use the random forest
> algorithm in machine learning, I want to retrieve the most important
> features, but with the version already installed on our cluster (1.5.0) I
> can't get this.
> 
> Do you have any solution?
> 
> Thanks 
> 
> 2016-07-21 18:44 GMT+02:00 Andy Davidson <An...@santacruzintegration.com>:
>> Hi Pseudo
>> 
>> Plotting, graphing, data visualization, report generation are common needs in
>> scientific and enterprise computing.
>> 
>> Can you tell me more about your use case? What is it about the current
>> process / workflow that you think could be improved by pushing plotting
>> (I assume you mean plotting and graphing) into Spark?
>> 
>> 
>> In my personal work all the graphing is done in the driver on summary
>> stats calculated using Spark, so using standard Python libs has not been
>> a problem for me.
>> 
>> Andy
>> 
>> From:  pseudo oduesp <ps...@gmail.com>
>> Date:  Thursday, July 21, 2016 at 8:30 AM
>> To:  "user @spark" <us...@spark.apache.org>
>> Subject:  spark and plot data
>> 
>>> Hi,
>>> I know Spark is an engine to compute large data sets, but I work with
>>> PySpark and it is a wonderful machine.
>>> 
>>> My question: we don't have tools for plotting data, so each time we have
>>> to switch and go back to Python to use plot. But when you have a large
>>> result, a scatter plot or a ROC curve, you can't use collect() to take
>>> the data.
>>> 
>>> Does someone have a proposal for plotting?
>>> 
>>> Thanks 
> 
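The feature-importance question in the thread above has a common driver-side workaround in the same spirit as the rest of this discussion: sample or aggregate on the cluster, then use local Python tools. The sketch below uses scikit-learn as a stand-in for the cluster-side model and entirely synthetic data; it is an illustration of the pattern, not what was installed on that cluster. (Newer PySpark exposes importances directly via `RandomForestClassificationModel.featureImportances`.)

```python
# Workaround on old clusters: sample/collect a subset to the driver
# and fit a local scikit-learn forest just to inspect importances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Stand-in for a sampled subset of the cluster data: 200 rows, 4 features.
X = rng.normal(size=(200, 4))
y = (X[:, 2] > 0).astype(int)  # only feature 2 actually matters here

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Rank features by impurity-based importance, highest first.
ranked = sorted(enumerate(model.feature_importances_),
                key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])  # index of the most important feature
```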



Re: running jupyter notebook server Re: spark and plot data

Posted by Inam Ur Rehman <in...@gmail.com>.
Hello guys. I know this is irrelevant to the topic, but I have been looking
desperately for a solution. I am facing an exception:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html

Please help me; I couldn't find any solution.
