Posted to user@spark.apache.org by pseudo oduesp <ps...@gmail.com> on 2016/07/21 15:30:23 UTC

spark and plot data

Hi,
I know Spark is an engine for computing on large data sets; I work with
PySpark and it is a wonderful tool.

My question: we don't have tools for plotting data, so each time we have to
switch back to plain Python to plot.
But when you have a large result, such as a scatter plot or ROC curve, you
can't use collect to bring the data back.

Does someone have a suggestion for plotting?

Thanks

Re: spark and plot data

Posted by andy petrella <an...@gmail.com>.

Heya,

It might be worth checking out the spark-notebook <http://spark-notebook.io/>.
It offers custom, reactive, dynamic charts (scatter, line, bar, pie, graph,
radar, parallel, pivot, …) for any kind of data through an intuitive Scala
API (with server-side sampling, including Spark-based sampling, if needed).

Many charts are available natively; you can check this repo
<https://github.com/data-fellas/scala-for-data-science> (especially the
notebook named Why Spark Notebook). If you’re familiar with Docker, you can
simply run the following (which uses Spark 2.0):

docker pull datafellas/scala-for-data-science:1.0-spark2
docker run --rm -it --net=host -m 8g datafellas/scala-for-data-science:1.0-spark2 bash

<https://github.com/data-fellas/scala-for-data-science#start-the-services>

For any questions, you can reach the community live on our Gitter
<https://gitter.im/andypetrella/spark-notebook?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge>
or on GitHub <https://github.com/andypetrella/spark-notebook>.
HTH
andy

On Sat, Jul 23, 2016 at 11:26 AM Gourav Sengupta <go...@gmail.com>
wrote:

> Hi Pedro,
>
> Toree is a Scala kernel for Jupyter, in case anyone needs a short intro. I
> use it regularly (when I am not using IntelliJ) and it's quite good.
>
> Regards,
> Gourav
>
-- 
andy

Re: spark and plot data

Posted by Gourav Sengupta <go...@gmail.com>.

And we are all smiling: https://github.com/bokeh/bokeh-scala


Something that helped me immensely, particularly the example.
https://github.com/bokeh/bokeh-scala/issues/24

Please note that I use Toree as the Jupyter kernel.

Regards,
Gourav Sengupta

On Sat, Jul 23, 2016 at 8:01 PM, Andrew Ehrlich <an...@aehrlich.com> wrote:

> @Gourav, did you find any good inline plotting tools when using the Scala
> kernel? I found one based on highcharts but it was not frictionless the way
> matplotlib is.

Re: spark and plot data

Posted by Andrew Ehrlich <an...@aehrlich.com>.

@Gourav, did you find any good inline plotting tools when using the Scala kernel? I found one based on highcharts but it was not frictionless the way matplotlib is.

> On Jul 23, 2016, at 2:26 AM, Gourav Sengupta <go...@gmail.com> wrote:
> 
> Hi Pedro,
> 
> Toree is a Scala kernel for Jupyter, in case anyone needs a short intro. I use it regularly (when I am not using IntelliJ) and it's quite good.
> 
> Regards,
> Gourav

Re: spark and plot data

Posted by Gourav Sengupta <go...@gmail.com>.

Hi Pedro,

Toree is a Scala kernel for Jupyter, in case anyone needs a short intro. I
use it regularly (when I am not using IntelliJ) and it's quite good.

Regards,
Gourav

On Fri, Jul 22, 2016 at 11:15 PM, Pedro Rodriguez <sk...@gmail.com>
wrote:

> As of the most recent 0.6.0 release it's partially alleviated, but still
> not great (compared to something like Jupyter).
>
> Notebooks can be "downloaded", but that is only really meaningful for
> importing them back into Zeppelin. It would be great if they could be
> exported as HTML or PDF, but at present they can't be. I know they have
> some sort of git support, but it was never clear to me how it was supposed
> to be used, since the docs are sparse on that. So far what works best for
> us is S3 storage, but you don't get the benefits of GitHub that way
> (history, commits, etc.).
>
> There are a couple of other notebooks floating around; Apache Toree seems
> the most promising for portability since it's based on Jupyter:
> https://github.com/apache/incubator-toree

Re: spark and plot data

Posted by Gourav Sengupta <go...@gmail.com>.

Hi Taotao,

That is the way it's usually done to visualize data from Spark. But I do see
people transferring the data to a list to feed to matplotlib (as in the Spark
course currently running on edX).

Please try using Blaze and Bokeh and you will be in a new world altogether.

Regards,
Gourav

On Sat, Jul 23, 2016 at 2:47 AM, Taotao.Li <ch...@gmail.com> wrote:

> Hi pseudo,
>
> I previously posted a blog entry, spark-dataframe-introduction
> <http://litaotao.github.io/spark-dataframe-introduction?s=gmail>. For me,
> I use a Spark DataFrame (or RDD) to do the computation on the full
> datasets, then transform the result into a pandas DataFrame and visualize
> the data from there; sometimes you may also need matplotlib or seaborn.
>
> --
> *___________________*
> Quant | Engineer | Boy
> *___________________*
> *blog*:    http://litaotao.github.io
> <http://litaotao.github.io?utm_source=spark_mail>
> *github*: www.github.com/litaotao
>

Re: spark and plot data

Posted by "Taotao.Li" <ch...@gmail.com>.

Hi pseudo,

I previously posted a blog entry, spark-dataframe-introduction
<http://litaotao.github.io/spark-dataframe-introduction?s=gmail>. For me, I
use a Spark DataFrame (or RDD) to do the computation on the full datasets,
then transform the result into a pandas DataFrame and visualize the data
from there; sometimes you may also need matplotlib or seaborn.
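
A minimal sketch of that Spark-to-pandas hand-off, assuming the result may still be too large to convert directly: the data is sampled down on the cluster first, and only then converted. The function names, the `max_points` cap, and the seed are illustrative assumptions, not something from the blog post.

```python
def sample_fraction(total_rows, max_points):
    """Fraction of rows to keep so that roughly `max_points` rows remain."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    if total_rows <= max_points:
        return 1.0
    return max_points / total_rows


def to_plottable_pandas(df, max_points=10000, seed=42):
    """Shrink a Spark DataFrame on the cluster, then convert it to pandas.

    `df` is assumed to be a pyspark.sql.DataFrame; pandas must be
    installed on the driver for toPandas() to work.
    """
    fraction = sample_fraction(df.count(), max_points)
    if fraction < 1.0:
        # sample() only approximates the fraction, so limit() enforces the cap.
        df = df.sample(withReplacement=False, fraction=fraction, seed=seed)
        df = df.limit(max_points)
    return df.toPandas()
```

The returned pandas DataFrame is small enough to plot with the usual tools (pandas plotting, matplotlib, seaborn). A random sample suits scatter plots; for curves such as ROC, aggregating on the cluster is usually a better fit than sampling.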

-- 
*___________________*
Quant | Engineer | Boy
*___________________*
*blog*:    http://litaotao.github.io
<http://litaotao.github.io?utm_source=spark_mail>
*github*: www.github.com/litaotao

Re: spark and plot data

Posted by Pedro Rodriguez <sk...@gmail.com>.

As of the most recent 0.6.0 release it's partially alleviated, but still not
great (compared to something like Jupyter).

Notebooks can be "downloaded", but that is only really meaningful for
importing them back into Zeppelin. It would be great if they could be
exported as HTML or PDF, but at present they can't be. I know they have some
sort of git support, but it was never clear to me how it was supposed to be
used, since the docs are sparse on that. So far what works best for us is S3
storage, but you don't get the benefits of GitHub that way (history,
commits, etc.).

There are a couple of other notebooks floating around; Apache Toree seems the
most promising for portability since it's based on Jupyter:
https://github.com/apache/incubator-toree

On Fri, Jul 22, 2016 at 3:53 PM, Gourav Sengupta <go...@gmail.com>
wrote:

> The biggest stumbling block to using Zeppelin has been that we cannot
> download the notebooks, cannot export them, and certainly cannot sync them
> back to GitHub without mind-numbing and sometimes irritating hacks. Have
> those issues been resolved?
>
>
> Regards,
> Gourav


-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience

Re: spark and plot data

Posted by Gourav Sengupta <go...@gmail.com>.

The biggest stumbling block to using Zeppelin has been that we cannot
download the notebooks, cannot export them, and certainly cannot sync them
back to GitHub without mind-numbing and sometimes irritating hacks. Have
those issues been resolved?


Regards,
Gourav


On Fri, Jul 22, 2016 at 2:22 PM, Pedro Rodriguez <sk...@gmail.com>
wrote:

> Zeppelin works great. The other thing we have done in notebooks (like
> Zeppelin or Databricks) that support multiple types of Spark session is to
> register Spark SQL temp tables in our Scala code, then use Python as an
> escape hatch for plotting with seaborn/matplotlib when the built-in plots
> are insufficient.
>
> —
> Pedro Rodriguez
> PhD Student in Large-Scale Machine Learning | CU Boulder
> Systems Oriented Data Scientist
> UC Berkeley AMPLab Alumni
>
> pedrorodriguez.io | 909-353-4423
> github.com/EntilZha | LinkedIn
> <https://www.linkedin.com/in/pedrorodriguezscience>

Re: spark and plot data

Posted by Pedro Rodriguez <sk...@gmail.com>.

Zeppelin works great. The other thing we have done in notebooks (like Zeppelin or Databricks) that support multiple types of Spark session is to register Spark SQL temp tables in our Scala code, then use Python as an escape hatch for plotting with seaborn/matplotlib when the built-in plots are insufficient.
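
On the Python side, this hand-off might look like the sketch below. It assumes the Scala code has already registered a temp table (the table name `points` and the columns `x` and `y` used in the usage note are hypothetical), that the Python interpreter can see the same Spark session, and that the table is too large to pull back row by row, so the query bins it into a 2D grid of counts first. Both function names are made up for illustration.

```python
def density_query(table, x_col, y_col, x_step, y_step):
    """Build a Spark SQL query that counts rows per (x, y) grid cell.

    Identifiers are interpolated directly, so they should be trusted
    values typed into the notebook, not user input.
    """
    if x_step <= 0 or y_step <= 0:
        raise ValueError("bin widths must be positive")
    return (
        "SELECT FLOOR({x} / {xs}) * {xs} AS x_bin, "
        "FLOOR({y} / {ys}) * {ys} AS y_bin, "
        "COUNT(*) AS n "
        "FROM {t} "
        "GROUP BY FLOOR({x} / {xs}) * {xs}, FLOOR({y} / {ys}) * {ys}"
    ).format(x=x_col, y=y_col, xs=x_step, ys=y_step, t=table)


def density_frame(spark, table, x_col, y_col, x_step, y_step):
    """Run the query and return the per-cell counts as a pandas DataFrame.

    `spark` is assumed to be a SparkSession (or SQLContext) that can see
    the temp table registered from Scala.
    """
    query = density_query(table, x_col, y_col, x_step, y_step)
    return spark.sql(query).toPandas()
```

For example, `density_frame(spark, "points", "x", "y", 0.5, 2)` would return one row per occupied grid cell, which is small enough to hand to seaborn or matplotlib as a heatmap or a sized scatter plot.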

—
Pedro Rodriguez
PhD Student in Large-Scale Machine Learning | CU Boulder
Systems Oriented Data Scientist
UC Berkeley AMPLab Alumni

pedrorodriguez.io | 909-353-4423
github.com/EntilZha | LinkedIn

On July 22, 2016 at 3:04:48 AM, Marco Colombo (ing.marco.colombo@gmail.com) wrote:

Take a look at zeppelin

http://zeppelin.apache.org


Re: spark and plot data

Posted by Marco Colombo <in...@gmail.com>.

Take a look at zeppelin

http://zeppelin.apache.org

On Thursday, July 21, 2016, Andy Davidson <An...@santacruzintegration.com>
wrote:

> Hi Pseudo
>
> Plotting, graphing, data visualization, report generation are common needs
> in scientific and enterprise computing.
>
> Can you tell me more about your use case? What is it about the current
> process / workflow that you think could be improved by pushing plotting (I
> assume you mean plotting and graphing) into Spark?
>
>
> In my personal work all the graphing is done in the driver on summary
> stats calculated using spark. So for me using standard python libs has not
> been a problem.
>
> Andy

-- 
Ing. Marco Colombo

Re: spark and plot data

Posted by Andy Davidson <An...@SantaCruzIntegration.com>.

Hi Pseudo

Plotting, graphing, data visualization, report generation are common needs
in scientific and enterprise computing.

Can you tell me more about your use case? What is it about the current
process / workflow that you think could be improved by pushing plotting (I
assume you mean plotting and graphing) into Spark?


In my personal work, all the graphing is done in the driver on summary stats
calculated using Spark. So for me, using standard Python libs has not been a
problem.
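
As a rough sketch of this "summarize in Spark, plot in the driver" approach for the ROC case raised in the original question: scores can be bucketed into a fixed number of bins on the cluster, so that only one row of counts per bin reaches the driver. The column names `score` and `label`, the bin count, and both function names are assumptions made for illustration; scores are assumed to lie in [0, 1] and labels to be 0 or 1.

```python
def binned_counts(df, n_bins=100):
    """Aggregate on the cluster: one (bin, positives, negatives) row per bin.

    `df` is assumed to be a Spark DataFrame with a `score` column in [0, 1]
    and a 0/1 `label` column (both column names are hypothetical).
    """
    # Imported here so roc_points() below can be used without Spark installed.
    from pyspark.sql import functions as F

    # Clamp so that a score of exactly 1.0 falls into the last bin.
    bin_col = F.least(F.floor(F.col("score") * n_bins), F.lit(n_bins - 1))
    rows = (
        df.groupBy(bin_col.alias("bin"))
        .agg(
            F.sum(F.col("label")).alias("pos"),
            F.sum(1 - F.col("label")).alias("neg"),
        )
        .collect()  # at most n_bins rows, so collecting is cheap
    )
    return [(int(r["bin"]), int(r["pos"]), int(r["neg"])) for r in rows]


def roc_points(counts):
    """Turn per-bin (bin, pos, neg) counts into a list of (fpr, tpr) points.

    The threshold is swept from the highest-score bin down to the lowest.
    """
    total_pos = sum(pos for _, pos, _ in counts)
    total_neg = sum(neg for _, _, neg in counts)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, pos, neg in sorted(counts, reverse=True):
        tp += pos
        fp += neg
        points.append((fp / total_neg, tp / total_pos))
    return points
```

The resulting points can be plotted in the driver with standard Python libraries. The curve is an approximation whose resolution is set by the number of bins, not by the number of rows in the data set.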

Andy

From:  pseudo oduesp <ps...@gmail.com>
Date:  Thursday, July 21, 2016 at 8:30 AM
To:  "user @spark" <us...@spark.apache.org>
Subject:  spark and plot data

> Hi,
> I know Spark is an engine for computing on large data sets; I work with
> PySpark and it is a wonderful tool.
>
> My question: we don't have tools for plotting data, so each time we have to
> switch back to plain Python to plot.
> But when you have a large result, such as a scatter plot or ROC curve, you
> can't use collect to bring the data back.
>
> Does someone have a suggestion for plotting?
>
> Thanks