Posted to dev@zeppelin.apache.org by "Bernhard Walter (JIRA)" <ji...@apache.org> on 2017/02/23 16:12:44 UTC

[jira] [Created] (ZEPPELIN-2160) PySpark: Matplotlib Integration extremely slow

Bernhard Walter created ZEPPELIN-2160:
-----------------------------------------

             Summary: PySpark: Matplotlib Integration extremely slow
                 Key: ZEPPELIN-2160
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2160
             Project: Zeppelin
          Issue Type: Bug
          Components: front-end, GUI
    Affects Versions: 0.7.0
            Reporter: Bernhard Walter


*Issue:*
I tested matplotlib integration in PySpark. As a baseline, the following 3 examples took 1 - 2 seconds in Jupyter on the same machine. The time measured in Zeppelin is given after each example.

{code}
%pyspark

import matplotlib.pyplot as plt
plt.plot([1,2,3,4])
plt.ylabel('some numbers')
z.show(plt)
{code}

==> 1 sec

{code}
%pyspark

import numpy as np
import matplotlib.pyplot as plt

# Fixing random state for reproducibility
np.random.seed(19680801)

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75)

plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()
{code}

==> 11 sec

{code}
%pyspark
from ggplot import *

ggplot(diamonds, aes(x='price', fill='cut')) +\
    geom_density(alpha=0.25) +\
    facet_wrap("clarity")
{code}

==> 138 sec
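
To help narrow down where the time goes, the pure matplotlib rendering cost can be measured separately from whatever the notebook does to display the result. The following is only a minimal sketch under assumptions: it uses the headless Agg backend (which may not be the backend Zeppelin selects), and time_render and simple_plot are hypothetical helper names introduced here for illustration. It draws the first example into an in-memory PNG and reports the elapsed time.

```python
import io
import time

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not necessarily Zeppelin's
import matplotlib.pyplot as plt


def time_render(draw):
    """Return (seconds, png_size_in_bytes) for drawing and encoding one figure."""
    start = time.perf_counter()
    fig = plt.figure()               # new current figure for the pyplot calls
    draw()                           # run the plotting commands
    buf = io.BytesIO()
    fig.savefig(buf, format="png")   # encode to PNG in memory, no display involved
    plt.close(fig)                   # release the figure
    return time.perf_counter() - start, len(buf.getvalue())


def simple_plot():
    # Same commands as the first example, without z.show()
    plt.plot([1, 2, 3, 4])
    plt.ylabel('some numbers')


elapsed, size = time_render(simple_plot)
print("render: %.3f s, PNG size: %d bytes" % (elapsed, size))
```

If this time is small inside a %pyspark paragraph while the paragraph as a whole stays slow, the overhead would more likely sit in the display path than in matplotlib itself; the sketch does not establish that on its own.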

*Environment:*
Downloaded http://apache.mirror.digionline.de/zeppelin/zeppelin-0.7.0/zeppelin-0.7.0-bin-netinst.tgz and installed the spark, python, sh, md and angular interpreters.
Started via bin/zeppelin.sh



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)