Posted to dev@zeppelin.apache.org by "Bernhard Walter (JIRA)" <ji...@apache.org> on 2017/02/23 16:12:44 UTC
[jira] [Created] (ZEPPELIN-2160) PySpark: Matplotlib Integration extremely slow
Bernhard Walter created ZEPPELIN-2160:
-----------------------------------------
Summary: PySpark: Matplotlib Integration extremely slow
Key: ZEPPELIN-2160
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2160
Project: Zeppelin
Issue Type: Bug
Components: front-end, GUI
Affects Versions: 0.7.0
Reporter: Bernhard Walter
*Issue:*
I tested the matplotlib integration in PySpark. As a baseline, each of the following three examples took 1-2 seconds in Jupyter on the same machine. The time shown after each example is how long it took in Zeppelin.
{code}
%pyspark
import matplotlib.pyplot as plt
plt.plot([1,2,3,4])
plt.ylabel('some numbers')
z.show(plt)
{code}
==> 1 sec
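To separate matplotlib's own drawing cost from the notebook's display path, the first example can be timed in a plain Python process. This is a minimal sketch under stated assumptions. It uses the headless Agg backend, renders to an in-memory PNG, and excludes the one-off cost of importing matplotlib. The function name time_line_plot_render is hypothetical.

```python
# Minimal timing sketch (hypothetical helper): draws the first example's
# line plot and encodes it as PNG with the non-interactive Agg backend,
# outside any notebook, so the result can be compared with Zeppelin.
import io
import time

import matplotlib
matplotlib.use("Agg")  # headless backend; no display or notebook involved
import matplotlib.pyplot as plt


def time_line_plot_render():
    """Return (seconds, png_bytes) for drawing and encoding the line plot."""
    start = time.perf_counter()
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4])
    ax.set_ylabel("some numbers")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # encode to an in-memory PNG
    plt.close(fig)  # release the figure so repeated runs do not accumulate
    elapsed = time.perf_counter() - start
    return elapsed, buf.getvalue()


if __name__ == "__main__":
    seconds, png = time_line_plot_render()
    print(f"rendered {len(png)} bytes of PNG in {seconds:.3f} s")
```

If this takes a fraction of a second while the same cell takes about 1 second in Zeppelin, the difference likely lies outside matplotlib's rendering.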
{code}
%pyspark
import numpy as np
import matplotlib.pyplot as plt
# Fixing random state for reproducibility
np.random.seed(19680801)
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)
# the histogram of the data
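# density=True replaces the original normed=1, which newer matplotlib versions no longer accept
n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)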
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()
{code}
==> 11 sec
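The histogram example draws 50 bars plus a mathtext label, so the image format may affect the cost. The sketch below uses a hypothetical helper, render_histogram, that renders the same figure once as PNG and once as SVG and reports the time and encoded size for each. This report does not say which format Zeppelin sends to the browser, so treat that as an assumption to verify. The sketch uses density=True, which replaced the normed argument in newer matplotlib releases.

```python
# Sketch (hypothetical helper): renders the IQ histogram as PNG and as SVG
# with the headless Agg backend and reports time and size per format.
import io
import time

import matplotlib
matplotlib.use("Agg")  # headless backend; no display or notebook involved
import matplotlib.pyplot as plt
import numpy as np


def render_histogram(fmt):
    """Draw the histogram example and return (seconds, encoded_bytes)."""
    np.random.seed(19680801)  # same seed as the report for the same data
    x = 100 + 15 * np.random.randn(10000)
    start = time.perf_counter()
    fig, ax = plt.subplots()
    ax.hist(x, 50, density=True, facecolor="g", alpha=0.75)
    ax.set_xlabel("Smarts")
    ax.set_ylabel("Probability")
    ax.set_title("Histogram of IQ")
    ax.text(60, 0.025, r"$\mu=100,\ \sigma=15$")  # mathtext label
    ax.axis([40, 160, 0, 0.03])
    ax.grid(True)
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt)  # "png" or "svg"
    plt.close(fig)
    return time.perf_counter() - start, buf.getvalue()


if __name__ == "__main__":
    for fmt in ("png", "svg"):
        seconds, data = render_histogram(fmt)
        print(f"{fmt}: {len(data)} bytes in {seconds:.3f} s")
```

Comparing these numbers with the 11 seconds observed in Zeppelin should help show whether the delay comes from rendering the figure or from transferring and displaying the result.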
{code}
%pyspark
from ggplot import *
ggplot(diamonds, aes(x='price', fill='cut')) +\
geom_density(alpha=0.25) +\
facet_wrap("clarity")
{code}
==> 138 sec
*Environment:*
Downloaded http://apache.mirror.digionline.de/zeppelin/zeppelin-0.7.0/zeppelin-0.7.0-bin-netinst.tgz and installed the spark, python, sh, md and angular interpreters
Started via bin/zeppelin.sh