Posted to dev@zeppelin.apache.org by "Nawid Sayed (Jira)" <ji...@apache.org> on 2019/09/27 21:07:00 UTC

[jira] [Created] (ZEPPELIN-4358) Seaborn renders plots slowly in apache zeppelin notebooks

Nawid Sayed created ZEPPELIN-4358:
-------------------------------------

             Summary: Seaborn renders plots slowly in apache zeppelin notebooks
                 Key: ZEPPELIN-4358
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4358
             Project: Zeppelin
          Issue Type: Bug
          Components: pySpark
    Affects Versions: 0.8.1
            Reporter: Nawid Sayed


I am currently trying to generate visualizations in Zeppelin (0.8.1) notebooks using the pyspark interpreter with Python 3.7.3.

Generating the following simple plot with seaborn (0.9.0) takes around 5 minutes (with very high CPU usage throughout the duration):

```
%pyspark
import seaborn as sns
import numpy as np
import pandas as pd

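# 100 rows x 3 columns of uniform random values
data = pd.DataFrame(np.random.rand(100, 3))

# 3x3 grid: histograms on the diagonal, scatter plots elsewhere
sns.pairplot(data)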
```

This behavior is rather inconsistent, as the following (much more data-intensive) plot is rendered instantly:

```
%pyspark
import seaborn as sns
import numpy as np
import pandas as pd

# 10000 rows x 2 columns of uniform random values
df = pd.DataFrame(data=np.random.rand(10000, 2))

# Single line plot of column 1 against column 0
sns.lineplot(x=0, y=1, data=df)
```
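
To put numbers on the difference between the two plots, the calls could be timed with a small helper. The following is only a sketch and not part of the original report: `time_call` and `compare_plots` are hypothetical names, and the DataFrame shapes are copied from the two snippets above. It measures only the seaborn call itself; whatever the interpreter does afterwards to display the figure is not included.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_plots():
    """Time the two plots from this report (hypothetical helper)."""
    # Imported lazily so time_call stays usable without the plotting stack.
    import numpy as np
    import pandas as pd
    import seaborn as sns

    small = pd.DataFrame(np.random.rand(100, 3))
    large = pd.DataFrame(np.random.rand(10000, 2))

    _, pair_s = time_call(sns.pairplot, small)
    _, line_s = time_call(sns.lineplot, x=0, y=1, data=large)
    return {"pairplot": pair_s, "lineplot": line_s}
```

If the `pairplot` call returns quickly but the paragraph still takes minutes to finish, that would suggest the time is spent displaying the figure rather than building it.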

I noticed that using matplotlib (3.1.0) is generally much faster and almost as snappy as I am used to from Jupyter notebook environments.

I have already read issue [ZEPPELIN-1894](https://jira.apache.org/jira/browse/ZEPPELIN-1894), but I can render the scatterplot mentioned there instantly as well.

I already asked this question on Stack Overflow, but I think this is a better place for it:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)