Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2019/03/03 18:16:00 UTC

[jira] [Resolved] (SPARK-26906) Pyspark RDD Replication Potentially Not Working

     [ https://issues.apache.org/jira/browse/SPARK-26906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-26906.
-------------------------------
    Resolution: Cannot Reproduce

> Pyspark RDD Replication Potentially Not Working
> -----------------------------------------------
>
>                 Key: SPARK-26906
>                 URL: https://issues.apache.org/jira/browse/SPARK-26906
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Web UI
>    Affects Versions: 2.3.2
>         Environment: Google Cloud Dataproc version [1.3.19-deb9 2018/12/14|https://cloud.google.com/dataproc/docs/release-notes#december_14_2018] (Spark 2.3.2, Hadoop 2.9.0) on Debian 9, with Python 3.7. The PySpark shell is started with pyspark --num-executors=100
>            Reporter: Han Altae-Tran
>            Priority: Minor
>         Attachments: spark_ui.png
>
>
> PySpark RDD replication doesn't seem to be functioning properly. Even with a simple example, the Web UI reports only 1x replication, despite persisting with a storage level (DISK_ONLY_2) that requests 2x replication:
> {code:python}
> import pyspark
>
> rdd = sc.range(10**9)
> mapped = rdd.map(lambda x: x)
> mapped.persist(pyspark.StorageLevel.DISK_ONLY_2)  # PythonRDD[1] at RDD at PythonRDD.scala:52
> mapped.count()
> {code}
>  
> Interestingly, if you catch the UI page at just the right time, it initially shows the RDD as 2x replicated, but it ends up reported as 1x replicated afterward. Perhaps the RDD really is replicated and it is only the UI that fails to register this.
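
A quick way to cross-check the request from the driver side, independent of the Web UI, is to inspect the storage level PySpark reports for the persisted RDD. The following is only a minimal sketch: it assumes the sc and mapped variables from the snippet above are still in scope, and it only confirms the requested replication factor, not how many replicas the block manager actually wrote.

{code:python}
import pyspark

# Assumes `sc` (the shell's SparkContext) and `mapped` from the snippet above.
level = mapped.getStorageLevel()
print(level)              # e.g. StorageLevel(True, False, False, False, 2)
print(level.replication)  # 2 -> the replication factor that was requested
{code}

If the requested level prints as 2x here while the Storage tab shows 1x, the question reduces to whether the block manager actually wrote two replicas or only the UI's accounting is off; the Scala-side SparkContext.getRDDStorageInfo developer API exposes per-RDD storage details, though reaching it from PySpark means going through the internal py4j gateway.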



